Senior Platform Engineer

Anaqua

Hyderabad

Job Description

  • Own the GCP infrastructure with the team — GKE clusters, multi-region setup, global load balancing, autoscaling, VPC networking, DNS, firewall rules and IAM.
  • Build and maintain GitLab CI/CD pipelines and shared CI templates that every service team consumes — build, scan, deploy, promote across Dev / QA / Staging / Pre-Prod / Production.
  • Help shape the company-wide standards for how services get deployed, secured, monitored and rolled back.
  • Operate and harden the cluster — node pool upgrades, namespace / RBAC / resource-quota design, rolling updates, health probes, base images and supply-chain security.
  • Run the platform security stack — gateway policies, API-key and JWT issuance, secret rotation, OWASP and dependency scanning, workload identity, IAM least-privilege.
  • Own observability and incident response on GCP — structured logging, metrics, dashboards, SLIs / SLOs / error budgets, alerting, post-mortems and on-call runbooks.
  • Build internal developer tooling — CLIs, self-service workflows and golden-path automation that make the next service easy to ship.

What you will need to be successful:

  • Strong production ownership on GCP — operating real workloads, not just standing up demos. GCP is the cloud we run on.
  • Kubernetes in production (GKE) — deployments, Helm, namespaces, RBAC, resource quotas, rolling updates, health and readiness probes, multi-region setups and rollbacks.
  • Terraform as a daily tool: reusable modules with remote state, drift detection and clean management of IAM, networking, Pub/Sub, Cloud SQL and secrets.
  • CI/CD pipeline depth — GitLab CI (or equivalent) at scale; reusable templates, fast feedback loops, security and dependency scans as pipeline stages, deploy promotion across Dev / QA / Staging / Pre-Prod / Production.
  • Git workflow fluency — GitFlow or trunk-based branching, tagging and release strategies that fit a multi-service org.
  • Cloud networking depth — VPC design, load balancing (global and regional), DNS, firewall rules and network security groups.
  • Hosting and application security ownership — gateway and edge policies, secret rotation, OWASP and dependency scanning, workload identity, IAM least-privilege hygiene.
  • Production observability and reliability on GCP — structured logs, metrics, dashboards, alerting, SLIs / SLOs / error budgets, on-call rotations, post-mortems.
  • Performance work — load testing, capacity planning and operational tuning of services under real traffic.
  • Operational PostgreSQL — migrations under load, backups, restores, replication basics, query plans and indexing.
  • Asynchronous messaging on GCP Pub/Sub — topology, subscriptions, dead-letter handling and operational tuning. Pub/Sub is our primary message bus.
  • Scripting and automation — Bash plus one of Python or Go for internal tooling and platform automation.
  • Excellent written and spoken English; comfortable working across time zones with engineers in the EU and the US.
