Work Experience
DevOps Engineer (2023 – Present)
Architected Hub-and-Spoke centralised observability platform aggregating telemetry from
distributed Kubernetes workload clusters using Grafana, Loki, Thanos, and Tempo.
Engineered production CI/CD pipelines with GitHub Actions, embedding Trivy, CodeQL, KICS,
and TruffleHog security scanning. Maintained Helm charts and Terraform/Terragrunt code
across five isolated environments. Designed on-premises Kubernetes multi-cluster
architecture using Kubeadm with Calico CNI.
Junior DevOps Engineer (2021 – 2023)
Owned end-to-end AWS cloud infrastructure for a production AI conversational chatbot.
Managed MongoDB Atlas, MySQL RDS, Redis, and ElastiCache clusters. Built Jenkins CI/CD
pipelines with auto-scaling groups and ALBs. Executed complex database migrations to
AWS Aurora and MongoDB Atlas. Developed AWS Lambda functions for off-hours environment
scale-down, reducing monthly cloud costs significantly.
Blog — AI-Driven Infrastructure: The Shift from Ops to AI-Ops
For most of the past decade, infrastructure operations ran on a simple contract: humans
defined the desired state, automation enforced it, and alert managers woke someone up when
things went wrong. That model worked well at modest scale. It breaks catastrophically at
cloud-native scale, where a single distributed application can generate tens of thousands of
metrics, hundreds of log streams, and dozens of trace spans every second.
The core problem is signal-to-noise ratio. A production Kubernetes cluster running 50
microservices across three availability zones might fire 300 alerts on a busy Tuesday. A
human on-call engineer cannot meaningfully triage 300 simultaneous alerts. Alert fatigue
combined with manual correlation across disconnected dashboards is consistently identified
as a root cause in production post-mortems. This is not an operations problem — it is a
data problem. AIOps addresses this gap through three capabilities: ML-driven anomaly
detection that replaces static thresholds with learned baselines, alert correlation that
reduces an alert storm of 40 notifications to a single root cause event, and predictive
remediation that identifies failure patterns before they manifest. Teams that adopt AIOps
consistently report reductions of 50-80% in mean time to detect (MTTD) and of 30-60% in
mean time to resolve (MTTR).
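
The "learned baseline" idea is easier to see in code. The sketch below is illustrative only
and not from the original post: a rolling z-score detector stands in for the ML models, and
the window size, minimum history, and threshold are arbitrary assumptions.

```python
# Illustrative sketch: a rolling z-score detector standing in for "learned baselines".
# Real AIOps platforms use far richer models; window and threshold here are assumptions.
from collections import deque
from statistics import mean, stdev

class BaselineDetector:
    def __init__(self, window: int = 288, z_threshold: float = 3.0):
        self.samples = deque(maxlen=window)   # e.g. 288 five-minute samples ~= 24h
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` deviates strongly from the learned baseline."""
        anomalous = False
        if len(self.samples) >= 5:            # need some history before judging
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                anomalous = True
        self.samples.append(value)
        return anomalous

# A static threshold flags every sample above a fixed number (say 75% CPU); the baseline
# detector flags deviations from what is normal for this particular series.
detector = BaselineDetector()
for cpu in [40, 42, 41, 39, 43, 95]:          # toy series: the last value is a spike
    print(cpu, detector.observe(cpu))
```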
Blog — Maximizing Cloud Efficiency with Predictive Scaling
Reactive auto-scaling operates in the past. By the time your monitoring system detects
that CPU has breached 75%, users are already experiencing degraded performance. By the
time a new EC2 instance launches, passes health checks, and registers with the load
balancer, two to five minutes have elapsed. For high-traffic workloads, those minutes
represent thousands of failed requests. The standard workaround — keeping 30% spare
capacity at all times — is expensive and only partially effective against true demand spikes.
Predictive scaling inverts the model. AWS Predictive Scaling uses ML models trained on
your CloudWatch metrics history to generate capacity forecasts up to 48 hours ahead,
identifying diurnal and weekly traffic patterns and pre-provisioning capacity before peaks
arrive. On Kubernetes, KEDA combined with cron-based scaling triggers and custom
Prometheus metric forecasting achieves the same effect. Organizations implementing
predictive alongside reactive scaling typically reduce baseline instance counts by 15-25%
and eliminate error spikes during planned demand peaks entirely.
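
As a rough illustration of the AWS side, a predictive scaling policy can be attached to an
Auto Scaling group with a few lines of boto3. The group name, region, and 40% CPU target
below are placeholders, not values from the original post.

```python
# Minimal boto3 sketch of enabling AWS Predictive Scaling on an Auto Scaling group.
# "ForecastOnly" lets you review the 48-hour forecast before the policy changes capacity.
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-1")  # assumed region

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-prod-asg",          # hypothetical ASG name
    PolicyName="predictive-cpu-40",
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [
            {
                "TargetValue": 40.0,              # keep forecasted CPU around 40%
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization"
                },
            }
        ],
        "Mode": "ForecastOnly",                   # switch to "ForecastAndScale" once trusted
        "SchedulingBufferTime": 300,              # pre-provision 5 minutes before the peak
    },
)
```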
Blog — Why Firebase is the Secret Weapon for AI Landing Pages
Firebase Hosting sits on Google's global CDN infrastructure, distributing static assets
across Points of Presence worldwide. For React SPAs, this means Largest Contentful Paint
(LCP) times consistently below 1.5 seconds and time to first byte (TTFB) under 50 ms —
metrics that directly influence both user
experience and Google Search rankings. Every deployment is atomic and creates an
immutable versioned snapshot. Rollback takes under 30 seconds. Preview channels provide
shareable staging URLs per pull request without impacting production. SSL certificates
are provisioned and renewed automatically. For portfolios and landing pages serving under
10 GB monthly transfer, Firebase Hosting costs nothing, while eliminating all the
operational overhead of VPS management, certbot, and manual CDN configuration.
Blog — Building a Hub-and-Spoke Observability Platform with Thanos
Running six Kubernetes clusters — production, staging, UAT, QA, development, and a
management plane — without centralised visibility means engineers tab between six separate
Grafana instances, correlating incidents manually. The Hub-and-Spoke pattern solves this:
a dedicated management cluster (the Hub) hosts Thanos Query, Loki, Tempo, Alertmanager,
and Grafana. Spoke clusters run lightweight agents — Prometheus Agent or Sidecar, Fluent
Bit, and an OpenTelemetry Collector — that push telemetry to the Hub. The total agent
overhead per spoke is under 500m CPU and 2Gi memory. Engineers interact exclusively with
the Hub, gaining cross-cluster dashboards, unified alert routing, and the ability to
correlate cascading failures across environments from a single pane of glass.
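
What the single pane of glass looks like in practice: one PromQL query against the Hub's
Thanos Query endpoint fans out to every spoke. The endpoint URL and the cluster external
label in this sketch are assumptions that depend on how the spokes are configured.

```python
# Sketch: querying the Hub's Thanos Query API (Prometheus-compatible) for per-cluster CPU.
import requests

THANOS_QUERY = "http://thanos-query.monitoring.svc:9090"   # hypothetical Hub endpoint

resp = requests.get(
    f"{THANOS_QUERY}/api/v1/query",
    params={"query": 'sum by (cluster) (rate(container_cpu_usage_seconds_total[5m]))'},
    timeout=10,
)
resp.raise_for_status()

# One response, one row per spoke cluster — no tabbing between six Grafana instances.
for series in resp.json()["data"]["result"]:
    print(series["metric"].get("cluster", "unknown"), series["value"][1])
```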
Blog — GitOps with ArgoCD: A Production Deployment Playbook
GitOps defines the desired state of infrastructure and applications declaratively in Git,
with an automated system continuously reconciling live cluster state to that definition.
ArgoCD watches one or more Git repositories and syncs the cluster state to match the
manifests. Separating application source code from deployment configuration repositories
enforces clean boundaries between developer and platform engineer concerns. ApplicationSet
controllers eliminate per-environment Application resource definitions. Sync waves manage
deployment ordering: database migrations in wave -1 complete before application servers
in wave 0 receive traffic. The argocd-vault-plugin integrates secret injection from
HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager without storing secret
values in Git. Rollback is a git revert: instantaneous and fully auditable.
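
To make the wave ordering concrete, the sketch below shows the annotation that places a
migration Job in wave -1. The annotation key is real ArgoCD behaviour; the Job name, image,
and command are hypothetical, and building the manifest from a Python dict is simply a
convenient way to show the structure here.

```python
# Sketch of a wave -1 resource: ArgoCD applies it, waits for it to become healthy
# (a Job is healthy when it completes), and only then syncs wave 0 resources.
import yaml

migration_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "db-migrate",                          # hypothetical Job name
        "annotations": {
            "argocd.argoproj.io/sync-wave": "-1",      # runs before wave 0 resources
        },
    },
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [
                    {"name": "migrate", "image": "app:1.4.2", "command": ["./migrate"]}
                ],
            }
        }
    },
}

print(yaml.safe_dump(migration_job, sort_keys=False))
```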
Blog — Kubernetes Secrets Management with HashiCorp Vault
Kubernetes Secrets are base64-encoded, not encrypted. Anyone with namespace read access
can retrieve and decode them. Without encryption at rest configured on etcd, secrets are
stored in plaintext — and etcd backups frequently contain secret data. HashiCorp Vault
addresses these gaps with fine-grained access control policies, a complete audit log of
every secret access event, automatic secret rotation, and dynamic secrets with
configurable TTLs. The Kubernetes auth method allows Pods to authenticate using their
automatically mounted Service Account token, eliminating the bootstrap secret problem.
The Vault Agent Injector injects secrets as files into Pod volumes without application
code changes. The database secrets engine generates unique, time-limited credentials per
requester, making credential exposure time-bounded and fully attributable.
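
A minimal sketch of that flow using the hvac client, assuming a hypothetical in-cluster
Vault address, auth role, and a database role named "readonly":

```python
# Sketch: a Pod authenticates to Vault with its mounted Service Account token, then
# requests short-lived database credentials from the database secrets engine.
import hvac

SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

with open(SA_TOKEN_PATH) as f:
    jwt = f.read()

client = hvac.Client(url="http://vault.vault.svc:8200")      # hypothetical Vault address
client.auth.kubernetes.login(role="payments-app", jwt=jwt)   # role bound to this SA/namespace

# Dynamic secrets: Vault creates a unique, time-limited DB user for this caller.
creds = client.secrets.database.generate_credentials(name="readonly")
username = creds["data"]["username"]
password = creds["data"]["password"]
lease_ttl = creds["lease_duration"]                          # seconds until expiry/rotation
print(f"received user {username}, valid for {lease_ttl}s")
```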