Senior DevOps Engineer

Remote / Full-time

We are looking for a Senior DevOps Engineer who knows how to run production systems properly.

Not someone who only writes Dockerfiles, sets up one pipeline, and disappears. We need someone who understands infrastructure, reliability, deployments, and what actually happens when systems go down at 2am.

If you have never owned production infrastructure, worked with Kubernetes seriously, or supported ML workloads in any capacity, this role is not for you.

Core Responsibilities

Build and maintain CI/CD pipelines across services and environments
Design, deploy, and operate containerized services using Docker and Kubernetes
Manage cloud infrastructure and production deployments across environments
Improve system reliability, uptime, and recovery mechanisms
Implement logging, monitoring, metrics, tracing, and alerting across services
Manage Kubernetes clusters including scaling, networking, and resource optimization
Handle environment management, secrets management, and access control properly
Support ML model deployments, versioning, and rollout strategies
Work with AI engineers to operationalize model serving and evaluation workflows
Optimize infrastructure for cost efficiency without sacrificing stability
Debug production incidents across infrastructure, networking, and application layers
Gradually introduce better automation and internal tooling as systems mature
Think ahead about how infrastructure can evolve into a more structured internal platform over time

Required Technical Skills

5+ years of experience in DevOps or Infrastructure roles
Strong hands-on experience with Docker and Kubernetes in production
Strong experience with AWS and at least one other major cloud provider
Experience building and maintaining CI/CD pipelines for multi-service environments
Strong scripting or programming skills in Python, Bash, or similar
Experience setting up logging, monitoring, and observability tooling
Familiarity with infrastructure-as-code approaches
Experience managing production databases, networking, and security configurations
Basic to intermediate experience supporting ML workloads or model deployments
Comfort handling production incidents and conducting postmortems

Nice to Have

Experience running GPU workloads or AI-heavy services
Exposure to model monitoring, drift detection, or ML evaluation pipelines
Experience with distributed systems or microservice-heavy architectures
Startup or fast-moving product experience

What We Expect From a Senior Engineer

You take ownership of infrastructure and do not wait for outages to force action.

You think in terms of reliability, scalability, and failure modes.

You understand trade-offs between speed, cost, and operational stability.

You document and automate instead of building fragile one-off fixes.

You are comfortable working at the intersection of DevOps and MLOps when required.

What You Will Get

Ownership of production infrastructure that powers real backend and AI systems
Real influence over Kubernetes, deployment strategy, and operational maturity
Close collaboration with backend and AI teams on systems that actually ship
A team that values pragmatic engineering over buzzwords

If you want a DevOps role where Docker and Kubernetes are core, ML workloads are part of the reality, and production responsibility is real, this is it.

How to Apply

Email [email protected] with your CV and a brief description of a Kubernetes-based production system you operated and one incident you had to resolve.
