Senior DevOps Engineer
We are looking for a Senior DevOps Engineer who knows how to run production systems properly.
Not someone who only writes Dockerfiles, sets up one pipeline, and disappears. We need someone who understands infrastructure, reliability, deployments, and what actually happens when systems go down at 2am.
If you have never owned production infrastructure, worked with Kubernetes seriously, or supported ML workloads in any capacity, this role is not for you.
Core Responsibilities
- Build and maintain CI/CD pipelines across services and environments
- Design, deploy, and operate containerized services using Docker and Kubernetes
- Manage cloud infrastructure and production deployments across environments
- Improve system reliability, uptime, and recovery mechanisms
- Implement logging, monitoring, metrics, tracing, and alerting across services
- Manage Kubernetes clusters including scaling, networking, and resource optimization
- Handle environment management, secrets management, and access control properly
- Support ML model deployments, versioning, and rollout strategies
- Work with AI engineers to operationalize model serving and evaluation workflows
- Optimize infrastructure for cost efficiency without sacrificing stability
- Debug production incidents across infrastructure, networking, and application layers
- Gradually introduce better automation and internal tooling as systems mature
- Think ahead about how infrastructure can evolve into a more structured internal platform over time
Required Technical Skills
- 5+ years of experience in DevOps or Infrastructure roles
- Strong hands-on experience with Docker and Kubernetes in production
- Strong experience with AWS and at least one other major cloud provider
- Experience building and maintaining CI/CD pipelines for multi-service environments
- Strong scripting or programming skills in Python, Bash, or similar
- Experience setting up logging, monitoring, and observability tooling
- Familiarity with infrastructure-as-code approaches
- Experience managing production databases, networking, and security configurations
- Basic to intermediate experience supporting ML workloads or model deployments
- Comfort handling production incidents and conducting postmortems
Nice to Have
- Experience running GPU workloads or AI-heavy services
- Exposure to model monitoring, drift detection, or ML evaluation pipelines
- Experience with distributed systems or microservice-heavy architectures
- Startup or fast-moving product experience
What We Expect From a Senior Engineer
You take ownership of infrastructure and do not wait for outages to force action.
You think in terms of reliability, scalability, and failure modes.
You understand trade-offs between speed, cost, and operational stability.
You document and automate instead of building fragile one-off fixes.
You are comfortable working at the intersection of DevOps and MLOps when required.
What You Will Get
- Ownership of production infrastructure that powers real backend and AI systems
- Real influence over Kubernetes, deployment strategy, and operational maturity
- Close collaboration with backend and AI teams on systems that actually ship
- A team that values pragmatic engineering over buzzwords
If you want a DevOps role where Docker and Kubernetes are core, ML workloads are part of the reality, and production responsibility is real, this is it.
How to Apply
Email your CV to [email protected], along with a brief description of a Kubernetes-based production system you operated and one incident you had to resolve.