Job Details

DevOps & Infrastructure
Senior
Full time
Apr 15

Senior DevOps Engineer (DWH/ML Platform)

Senior DevOps Engineer wanted to build a scalable data platform using AWS, Kubernetes, and IaC. Responsibilities include developing infrastructure for Trino, Spark, and ML models. Requires expert-level Kubernetes, IaC, GitLab CI, and AWS experience.

We are looking for a DevOps Engineer who not only "keeps production running" but also builds a scalable data platform. You will develop the infrastructure that runs Trino, Spark, and ML models, following IaC and Kubernetes best practices.

Technology Stack:
Core Infra: AWS (EKS, VPC, IAM), Kubernetes, Terragrunt
CI/CD: GitLab CI
Compute & Query: Trino, Apache Spark
Storage: S3 (Data Lake, Apache Iceberg), ClickHouse, ScyllaDB
Orchestration: Apache Airflow (Kubernetes Executor)
Observability: Prometheus, Grafana, ELK
Deployment: Helm

What you will do:

Kubernetes platform development (EKS):
Writing and maintaining complex Helm charts for stateful applications (Trino, ClickHouse, Solr, ScyllaDB).
Resource management and autoscaling (HPA/VPA, Cluster Autoscaler, Karpenter).
Configuring network policies, Ingress, and a service mesh where necessary.

Infrastructure as Code:
Describing the entire infrastructure with Terragrunt: EKS clusters, VPC, IAM, S3, RDS, etc.
Keeping the code DRY and managing state in AWS S3.
Structuring code for multiple environments (dev/stage/prod).

CI/CD pipelines:
Building code and data delivery processes in GitLab CI.
Configuring GitLab Runners (including runners on Kubernetes) and artifact caching.
Automating the testing of infrastructure, Helm charts, and Terraform modules.

Observability:
Configuring metrics collection with Prometheus (ServiceMonitors, PodMonitors, exporters).
Building Grafana dashboards for data components and infrastructure.
Configuring alerting for critical scenarios: replication lag, Spark/Airflow task queues, resource utilization.
Centralized log collection.

Data Ops:
Supporting Trino and Spark clusters and ensuring they work correctly with S3 (Iceberg) and databases (Solr, ScyllaDB).
Configuring Airflow with the Kubernetes Executor and helping the data engineering (DE) team with infrastructure issues.
Participating in incident response.

ML Support:
Ensuring stable operation of ML services (Solr, ScyllaDB, Redis) in production.
Supporting infrastructure for MLflow, Feast, and inference services.

What we want to see:
If you haven't worked with the tools below, unfortunately it will be difficult for us to proceed:
Kubernetes (expert level): you don't just apply ready-made manifests; you understand EKS internals, can write your own Helm charts from scratch, and can debug complex problems (OOMKilled, Pending pods, PVC issues, networking).
IaC: ability to structure code for multiple environments (dev/stage/prod).
GitLab CI: deep understanding of .gitlab-ci.yml and experience configuring pipelines with complex logic.
AWS: understanding of networking and permissions management (IAM policies/roles, IRSA).
Experience operating a Big Data stack, e.g. Trino (Presto), Spark, Airflow.
Experience with wide-column NoSQL databases (ScyllaDB/Cassandra).
Experience with search engines (Solr or Elasticsearch).
Understanding of MLOps processes and experience with ML infrastructure (MLflow, Feast, KServe).
Experience with GitOps (ArgoCD, Flux).
Experience with Apache Iceberg and Data Lake architecture.

ScyllaDB
AWS
Kubernetes
Grafana
IAM
KServe
Prometheus
Solr
Trino
Spark
Presto
Terragrunt
Cassandra
EKS
Helm
ClickHouse
MLflow
Airflow
MLOps
ArgoCD
Flux
ELK
GitOps
Iceberg
Feast
S3
Elasticsearch
GitLab CI
VPC

Similar jobs

Middle DevOps Engineer — AI/ML API

RUB 300,000

Middle DevOps Engineer for an AI/ML API project at Boiler Lab in Saint Petersburg. Salary: 250-300k RUB. A high-load service ranked #1 in Google search.

Russia
Boiler Lab

Middle DevOps Engineer — AI/ML API

RUB 300,000

Middle DevOps Engineer for an AI/ML API project in Saint Petersburg. Salary: 250-300k RUB. A high-load service that aggregates AI models.

Russia
Boiler Lab

DevOps Engineer (Middle/Senior)

DevOps Engineer (Middle/Senior) at M Tech in Moscow, Russia. Full-time, with remote options available. Requires Middle-level skills and GitLab experience.

М Тех