Job Details
MLOps Engineer
MLOps Engineer at Kaspersky Lab. Location: Moscow. Salary: negotiable. The role covers AI system architecture design, GPU scheduler implementation, ML pipeline support, CI/CD for models, production model monitoring, and LLM deployment.
• Design of AI system architecture (from prototype to production)
• Implementation of a GPU scheduler (Kueue, Volcano, or similar) for sharing workloads on the same hardware
• Design and support of ML pipelines (training, validation, model deployment)
• CI/CD for models: versioning of data, models, and experiments
• Monitoring of production models (drift detection, performance tracking)
• Deployment and optimization of LLM / inference servers (vLLM, TGI, Triton)
• Containerization and orchestration of services (Docker, K8s)
• CI/CD (GitLab CI, Jenkins)
• IaC (Terraform, Ansible)
• Monitoring and observability (Prometheus, Grafana)
• Automation of routine operations
• Ensuring the infrastructure meets information security requirements
• Maintaining technical documentation for assigned resources
• Background in ML/DS: understanding of training, inference, and data handling processes
• 2+ years of experience in MLOps / DevOps with ML specifics (would be a huge plus)
• Docker, Kubernetes (Helm, cluster management): production experience
• Strong Python proficiency
• CI/CD (GitLab CI, Jenkins, GitOps methodology)
• Deep Linux knowledge
• Terraform / Ansible for IaC
• Experience building or managing GPU clusters (NVIDIA, CUDA, nvidia-container-toolkit)
• Experience with GPU schedulers (Kueue, Volcano, Run:ai)
• Experience with MLflow, Kubeflow, Airflow, or similar
• Higher technical education