MLOps Engineer (SRE)

Metova
Full-time
On-site

A leading company in Mexico specializing in accounting software is looking for a highly skilled MLOps Engineer (SRE) to join the team.

REQUIREMENTS:

  • 4+ years of experience as an SRE, DevOps, or Platform Engineer working on ML projects.
  • Fluent technical English.
  • Experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools (MLflow, Weights & Biases).
  • Experience in high-transaction environments such as banking, accounting, payroll, or logistics (nice to have).

KNOWLEDGE AND SKILLS:

  • Knowledge of model monitoring frameworks such as Evidently, Arize AI, WhyLabs, or similar.
  • Proficiency in Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog.
  • Proficiency in Kubernetes, Docker, Helm, and infrastructure automation tools (Terraform, Pulumi). 
  • Solid fundamentals in CI/CD for ML pipelines (testing, validation, rollback).

RESPONSIBILITIES:

  • Design and operate observability solutions for ML models in production (monitoring, alerts, traceability). 
  • Develop dashboards and metrics to evaluate model performance, cost, and stability. 
  • Implement tools for structured logging, drift monitoring, data quality checks, and inference error tracking.
  • Collaborate with data science and product teams to detect and mitigate incidents related to models in production. 
  • Apply SRE practices such as chaos engineering, stress testing, testing in staging environments, and continuous integration.