A leading company in Mexico specializing in accounting software is looking for a highly skilled MLOps Engineer (SRE) to join its team.
REQUIREMENTS:
- 4+ years of experience as an SRE, DevOps, or Platform Engineer on ML projects.
- Fluent technical English.
- Experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools (MLflow, Weights & Biases).
- Nice to have: experience in high-transaction environments such as banking, accounting, payroll, or logistics.
KNOWLEDGE AND SKILLS:
- Knowledge of model monitoring frameworks such as Evidently, Arize AI, WhyLabs, or similar.
- Proficiency in Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog.
- Proficiency in Kubernetes, Docker, Helm, and infrastructure automation tools (Terraform, Pulumi).
- Solid fundamentals in CI/CD for ML pipelines (testing, validation, rollback).
RESPONSIBILITIES:
- Design and operate observability solutions for ML models in production (monitoring, alerts, traceability).
- Develop dashboards and metrics to evaluate model performance, cost, and stability.
- Implement tools for structured logging, drift monitoring, data quality checks, and inference error tracking.
- Collaborate with data science and product teams to detect and mitigate incidents related to models in production.
- Apply SRE practices such as chaos engineering, stress testing, testing in staging environments, and continuous integration.