Senior Machine Learning Engineer (GCP)

Tiger Analytics
Full-time
On-site

Tiger Analytics is looking for a skilled and innovative Senior Machine Learning Engineer with hands-on experience in Google Cloud Platform (GCP) and Vertex AI to design, build, and deploy scalable ML solutions. You will play a key role in operationalizing machine learning models and driving the end-to-end ML lifecycle, from data ingestion to model serving and monitoring.

Key Responsibilities:

  • Develop, train, and optimize ML models using Vertex AI, including Vertex Pipelines, AutoML, and custom model training.
  • Design and build scalable ML pipelines for feature engineering, training, evaluation, and deployment.
  • Deploy models to production using Vertex AI endpoints and integrate with downstream applications or APIs.
  • Collaborate with data scientists, data engineers, and MLOps teams to enable reproducible and reliable ML workflows.
  • Monitor model performance and set up alerting, retraining triggers, and drift detection mechanisms.
  • Use GCP services such as BigQuery, Dataflow, Cloud Functions, Pub/Sub, and Cloud Storage (GCS) in ML workflows.
  • Apply CI/CD principles to ML models using Vertex AI Pipelines, Cloud Build, and GitOps practices.
  • Implement model governance, versioning, explainability, and security best practices within Vertex AI.
  • Document architecture decisions, workflows, and model lifecycle clearly for internal stakeholders.

Requirements

1. Advanced Generative AI
    - Advanced RAG, including graph-based hybrid retrieval
    - Multimodal agents
    - Deep knowledge of agentic frameworks such as ADK and LangChain
    - Fine-tuning and distillation

2. Python Expertise
    - Expert in Python with strong OOP and functional programming skills
    - Proficient in ML/DL libraries: TensorFlow, PyTorch, scikit-learn, pandas, NumPy, PySpark
    - Experience writing production-grade code, including testing and performance optimization

3. GCP Cloud Architecture & Services
    - Proficiency in GCP services such as:
      - Vertex AI
      - BigQuery
      - Cloud Storage
      - Cloud Run
      - Cloud Functions
      - Pub/Sub
      - Dataproc
      - Dataflow
    - Understanding of IAM and VPC

4. API Development & Integration
    - Experience designing and building RESTful APIs using FastAPI or Flask
    - Experience integrating ML models into APIs for real-time inference
    - Experience implementing authentication, logging, and performance optimization

5. System Design & Scalability
    - Ability to design end-to-end AI systems with scalability and fault tolerance in mind
    - Hands-on experience in developing distributed systems, microservices, and asynchronous processing

Benefits

This position offers an excellent opportunity for significant career development in a fast-growing and challenging entrepreneurial environment with a high degree of individual responsibility.