AI Engineer / Data Scientist

Qode
Full-time
On-site

About the Role

We are building next-generation AI assistants that combine real-time responsiveness, multilingual multimodal understanding, and deep personalization. As an LLM Engineer - AI Assistant & RAG Systems, you'll lead initiatives that bridge Retrieval-Augmented Generation (RAG), vector search, latency optimization, and long-term memory for highly scalable consumer-facing applications. You'll architect intelligent, efficient, and privacy-aware voice and text-based assistants.
We're proud to share that Lenskart is now our strategic investor, backing our vision to make conscious technology accessible at scale. If you're someone who thrives at the intersection of research and product, we want you on our team.


Minimum Work Experience Required

5+ years of experience in ML/NLP roles with strong hands-on expertise in building large-scale AI/LLM systems for production.


Top 3 Daily Tasks

Design and optimize LLM-powered assistant systems including RAG, vector databases, rerankers, and latency-aware inference pipelines.
Build feedback loops and observability layers to evaluate and improve assistant quality in production.
Collaborate with product, mobile, and infra teams to enable seamless multilingual and multimodal assistant experiences with minimal latency.


Top 5 Skills You Should Possess

Proven experience working with LLMs, RAG pipelines, and vector search systems (e.g., FAISS, Qdrant, Milvus).
Deep understanding of latency optimization, streaming token responses, and caching strategies in LLM deployment.
Experience with retriever-reranker tuning, LLM evaluation metrics, prompt engineering, and hallucination mitigation techniques.
Strong foundation in Python, PyTorch/TensorFlow, FastAPI, and orchestration tools like Airflow, Docker, and Kubernetes.
Ability to design memory modules using long-term embeddings, user vectors, and strategies like memory decay and context truncation for scalable personalization.


Cross-Functional Collaboration Excellence

Work closely with front-end, infra, and product teams to deliver cohesive assistant interactions.
Collaborate with UX teams to define feedback capture, user adaptation mechanisms, and privacy-aware memory usage.
Interface with Data and MLOps teams for scalable training, evaluation, and deployment pipelines.


Bonus Points For

Experience with agentic systems, autonomous workflows, or fine-tuning LLMs with LoRA/QLoRA.
Publications or writing in the domain of LLMs, GenAI, or retrieval architectures.
Contributions to open-source projects in RAG/LLM/prompt engineering or published tools for LLM deployment.
Exposure to building voice interfaces or multimodal input pipelines using tools like Whisper or CLIP.


What You’ll Be Creating

A real-time, multimodal and multilingual assistant that adapts to user preferences and evolves with usage.
A low-latency, scalable backend for LLM-powered interactions, operating under a strict latency SLA.
A robust feedback and retraining loop enabling continuous improvement of LLM outputs.
A privacy-aware long-term memory system with vectorized personalization and memory decay.