This role is for one of Weekday's clients
Location: Chennai
Job Type: Full-time
Requirements
What You’ll Do
- Design, develop, and enhance core features and capabilities of a high-performance observability platform using modern, scalable technologies.
- Serve as a technical expert while mentoring and guiding a team of engineers to deliver high-quality, maintainable code.
- Partner with cross-functional teams, including product, design, and operations, to architect reliable and efficient backend systems.
- Stay ahead of industry trends and experiment with emerging tools and frameworks to continuously improve system performance and reliability.
- Translate business requirements into technical solutions, championing engineering best practices across the organization.
- Monitor system health, identify performance bottlenecks, and implement optimizations to ensure stability and scalability.
- Foster a culture of innovation, continuous learning, and technical excellence within the engineering team.
What Makes You a Great Fit
- 3+ years of experience in backend, DevOps, or platform engineering roles.
- Proficiency in one or more of the following languages: Golang, Java, Python, or Rust.
- Strong understanding of data structures, algorithms, and system design principles.
- Experience working with distributed systems, microservices, and message queues (e.g., Kafka, RabbitMQ).
- Practical knowledge of cloud platforms such as AWS, GCP, or Azure.
- Solid grasp of Linux internals, networking, and container technologies like Docker and Kubernetes.
- Familiarity with CI/CD pipelines using tools like GitHub Actions, Jenkins, or ArgoCD.
- Experience with monitoring and observability tools such as Prometheus, Grafana, or Datadog.
- Knowledge of Infrastructure as Code tools (e.g., Terraform, Pulumi).
- Strong debugging skills, with the ability to diagnose issues using logs, metrics, and traces.
- Comfortable with Git workflows and collaborative development environments.
Nice to Have
- Experience with service mesh technologies (e.g., Istio, Linkerd), API gateways, and load balancing strategies.
- Understanding of Site Reliability Engineering (SRE) principles, including incident response and on-call best practices.
- Familiarity with performance profiling tools (e.g., pprof, perf, flame graphs).
- Exposure to secrets management tools like Vault or AWS Secrets Manager.
Why You’ll Love Working Here
- Real ownership: You’ll have a seat at the table and the ability to drive impactful changes.
- Work with smart, passionate people: Join a high-performing team where innovation is celebrated and learning is constant.
- Celebrate wins together: From product milestones to fun team activities, success is a shared experience.
- Continuous growth: Invest in your development with challenging work, learning budgets, and career advancement opportunities.
- Flexible culture: While the team collaborates in-office five days a week, there's room for flexibility in how and when you get your best work done.
Skills Required
Distributed Systems, Golang, Java, Python, Rust, PostgreSQL, Kafka, Redis, Docker, Kubernetes, AWS, CI/CD, Prometheus, Grafana, System Design, Data Structures and Algorithms