Site Reliability Engineer Lead

Cognizant
Full-time
On-site
Victoria
Information Technology

Position Summary:

As a Site Reliability Engineer Lead, you will lead a team of site reliability engineers and manage the unique challenges of scaling our client's digitization program. Your expertise in coding, algorithms, complexity analysis, and large-scale system design will be crucial in providing scalable, reliable, durable, and secure applications for our customers and internal users. You will build highly reliable applications using a customer-first approach while innovating technically and understanding our customers' needs.

Mandatory Skills:

· Strong experience in Java, Spring Boot, Node.js, microservices, RDBMS, NoSQL

· Proficiency with AWS services such as EC2, S3, Lambda, IAM, ECS, EKS, SQS, Kinesis

· Observability using Splunk, NewRelic

· Infrastructure as Code using Terraform

· APIs and event-driven approaches

· Security patterns

· Unix/Linux systems administration and familiarity with Docker

· Strong experience in analyzing and troubleshooting large-scale distributed systems

· Ability to debug and optimize code and automate routine tasks

· Familiarity with containerization and orchestration technologies such as Docker and Kubernetes

· Knowledge of modern software engineering practices and tools - Agile and DevOps

· Strong communication skills and the ability to explain complex technical matters in an easy-to-understand way

· Strong domain knowledge of telecom billing, charging, and rating systems

Duties and Responsibilities:

  • Within the Site Reliability Engineering team, you will work with various development teams and other partner teams to ensure that application reliability, efficiency, and performance meet our customers' needs, while keeping service operations reliable, scalable, and automated.
  • Develop tools and automation to streamline operations and improve system reliability, efficiency, and performance.
  • Partner with development teams on feature launches to ensure our customers receive reliable and scalable functionality.
  • Build deep knowledge of the production infrastructure and use it to debug distributed systems problems and identify improvements to the system.
  • Operations, SLO, and SLA management.
  • Metrics reporting and progress tracking.
  • Manage infrastructure costs and optimize resource utilization.
  • Work with security teams to ensure compliance with security policies and procedures.
  • Participate in on-call rotations to provide 24/7 support for our systems.
  • Observability (Alarms, monitoring, synthetics).
  • Error management.

Qualifications & Certifications (Optional):

· Bachelor’s degree in computer science or a related engineering degree

· 20+ years of IT industry experience

Salary Range: >100,000

Date of Posting: 25/September/2025

Next Steps: If you feel this opportunity suits you, or Cognizant is the type of organization you would like to join, we want to have a conversation with you! Please apply directly with us.

For a complete list of open opportunities with Cognizant, visit http://www.cognizant.com/careers. Cognizant is committed to providing Equal Employment Opportunities. Successful candidates will be required to undergo a background check.
