00065246571 - Site Reliability Engineering (SRE) Lead with Azure

Cognizant
Full-time
On-site
Ontario
Technology & Engineering

We are seeking a highly experienced SRE Lead to oversee the reliability, scalability, and performance of our cloud-based systems. This role will lead a team of engineers, drive incident management, and champion continuous improvement across our infrastructure and operations.

Location - Mississauga, Canada

Key Responsibilities:

  • Lead day-to-day operations monitoring and incident management, including driving resolution for P1/P2 incidents.
  • Implement and maintain observability and monitoring solutions.
  • Support internal and external users by monitoring performance, troubleshooting issues, and conducting root cause analysis across production, staging, and development environments.
  • Build and maintain operational and user statistics dashboards using tools like Grafana.
  • Drive continuous improvement initiatives that reduce incidents and alerts and automate manual processes.
  • Define and track business metrics as OKRs.
  • Lead testing efforts as part of the Handover to Support process.
  • Administer cloud environments including Azure, WCNP, and Edge.
  • Collaborate with cross-functional teams to understand environments and plan for system stability.
  • Troubleshoot and debug software issues, ensuring timely resolution.
  • Promote and implement SRE best practices across the team.

Required Skills:

  • Strong background in SRE principles and practices.
  • Proficiency in Java, MVC Pattern, JDBC, RESTful APIs, and Spring Boot.
  • Experience with Azure Cloud and scripting in Python.
  • Knowledge of observability tools.
  • Hands-on experience with tools like ServiceNow, Slack/Teams, xMatters, and Grafana.

Benefits:

Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:

• Medical/Dental/Vision/Life Insurance

• Paid holidays plus Paid Time Off

• 401(k) plan and contributions

• Long-term/Short-term Disability

• Paid Parental Leave

• Employee Stock Purchase Plan

Disclaimer: The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.

Certifications Required:

  • Required Skills: SRE knowledge, Java, Azure Cloud, Python
  • Web application frameworks (preferably Spring Boot)
  • Automated testing platforms and unit tests
  • Tools: ServiceNow, Slack/Teams, xMatters