Our US-based client is looking for a mission-driven Site Reliability Engineer to support and scale the infrastructure powering their secure, mission-critical SaaS platform.
You must be confident in operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker), with the ability to respond to incidents quickly, support ongoing automation, and scale systems reliably.
Responsibilities
- Be part of the team that owns the uptime and performance of our core backend infrastructure (Windows + Linux)
- Maintain and enhance observability across systems using Kibana, CloudWatch, and custom telemetry
- Manage CI/CD pipelines, infrastructure as code (Terraform, Ansible), and deployment automation
- Support and maintain production Windows environments:
  - .NET Framework/Core apps running in IIS
  - SQL Server with AlwaysOn replication and Service Broker-based messaging
- Support and operate cloud-native services:
  - AWS Lambda, DynamoDB, Postgres/Aurora, Redshift, Redis, and containerized workloads in Docker
- Participate in the on-call rotation and incident response
- Collaborate closely with engineering teams to improve system reliability and deployment workflows
Requirements
- 5+ years of SRE, DevOps, or WebOps experience supporting production SaaS systems
- Strong experience with Windows Server, IIS, and .NET applications in production
- Hands-on experience with SQL Server administration, including AlwaysOn and Service Broker
- Proficiency in AWS operations, including Lambda, DynamoDB, CloudWatch, and IAM
- Familiarity with Postgres, Redis, Kibana/Elasticsearch, and centralized logging
- Experience with Docker, Terraform, and Ansible for infrastructure management
- Strong scripting skills (PowerShell, Python)
- Experience running and debugging containerized and distributed systems in production
- Excellent incident response and debugging skills
Benefits
Salary: $6,000 USD/month + Holidays
Unlimited PTO