SRE Engineer/Lead I - DevOps Engineering

Posted May 6, 2026

As a Site Reliability Engineer (SRE), you will combine software engineering and systems engineering to build, operate, and support large-scale, distributed, fault-tolerant systems. You will focus on ensuring high availability, performance, security, and reliability across cloud-native and hybrid environments through automation, observability, and operational excellence.

Key Responsibilities:

Manage system uptime and reliability across cloud-native (AWS, GCP) and hybrid architectures
Design and implement Infrastructure as Code (IaC) solutions meeting security and engineering standards using tools like Terraform, cloud CLIs, and cloud SDKs
Build and maintain CI/CD pipelines for application and infrastructure deployment using tools such as Jenkins and cloud-native toolchains
Develop automated tooling for deploying production changes and managing service requests effectively
Create and maintain comprehensive runbooks for detecting and remediating incidents and restoring services
Troubleshoot and triage complex issues in distributed systems, including participation in on-call rotations for high-severity incidents
Continuously improve runbooks and operational processes to reduce Mean Time to Recovery (MTTR)
Lead blameless postmortems for availability incidents and own remediation actions to prevent recurrence

Key Skills to Develop:

DevSecOps
Operational Excellence
Systems Thinking
