Posted Apr 13, 2026
As a Service Delivery Manager (SDM) with a solid Application Support background and deep exposure to Site Reliability Engineering (SRE) principles, your role will involve end-to-end service delivery of large-scale, mission-critical applications. You will be responsible for ensuring high availability, reliability, and performance through proactive monitoring, observability, incident management, and continuous improvement.

Key Responsibilities:
- Own end-to-end service delivery for multiple critical applications, ensuring high availability, stability, and performance across production environments.
- Lead 24/7 application support operations (L2/L3), including on-call rotations, incident bridges, escalations, and stakeholder communications.
- Act as the primary escalation point for major incidents, driving resolution until closure and RCA sign-off.
- Ensure consistent adherence to SLAs, OLAs, and KPIs, with continuous tracking and reporting.
- Apply SRE principles to application support, including:
  - Definition and tracking of SLIs, SLOs, and Error Budgets
  - Improving reliability, scalability, and fault tolerance
- Drive initiatives to reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through automation, improved alerting, and runbooks.
- Partner with engineering teams to eliminate toil and improve operational efficiency.
- Lead the design and adoption of monitoring and observability platforms using tools such as Prometheus, Grafana, ELK, Dynatrace, and OpenTelemetry (OTel).
- Ensure end-to-end visibility across infrastructure, applications, and business transactions.
- Implement proactive monitoring, intelligent alerting, and anomaly detection to prevent incidents before business impact.
- Drive adoption of AIOps and automated incident management (auto-remediation, automated runbooks where applicable).
- Lead Major Incident Management (MIM), including war rooms, stakeholder updates, and executive reporting.
- Ensure timely and high-quality Root Cause Analysis (RCA) with preventive action plans.
- Govern problem management to identify recurring issues and drive long-term fixes.
- Oversee change and release management, ensuring minimal risk to production systems.
- Act as a trusted partner for business stakeholders, engineering teams, vendors, and clients.
- Communicate service health, risks, and improvement plans to senior leadership.
- Manage third-party vendors and support partners, ensuring contract compliance and service quality.
- Drive operational excellence initiatives to improve uptime, performance, and customer satisfaction.
- Lead transformation programs involving cloud migration, DevOps, and SRE adoption.
- Identify automation opportunities to reduce manual effort and operational cost.
- Support DR planning, testing, and compliance with RTO/RPO requirements.

Required Experience & Skills:
- 15+ years of experience in Application Support / Production Operations / Platform Operations, with 5+ years in a Service Delivery Manager / SRE / Operations Leadership role.
- Proven experience managing large application portfolios (50+ applications) in enterprise environments.
- Strong background in banking, financial services, or large regulated enterprises preferred.

In conclusion, your role as a Service Delivery Manager with a strong Application Support background and exposure to SRE principles will involve ensuring the high availability, reliability, and performance of critical applications through proactive monitoring, observability, incident management, and continuous improvement. You will also be responsible for applying SRE best practices, leading support operations, and driving initiatives to enhance system resilience and customer experience.