Director, Site Reliability Engineering

Posted Apr 22, 2026

We’re a global team of over 400 people, working together to push the boundaries of open-source technology and multi-cloud solutions. Our vision is to help developers, builders, and creators bring their ideas to life with speed and simplicity, by providing a cloud data platform that makes open-source databases, search, streaming, and application infrastructure easily accessible to everyone.

### The Role:

We are seeking a Director of Site Reliability Engineering to lead the global organization responsible for the reliability and operational excellence of the Aiven platform. You will lead a high-performing SRE team, setting the vision and strategy to ensure resilient, scalable, and highly automated systems across our 24/7/365 operations. Your team will proactively manage platform health, lead incident response and cross-functional coordination, and drive continuous improvement in reliability and performance. As a senior leader, you will partner closely with engineering, product, and support teams worldwide, influence system architecture, and invest in tooling and automation to reduce toil and enhance production reliability. This role combines strategic leadership, customer centricity, and deep operational accountability, with a focus on delivering reliable services at global scale while developing strong technical leaders within your organization.

### What You'll Do:

- Define and drive global SRE operating strategy in partnership with regional SRE leaders across EMEA, AMER, and APAC, ensuring alignment on reliability goals, operating models, and execution across a 24/7/365 follow-the-sun organization.
- Build and lead a multi-regional SRE organization through managers, developing leadership capability, mentoring teams, and ensuring consistent performance, culture, and delivery across geographies.
- Set the vision and roadmap for reliability engineering, enabling teams to deliver high-impact tools, automation, and process initiatives that improve platform resilience, scalability, and efficiency.
- Own global incident management strategy and operating model, including on-call design, coverage, and escalation frameworks, ensuring seamless coordination and high availability across regions.
- Establish a metrics-driven operating cadence, defining KPIs, SLIs, SLOs, and error budgets, driving data-informed prioritization, and embedding operational rigor and continuous improvement across the SRE organization.
