As an Engineering Manager, you serve as the technical anchor for the platform engineering team. You create and own the application architecture that best serves the product's functional and non-functional needs.

**Key Responsibilities:**
- Define and own the cloud and platform architecture for large-scale containerized microservices and Agentic AI/LLM workloads, ensuring scalability, reliability, and cost efficiency.
- Lead CI/CD platform engineering, enabling automated build, test, security scanning, and deployment for backend services, React-based web applications, and mobile app backends.
- Enable production-grade AI platforms, supporting agent frameworks, vector databases, prompt pipelines, and inference.
- Define Infrastructure as Code standards, cloud account structures, networking, and environment provisioning across AWS and secondary clouds.
- Implement and enforce SRE practices: define SLIs/SLOs, error budgets, capacity and reliability targets, and lead incident response and post-incident reviews.
- Ensure end-to-end observability across services and AI workloads, including logs, metrics, traces, model performance, and cost visibility.
- Embed security, compliance, and governance by design, including IAM, secrets management, network security, vulnerability management, and AI-specific controls.
- Make informed build vs. buy decisions, evaluate emerging cloud and AI infrastructure technologies, and drive continuous platform modernization.

**Qualifications Required:**
- 10+ years of experience in DevOps / Cloud / Platform Engineering, including people management and technical leadership.
- Deep hands-on expertise with AWS, with working exposure to GCP and Azure in multi-cloud or hybrid environments.
- Proven experience operating large-scale, production-grade containerized workloads, with a strong understanding of high availability, fault tolerance, and capacity planning in global teams.
- Practical experience supporting AI/ML or LLM workloads in production environments.
- Strong expertise in Kubernetes and Docker, including cluster operations, workload isolation, ingress, service meshes, and deployment strategies.
- Advanced experience with Infrastructure as Code for cloud provisioning, networking, security controls, and environment standardization across multiple stages.
- Solid understanding of observability and reliability engineering, including metrics, logging, tracing, alerting, and defining SLIs/SLOs for distributed systems and AI services.
- Hands-on exposure to cloud security and compliance practices, including IAM design, secrets management, vulnerability scanning, and secure deployment patterns, especially for AI platforms.
- Knowledge of cloud cost optimization (FinOps), especially for AI workloads.
- Background in strong product-based organizations solving real customer-facing problems.

*Note: Additional details of the company are not included in the provided Job Description.*