Posted May 20, 2026
Lightning AI is seeking an experienced Infrastructure Operations Engineer to help scale and operate our next-generation AI infrastructure platform. Our InfraOps team sits at the center of reliability, automation, and operational scale for GPU infrastructure. This team owns break/fix operations, incident response, customer provisioning, observability, and the automation systems that keep complex infrastructure running efficiently. In this role, you’ll work hands-on with large-scale GPU environments, Linux systems, bare metal infrastructure, provisioning workflows, and platform reliability. You’ll partner closely with Infrastructure Engineering, Network Operations, and Software Platform teams to troubleshoot issues, improve operational efficiency, and build automation that reduces manual toil over time.
- 8+ years working with Linux as a server / hosting platform, extra points for Ubuntu experience.
- 5+ years experience with AWS.
- 2+ years experience with Kubernetes and strong container fundamentals.
- 2+ years experience with Terraform and Ansible.
- 2+ years with network attached storage management (via NFS, Ceph, or other protocols). Extra points for experience with VAST storage systems.
- Experience with monitoring systems (Prometheus, ELK stack).
- Familiarity with the GitOps workflow.
- Software development experience using Python, Go, Bash, or other languages for automation and for connecting systems and APIs together.
- Deep networking fundamentals, extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand.
- Experience building and delivering complex systems.
- Effective at navigating tradeoffs between design, risk, cost, and outcomes.
- Comfortable with navigating ambiguity.
- Strong written and oral communication.

### Nice-to-Haves

- Experience with bare metal hardware troubleshooting and provisioning, extra points for working with Dell hardware.
- Experience with GPU servers, either in bare metal form or under virtualization.
- Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto firewalls, and Juniper Networks as vendors.
- Experience with VAST storage systems.
We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits. The anticipated annual base salary range for this role is:
$160,000 to $200,000 USD
We offer a comprehensive and competitive benefits package designed to support our employees’ health, well-being, and long-term success. Benefits may vary by location, team, and role. Benefits include:
At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.