This is a U.S.-based position. All of the programs we support require U.S. citizenship to be eligible for employment. All work must be conducted within the continental U.S.
Who we are:
Raft (https://TeamRaft.com) is a customer-obsessed, non-traditional defense tech company dedicated to empowering U.S. military and government agencies with cutting-edge AI/ML and data solutions. We are a leader in autonomous data fusion and Agentic AI, with a purposeful focus on Distributed Data Systems, Platforms at Scale, and Complex Application Development. We are headquartered in McLean, VA, and our clients include innovative federal and public agencies that leverage design thinking, a cutting-edge tech stack, and a cloud-native ecosystem. We build digital solutions that impact the lives of millions of Americans.
Our flagship AI platform, [R]AIMS (Raft AI Mission System), enables operators and engineers to rapidly build, deploy, and govern AI-powered mission workflows across highly dynamic operational environments. We are looking for an experienced Lead AI/ML Software Engineer to help shape the next phase of [R]AIMS: a technical builder-leader with deep experience designing and scaling complex production systems, someone who can make hard architecture decisions, simplify complexity, lead major engineering efforts, and raise the technical bar across the platform.
About the role:
As Lead AI/ML Software Engineer for [R]AIMS, you will serve as a senior technical leader responsible for evolving the architecture, execution, and engineering rigor of Raft’s AI Mission System. You will be hands-on in the codebase, leading by doing, while also setting technical direction and raising the quality of engineering across the team. You will partner closely with platform leadership, product, and delivery teams to drive architectural decisions, lead major technical epics from conception through delivery, and establish the engineering patterns that the platform will grow on. You will operate at the intersection of distributed systems, AI/ML platform engineering, Kubernetes-native infrastructure, and data-intensive application development, balancing rapid mission delivery with long-term platform integrity. This role requires someone equally comfortable writing production systems with complex multi-vendor integrations, debugging difficult distributed systems issues, leading design reviews, and making pragmatic tradeoff decisions under ambiguity.
What you'll do:
Drive architectural decisions across the [R]AIMS platform, evaluating tradeoffs across performance, scalability, security, and maintainability and building alignment across engineering and product stakeholders
Lead major technical epics from conception through delivery, decomposing ambiguous problems into executable plans and keeping cross-functional teams moving with clarity and momentum
Simplify and rationalize distributed system architecture as the platform scales, reducing incidental complexity and improving operational reliability without sacrificing capability
Optimize platform performance across both edge and cloud deployment targets, identifying and resolving bottlenecks in data-intensive, latency-sensitive operational environments
Establish strong engineering foundations and reusable technical patterns that improve developer productivity and code quality across the team
Mentor engineers at multiple levels, conducting design reviews, providing substantive code feedback, and actively elevating technical execution across the platform
Partner with AI/ML engineers on model integration, inference optimization, and the operational deployment of agentic workflows within [R]AIMS
Engage directly with customers and program stakeholders in operationally demanding environments across the Department of Defense, representing Raft’s technical capabilities with credibility and clarity
What we are looking for:
6+ years of hands-on experience building and shipping production software systems across the full stack (frontend, backend, infrastructure, and ML)
Deep software engineering fundamentals with demonstrated ability to design, build, and evolve complex systems that perform reliably at scale
Exceptional technical communication skills; able to lead through influence across engineering, product, and leadership stakeholders without requiring direct authority
Proven experience designing and evolving distributed systems, including service decomposition, inter-service communication patterns, fault tolerance, and observability
Strong hands-on experience with Kubernetes and cloud-native platform architecture in production environments
Experience building data-intensive or AI-enabled production systems with real operational users and real performance constraints
Demonstrated technical leadership over large, cross-functional engineering initiatives with clear ownership and accountability for outcomes
Strong system design and architecture decision-making ability, with a track record of making the right call under incomplete information
Some experience with, or exposure to, training, fine-tuning, or deploying machine learning models in production contexts
Ability to obtain Security+ certification within the first 90 days of employment
U.S. citizenship required; ability to obtain and maintain a Top Secret/SCI clearance
Highly preferred:
Experience building AI/ML infrastructure or agentic systems, including orchestration frameworks, tool-use patterns, and LLM integration in production
Experience with streaming and event-driven architectures, particularly Kafka, Kafka Streams, or Apache Flink
Experience with platform engineering and internal developer tooling, including golden-path frameworks, shared libraries, and developer experience improvements
Experience with real-time inference or operational AI systems in latency-sensitive environments
Experience building secure, compliant systems for regulated or mission-critical environments, including familiarity with IL4/IL5/IL6 requirements or RMF processes
Prior work in defense, national security, or classified program environments