Posted Mar 3, 2026
We are looking for GenAI Software Engineers of all levels who are passionate about making a positive impact. You’ll collaborate closely with a cross-functional team of researchers, clinicians, and engineers to translate cutting-edge language model capabilities into dependable, real-world clinical systems. Your focus will be on designing advanced LLM-driven workflows that can reason through complex clinical contexts, leverage agentic capabilities and structured tool use, navigate branching chains of LLM calls, integrate seamlessly with retrieval systems, and consistently generate outputs that meet the highest standards of clinical reliability and trust.

A major part of this role will involve developing and applying rigorous evaluation frameworks (both automated and human-in-the-loop) to continuously assess accuracy, robustness, multilingual capabilities, and more. This is an opportunity to design experiments to probe failure modes, simulate edge cases, and stress-test LLM workflows under realistic load and challenging real-world conditions. You’ll apply a disciplined, data-driven approach to understanding model behavior, developing tools to measure system performance, conduct A/B tests against established baselines, and generate clear, actionable insights that inform deployment decisions.

This high-impact role will own the end-to-end productionization of LLM workflows: deploying models into low-latency, high-uptime environments, building monitoring and observability systems, implementing post-processing guardrails, and managing workflow versioning.

## What You’ll Do
- Design and build agentic systems that turn LLMs into composable, dependable tools, leveraging retrieval, tool use, agentic reasoning, and structured outputs.
- Collaborate with ML and infra engineers to scale and optimize agentic workflows, managing latency, context windows, and model choice.
- Write high-quality, modular code that’s graceful under failure, flexible to change, and easy to iterate on.
- Own major architectural decisions: how we architect workflows, define data flow, cache intermediate state, and structure generative outputs.
- Drive rigorous evaluation: build benchmark datasets, develop automated and human-in-the-loop frameworks, design experiments to surface failure modes and edge cases, run A/B tests to inform deployment, and distill insights from clinician feedback to evaluate and guide model improvement.
- Leverage frontier capabilities: rapidly prototype with new models and model capabilities, open-source tools, and novel prompting techniques.

## What You’ll Bring
- 3+ years of experience building production-grade systems, with 1–2+ years focused on LLM-powered or agentic products.
- Deep fluency with LLM APIs, prompting strategies, and orchestration patterns (e.g., LangChain, LlamaIndex, custom pipelines).
- Experience with retrieval systems (e.g., semantic and lexical retrieval, vector DBs, efficient kNN), function calling, tool use, or agentic workflows.
- Working knowledge of model evaluation: experience building diverse datasets, conducting both automated and human-in-the-loop evaluations, running A/B tests, and working with subject matter experts to guide model improvement.
- Strong Python fundamentals, including the ability to write clean code and design comprehensive test cases, and familiarity with core language features and standard libraries; experience with async programming, performance profiling, packaging, and deployment tooling is strongly preferred.
- Good taste and intuition: you know when to move fast, ship, and iterate, and also when to take a beat to tackle tech debt.

We value people who are eager to learn new things and recognize that great team members might not perfectly match a job description. If you’re interested in the role but aren’t sure whether or not you’re a good fit, we’d still like to hear from you.

Must be willing to work from our SF or NYC office at least 3x per week.
This position requires a commitment to a hybrid work model, with the expectation of coming into the office a minimum of three (3) times per week. Relocation assistance is available for candidates willing to move to San Francisco.

## How we take care of Abridgers:
We are aware of individuals and entities fraudulently representing themselves as Abridge recruiters and/or hiring managers. Abridge will never ask for financial information or payment, or for personal information such as bank account number or social security number during the job application or interview process. Any emails from the Abridge recruiting team will come from an @abridge.com email address. You can learn more about how to protect yourself from these types of fraud by referring to this article. Please exercise caution and cease communications if something feels suspicious about your interactions.