Posted May 10, 2026
- Experience designing evaluations, benchmarks, or metrics for AI systems.
- Strong written and verbal communication skills, particularly in explaining complex concepts to diverse stakeholders.
- Ability to manage multiple concurrent projects in a fast-moving environment.
- Strong experience with Perplexity or other frontier AI models in production settings.
- Demonstrated experience with Python: you'll prototype, debug, automate, and build systems at scale.
- 3+ years of experience working with LLMs in a product or research setting.

### Preferred
- Experience with A/B testing or experimentation frameworks.
- Track record of improving AI system performance through systematic evaluation and iteration.

## This Role May Be a Great Fit for You if You
- Get excited about edge cases in model behavior and love digging into how an answer could be better.
- Enjoy turning qualitative "this feels off" intuitions into quantitative metrics and systematic fixes.
- Want to work at the intersection of research and product, where your work ships to real users same-day.
- Are comfortable with ambiguity and can define what "good" looks like for novel AI features.
- Have a hacker spirit: you'd rather build a quick prototype to test a hypothesis than debate it in a doc.
- Care deeply about making AI more reliable and useful for our users.