Senior Software Engineer — AI Evaluation & Benchmarks (Python)

Miami · Est. $166K – $208K

Posted May 12, 2026

Before Applying

This role is open to contractors in accepted locations only. Please confirm your country is on the list before applying; we're unable to process applications from unlisted locations. List of accepted countries and locations.

For US applicants: This is a 1099 independent contractor role. It is not compatible with F-1 OPT, STEM OPT, or any visa status that requires W-2 employment, guaranteed hours, or employer sponsorship. We are unable to provide offer letters or employment verification for this role.

What You'll Be Doing

Design and build the coding benchmarks and evaluation pipelines used to test frontier AI models on real software engineering work:

Design coding benchmarks that evaluate frontier models on real-world programming tasks — reasoning, debugging, and production-quality code
Build and maintain scalable data pipelines for evaluation workflows
