You will be joining MillionLogics, a global IT solutions leader, as a software practitioner to evaluate and improve datasets for agentic coding models. Your responsibilities will include:
Working with realistic coding tasks in an agentic coding harness, reviewing model trajectories, verifying solutions, and producing high-quality annotations
Online evaluations: Interacting with blinded models on predefined tasks, ranking and grading resulting trajectories
Offline evaluations: Designing realistic coding tasks, calibrating through user simulation, writing task-specific rubrics, and grading trajectories
Reading and debugging code, validating behavior, following detailed process rules, and making consistent judgment calls
Independently working on realistic software tasks, not just toy problems or shallow code-review exercises
Qualifications Required:
Software engineering fluency, with 5+ years of experience in software engineering, QA, developer tooling, or similar roles
Strong hands-on experience in at least 12 programming languages like Python, JavaScript/TypeScript, Rust, Java, C/C++, etc.
Ability to read and understand unfamiliar codebases, run tests, debug issues, and evaluate implementations
Mandatory Terminal & Tooling Skills:
Comfortable working in Linux/Ubuntu-like environments
Familiarity with Docker and reproducible environments
Additional Preferred Qualifications:
Strong Docker skills and experience in large, complex repositories
Demonstrated originality and sound engineering judgment in defining technical problems
Ability to design realistic, non-trivial tasks beyond tutorials or simple bug fixes
This is a 5-week contractor position without medical or paid leave. It requires 8 hours per day, with a 4-hour overlap with PST. To apply, send your updated CV to easyhiring.pro with job ID 75232 in the email subject line.