Job Overview
- Date Posted: January 28, 2026
Job Description
Mindrift connects experienced engineers with project-based AI work for top tech companies. We are currently seeking Freelance Agent Evaluation Engineers to help test, evaluate, and improve AI agents through structured, real-world scenarios. This role is ideal for engineers who enjoy flexible, project-based work and want to contribute directly to the development of cutting-edge AI systems.
Your Role
As a Freelance Agent Evaluation Engineer, you will:
- Design test cases and establish gold-standard evaluation criteria for AI agents
- Analyze agent behavior, logs, and failure modes to identify areas for improvement
- Build, iterate, and optimize prompts, evaluation scenarios, and testing logic
- Work with code repositories, test frameworks, and structured formats like JSON/YAML
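To give a flavor of the work described above, here is a minimal sketch of a gold-standard test case and its evaluation check. The schema, field names, and criteria are illustrative assumptions for this posting, not Mindrift's actual format:

```python
import json

# Hypothetical gold-standard test case (illustrative schema, not a real project format)
test_case = {
    "id": "search-001",
    "prompt": "Find the capital of France and report it in one sentence.",
    "gold": {"must_contain": ["Paris"], "max_sentences": 1},
}

def evaluate(agent_output: str, gold: dict) -> bool:
    """Check an agent's output against the gold-standard criteria."""
    has_required = all(term in agent_output for term in gold["must_contain"])
    # Crude sentence count by periods; a real evaluator would use a tokenizer
    sentence_count = agent_output.strip().count(".") or 1
    return has_required and sentence_count <= gold["max_sentences"]

# Example: a passing agent output
print(evaluate("The capital of France is Paris.", test_case["gold"]))  # True
```

In practice, cases like this would live in JSON/YAML files under version control, with the evaluation logic run inside a test framework against logged agent outputs.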
Required Skills and Experience
To succeed in this role, candidates should have:
- 3+ years of software development experience, with strong Python skills
- Proficiency with Git, JSON/YAML, and Docker
- Understanding of LLM limitations, AI behavior, and evaluation design
- English proficiency at B2 level or higher
How It Works
- Freelance, project-based work with ≈6–10 hours per week during active phases
- Flexibility to choose projects and work on your own schedule
- Paid per project/task, with rates up to $80/hour depending on scope and expertise
This Freelance Agent Evaluation Engineer role is ideal for software engineers who are passionate about AI, testing, and iterative improvement, and who want flexible, high-impact work with leading technology teams.