Shape the Benchmark
Agent-HLE is Defined by Domain Experts
- Insight into how agents perform in your workflow
- Co-authorship on manuscript
- Monetary awards for high-impact contributions
The Task We Are Looking For
Industry production tasks executed using professional-grade tools or software, not simplified chat interactions.
We are collecting industry tasks that are:
Complex
Takes experts days, not minutes, and requires substantial domain expertise and effort.
Representative
Workflows used in real industry settings with the appropriate professional tools.
Verifiable
Outputs that are deterministic or scored against a clear rubric.
We strongly recommend watching the video below before submitting.
How to Make Your Output Verifiable
Follow this decision tree to decide what to upload for objective evaluation.
- Medical imaging: A task that generates a report identifying tumor slice(s) in a brain MRI. If the key step is adjusting contrast or viewing layers to localize the tumor, a verifiable output can be a coordinate tuple; check whether it falls within an allowed region.
- Earth science: A task that retrieves data from a designated database. Because historical data is fixed, you can verify with a query like: “What is the average rainfall from 2010–2013 in Wisconsin?”
- Video editing: A task like “change the watermelon from green to black”: take a screenshot and ask, “Is the watermelon black or green?”
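As an illustration of the medical imaging case, the coordinate check could look roughly like the sketch below. It assumes the allowed region is an axis-aligned box in voxel coordinates with inclusive bounds; the `Region` type, the `in_region` helper, and the numbers in `ALLOWED` are hypothetical and not part of any Agent-HLE tooling.

```python
from typing import NamedTuple, Sequence, Tuple


class Region(NamedTuple):
    """Axis-aligned box of accepted coordinates (bounds are inclusive)."""
    lo: Tuple[int, int, int]
    hi: Tuple[int, int, int]


def in_region(coord: Sequence[int], region: Region) -> bool:
    """Return True if every component of coord lies within the box."""
    if len(coord) != len(region.lo):
        return False  # wrong dimensionality can never match
    return all(l <= c <= h for c, l, h in zip(coord, region.lo, region.hi))


# Hypothetical ground truth: (x, y, slice) bounds around the tumor.
ALLOWED = Region(lo=(110, 85, 40), hi=(140, 120, 46))

print(in_region((125, 100, 43), ALLOWED))  # inside the box
print(in_region((125, 100, 50), ALLOWED))  # slice index outside the box
```

A check of this kind is deterministic: the same submitted tuple always yields the same pass or fail result, which is what makes the subtask objectively scorable.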
Upload your project file (the one opened in the professional tool or software) that generates the output. We'll help identify a verifiable subtask.
How to Prepare Your Submission
Every task submission follows a consistent structure. Here are the five components you need to define, followed by example tasks from different industries.
Demo Tasks
See more on the demo page.
Reproduce a Flash game using another engine (RPGMaker XP)

Input: Flash game (.swf) file and template project
Output: Playable game as an .exe file
Verification: Reproduced game; screenshots of all levels match the reference
Construct a 3D industrial part from 2D input specifications using SolidWorks

Input: 2D blueprint with dimensions and specifications (PNG)
Output: 3D model file (.obj) matching all specified dimensions and features
Verification: Dimensional accuracy within tolerance; all features present; correct topology
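For the SolidWorks demo, the "dimensional accuracy within tolerance" criterion might be checked along the lines of the sketch below. It assumes the measurements have already been extracted from the submitted model by some other tool; the `check_dimensions` helper, the feature names, and the 0.1 mm tolerance are illustrative assumptions, not the benchmark's actual grader.

```python
import math
from typing import Dict


def check_dimensions(
    measured: Dict[str, float],
    spec: Dict[str, float],
    tol: float = 0.1,
) -> Dict[str, bool]:
    """Map each specified feature to True if it is present and within tol."""
    results = {}
    for name, target in spec.items():
        value = measured.get(name)
        # A feature that is absent from the model fails outright.
        results[name] = value is not None and math.isclose(
            value, target, rel_tol=0.0, abs_tol=tol
        )
    return results


# Hypothetical blueprint values and measurements, in millimetres.
spec = {"length_mm": 50.0, "hole_diameter_mm": 12.0, "height_mm": 20.0}
measured = {"length_mm": 50.05, "hole_diameter_mm": 12.3}

results = check_dimensions(measured, spec)
print(results)
print("PASS" if all(results.values()) else "FAIL")
```

Reporting a per-feature result rather than a single boolean makes it easier to see which dimension or missing feature caused a failure.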
Ready to contribute? Submit your task and help build the benchmark.
Evaluate your task idea
Describe the task idea in plain language. We will assess whether it fits Agent-HLE's criteria, but this step does not create or save a submission.
Plain English is fine. Focus on the goal, required tools, expected outputs, and how the result should be checked.
Have Questions?
See the FAQ and the Documentation for more.