Humanity's Last Exam for Agents
Challenge and measure AI agents on economically valuable, real-world tasks.
Agent-HLE (Humanity's Last Exam for Agents) aims to build the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI and 100+ industry experts, the project targets coverage of over 90% of non-physical industries while keeping scores objective, comparable, and meaningful across domains.

Motion & VFX
Agents complete animation and visual effects production tasks in Adobe After Effects.
Animation

3D Modeling
Agents perform 3D model creation and editing tasks in Siemens NX.
Engineering

Game Development
Scene setup, asset placement, and rendering tasks in Unreal Engine.
Game Dev

Mold Flow Analysis
Simulation and mold flow analysis tasks in Moldex3D manufacturing software.
Manufacturing

Architectural Modeling
3D modeling and energy analysis workflows in Rhino 3D for urban design in Zurich.
Architecture

Brain Imaging
Neuroimaging analysis and brain structure segmentation tasks in FSLeyes.
Neuroscience

What makes Agent-HLE different
50+ Target Industries
5K+ Target Tasks
$100K+ Award Pool
Co-led by
×
100+ Contributors from
Academic Institutions
UC Berkeley
MIT
Stanford
Harvard
Oxford
USC
UC San Diego
UCSF
Syracuse
NIH
U of Colorado
Peking Univ.
SJTU
Industries
Genentech
Goldman Sachs
Morgan Stanley
JPMorgan
Citadel
PIMCO
Tesla
Meta
TDK
Brix
Photon Fund
Why Contribute - Help Set the Standard for Agent Evaluation in Your Industry
Shape evaluation standards, publish research, and earn recognition.
Insight into Agents in Industry
See exactly how AI agents handle real workflows in industry, and where they fall short.
Co-authorship on Manuscript
Qualifying contributors receive co-authorship credit on the research publication.
Monetary Awards
High-impact contributions are recognized with monetary awards from our $100K+ funding pool.
For Domain Experts
Contribute domain expertise and real workflow data; no coding required.
For Researchers & Engineers
Turn real workflows into challenging, reproducible agent benchmarks: setup, execution, and evaluation.
Advisory Committee
More advisors to be announced
FAQ
Common questions about software access, authorship, venues, and timeline.
Contact
For inquiries, reach out to the team directly.
rdi_research@berkeley.edu
Stay Updated
Subscribe for announcements, benchmark releases, and updates.