Humanity's Last Exam for Agents

Challenge and measure AI agents on economically valuable and real-world tasks.

Agent-HLE (Humanity's Last Exam for Agents) aims to build the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI and 100+ industry experts, the benchmark targets coverage of over 90% of non-physical industries, keeping scores objective, comparable, and meaningful across domains.

Motion & VFX

3D Modeling

Game Development

Mold Flow Analysis

Architectural Modeling

Brain Imaging

What makes Agent-HLE different

Broadest Coverage · Verifiable Outcomes · Long-Horizon · Economically Valuable

50+

Target Industries

5K+

Target Tasks

$100K+

Award Pool

Co-led by

Berkeley RDI × RDI Foundation

100+ Contributors from

Academic Institutions

UC Berkeley
MIT
Stanford
Harvard
Oxford
USC
UC San Diego
UCSF
Syracuse
NIH
U of Colorado
Peking University
SJTU

Industries

Genentech
Goldman Sachs
Morgan Stanley
JPMorgan
Citadel
PIMCO
Tesla
Meta
TDK
Brix
Photon Fund

Why Contribute - Help Set the Standard for Agent Evaluation in Your Industry

Shape evaluation standards, publish research, and earn recognition.

Insight into Agents in Industry

See exactly how AI agents handle real workflows in industry, and where they fall short.


Co-authorship on Manuscript

Qualifying contributors receive co-authorship credit on the research publication.


Monetary Awards

High-impact contributions are recognized with monetary awards from our $100K+ funding pool.

Choose How You'd Like to Contribute

For Domain Experts

Contribute domain expertise and real workflow data - no coding required.

For Researchers & Engineers

Turn real workflows into challenging, reproducible agent benchmarks covering task setup, execution, and evaluation.

Advisory Committee

More advisors to be announced

FAQ

Common questions about software access, authorship, venues, and timeline.

FAQ page →

Contact

For inquiries, reach out to the team directly.

rdi_research@berkeley.edu →

Stay Updated

Subscribe for announcements, benchmark releases, and updates.

Join mailing list →