Agents' Last Exam
Challenge and measure AI agents on economically valuable and real-world tasks.
Agents' Last Exam aims to build the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI with 300+ industry experts, it now spans all 55 targeted sub-industries, covering most major fields of professional work performed on a computer, with 1,500+ tasks collected toward a 5,000-task target. The benchmark is designed to keep scores objective, comparable, and meaningful across domains.

Motion & VFX
Agents complete animation and visual effects production tasks in Adobe After Effects.
Animation

3D Modeling
Agents perform 3D model creation and editing tasks in Siemens NX.
Engineering

Game Development
Scene setup, asset placement, and rendering tasks in Unreal Engine.
Game Dev

Mold Flow Analysis
Simulation and mold flow analysis tasks in Moldex3D manufacturing software.
Manufacturing

Architectural Modeling
3D modeling and energy analysis workflows in Rhino 3D for urban design in Zurich.
Architecture

Brain Imaging
Neuroimaging analysis and brain structure segmentation tasks in FSLeyes.
Neuroscience

What makes Agents' Last Exam different
55 Sub-Industries Covered
1.5K+ Tasks Collected
300+ Experts
Contributors & Partners from
Academic Institutions
MIT
Harvard
Stanford
UC Berkeley
Oxford
CMU
Caltech
ETH Zurich
Yale
Columbia
UPenn
Cornell
Brown
Johns Hopkins
NIH
UCLA
UCSF
NYU
U Michigan
U Washington
Georgia Tech
USC
UIUC
WashU
U Melbourne
UC San Diego
UC Santa Barbara
UC Irvine
UW-Madison
Emory
UNC
McGill
U Waterloo
Boston University
U Helsinki
Monash
U Colorado
UC Santa Cruz
UC Riverside
Northeastern
Syracuse
Lehigh
UT Southwestern
Texas A&M
Industries
Goldman Sachs
JPMorgan
Morgan Stanley
PIMCO
Meta
Amazon
Adobe
Oracle
Hippocratic AI
HubSpot
Brix
Photon Fund
TCCI
Advisory Committee

Why Contribute: Help Set the Standard for Agent Evaluation in Your Industry
Shape evaluation standards, publish research, and earn recognition.
Insight into Agents in Industry
See exactly how AI agents handle real workflows in industry, and where they fall short.

Co-authorship on Manuscript
Qualifying contributors receive co-authorship credit on the research publication.

Monetary Awards
High-impact contributions are recognized with monetary awards from our $100K+ funding pool.

For Domain Experts
Contribute domain expertise and real workflow data - no coding required.

For Researchers & Engineers
Turn real workflows into challenging, reproducible agent benchmarks: setup, execution, and evaluation.

FAQ
Common questions about software access, authorship, venues, and timeline.

Contact
For inquiries, reach out to the team directly.
rdi_research@berkeley.edu

Stay Updated
Subscribe for announcements, benchmark releases, and updates.


