Agents' Last Exam

Challenge and measure AI agents on economically valuable and real-world tasks.

Agents' Last Exam aims to build the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI and 300+ industry experts, it now spans all 55 targeted sub-industries, covering most major fields of professional work performed on a computer, with 1,500+ tasks collected toward a 5,000-task target. Verifiable outcomes keep scores objective, comparable, and meaningful across domains.

GitHub | Leaderboard | arXiv (soon) | Contribute a Task

Motion & VFX

Agents complete animation and visual effects production tasks in Adobe After Effects.

Animation

3D Modeling

Agents perform 3D model creation and editing tasks in Siemens NX.

Engineering

Game Development

Scene setup, asset placement, and rendering tasks in Unreal Engine.

Game Dev

Mold Flow Analysis

Simulation and mold flow analysis tasks in Moldex3D manufacturing software.

Manufacturing

Architectural Modeling

3D modeling and energy analysis workflows in Rhino 3D for urban design in Zurich.

Architecture

Brain Imaging

Neuroimaging analysis and brain structure segmentation tasks in FSLeyes.

Neuroscience

What makes Agents' Last Exam different

Broadest Coverage | Verifiable Outcomes | Long-Horizon | Economically Valuable

55 Sub-Industries Covered

1.5K+ Tasks Collected

300+ Experts

Contributors & Partners from

Academic Institutions

MIT
Harvard
Stanford
UC Berkeley
Oxford
CMU
Caltech
ETH Zurich
Yale
Columbia
UPenn
Cornell
Brown
Johns Hopkins
NIH
UCLA
UCSF
NYU
University of Michigan
University of Washington
Georgia Tech
USC
UIUC
Washington University in St. Louis (WashU)
University of Melbourne
UC San Diego
UC Santa Barbara
UC Irvine
University of Wisconsin-Madison
Emory
UNC Chapel Hill
McGill
University of Waterloo
Boston University
University of Helsinki
Monash
University of Colorado
UC Santa Cruz
UC Riverside
Northeastern
Syracuse
Lehigh
UT Southwestern
Texas A&M

Industries

Goldman Sachs
JPMorgan
Morgan Stanley
PIMCO
Meta
Amazon
Adobe
Oracle
Hippocratic AI
HubSpot
Brix
Photon Fund
Snorkel AI
Unipat AI
Tianqiao and Chrissy Chen Institute (TCCI)

Advisory Committee

George Em Karniadakis, Professor @Brown
Applied Mathematics
NAE Member | PINN · DeepONet co-creator

Tapio Schneider, Professor @Caltech
Climate Dynamics
CliMA founder | Google Principal Scientist

Teresa Head-Gordon, Chancellor's Prof. @UC Berkeley
Computational Chemistry
ACS / AIMBE Fellow | NIH AViDD Center PI

Laure Zanna, Professor @NYU Courant
Ocean & Climate Science
Keller Chair | NSF LEAP co-director

Jack Gallant, Chancellor's Prof. @UC Berkeley
Computational Neuroscience
Brain-decoding pioneer | Wheeler Center co-director

Tarek Zohdi, Assoc. Dean @UC Berkeley
Computational Mechanics
Humboldt Research Award | Former USACM President

Ida Sim, Chief Research Informatics Officer @UCSF
Clinical Research Informatics
NAM Member | Vivli co-founder

Arvind Rao, Professor @U Michigan
Biomedical Informatics
AAAS Fellow | AMIA Fellow (FAMIA)

Kaan Ozbay, Director, C2SMART @NYU
NSF CAREER Award | Global Network Professor

Carl Boettiger, Assoc. Professor @UC Berkeley
Ecological Forecasting
rOpenSci co-founder | ESA Early Career Fellow

Kyle Steinfeld, Assoc. Professor @UC Berkeley
Computational Design
MDes Director | Hellman Fellow

Yamini Rangan, CEO @HubSpot
CRM & Marketing Automation
Former Dropbox CCO

Bradley Rothenberg, CEO @nTop
Engineering Design
Implicit modeling CAD | NASA, Lockheed, Honeywell

Why Contribute: Help Set the Standard for Agent Evaluation in Your Industry

Shape evaluation standards, publish research, and earn recognition.

Insight into Agents in Industry

See exactly how AI agents handle real workflows in industry, and where they fall short.

Co-authorship on Manuscript

Qualifying contributors receive co-authorship credit on the research publication.

Monetary Awards

High-impact contributions are recognized with monetary awards from our $100K+ funding pool.

Choose How You'd Like to Contribute

For Domain Experts

Contribute domain expertise and real workflow data. No coding required.

Contribute a Task

For Researchers & Engineers

Turn real workflows into challenging, reproducible agent benchmarks: setup, execution, and evaluation.

FAQ

Common questions about software access, authorship, venues, and timeline.

Contact

For inquiries, reach out to the team directly.

rdi_research@berkeley.edu

Stay Updated

Subscribe for announcements, benchmark releases, and updates.

Join mailing list