Status: Under review. Authors: Xinyang Han (UCB), Yiyou Sun (UCB)

Finance

A workflow-driven, evidence-rich domain with strict audit and compliance requirements, which makes it well suited to execution-focused AI agent benchmarks.

Part 1

Finance Overview

Finance is one of the most promising domains for AI agent evaluation because the work is highly workflow-driven, produces abundant digital artifacts, and is constrained by strict correctness, auditability, and compliance requirements.

Market Structure

Understanding the fundamental split between primary and secondary markets, and the buy-side/sell-side ecosystem.

Capital Flow

Primary Market

Companies raise capital directly from investors

IPOs, bond issuance, private placements

Typical activities: deal structuring, underwriting, pricing

Secondary Market

Already-issued securities trade among investors

Liquidity provision, price discovery, trading

Examples: NYSE, NASDAQ, HKEX, SSE

Ecosystem

Buy-Side

Invests on behalf of end investors to generate returns

Participants

Asset managers, hedge funds, private equity firms, pension funds, insurers

Revenue Streams

Management fees (% of AUM)
Performance fees
Carried interest
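
As a worked example of the fee lines above, the sketch below computes one year of fees under an assumed "2 and 20" arrangement (2% management fee on AUM, 20% performance fee on gains). The rates, and the simplifications of charging the management fee on starting AUM with no hurdle rate or high-water mark, are illustrative assumptions; real fund terms vary, and private-equity carried interest follows a separate distribution waterfall that is not modeled here.

```python
def annual_fees(aum_start: float, gross_return: float,
                mgmt_rate: float = 0.02, perf_rate: float = 0.20) -> dict:
    """One year of buy-side fees under simplified, illustrative terms.

    Assumptions: management fee on starting AUM; performance fee on positive
    gross gains only; no hurdle rate, no high-water mark, no fee-on-fee effects.
    """
    management_fee = aum_start * mgmt_rate
    gain = aum_start * gross_return
    performance_fee = max(gain, 0.0) * perf_rate  # no performance fee in a losing year
    return {
        "management_fee": management_fee,
        "performance_fee": performance_fee,
        "total_fees": management_fee + performance_fee,
    }


# A $1B fund that returns 10% gross in the year.
print(annual_fees(1_000_000_000, 0.10))
```

Under these assumptions, a $1B fund earning 10% gross pays $20M in management fees and $20M in performance fees.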

Interaction Loop

Sell-Side

Creates/distributes products and services

Participants

Investment banks, broker-dealers, research shops, market makers

Revenue Streams

Advisory fees
Underwriting fees
Commissions & spreads

Career transition: a common path is moving from the sell-side to the buy-side after building expertise and relationships.

Organizational Structure & Key Roles

How financial institutions organize around front-, middle-, and back-office functions, and the representative roles where execution-heavy work occurs.

Office Structure

Front Office

Revenue generation

Investment bankers, traders, research analysts, investment managers

High compensation, high pressure, client-facing, long hours

Middle Office

Risk and compliance control

Risk managers, compliance officers, finance controllers

Ensure front office stays within risk/compliance boundaries

Back Office

Operations support

Settlement, IT, customer support, HR

Ensure business runs reliably (books & records, reconciliation, reporting)

Key Roles

IBD Analyst

Investment Banking

The execution engine of the deal team: builds models (three-statement, DCF, LBO), drafts pitch decks, runs due diligence, and manages virtual data rooms (VDRs).

Hours: 70–90+ per week

Entry comp: $100K–$200K

Career progression: Analyst (2–3 yrs) → Associate (3–4 yrs) → VP (3–4 yrs)

Portfolio Manager

Asset Management

Final decision-maker, responsible for portfolio construction, security selection, risk management, and fundraising/investor relations (IR).

Hours: 45–60 per week

Core skills: investment judgment, risk management, client relations

Quant Researcher

Quant Trading

Discovers predictive signals (alphas), runs backtests, and iterates on hypotheses.

Core skills: statistics, stochastic calculus, Python/R/C++

Research Analyst

Sell-Side Research

Maintains a coverage universe of companies, builds earnings and valuation models, and publishes reports with ratings and price targets.

Hours: ~70 per week

Career progression: Associate (2–3 yrs) → Analyst

Part 2

Where LLM Agents Fit

Identifying high-confidence surfaces where AI agents can reliably execute specifiable, reviewable tasks, while respecting the "human-last" boundary for high-stakes decisions.

The "Human-Last" Boundary

Finance has a hard constraint: errors are expensive, and many decisions are regulated and audit-sensitive. The stable pattern is:

Agents can reliably execute what humans can specify and verify, but humans retain judgment for ambiguous, high-stakes decisions.

High-Confidence Agent Surfaces (v1)

Research → Evidence Packaging

Sell-side and buy-side research automation

High confidence

Pull data (prices, fundamentals, filings), build standardized tables
Maintain/update a valuation model (spreadsheet outputs, checks)
Draft structured summaries from predefined templates
Create plots, dashboards, and "what changed" diffs
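
The "what changed" diff is the most mechanical of these outputs. A minimal sketch, assuming each snapshot is a flat {metric: value} mapping (the metric names in the example are hypothetical) and that relative moves at or below a tolerance should be ignored:

```python
def what_changed(old: dict, new: dict, rel_tol: float = 0.0) -> list:
    """Diff two {metric: value} snapshots into added / removed / changed rows.

    A metric counts as changed when |new - old| / |old| exceeds rel_tol;
    if the old value is 0, the absolute difference is compared instead.
    """
    rows = []
    for metric in sorted(old.keys() | new.keys()):
        if metric not in new:
            rows.append((metric, "removed", old[metric], None))
        elif metric not in old:
            rows.append((metric, "added", None, new[metric]))
        else:
            before, after = old[metric], new[metric]
            scale = abs(before) if before != 0 else 1.0
            if abs(after - before) / scale > rel_tol:
                rows.append((metric, "changed", before, after))
    return rows


last_quarter = {"revenue": 100.0, "eps": 2.00, "capex": 10.0}
this_quarter = {"revenue": 110.0, "eps": 2.00, "fcf": 5.0}
for row in what_changed(last_quarter, this_quarter, rel_tol=0.01):
    print(row)
```

Returning plain tuples keeps the diff easy to render as a table or to compare against a reference diff during scoring.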

Quant Research & Engineering

Strategy implementation and validation

High confidence

Implement a strategy specification in a backtest framework
Run backtests, produce standardized performance reports
Export trade logs and validate signal → trade consistency
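
Validating signal → trade consistency amounts to replaying the exported trade log and checking that the implied position matches the signal's target on each date. A minimal sketch, assuming targets are share counts keyed by ISO date strings and that fills are logged on the same date as the target they implement (engines that fill market orders on the next bar would need the dates aligned first):

```python
def check_signal_trade_consistency(targets: dict, trades: list) -> dict:
    """Replay trades and compare the running position with the signal's targets.

    targets: {"YYYY-MM-DD": target_position_in_shares} produced by the signal.
    trades:  [("YYYY-MM-DD", signed_quantity), ...] from the exported trade log.
    """
    fills_by_date = {}
    for date, qty in trades:
        fills_by_date[date] = fills_by_date.get(date, 0) + qty

    position = 0
    mismatches = []
    # ISO date strings sort chronologically, so a plain sort gives replay order.
    for date in sorted(set(targets) | set(fills_by_date)):
        position += fills_by_date.get(date, 0)
        if date in targets and position != targets[date]:
            mismatches.append((date, targets[date], position))

    return {
        "mismatches": mismatches,  # (date, target, position implied by trades)
        "orphan_trade_dates": sorted(set(fills_by_date) - set(targets)),
    }


targets = {"2024-01-02": 0, "2024-01-03": 100, "2024-01-04": 100, "2024-01-05": 0}
trades = [("2024-01-03", 100), ("2024-01-05", -50)]
print(check_signal_trade_consistency(targets, trades))
```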

Trading Operations / Execution

Order management and execution reporting

High confidence

Convert a structured instruction into broker API requests
Submit orders, poll status, handle errors, output execution reports
Generate post-trade position snapshots
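
Turning a structured instruction into a broker request is mostly validation. A minimal sketch, assuming a hypothetical instruction schema and whole-share quantities; the output field names (symbol, qty, side, type, time_in_force, limit_price, client_order_id) follow Alpaca's order-request fields as we understand them and should be checked against the current API reference. Actually sending the request is out of scope here.

```python
import uuid

ALLOWED_SIDES = {"buy", "sell"}
ALLOWED_TYPES = {"market", "limit"}
ALLOWED_TIF = {"day", "gtc"}


def instruction_to_order(instr: dict) -> dict:
    """Validate a (hypothetical) PM instruction dict and build an order request body."""
    side = str(instr["side"]).lower()
    order_type = str(instr.get("type", "market")).lower()
    tif = str(instr.get("time_in_force", "day")).lower()
    qty = float(instr["qty"])

    if side not in ALLOWED_SIDES:
        raise ValueError(f"unsupported side: {side}")
    if order_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported order type: {order_type}")
    if tif not in ALLOWED_TIF:
        raise ValueError(f"unsupported time_in_force: {tif}")
    if qty <= 0 or not qty.is_integer():
        raise ValueError("qty must be a positive whole number of shares")

    body = {
        "symbol": str(instr["symbol"]).upper(),
        "qty": str(int(qty)),
        "side": side,
        "type": order_type,
        "time_in_force": tif,
        # A caller-chosen ID lets a retried submission be matched to the original order.
        "client_order_id": instr.get("client_order_id") or str(uuid.uuid4()),
    }
    if order_type == "limit":
        if "limit_price" not in instr:
            raise ValueError("limit orders need a limit_price")
        body["limit_price"] = str(instr["limit_price"])
    return body


print(instruction_to_order({"symbol": "aapl", "side": "BUY", "qty": 10}))
```

Rejecting anything outside a small allow-list is deliberate: an ambiguous instruction should fail loudly and go back to a human rather than be guessed at.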

Compliance / Middle-Office

Rules + documentation automation

Medium confidence

Map transactions to rule checklists
Produce auditable check outputs and exception reports (with escalation)
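
A minimal sketch of the checklist pattern: each rule is a named predicate, every (transaction, rule) pair produces an auditable result row, and failures become the exception report handed to a human for escalation. The two rules (a restricted list and a per-trade notional limit) and the Transaction fields are hypothetical examples, not a real compliance rulebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    symbol: str
    notional: float  # signed trade value in USD


def run_checks(txns, restricted_symbols, max_notional):
    """Apply every rule to every transaction; return (all results, exceptions)."""
    rules = {
        "restricted_list": lambda t: t.symbol not in restricted_symbols,
        "notional_limit": lambda t: abs(t.notional) <= max_notional,
    }
    results = [
        {"txn_id": t.txn_id, "rule": name, "passed": check(t)}
        for t in txns
        for name, check in rules.items()
    ]
    exceptions = [r for r in results if not r["passed"]]  # escalate these to a reviewer
    return results, exceptions


txns = [
    Transaction("t1", "AAPL", 50_000),
    Transaction("t2", "XYZ", 10_000),
    Transaction("t3", "MSFT", -2_000_000),
]
results, exceptions = run_checks(txns, restricted_symbols={"XYZ"}, max_notional=1_000_000)
for row in exceptions:
    print(row)
```

Keeping passing rows as well as failures is what makes the output auditable: a reviewer can see that every rule was actually applied to every transaction.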

What's Already Mature in AI for Finance

NLP & Text Analysis

Earnings call / filings NLP

Sentiment extraction, forward-looking statement tagging, risk factor summarization, FinBERT models
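
To make "forward-looking statement tagging" concrete, here is a deliberately crude keyword baseline. It is not FinBERT and will both over- and under-tag (for example, "may" also matches the month); the cue list is an assumption loosely modeled on typical safe-harbor language.

```python
import re

# Hypothetical cue list; learned taggers (e.g. fine-tuned FinBERT variants) replace this with a model.
FLS_CUES = ("expect", "anticipate", "believe", "intend", "plan", "estimate",
            "forecast", "outlook", "guidance", "will", "may", "could", "should")
CUE_PATTERN = re.compile(r"\b(?:" + "|".join(FLS_CUES) + r")\w*", re.IGNORECASE)


def tag_forward_looking(text: str) -> list:
    """Split text into sentences and flag those containing a forward-looking cue word."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(s, bool(CUE_PATTERN.search(s))) for s in sentences if s]


call_excerpt = "Revenue grew 5% last quarter. We expect margins to improve next year."
for sentence, is_forward_looking in tag_forward_looking(call_excerpt):
    print(is_forward_looking, sentence)
```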

Automated Reporting

Deck drafting, compliance reports

Pitch deck templating, ESG drafts, compliance report generation

Quant + ML

Signal discovery and execution

Alpha signal generation, execution optimization, market microstructure

For Agents' Last Exam, the key is not "does the model know finance", but:

Can the agent operate real tools end-to-end and produce verifiable artifacts?

Part 3

Example Tasks

Benchmarkable tasks that evaluate end-to-end execution on a computer: given raw inputs and constraints, the agent must use real tools to produce deliverables that a reviewer can verify.

Core Tasks (3)

Quant Research

Run a Quantitative Strategy in a Backtesting Framework

A quant researcher validates a strategy idea by running it through historical data.
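
Backtrader or LEAN would supply data feeds, order handling, and analyzers; the dependency-free sketch below shows only the core loop such a framework runs, using a long-only moving-average crossover as a stand-in strategy. The strategy, window lengths, and execution simplifications are assumptions for illustration, not a recommended specification.

```python
def sma(values, window, i):
    """Simple moving average of the `window` values ending at index i."""
    return sum(values[i - window + 1 : i + 1]) / window


def backtest_sma_crossover(closes, fast=3, slow=5, cash=10_000.0):
    """Long-only moving-average crossover on a list of closing prices.

    Holds the asset while the fast SMA is above the slow SMA, otherwise holds cash.
    Simplifications: trades fill at the same bar's close, fractional shares,
    no commissions or slippage.
    Returns (final_equity, trade_log) with rows (bar_index, side, shares, price).
    """
    if not 0 < fast < slow:
        raise ValueError("need 0 < fast < slow")
    shares = 0.0
    trade_log = []
    for i in range(slow - 1, len(closes)):
        price = closes[i]
        want_long = sma(closes, fast, i) > sma(closes, slow, i)
        if want_long and shares == 0:
            shares, cash = cash / price, 0.0
            trade_log.append((i, "buy", shares, price))
        elif not want_long and shares > 0:
            trade_log.append((i, "sell", shares, price))
            cash, shares = shares * price, 0.0
    final_equity = cash + (shares * closes[-1] if shares else 0.0)
    return final_equity, trade_log


closes = [10, 10, 10, 10, 10, 11, 12, 13, 14, 15, 14, 13, 12, 11, 10]
equity, log = backtest_sma_crossover(closes)
print(round(equity, 2), [(i, side) for i, side, _, _ in log])
```

The trade log returned here is the kind of artifact a reviewer can replay against the signal, which is what makes the task scoreable.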

Data Extraction

Retrieve and Parse SEC Filings via edgartools (XBRL Extraction)

An analyst extracts structured financial metrics from annual reports.
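
edgartools wraps SEC's XBRL data; to show what extraction involves underneath, this sketch walks a dict shaped like the SEC companyfacts JSON as we understand it (facts → taxonomy → concept → units → list of facts carrying start, end, val, form, and filed). It does not use the edgartools API, and the sample payload is hand-made, not real filing data. Two judgment calls are assumptions: annual values are identified by a roughly one-year duration, and when the same period appears in several 10-Ks (a later filing may restate it), the most recently filed value wins.

```python
from datetime import date


def annual_values(companyfacts: dict, concept: str,
                  taxonomy: str = "us-gaap", unit: str = "USD") -> dict:
    """Return {period_end: value} for annual (roughly 12-month) 10-K facts of one concept.

    Assumes the layout facts -> taxonomy -> concept -> units -> unit -> [fact, ...],
    where duration facts carry "start"/"end" and every fact carries "val", "form", "filed".
    """
    facts = companyfacts["facts"][taxonomy][concept]["units"][unit]
    latest = {}  # period end -> (filed date, value)
    for fact in facts:
        if fact.get("form") != "10-K":
            continue
        if "start" in fact:
            days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
            if not 350 <= days <= 380:
                continue  # skip quarterly or other non-annual durations
        end = fact["end"]
        # ISO dates compare correctly as strings; a later filing (e.g. a restatement) wins.
        if end not in latest or fact["filed"] > latest[end][0]:
            latest[end] = (fact["filed"], fact["val"])
    return {end: val for end, (_, val) in sorted(latest.items())}


# Hand-made sample in the companyfacts shape (not real filing data).
sample = {"facts": {"us-gaap": {"Revenues": {"units": {"USD": [
    {"start": "2022-01-01", "end": "2022-12-31", "val": 100, "form": "10-K", "filed": "2023-02-15"},
    {"start": "2022-01-01", "end": "2022-12-31", "val": 101, "form": "10-K", "filed": "2024-02-15"},
    {"start": "2023-01-01", "end": "2023-12-31", "val": 120, "form": "10-K", "filed": "2024-02-15"},
    {"start": "2023-10-01", "end": "2023-12-31", "val": 35, "form": "10-K", "filed": "2024-02-15"},
    {"start": "2023-01-01", "end": "2023-03-31", "val": 28, "form": "10-Q", "filed": "2023-05-01"},
]}}}}}
print(annual_values(sample, "Revenues"))
```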

Trading Operations

Execute a Trade Instruction via Alpaca API (Paper Trading)

A trader executes a PM instruction in a trading system.
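
Submitting is a single call; the part that needs care is polling until the order settles without giving up early or looping forever. A minimal sketch with a bounded poll budget: get_status is a hypothetical callable standing in for a real broker lookup by order ID, and the terminal set below covers common Alpaca end states but is an assumption to check against the API docs. Transient network errors are recorded and retried within the same budget.

```python
import time

TERMINAL_STATUSES = {"filled", "canceled", "expired", "rejected"}


def wait_for_terminal(get_status, order_id, max_polls=20, interval_s=1.0):
    """Poll until the order reaches a terminal status or the poll budget runs out.

    get_status(order_id) is a hypothetical callable returning a status string.
    Returns (final_status, history); final_status is "timeout" if the budget is exhausted.
    """
    history = []
    for attempt in range(max_polls):
        try:
            status = get_status(order_id)
        except (ConnectionError, TimeoutError) as exc:
            history.append(f"error: {exc}")  # transient failure: record and retry
        else:
            history.append(status)
            if status in TERMINAL_STATUSES:
                return status, history
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    return "timeout", history


# Stub standing in for a real broker call; it replays a scripted status sequence.
scripted = iter(["accepted", "partially_filled", "filled"])
status, history = wait_for_terminal(lambda _order_id: next(scripted), "demo-order", interval_s=0.0)
print(status, history)
```

The returned history doubles as part of the execution report, so a reviewer can see every intermediate state the agent observed.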

Alternative Tasks

Financial Modeling

Generate an Excel Financial Model via openpyxl

An IBD analyst builds a valuation model deliverable.
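
The core arithmetic such a model encodes is often a discounted cash flow valuation. The sketch below computes enterprise value from explicit free-cash-flow forecasts plus a Gordon-growth terminal value, with end-of-year discounting; in the task itself these figures (ideally as live formulas) would be written to a workbook with openpyxl, which this dependency-free sketch leaves out. The inputs in the example are placeholders.

```python
def dcf_enterprise_value(fcfs, discount_rate, terminal_growth):
    """Enterprise value from explicit free-cash-flow forecasts plus a terminal value.

    EV = sum_t FCF_t / (1 + r)^t  +  [FCF_N * (1 + g) / (r - g)] / (1 + r)^N
    """
    if not fcfs:
        raise ValueError("need at least one forecast year")
    if discount_rate <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth")
    r, g, n = discount_rate, terminal_growth, len(fcfs)
    pv_forecast = sum(fcf / (1 + r) ** t for t, fcf in enumerate(fcfs, start=1))
    terminal_value = fcfs[-1] * (1 + g) / (r - g)
    pv_terminal = terminal_value / (1 + r) ** n
    return {
        "pv_forecast": pv_forecast,
        "pv_terminal": pv_terminal,
        "enterprise_value": pv_forecast + pv_terminal,
    }


# Placeholder inputs: five years of FCF ($M), 9% discount rate, 2.5% terminal growth.
print(dcf_enterprise_value([120, 130, 140, 150, 160], 0.09, 0.025))
```

Returning the components separately mirrors how a model sheet usually shows them as distinct lines, and gives a checker more than one number to verify.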

Portfolio Management

Portfolio Optimization via PyPortfolioOpt

A PM solves an allocation under constraints.
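
PyPortfolioOpt handles constrained problems (as we understand it, its EfficientFrontier class can minimize volatility under weight bounds). The closed-form case underneath is the global minimum-variance portfolio, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), which is fully invested but allows short positions. The dependency-free sketch below computes it with Gaussian elimination; it is not PyPortfolioOpt's implementation, and the covariance numbers are made up.

```python
def solve_linear(a, b):
    """Solve a @ x = b for square a via Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_variance_weights(cov):
    """Global minimum-variance weights w = S^-1 1 / (1' S^-1 1).

    Fully invested (weights sum to 1) but otherwise unconstrained: weights may be negative.
    """
    z = solve_linear(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]


# Made-up annualized covariance matrix for three assets.
cov = [[0.04, 0.006, 0.0],
       [0.006, 0.09, 0.01],
       [0.0, 0.01, 0.02]]
print([round(w, 4) for w in min_variance_weights(cov)])
```

Adding long-only bounds or sector caps turns this into a quadratic program, which is where a solver-backed library such as PyPortfolioOpt (built on cvxpy) earns its place.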

Recommended Tool Stack (All Free)

Data retrieval, backtesting, trade execution, modeling outputs, and optimization, all covered by open-source or free-tier tools.

SEC EDGAR

edgartools

yfinance

Backtrader

QuantConnect LEAN

Alpaca Trading API

openpyxl

xlsxwriter

PyPortfolioOpt

cvxpy

Task Comparison Summary

Task	Must-Use Software	Industry Representativeness	Scoring	Scale	Priority
Backtest execution	Backtrader / LEAN	★★★★★	★★★★★	★★★★★	Core
SEC parsing	edgartools	★★★★☆	★★★★★	★★★★★	Core
Trade execution	Alpaca API	★★★★★	★★★★★	★★★★★	Core
Excel modeling	openpyxl	★★★★☆	★★★★☆	★★★★★	Optional
Portfolio optimization	PyPortfolioOpt	★★★☆☆	★★★★★	★★★★★	Optional

Contribute to Finance

We seek high-level, representative contributions, not exhaustive documentation. Share your expertise in any of these areas:

Submit Landscape Understanding

Help us map sectors, roles, tasks, and tools in finance. Share your perspective on the industry structure.

Submit a Workflow

Describe a specific professional task with tools, inputs, outputs, and how success is verified.

Our Commitments to Contributors

Evaluation Only: All contributions are used exclusively for agent evaluation, never for model training.
Partner Review: Industry partners can review and approve task specifications before public release.
Data Control: Contributors can exclude sensitive or proprietary data from submissions.