⚗️

Under Review

Chemistry

Software-heavy, highly standardized workflows with deterministic, verifiable outputs—ideal for benchmarking AI agents on real professional tasks.


Part 1

Industry Landscape

Chemistry is not a single industry but a foundational technology discipline that permeates nearly all manufacturing sectors. From pharmaceuticals to smartphone screens, gasoline to lithium batteries—wherever molecular-level design is needed, chemists are involved.

$5.7T

Global Chemical Products Market

2023

$4.5B+

AI Drug Discovery Market

2023 funding

$71B

Drug Discovery Market

2024, projected $120B+ by 2030

$1-2.6B

New Drug Development Cost

10-15 years timeline

Major Application Sectors

💊

Pharmaceuticals & Life Sciences

Highest R&D investment density. New drug: 10-15 years, $1-2.6B. Core logic: massive screening + iterative optimization.

Computational penetration is deepest here; Insilico Medicine's INS018_055 became the first fully AI-discovered drug to enter Phase II.

⚗️

Specialty & Fine Chemicals

High-value, small-batch: electronic chemicals, flavors & fragrances, pharmaceutical intermediates.

Computational penetration growing rapidly, especially in electronic chemicals for semiconductor precision.

🛢️

Petrochemicals & Bulk Chemicals

Largest-volume sector—billions of tons annually. Focus on catalyst design and process optimization.

Applications concentrate on catalytic surface reactions (DFT) and process simulation (Aspen Plus).

🔋

Materials Chemistry

Polymers, coatings, adhesives, battery materials, semiconductor materials.

Battery materials growing rapidly—cathode materials, electrolytes, separators.

For Agents' Last Exam: Computational chemistry work operates entirely on computers with mature open-source tools, generates perfectly reproducible outputs, and has established validation protocols, making it ideal for benchmarking AI agents on real professional tasks.

Part 2

The Dual Work Mode: Experimental vs Computational

A structural feature across all chemistry sectors: every R&D organization operates with two parallel tracks. Their collaboration model is key to understanding chemistry workflows.

Two Parallel Tracks

Experimental Side (Wet Lab)

~70-80% of R&D staff

• Work: Synthesize molecules, operate instruments (NMR, MS, HPLC), collect data

• Rhythm: Days (reaction cycle typically 1–3 days; med-chem cycle 2–4 weeks)

• Software: ChemDraw, MestReNova, ELN, SciFinder/Reaxys

Computational Side (Dry Lab)

~10-20% of staff, disproportionate influence

• Work: Simulate molecular behavior, predict properties, run virtual screens, build models

• Rhythm: Hours to days (DFT: minutes to hours; virtual screen: hours to a day; MD: hours to days)

• Software: Gaussian/ORCA/Psi4, GROMACS/AMBER, AutoDock Vina, RDKit

Collaboration Dynamics: "Computation Scouts, Experiments Validate"

The traditional model of "experiments lead, computation assists" is shifting to computation-first:

🎯

Virtual Screening → Validation

Computational team screens millions virtually

Experimental team synthesizes/tests Top 1000

📊

Prediction → Precision Synthesis

QSAR/ADMET models predict best directions

Med-chem team focuses on predicted winners

🔬

Simulation → Mechanistic Insight

MD simulations reveal binding dynamics

Guide structural modification strategies

Economic driver: Synthesizing one compound costs $5K–$50K; one computation costs ~$10 of cloud computing.

"Compute-First" Paradigm Shift

• Novartis compressed 39 years of computational work into 9 hours (AWS elastic HPC)

• Recursion + Exscientia merger (late 2024, ~$688M) claims to "industrialize drug discovery"

• Insilico Medicine's INS018_055 became the first fully AI-discovered drug to enter Phase II

Part 3

Tools & Infrastructure

The chemistry software ecosystem spans experimental and computational sides in a highly layered structure—different precision levels solve different scale problems.

Computational Software Ecosystem

Quantum Chemistry

Electronic structure — highest precision, smallest systems (1-500 atoms)

Gaussian (Commercial, $2-10K)

Oldest, highest-cited; closed documentation

ORCA (Academic free)

Fastest-growing; excellent documentation

Psi4 (Open source, LGPL)

Best Python API; ideal for automated pipelines

Molecular Dynamics

Atomic level — medium precision, larger systems (10K-1M atoms)

GROMACS (Open source, LGPL)

Most widely used MD; excellent GPU acceleration

AMBER (Partial commercial)

Pharma preference; high-quality force fields

OpenMM (Open source, MIT)

Python-native; highest customization flexibility

Molecular Docking

Protein-ligand binding prediction

AutoDock Vina (Open source, Apache 2.0)

Most widely used open-source

Schrödinger Glide (Commercial, $50K+/yr)

Pharma industry standard

Cheminformatics

Molecular representation level — fast, massive systems

RDKit (Open source, BSD-3)

De facto standard; chemistry's pandas

OpenBabel (Open source, GPL)

Format conversion essential (110+ formats)

DeepChem (Open source, MIT)

GNN and deep learning modeling

Open-Source Software Stack ($0 Cost)

Quantum chemistry: Psi4 (LGPL) + ORCA (academic free)

Molecular dynamics: GROMACS (LGPL)

Molecular docking: AutoDock Vina (Apache 2.0)

Cheminformatics: RDKit (BSD-3) + scikit-learn (BSD-3)

Retrosynthesis: ASKCOS (MPL 2.0), requires Docker deployment

Format conversion: OpenBabel (GPL)

One-line installation (computational core):

conda install -c conda-forge psi4 gromacs rdkit openbabel vina mdanalysis
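After an install like the one above, it can help to confirm what actually landed in the environment. The sketch below uses only the Python standard library. The module and command names are assumptions: import names can differ from conda package names, GROMACS is a command-line program rather than a Python module, and MPI builds of GROMACS may expose `gmx_mpi` instead of `gmx`.

```python
from importlib.util import find_spec
from shutil import which

# Assumed import names for the Python-facing packages (not taken from the
# page above; adjust to match the environment being checked).
PYTHON_MODULES = ["psi4", "rdkit", "openbabel", "vina", "MDAnalysis"]
# Assumed executable names for the command-line tools.
COMMANDS = ["gmx", "obabel"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


def missing_commands(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    print("missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("missing commands:", missing_commands(COMMANDS) or "none")
```

The check only locates modules and executables; it does not import them or verify versions.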

Part 4

Core Computational Roles

Understanding chemistry's computational workforce structure. Chemistry is among the most PhD-dense industries—computational chemists are nearly 100% PhD-holders.

Computational Team Roles

Computational / Quantum Chemist

Pharma computational teams, academic groups

Method-focused

Daily Work: DFT optimization → reaction paths → spectroscopy prediction → stability comparison

Core Tools: Gaussian / ORCA / Psi4 + Python scripts + Avogadro

Molecular Modeler / MD Researcher

Pharma CADD teams, academic biophysics

Structure-focused

Daily Work: Build MD systems → run simulations → analyze trajectories → FEP calculations

Core Tools: GROMACS / AMBER / OpenMM + MDAnalysis + VMD / PyMOL

Cheminformatician / CADD Scientist

Pharma CADD, AI drug discovery companies

Application-focused

Daily Work: Virtual screening → QSAR/QSPR modeling → ADMET prediction → compound library analysis

Core Tools: RDKit + scikit-learn / XGBoost + AutoDock Vina + pandas / Jupyter

Salary Reference (US Market)

$90K–$130K

Computational Chemist (PhD entry)

$140K–$200K

Senior CADD Scientist (5-10 yrs)

$200K–$350K

Computational Chemistry Director

$80K–$120K

Cheminformatician (MS entry)

Part 5

Why Computational Chemistry Fits Agents' Last Exam

Advantages and challenges that make computational chemistry an ideal domain for AI agent benchmarking.

💻

100% Computer-Based

All computational chemistry work happens on computers—no physical lab needed

🔓

Mature Open-Source Tools

Psi4, GROMACS, RDKit, AutoDock Vina, ASKCOS cover all core workflows

✓

Deterministic Outputs

Same input, software version, and random seed → same energy values, descriptors, and conformations (reproducible run to run)

📊

Abundant Public Data

QM9 (134K molecules), GEOM (37M conformers), PubChem (111M compounds), ChEMBL (20M+ bioactivity data points)

🎯

Established Validation

Gold-standard datasets and benchmarking protocols exist

💰

High Industry Value

Virtual screening replaces $50K synthesis costs with $10 computation

Challenges (Where Agents Add Value)

Workflow Fragmentation

Completing a task requires chaining 3-5 different tools with manual format conversions

Parameter Selection

DFT functional? MD force field? Docking box size? Wrong parameters → wrong results

Result Validation

A calculation "finishing" doesn't mean "correct"—need to check convergence, frequencies, physical reasonableness
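A minimal sketch of the "finished is not the same as correct" check for a geometry optimization. It assumes the convergence flags and vibrational frequencies have already been parsed from the program output; the function name and argument layout are hypothetical, and imaginary modes are assumed to be reported as negative numbers, which is a common output convention.

```python
def check_optimization(scf_converged, geometry_converged, frequencies_cm1):
    """Return a list of problems; an empty list means the basic checks pass.

    frequencies_cm1: vibrational frequencies in cm^-1, with imaginary modes
    given as negative values (assumed convention).
    """
    problems = []
    if not scf_converged:
        problems.append("SCF did not converge")
    if not geometry_converged:
        problems.append("geometry optimization did not converge")
    imaginary = [f for f in frequencies_cm1 if f < 0.0]
    if imaginary:
        # A true minimum has no imaginary modes; one imaginary mode
        # suggests a transition state instead.
        problems.append(f"{len(imaginary)} imaginary mode(s): not a minimum")
    return problems


# Example values are illustrative, not taken from a real calculation.
print(check_optimization(True, True, [1595.0, 3657.0, 3756.0]))
print(check_optimization(True, True, [-250.0, 1595.0, 3657.0]))
```

Physical-reasonableness checks (bond lengths, energies in a plausible range) would sit on top of this and are not shown.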

Strategy: The fragmentation and parameter complexity are exactly where agents can add value through automated orchestration and knowledge application.
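Docking box size, one of the parameter choices named above, is often derived from a known ligand position. The sketch below is one simple heuristic under stated assumptions: an axis-aligned box around the ligand's atom coordinates plus a fixed padding. The function name and the 5 Å default are illustrative, not a recommendation from this page or from any docking program.

```python
def docking_box(coords, padding=5.0):
    """Return (center, size) of an axis-aligned box around the coordinates.

    coords: list of (x, y, z) tuples in angstroms.
    padding: margin added on every side of the box, in angstroms
             (illustrative default).
    """
    xs, ys, zs = zip(*coords)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Two made-up atom positions spanning the ligand's extent.
center, size = docking_box([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)], padding=1.0)
print(center, size)
```

A box that is too small cuts off valid poses and one that is too large slows the search, which is why this choice is listed among the parameters that can silently change results.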

Part 6

Five Core Workflows

Five workflows cover the three main computational chemistry roles (quantum chemist / molecular modeler / cheminformatician) plus one pure-computer experimental workflow for industry breadth.

Selection Principle

Start from "what best reflects real industry work logic":

#	Workflow	Represents	Role	Data Scale
1	DFT Optimization	Most fundamental operation	Quantum Chemist	★★★★★
2	Molecular Docking	Core pharma deliverable	CADD Scientist	★★★★☆
3	MD Simulation	Most complex operation	Molecular Modeler	★★★☆☆
4	QSAR Modeling	Data-driven logic	Cheminformatician	★★★★★
5	Retrosynthesis	Experimental-side computer task	Synthetic Chemist	★★★★☆

Workflow Details

Quantum Chemistry · Core · Million-level (QM9, ANI)

DFT Geometry Optimization + Property Calculation

This is the most fundamental, highest-frequency operation in computational chemistry. Whether in pharma, chemical companies, or academia, the first thing a computational chemist does is "optimize molecular structure, calculate energy and properties."
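Once energies are calculated, stability comparisons are usually reported as relative energies rather than absolute ones. A small sketch, assuming total energies in Hartree keyed by a structure label; the labels and energy values are made up for illustration, and the conversion factor is 1 Hartree ≈ 627.5095 kcal/mol.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095


def relative_energies_kcal(energies_hartree):
    """Map each label to its energy above the lowest one, in kcal/mol."""
    lowest = min(energies_hartree.values())
    return {
        label: (energy - lowest) * HARTREE_TO_KCAL_PER_MOL
        for label, energy in energies_hartree.items()
    }


# Illustrative numbers only, not results from a real calculation.
example = {"conformer_a": -76.000, "conformer_b": -75.999}
for label, delta in relative_energies_kcal(example).items():
    print(f"{label}: {delta:.2f} kcal/mol")
```

The example also shows the scale involved: a difference of 0.001 Hartree is roughly 0.63 kcal/mol.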

CADD · Core · Hundred-thousand-level (PDBbind, DUD-E)

Molecular Docking

Molecular docking is the core deliverable of computational teams in pharma. A CADD scientist's value is demonstrated by "picking the Top 1000 most likely binders from 1 million compounds." Docking results directly influence the med-chem team's synthesis decisions.
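The "pick the top N" step can be sketched without any chemistry library once scores exist. The sketch assumes each compound already has a docking score where more negative means better predicted binding (the convention used by AutoDock Vina, in kcal/mol); the compound IDs and scores are invented.

```python
import heapq


def top_binders(scored_compounds, n):
    """Return the n best (compound_id, score) pairs, best first.

    scored_compounds: iterable of (compound_id, score) pairs, where a more
    negative score means stronger predicted binding (assumed convention).
    """
    return heapq.nsmallest(n, scored_compounds, key=lambda item: item[1])


scores = [("cmpd-001", -7.1), ("cmpd-002", -9.3), ("cmpd-003", -5.0)]
print(top_binders(scores, 2))
```

`heapq.nsmallest` avoids sorting the whole library, which matters when the input is a stream of a million scores and only the top 1000 are kept.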

Molecular Modeling · Core · Thousand-level (expandable)

Molecular Dynamics Simulation

MD is the most complex routine operation in computational chemistry—a 5–10 step pipeline (system setup → energy minimization → heating → equilibration → production → analysis). This makes MD the best workflow for testing agent multi-step software orchestration ability.
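The orchestration problem can be sketched as an ordered list of stages where each stage runs only if the previous one succeeded. The stage names come from the pipeline above; `run_pipeline` and the `runner` callback are hypothetical names, and a real runner would invoke the MD engine and inspect its output rather than return a fixed value.

```python
MD_STAGES = [
    "system setup",
    "energy minimization",
    "heating",
    "equilibration",
    "production",
    "analysis",
]


def run_pipeline(stages, runner):
    """Run stages in order and stop at the first failure.

    runner: callable taking a stage name and returning True on success.
    Returns (completed_stages, failed_stage), with failed_stage set to
    None when every stage succeeds.
    """
    completed = []
    for stage in stages:
        if not runner(stage):
            return completed, stage
        completed.append(stage)
    return completed, None


# Simulated run in which equilibration fails.
done, failed = run_pipeline(MD_STAGES, lambda stage: stage != "equilibration")
print("completed:", done)
print("failed at:", failed)
```

Stopping at the first failed stage is the point of the design: production on a badly equilibrated system would still produce a trajectory, just not a meaningful one.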

Cheminformatics · Core · Million-level (MoleculeNet, TDC, ChEMBL)

QSAR Model Building and Prediction

QSAR represents the data-driven work logic in chemistry. A cheminformatician's daily work is "receive activity data → compute descriptors → train model → predict new molecules." This pipeline involves RDKit + ML framework combination—a typical example of "domain tool + general tool collaboration."
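The last step of that pipeline is scoring predictions against measured activities, typically with RMSE and R². A plain-Python sketch of both metrics follows; in practice an ML library would normally supply them, and the activity values here are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0]      # illustrative activity values
predicted = [2.0, 2.0, 2.0]     # a model that always predicts the mean
print(rmse(measured, predicted), r_squared(measured, predicted))
```

A model that always predicts the mean scores R² = 0, which is a useful baseline when reading leaderboard numbers.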

Synthetic Chemistry · Core · Ten-thousand-level (USPTO, Organic Syntheses)

Retrosynthesis Planning

"How to synthesize this molecule" is the ultimate question in organic chemistry and the core thinking activity of synthetic chemists—the largest workforce group in chemistry. Recent AI retrosynthesis tools (ASKCOS, AiZynthFinder) have made this workflow amenable to software assistance.

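A retrosynthesis route can be treated as a tree whose leaves are the starting materials. The sketch below walks such a tree and reports leaves missing from a catalog. The nested-dict layout, the function names, and the catalog are hypothetical and do not reflect the output format of ASKCOS or AiZynthFinder; plain string comparison also assumes the SMILES are already canonicalized the same way on both sides.

```python
def leaves(route):
    """Return the SMILES of all starting materials (tree leaves) in order."""
    precursors = route.get("precursors", [])
    if not precursors:
        return [route["smiles"]]
    found = []
    for child in precursors:
        found.extend(leaves(child))
    return found


def unavailable_starting_materials(route, catalog):
    """Return the leaves of the route that are not in the catalog set."""
    return [smiles for smiles in leaves(route) if smiles not in catalog]


# One-step example: aspirin from salicylic acid and acetic anhydride.
route = {
    "smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "precursors": [
        {"smiles": "O=C(O)c1ccccc1O"},
        {"smiles": "CC(=O)OC(C)=O"},
    ],
}
catalog = {"O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"}  # made-up catalog
print(unavailable_starting_materials(route, catalog))
```

An empty result means every starting material in the route is purchasable according to the catalog supplied.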
Part 7

Review Agent Architecture

A two-layer validation system enables automated, reproducible evaluation across all five workflows.

Two-Layer Validation

Layer 1: Deterministic Rules (100% Automated)

DFT: Output file complete · SCF convergence · gradient threshold · frequencies

Docking: PDBQT format valid · binding energy present · pose count

MD: Trajectory frames · no fatal errors · energy/temp/pressure parseable

QSAR: Descriptors no NaN/Inf · model loadable · CV metrics correct

Retrosynthesis: Atom conservation · SMILES valid · commercial catalogs
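Two of the rules listed above are simple enough to sketch with the standard library: the NaN/Inf descriptor check and atom conservation. Both function names are hypothetical. The atom-conservation check assumes molecular formulas are already available as element-count dicts and that byproducts are included in the products; reaction datasets often omit byproducts, in which case a strict check like this one would fail.

```python
import math
from collections import Counter


def non_finite_descriptors(descriptors):
    """Return the names of descriptors whose value is NaN or infinite."""
    return [name for name, value in descriptors.items()
            if not math.isfinite(value)]


def atoms_conserved(reactant_formulas, product_formulas):
    """True if total element counts match on both sides of a reaction."""
    def total(formulas):
        return sum((Counter(f) for f in formulas), Counter())
    return total(reactant_formulas) == total(product_formulas)


# Salicylic acid + acetic anhydride -> aspirin + acetic acid
reactants = [{"C": 7, "H": 6, "O": 3}, {"C": 4, "H": 6, "O": 3}]
products = [{"C": 9, "H": 8, "O": 4}, {"C": 2, "H": 4, "O": 2}]
print(atoms_conserved(reactants, products))
print(non_finite_descriptors({"MolWt": 180.16, "LogP": float("nan")}))
```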

Layer 2: Reference Comparison (95% Automated)

DFT: Energy within 0.001 Hartree of reference; RMSD < 0.1 Å · 60%

Docking: Pose RMSD < 2.0 Å vs crystal · 70%

MD: RMSD plateau < 3 Å; energy drift < 0.01 kJ/mol/ps · 50%

QSAR: R²/RMSE vs MoleculeNet/TDC leaderboard · 70%

Retrosynthesis: Route match; starting material availability · 60%
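The DFT and docking thresholds above reduce to two comparisons, sketched here in plain Python. The function names are hypothetical. The RMSD assumes both coordinate sets list the same atoms in the same order and are already in the same frame; it performs no alignment or symmetry correction, which a production check would need to consider.

```python
import math


def energy_within_tolerance(energy, reference, tol_hartree=0.001):
    """True if the energy differs from the reference by less than tol."""
    return abs(energy - reference) < tol_hartree


def coordinate_rmsd(coords_a, coords_b):
    """RMSD between two matched coordinate lists (no alignment applied)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up numbers for illustration.
print(energy_within_tolerance(-76.4000, -76.4005))
pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(coordinate_rmsd(pose, crystal) < 2.0)
```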

Key Data Resources

Resource	Scale	Use	Access
QM9	134K molecules	DFT gold standard	CC0
PDBbind	23K complexes	Docking gold standard	Academic
GROMACS tutorials	Dozens	MD templates	Free
MoleculeNet	700K+	QSAR gold standard	DeepChem
USPTO	3M reactions	Retrosynthesis reference	Public
PubChem	111M compounds	Universal compound source	NCBI
ChEMBL	20M+ bioactivities	Activity data	EBI

Core Tools & Infrastructure

Psi4 · ORCA · GROMACS · AutoDock Vina · RDKit · OpenBabel · ASKCOS · scikit-learn · MDAnalysis

Contribute to Chemistry

We seek high-level, representative contributions—not exhaustive documentation. Share your expertise in any of these areas:

Submit Landscape Understanding

Help us map roles, workflows, and tools in computational chemistry. Share your perspective on the industry structure.

Submit a Workflow

Describe a specific professional task with tools, inputs, outputs, and how success is verified.

Our Commitments to Contributors

Evaluation Only: All contributions are used exclusively for agent evaluation, never for model training.
Partner Review: Industry partners can review and approve task specifications before public release.
Data Control: Contributors can exclude sensitive or proprietary data from submissions.