ActiveX
Xinyang Han (UCB) · Jiaming Zheng (MIT) · Lu Li (USC) · Yiyou Sun (UCB)

Materials Science

Part 1

Overview

Materials science work follows a continuously iterated closed loop that operates across a value chain from discovery to qualification. Understanding this landscape is essential for identifying where AI agents can contribute.

The Core Loop

The most stable, general abstraction of materials work is a continuously iterated closed loop.

  • Design: define the target; plan the computations/experiments that will validate it
  • Make: synthesize/grow/fabricate the material
  • Measure: characterize the sample; transform raw data into metrics
  • Decide: compare evidence to the goal; choose the next action
  • Repeat
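The closed loop above can be sketched as a plain iteration; every function name, the stubbed steps, and the stopping rule are illustrative assumptions, not part of any real pipeline:

```python
# Sketch of the Design -> Make -> Measure -> Decide loop (names are illustrative).
def run_loop(target, propose, make, measure, max_iters=5, tol=0.05):
    """Iterate until the measured property is within `tol` of `target`."""
    history = []
    candidate = propose(history)            # Design: pick the next candidate
    for _ in range(max_iters):
        sample = make(candidate)            # Make: realize it (stubbed here)
        value = measure(sample)             # Measure: extract the metric
        history.append((candidate, value))
        if abs(value - target) <= tol:      # Decide: stop or iterate
            break
        candidate = propose(history)
    return candidate, value, history
```

With toy stubs (make as identity, measure returning 0.3 × candidate, propose incrementing), the loop walks candidates 0, 1, 2, 3 and stops at 3 for a target of 0.9.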

Core Value Chain

A stage-based view that makes clear where "computer-completable" work emerges—not just in simulation, but also in measurement analysis and qualification documentation.

Stage-based View

  1. Discovery / Design (Digital): mostly on computers (literature, databases, simulation setup)
  2. Synthesis / Growth (Physical): mostly physical (furnaces/reactors) with digital logging
  3. Processing / Integration (Mixed): physical fabrication + digital design/simulation
  4. Characterization / Testing (Mixed): instrument → computer; acquisition is physical, interpretation is digital
  5. Scale-up / Qualification (Mixed): physical production + QA systems + documentation

Key insight: Even when the pipeline includes physical steps, the highest-leverage, most repeatable work often lives in the digital analysis + deliverable packaging layers.

The Three-Board Model

How teams actually work in parallel and hand off artifacts. Progress is driven by moving evidence between Compute, Make, and Measure, plus extensions for product settings.

Collaboration View

Compute: simulate/model; generate predictions and derived quantities
  Digital artifacts: structure files, input decks, job scripts, postprocessing notebooks, plots/tables
  Often computer-completable and scorable (QC, extraction, reporting)

Make: synthesize/process; realize candidates physically
  Digital artifacts: protocol drafts, run logs, structured ELN records
  Hard to benchmark directly; adjacent digital support tasks are feasible

Measure: instrument → data → interpretation; produce evidence
  Digital artifacts: raw instrument files, QC flags, extracted measurements, annotated plots
  Analysis/reporting layers are repeatable; strong candidates

Handoffs:
  • Compute → Make: candidates + how to attempt them
  • Make → Measure: sample + provenance
  • Measure → Compute: evidence that changes the next loop

Product Extensions (appear in product settings)

Integrate: CAD/CAE/CAM; compare artifacts to design intent
  Digital artifacts: CAD models, reconstructions, segmentation masks
  Geometry- and metric-based ground truth; often benchmarkable

Qualify: standards, reliability, QA; ship-ready evidence
  Digital artifacts: SOPs, QC dashboards, qualification packs
  Documentation-heavy and verifiable; data often private

Digital maturity: high (Compute), medium (Measure), low (Make)
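When scripting provenance checks over this cycle, the three handoffs reduce to a small lookup; the board names mirror the model above, while the function and payload strings are only an illustrative sketch:

```python
# Minimal encoding of the compute -> make -> measure handoff cycle.
HANDOFFS = {
    ("compute", "make"): "candidates + how to attempt them",
    ("make", "measure"): "sample + provenance",
    ("measure", "compute"): "evidence that changes the next loop",
}

def next_board(board: str) -> str:
    """Follow the single outgoing arrow from `board` in the cycle."""
    for (src, dst) in HANDOFFS:
        if src == board:
            return dst
    raise ValueError(f"unknown board: {board!r}")
```

Because each board has exactly one outgoing arrow, three applications of `next_board` return to the starting board, matching the closed loop in Part 1.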
Part 2

Where LLM Agents Fit

Analysis of roles and representative workflows in materials science, with a focus on where AI agents can reliably contribute to execution and evidence packaging.

Part 3

Example Tasks

Benchmarkable tasks that evaluate end-to-end execution on a computer: given raw files and constraints, the agent must use real tools to produce deliverables that a reviewer can verify.

Core Tasks (3)

Compute

DFT Run-Directory QC + Report Packaging

A computational researcher needs to quickly determine whether a run is trustworthy and produce submission-ready plots and a structured summary.
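One slice of such a QC check can be sketched in a few lines, assuming the per-step total energies have already been extracted from the run directory (e.g. by a parser such as pymatgen's Vasprun); the threshold mirrors a typical EDIFF-style setting but its exact value is an assumption:

```python
def qc_energy_convergence(energies, ediff=1e-4):
    """Flag whether the last step changed the total energy (eV) by < ediff.

    `energies` is the already-extracted per-step total-energy series; this
    checks only one convergence criterion and is not a full trust report.
    """
    if len(energies) < 2:
        return {"converged": False, "reason": "fewer than 2 steps"}
    delta = abs(energies[-1] - energies[-2])
    return {"converged": delta < ediff,
            "final_energy": energies[-1],
            "last_delta": delta}
```

A real report would combine several such flags (forces, k-point settings, warnings in the output) into the structured summary the task asks for.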

Measure

XRD Phase Identification

A characterization scientist needs to quickly produce the most likely phases with evidence, packaged as a reviewable deliverable.
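A toy version of the matching step behind such a deliverable is shown below; the tolerance, the peak lists, and the fraction-of-reference-peaks score are illustrative assumptions, not a validated phase-identification method:

```python
def match_phases(observed_peaks, reference_patterns, tol=0.2):
    """Rank candidate phases by the fraction of their reference peaks
    (2-theta, degrees) that fall within `tol` of an observed peak."""
    scores = {}
    for phase, ref_peaks in reference_patterns.items():
        hits = sum(
            1 for r in ref_peaks
            if any(abs(r - obs) <= tol for obs in observed_peaks)
        )
        scores[phase] = hits / len(ref_peaks)
    # Highest score first: the most plausible phase leads the report.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a reviewable deliverable, each ranked phase would carry its matched-peak evidence, not just a score.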

Integrate

CT Segmentation + Compare-to-CAD

An engineering team needs to verify whether the processed geometry matches design intent (porosity, defects, dimensional deviation).
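The porosity part of this comparison can be sketched with a simple global threshold on a grayscale CT volume; the threshold value is an assumption, and real pipelines typically use Otsu or model-based segmentation instead:

```python
import numpy as np

def porosity(volume, material_threshold):
    """Void fraction: share of voxels below `material_threshold`
    in a grayscale CT volume (assumes a single global threshold)."""
    solid = volume >= material_threshold
    return 1.0 - solid.mean()
```

Dimensional deviation would then come from registering the segmented surface against the CAD model, a separate step not shown here.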

Recommended Tool Stack

Common Python tooling for data handling and visualization, domain-aligned parsers for computational outputs, and scriptable imaging/volume toolkits.
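One possible environment for such a stack might be pinned as below; the specific library choices are illustrative assumptions, not a mandated list:

```
# requirements.txt (illustrative)
numpy            # arrays and numerics
pandas           # tabular data handling
matplotlib       # plots for deliverables
pymatgen         # parsers for computational outputs (e.g., VASP)
ase              # atomic structure handling
scikit-image     # scriptable imaging/segmentation
tifffile         # CT volume I/O
```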

Contribute to Materials Science

We seek high-level, representative contributions—not exhaustive documentation. Share your expertise in any of these areas:

Our Commitments to Contributors

  • Evaluation Only: All contributions are used exclusively for agent evaluation, never for model training.
  • Partner Review: Industry partners can review and approve task specifications before public release.
  • Data Control: Contributors can exclude sensitive or proprietary data from submissions.