Under Review

Robotics

From real-world scans to simulation-ready assets—tool-driven 3D content preparation for robot training.

Part 1

Robotics Overview

Modern robotics increasingly teaches robots to manipulate objects by learning from demonstrations and by training policies in digital simulation.

The Core Pipeline: Real World → Digital Simulation → Robot Training

Robotics teams often follow a loop like:

Real objects / scenes → 3D capture (scan) → asset cleanup & articulation → simulation-ready assets → robot policy training & evaluation

Why Simulation is Essential

Simulation offers repeatability and scale that the real world cannot:

Safety & Cost

Crash the robot in sim and reset instantly; no repairs.

Parallelism

Train hundreds/thousands of environments in parallel.

Data Generation

Synthesize unlimited rollouts and corner cases.

Risky Behaviors

Test dangerous actions safely.

Key Bottleneck: 3D Assets are Still Labor-Intensive

Across recent robotics/simulation literature (e.g., Seed3D, URDFormer, Real2Render2Real), the recurring issue is the content bottleneck—high-quality, compliant assets are expensive to build and maintain.

"building simulation models is often still done by hand … with predefined assets to construct rich scenes" — URDFormer (RSS 2024)
"generating high-quality, compliant, and intersection-free assets for simulation remains labor-intensive" — Real2Render2Real (2025)
Part 2

Digitization Pipeline & Automation Maturity

Understanding where manual GUI work is still required versus what can be fully automated.

End-to-End Pipeline (Asset-Centric View)

Physical Object → 3D Scan

Capture real objects using mobile scanners, photogrammetry, or structured light.

Mesh Cleanup / Repair

Fix holes, noise, and non-manifold geometry. Manual work in Blender.

Articulation / Rigging

Define joints, bones, and motion limits. Manual setup in Blender.

Collision (Convex Decomposition)

Generate collision meshes. Highly automated via CoACD, V-HACD.

Simulation Testing

Test in PyBullet, MuJoCo, or other simulators. Scriptable.

Legend: Manual GUI (bottleneck) · Highly automated
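The cleanup step above hinges on first detecting the defects. An edge that belongs to exactly one face bounds a hole, and an edge shared by more than two faces is non-manifold; counting face incidences per edge catches both. A minimal pure-Python sketch (the `edge_report` helper is hypothetical, assuming triangle faces given as vertex-index triples):

```python
from collections import Counter

def edge_report(faces):
    """Classify edges of a triangle mesh.

    An edge shared by exactly two faces is interior; by one face, it
    bounds a hole; by more than two, it is non-manifold.
    """
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[tuple(sorted((u, v)))] += 1
    return {
        "boundary": [e for e, n in counts.items() if n == 1],
        "non_manifold": [e for e, n in counts.items() if n > 2],
    }

# A lone triangle: all three edges are boundary edges (one open hole).
report = edge_report([(0, 1, 2)])
```

Real repair tools (Blender's 3D-Print toolbox, trimesh) run equivalent checks; the point is that defect *detection* is scriptable even where the *fix* still needs GUI work.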

Automation Maturity by Step

3D Scanning

Mobile scanner apps, RealityScan

Automation: High

Mesh Cleanup / Repair (GUI)

Blender (manual edit/sculpt/modifiers)

Automation: Low

Articulation / Rigging (GUI)

Blender Armature / manual joint setup

Automation: Low

Convex Decomposition

CoACD, V-HACD

Automation: High

Simulation Testing

PyBullet, MuJoCo

Automation: High

Key Finding

What many people intuitively label as "simulation readiness" often overweights convex decomposition, but collision generation is already highly automated (CoACD, V-HACD). The true manual bottlenecks are mesh cleanup/repair and articulation rigging.

Limits of Existing Automation (Why GUI Work Remains)

URDFormer-style "image → URDF" systems

  • Depend heavily on bounding-box / detection quality
  • Often rely on predefined meshes that don't match real scenes
  • Don't reliably infer physical parameters (mass, friction)
  • Still need human GUI correction for key geometry / boundaries

Auto-retopology tools (Quadriflow, Instant Meshes)

  • Unstable on complex geometry
  • Still require manual cleanup around critical regions (e.g., joints)
  • Common caveat: "may still need to clean up areas manually"
Part 3

Where LLM Agents Fit (AgentHLE Lens)

What makes a good benchmark task for robotics asset preparation workflows.

AgentHLE Requirements

1. Must use real software

Task requires GUI/tool operation (not pure coding).

2. Raw Input → Raw Output

Clear file-level I/O definition.

3. Scorable / reviewable

Acceptance criteria can be checked automatically or semi-automatically.

4. Scalable data

Millions of examples ideally, or at least data that is easy to collect on an ongoing basis.

What is NOT a Good Fit (v1)

Pure scripting

Current LLMs already handle this well; not distinctive for AgentHLE.

Already-automated steps

E.g., convex decomposition via CoACD CLI.

Physical-world dependence

Tasks requiring real scanning hardware or lab setups.

Best-Fit Robotics Tasks (High Pain + GUI + Verifiable)

1. Mesh repair / 3D scan cleanup (holes, noise, non-manifold geometry)
2. Articulation rigging (joints/bones, motion limits, collision sanity)
3. (Optional / later) UV unwrap + texture baking (useful, but less central to sim physics)
Part 4

Example Tasks (Benchmark Definitions)

These tasks are defined in AgentHLE style: end-to-end execution on a computer with real tools, producing artifacts that a reviewer can verify.

Design Principles (Same Across Industries)

1. Must use software

Tasks require tools/software; pure reasoning is not acceptable.

2. Raw Input → Raw Output

Define only I/O and acceptance criteria.

3. Operational scoring

Prefer deterministic checks.

4. Scalable data

Use public datasets or synthetic corruption pipelines.

5. Avoid physical dependence

Keep the benchmark computer-completable.
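Principle 4's "synthetic corruption pipelines" can be sketched concretely: start from a known-good mesh and delete random faces, which opens holes and yields (damaged input, ground-truth output) pairs at scale. The helper below is a hypothetical illustration, not part of any named dataset tooling:

```python
import random

def make_repair_pair(faces, drop_ratio=0.1, seed=0):
    """Delete a random subset of faces from a clean mesh to open holes.

    Returns (damaged_faces, ground_truth_faces): one training/eval pair
    for the mesh-repair task. Seeding keeps the corruption reproducible.
    """
    rng = random.Random(seed)
    n_drop = max(1, int(len(faces) * drop_ratio))
    dropped = set(rng.sample(range(len(faces)), n_drop))
    damaged = [f for i, f in enumerate(faces) if i not in dropped]
    return damaged, list(faces)

# Tetrahedron: 4 faces in, 3 faces out after dropping one.
tet = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
damaged, truth = make_repair_pair(tet)
```

The same seeded-corruption pattern extends to noise injection and vertex perturbation, which is how a corpus like Objaverse's watertight subset could be turned into million-scale repair pairs.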

Core Tasks (2)

1. Core · Million-level data

3D Mesh Repair / Scan Cleanup

Every scanned object typically needs this step; manual work is commonly 2–8 hours per model.

2. Core · Thousand-level data (expandable)

Articulation Rigging

Manipulable objects (drawers, doors, tools) require correct joint definitions; existing automation is narrow and still needs human correction.
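The joint definitions this task asks for map directly onto URDF's `<joint>` element: a `revolute` type with `lower`/`upper` limits is the standard encoding for a hinged door or drawer. A minimal sketch (the element names and attributes are standard URDF; the helper itself is hypothetical, with placeholder effort/velocity limits):

```python
import xml.etree.ElementTree as ET

def revolute_joint(parent, child, lower, upper, axis=(0, 0, 1)):
    """Emit a URDF revolute joint with motion limits as an XML string.

    effort/velocity are placeholder values; a real asset would tune them.
    """
    j = ET.Element("joint", name=f"{parent}_to_{child}", type="revolute")
    ET.SubElement(j, "parent", link=parent)
    ET.SubElement(j, "child", link=child)
    ET.SubElement(j, "axis", xyz=" ".join(str(v) for v in axis))
    ET.SubElement(j, "limit", lower=str(lower), upper=str(upper),
                  effort="10", velocity="1")
    return ET.tostring(j, encoding="unicode")

# A cabinet door that swings 90 degrees about the z axis.
door = revolute_joint("cabinet_body", "cabinet_door", 0.0, 1.5708)
```

Emitting the XML is the easy, scriptable part; the bottleneck the task targets is deciding *where* the joint axis sits and what the limits should be for a given scanned mesh, which today happens in a GUI.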

Summary

Task                       | Input                    | Output                 | Software | Scale
Mesh repair / scan cleanup | Damaged .obj/.glb        | Repaired .obj/.glb     | Blender  | Millions
Articulation rigging       | Static .obj + joint text | Articulated .glb/.urdf | Blender  | Thousands+

Why these two: They require GUI operation (not solvable by a single CLI/script), are real production bottlenecks for simulation content, have scalable data strategies, and have clear verification hooks.

Part 5

Workflow Collection Guide

What we need from robotics contributors to build high-quality benchmarks.

Required Deliverables (Per Task)

Deliverable           | Description                               | Quantity
Sample input files    | Cover easy / medium / hard                | 5–10
Corresponding outputs | Ground-truth outputs                      | 5–10
Screen recordings     | Full workflow + "think-aloud" explanation | 2–3
Evaluation script     | Python script to check outputs            | 1
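The evaluation-script deliverable can stay fully deterministic. A minimal sketch of one acceptance check for the mesh-repair task (a hypothetical standalone checker, pure Python, assuming a triangulated OBJ): a repaired mesh is closed when every edge is shared by exactly two faces.

```python
from collections import Counter

def is_watertight_obj(obj_text):
    """Acceptance check: every edge is shared by exactly two faces."""
    counts = Counter()
    for line in obj_text.splitlines():
        if not line.startswith("f "):
            continue
        # OBJ faces may be "f 1 2 3" or "f 1/1/1 2/2/2 3/3/3".
        idx = [int(tok.split("/")[0]) for tok in line.split()[1:]]
        for u, v in zip(idx, idx[1:] + idx[:1]):
            counts[tuple(sorted((u, v)))] += 1
    return bool(counts) and all(n == 2 for n in counts.values())

# A closed tetrahedron passes; an open triangle would not.
tetra = """v 0 0 0
v 1 0 0
v 0 1 0
v 0 0 1
f 1 2 3
f 1 4 2
f 2 4 3
f 3 4 1
"""
```

A production script would add further checks (no degenerate faces, vertex-count sanity against the input, self-intersection via a mesh library), but each check should remain a yes/no test a reviewer can rerun.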

Screen Recording Guidelines

Use OBS or similar

Show the full process: open → operate → validate → export/save

Narrate intent and diagnosis

"There's a hole here; I'll select the boundary loop and fill it…"

Include mistakes and corrections

Valuable for training and evaluation

Data Sources (Public)

Mesh Repair

Objaverse (10M+ assets; watertight subset can be million-scale)

Articulation

PartNet-Mobility (2,000+ articulated objects; expandable)

Suggested Timeline

Phase 1 | 5 samples per task + evaluation script | 1 week
Phase 2 | Screen recordings + hard corner cases  | 1 week
Phase 3 | Large-scale pair generation pipeline   | 1 week

Professional Tools in Robotics

Blender · MuJoCo · PyBullet · ROS · CoACD / V-HACD

Contribute to Robotics

We seek high-level, representative contributions—not exhaustive documentation. Share your expertise in the task areas above.

Our Commitments to Contributors

  • Evaluation Only: All contributions are used exclusively for agent evaluation, never for model training.
  • Partner Review: Industry partners can review and approve task specifications before public release.
  • Data Control: Contributors can exclude sensitive or proprietary data from submissions.