Under Review

Robotics

From real-world scans to simulation-ready assets—tool-driven 3D content preparation for robot training.

Part 1

Robotics Overview

Modern robotics increasingly teaches robots to manipulate objects by learning from demonstrations and by training policies in digital simulation.

The Core Pipeline: Real World → Digital Simulation → Robot Training

Robotics teams often follow a loop like:

Real objects / scenes → 3D capture (scan) → asset cleanup & articulation → simulation-ready assets → robot policy training & evaluation

Why Simulation is Essential

Simulation offers repeatability and scale that the real world cannot:

Safety & Cost

Crash the robot in sim and reset instantly; no repairs.

Parallelism

Train hundreds/thousands of environments in parallel.

Data Generation

Synthesize unlimited rollouts and corner cases.

Risky Behaviors

Test dangerous actions safely.

Key Bottleneck: 3D Assets are Still Labor-Intensive

Across recent robotics/simulation literature (e.g., Seed3D, URDFormer, Real2Render2Real), the recurring issue is the content bottleneck—high-quality, compliant assets are expensive to build and maintain.

"building simulation models is often still done by hand … with predefined assets to construct rich scenes" — URDFormer (RSS 2024)
"generating high-quality, compliant, and intersection-free assets for simulation remains labor-intensive" — Real2Render2Real (2025)
Part 2

Digitization Pipeline & Automation Maturity

Understanding where manual GUI work is still required versus what can be fully automated.

End-to-End Pipeline (Asset-Centric View)

Physical Object → 3D Scan

Capture real objects using mobile scanners, photogrammetry, or structured light.

Mesh Cleanup / Repair

Fix holes, noise, and non-manifold geometry. Manual work in Blender.

Articulation / Rigging

Define joints, bones, and motion limits. Manual setup in Blender.

Collision (Convex Decomposition)

Generate collision meshes. Highly automated via CoACD, V-HACD.

Simulation Testing

Test in PyBullet, MuJoCo, or other simulators. Scriptable.

Legend: Manual GUI (bottleneck) · Highly automated
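The cleanup step above hinges on first detecting the defects. An edge that belongs to exactly one face bounds a hole, and an edge shared by more than two faces is non-manifold; counting face incidences per edge catches both. A minimal pure-Python sketch (the `edge_report` helper is hypothetical, assuming triangle faces given as vertex-index triples):

```python
from collections import Counter

def edge_report(faces):
    """Classify edges of a triangle mesh.

    An edge shared by exactly two faces is interior; by one face, it
    bounds a hole; by more than two, it is non-manifold.
    """
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[tuple(sorted((u, v)))] += 1
    return {
        "boundary": [e for e, n in counts.items() if n == 1],
        "non_manifold": [e for e, n in counts.items() if n > 2],
    }

# A lone triangle: all three edges are boundary edges (one open hole).
report = edge_report([(0, 1, 2)])
```

Real repair tools (Blender's 3D-Print toolbox, trimesh) run equivalent checks; the point is that defect *detection* is scriptable even where the *fix* still needs GUI work.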

Automation Maturity by Step

3D Scanning

Mobile scanner apps, RealityScan

Automation: High

Mesh Cleanup / Repair (GUI)

Blender (manual edit/sculpt/modifiers)

Automation: Low

Articulation / Rigging (GUI)

Blender Armature / manual joint setup

Automation: Low

Convex Decomposition

CoACD, V-HACD

Automation: High

Simulation Testing

PyBullet, MuJoCo

Automation: High

Key Finding

What many people intuitively label as "simulation readiness" often overweights convex decomposition, but collision generation is already highly automated (CoACD, V-HACD). The true manual bottlenecks are mesh cleanup/repair and articulation rigging.

Limits of Existing Automation (Why GUI Work Remains)

URDFormer-style "image → URDF" systems

  • Depend heavily on bounding-box / detection quality
  • Often rely on predefined meshes that don't match real scenes
  • Don't reliably infer physical parameters (mass, friction)
  • Still need human GUI correction for key geometry / boundaries

Auto-retopology tools (Quadriflow, Instant Meshes)

  • Unstable on complex geometry
  • Still require manual cleanup around critical regions (e.g., joints)
  • Common caveat: "may still need to clean up areas manually"
Part 3

Where LLM Agents Fit (AgentHLE Lens)

What makes a good benchmark task for robotics asset preparation workflows.

AgentHLE Requirements

1. Must use real software

Task requires GUI/tool operation (not pure coding).

2. Raw Input → Raw Output

Clear file-level I/O definition.

3. Scorable / reviewable

Acceptance criteria can be checked automatically or semi-automatically.

4. Scalable data

Millions of examples ideally, or at least data that is easy to collect on an ongoing basis.

What is NOT a Good Fit (v1)

Pure scripting

Current LLMs already handle this well; not distinctive for AgentHLE.

Already-automated steps

E.g., convex decomposition via CoACD CLI.

Physical-world dependence

Tasks requiring real scanning hardware or lab setups.

Best-Fit Robotics Tasks (High Pain + GUI + Verifiable)

1. Mesh repair / 3D scan cleanup (holes, noise, non-manifold geometry)
2. Articulation rigging (joints/bones, motion limits, collision sanity)
3. (Optional / later) UV unwrap + texture baking (useful, but less central to sim physics)
Part 4

Example Tasks (Benchmark Definitions)

These tasks are defined in AgentHLE style: end-to-end execution on a computer with real tools, producing artifacts that a reviewer can verify.

Design Principles (Same Across Industries)

1. Must use software

Tasks require tools/software; pure reasoning is not acceptable.

2. Raw Input → Raw Output

Define only I/O and acceptance criteria.

3. Operational scoring

Prefer deterministic checks.

4. Scalable data

Use public datasets or synthetic corruption pipelines.

5. Avoid physical dependence

Keep the benchmark computer-completable.
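Principle 4's "synthetic corruption pipelines" can be sketched concretely: start from a known-good mesh and delete random faces, which opens holes and yields (damaged input, ground-truth output) pairs at scale. The helper below is a hypothetical illustration, not part of any named dataset tooling:

```python
import random

def make_repair_pair(faces, drop_ratio=0.1, seed=0):
    """Delete a random subset of faces from a clean mesh to open holes.

    Returns (damaged_faces, ground_truth_faces): one training/eval pair
    for the mesh-repair task. Seeding keeps the corruption reproducible.
    """
    rng = random.Random(seed)
    n_drop = max(1, int(len(faces) * drop_ratio))
    dropped = set(rng.sample(range(len(faces)), n_drop))
    damaged = [f for i, f in enumerate(faces) if i not in dropped]
    return damaged, list(faces)

# Tetrahedron: 4 faces in, 3 faces out after dropping one.
tet = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
damaged, truth = make_repair_pair(tet)
```

The same seeded-corruption pattern extends to noise injection and vertex perturbation, which is how a corpus like Objaverse's watertight subset could be turned into million-scale repair pairs.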

Core Tasks (2)

1. Core · Million-level data

3D Mesh Repair / Scan Cleanup

Every scanned object typically needs this step; manual work is commonly 2–8 hours per model.

2. Core · Thousand-level data (expandable)

Articulation Rigging

Manipulable objects (drawers, doors, tools) require correct joint definitions; existing automation is narrow and still needs human correction.
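The joint definitions this task asks for map directly onto URDF's `<joint>` element: a `revolute` type with `lower`/`upper` limits is the standard encoding for a hinged door or drawer. A minimal sketch (the element names and attributes are standard URDF; the helper itself is hypothetical, with placeholder effort/velocity limits):

```python
import xml.etree.ElementTree as ET

def revolute_joint(parent, child, lower, upper, axis=(0, 0, 1)):
    """Emit a URDF revolute joint with motion limits as an XML string.

    effort/velocity are placeholder values; a real asset would tune them.
    """
    j = ET.Element("joint", name=f"{parent}_to_{child}", type="revolute")
    ET.SubElement(j, "parent", link=parent)
    ET.SubElement(j, "child", link=child)
    ET.SubElement(j, "axis", xyz=" ".join(str(v) for v in axis))
    ET.SubElement(j, "limit", lower=str(lower), upper=str(upper),
                  effort="10", velocity="1")
    return ET.tostring(j, encoding="unicode")

# A cabinet door that swings 90 degrees about the z axis.
door = revolute_joint("cabinet_body", "cabinet_door", 0.0, 1.5708)
```

Emitting the XML is the easy, scriptable part; the bottleneck the task targets is deciding *where* the joint axis sits and what the limits should be for a given scanned mesh, which today happens in a GUI.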

Summary

Task                       | Input                    | Output                 | Software | Scale
Mesh repair / scan cleanup | Damaged .obj/.glb        | Repaired .obj/.glb     | Blender  | Millions
Articulation rigging       | Static .obj + joint text | Articulated .glb/.urdf | Blender  | Thousands+

Why these two: They require GUI operation (not solvable by a single CLI/script), are real production bottlenecks for simulation content, have scalable data strategies, and have clear verification hooks.

Part 5

Workflow Collection Guide

What we need from robotics contributors to build high-quality benchmarks.

Required Deliverables (Per Task)

Deliverable           | Description                               | Quantity
Sample input files    | Cover easy / medium / hard                | 5–10
Corresponding outputs | Ground-truth outputs                      | 5–10
Screen recordings     | Full workflow + "think-aloud" explanation | 2–3
Evaluation script     | Python script to check outputs            | 1
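The evaluation-script deliverable can stay fully deterministic. A minimal sketch of one acceptance check for the mesh-repair task (a hypothetical standalone checker, pure Python, assuming a triangulated OBJ): a repaired mesh is closed when every edge is shared by exactly two faces.

```python
from collections import Counter

def is_watertight_obj(obj_text):
    """Acceptance check: every edge is shared by exactly two faces."""
    counts = Counter()
    for line in obj_text.splitlines():
        if not line.startswith("f "):
            continue
        # OBJ faces may be "f 1 2 3" or "f 1/1/1 2/2/2 3/3/3".
        idx = [int(tok.split("/")[0]) for tok in line.split()[1:]]
        for u, v in zip(idx, idx[1:] + idx[:1]):
            counts[tuple(sorted((u, v)))] += 1
    return bool(counts) and all(n == 2 for n in counts.values())

# A closed tetrahedron passes; an open triangle would not.
tetra = """v 0 0 0
v 1 0 0
v 0 1 0
v 0 0 1
f 1 2 3
f 1 4 2
f 2 4 3
f 3 4 1
"""
```

A production script would add further checks (no degenerate faces, vertex-count sanity against the input, self-intersection via a mesh library), but each check should remain a yes/no test a reviewer can rerun.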

Screen Recording Guidelines

Use OBS or similar

Show the full process: open → operate → validate → export/save

Narrate intent and diagnosis

"There's a hole here; I'll select the boundary loop and fill it…"

Include mistakes and corrections

Valuable for training and evaluation

Data Sources (Public)

Mesh Repair

Objaverse (10M+ assets; watertight subset can be million-scale)

Articulation

PartNet-Mobility (2,000+ articulated objects; expandable)

Suggested Timeline

Phase 1 | 5 samples per task + evaluation script | 1 week
Phase 2 | Screen recordings + hard corner cases  | 1 week
Phase 3 | Large-scale pair generation pipeline   | 1 week

Professional Tools in Robotics

Blender · MuJoCo · PyBullet · ROS · CoACD / V-HACD

Contribute to Robotics

We seek high-level, representative contributions—not exhaustive documentation. Share your expertise in the task areas above.

Our Commitments to Contributors

  • Evaluation Only: All contributions are used exclusively for agent evaluation, never for model training.
  • Partner Review: Industry partners can review and approve task specifications before public release.
  • Data Control: Contributors can exclude sensitive or proprietary data from submissions.