Foundation Models Meet Embodied Agents

2025 BEHAVIOR Challenge

December 6-7, 2025

San Diego Convention Center, San Diego, California


2025 BEHAVIOR Challenge

Robots in the BEHAVIOR simulator perform everyday activities (like preparing food) in virtual home environments. BEHAVIOR (Benchmark for Everyday Household Activities in Virtual, Interactive, and Realistic environments) is a large-scale embodied AI benchmark with 1,000 defined household tasks grounded in real human needs. These tasks introduce long-horizon mobile manipulation challenges in realistic settings, bridging the gap between current research and real-world, human-centric applications.

Even state-of-the-art robot learning models still struggle with the complexity and extended duration of BEHAVIOR's activities, which is why we are thrilled to announce the 1st BEHAVIOR Challenge at NeurIPS 2025. This competition invites the community to tackle 50 full-length tasks in a realistic simulator, pushing the frontiers of both high-level planning and low-level control in house-scale environments.

Participants will need to make progress on hierarchical planning, robust perception under realistic visual conditions, and reliable manipulation across long-horizon episodes. By focusing on full-length, human-scale household tasks, the challenge aims to surface the practical limitations of current methods and drive advances that matter for real-world robot deployments.

🔗 More information is available on the official 2025 BEHAVIOR Challenge website.

🧩 Challenge Components

📋 Task Definitions

The benchmark includes 1,000 everyday household activities covering diverse behaviors across:

  • Rearrangement - organizing and placing objects
  • Cleaning/Wiping - maintaining cleanliness
  • Cooking/Freezing - food preparation and storage
  • Painting/Spraying - surface treatment tasks
  • Hanging/Installing - mounting and assembly
  • Slicing/Dicing - precise cutting operations
  • Baking - complex cooking procedures
  • Doing Laundry - textile care activities

๐Ÿ  Interactive Environments

  • 50 fully interactive scenes with house-scale layouts
  • 10,000+ richly annotated objects

🎮 OmniGibson Simulator

The simulation environment supports:

  • Rigid body physics - realistic object interactions
  • Deformable objects (cloth, fabric) - soft body dynamics
  • Fluid interactions (water, oils) - liquid simulation
  • Object semantic states (e.g., open, filled, on-top, inside, etc.) - rich state representation
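As an illustration, the semantic states above can be thought of as boolean predicates over objects. The sketch below is a simplified, hypothetical model — `SimObject` and `is_inside` are illustrative names, not OmniGibson's actual API:

```python
# Hypothetical sketch of object semantic states as boolean predicates.
# `SimObject` and `is_inside` are illustrative names, not OmniGibson's API.

class SimObject:
    def __init__(self, name, is_open=False, filled=False):
        self.name = name
        self.is_open = is_open   # unary state: open(obj)
        self.filled = filled     # unary state: filled(obj)
        self.parent = None       # container this object sits inside, if any

def is_inside(obj, container):
    """Relational state: inside(obj, container)."""
    return obj.parent is container

fridge = SimObject("fridge", is_open=True)
milk = SimObject("milk", filled=True)
milk.parent = fridge

print(fridge.is_open)           # True
print(is_inside(milk, fridge))  # True
```

Unary states (open, filled) attach to a single object, while relational states (inside, on-top) relate pairs of objects; both kinds appear in BEHAVIOR task goals.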

โš–๏ธ Data and Baselines

📚 Dataset

The benchmark includes 10,000 human-demonstrated trajectories with diverse behaviors across all task categories. Each demonstration contains:

  • Synchronized RGBD observations - multi-modal visual data
  • Object and part-level segmentation masks - precise object identification
  • Ground-truth object states - semantic state annotations
  • Robot proprioception - internal sensor data
  • Robot actions - complete action sequences
  • Skill and subtask annotations - hierarchical task decomposition
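To make the list above concrete, one timestep of a demonstration could be represented roughly as follows. This is a hypothetical sketch — the field names are illustrative, and the released dataset defines its own schema:

```python
# Hypothetical sketch of one timestep in a demonstration trajectory.
# Field names are illustrative; the released dataset defines its own schema.
from dataclasses import dataclass

@dataclass
class DemoStep:
    rgb: list            # per-camera RGB frames
    depth: list          # aligned depth maps
    seg_masks: dict      # object/part id -> segmentation mask
    object_states: dict  # ground-truth semantic states, e.g. {"fridge": {"open": True}}
    proprio: list        # robot proprioception (joint positions, velocities, etc.)
    action: list         # commanded robot action for this step
    skill: str           # skill/subtask annotation

step = DemoStep(
    rgb=[], depth=[], seg_masks={},
    object_states={"fridge": {"open": True}},
    proprio=[0.0] * 7, action=[0.0] * 7,
    skill="open_fridge",
)
print(step.skill)  # open_fridge
```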

🤖 Available Baseline Methods

Participants have access to training and evaluation pipelines for these baseline methods:

  • ACT - Action Chunking Transformer
  • Diffusion Policy - Diffusion-based control
  • BC-RNN - Behavioral cloning with RNNs
  • WB-VIMA - Multimodal imitation learning
  • OpenVLA - Vision-language-action models
  • π0 - Foundation policy models

📊 Evaluation

📈 Metrics

Agents are evaluated across three areas:

  • Task completion rate (primary metric): Fraction of satisfied predicates in the goal condition of the BDDL (BEHAVIOR Domain Definition Language) task definition
  • Agent efficiency: Total distance traveled and energy expended during task execution
  • Data efficiency: Total number of frames from demonstrations (IL) or simulator (RL) used during training
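A minimal sketch of the primary metric, assuming goal conditions are flattened into a set of ground predicates (real BDDL goals are richer, with quantifiers and disjunctions; the predicate tuples below are made up for illustration):

```python
# Hypothetical sketch: task completion rate as the fraction of satisfied
# goal predicates. The tuple representation is illustrative; actual BDDL
# goal conditions also support quantifiers and disjunctions.

def completion_rate(goal_predicates, satisfied):
    """Fraction of goal predicates that hold in the final state."""
    if not goal_predicates:
        return 1.0
    return sum(1 for p in goal_predicates if p in satisfied) / len(goal_predicates)

goal = [
    ("inside", "milk", "fridge"),
    ("not-open", "fridge"),
    ("on-top", "plate", "table"),
]
final_state = {("inside", "milk", "fridge"), ("on-top", "plate", "table")}
print(completion_rate(goal, final_state))  # 2 of 3 predicates satisfied
```

Partial credit of this kind rewards agents that make meaningful progress on a long-horizon task even when they do not finish it.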

📋 Reporting

  • Results are reported with 95% confidence intervals - quantifying uncertainty in the estimates
  • Primary ranking based on task completion rate - main performance indicator
  • All metrics displayed on the leaderboard - comprehensive performance tracking
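For intuition, a 95% confidence interval for a mean score can be computed with the standard normal approximation, as sketched below; the challenge's exact reporting procedure may differ, and the sample rates here are made up:

```python
# Hypothetical sketch: 95% confidence interval for a mean metric via the
# normal approximation. The challenge's exact procedure may differ.
import math

def mean_ci95(samples):
    """Return (mean, lower, upper) of a 95% CI under the normal approximation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    half = 1.96 * math.sqrt(var / n)                       # half-width of the CI
    return mean, mean - half, mean + half

rates = [0.6, 0.8, 0.7, 0.9, 0.5]  # per-episode completion rates (made up)
mean, lo, hi = mean_ci95(rates)
print(round(mean, 2))  # 0.7
```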

Contact

Please email behavior-contact@googlegroups.com if you have any questions.