Foundation Models Meet Embodied Agents
@ NeurIPS 2025 - EAI Challenge
December 6-7, 2025
San Diego Convention Center, San Diego, California
This challenge invites participants to enhance Large Language Models (LLMs) for embodied reasoning through our standardized Embodied Agent Interface evaluation protocol. The framework systematically evaluates four critical embodied reasoning capabilities: goal interpretation (understanding objectives and grounding them in environment states), subgoal decomposition (breaking complex goals into intermediate subgoals), action sequencing (planning executable action sequences), and transition/world modeling (predicting how actions change the world state). Despite growing interest in using LLMs for robotics and agent planning, current evaluations lack standardization and fail to pinpoint fine-grained reasoning failures.
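To make the separation between these four abilities concrete, the sketch below shows one plausible way to organize them as independent evaluation modules. This is a minimal Python sketch under assumed names (`EmbodiedReasoningModule`, `evaluate`, `env_state`); it is not the challenge's actual starter-kit API.

```python
from abc import ABC, abstractmethod
from typing import Dict

class EmbodiedReasoningModule(ABC):
    """One of the four ability modules, evaluated in isolation."""

    @abstractmethod
    def evaluate(self, llm_output: str, env_state: Dict) -> Dict[str, float]:
        """Parse the LLM output for this task and score it against the
        simulated environment state, returning per-metric scores."""

# Each capability is scored by its own module, so a failure can be traced
# to a specific reasoning step instead of an end-to-end success flag.
class GoalInterpretation(EmbodiedReasoningModule): ...
class SubgoalDecomposition(EmbodiedReasoningModule): ...
class ActionSequencing(EmbodiedReasoningModule): ...
class TransitionModeling(EmbodiedReasoningModule): ...
```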
Our challenge provides a unified evaluation framework that standardizes task formulation, input/output structures, and evaluation metrics, built on the well-established BEHAVIOR and VirtualHome simulators and enhanced with detailed annotations, including Linear Temporal Logic (LTL) goal specifications and comprehensive error analysis. Unlike typical evaluations that report only an overall success rate, leaving it unclear which specific abilities LLMs struggle with, our framework uses fine-grained metrics that examine both whether proposed actions are executable in practice and whether they actually accomplish the intended goals. The evaluation system breaks each reasoning component into a separate module, giving a clear picture of exactly where and how these models succeed or fail.
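As a rough illustration of what the fine-grained metrics measure, the sketch below separates an executability score (do each action's preconditions hold when it is applied?) from a goal-satisfaction score (do the goal conditions hold in the final state?). The proposition-set state representation and the function names are illustrative assumptions, not the challenge's actual metric implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical flat representation: a state is a set of grounded propositions
# such as ("inside", "apple_1", "fridge_1"); each action has preconditions,
# add effects, and delete effects expressed the same way.
Prop = Tuple[str, ...]
State = Set[Prop]
ActionSpec = Dict[str, Tuple[State, State, State]]   # name -> (pre, add, delete)

def simulate(plan: List[str], init: State, specs: ActionSpec) -> Tuple[State, int]:
    """Apply actions in order, stopping at the first unsatisfied precondition."""
    state, executed = set(init), 0
    for name in plan:
        pre, add, delete = specs[name]
        if not pre <= state:          # a precondition does not hold
            break
        state = (state - delete) | add
        executed += 1
    return state, executed

def executability(plan: List[str], init: State, specs: ActionSpec) -> float:
    """Fraction of the plan that executes before the first failure."""
    _, executed = simulate(plan, init, specs)
    return executed / len(plan) if plan else 0.0

def goal_satisfaction(plan: List[str], init: State, specs: ActionSpec,
                      goal: State) -> float:
    """Fraction of goal propositions that hold in the final reached state."""
    final_state, _ = simulate(plan, init, specs)
    return len(goal & final_state) / len(goal) if goal else 1.0
```

Scoring these two quantities separately distinguishes, for example, a plan that runs to completion but never places the object where the goal requires it (high executability, low goal satisfaction) from one that fails on its first precondition, which is exactly the distinction an overall success rate hides.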
Baseline results from state-of-the-art LLMs reveal significant performance gaps and motivate further innovation. The competition aims to advance the understanding of how LLMs reason in embodied environments, promote the development of robust and interpretable AI agents, and foster collaboration between the language modeling and robotics communities.
More information on our official competition website
June 30, 2025 - The beta test for our competition platform, EvalAI, is now underway! Get ready for the official launch and public registration in late July or early August. Please stay tuned for more updates!
May 25, 2025 - We are thrilled to announce that our Embodied Agent Interface Challenge has been officially accepted by the NeurIPS 2025 Competition Track! Get ready for an amazing event!
May 16, 2025 - A huge thank you to Adobe Research for their generous $1000 in support of our challenge! This contribution is invaluable to our community.
May 1, 2025 - We are so excited to have secured $4000 in support from AIX! This will greatly help in making the EAI Challenge a huge success.
Date | Task Description |
---|---|
Apr - May 2025 | Competition proposal accepted into the NeurIPS 2025 Competition Track. |
Jun 2025 | Beta test of the EvalAI competition platform. |
Late Jul - Early Aug 2025 | Official launch of the competition: open public registration on EvalAI and release the training datasets, starter kit, documentation, and baselines. |
Aug - Early Nov 2025 | Development phase: participants work on the tasks, submit results to the public leaderboard, and iterate on their models. |
Mid Nov 2025 | Evaluation phase: freeze the public development-phase leaderboard and open final submissions on a hidden held-out set. |
Late Nov 2025 | Organizers rerun top submissions to verify results, aggregate final scores, and finalize the leaderboard rankings. |
Dec 2025 | Submit the competition report and winning-team write-ups for inclusion in the NeurIPS 2025 proceedings. |
The Embodied Agent Interface Challenge at NeurIPS 2025 offers a range of exciting prizes and incentives to recognize outstanding contributions and foster inclusive participation across the community.
The top three teams will be invited to:
Finalist teams who submit reproducible results and technical reports will be offered:
All winners will receive certificates in addition to their cash awards.
We will also recognize notable contributions in the following categories: