Foundation Models Meet Embodied Agents
@ NeurIPS 2025 - EAI Challenge
Sun Dec 07 11:00 - 13:45 PST, 2025
Mezzanine Room 15AB, San Diego Convention Center
San Diego, California
This challenge invites participants to enhance Large Language Models (LLMs) for embodied reasoning through our standardized Embodied Agent Interface evaluation protocol. The framework systematically evaluates critical embodied reasoning capabilities: goal interpretation (understanding objectives and grounding to environment states), subgoal decomposition (breaking down complex goals), action sequencing (planning action sequences), and transition/world modeling (modeling world state changes through actions). Despite growing interest in using LLMs for robotics and agent planning, current evaluations lack standardization and fail to pinpoint fine-grained reasoning failures.
Our challenge takes a unified approach to task formulation, input/output structures, and evaluation metrics. It builds on the well-established BEHAVIOR and VirtualHome simulators, enhanced with detailed annotations including Linear Temporal Logic (LTL) goal specifications and comprehensive error analysis. Typical evaluations report only an overall success rate, which leaves it unclear which specific abilities LLMs struggle with. Our framework instead uses fine-grained metrics that examine both whether proposed actions are executable in the environment and whether they accomplish the intended goals. The evaluation system breaks each reasoning component into a separate module, giving a clear picture of exactly where and how these models succeed or fail.
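The distinction between executability and goal satisfaction can be illustrated with a toy example. The sketch below is not the challenge's evaluator: it assumes a simplified world where a state is a set of predicate strings and each action has preconditions and effects. The names `Action`, `evaluate_plan`, and the fridge/milk predicates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action: applicable when preconditions hold."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()


def evaluate_plan(initial_state, plan, goal):
    """Return (executable, goal_satisfied, first_failed_step).

    executable: every action's preconditions held when it was applied.
    goal_satisfied: the plan ran to completion and the final state
    contains every goal predicate.
    first_failed_step: index of the first non-executable action, else None.
    """
    state = set(initial_state)
    for step, action in enumerate(plan):
        if not action.preconditions <= state:
            return False, False, step
        state -= action.del_effects
        state |= action.add_effects
    return True, goal <= state, None


# Toy domain (assumed for illustration only).
open_fridge = Action(
    "open_fridge",
    preconditions=frozenset({"fridge_closed"}),
    add_effects=frozenset({"fridge_open"}),
    del_effects=frozenset({"fridge_closed"}),
)
grab_milk = Action(
    "grab_milk",
    preconditions=frozenset({"fridge_open"}),
    add_effects=frozenset({"holding_milk"}),
)

initial = {"fridge_closed"}
goal = {"holding_milk"}

full_plan = evaluate_plan(initial, [open_fridge, grab_milk], goal)
skips_open = evaluate_plan(initial, [grab_milk], goal)
stops_early = evaluate_plan(initial, [open_fridge], goal)
```

Here `full_plan` is executable and reaches the goal, `skips_open` fails at step 0 because the fridge is still closed, and `stops_early` is executable but never reaches the goal. A single overall success rate would lump the last two failures together, while separate metrics tell them apart.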
Baseline results from state-of-the-art LLMs reveal significant performance gaps and motivate further innovation. The competition aims to advance the understanding of how LLMs reason in embodied environments, promote the development of robust and interpretable AI agents, and foster collaboration between the language modeling and robotics communities.
More information on our official competition website
Stay updated with the latest news and updates from the EAI Challenge
The final evaluation phase officially starts at 12:00AM UTC-0! Please follow the instructions in the updated Participate page and Technical Report page to submit your final model outputs and technical report before the deadline on December 1, 2025 12:00AM UTC-0. Good luck to all participants!
We're thrilled to announce that the BEHAVIOR Challenge is joining forces with the Embodied Agent Interface Challenge at this year's NeurIPS Competition Track. Two challenges, one stage: bringing richer benchmarks, diverse tasks, and a united embodied AI community. Learn more about the BEHAVIOR Challenge here!
The EAI Challenge officially kicks off at 12:00 PM (CDT)! We are thrilled to welcome all participants and can't wait to see your innovative solutions. Check out our challenge on EvalAI and the Participate section for all the details and resources you need to get started. Good luck to everyone!
The beta test for our competition platform, EvalAI, is now underway! Get ready for the official launch and public registration in late July or early August. Please stay tuned for more updates!
We are thrilled to announce that our Embodied Agent Interface Challenge has been officially accepted by the NeurIPS 2025 Competition Track! Get ready for an amazing event!
We are so excited to have secured $4000 in support from AIX! This will greatly help in making the EAI Challenge a huge success.
| Date | Task Description |
|---|---|
| Apr - May 2025 | |
| Jun 2025 | |
| Late Jul - Early Aug 2025 | |
| Aug - Early Nov 2025 | |
| Early Nov - Dec 2025 | Final evaluation phase. Freeze public leaderboard for development phase. Open final submission phase using holdout set. |
| Dec 7th 2025 | Winning teams invited to present at the NeurIPS 2025 Competition Track in-person session. |
The Embodied Agent Interface Challenge at NeurIPS 2025 offers a range of exciting prizes and incentives to recognize outstanding contributions and foster inclusive participation across the community.
The top three teams will be invited to:
Finalist teams who submit reproducible results and technical reports will be offered:
All winners will receive certificates in addition to their cash awards.
We will also recognize notable contributions in the following categories:
Join our Slack workspace to communicate with other participants and stay updated with the latest announcements.