Foundation Models Meet Embodied Agents
@ NeurIPS 2025 - EAI Challenge
December 6-7, 2025
San Diego Convention Center, San Diego, California
This challenge invites participants to enhance Large Language Models (LLMs) for embodied reasoning through our standardized Embodied Agent Interface evaluation protocol. The framework systematically evaluates critical embodied reasoning capabilities: goal interpretation (understanding objectives and grounding to environment states), subgoal decomposition (breaking down complex goals), action sequencing (planning action sequences), and transition/world modeling (modeling world state changes through actions). Despite growing interest in using LLMs for robotics and agent planning, current evaluations lack standardization and fail to pinpoint fine-grained reasoning failures.
Our challenge unifies task formulation, input/output structures, and evaluation metrics by building on the well-established BEHAVIOR and VirtualHome simulators, enhanced with detailed annotations including Linear Temporal Logic (LTL) goal specifications and comprehensive error analysis. Typical evaluations report only an overall success rate, which leaves it unclear which specific abilities LLMs struggle with. In contrast, our framework uses fine-grained metrics that examine both whether the proposed actions can actually be executed and whether they accomplish the intended goals. Our evaluation system breaks each reasoning component into a separate module, giving a clear picture of exactly where and how these models succeed or fail.
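To make the distinction between "can the plan be executed" and "does the plan reach the goal" concrete, here is a minimal Python sketch. It is not the challenge's actual evaluator or API. The `Action` class, the `evaluate_plan` function, and the fridge example are all hypothetical, and the world is assumed to be a simple set of true facts with precondition/effect actions rather than a BEHAVIOR or VirtualHome simulator state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action (illustration only, not the challenge API)."""
    name: str
    preconditions: frozenset              # facts that must hold before the action
    add_effects: frozenset                # facts the action makes true
    del_effects: frozenset = frozenset()  # facts the action makes false


def evaluate_plan(initial_state, plan, goal):
    """Return (executable, goal_satisfied, failed_step) for a plan.

    executable     -- every action's preconditions held when it was applied
    goal_satisfied -- all goal facts hold in the final state
    failed_step    -- index of the first non-executable action, else None
    """
    state = set(initial_state)
    for step, action in enumerate(plan):
        if not action.preconditions <= state:
            # The plan breaks here; report the step instead of a bare failure.
            return False, False, step
        state -= action.del_effects
        state |= action.add_effects
    return True, set(goal) <= state, None


# Toy example (assumed facts and actions).
open_fridge = Action(
    "open_fridge",
    preconditions=frozenset({"fridge_closed"}),
    add_effects=frozenset({"fridge_open"}),
    del_effects=frozenset({"fridge_closed"}),
)
grab_milk = Action(
    "grab_milk",
    preconditions=frozenset({"fridge_open", "milk_in_fridge"}),
    add_effects=frozenset({"holding_milk"}),
    del_effects=frozenset({"milk_in_fridge"}),
)

initial = {"fridge_closed", "milk_in_fridge"}
goal = {"holding_milk"}

complete = evaluate_plan(initial, [open_fridge, grab_milk], goal)
not_executable = evaluate_plan(initial, [grab_milk], goal)
incomplete = evaluate_plan(initial, [open_fridge], goal)
```

In this sketch a single success rate would score the second and third plans identically as failures, whereas the separate flags show that one plan cannot be executed at its first step and the other runs fine but stops short of the goal.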
Baseline results from state-of-the-art LLMs reveal significant performance gaps and motivate further innovation. The competition aims to advance the understanding of how LLMs reason in embodied environments, promote the development of robust and interpretable AI agents, and foster collaboration between the language modeling and robotics communities.
More information on our official competition website
Stay up to date with the latest news from the EAI Challenge
We're thrilled to announce that the BEHAVIOR Challenge is joining forces with the Embodied Agent Interface Challenge at this year's NeurIPS Competition Track. Two challenges, one stage: bringing richer benchmarks, diverse tasks, and a united embodied AI community. Learn more about the BEHAVIOR Challenge here!
The EAI Challenge officially kicks off at 12:00 PM (CDT)! We are thrilled to welcome all participants and can't wait to see your innovative solutions. Check out our challenge on EvalAI and the Participate section for all the details and resources you need to get started. Good luck to everyone!
The beta test for our competition platform, EvalAI, is now underway! Get ready for the official launch and public registration in late July or early August. Please stay tuned for more updates!
We are thrilled to announce that our Embodied Agent Interface Challenge has been officially accepted by the NeurIPS 2025 Competition Track! Get ready for an amazing event!
A huge thank you to Adobe Research for their generous $1,000 in support of our challenge! This contribution is invaluable to our community.
We are so excited to have secured $4,000 in support from AIX! This will greatly help in making the EAI Challenge a huge success.
Date | Task Description |
---|---|
Apr - May 2025 | |
Jun 2025 | |
Late Jul - Early Aug 2025 | |
Aug - Early Nov 2025 | Development phase. Participants work on the tasks, submit results to the leaderboard, and iterate on their models. |
Mid Nov 2025 | Evaluation phase. Freeze public leaderboard for development phase. Open final submission phase using hidden held-out set. |
Late Nov 2025 | Organizers rerun top submissions to verify results. Aggregate final scores and finalize leaderboard rankings. |
Dec 2025 | Winning teams announced at the NeurIPS 2025 Competition Track in-person session. |
The Embodied Agent Interface Challenge at NeurIPS 2025 offers a range of exciting prizes and incentives to recognize outstanding contributions and foster inclusive participation across the community.
The top three teams will be invited to:
Finalist teams who submit reproducible results and technical reports will be offered:
All winners will receive certificates in addition to their cash awards.
We will also recognize notable contributions in the following categories:
Join our Slack workspace to communicate with other participants and stay updated with the latest announcements.