Foundation Models Meet Embodied Agents

@ NeurIPS 2025 - EAI Challenge

December 6-7, 2025

San Diego Convention Center, San Diego, California

Embodied Agent Interface Challenge @ NeurIPS 2025

This challenge invites participants to enhance Large Language Models (LLMs) for embodied reasoning through our standardized Embodied Agent Interface evaluation protocol. The framework systematically evaluates critical embodied reasoning capabilities: goal interpretation (understanding objectives and grounding to environment states), subgoal decomposition (breaking down complex goals), action sequencing (planning action sequences), and transition/world modeling (modeling world state changes through actions). Despite growing interest in using LLMs for robotics and agent planning, current evaluations lack standardization and fail to pinpoint fine-grained reasoning failures.

Our challenge unifies task formulation, input/output structures, and evaluation metrics by building on the well-established BEHAVIOR and VirtualHome simulators, enhanced with detailed annotations including Linear Temporal Logic (LTL) goal specifications and comprehensive error analysis. Typical evaluations report only an overall success rate, which leaves it unclear which specific abilities LLMs struggle with. Our framework instead uses fine-grained metrics that examine both whether the proposed actions could actually be executed and whether they accomplish the intended goals. The evaluation system separates each reasoning component into its own module, giving a clear picture of exactly where and how these models succeed or fail.
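To make the distinction between "can the actions be executed" and "do they reach the goal" concrete, here is a minimal Python sketch. It is not the challenge's evaluation code or API: the `Action` class, the proposition names, and the simplified ordered-subgoal check (a loose stand-in for an LTL "eventually" goal) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action (not the challenge's real schema)."""
    name: str
    pre: frozenset      # propositions that must hold before the action
    add: frozenset      # propositions made true by the action
    delete: frozenset   # propositions made false by the action


def rollout(state, plan):
    """Apply actions in order.

    Returns the visited states and the index of the first action whose
    preconditions fail (None if every action could be applied).
    """
    trajectory = [state]
    for i, action in enumerate(plan):
        if not action.pre <= state:  # some precondition is missing
            return trajectory, i
        state = (state - action.delete) | action.add
        trajectory.append(state)
    return trajectory, None


def satisfies_in_order(trajectory, subgoals):
    """Toy temporal goal: each subgoal holds at some visited state, in order."""
    idx = 0
    for state in trajectory:
        while idx < len(subgoals) and subgoals[idx] <= state:
            idx += 1
    return idx == len(subgoals)


def evaluate(initial, plan, subgoals):
    """Report executability and goal satisfaction as separate signals."""
    trajectory, failed_at = rollout(initial, plan)
    return {
        "executable": failed_at is None,
        "failed_step": failed_at,
        # Checked on the states reached before any failure.
        "goal_satisfied": satisfies_in_order(trajectory, subgoals),
    }


# Illustrative task with made-up propositions: turn on a light.
walk = Action("walk_to_switch", frozenset(), frozenset({"agent_at_switch"}), frozenset())
switch_on = Action("switch_on", frozenset({"agent_at_switch"}), frozenset({"light_on"}), frozenset())

goal = [frozenset({"light_on"})]
good = evaluate(frozenset(), [walk, switch_on], goal)
bad = evaluate(frozenset(), [switch_on], goal)  # skips walking to the switch
print(good)
print(bad)
```

Reporting `executable`, `failed_step`, and `goal_satisfied` separately, rather than one success flag, is the kind of breakdown that lets a failure be traced to a specific step or ability.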

Baseline results from state-of-the-art LLMs reveal significant performance gaps and motivate further innovation. The competition aims to advance the understanding of how LLMs reason in embodied environments, promote the development of robust and interpretable AI agents, and foster collaboration between the language modeling and robotics communities.

🔗 More information on our official competition website

📢 Latest Announcements

Stay up to date with the latest news from the EAI Challenge

September 1 2025

🤝 EAI x BEHAVIOR: Co-Hosted at NeurIPS!

We're thrilled to announce that the BEHAVIOR Challenge is joining forces with the Embodied Agent Interface Challenge at this year's NeurIPS Competition Track. Two challenges, one stage — bringing richer benchmarks, diverse tasks, and a united embodied AI community. Learn more about the BEHAVIOR Challenge here!

August 15 2025

🚀 EAI Challenge Launch!

The EAI Challenge officially kicks off at 12:00 PM (CDT)! We are thrilled to welcome all participants and can't wait to see your innovative solutions. Check out our challenge on EvalAI and the Participate section for all the details and resources you need to get started. Good luck to everyone!

June 30 2025

🧪 Beta Testing Phase

The beta test for our competition platform, EvalAI, is now underway! Get ready for the official launch and public registration in late July or early August. Please stay tuned for more updates!

May 25 2025

🎉 Official Acceptance!

We are thrilled to announce that our Embodied Agent Interface Challenge has been officially accepted by the NeurIPS 2025 Competition Track! Get ready for an amazing event!

May 16 2025

🙏 Adobe Research Support

A huge thank you to Adobe Research for their generous $1,000 in support of our challenge! This contribution is invaluable to our community.

May 1 2025

💰 AIX Sponsorship

We are so excited to have secured $4,000 in support from AIX! This will greatly help in making the EAI Challenge a huge success.

Competition Timeline

Date	Task Description
Apr - May 2025	~~Finalize EvalAI settings and data annotations. Conduct internal validation of baseline metrics and evaluation scripts.~~
Jun 2025	~~Launch beta test of the competition platform EvalAI, including starter kit, public leaderboard, and evaluation API.~~
Late Jul - Early Aug 2025	~~Official launch of the competition: open public registration on EvalAI and release training datasets, starter kit, documentation, and baselines.~~
Aug - Early Nov 2025	Development phase. Participants work on the tasks, submit results to the leaderboard, and iterate on their models.
Mid Nov 2025	Evaluation phase. Freeze public leaderboard for development phase. Open final submission phase using hidden held-out set.
Late Nov 2025	Organizers rerun top submissions to verify results. Aggregate final scores and finalize leaderboard rankings.
Dec 2025	Winning teams announced at the NeurIPS 2025 Competition Track in-person session.

🏆 Prizes & Incentives

The Embodied Agent Interface Challenge at NeurIPS 2025 offers a range of exciting prizes and incentives to recognize outstanding contributions and foster inclusive participation across the community.

🥇 Podium Recognition at NeurIPS 2025

The top three teams will be invited to:

Present their work in a dedicated podium session at the NeurIPS 2025 Competition Track.
Showcase their models and results to a global audience of researchers, practitioners, and industry leaders.

📝 Co-Authorship Opportunities

Finalist teams who submit reproducible results and technical reports will be offered:

Named authorship on the official competition summary paper submitted to the NeurIPS proceedings.
Opportunities to contribute to documentation, analysis, and best practices shared with the research community.