Foundation Models Meet Embodied Agents

@ NeurIPS 2025 - EAI Challenge

December 6-7, 2025

San Diego Convention Center, San Diego, California


Embodied Agent Interface Challenge @ NeurIPS 2025

This challenge invites participants to enhance Large Language Models (LLMs) for embodied reasoning through our standardized Embodied Agent Interface evaluation protocol. The framework systematically evaluates four critical embodied reasoning capabilities: goal interpretation (understanding objectives and grounding them in environment states), subgoal decomposition (breaking down complex goals), action sequencing (planning action sequences), and transition/world modeling (modeling world state changes through actions). Despite growing interest in using LLMs for robotics and agent planning, current evaluations lack standardization and fail to pinpoint fine-grained reasoning failures.

Our challenge standardizes task formulation, input/output structures, and evaluation metrics using the well-established BEHAVIOR and VirtualHome simulators, enhanced with detailed annotations including Linear Temporal Logic (LTL) goal specifications and comprehensive error analysis. Unlike typical evaluations that report only an overall success rate, which obscures the specific abilities LLMs struggle with, our framework uses fine-grained metrics that examine both whether proposed actions can actually be executed and whether they accomplish the intended goals. The evaluation system separates each reasoning component into its own module, giving a clear picture of exactly where and how these models succeed or fail.
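The difference between a plan that can be executed and a plan that achieves its goal can be illustrated with a toy sketch. The code below is not the challenge's evaluation code or API. The `Action` class, the `evaluate_plan` function, and the fridge facts are all hypothetical names chosen for illustration, and the world is reduced to a set of symbolic facts rather than a BEHAVIOR or VirtualHome scene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action: preconditions plus add/delete effects."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()


def evaluate_plan(initial_state, plan, goal):
    """Return (executable, goal_satisfied) for a plan in a toy symbolic world.

    executable: every action's preconditions held when it was applied.
    goal_satisfied: all goal facts hold in the last state that was reached.
    """
    state = set(initial_state)
    for action in plan:
        if not action.preconditions <= state:
            # The plan breaks here; judge the goal on the state reached so far.
            return False, goal <= state
        state -= action.del_effects
        state |= action.add_effects
    return True, goal <= state


# Assumed toy domain: put the milk into a closed fridge.
open_fridge = Action(
    name="open_fridge",
    preconditions=frozenset({"fridge_closed"}),
    add_effects=frozenset({"fridge_open"}),
    del_effects=frozenset({"fridge_closed"}),
)
put_milk = Action(
    name="put_milk_in_fridge",
    preconditions=frozenset({"holding_milk", "fridge_open"}),
    add_effects=frozenset({"milk_in_fridge"}),
    del_effects=frozenset({"holding_milk"}),
)

initial = {"fridge_closed", "holding_milk"}
goal = {"milk_in_fridge"}

good = evaluate_plan(initial, [open_fridge, put_milk], goal)
skipped = evaluate_plan(initial, [put_milk], goal)        # fridge never opened
incomplete = evaluate_plan(initial, [open_fridge], goal)  # milk never placed

print(good, skipped, incomplete)
```

Reporting the two flags separately is what lets an evaluation distinguish a plan that fails because a step is impossible from one that runs cleanly but stops short of the goal. A single success rate would count both as the same failure.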

Baseline results from state-of-the-art LLMs reveal significant performance gaps and motivate further innovation. The competition aims to advance the understanding of how LLMs reason in embodied environments, promote the development of robust and interpretable AI agents, and foster collaboration between the language modeling and robotics communities.

🔗 More information on our official competition website

📢 Announcements 📢

June 30, 2025 - The beta test for our competition platform, EvalAI, is now underway! Get ready for the official launch and public registration in late July or early August. Please stay tuned for more updates!

May 25, 2025 - We are thrilled to announce that our Embodied Agent Interface Challenge has been officially accepted by the NeurIPS 2025 Competition Track! Get ready for an amazing event!

May 16, 2025 - A huge thank you to Adobe Research for their generous $1000 in support of our challenge! This contribution is invaluable to our community.

May 1, 2025 - We are so excited to have secured $4000 in support from AIX! This will greatly help in making the EAI Challenge a huge success.

Competition Timeline

Date | Task Description
Apr - May 2025 | Finalize EvalAI settings and data annotations. Conduct internal validation of baseline metrics and evaluation scripts.
Jun 2025 | Launch beta test of the competition platform EvalAI, including starter kit, public leaderboard, and evaluation API.
Late Jul - Early Aug 2025 | Official launch of the competition: open public registration on EvalAI and release training datasets, starter kit, documentation, and baselines.
Aug - Early Nov 2025 | Development phase. Participants work on the tasks, submit results to the leaderboard, and iterate on their models.
Mid Nov 2025 | Evaluation phase. Freeze public leaderboard for development phase. Open final submission phase using hidden held-out set.
Late Nov 2025 | Organizers rerun top submissions to verify results. Aggregate final scores and finalize leaderboard rankings.
Dec 2025 | Submit competition report and winning team write-ups to be included in NeurIPS 2025 proceedings.

🏆 Prizes & Incentives

The Embodied Agent Interface Challenge at NeurIPS 2025 offers a range of exciting prizes and incentives to recognize outstanding contributions and foster inclusive participation across the community.

🥇 Podium Recognition at NeurIPS 2025

The top three teams will be invited to:

  • Present their work in a dedicated podium session at the NeurIPS 2025 Competition Track.
  • Showcase their models and results to a global audience of researchers, practitioners, and industry leaders.

📝 Co-Authorship Opportunities

Finalist teams who submit reproducible results and technical reports will be offered:

  • Named authorship on the official competition summary paper submitted to the NeurIPS proceedings.
  • Opportunities to contribute to documentation, analysis, and best practices shared with the research community.

💰 Monetary Prizes

  • 🥇 First Place: $1000
  • 🥈 Second Place: $500
  • 🥉 Third Place: $300
  • 💡 Most Innovative Approach: $200

All winners will receive certificates in addition to their cash awards.

🎖️ Honorable Mentions

We will also recognize notable contributions in the following categories:

  • 🧠 Best Goal Interpretation Module
  • 🧩 Best Subgoal Decomposition Strategy
  • 🦾 Best Action Sequencing Policy
  • ⚙️ Best Transition Modeling Logic

Organizers

Organizer Committee @ NeurIPS 2025 - EAI Challenge

Manling Li

Northwestern University

Ruohan Zhang

Stanford University

Weiyu Liu

Stanford University

Qineng Wang

Northwestern University

Kangrui Wang

Northwestern University

Tianwei Bao

Northwestern University

Steering Committee @ NeurIPS 2025 - EAI Challenge

Jiajun Wu

Stanford University

Fei-Fei Li

Stanford University

Yejin Choi

NVIDIA, Stanford University

Percy Liang

Stanford University

Li Erran Li

Amazon, Columbia University

Sponsors

AIX

Contact

Please email TianweiBao@u.northwestern.edu if you have any questions.