Foundation Models Meet Embodied Agents

@ CVPR 2026 Workshop

Wed June 3rd, 2026, Room TBD

in Denver, Colorado, USA

Call for Papers

Recent advances in foundation models, including Large Language Models (LLMs), Vision–Language Models (VLMs), and Vision–Language–Action Models (VLAs), have enabled embodied agents to perform a wide range of tasks in real-world and simulated environments. However, challenges such as fine-grained visual perception and long-horizon reasoning remain significant barriers to reliable embodied decision-making.

In this workshop, we aim to bring together researchers from computer vision, robotics, and natural language processing to advance grounded perception, planning, and action for embodied intelligence. We focus on a unified decision-making pipeline, spanning goal understanding, subgoal decomposition, action sequencing, and transition modeling, to enable scalable and generalizable embodied agents.

Topics of Interest

We welcome contributions including, but not limited to, the following directions:

Long-horizon reasoning & planning
Spatial intelligence & physical understanding
World models, memory, and interaction
Vision-language-action learning and evaluation
Benchmarks, datasets, and evaluation protocols for embodied agents

Submission Guidelines

Paper Types

We welcome submissions covering:

Research papers: Long papers (8 pages) showcasing novel findings, methods, or theoretical advancements.
Short/Abstract papers: Present exploratory work (4 pages for short papers or 2 pages for abstracts, excluding references) that may be preliminary but offers innovative concepts, early results, or thought-provoking viewpoints that stimulate discussion and future work.
Position papers: Offer critical perspectives on trends and challenges within the field (no less than 8 pages).
Survey papers: Provide thorough reviews of specific topics, mapping the current research landscape and suggesting directions for future exploration (no less than 8 pages).

Formats & Rules

All paper types allow unlimited references and appendices.
Submissions should follow the CVPR two-column style and be anonymized; see the CVPR 2026 author kit for details.
Please submit through the OpenReview submission portal (link TBD).
Contributions will be non-archival but will be hosted on our workshop website; dual submission is therefore allowed where permitted by third parties. We welcome papers that are under review at, or have been accepted by, other conferences. If this applies to your paper, please state it in the last sentence of the abstract. Paper awards will favor original submissions.

Challenge / Benchmark Track

We host multiple evaluation tracks to benchmark embodied intelligence:

ENACT – a new challenge evaluating the embodied cognition of VLMs through world modeling of egocentric interaction
EmbodiedBench – comprehensive benchmarking of VLM-based embodied agents across perception, reasoning, and action
Embodied Agent Interface (EAI) – evaluating LLM-based agents on goal interpretation, subgoal decomposition, action sequencing, and transition modeling
RoboMME – a new challenge for robotic generalist policies on a diverse set of memory-critical manipulation tasks

Important Dates

All deadlines are 11:59 pm UTC-12h (“Anywhere on Earth”).

Submission Deadline	May 1st, 2026 (23:59 AoE)
Call for Program Committee Members	May 1st, 2026 (23:59 AoE)
Decision Notifications	May 18th, 2026 (23:59 AoE)
Camera-Ready Deadline (Non-Archival)	May 25th, 2026 (23:59 AoE)
Workshop Date	June 3rd 2026

Schedule

Time	Program
09:00–09:10	Opening Remarks
09:10–09:40	Keynote 1 - Kristen Grauman (UT Austin)
09:40–10:10	Keynote 2 - Yunzhu Li (Columbia University)
10:10–10:40	Keynote 3 - Wei-Chiu Ma (Cornell University)
10:40–11:30	Spotlight Session (6 min talks)
11:30–12:30	Poster Session
12:30–13:30	Student Mentoring Lunch Session
13:30–14:00	Keynote 4 - Lingjie Liu (University of Pennsylvania)
14:00–14:30	Keynote 5 - Yuke Zhu (UT Austin)
14:30–15:00	Keynote 6 - Xiaolong Wang (UC San Diego)
15:00–15:50	Panel Discussion
15:50–17:10	Oral Presentations (12 min talk + 3 min Q&A)
17:10–17:30	Best Paper Presentation (15 min talk + 5 min Q&A)
17:30–17:40	Closing Remarks