Foundation Models Meet Embodied Agents

@ CVPR 2026 Workshop

Thu June 4th, 2026, Room 703

in Denver, Colorado, US


Call for Papers

Recent advances in foundation models, including Large Language Models (LLMs), Vision–Language Models (VLMs), and Vision–Language–Action Models (VLAs), have supported embodied agents in performing a wide range of tasks in real-world and simulated environments. However, challenges such as fine-grained visual perception and long-horizon reasoning remain significant barriers to reliable embodied decision-making.

In this workshop, we aim to bring together researchers from computer vision, robotics, and natural language processing to advance grounded perception, planning, and action for embodied intelligence. We focus on a unified decision-making pipeline, spanning goal understanding, subgoal decomposition, action sequencing, and transition modeling, to enable scalable and generalizable embodied agents.

Topics of Interest

We welcome contributions including, but not limited to, the following directions:

Long-horizon reasoning & planning
Spatial intelligence & physical understanding
World models, memory, and interaction
Vision-language-action learning and evaluation
Benchmarks, datasets, and evaluation protocols for embodied agents

Submission Guidelines

Paper Types

We welcome submissions covering:

Research papers: Long papers (8 pages) showcasing novel findings, methods, or theoretical advancements.
Short/Abstract papers: Exploratory work (4 pages or 2 pages, excluding references) that may be preliminary but presents innovative concepts, early results, or thought-provoking viewpoints that stimulate discussion and future work.
Position papers: Critical perspectives on trends and challenges within the field (no fewer than 8 pages).
Survey papers: Thorough reviews of specific topics, mapping the current research landscape and suggesting directions for future exploration (no fewer than 8 pages).

Formats & Rules

All types allow unlimited references and appendices.
Submissions should follow CVPR two-column style and be anonymous; see the CVPR-26 author kit for details.
Please submit through the OpenReview submission portal.
Contributions will be non-archival but hosted on our workshop website, so dual submission is allowed where permitted by third parties. We welcome submissions that are under review at, or have been accepted by, other conferences. If this applies to your paper, please state it in the last sentence of the abstract. Paper awards will give preference to original submissions.

Challenge / Benchmark Track

We host multiple evaluation tracks to benchmark embodied intelligence:

ENACT – a new challenge on evaluating embodied cognition of VLMs with world modeling of egocentric interaction
EmbodiedBench – comprehensive benchmarking of VLM-based embodied agents across perception, reasoning, and action
Embodied Agent Interface (EAI) – evaluating LLM-based agents on goal interpretation, subgoal decomposition, action sequencing, and transition modeling
RoboMME – a new challenge for robotic generalist policies in a diverse set of memory-critical manipulation tasks

Important Dates

All deadlines are 11:59 pm UTC-12h (“Anywhere on Earth”).

Submission Deadline	~~May 1st 2026 (23:59 AoE)~~ May 10th 2026 (23:59 AoE)
Call for Program Committee Members	~~May 1st 2026 (23:59 AoE)~~ May 10th 2026 (23:59 AoE)
Decision Notifications	~~May 18th 2026 (23:59 AoE)~~ May 25th 2026 (23:59 AoE)
Camera-Ready Deadline (Non-Archival)	~~May 25th 2026 (23:59 AoE)~~ May 30th 2026 (23:59 AoE)
Workshop Date	June 4th 2026

Schedule

Tentative schedule. All times are in Denver local time (Mountain Time).

Time	Program
1:00–1:05 PM	Opening
1:05–1:45 PM	Invited Talk 1 - Kristen Grauman (UT Austin)
1:45–2:25 PM	Invited Talk 2 - Wei-Chiu Ma (Cornell University)
2:25–3:05 PM	Invited Talk 3 - Xudong Wang (Physical Intelligence)
3:05–3:55 PM	Contributed Talks
3:55–4:30 PM	Poster Session / Coffee Break
4:30–5:10 PM	Invited Talk 4 - Kaichun Mo (NVIDIA)
5:10–5:50 PM	Invited Talk 5 - An-Chieh Cheng (UC San Diego)
5:50–6:00 PM	Award Ceremony + Closing Remarks

Accepted Papers

Congratulations to the authors of the 34 papers accepted to FMEA @ CVPR 2026. All accepted papers are non-archival and will be presented in the poster session.

1 Inference-Time Planning with Action-Conditioned Video Models for Generalizable Robot Manipulation

2 RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

3 Theory of Space: Benchmarking Active Spatial Belief Construction and Revision in Foundation Models for Embodied Agents

4 PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning

5 Multimodal Causal Subtask Modeling for Scalable VLA Pipelines in Long-Horizon Manipulation

6 VL-Nav: A Neuro-Symbolic Approach for Reasoning-based Vision-Language Navigation

7 PInVerify: An Offline Embodied Benchmark for Active Instance Verification

8 FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning

9 AgenticLab: A Real-World Robot Agent Platform that Can See, Think, and Act

10 RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks

11 Re²: Reflective Rule Induction and Rule-Guided Refinement for Embodied Planning

12 A Physics-Grounded Benchmark for Multi-Agent Dynamics in World Models