Foundation Models Meet Embodied Agents

@ ICCV 2025 Tutorial

1:00 PM – 5:00 PM HST, October 20, 2025

306B, Hawaii Convention Center, Honolulu, Hawaii


An embodied agent is a generalist agent that can take natural language instructions from humans and perform a wide range of tasks in diverse environments. Foundation models have emerged in recent years and have shown remarkable success in supporting embodied-agent abilities such as goal interpretation, subgoal decomposition, action sequencing, and transition modeling (causal transitions from preconditions to post-effects).

We categorize foundation models into Large Language Models (LLMs), Vision-Language Models (VLMs), and Vision-Language-Action Models (VLAs). In this tutorial, we will comprehensively review existing paradigms of foundation models for embodied agents, focusing on how each is formulated within the Markov Decision Process (MDP), the fundamental mathematical framework of robot learning. We also design a structured view for investigating the robot's decision-making process.
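
For reference, the standard textbook definition of an MDP can be written as below. The notation is the common convention and is an assumption here; the symbols used in the tutorial itself may differ.

```latex
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \gamma), \qquad
T(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\, a_t = a), \qquad
R : \mathcal{S} \times \mathcal{A} \to \mathbb{R}, \qquad
\gamma \in [0, 1)
\]
\[
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]
\]
```

Under this reading, one possible mapping is that transition modeling concerns $T$, goal interpretation concerns what $R$ should reward, and subgoal decomposition and action sequencing concern the policy $\pi$. This mapping is illustrative only and is not stated on this page.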

This tutorial will present a systematic overview of recent advances in foundation models for embodied agents. We compare these models and explore their design space to guide future developments, focusing on two themes: Lower-Level Environment Encoding and Interaction, and Longer-Horizon Decision Making.

🔗 More details on the ICCV 2025 tutorial page.

Schedule

| Session | Duration | Time (HST) | Presenter | Slides/Video |
| --- | --- | --- | --- | --- |
| Motivation and Overview | 15min | 1:00-1:15 PM | Manling Li | Slides, Video (Upcoming) |
| Foundation Models meet Virtual Agents | 45min | 1:15-2:00 PM | Manling Li | Slides, Video (Upcoming) |
| Foundation Models meet Physical Agents: Overview & Perception | 25min | 2:00-2:25 PM | Jiayuan Mao | Slides, Video (Upcoming) |
| Foundation Models meet Physical Agents: High-Level and Low-level Decision Making | 50min | 2:25-3:15 PM | Wenlong Huang | Slides, Video (Upcoming) |
| Break | 30min | 3:15-3:45 PM | | |
| Robotic Foundation Models | 30min | 3:45-4:15 PM | Yunzhu Li | Slides, Video (Upcoming) |
| Remaining Challenges | 15min | 4:15-4:30 PM | Yunzhu Li | Slides, Video (Upcoming) |
| QA | 30min | 4:30-5:00 PM | | |

Presenters

Manling Li

Northwestern University

Yunzhu Li

Columbia University

Jiayuan Mao

Amazon FAR and UPenn

Wenlong Huang

Stanford University

Contact

Please email manling.li@u.northwestern.edu if you have any questions.