Foundation Models Meet Embodied Agents

@ AAAI 2025 Tutorial

08:30 - 12:30, Feb 25, 2025

118 A, Pennsylvania Convention Center, Philadelphia, Pennsylvania




An embodied agent is a generalist agent that can take natural language instructions from humans and perform a wide range of tasks in diverse environments. Recent years have witnessed the emergence of Large Language Models as powerful tools for building Large Agent Models, which have shown remarkable success in supporting embodied-agent abilities such as goal interpretation, subgoal decomposition, action sequencing, and transition modeling (causal transitions from preconditions to post-effects).

However, moving from foundation models to embodied agents poses significant challenges, both in understanding lower-level visual details and in long-horizon reasoning for reliable embodied decision-making. We will cover advances in foundation models, spanning Large Language Models, Vision-Language Models, and Vision-Language-Action Models. In this tutorial, we will comprehensively review existing paradigms of foundation models for embodied agents, focus on their different formulations based on the Markov Decision Process (MDP), the fundamental mathematical framework of robot learning, and present a structured view for investigating the robot’s decision-making process.

Schedule

Session | Duration | Time | Presenter | Slides/Video
Motivation and Overview | 15 min | 08:30-08:45 | Manling Li | Slides, Video (Upcoming)
Foundation Models meet Virtual Agents | 45 min | 08:45-09:30 | Manling Li | Slides, Video (Upcoming)
Foundation Models meet Physical Agents: Overview & High-level Decision Making | 25 min | 09:30-09:55 | Jiayuan Mao | Slides, Video (Upcoming)
Foundation Models meet Physical Agents: Low-level Decision Making | 50 min | 09:55-10:45 | Wenlong Huang | Slides, Video (Upcoming)
Break | 30 min | 10:45-11:15 | |
Robotic Foundation Models | 30 min | 11:15-11:45 | Yunzhu Li | Slides, Video (Upcoming)
Remaining Challenges | 15 min | 11:45-12:00 | Yunzhu Li | Slides, Video (Upcoming)
Q&A | 30 min | 12:00-12:30 | |

Presenters

Manling Li

Northwestern University

Yunzhu Li

Columbia University

Wenlong Huang

Stanford University

Contact

Please email manling.li@northwestern.edu if you have any questions.