With progress in AI, researchers are increasingly looking at more holistic problem settings where agents are embodied in simulated or physical environments and must use visual perception to navigate and interact with the world in order to achieve goals specified in natural language -- developing embodied AI agents that can see, talk, act, and reason. In this class, you will learn about, apply, and (possibly) advance state-of-the-art techniques for problems at this exciting intersection of disciplines.
The focus of this course is on reading and critiquing recent research papers in this area, and on executing a quarter-long research project. There are no exams or traditional assignments. The course can be summarized as: Read papers. Do research. Talk about both.
We will read and analyze the strengths and weaknesses of recent research papers on a variety of topics in embodied AI and identify open research questions. See the schedule below for a list of topics and papers.
Over the quarter, you will also execute a research project with a concrete objective in teams of 2-3 students (depending on enrollment). While certainly not a requirement, students should actively consider submitting a paper at the end of the course to a top-tier conference in Computer Vision, Natural Language Processing, Machine Learning, or AI. See aideadlin.es for upcoming deadlines.
After the introductory classes, you will read and review a paper (listed in the schedule) prior to each class. Each lecture will start with a discussion of the paper that was reviewed. The discussion will be led by two students -- one highlighting the strengths of the paper and the other highlighting the weaknesses.
Following the paper discussion, 3 teams will present their project ideas, updates, and the issues they faced -- the slot is for 25 minutes and presenters will be asked to prepare a ~10 minute presentation. The goal is to have enough time for discussion and brainstorming. Each team will present updates on their project 3 times over the course of the quarter. At the end of the quarter, teams will give final project presentations.
The class will vote on the best discussion participant, best project presentation and best project!
Feedback is very welcome. If you have any questions or concerns about the class or the requirements, please be sure to discuss them with the instructor early on.
No laptops, cell phones or other distractions in class please.
NOTE: There will be no final exam!
Date | Topic | Review Paper | Project Presentation |
W1: Sep. 25 | Lecture: Intro to Embodied AI Topics and Class Administrivia [slides] | N/A | N/A |
W2: Sep. 30 | Lecture: Essential Neural Building Blocks (CNN / RNN / Multimodal Attention) [slides] | N/A | N/A |
W2: Oct. 2 | Lecture: Imitation / Reinforcement Learning (Part I) [slides] | N/A | N/A |
Oct. 4 | Sign up your teams by 11:59 pm | | |
W3: Oct. 7 | Lecture: Imitation / Reinforcement Learning (Part II) [slides]; Special: Individual Meetings to Discuss Project Ideas | N/A | N/A |
W3: Oct. 9 @ BEXL 321 | Paper: Policy Optimization; Lecture: Intro to Visual Navigation [slides] | R1: Schulman et al., 2017; For: S1; Against: S5 | N/A |
W4: Oct. 14 | Paper: Visual Navigation In Synthetic Environments | R2: Mirowski and Pascanu et al., 2017; For: S7; Against: S6 | Project Proposals (Teams 1-2): 1. S7, S6, & S2; 2. S4 & S3 |
W4: Oct. 16 @ BEXL 321 | Paper: Visual Navigation In Realistic Environments | R3: Savva, Kadian, and Maksymets et al., 2017; For: S5; Against: S3 | Project Proposals (Teams 3-4): 3. S8; 4. S9 |
W5: Oct. 21 | Paper: Topological Representations for Visual Navigation | R4: Savinov and Dosovitskiy et al., 2018; For: S7; Against: S2 | Project Proposals (Teams 5-6): 5. S1; 6. S5 |
W5: Oct. 23 | Paper: Spatial Memory for Visual Navigation; Lecture: Intro to Instruction Following [slides] | R5: Gupta et al., 2017; For: S8; Against: S1 | N/A |
W6: Oct. 28 | Paper: Open-Area Instruction Following with Continuous Control | R6: Blukis et al., 2018; For: S9; Against: S4 | Project Update 1 (Teams 1-2): 1. S7, S6, & S2; 2. S4 & S3 |
W6: Oct. 30 | Paper: Instruction Following in Home Environments | R7: Wang et al., 2019; For: S8; Against: S7 | Project Update 1 (Teams 3-4): 3. S8; 4. S9 |
W7: Nov. 4 | Paper: Evaluating Instruction-conditioned Navigation | R8: Magalhaes et al., 2019; For: S4; Against: S9 | Project Update 1 (Teams 5-6): 5. S1; 6. S5 |
W7: Nov. 6 | Paper: Interactive Instruction Following; Lecture: Intro to Embodied Question Answering [slides] | R9: Thomason et al., 2019; For: S3; Against: S1 | N/A |
W8: Nov. 11 | Veterans Day Holiday -- No class | | |
W8: Nov. 13 | Paper: Question Answering in Synthetic Interactive Environments | R10: Gordon et al., 2018; For: S6; Against: S5 | Project Update 2 (Teams 1-2): 1. S7, S6, & S2; 2. S4 & S3 |
W9: Nov. 18 | Paper: Question Answering in Realistic Indoor Environments | R11: Wijmans, Datta, and Maksymets et al., 2019; For: S2; Against: S9 | Project Update 2 (Teams 3-4): 3. S8; 4. S9 |
W9: Nov. 20 | Paper: Answering Questions about Multiple Objects | R12: Yu et al., 2019; For: S6; Against: S3 | Project Update 2 (Teams 5-6): 5. S1; 6. S5 |
W10: Nov. 25 | Paper: Careful Baselines for Embodied Question Answering; Lecture: Intro to Grounded Language Learning [slides] | R13: Thomason et al., 2019; For: S9; Against: S2 | N/A |
W10: Nov. 27 | Paper: Learning Attributed Objects in Simple Environments; Lecture: Intro to Sim2Real Transfer [slides] | R14: Chaplot et al., 2018; For: S4; Against: S8 | N/A |
W11: Dec. 2 | Paper: Moving Visual Navigation to the Real World | R15: Bansal and Tolani et al., 2018; For: S5; Against: S1 | N/A |
W11: Dec. 4 | N/A | N/A | Final Project Presentation (Teams 3-6): 1. S7, S6, & S2; 2. S4 & S3; 3. S1; 4. S9; 5. S8; 6. S5 |
Dec. 6 | Project Videos due by 11:59 pm through Canvas | | |
You will be assigned to lead the discussion on a paper that you have read about once (estimated) during the quarter. You will be asked to argue either in favor of the paper or against it. In either case, come prepared with 5 points of discussion (in favor of or against the paper).
NOTE: You need not submit a review for the paper you are leading a discussion on.
Slides should be as visual (with videos, images, animations) and as clear as possible. Students should practice their talks ahead of time to make sure they are of appropriate length -- not shorter by more than a few minutes, and certainly not longer (we will set a timer that will go off). The talks should be well organized and polished. Take a look at some example presentations: Example 1, Example 2.
Initial presentation: Each team will present for about 10 min. In the first presentation, teams will present a project proposal organized as follows:
Each team will explain their project in a 15 min. presentation with an organization similar to the project proposal presentation, except now describing the actual outcomes rather than plans. In addition, describe any challenges you faced and any insights on future extensions of the project. 3 min. of Q&A will follow each presentation.
Teams will prepare a 1 min. YouTube video summarizing the project. The video is a teaser that conveys the main points and makes the viewer want to know more. It should be understandable by anyone familiar with AI. Please submit the YouTube link to the Canvas assignment.
See: Example 1, Example 2, Example 3, Example 4, Example 5, Example 6, Example 7, Example 8, Example 9, Example 10.
Projects typically fall under one of these categories:
Project teams should have 2-3 students (depending on enrollment). No more than 9 teams in the class.
You may combine this with another course project if approved by the other professor, but you must clearly delineate the different parts. Overlap with your own research is highly recommended.