CS539 - Embodied AI

Fall 2019 -- Oregon State University, College of Engineering
Class meets M / W 4:00-5:50pm, Kelley 1005

Learn about, apply, and possibly advance state-of-the-art techniques at the intersection of computer vision, natural language processing, and reinforcement learning.

Course Information

With progress in AI, researchers are increasingly looking at more holistic problem settings where agents are embodied in simulated or physical environments and must use visual perception to navigate and interact with the world in order to achieve goals specified in natural language -- developing embodied AI agents that can see, talk, act, and reason. In this class, you will learn about, apply, and (possibly) advance state-of-the-art techniques for problems at this exciting intersection of disciplines.

The focus of this course is on reading and critiquing recent research papers in this area, and on executing a quarter-long research project. There are no exams or traditional assignments; the course can be summarized as: Read papers. Do research. Talk about both.

We will read and analyze the strengths and weaknesses of recent research papers on a variety of topics in embodied AI and identify open research questions. See the schedule below for a list of topics and papers.

Over the quarter, you will also execute a research project with a concrete objective in teams of 2-3 students (depending on enrollment). While certainly not a requirement, students should actively consider submitting a paper at the end of the course to a top-tier conference in Computer Vision, Natural Language Processing, Machine Learning, or AI. See aideadlin.es for upcoming deadlines.

Instructor Stefan Lee
Email leestef [at] oregonstate [dot] edu
Office Hours M/W 5:50 - 6:30 (right after class)
To showcase everyone's effort, the project spotlights are below. Thanks for a great quarter!

Project Spotlights

Visual Navigation-based Environment Exploration
Manish Saroya, Enna Sachdeva, Dhruv Jawalkar

Value Iteration on Steroids
Aayam Shrestha

Vision and Language Navigation: Off the Rails
Anand Koshy, Jacob Krantz

Precision Visuo-Servoing of 6DOF Arm
Kartik Gupta

Measuring Robustness of Embodied RL Agents
Vivswan Shitole

Foreseeable RL Agent
Zhengxian Lin

Course Structure
The first few classes will be lectures -- introducing you to common tools and topics in Embodied AI. The goal is to provide an overview of the area for you to make an informed decision about the direction of the course project prior to your proposal presentation.

After the introductory classes, you will read and review a paper (listed in the schedule) prior to each class. Each lecture will start with a discussion of the paper that was reviewed. The discussion will be led by two students -- one highlighting the strengths of the paper and the other highlighting the weaknesses.

Following the paper discussion, two teams will present their project ideas, updates, and the issues they have faced. Each slot is 25 minutes long, and presenters should prepare a ~10 minute presentation so that there is enough time for discussion and brainstorming. Each team will present updates on their project 3 times over the course of the quarter. At the end of the quarter, teams will give final project presentations.

The class will vote on the best discussion participant, best project presentation and best project!

Feedback is very welcome. If you have any questions or concerns about the class or the requirements, please be sure to discuss them with the instructor early on.

No laptops, cell phones or other distractions in class please.


NOTE: There will be no final exam!

Recommended Background
CS 539: Embodied AI is an ADVANCED class. This should not be your first exposure to computer vision, machine learning, or deep learning. You will absolutely need:

Schedule

Date | Topic | Review Paper | Project Presentation
W1: Sep. 25 | Lecture: Intro to Embodied AI Topics and Class Administrativia [slides] | N/A | N/A
W2: Sep. 30 | Lecture: Essential Neural Building Blocks (CNN / RNN / Multimodal Attention) [slides] | |
W2: Oct. 2 | Lecture: Imitation / Reinforcement Learning (Part I) [slides] | N/A | N/A
Oct. 4 | Sign up your teams by 11:59 pm | |
W3: Oct. 7 | Lecture: Imitation / Reinforcement Learning (Part II) [slides] | | Special: Individual Meetings to Discuss Project Ideas
W3: Oct. 9 (@ BEXL 321) | Paper: Policy Optimization; Lecture: Intro to Visual Navigation [slides] | R1: Schulman et al., 2017 (For: S1; Against: S5) |
W4: Oct. 14 | Paper: Visual Navigation in Synthetic Environments | R2: Mirowski and Pascanu et al., 2017 (For: S7; Against: S6) | Project Proposals (Teams 1-2): 1. S7, S6, & S2; 2. S4 & S3
W4: Oct. 16 (@ BEXL 321) | Paper: Visual Navigation in Realistic Environments | R3: Savva, Kadian, and Maksymets et al., 2019 (For: S5; Against: S3) | Project Proposals (Teams 3-4): 3. S8; 4. S9
W5: Oct. 21 | Paper: Topological Representations for Visual Navigation | R4: Savinov and Dosovitskiy et al., 2018 (For: S7; Against: S2) | Project Proposals (Teams 5-6): 5. S1; 6. S5
W5: Oct. 23 | Paper: Spatial Memory for Visual Navigation; Lecture: Intro to Instruction Following [slides] | R5: Gupta et al., 2017 (For: S8; Against: S1) |
W6: Oct. 28 | Paper: Open-Area Instruction Following with Continuous Control | R6: Blukis et al., 2018 (For: S9; Against: S4) | Project Update 1 (Teams 1-2): 1. S7, S6, & S2; 2. S4 & S3
W6: Oct. 30 | Paper: Instruction Following in Home Environments | R7: Wang et al., 2019 (For: S8; Against: S7) | Project Update 1 (Teams 3-4): 3. S8; 4. S9
W7: Nov. 4 | Paper: Evaluating Instruction-conditioned Navigation | R8: Magalhaes et al., 2019 (For: S4; Against: S9) | Project Update 1 (Teams 5-6): 5. S1; 6. S5
W7: Nov. 6 | Paper: Interactive Instruction Following; Lecture: Intro to Embodied Question Answering [slides] | R9: Thomason et al., 2019 (For: S3; Against: S1) |
W8: Nov. 11 | Veterans Day Holiday -- No class | |
W8: Nov. 13 | Paper: Question Answering in Synthetic Interactive Environments | R10: Gordon et al., 2018 (For: S6; Against: S5) | Project Update 2 (Teams 1-2): 1. S7, S6, & S2; 2. S4 & S3
W9: Nov. 18 | Paper: Question Answering in Realistic Indoor Environments | R11: Wijmans, Datta, and Maksymets et al., 2019 (For: S2; Against: S9) | Project Update 2 (Teams 3-4): 3. S8; 4. S9
W9: Nov. 20 | Paper: Answering Questions about Multiple Objects | R12: Yu et al., 2019 (For: S6; Against: S3) | Project Update 2 (Teams 5-6): 5. S1; 6. S5
W10: Nov. 25 | Paper: Careful Baselines for Embodied Question Answering; Lecture: Intro to Grounded Language Learning [slides] | R13: Thomason et al., 2019 (For: S9; Against: S2) |
W10: Nov. 27 | Paper: Learning Attributed Objects in Simple Environments; Lecture: Intro to Sim2Real Transfer [slides] | R14: Chaplot et al., 2018 (For: S4; Against: S8) |
W11: Dec. 2 | Paper: Moving Visual Navigation to the Real World | R15: Bansal and Tolani et al., 2018 (For: S5; Against: S1) |
W11: Dec. 4 | N/A | N/A | Final Project Presentation (Teams 3-6): 1. S7, S6, & S2; 2. S4 & S3; 3. S1; 4. S9; 5. S8; 6. S5
Dec. 6 | Project Videos due by 11:59pm through Canvas | |

Reviews (30% of final grade)

Reviews are due 11:59 pm on the day before the class discussing the paper.

Anatomy of a Review:

As you might imagine, other folks have also written tips on How to read a paper and How to write a review. These are from other CS fields, but the advice (at least at a high level) is still relevant.

Paper Discussion (10% of final grade)

About once during the quarter (estimated), you will be assigned to lead the discussion of a paper you have read. You will be asked to argue either in favor of or against the paper. In either case, come prepared with 5 points of discussion.

NOTE: You need not submit a review for the paper you are leading a discussion on.


Slides should be as visual (with videos, images, and animations) and as clear as possible. Students should practice their talks ahead of time to make sure they are of appropriate length -- not shorter by more than a few minutes, and certainly not longer (we will set a timer that will go off). The talks should be well organized and polished. Take a look at some example presentations: Example 1, Example 2.

Initial and update presentations (30% of final grade; see schedule)

Initial presentation: Each team will present for about 10 min. In the first presentation, teams will present a project proposal organized as follows:

  • Problem statement: Clearly state the goal of your project. Specify the input and desired output.
  • Related work: Briefly describe existing related work (with citations) and what your project brings to the table that these other works do not. The most relevant papers may not necessarily be papers listed on the schedule, so be sure to also look beyond the list.
  • Approach: Describe the technical approach you plan to employ. Clearly state the assumptions of your approach.
  • Experiments and results: Describe the experimental setup you will follow, which datasets you will use, which existing code you will exploit, what you will implement yourself, and what you would define as a success for the project. If you plan on collecting your own data, describe what data collection protocol you will follow. Specify if you plan on experimentally analyzing different characteristics of your approach, or if you will compare to existing techniques. Provide a list of experiments you will perform. Describe what you expect the experiments to reveal, or what is uncertain about the potential outcomes. If you have any preliminary results, please summarize those as well.
  • Timeline: Present a timeline of the planned tasks/goals. Clearly state what you plan to complete by the next presentation. Break this down into two sets -- tasks that you are sure you will have completed and tasks that are a bit of a long shot but you would like to complete. Please also use a similar breakdown during the update presentations. You will be expected to try hard to stick to this timeline.
Update presentations: In the following two presentations, you will update the class on your progress. You will remind the class of your problem statement, and provide a quick recap of the approach. Remind us of your timeline from your earlier presentation, and then describe your current results, any challenges or issues that you faced, and an updated timeline. Presentations will be <=10 min. long and will be followed by 15 min. of discussion.

Final presentation (15% of final grade)

Each team will explain their project in a 15 min. presentation with an organization similar to the project proposal presentation, except now describing the actual outcomes rather than plans. In addition, describe any challenges you faced and any insights on future extensions of the project. Each presentation will be followed by 3 min. of Q&A.

Project video (15% of final grade)

Teams will prepare a 1 min. YouTube video summarizing the project. The video is a teaser meant to convey the main points and make the viewer want to know more. It should be understandable by anyone familiar with AI. Please submit the YouTube link to the Canvas assignment.

See: Example 1, Example 2, Example 3, Example 4, Example 5, Example 6, Example 7, Example 8, Example 9, Example 10.

Your project will typically involve addressing a novel problem or addressing an existing problem in a novel way. Your goal should be to advance a state-of-the-art technique, or to introduce a new task in vision and language, benchmark basic approaches on it, and potentially propose an interesting model for the new task. Refer to the schedule to find topics of interest, but feel free to be creative and come up with your own! If you need help brainstorming an idea or refining a concept, reach out to the instructor. While certainly not a requirement for the class, students should actively consider submitting a paper at the end of the course to a top-tier conference in Computer Vision, Natural Language Processing, Machine Learning, or AI.

Projects typically fall under one of these categories:

  • Novel problem / task / application.
  • Application/survey - compare a bunch of algorithms on an application domain of interest. These make the most sense if you expect some interesting trend or finding to emerge from the analysis.
  • Formulation/Development - formulate a new model or algorithm for a new/old problem.
  • Analysis - analyze an existing algorithm.

Project teams should have 2-3 students (depending on enrollment). No more than 9 teams in the class.

You may combine this with another course project if approved by the other professor, but you must delineate the different parts. Overlap with your own research is highly recommended.

This course borrows heavily from Devi Parikh's Vision and Language course structure.