Project Description
The overall goal of this project is to learn to perform multiple
sequential decision tasks by actively interacting with an in-situ
expert and transferring the acquired knowledge across tasks. The tasks
might involve strategic thinking, as in playing a real-time strategy
game or efficiently searching for targets with limited resources, or
simpler control problems such as balancing a bicycle or a cart-pole.
We are exploring several subproblems of this challenging research
topic, including Bayesian transfer learning, interactive
reinforcement learning, and active imitation learning.
In Bayesian transfer learning, task knowledge is organized hierarchically into classes, where related tasks fall under the same class. The number and definition of the classes are not fixed in advance but are learned from experience using the framework of hierarchical Dirichlet processes. The classes correspond to similar Markov decision processes in model-based reinforcement learning, or to similar role-based policies in model-free learning.
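To make the class-discovery idea concrete, here is a minimal sketch of clustering tasks into classes with a Chinese restaurant process prior, a simplified stand-in for the full hierarchical Dirichlet process machinery. Each task is summarized by a Bernoulli success statistic under a shared Beta prior rather than by a full MDP or policy; the hyperparameters and the Gibbs schedule are illustrative assumptions.

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
ALPHA, A0, B0 = 1.0, 1.0, 1.0      # CRP concentration and Beta prior (assumed)

def gibbs_sweep(stats, z):
    """One collapsed-Gibbs sweep over task-to-class assignments z."""
    for i, (s, n) in enumerate(stats):
        z[i] = -1                                   # remove task i from its class
        labels = sorted(set(int(c) for c in z) - {-1})
        new = (labels[-1] + 1) if labels else 0     # label for a brand-new class
        logw = []
        for c in labels + [new]:
            members = [j for j in range(len(stats)) if z[j] == c]
            S = sum(stats[j][0] for j in members)   # pooled successes in class c
            N = sum(stats[j][1] for j in members)   # pooled trials in class c
            # CRP prior: proportional to class size, or ALPHA for a new class
            prior = np.log(len(members)) if members else np.log(ALPHA)
            # posterior predictive of task i given the class's pooled counts
            pred = (betaln(A0 + S + s, B0 + (N - S) + (n - s))
                    - betaln(A0 + S, B0 + (N - S)))
            logw.append(prior + pred)
        w = np.exp(np.array(logw) - max(logw))
        w /= w.sum()
        z[i] = (labels + [new])[rng.choice(len(w), p=w)]
    return z

# Two high-success tasks and two low-success tasks tend to form two classes.
stats = [(9, 10), (8, 10), (1, 10), (2, 10)]
z = np.zeros(len(stats), dtype=int)
for _ in range(50):
    z = gibbs_sweep(stats, z)
print("class assignments:", z)
```

Tasks with similar statistics gravitate to the same class, and the number of classes is inferred from the data rather than fixed in advance, which is the property the hierarchical Dirichlet process provides at the level of full MDPs and policies.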
In interactive reinforcement learning, the goal is to accelerate reinforcement learning by having an expert critique the trajectories generated by the learner and offer advice. The learner interleaves self-practice sessions with critique sessions, which allows it to converge more quickly than either practice or critique alone.
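As a rough illustration of combining practice with critique, the sketch below interleaves REINFORCE updates from self-practice episodes with supervised updates from expert labels on visited state-action pairs. The chain environment, the form of the critique signal, and the update schedule are illustrative assumptions, not the exact algorithm from the AAAI 2010 paper below.

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATES, N_ACTIONS = 5, 2
theta = np.zeros((N_STATES, N_ACTIONS))        # tabular softmax policy

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def rollout():
    """Toy chain MDP (assumed): action 1 moves right; reward at the end."""
    s, episode = 0, []
    for _ in range(10):
        a = rng.choice(N_ACTIONS, p=policy(s))
        s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        episode.append((s, a, r))
        s = s2
        if r > 0:
            break
    return episode

def practice_update(episode, lr=0.1):
    """REINFORCE: scale log-probability gradients by the episode return."""
    G = sum(r for _, _, r in episode)
    for s, a, _ in episode:
        grad = -policy(s)
        grad[a] += 1.0                         # d log pi(a|s) / d theta[s]
        theta[s] += lr * G * grad

def critique_update(labels, lr=0.5):
    """Expert marks visited (state, action) pairs good (+1) or bad (-1);
    the labels act as a supervised signal on the policy."""
    for s, a, label in labels:
        grad = -policy(s)
        grad[a] += 1.0
        theta[s] += lr * label * grad

for ep in range(200):
    episode = rollout()
    practice_update(episode)
    if ep % 20 == 0:                           # periodic critique session
        critique_update([(s, a, +1 if a == 1 else -1)
                         for s, a, _ in episode])
```

Here the occasional critique sessions push probability toward the rewarding action even before practice alone has found the goal, which is the intuition behind combining the two signals.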
In active imitation learning, we are exploring a number of approaches in which the learning agent actively poses queries in order to quickly learn to imitate the expert. In one approach, the agent asks state-based queries about what action to take in a given state. In a second, it asks queries about preferences between trajectories generated by different policies. More recently, we have been developing approaches that combine priors over utilities with expert advice on actions.
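The sketch below illustrates the state-query approach in its simplest form, as uncertainty sampling over states: the learner repeatedly asks the expert for the correct action in the state where its current policy is least confident, and fits a classifier to the answers. The synthetic states, the linear expert, and the query budget are all illustrative; the actual reduction to i.i.d. active learning is developed in the UAI 2012 paper listed below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
states = rng.normal(size=(200, 4))             # candidate states (features)
expert = lambda x: int(x[0] + x[1] > 0)        # hidden expert policy (assumed)

# Seed with queries until both actions have been observed at least once.
queried, X, y = [], [], []
i = 0
while len(set(y)) < 2:
    queried.append(i)
    X.append(states[i])
    y.append(expert(states[i]))
    i += 1

for _ in range(20):                            # query budget
    clf = LogisticRegression().fit(X, y)
    proba = clf.predict_proba(states)[:, 1]
    margin = np.abs(proba - 0.5)               # distance from decision boundary
    margin[queried] = np.inf                   # never re-query a state
    j = int(np.argmin(margin))                 # least-confident state
    queried.append(j)
    X.append(states[j])
    y.append(expert(states[j]))

acc = (clf.predict(states) == [expert(s) for s in states]).mean()
print("agreement with expert:", acc)
```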

Publications
- Judah, K., Fern, A., and Dietterich, T., Active Imitation Learning via Reduction to I.I.D. Active Learning, Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
- Judah, K., Roy, S., Fern, A., and Dietterich, T., Reinforcement Learning via Practice and Critique Advice, AAAI Conference on Artificial Intelligence (AAAI), 2010.
- Judah, K., Fern, A., and Dietterich, T., Active Imitation Learning via Reduction to I.I.D. Active Learning, AAAI 2012 Fall Symposium on Robots Learning Interactively from Human Teachers.
- Judah, K., Fern, A., and Dietterich, T., Active Imitation Learning via State Queries, ICML 2011 Workshop on Combining Learning Strategies to Reduce Label Cost.
- Judah, K., Fern, A., and Dietterich, T., Reinforcement Learning via Practice and Critique Advice, AAMAS 2010 Workshop on Agents Learning Interactively from Human Teachers.
- Natarajan, S., Kunapuli, G., Judah, K., Tadepalli, P., Kersting, K., and Shavlik, J., Multi-Agent Inverse Reinforcement Learning, International Conference on Machine Learning and Applications (ICMLA), 2010.
- Wilson, A., Fern, A., and Tadepalli, P., A Bayesian Approach to Policy Learning from Trajectory Preference Queries, Advances in Neural Information Processing Systems (NIPS), 2012.
- Wilson, A., Fern, A., and Tadepalli, P., A Behavior Based Kernel for Policy Search via Bayesian Optimization, ICML Workshop on Planning and Acting with Uncertain Models.
- Wilson, A., Fern, A., and Tadepalli, P., Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach, ICML Workshop on Unsupervised and Transfer Learning, 2011.
- Wilson, A., Fern, A., and Tadepalli, P., Incorporating Domain Models into Bayesian Optimization for Reinforcement Learning, European Conference on Machine Learning (ECML), 2010.
- Wilson, A., Fern, A., and Tadepalli, P., Bayesian Policy Search for Multiagent Role Discovery, National Conference on Artificial Intelligence (AAAI), 2010.
- Wilson, A., Fern, A., Ray, S., and Tadepalli, P., Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach, International Conference on Machine Learning (ICML), 2007.
Funding source:
ONR