Looking for papers: RL + Bayes Nets

Marco Wiering (marco@idsia.ch)
Wed, 21 Oct 1998 12:50:50 +0200

Apologies if you receive multiple copies.

I have received or collected the following three papers
on combining RL and Bayesian networks.
Unfortunately, it seems that not much research has been
done in this direction.

(1)

@inproceedings{Andre:98,
author = {David Andre and Nir Friedman and Ronald Parr},
title = {Generalized Prioritized Sweeping},
booktitle = {Advances in Neural Information Processing Systems 10},
editor = {M.I. Jordan and M.J. Kearns and S.A. Solla},
publisher = {MIT Press, Cambridge MA},
pages = {1001--1007},
year = {1998}}

The paper introduces GPS, an algorithm that extends Prioritized Sweeping
(Moore and Atkeson, 1993), a model-based reinforcement learning method for
updating Q-values with tabular representations. GPS allows the model to be
represented in other ways, for example by dynamic Bayesian networks.
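For readers unfamiliar with the base algorithm, here is a minimal sketch of classic tabular prioritized sweeping (not GPS itself). It assumes a deterministic environment, so the learned model stores a single (reward, next state) pair per state-action pair; the class name and the parameters `theta` and `n_planning` are illustrative choices, not taken from either paper.

```python
import heapq
import itertools
from collections import defaultdict


class PrioritizedSweeping:
    """Tabular prioritized sweeping with a learned deterministic model (sketch)."""

    def __init__(self, actions, gamma=0.95, alpha=0.5, theta=1e-4, n_planning=10):
        self.actions = list(actions)
        self.gamma, self.alpha, self.theta = gamma, alpha, theta
        self.n_planning = n_planning          # max planning backups per real step
        self.q = defaultdict(float)           # (state, action) -> Q-value
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.predecessors = defaultdict(set)  # state -> {(prev_state, action), ...}
        self.queue = []                       # heap of (-priority, tie, state, action)
        self.tie = itertools.count()          # tie-breaker so states are never compared

    def _target(self, reward, next_state):
        # One-step Q-learning target computed from the learned model.
        return reward + self.gamma * max(self.q[(next_state, b)] for b in self.actions)

    def _push(self, state, action, priority):
        # Only queue pairs whose value would change noticeably.
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, next(self.tie), state, action))

    def observe(self, state, action, reward, next_state):
        """Record a real transition, then run prioritized planning backups."""
        self.model[(state, action)] = (reward, next_state)
        self.predecessors[next_state].add((state, action))
        self._push(state, action, abs(self._target(reward, next_state) - self.q[(state, action)]))
        for _ in range(self.n_planning):
            if not self.queue:
                break
            _, _, s, a = heapq.heappop(self.queue)
            r, s_next = self.model[(s, a)]
            self.q[(s, a)] += self.alpha * (self._target(r, s_next) - self.q[(s, a)])
            # Q(s, a) changed, so re-prioritize every pair known to lead into s.
            for ps, pa in self.predecessors[s]:
                pr, _ = self.model[(ps, pa)]
                self._push(ps, pa, abs(self._target(pr, s) - self.q[(ps, pa)]))


# Usage: a 4-state chain 0 -> 1 -> 2 -> 3 with reward 1 only on reaching state 3.
agent = PrioritizedSweeping(actions=["right"])
for s in range(3):
    agent.observe(s, "right", 1.0 if s == 2 else 0.0, s + 1)
# The single rewarding transition has already propagated back to state 0.
print({s: round(agent.q[(s, "right")], 4) for s in range(3)})
```

The priority queue is what distinguishes this from plain Dyna-style planning: backups are spent on the pairs whose values are most out of date. For simplicity the sketch allows duplicate queue entries for the same pair, which only costs a few redundant backups.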

(2)

@techreport{Tadepalli:98,
author = {P. Tadepalli and D. Ok},
title = {Model-based Average Reward Reinforcement Learning},
institution = {Oregon State University, Corvallis},
year = {1998}}

This lengthy paper describes, among other things, the use of dynamic Bayesian
networks to compactly store the model for H-learning, a model-based RL method
that maximizes the average reward per time step. The DBN storing the model can
then be combined with local linear regression to approximate the value
function. This combination shows promising results on an automated guided
vehicle (AGV) domain.
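To illustrate why a DBN stores the model compactly, here is a minimal sketch of a generic factored transition model: P(s' | s, a) is the product of one small conditional table per next-state variable. This is not the specific network from the paper; the variables `at_dock` and `loaded` and all probabilities are hypothetical.

```python
class FactoredTransitionModel:
    """P(s' | s, a) stored as one small conditional table per next-state variable,
    as in a two-slice dynamic Bayesian network (sketch)."""

    def __init__(self, parents, cpts):
        # parents[action][var]: tuple of current-state variables that var' depends on
        # cpts[action][var][parent_values]: {next_value: probability}
        self.parents = parents
        self.cpts = cpts

    def prob(self, state, action, next_state):
        """P(next_state | state, action) as a product over next-state variables."""
        p = 1.0
        for var, next_value in next_state.items():
            parent_values = tuple(state[u] for u in self.parents[action][var])
            p *= self.cpts[action][var][parent_values].get(next_value, 0.0)
        return p


# Hypothetical two-variable AGV-like example for a single "move" action.
parents = {"move": {"at_dock": ("at_dock",), "loaded": ("loaded",)}}
cpts = {"move": {
    # Moving leaves (or reaches) the dock with probability 0.9.
    "at_dock": {(True,): {True: 0.1, False: 0.9}, (False,): {True: 0.9, False: 0.1}},
    # Moving never changes whether the vehicle is loaded.
    "loaded": {(True,): {True: 1.0}, (False,): {False: 1.0}},
}}
model = FactoredTransitionModel(parents, cpts)
s = {"at_dock": True, "loaded": True}
print(model.prob(s, "move", {"at_dock": False, "loaded": True}))
```

With n binary state variables that each depend on at most k parents, the factored form needs about n * 2^k table rows per action instead of a full 2^n x 2^n transition table, which is where the savings come from.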

(3)

@techreport{D'Ambrosio:97,
author = {B. D'Ambrosio},
title = {POMDP Learning using Qualitative Belief Spaces},
institution = {Oregon State University, Corvallis},
year = {1997}}

The paper presents K-abstraction, a method for quantizing a belief state
space, after which Q-learning is used to learn an approximation of the
Q-function. Experiments are performed on a diagnosis/online maintenance task
with 256 states, which is modelled with a belief network. The results show
that the method can quickly improve its policy.
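The general pattern (quantize the belief state, then run ordinary tabular Q-learning over the quantized cells) can be sketched as below. The summary does not say how K-abstraction quantizes beliefs, so `quantize_belief` swaps in a hypothetical scheme that maps each probability to its order of magnitude; it should not be read as the paper's method. All names and parameters are illustrative.

```python
import math
import random
from collections import defaultdict


def quantize_belief(belief, max_level=3):
    """Hypothetical quantizer: each probability -> order of magnitude, capped."""
    levels = []
    for p in belief:
        if p <= 0.0:
            levels.append(max_level)
        else:
            levels.append(min(max_level, int(math.floor(-math.log10(p)))))
    return tuple(levels)


class QuantizedBeliefQLearner:
    """Tabular Q-learning whose 'states' are quantized belief vectors (sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)        # (quantized_belief, action) -> Q-value
        self.rng = random.Random(seed)

    def act(self, belief):
        # Epsilon-greedy action selection in the quantized belief cell.
        key = quantize_belief(belief)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(key, a)])

    def update(self, belief, action, reward, next_belief, done=False):
        # Standard one-step Q-learning backup on the quantized cells.
        key, next_key = quantize_belief(belief), quantize_belief(next_belief)
        best_next = 0.0 if done else max(self.q[(next_key, b)] for b in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(key, action)]
        self.q[(key, action)] += self.alpha * td_error


# Usage with a hypothetical belief over (fault A, fault B, no fault).
learner = QuantizedBeliefQLearner(actions=["test", "repair"])
belief = [0.7, 0.25, 0.05]
learner.update(belief, "repair", 1.0, [0.9, 0.09, 0.01], done=True)
# A nearby belief falls into the same cell and shares the learned value.
print(quantize_belief([0.6, 0.35, 0.05]) == quantize_belief(belief))
```

The coarse cells are what make tabular Q-learning feasible on a continuous belief space: many nearby beliefs share one table entry, at the cost of ignoring differences within a cell.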

---------------
Marco Wiering
IDSIA
Switzerland
---------------