Query Distribution

Russ Greiner (greiner@cs.ualberta.ca)
Sun, 4 Oct 1998 22:50:46 -0600

Dear UAI People,

There are now a number of deployed systems that use belief nets to
answer queries -- ie, to compute the posterior probability of some
variable(s), based on some specified set of evidence. It would be very
useful to know the actual distribution of queries posed to such real-world
systems; eg, how often the user asks
"What is the probability of cancer, given SoreThroat=T and Age>42 ?",
vs
"What is the probability of cancer, given SoreThroat=F, Lump=F and Gender=M ?"
vs
"What is the prior probability of hepatitis ?"
etc etc etc.
We could then use this "query distribution", for example, to compute the
*average efficiency* of some algorithm for evaluating belief nets,
where the "average" is wrt this real-world distribution [1];
this would be useful when validating the effectiveness of some algorithm,
or when comparing two different algorithms.
We could also use such query distributions to compute the
*average (sum-squared) accuracy* (as well as average efficiency)
of some approximation scheme,
or even the
*average (sum-squared) accuracy* of some learning algorithm [2].
This would improve on today's empirical studies, which can use only
synthesized data.
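
As a minimal sketch of the averaging described above, assume a query log is
available as a list of (target variable, evidence) pairs. The function names,
the log contents, and the toy cost model below are all hypothetical and chosen
only for illustration; a real study would substitute the measured running time
(or error) of the algorithm under test.

```python
from collections import Counter


def empirical_query_distribution(log):
    """Estimate P(query) as the relative frequency of each query in the log."""
    counts = Counter(log)
    total = sum(counts.values())
    return {query: n / total for query, n in counts.items()}


def average_cost(distribution, cost):
    """Expected cost of answering a query drawn from the query distribution."""
    return sum(p * cost(query) for query, p in distribution.items())


# Hypothetical log: each query is (target variable, frozenset of evidence).
log = [
    ("Cancer", frozenset({("SoreThroat", True)})),
    ("Cancer", frozenset({("SoreThroat", True)})),
    ("Cancer", frozenset({("SoreThroat", False), ("Lump", False)})),
    ("Hepatitis", frozenset()),  # a prior query: no evidence
]


def toy_cost(query):
    """Made-up cost model: one unit, plus one unit per evidence item."""
    _, evidence = query
    return 1 + len(evidence)


dist = empirical_query_distribution(log)
avg = average_cost(dist, toy_cost)
print(avg)
```

Two algorithms could then be compared by calling average_cost with the same
distribution and each algorithm's own cost function.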

Hence this request:

Please let me know if you can provide some real-world *query distributions*
-- eg, if you have maintained a record of the queries that are actually
posed to a real system, or perhaps have stored a set of session transcripts
or log files of a system's interactions with its users.

To avoid confusion, note that this QUERY DISTRIBUTION cannot necessarily be
inferred from the given belief net B, as the query distribution might be
completely unrelated to the "NATURAL DISTRIBUTION" of events, encoded by B.
Eg, we may ask many queries about low-probability events -- the probability of
the QUERY
"What is the probability of cancer?"
may be very high, even though the actual probability of
Cancer
is very low.
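
The distinction can be illustrated with a small sketch. All numbers here are
invented for the example: the marginals stand in for what a belief net B might
encode, and the log stands in for what users might actually ask.

```python
# Assumed marginal probabilities of two events under B (made-up numbers).
natural = {"Cancer": 0.01, "Cold": 0.30}

# Assumed record of which variable each user query asked about (made up).
query_log = ["Cancer"] * 9 + ["Cold"] * 1

# Relative frequency with which each variable is the target of a query.
query_freq = {v: query_log.count(v) / len(query_log) for v in natural}

for v in natural:
    print(v, "P(event) =", natural[v], " P(query) =", query_freq[v])
```

In this toy setting Cancer is the rarer event but the more frequent query, so
the query frequencies could not have been read off from the marginals.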

Thank you.

Cheers,
Russ Greiner

[1] E. Herskovits and G. Cooper,
"Algorithms for Bayesian belief-network precomputation",
Methods of Information in Medicine, 1991.

[2] R. Greiner, A. Grove and D. Schuurmans,
"Learning Bayesian Nets that Perform Well",
UAI-97.

| Russell Greiner Phone: (403) 492-5461 |
| Dep't of Computing Science FAX: (403) 492-1071 |
| University of Alberta Email: greiner@cs.ualberta.ca |
| Edmonton, AB T6G 2H1 Canada http://www.cs.ualberta.ca/~greiner/ |
