Re: Bayesian Networks and Belief Functions

Judea Pearl (judea@cs.ucla.edu)
Tue, 8 Jun 1999 19:53:45 -0700 (PDT)

Hi all,

The latest discussion of belief functions (BF) has reminded me of
the difficult time I had in 1987-88, when I struggled to
understand the Dempster-Shafer (D-S) theory and to relate it
to Bayesian analysis. Fortunately for me, that struggle
ended in a simple interpretation of belief functions
that settled all my difficulties,
put my mind at ease, and clearly delineated the type of
applications where BF would be useful.
This interpretation, which identifies BF's with
"PROBABILITY OF NECESSITY" (or probability of provability),
has helped me turn endless philosophical arguments into fruitful
discussions with the proponents of D-S theory.
I think it can also add clarity to the
latest discussion that bloomed from Smets' innocent remark.

Enclosed is a subsection from Chapter 9 of my book
"Probabilistic Reasoning in Intelligent Systems" (1988), p. 427,
in which the ideas behind belief-functions, Dempster's rule,
and probability of necessity are illustrated in a simple example.

I hope it helps,
=========Judea

-----------------annotated section -----------
.sp
.SH "TRUTH, POSSIBILITY, AND PROVABILITY"
.IX Possibility
.IX Provability
.P 0
The notion of
provability normally reflects mathematical
artifice rather than empirical reality.
Thus, the preceding discussion might leave
the erroneous impression that the D-S theory deals with a
totally unrealistic model of the world and provides
answers to contrived, uninteresting queries about
such a model.
There are, however, cases where compatibility relations are
natural representations of world knowledge,
and where queries concerned with the probability of
provability rather than the probability
of truth are the ones that we
wish to ascertain.
.P
For example,
suppose we face the problem of
scheduling classes; we have a set of teachers, a set of
topics, a set of classrooms, and a set of time
slots, and our task is to cover all of the topics with the
available resources. Before we actually select
an
assignment (if one exists), our knowledge is
represented in the form of constraints: some
teachers cannot teach certain topics, some classrooms
are unavailable at certain times, etc. On
the basis of this knowledge, all we can answer are
questions about possibilities,
.IX Possibilities
e.g., whether it is
possible
to take topic $x$ in time slot $y$, whether
it is feasible to get teacher $z$
for topic $w$, etc. To answer such queries, we need to
search the space of feasible assignments (i.e.,
those satisfying all of the constraints) and test whether there
is an assignment that satisfies the query. If a query
$Q$ is satisfied by at least one feasible assignment, we
say that $Q$ is
.ul
possible;
if it is satisfied by all feasible assignments, we
say it is
.ul
necessary
or
.ul
provable.
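The possible/necessary distinction above can be sketched in a few lines of
code. This is a minimal illustration, not part of the original text: the
teachers, topics, and constraints below are invented, and the search is a
brute-force enumeration of assignments.

```python
from itertools import product

teachers = ["alice", "bob"]   # hypothetical resources
slots = ["mon", "tue"]

def feasible(x):
    """Hard compatibility constraints on an assignment
    (a dict: topic -> (teacher, slot))."""
    if x["math"][0] == "bob":        # bob cannot teach math
        return False
    if x["math"] == x["art"]:        # one teacher cannot hold two classes at once
        return False
    return True

candidates = [{"math": m, "art": a}
              for m in product(teachers, slots)
              for a in product(teachers, slots)]
feas = [x for x in candidates if feasible(x)]

def possible(q):    # Q holds in at least one feasible assignment
    return any(q(x) for x in feas)

def necessary(q):   # Q holds in every feasible assignment, i.e. Q is provable
    return all(q(x) for x in feas)

q = lambda x: x["math"][0] == "alice"    # "alice teaches math"
print(possible(q), necessary(q))         # True True: forced, since bob cannot
```

Note that necessity implies possibility here only because at least one
feasible assignment exists; an overconstrained problem makes every query
vacuously "necessary".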
.P
Probabilistic measures enter in the
form of partial constraints. For example, suppose
one of the teachers states, "Due to a medical
problem, there is an 80 percent chance that I will be
unable to teach Mondays and Thursdays." This
information imposes on the problem an additional constraint\*(EMthis
time probabilistic in nature,
similar to the action of the switch in Figure 9.3.
It assigns a weight of $m ~=~ 0.80$ to a narrower set
of feasible assignments and a weight of $m ~=~ 0.20$
to the original set, in which the teacher was
presumed to be available on Mondays and Thursdays.
It is natural now for some other teacher to
ask,
"What are the chances that I will be able to teach
mathematics?" or,
"What are the chances that I will be forced
to teach on Tuesday evening?" These queries, concerning
possibilities and necessities, translate directly
into the D-S measures of $Pl (Q)$ and $Bel (Q)$,
respectively.

****Note added: As we see, Bel(Q) has nothing to do with
our belief that Q is true, and the name "belief function"
is somewhat misleading. Bel(Q) stands for the probability
that Q is provable given (1) a set of hard constraints and
(2) probabilities on some variables in that set.
Behind every talk of BF's there is always an implicit set
of hard constraints.
***** end of note

To answer such queries, we must
solve the assignment problem for each state of the
switch,
determine for each state whether the query is
provable, possible, or impossible, and then compute the
desired probability using the weight $m$.
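The computation just described can be sketched directly. The toy constraint
sets below are illustrative stand-ins for the scheduling problem, with the
0.80/0.20 weights from the text playing the role of the switch:

```python
def bel_and_pl(query, weighted_states, candidates):
    """Bel(Q): total weight of switch states in which Q is provable
    (holds in every feasible assignment); Pl(Q): total weight of
    states in which Q is possible (holds in some feasible assignment)."""
    bel = pl = 0.0
    for weight, constraints in weighted_states:
        feas = [x for x in candidates if all(c(x) for c in constraints)]
        if feas and all(query(x) for x in feas):
            bel += weight       # Q provable under this switch state
        if any(query(x) for x in feas):
            pl += weight        # Q possible under this switch state
    return bel, pl

# Illustrative: which slot a teacher gets.  With weight 0.8 the extra
# constraint "not Monday" is active; with weight 0.2 it is not.
slots = ["mon", "tue"]
states = [(0.8, [lambda s: s != "mon"]),
          (0.2, [])]
q = lambda s: s == "tue"        # "will I be forced to teach Tuesday?"
bel, pl = bel_and_pl(q, states, slots)
print(bel, pl)                  # Bel(Q) = 0.8, Pl(Q) = 1.0
```

Bel(Q) = 0.8 here because only in the weight-0.8 state does every feasible
assignment put the teacher on Tuesday; Pl(Q) = 1.0 because Tuesday remains
possible in both states.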
.P
Can such answers be computed by a pure, single-level
Bayesian model? Not really. We could of course
assume a uniform distribution over the
feasible assignments in each state, combine the
two
distributions with the appropriate weights\*(EM0.20
and 0.80\*(EMand then calculate
the resultant probability of the query sentence $Q$
(e.g., that teacher $x$ will be assigned to teach on
Tuesdays).
However, this is not the same as the probability
that $Q$ is possible, or the probability that it is necessary,
and the difference might be significant. For example, the
inquirer might have a strong aversion to
teaching on Tuesdays and hence be determined to talk his
way out of any such assignment if
a feasible alternative exists.
He is, therefore, concerned
not with the probability of being assigned a Tuesday
time slot, but rather with the probability
that such an assignment will be necessary for lack of
an alternative. In such cases, the purely probabilistic
approach will not provide the desired answer.
Queries regarding probability of
possibility (or necessity) require two levels
of knowledge and hence
cannot be answered by treating the
compatibility constraints as probability statements
having value 0 or 1. Rather, the compatibility
constraints must remain outside the probabilistic
model and serve as object-level theories
which, in themselves, are assigned probabilistic measures.
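The gap between the two measures can be seen numerically. In the sketch
below (same style of toy assumptions: a single teacher's slot, switch
weights 0.8/0.2, all names illustrative), the single-level Bayesian mixture
of uniform distributions gives one number while the probability of
necessity gives another:

```python
slots = ["mon", "tue"]
states = [(0.8, [lambda s: s != "mon"]),   # teacher unavailable Mondays
          (0.2, [])]                       # original constraint set
q = lambda s: s == "tue"                   # "assigned to Tuesday?"

p_q = bel_q = 0.0
for w, cons in states:
    feas = [s for s in slots if all(c(s) for c in cons)]
    p_q += w * sum(1 for s in feas if q(s)) / len(feas)  # uniform per state
    if all(q(s) for s in feas):
        bel_q += w        # weight of states in which Q is necessary

print(round(p_q, 3), round(bel_q, 3))   # 0.9 vs 0.8
```

The uniform-mixture probability of being assigned Tuesday is 0.9, but the
probability of being *forced* to teach Tuesday, the quantity the reluctant
teacher actually cares about, is only 0.8.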
.sp
.H 3 "Dempster's Rule of Combination"
.IX "Dempster's rule"
.P 0
When several pieces of evidence are available, their
impacts are combined by assuming that the
corresponding switches act independently of each
other.
***** Note added:
Here the section continues by demonstrating that
the normalization involved in Dempster's rule turns
the interpretation of Bel(Q) into:
"The probability that Q is provable, conditional on
the pieces of evidence being (1) independent
and (2) non-contradictory."
Further, conditioning on the evidence being non-contradictory
may have funny, unintended consequences. For example,
a model from which an important piece of information is
missing may yield Bel(Q)=Pl(Q), thus giving the false
impression that we are in possession of sufficient
information for determining the probability of Q.
**** end of note ************************
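For concreteness, the combination rule can be sketched as follows. The
frame and mass functions are invented for illustration; the division by
one minus the conflict mass is exactly the conditioning on the evidence
being non-contradictory that the note warns about:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass), assuming the
    underlying switches act independently; normalization discards the
    mass assigned to contradictory (empty) intersections."""
    raw = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        raw[a & b] = raw.get(a & b, 0.0) + wa * wb
    conflict = raw.pop(frozenset(), 0.0)   # mass on contradictions
    if conflict >= 1.0:
        raise ValueError("totally contradictory evidence")
    return {s: w / (1.0 - conflict) for s, w in raw.items()}

# Two nearly contradictory pieces of evidence over the frame {x, y}:
m1 = {frozenset({"x"}): 0.9, frozenset({"x", "y"}): 0.1}
m2 = {frozenset({"y"}): 0.9, frozenset({"x", "y"}): 0.1}
m12 = dempster_combine(m1, m2)

# 0.81 of the product mass lands on the empty set and is silently
# renormalized away; the result sums to 1 as if no conflict occurred.
print({tuple(sorted(s)): round(w, 3) for s, w in m12.items()})
```

The renormalization hides how conflicting the inputs were, which is one
concrete way the "funny, unintended consequences" mentioned in the note
can arise.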