Bayesian Networks and Belief Functions

Smets Philippe (psmets@ulb.ac.be)
Thu, 10 Jun 1999 18:36:04 +0100

Philippe Smets:

Lao Tse said: "Knowing Ignorance is Strength".
And somebody else said: 'Ignorance is beautiful: once lost, it can never be
recovered'.

I read with interest the discussion generated by Rolf Haenni's mail.
I would like to submit the following comments on two topics:
1) About sequences of 0 and 1.
2) About representing ignorance.

***********************************************
1) About sequences of 0 and 1.

It seems many probabilists consider that every sequence of 0 and 1 is
somehow random.
So let us discuss such 0-1 sequences.

A very rough typology would be:

Set 1: the set of all 0-1 sequences

Set 2: the set of 0-1 sequences whose proportion of 1s converges to a given
value as the length of the sequence increases (denote it prop(1); it belongs
of course to [0,1]).
Set 2 is a strict subset of Set 1: I can build a 0-1 sequence where prop(1)
oscillates between 1/3 and 2/3 for ever.
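Such a non-converging sequence is easy to construct explicitly. Here is a
minimal Python sketch (the block-growing scheme is my own illustration, not
taken from any particular source): append 1s until the running proportion
reaches 2/3, then 0s until it falls to 1/3, and so on for ever.

```python
def oscillating_sequence(n):
    """Generate a 0-1 sequence of length n whose running proportion of 1s
    keeps oscillating between 1/3 and 2/3, so prop(1) never converges."""
    seq = []
    ones = 0
    appending_ones = True
    while len(seq) < n:
        if appending_ones:
            seq.append(1)
            ones += 1
            if ones * 3 >= 2 * len(seq):   # proportion has reached 2/3
                appending_ones = False
        else:
            seq.append(0)
            if ones * 3 <= len(seq):       # proportion has fallen to 1/3
                appending_ones = True
    return seq

seq = oscillating_sequence(1000)
props = [sum(seq[:i + 1]) / (i + 1) for i in range(len(seq))]
print(min(props[100:]), max(props[100:]))  # keeps revisiting ~1/3 and ~2/3
```

The blocks grow geometrically, so the oscillation persists at every length:
this sequence is in Set 1 but not in Set 2.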

Set 3: the set of 0-1 sequences whose proportion of 1s converges to a given
value and that are 'random'. What is 'random' or 'stochastic'? How do you
recognize that a sequence is 'random'? Not an easy task, as illustrated by
the difficulty encountered when people tried to define the concept of
'complexity' for such sequences. Set 3 is a strict subset of Set 2:
in Set 2 we have deterministic sequences like 1 1 1 1 1 1 1 1 1... that
few people will defend as being 'random'.

Set 4: the set of 0-1 sequences like those generated by Bernoulli trials
(here the concept of conditional independence between successive outcomes is
introduced). These are the sequences generated by coin tossing, urn
sampling...

In Set 4, we can define several subsets according to what is known by me
about the probability p(1) that the next outcome is 1:
Set 4.1: we only know that p(1) belongs to some subset of [0,1]
Set 4.2: we have a meta-probability about the value of p(1)
Set 4.3: we know exactly the value of p(1)
Set 4.3 is a subset of Set 4.2, itself a subset of Set 4.1.

Now let us go back to the problem about 'I don't know' raised by David Poole.
He seems to be focussing on set 4 sequences.
There are two cases:
Case 1: the situation where you are in 4.3 and p(1) = .5 (coin tossing
experiments)
Case 2: called here 'total ignorance' where we are in set 4.1 and where the
subset is [0,1].

Note 1: For those who do not know about belief function and the
transferable belief model (TBM), i.e.,
1) the difference between the credal level where beliefs are only
entertained and represented by belief functions and the pignistic level
where decisions are taken and uncertainty is represented by probability
functions (called the pignistic probability and denoted BetP),
2) how BetP is derived from the belief function that represents your
beliefs at the credal level,
we suggest reading the paper: Smets Ph. and Kennes R., The transferable
belief model, Artificial Intelligence 66 (1994) 191-234.

With belief functions and within the TBM, you describe these two situations as:
Case 1: bel(1) = pl(1) = .5, and BetP(1) = .5
Case 2: bel(1) = 0, pl(1) = 1, and BetP(1) = .5
Indeed in both cases, I would bet identically (with .5) (at the pignistic
level I have the same probability function) but I don't share the same
belief at the credal level.
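The two representations can be checked mechanically with a small sketch of
the pignistic transform, BetP(x) = sum over focal sets A containing x of
m(A)/|A| (a minimal Python sketch; encoding a mass function as a dict from
frozensets to masses is my own convention, not notation from the paper):

```python
from fractions import Fraction

def betp(m, frame):
    """Pignistic transform of a mass function m (dict: frozenset -> mass):
    BetP(x) = sum_{A : x in A} m(A) / |A|."""
    return {x: sum(mass / len(A) for A, mass in m.items() if x in A)
            for x in frame}

frame = frozenset({0, 1})
# Case 1: Bayesian mass function -> bel(1) = pl(1) = 1/2
case1 = {frozenset({0}): Fraction(1, 2), frozenset({1}): Fraction(1, 2)}
# Case 2: vacuous mass function -> bel(1) = 0, pl(1) = 1
case2 = {frozenset({0, 1}): Fraction(1)}
print(betp(case1, frame), betp(case2, frame))  # BetP(1) = 1/2 in both
```

Both mass functions yield BetP(1) = .5, although they encode very different
credal states.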
Can we discover the difference between the two cases? Easily.
Repeat the experiment ten times; you observe 3 zeros and 7 ones. How would
you bet on the next outcome?
In Case 1: with probability .5 (you knew p(1) = .5, so it stays as it is).
In Case 2: probably you will defend something close to .7 for the next
outcome (the observed frequency is 7/10; with an equi-prior on [0,1],
Laplace's rule of succession gives (7+1)/(10+2) = 2/3).
So the difference is clear. It does not show up at the static level, but
once the dynamic behavior of the system of beliefs is considered, the
differences show up clearly.
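The dynamic contrast can be sketched numerically (Python; using Laplace's
rule of succession under a uniform prior is one standard way to formalize
the 'equi-prior on [0,1]', and the function name is mine):

```python
from fractions import Fraction

def next_outcome_bet(ones, n, known_p=None):
    """Bet that the next outcome is 1, after observing `ones` 1s in n trials.
    Case 1: p(1) is known exactly, so the data change nothing.
    Case 2: total ignorance about p(1); with a uniform prior on [0,1],
    Laplace's rule of succession gives (ones + 1) / (n + 2)."""
    if known_p is not None:
        return known_p
    return Fraction(ones + 1, n + 2)

print(next_outcome_bet(7, 10, known_p=Fraction(1, 2)))  # Case 1: stays 1/2
print(next_outcome_bet(7, 10))                          # Case 2: 2/3
```

Identical bets before the data, different bets after: the difference between
the two credal states is revealed by the dynamics.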

Now MY question: we should be able to handle also the series in sets 1 and
2 (we neglect set 3 as it would mean discussing about correlations, and
this is not the topic under consideration). I don't see how probability
theory could be used in such cases.

Remember that in Set 1 there are series where the limit of the proportion of
1s does not exist, and in Set 2 there are also non-random series.

In Set 1, the concept of p(1) is not defined, so speaking of an imprecise
probability seems nonsensical. All you might speak of is the way you would
bet on the next outcome, and call that the 'probability', but does this
number reflect your beliefs? This is what I object to in the TBM: the
betting quotient tells us how we bet, not what we believe. When it comes to
a non-repeatable experiment, I feel that we should act as if we were facing
a Set 1 case where the sequence is of length 1. I feel the TBM models such
cases nicely, but other theories like possibility theory might also be
defended. Probability theory would have difficulty representing total
ignorance (forget about p(1) = .5; this is not the probability an agnostic
gives to the existence of god; see also my comments below about the unknown
proposition).

In set 2, you would assume by default that the sequence is random, and act
as if you were facing a set 4 case. This is a realistic option, but is it
correct?

Note 2: Beware: the TBM is NOT a model where:
1) we have a probability function whose value is only known to lie in a
given subset of all possible probability functions (as would be the case
for set 4.1);
2) we have a probability distribution on a space X and a one-to-many
mapping from space X to another space Y, and we ask for the induced
probability function on Y (a random set interpretation that fits with
Dempster's model, Kohlas and Monney's hints model, and Rolf Haenni's
'Probabilistic Argumentation Systems'; Rolf is Kohlas's collaborator).

The TBM intends only to represent, in a quantified way, the strength of the
beliefs held by an agent, but it does not simply accept the probability
axioms; in particular, the axiom 'proba(not A) is a function of proba(A)'
is not assumed, as it is by Cox and Jaynes (who just state it and accept it
without worrying about justifying its necessity). (The TBM is close to
Shafer's approach as described in his book.)

***********************************************
2) We present now an example where ignorance seems better handled by belief
functions: beliefs and bets on unknown propositions (published in the
Artificial Intelligence paper on the TBM).

A-context
Suppose 2 pieces of paper labeled A1 and A2.
On A1, I write a proposition (all propositions are verifiable), but I don't
tell you what the proposition is.
On A2, I write the negation of the proposition written on the paper labeled A1.
What is your belief, and how would you bet?
I suppose the Bayesian solution will be: p(A1) = p(A2) = .5 (which is of
course reasonable).
The TBM solution: bel(A1) = bel(A2) = 0, bel(A1 or A2) = 1, BetP(A1) =
BetP(A2) = .5

B-context
Suppose 3 pieces of paper labeled B1, B2 and B3.
On B1, I write a proposition, but I don't tell you what the proposition is.
On B2, I write another proposition, but I don't tell you what it is.
On B3, I write still another proposition, but I don't tell you what it is.
All you know is that the propositions are pairwise inconsistent and that
their disjunction B1 or B2 or B3 is a tautology.
What is your belief, and how would you bet?
I suppose the Bayesian solution will be: p(B1) = p(B2) = p(B3) = 1/3.
The TBM solution: bel(B1) = bel(B2) = bel(B3) = bel(B1 or B2) = bel(B1 or
B3) = bel(B2 or B3) = 0, bel(B1 or B2 or B3) = 1, BetP(B1) = BetP(B2) =
BetP(B3) = 1/3

AB-context:
You learn that the proposition written on paper A1 is the same as the one
on paper B1.
How would you adapt your beliefs and probabilities?

For the Bayesian: which solution will you choose, p(A1) = p(B1) = .5, or
1/3, or??? Beware that whichever you decide on, I'll challenge your choice
by asking why not the other.
For the TBM: bel as before; indeed bel(A1) = bel(B1) as it should be (and
both = 0). The pignistic probabilities are different, but this reflects
only the fact that we bet on different frames. What is required is that
after learning A1 == B1, both A1 and B1 share the same belief, as they
represent the same proposition.
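The pignistic numbers in the two contexts can be checked mechanically (a
minimal Python sketch; `betp_vacuous` is my own name for the pignistic
transform of the vacuous belief function):

```python
from fractions import Fraction

def betp_vacuous(frame):
    """Pignistic transform of the vacuous belief function on `frame`
    (all mass on the whole frame): bel of every strict subset is 0,
    yet BetP distributes the unit mass uniformly, 1/|frame| per element."""
    return {x: Fraction(1, len(frame)) for x in frame}

print(betp_vacuous({'A1', 'A2'}))        # BetP = 1/2 each
print(betp_vacuous({'B1', 'B2', 'B3'}))  # BetP = 1/3 each
```

The beliefs agree across the two frames (all zero on strict subsets), while
the pignistic probabilities differ with the frame on which the bet is set
up, which is exactly the point of the AB-context.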

***********************************************
To Kevin van Horn.
'The use of group invariance arguments' is a nice mathematical argument,
but is it required when you want to represent beliefs? I feel it is
essentially an assumption justified only for convenience (and mathematically
very brilliant).

***********************************************
For details on how to handle cases belonging to set 4.1 when observations
have been collected, see the paper: Smets Ph., Beliefs induced by imprecise
probabilities, published in Uncertainty in AI, mid-90's (title and year
approximate).
***********************************************

********************************************
Philippe Smets

email: psmets@ulb.ac.be

IRIDIA-CP 194/6
Universite Libre de Bruxelles
50 av. Roosevelt,
1050 Bruxelles, Belgium.

tel 32 2 650 27 29 secretary,
32 2 344 82 96 private (where I am usually)
fax: 32 2 650 27 15
GSM: 32 495 50 10 72
********************************************