Re: Total Ignorance

Kathryn Blackmond Laskey (klaskey@gmu.edu)
Thu, 10 Jun 1999 11:08:01 -0400

Thank you, Jonathan, for a very clear example.

Bel(R) corresponds to the probability that a person's hat-and-glasses
status "proves" his/her prize status. In other words, without knowing
anything about the gender proportions or their relationship to hats and
glasses, but knowing only that hat-wearing and glasses-wearing are
independent, for 72% of the people we can "prove" they will get a prize
regardless of their gender. Another way of looking at this is that for
72% of the people, having a sex change operation while leaving the
glasses and the hat as they are would leave their prize status
invariant.
Similarly, 2% of the people can be "proven" not to get a prize
regardless of counterfactual manipulation of their gender; Pl(R) = .98
is just one minus this 2%.

While semantically meaningful (if a bit mind-contorting), this is
hardly something one would care about for this problem.

Can we find an example where someone might care about these 72% and 98%
bounds? What if we recast this example in Judea's room-scheduling or Rolf's
argumentation system domain? If we can do this, it might help us to
characterize the set of problems for which belief functions are
appropriate. It might also give us a handle on whether the extra formalism
has practical value to balance the danger of producing nonsense with
mathematical panache.

Here's a stab. We are trying to schedule classrooms.

All daytime engineering classes are scheduled in Building A.
All daytime liberal arts classes are scheduled in Building B.
All evening undergraduate classes are scheduled in Building A.
All evening graduate classes are scheduled in Building B.

90% of the daytime classes are engineering. 80% of the evening classes are
undergraduate.

To get the 72% and 98% bounds, assume that whether a class is engineering
or liberal arts is independent of whether it is graduate or undergraduate.

Now for 72% of the classes, we can "prove," just from whether they are
engineering/liberal arts and undergraduate/graduate, that they will be
in Building A. In other words, moving them from daytime to evening or
vice versa wouldn't change the building assignment. Similarly, for 2%
of the classes we can "prove" they are in Building B, and this couldn't
be changed by manipulating the time of day; Pl(Building A) = .98 is
just one minus this 2%.
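For concreteness, here is a quick Python sketch (a minimal
illustration, nothing more) that grinds out these numbers by
enumerating the four engineering/liberal-arts by undergraduate/graduate
cells, reading the 90% and 80% figures as marginal rates that hold
regardless of time slot and assuming the independence just stated:

    from itertools import product

    p_eng, p_ugrad = 0.9, 0.8  # P(engineering), P(undergraduate)

    bel_a = 0.0  # mass of classes provably in Building A, whatever the time
    bel_b = 0.0  # mass of classes provably in Building B
    for eng, ugrad in product([True, False], repeat=2):
        mass = ((p_eng if eng else 1 - p_eng)
                * (p_ugrad if ugrad else 1 - p_ugrad))
        # A daytime class is in A iff engineering; an evening class iff
        # undergraduate. The building is forced only when the two agree.
        if eng and ugrad:            # Building A regardless of time of day
            bel_a += mass
        elif not eng and not ugrad:  # Building B regardless of time of day
            bel_b += mass

    print(bel_a, bel_b, 1 - bel_b)  # Bel(A)=0.72, Bel(B)=0.02, Pl(A)=0.98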

I guess if there were reasons for the above constraints on building
assignments, and if the time assignments were flexible, this might be
useful to know: you would know that you had only limited latitude to
change the building assignments by manipulating time of day.
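To make that latitude concrete, here is a companion sketch (same
assumed numbers as above) showing that a scheduler who could set each
class's time of day at will could push the fraction of classes landing
in Building A anywhere between Bel and Pl, because only the "mixed"
classes are movable:

    # Masses of the four cells, assuming the independence stated above.
    masses = {(True, True): 0.72,    # eng & undergrad: Building A either way
              (True, False): 0.18,   # eng & grad: A if daytime, B if evening
              (False, True): 0.08,   # lib arts & undergrad: A only if evening
              (False, False): 0.02}  # lib arts & grad: Building B either way

    min_in_a = sum(m for (eng, ug), m in masses.items() if eng and ug)
    max_in_a = sum(m for (eng, ug), m in masses.items() if eng or ug)
    print(min_in_a, max_in_a)  # 0.72 and 0.98: exactly the Bel/Pl interval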

It seems, then, that the "probability of provability" interpretation
ties belief functions to counterfactuals. Bel(R) and Pl(R) are bounds
on the probability of R that could be achieved by manipulating Q in a
way that leaves unchanged the causal mechanism by which Q causes R.
The problem with the party example is that it doesn't make semantic
sense to think about manipulating the gender of a person while leaving
his/her hat status unchanged. So uncritical application of belief
functions to that problem produces bounds that are not useful for
anything one would care about.

Whether the bounds given by belief functions have meaning one would care
about would seem in this example to depend on the following structural
assumptions:

- Q (in this case, day/evening) is "causal" to R (in this case, building
assignment) in the sense that manipulating Q changes R.
- A1 (in this case, engineering/liberal arts) and A2 (in this case,
undergraduate/graduate) are factors that influence the manner in which Q
and not-Q, respectively, cause R.
- The bounds of .72 and .98 depend on the assumption of independence of A1
and A2. This assumption is not necessary for belief functions to apply.
As I stated previously, one can obtain different belief functions with
the same conditional beliefs by changing the dependence structure of A1
and A2.
However, independence is typically assumed by practitioners with (I
believe) insufficient understanding of what this assumption means.
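A short sketch of that last point: holding the marginals P(A1) = .9 and
P(A2) = .8 fixed while varying the joint probability P(A1 and A2) over
its feasible range (the Fréchet interval implied by the marginals)
moves both bounds. The three joints below are illustrative choices,
not anything dictated by the problem:

    p_a1, p_a2 = 0.9, 0.8
    lo = max(0.0, p_a1 + p_a2 - 1)  # extreme negative dependence: 0.7
    hi = min(p_a1, p_a2)            # extreme positive dependence: 0.8

    for p_both in (lo, p_a1 * p_a2, hi):      # -dep, independence, +dep
        p_neither = 1 - p_a1 - p_a2 + p_both  # P(not A1 and not A2)
        print(p_both, 1 - p_neither)          # Bel(R), Pl(R)
    # -> roughly (.70, 1.0), (.72, .98), (.80, .90)

Under extreme positive dependence the Bel/Pl interval collapses to
[.8, .9], the very interval the Bayesian analysis gives for an unknown
P(Q); independence is anything but an innocent default.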

Unfortunately, in practice belief functions are often applied to all sorts
of problems for which the probability of provability interpretation is
problematic or outright inappropriate. Its inappropriateness was why I
dropped my attempts to use belief functions to model inference about
friendly/hostile aircraft.

Examining this example in detail reminds me of why I came away in the late
'80s convinced that uncritical application of belief functions to problems
of "incomplete information" was a dangerous "easy way out" of certain hard
modeling problems one can run into. Using belief functions because you
don't know how to model your ignorance "solves" one problem only to create
a new problem of hidden modeling assumptions that most modelers (let alone
subject matter experts) don't fully understand. You can turn the crank and
get an answer without having to specify a prior for those pesky variables
you feel ignorant about. The price you pay is that the applicability of
your answer to the problem depends on assumptions whose implications you
probably wouldn't accept if you understood them. I'd rather make my
assumptions up front and take responsibility for them than bury them
inside the black box of my calculus.

Kathy Laskey

At 12:35 AM -0400 6/10/99, Jonathan Weiss wrote:
>OK, let's try something a little more concrete that doesn't require Bayesian
>priors:
>
>At a dance, there was a special contest with many prizes awarded. The rules
>were quite simple:
>
>All men who wear a hat get a prize.
>All men who don't wear a hat don't get a prize.
>All women who wear glasses get a prize
>All women who don't wear glasses don't get a prize.
>
>Exactly 90 percent of the men wore hats (and therefore got prizes).
>Exactly 80 percent of the women wore glasses (and therefore got prizes).
>
>(Just in case there are any lawyers out there, everyone at the dance
>was either a man or a woman, and nobody was both a man and a woman.
>Sheesh!)
>
>With no further information, what can you say about the overall percentage of
>people at the dance who got prizes?
>
>The answer isn't completely determined because it depends on the ratio of men
>to women at the dance. However, if p is the fraction of attendees that were
>men (0 <= p <= 1), the answer is (.9p + .8(1-p)), which takes on its
>minimum of 80% when the dance has all women and its maximum of 90% when
>the dance has all men. If you can show me how as few as 72% or as many
>as 98% of the attendees received prizes, I'll take off my hat (and
>glasses) to you.
>
>What changes when we replace "Man" by Q, "Woman" by ~Q, "Hat" by A1, "Glasses"
>by A2, and "Prize" by R?
>
>To continue the parallel with Rolf's example (quoted below),
>
>In the dance example, we don't know anything about Q: not its "prior",
>not the proportion of hat-wearers who are men, not the proportion of
>hat-and-not-glasses-wearers who are men, etc.
>
>>For example, if Q depends on A1 and A2 in the following way:
>>
>> 1) (A1 and ¬A2) --> ¬Q,
>> 2) (¬A1 and A2) --> Q,
>>
>>then we get Bel(R)=Pl(R)=P(R)=0.72 and Bel(¬R)=Pl(¬R)=P(¬R)=0.28
>
>(1) corresponds to "No man wears a hat but no glasses," which is the
>same thing as "All of the hat-wearing men also wear glasses." (2)
>corresponds to "All the glasses-wearing women also wear hats." If this
>is the case, then A1 and A2 are definitely not independent, so there is
>no way Pr(A1,A2) can be .72. In fact, its values are restricted to
>[.8,.9].
>
>Once again, the problem is not the use of belief functions, but the failure to
>represent constraints adequately. Another simple example: There are three
>numbers A, B, and C that sum to 10. A is in the interval [2,5], B is in the
>interval [3,6], and C is in the interval [1,4]. Now define D = A+B+C.
>With no further information, what can we say about the possible values
>of D? If your answer is that it can range over the interval [6,15],
>you ignored the "sum to 10" constraint.
>
>Belief functions are fine (for those who like them), but they can't ignore
>constraints such as nonnegativity or summing-to-1 without logical
>problems down the road. Yes, this takes away some of the simplicity
>of representation and
>some of the freedom that make belief functions attractive, but the alternative
>is inconsistency and/or incoherency. (Note: I remember fighting the same
>battle, with only limited success, in the fuzzy logic community in the late
>1970s.)
>
>Jonathan Weiss
>
>At 6/9/99 08:35 AM, Rolf Haenni wrote:
>>Hi all,
>>
>>Thanks to Judea for helping to clarify the situation by identifying the
>>notion of BELIEF as the "PROBABILITY OF NECESSITY" (or probability of
>>provability). I agree with this point of view. In fact, in probabilistic
>>argumentation systems, instead of BELIEF we prefer to say DEGREE OF
>>SUPPORT, which is defined as the probability of the supporting arguments (=
>>possible proofs). More precisely, it's a conditional probability given no
>>contradiction.
>>
>>Unfortunately, I don't have the time to reply to every individual point
>>discussed in the emails I received today. However, I see that proponents of
>>the Baysian approach have big difficulties to accept a value Bel(R)=0.72
>>lower than 0.8 and a value Pl(R)=0.98 higher than 0.9, as Kathryn B. Laskey
>>said:
>>
>>>One can "explain" the phenomenon by saying that there is only a 0.72 chance
>>>that "the evidence would prove R," but I was never able to come up with a
>>>way to argue this convincingly to a subject matter expert. I guess that's
>>>because I can't argue it convincingly to myself. I can follow the
>>>mathematics, but I don't have a handle on what it means.
>>
>>Let me try to clarify this further. As I already said, the crucial point is
>>the total ignorance about Q. Total ignorance means YOU DON'T KNOW ANYTHING
>>ABOUT Q, i.e. first of all, you don't know a prior probability (see my
>>example about the existence of god), but secondly, it also means that you
>>don't even know whether such an (independent) prior probability exists.
>>Note that the truth of Q could possibly depend on R, or on A1 or on A2. YOU
>>DON'T KNOW IT.
>>
>>For example, if Q depends on A1 and A2 in the following way:
>>
>> 1) (A1 and ¬A2) --> ¬Q,
>> 2) (¬A1 and A2) --> Q,
>>
>>then we get Bel(R)=Pl(R)=P(R)=0.72 and Bel(¬R)=Pl(¬R)=P(¬R)=0.28. In
>>contrast, if Q depends on A1 and A2 by
>>
>> 1) (A1 and ¬A2) --> Q,
>> 2) (¬A1 and A2) --> ¬Q,
>>
>>then Bel(R)=Pl(R)=P(R)=0.98 and Bel(¬R)=Pl(¬R)=P(¬R)=0.02. This shows how
>>the probabilities for the cases 2) and 3) can simultaneously jump either to
>>R or to ¬R (see my last email):
>>
>> 1) A1 and A2 --> R is automatically true (0.72)
>> 2) A1 and ¬A2 --> nothing can be said about R (0.18)
>> 3) ¬A1 and A2 --> nothing can be said about R (0.08)
>> 4) ¬A1 and ¬A2 --> ¬R is automatically true (0.02)
>>
>>To summarize, if nothing is known about Q (not even whether an independent
>>prior probability exists), then it makes perfect sense to say that the
>>Belief (or the probability of provability) is 0.72. The intuition that
>>the value must be between 0.8 and 0.9 comes from the assumption that a
>>prior probability exists.
>>This is finally the main point producing all the confusion. It's clear
>>that for a proponent of the Bayesian approach it is perhaps difficult to
>>give up the assumption that prior probabilities exist. However, I think
>>it's necessary in order to capture the nature of total ignorance properly.
>>
>>The message of K.S.Van Horn underlines all this:
>>
>>KEVIN S. VAN HORN wrote:
>>>...regardless of the value of P(Q), we know from 0 <= P(Q) <= 1 that 0.8 <=
>>>P(R) <= 0.9.
>>
>>==> as I said, it may be difficult to give it up!!! :-)
>>
>>KEVIN S. VAN HORN wrote:
>>>Again, Haenni's theory is losing information by giving unnecessarily
>>>loose bounds.
>>
>>==> or should we say, YOU are ADDING information??? :-)
>>
>>To conclude, I think it should be clear now that the main difference
>>between the Bayesian and the Belief Function approach is just given by the
>>way in which total ignorance is handled. For me, the "existence of
>>God" example is a strong indication that total ignorance is handled more
>>accurately by belief functions (and also by probabilistic argumentation
>>systems), that's all.
>>
>>Enjoy your day,
>>
>>Rolf Haenni
>>
>>************************************************************************
>>* *
>>* Dr. Rolf Haenni __/ __/ __/ __/ _______/ *
>>* Institute of Informatics (IIUF) __/ __/ __/ __/ __/ *
>>* University of Fribourg, Switzerland __/ __/ __/ __/ _____/ *
>>* Phone: ++41 26 300 83 31 __/ __/ __/ __/ __/ *
>>* Email: rolf.haenni@unifr.ch __/ __/ ______/ __/ *
>>* *
>>************************************************************************
>>* World Wide Web: http://www2-iiuf.unifr.ch/tcs/rolf.haenni *
>>************************************************************************
>>