Re: Bayesian priors representing ignorance

Rich Neapolitan (neo@megsinet.net)
Thu, 10 Jun 1999 11:53:32 -0500

Seems to me we would not talk about the probability of a value of color if
we did not have the concept of color. We can speak of P(color=x), P(sex=x),
but what could P(?=x) mean?

Given we identify a variable, there are two situations: one where we know
the number of values or categories in advance, and one where we do not.
There are quite a few arguments for often using the Dirichlet distribution
to quantify prior beliefs about relative frequencies in the former case. In
the latter case, before we see any categories there is only one event,
namely that the next item will be from a new category. For example, if I am
about to be stranded on an island, I may expect to see species but I have
no idea how many. P(first creature seen will be of an unseen species)=1.
After seeing the first creature, things obviously change depending on your
prior beliefs. Sandy Zabell discusses this problem in detail and has some
recent concrete results. I will give the references if anyone is interested.
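The two cases can be made concrete. With a known set of K categories, a Dirichlet(alpha_1, ..., alpha_K) prior on the relative frequencies gives the familiar predictive probability (n_k + alpha_k) / (n + sum of alphas). With an unknown number of categories, one common model (the Hoppe urn underlying the Ewens sampling formula; not necessarily the model Zabell analyzes) gives probability theta / (n + theta) that the next item is from a new category. Before anything is seen (n = 0), that probability is 1, as in the island example. A minimal sketch; the function names and the concentration parameter `theta` are illustrative assumptions:

```python
from fractions import Fraction

def dirichlet_predictive(counts, alphas):
    """P(next item is category k | counts) under a Dirichlet(alphas) prior
    on the relative frequencies of K known categories.
    counts and alphas must be ints or Fractions (kept exact)."""
    total = sum(counts) + sum(alphas)
    return [Fraction(n + a, total) for n, a in zip(counts, alphas)]

def new_category_predictive(counts, theta):
    """Hoppe-urn style predictive when the number of categories is unknown.
    counts: sizes of the categories seen so far (may be empty).
    theta: hypothetical concentration parameter, a positive int or Fraction.
    Returns (P(next is a new category), [P(next is seen category k)])."""
    n = sum(counts)
    denom = n + theta
    return Fraction(theta, denom), [Fraction(c, denom) for c in counts]

print(dirichlet_predictive([2, 0, 0], [1, 1, 1]))  # three colors, two reds seen
print(new_category_predictive([], 1))              # nothing seen yet
print(new_category_predictive([1], 1))             # one creature seen
```

With a flat Dirichlet(1, 1, 1) and two reds observed, red gets 3/5 and the others 1/5 each; in the urn model the first item is certainly new, and after one item the chance of another new category is 1/2 when theta = 1.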

Rich

At 09:14 PM 6/9/99 -0600, Kevin S. Van Horn wrote:
>In the context of assigning Bayesian priors that represent complete
>ignorance, Jonathan Weiss asks:
>
> 1) Someone presents you with a huge deck of cards (not standard playing
> cards -- each card has a spot of a given color on it). Before even
> one card is seen, what is the probability that the first card dealt
> is red?
>
>The problem as stated is ill-posed until we know what set of alternatives we
>are considering. Suppose that the only alternatives I know of are red and
>not-red, and I am otherwise completely ignorant -- in particular, I don't
>even know that red is a color or I haven't the vaguest idea what colors are.
>Call this state of information X0. Then, by the permutation invariance
>argument given below, this state of ignorance must be represented by P(red |
>X0) = 1/2.
>
> 2) Assuming you assigned some finite probability P(red), now for the same
> card that you still haven't seen, what is the probability that it is
> blue?
>
>Let's continue to assume that I know nothing beyond what is stated in the
>problem, and am still ignorant of the concept of color. Assuming that I
>can't rule out the possibility that the card is neither blue nor red, I
>am now aware of three possibilities: red, blue, and not-red-not-blue. Call
>this state of knowledge X1. Then P(red | X1) = 1/3.
>
>This might seem to contradict my previous assessment of P(red | X0) = 1/2.
>But X0 and X1 are not identical states of information. We are talking about
>two qualitatively different conditional probabilities, one conditioned on
>X0, the other conditioned on X1. It should surprise nobody that my
>assignment of probabilities changes when I have access to more information.
>
> 3) Now, what is the probability that it is yellow? Black? Purple?
> Orange? White? Fuchsia? etc.? Has your P(red) assessment changed?
> How many colors can you name? Are you willing to assign them equal
> probabilities just based on ignorance?
>
>Yes: again assuming a complete ignorance of the concept of color, P(red | X)
>changes as X -- my set of mutually exclusive and exhaustive possibilities
>(sample space) -- changes. And yes, if I am truly ignorant, and cannot
>attach any semantic content to these labels, then the only sensible thing I
>can do is assign equal probabilities to the possibilities.
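A minimal sketch of the rule used in the answers to (1) through (3), assuming the agent attaches no meaning to the labels: P(red | X) is 1/|X| for whatever sample space X is currently on the table. The label lists, including the catch-all "none-of-the-above" and the longer list `X_many`, are hypothetical illustrations:

```python
from fractions import Fraction

def ignorance_prior(labels):
    """Uniform distribution over a finite set of mutually exclusive,
    exhaustive labels, as argued for a state of complete ignorance."""
    labels = list(labels)
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be distinct")
    return {label: Fraction(1, len(labels)) for label in labels}

X0 = ["red", "not-red"]
X1 = ["red", "blue", "not-red-not-blue"]
# Hypothetical longer list of names, plus a catch-all for anything else.
X_many = ["red", "blue", "yellow", "black", "purple", "orange", "white",
          "fuchsia", "none-of-the-above"]

for name, X in [("X0", X0), ("X1", X1), ("X_many", X_many)]:
    print(name, ignorance_prior(X)["red"])
```

This prints `X0 1/2`, `X1 1/3` and `X_many 1/9`: the number attached to red moves only because the sample space moved.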
>
> 4) Now, suppose you are told reliably that every card in the deck is either
> red, blue, or green. Now what is your P(red)?
>
>Call this state of information X2. Then P(red | X2) = 1/3.
>
>Here's the permutation-invariance argument. Suppose I relabel the colors,
>for example, I relabel red as "blue", blue as "green", and green as "red".
>Call this state of information X2'. If I am truly ignorant, I can't
>distinguish between this problem and the original, so the probability
>distributions conditional on X2 and X2' should be the same. This holds for
>any permutation of the labels. The only distribution that remains invariant
>under any permutation of the labels is the uniform distribution, that is,
>P(c | X2) = 1/3 for each label c.
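The permutation-invariance claim can be checked by brute force for a small label set. The sketch below (the helper name is hypothetical) tries every relabeling and confirms that the uniform distribution survives all of them while a non-uniform one does not:

```python
from fractions import Fraction
from itertools import permutations

def is_permutation_invariant(dist):
    """True if relabeling the outcomes by every permutation leaves the
    distribution unchanged, i.e. P(relabel[c]) == P(c) for every label c."""
    labels = list(dist)
    for perm in permutations(labels):
        relabel = dict(zip(labels, perm))
        if any(dist[relabel[c]] != dist[c] for c in labels):
            return False
    return True

uniform = {c: Fraction(1, 3) for c in ("red", "blue", "green")}
skewed = {"red": Fraction(1, 2), "blue": Fraction(1, 4), "green": Fraction(1, 4)}

print(is_permutation_invariant(uniform))  # True
print(is_permutation_invariant(skewed))   # False
```

Enumerating all K! permutations is only practical for a handful of labels, but it makes the argument concrete: any unequal assignment is broken by the swap of two labels with different probabilities.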
>
> 5) One more bit of information now: among the blue cards, there are light
> blue and dark blue. Does this change P(red)?
>
>The important phrase here is "one more bit of information": our
>probabilities are conditioned on different information than we had in
>problem (4). So, of course, P(red | X3) != P(red | X2), where X3 is the
>state of information described in (5). And, as a truly ignorant person who
>doesn't know what "light blue" and "dark blue" mean, this is no different
>from breaking up not-red into blue and not-red-not-blue, as in (2).
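A sketch of what refining the sample space from X2 to X3 does, assuming the ignorant agent gives each label equal weight: P(red) falls from 1/3 to 1/4, while the total weight on the two blue labels rises from 1/3 to 1/2. That dependence on how the possibilities are partitioned is what the question in (5) probes:

```python
from fractions import Fraction

def uniform(labels):
    # Equal probability for each label, as an agent ignorant of their meaning would assign.
    return {c: Fraction(1, len(labels)) for c in labels}

X2 = uniform(["red", "blue", "green"])
X3 = uniform(["red", "light blue", "dark blue", "green"])

p_blue_X3 = X3["light blue"] + X3["dark blue"]
print(X2["red"], X3["red"])    # 1/3 1/4
print(X2["blue"], p_blue_X3)   # 1/3 1/2
```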
>
>What's really going on here is that Weiss is playing bait and switch: he
>asks us to assign probabilities based on an assumption of total ignorance,
>then criticizes those assignments based on *additional* information that a
>person totally ignorant of the semantic content of the labels "red,"
>"green," et cetera would not have. The fact that these labels are colors
>immediately makes relevant a great body of information we all have about
>colors. We are not, in fact, in a state of complete ignorance.
>
>However, there is a form of ignorance that is worth examining here. Human
>color perception is such that three coordinates -- for example, hue, chroma,
>and lightness -- suffice to specify all perceivable colors. The set of all
>colors then occupies a compact three-dimensional volume. I haven't examined
>the problem (nor studied color theory) in sufficient detail to give a
>compelling argument that one particular prior over this volume represents a
>state of complete ignorance, but my intuition suggests that a uniform
>prior over, say, the color space of the Munsell Color System, should do the
>job. (My reasoning is that equal volumes in this color space apparently
>represent equal volumes in human perceptual space.)
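One practical detail if the uniform-by-volume idea is taken literally: in a cylindrical space (hue as angle, value as height, chroma as radius, as in Munsell) drawing each coordinate uniformly does not give a uniform distribution over volume, because the area element grows with chroma. The sketch below assumes, purely for illustration, a cylinder with value in [0, 10] and chroma in [0, 20]; the real Munsell solid is irregular, so this is a stand-in, not the Munsell space itself:

```python
import math
import random

def sample_uniform_cylinder(max_value, max_chroma, rng=random):
    """Draw one (hue_deg, value, chroma) point uniformly by volume from a
    cylinder: hue is the angle, value the height, chroma the radius.
    Simplifying assumption: the real Munsell color solid is not a cylinder."""
    hue = rng.uniform(0.0, 360.0)
    value = rng.uniform(0.0, max_value)
    # The polar area element is r dr dtheta, so the radius needs a sqrt
    # transform; drawing chroma uniformly would crowd points near the axis.
    chroma = max_chroma * math.sqrt(rng.random())
    return hue, value, chroma

rng = random.Random(0)
points = [sample_uniform_cylinder(10.0, 20.0, rng) for _ in range(100_000)]
inner = sum(1 for _, _, c in points if c < 10.0) / len(points)
print(inner)  # close to 0.25
```

About a quarter of the samples fall within half the maximum chroma, as the area ratio (1/2)^2 predicts; a coordinate-wise uniform draw would put about half of them there.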
>
>Weiss continues:
>
> [...] what would be an uninformed prior over the set of real numbers?
>
>It depends on what kind of parameter you are talking about. If you have a
>location parameter, translation invariance arguments give an (improper)
>uniform prior over the entire real line. If you have a scale parameter,
>scale invariance arguments give an (improper) prior proportional to 1/x
>(uniform over log x).
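The two invariance claims can be checked on intervals. Because both priors are improper, only relative masses of intervals are meaningful; the sketch below (function names are illustrative) compares the unnormalized mass of an interval before and after a shift or a rescaling:

```python
import math

def uniform_mass(a, b):
    # Unnormalized mass of the improper prior p(x) proportional to 1 on [a, b].
    return b - a

def scale_prior_mass(a, b):
    # Unnormalized mass of the improper prior p(x) proportional to 1/x on [a, b], 0 < a < b.
    return math.log(b / a)

a, b = 2.0, 5.0
shift, scale = 7.5, 3.0

# Location parameter: shifting an interval leaves its uniform mass unchanged.
print(math.isclose(uniform_mass(a, b), uniform_mass(a + shift, b + shift)))              # True
# Scale parameter: rescaling an interval leaves its 1/x mass unchanged.
print(math.isclose(scale_prior_mass(a, b), scale_prior_mass(scale * a, scale * b)))      # True
# The uniform prior is not scale invariant.
print(math.isclose(uniform_mass(a, b), uniform_mass(scale * a, scale * b)))              # False
```

Equivalently, under the 1/x prior the intervals [1, 10] and [10, 100] carry the same mass, log 10, which is what "uniform over log x" means.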
>
>-- Kevin S. Van Horn
>