Re: "information gained" sometimes != "entropy reduction" ??

Tom Kane (tom@icbl.hw.ac.uk)
Thu, 03 Sep 1998 11:08:06 +0100

Hi Stephen,

I just saw your note this morning after coming back from a short
holiday in the Scottish rain, and thought I'd like to comment on
the distributions you used in your interesting note about how
entropy can be seen to rise as information is accrued.

Your first distribution:

>p(y=0,x=0)=0 p(y=1,x=0)=.98
>p(y=0,x=1)=.01 p(y=1,x=1)=.01
>
>Entropy = -.98 * lg .98 - .02 * lg .01 = .16

is actually a very resolved description of certainty. The possible world
(y=1,x=0) takes almost all of the probability mass away from the others,
which tells us not only that p(y=1) is greater than or equal to 0.98, but
also that p(x=1) is less than or equal to 0.02.
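
Just to make the arithmetic easy to check, here is a quick Python sketch
(the dictionary layout and the names lg, p1, H1, p_x0, p_y1 are my own,
purely for illustration):

  from math import log

  def lg(p):
      # log base 2, as in the entropy calculation above
      return log(p, 2)

  # the first joint distribution over (y, x)
  p1 = {(0, 0): 0.00, (1, 0): 0.98,
        (0, 1): 0.01, (1, 1): 0.01}

  H1 = -sum(p * lg(p) for p in p1.values() if p > 0)
  p_x0 = sum(p for (y, x), p in p1.items() if x == 0)
  p_y1 = sum(p for (y, x), p in p1.items() if y == 1)

  print(H1)      # ~0.16 bits, matching the .16 figure above
  print(p_x0)    # 0.98, i.e. p(x=1) <= 0.02
  print(p_y1)    # 0.99, i.e. p(y=1) >= 0.98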

Whereas your second distribution:

>p(y=0,x=0)=0 p(y=1,x=0)=0
>p(y=0,x=1)=.5 p(y=1,x=1)=.5
>
>with entropy = -lg .5 = 1

although it is fully resolved as far as x is concerned, since p(x=1)=1,
it has absolutely no information whatsoever to offer about y. And the
equal assignment of 0.5 to the final two worlds, (y=0,x=1) and (y=1,x=1),
is just how probability would be applied among unstructured possibilities
in the maximum entropy formalism: such possibilities are considered
equally likely and treated equally.
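
As a small check on that maximum entropy remark: for any split of p and
1-p over those two remaining worlds, the entropy is greatest at p = 0.5,
which is what gives the full 1 bit. A rough sketch (the function name H2
is mine, and the grid search is only illustrative):

  from math import log

  # entropy, in bits, of a two-point distribution (p, 1-p)
  def H2(p):
      return -sum(q * log(q, 2) for q in (p, 1.0 - p) if q > 0)

  # crude grid over p: the maximum is 1 bit, reached only at p = 0.5
  best = max((H2(k / 100.0), k / 100.0) for k in range(101))
  print(best)    # (1.0, 0.5)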

In other words, in the second distribution, we have:

p(x=1)=1 p(y=0)=.5 p(y=1)=.5

Which is a great deal less informative than

p(x=0) >= 0.98 p(y=1) >= 0.98

which is the information in the first distribution.

I don't find it at all counter-intuitive that the second distribution has
a higher entropy than the first, because although (in the second
distribution) we know x to be true, we know nothing about y.
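
Put numerically (this is my own calculation rather than anything in your
note): under the first distribution the marginal on y is (.01, .99), about
0.08 bits of uncertainty, whereas after learning x=1 the distribution on y
is (.5, .5), a full bit. So all of the extra joint entropy is uncertainty
about y:

  from math import log

  def H(dist):
      # entropy in bits of a list of probabilities
      return -sum(p * log(p, 2) for p in dist if p > 0)

  print(H([0.01, 0.99]))   # ~0.08 bits about y before learning x=1
  print(H([0.50, 0.50]))   # 1 bit about y after learning x=1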

All the Best,
Tom Kane.

>> Similarly, isn't it true that the entropy of the joint over all
>> variables in the model has decreased?

>I thought this was true also but consider the following simple
>counterexample:
>
>p(y=0,x=0)=0 p(y=1,x=0)=.98
>p(y=0,x=1)=.01 p(y=1,x=1)=.01
>
>Entropy = -.98 * lg .98 - .02 * lg .01 = .16
>
>When we learn that x=1 this becomes:
>
>p(y=0,x=0)=0 p(y=1,x=0)=0
>p(y=0,x=1)=.5 p(y=1,x=1)=.5
>
>with entropy = -lg .5 = 1
>
>So the entropy goes up after we gain information about the state! Most
>disturbing!
>
>Stephen Omohundro