Re: "information gained" sometimes != "entropy reduction" ??

Alexandr A. Savinov (savinov@math.md)
Tue, 25 Aug 1998 11:35:55 +0300

Bruce D'Ambrosio wrote:
>
> > I thought this was true also but consider the following simple
> > counterexample:
> >
> > p(y=0,x=0)=0 p(y=1,x=0)=.98
> > p(y=0,x=1)=.01 p(y=1,x=1)=.01
> >
> > Entropy = -.98 * lg .98 - .02 * lg .01 = .16
> >
> > When we learn that x=1 this becomes:
> >
> > p(y=0,x=0)=0 p(y=1,x=0)=0
> > p(y=0,x=1)=.5 p(y=1,x=1)=.5
> >
> > with entropy = -lg .5 = 1
> >
> > So the entropy goes up after we gain information about the state! Most
> > disturbing!
> >
> > Stephen Omohundro
>
> Very interesting.
>
> Perhaps it is normalization that is misleading us here? the entropy of
> the unnormalized distribution does decrease (unless all values in the
> joint set to zero by the observation were already zero, in which case
> we learned nothing):
>
> 0 0
> .01 .01
>
> - .02*lg.01 = .1329
>
> I suspect this is a fairly simple question for someone familiar with
> information theory - anyone?
>
> tnx - Bruce
> dambrosi@cs.orst.edu

Maybe I do not understand the problem/question, since I do not have
the messages that started the thread, but this behaviour seems normal
to me. Consider a 1-dimensional example:

d1 = {1/4, 3/4}, old information (current state)
d2 = {3/4, 1/4}, new information

d1 * d2 = {1/2, 1/2},   entropy(d1*d2) > entropy(d1) -- (no information)
d1 * d1 = {1/10, 9/10}, entropy(d1*d1) < entropy(d1)

(here "*" denotes the componentwise product of the two distributions,
renormalized so that it sums to 1)
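
To check these numbers, here is a minimal sketch in Python (my own
illustration, not from the original posting) that combines two
distributions by the renormalized componentwise product and computes
the entropies in bits:

  import math

  def entropy(p):
      # Shannon entropy in bits; zero-probability terms contribute nothing
      return -sum(x * math.log2(x) for x in p if x > 0)

  def combine(p, q):
      # componentwise product of two distributions, renormalized
      r = [a * b for a, b in zip(p, q)]
      s = sum(r)
      return [x / s for x in r]

  d1 = [0.25, 0.75]   # old information (current state)
  d2 = [0.75, 0.25]   # new information

  print(entropy(d1))               # ~0.81
  print(entropy(combine(d1, d2)))  # 1.0  -- increases (d2 contradicts d1)
  print(entropy(combine(d1, d1)))  # ~0.47 -- decreases (d1 agrees with itself)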

So the entropy can both increase and decrease when we learn new
information; it depends on how the new information is related to
the old information.

In the multidimensional case the situation is the same. If we have an
arbitrary first distribution (e.g., the 2-dimensional one described in
the previous message), then we can always find a distribution over one
variable (e.g., the 2-valued variable x) that either increases or
decreases the joint information (except for special cases such as
independence, where the joint information does not change). In the
previous example, if we instead learn that x=0, the information
increases (the entropy decreases from 0.16 to 0).
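
The same can be checked for the 2-dimensional example quoted above;
a small sketch (again Python, with the joint table copied from Stephen
Omohundro's message) shows that observing x=1 raises the entropy over
y to 1 bit, while observing x=0 drops it to 0:

  import math

  def entropy(p):
      # Shannon entropy in bits; zero-probability terms contribute nothing
      return -sum(v * math.log2(v) for v in p if v > 0)

  # joint distribution p(x, y) from the quoted counterexample
  joint = {(0, 0): 0.00, (0, 1): 0.98,
           (1, 0): 0.01, (1, 1): 0.01}

  def posterior_y_given_x(joint, x):
      # keep the row for the observed x and renormalize it
      row = [joint[(x, y)] for y in (0, 1)]
      s = sum(row)
      return [v / s for v in row]

  print(entropy(joint.values()))                 # ~0.16 before observing x
  print(entropy(posterior_y_given_x(joint, 1)))  # 1.0 -- entropy goes up
  print(entropy(posterior_y_given_x(joint, 0)))  # 0.0 -- entropy goes down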

So learning something does not necessarily mean that the certainty
(the quantity of information) will increase. Usually a loss of
information can be interpreted as a contradiction with the old
information, i.e., with the current state of knowledge. Thus in
probabilistic approaches, the more two propositions contradict each
other, the higher the uncertainty of the joint proposition (in
contrast to, e.g., fuzzy approaches).

Regards,

Alexandr Savinov

--
Alexandr A. Savinov, PhD
Senior Scientific Collaborator, Laboratory of AI Systems
Inst. Math., Moldavian Acad. Sci. 
str. Academiei 5,  MD-2028 Kishinev, Moldavia
Tel: +3732-73-81-30, Fax: +3732-73-80-27
mailto:savinov@math.md
http://www.geocities.com/ResearchTriangle/7220/