I have a brief question concerning terminology, this time
about "information."
As a pleasant learning exercise, I am reinventing the wheel of
Bayesian network inference. As one of the subsidiary outputs, I am
planning to compute the difference between the entropy of the posterior
for some variable before a certain evidence item is introduced and
the entropy of the posterior for the same variable after the evidence.
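For concreteness, here is a minimal sketch of the computation I have
in mind, in Python, with two made-up distributions standing in for
the before/after posteriors:

    import math

    def entropy(p):
        """Shannon entropy in bits of a discrete distribution."""
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    # Hypothetical posteriors for one variable, before and after
    # absorbing a single evidence item.
    posterior_before = [0.5, 0.3, 0.2]
    posterior_after = [0.8, 0.15, 0.05]

    print(f"entropy before = {entropy(posterior_before):.3f} bits")
    print(f"entropy after  = {entropy(posterior_after):.3f} bits")
    print(f"reduction      = "
          f"{entropy(posterior_before) - entropy(posterior_after):.3f} bits")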
Now what we'll typically see, I imagine, is that evidence reduces
the entropy of the posterior, and I believe it is consistent
with conventional terminology to say "reduction of entropy == gain
of information" -- so many bits per item of evidence.
But I know there is no guarantee that the posterior will have less
entropy after the evidence is introduced. (I often have that feeling
of "now I am more confused than before!")
In this scenario, where is the "information gain"? In absorbing the
evidence, something is gained -- but what? What is the quantity
(if there is one) that's always increased by absorbing evidence?
I can, of course, leave the word "information" out of the picture and
refer simply to "change of entropy". But "information" is so suggestive
and attractive -- I would rather use it if I can.
Your comments are greatly appreciated.
Regards,
Robert Dodier