I agree with Finn's comment.
Mathematically, for any two variables X and Y (the n-variable case
is similar), the joint entropy H(X,Y) is provably no less than the
conditional entropy of X given Y, H(X|Y):
H(X,Y) >= H(X|Y).
H(X|Y) represents, on average, the information in X after Y is learned.
On the other hand, here we are talking about the posterior entropy
H(X|Y=y0) once a specific value Y=y0 is learned. If we compare the
definition H(X,Y) = - Sum_x,y P(x,y) log P(x,y)
with H(X|Y=y0) = - Sum_x P(x|y0) log P(x|y0),
then it is possible to choose P(X,Y) and y0 so that H(X|Y=y0) is larger
or smaller than H(X,Y), as several others have demonstrated by example.
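A minimal numerical sketch of the "larger" direction, using a joint distribution of my own choosing (not one from the earlier examples): Y=0 occurs with probability 0.99 and then X is deterministic, while the rare outcome Y=1 leaves X uniform over 16 values, so H(X|Y=1) = 4 bits exceeds H(X,Y).

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability list (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X,Y):
#   Y = 0 with prob 0.99, and then X = 0 deterministically;
#   Y = 1 with prob 0.01, and then X is uniform over 16 values.
n = 16
joint = {(0, 0): 0.99}                 # (x, y) -> P(x, y)
for x in range(n):
    joint[(x, 1)] = 0.01 / n

# Joint entropy: H(X,Y) = - Sum_x,y P(x,y) log P(x,y)
h_xy = entropy(joint.values())

# Posterior entropy: H(X|Y=1) = - Sum_x P(x|1) log P(x|1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
posterior = [p / p_y1 for (x, y), p in joint.items() if y == 1]
h_x_given_y1 = entropy(posterior)

print(f"H(X,Y)   = {h_xy:.4f} bits")      # about 0.12 bits
print(f"H(X|Y=1) = {h_x_given_y1:.4f} bits")  # 4 bits, exceeding H(X,Y)
```

Swapping which value of Y is observed (y0 = 0 instead of 1) gives a posterior entropy of 0 bits, showing the "smaller" direction with the same distribution.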
Yang Xiang, Ph.D. Associate Professor
Department of Computer Science Tel: (306) 585-4088
University of Regina Fax: (306) 585-4745
Regina, Saskatchewan E-mail: yxiang@cs.uregina.ca
Canada S4S 0A2 WWW: http://cs.uregina.ca/~yxiang/