Re: [UAI] Measures of dependence between random variables

From: Stefano Monti (268-3740) (smonti+@cs.cmu.edu)
Date: Thu May 25 2000 - 15:07:35 PDT


    Dear Prakash,

    I'm not sure whether this is what you are looking for, and probably I'm saying
    something you already know. Anyway, in the Bayesian (model selection) approach
    to BN learning, rather than making use of a test of independence, the
    posterior probability of a structure is used as a measure of its "goodness",
    and the highest-scoring structure(s) are favored.

    This can be readily adapted to provide for a test of
    independence by use of the Bayes factor. Very roughly,
    assuming we have two variables only, X and Y, to determine
    whether X and Y are dependent, we can compare the two
    hypotheses

        H_0: X Y (no dependence)
    and H_1: X --> Y (or X <-- Y, i.e., dependence)

    by computing the posteriors P(H_0|D) and P(H_1|D). The
    hypothesis/model with the higher posterior should be
    favored. However, this leaves open the question of how
    confident we can be about our conclusion; intuitively,
    the larger the difference between the two posteriors, the
    higher the confidence. To quantify this confidence, the
    Bayes factor can be used, which is the ratio of the
    posterior odds to the prior odds:

     Bayes factor = ... = P(D|H_1)/P(D|H_0)
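
    For reference, the identity behind that ratio follows from
    applying Bayes' rule to each hypothesis and dividing; P(D)
    cancels. This is the standard textbook form, written here in
    LaTeX (the \text command assumes amsmath), and not necessarily
    the exact intermediate steps elided above:

```latex
\[
\underbrace{\frac{P(H_1 \mid D)}{P(H_0 \mid D)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(D \mid H_1)}{P(D \mid H_0)}}_{\text{Bayes factor}}
\times
\underbrace{\frac{P(H_1)}{P(H_0)}}_{\text{prior odds}}
\]
```

    With equal prior probabilities for H_0 and H_1, the prior odds
    are 1 and the posterior odds equal the Bayes factor.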

    The rule of thumb is stated on the log10 scale: a log10 Bayes
    factor greater than 1 (a Bayes factor above 10) is 'strong'
    evidence in favor of H_1, and a value greater than 2 (a Bayes
    factor above 100) is 'decisive' evidence. The use of Bayes
    factors is not uncontroversially accepted, but then, what is?
    For details see, e.g.:

    @Article{
      author = "Robert E. Kass and Adrian E. Raftery",
      title = "Bayes Factors",
      journal = "Journal of the American Statistical Association",
      year = 1995,
      volume = 90,
      pages = "773--795",
      annote = "http://www.stat.washington.edu/tech.reports/tr254.ps"
    }
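
    As a rough illustration of the rule of thumb, the sketch below
    maps a Bayes factor to the log10 categories tabulated by Kass
    and Raftery (following Jeffreys). The function name and the
    exact label strings are my own choices, not part of any library:

```python
import math


def evidence_category(bayes_factor: float) -> str:
    """Label a Bayes factor B = P(D|H_1)/P(D|H_0) on the log10 scale.

    Hypothetical helper: thresholds follow the Jeffreys-style table
    in Kass and Raftery (1995); the label strings are illustrative.
    """
    if bayes_factor <= 0:
        raise ValueError("Bayes factor must be positive")
    log10_bf = math.log10(bayes_factor)
    if log10_bf < 0:
        return "favors H_0"
    if log10_bf < 0.5:
        return "barely worth mentioning"
    if log10_bf < 1:
        return "substantial"
    if log10_bf < 2:
        return "strong"
    return "decisive"


if __name__ == "__main__":
    for bf in (0.1, 2, 5, 50, 500):
        print(bf, evidence_category(bf))
```

    These cut-offs are conventions, not sharp decision boundaries.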

    Clearly, this doesn't say anything about how to compute
    P(D|H_i), and in most cases this is no easy task. The "easy"
    cases are when all the variables are discrete, e.g.:

    @Article{
      author = "Gregory F. Cooper and E. Herskovits",
      title = "A {B}ayesian Method for the Induction of
         Probabilistic Networks from Data",
      journal ="Machine Learning",
      year = "1992",
      volume = "9",
      pages = "309--347"
    }

    @Article{
      author = "David Heckerman and Dan Geiger and David M. Chickering",
      title = "Learning {B}ayesian networks: The combination of
         knowledge and statistical data",
      journal ="Machine Learning",
      year = 1995,
      volume = 20,
      pages = "197--243"
    }

    or all continuous and normally distributed, e.g.:

    @InProceedings{
      author = "Dan Geiger and David Heckerman",
      title = "Learning {G}aussian Networks",
      crossref = "uai94",
      editor = "R. Lopez de Mantaras and D. Poole",
      booktitle = "Proceedings of the 10th Conference on Uncertainty in AI",
      year = 1994,
      address = "San Francisco, California",
      annote = "http://www.sis.pitt.edu/~dsl/UAI94/Geiger1.UAI94.html"
    }
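
    To make the discrete case concrete, here is a minimal sketch
    of the two-variable test using the Cooper-Herskovits (K2)
    marginal likelihood, i.e. uniform Dirichlet priors on every
    conditional distribution. The function names are hypothetical,
    only the X --> Y orientation of H_1 is scored, and by default
    the number of states of each variable is taken from the
    observed data, which is an assumption of this sketch:

```python
import math
from collections import Counter


def log_family_score(child, parents, r_child):
    """Log marginal likelihood of one node given its parent configuration.

    K2 metric: prod_j [(r-1)! / (N_j + r - 1)!] * prod_k N_jk!
    where j ranges over observed parent configurations, k over child
    states, and r is the number of child states.
    """
    n_j = Counter(parents)               # counts per parent configuration
    n_jk = Counter(zip(parents, child))  # counts per (parent, child) pair
    score = 0.0
    for total in n_j.values():
        # log[(r-1)!] - log[(N_j + r - 1)!]
        score += math.lgamma(r_child) - math.lgamma(total + r_child)
    for count in n_jk.values():
        score += math.lgamma(count + 1)  # log(N_jk!)
    return score


def log10_bayes_factor(xs, ys, r_x=None, r_y=None):
    """log10 of P(D|H_1)/P(D|H_0), with H_0: X  Y and H_1: X --> Y."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    r_x = r_x or len(set(xs))  # assumption: state count from the data
    r_y = r_y or len(set(ys))
    no_parent = [()] * len(xs)  # a single, empty parent configuration
    log_px = log_family_score(xs, no_parent, r_x)
    log_h0 = log_px + log_family_score(ys, no_parent, r_y)
    log_h1 = log_px + log_family_score(ys, xs, r_y)
    return (log_h1 - log_h0) / math.log(10)


if __name__ == "__main__":
    dependent_x = [0, 1] * 20
    print(log10_bayes_factor(dependent_x, dependent_x))   # Y copies X
    print(log10_bayes_factor([0, 0, 1, 1] * 10, [0, 1, 0, 1] * 10))
```

    The score for X is the same under both hypotheses and cancels
    in the ratio; it is kept only so that each line reads as a
    full P(D|H_i). Working in log space with lgamma avoids
    overflow in the factorials for realistic sample sizes.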

    Hope it helps. Best,

    -- ste

    "Prakash P. Shenoy" wrote:

    > Dear UAI colleagues,
    >
    > Could someone please point me to references on different measures of
    > dependence between random variables? I am familiar with
    > covariance/correlation coefficient that measures linear dependence between
    > variables. What else is out there that is being used by the Bayes net
    > learning community? Thanks in advance for the pointers.
    >
    > Prakash Shenoy
    > <pshenoy@ukans.edu>

    ___________________________________________________________________
    Stefano Monti | Voice: (412) 268-3740 Fax: (412) 268-5569
    CMU, Robotics Institute | Email: smonti+@cs.cmu.edu
    Pittsburgh, PA 15213 | http://www.cs.cmu.edu/~smonti

    ------- End of Forwarded Message


