Re: [UAI] Degree of relevance in Bayesian Networks

From: Nick Hynes (admin@1site.co.nz)
Date: Wed Jun 06 2001 - 16:03:10 PDT

    God, I just read my own e-mail & thought (sarcastically) "well that really
    cleared that up."

    My research is concerned with machine learning algorithms, including
    statistical methods.

    I have looked at the issue of 'feature selection' - i.e. trying to determine
    which input features are necessary to predict the target value, and which
    are just adding noise.

    Typically these methods can be divided into those that deal with a)
    supervised data and b) unsupervised data. By 'supervised' I mean that the
    training examples already have a target output value associated with
    them, as opposed to unsupervised data, where you have only the inputs and
    no associated outputs.
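
    In code terms, the two cases look something like this (a toy
    illustration - the values and field names are made up):

        # Supervised: each example pairs inputs with a known target output.
        supervised = [
            {"inputs": [5.1, 3.5, 1.4], "target": "churned"},
            {"inputs": [4.9, 3.0, 1.3], "target": "stayed"},
        ]

        # Unsupervised: the same inputs, but no target values at all.
        unsupervised = [
            [5.1, 3.5, 1.4],
            [4.9, 3.0, 1.3],
        ]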

    I had a look at RELIEF-F - an algorithm that requires supervised data.
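
    For a flavour of how it works, here is a rough sketch of the core
    two-class Relief weight update (a simplification of the multi-class
    RELIEF-F; the function and parameter names are mine, not from the
    original papers):

        import numpy as np

        def relief(X, y, n_samples=100, seed=0):
            """Basic two-class Relief. X is (n_examples, n_features),
            y is a 0/1 label vector; a high weight suggests a relevant
            feature."""
            X = np.asarray(X, dtype=float)
            y = np.asarray(y)
            rng = np.random.default_rng(seed)
            n, d = X.shape
            w = np.zeros(d)
            for _ in range(n_samples):
                i = rng.integers(n)
                dist = np.abs(X - X[i]).sum(axis=1)  # L1 distance to each row
                dist[i] = np.inf                     # skip the instance itself
                same = (y == y[i])
                hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit
                miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss
                # A relevant feature varies little on the hit, a lot on the
                # miss, so its weight grows over the sampled instances.
                diff = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
                w += diff / n_samples
            return w

    RELIEF-F proper averages over the k nearest hits and misses, handles
    more than two classes, and copes with missing values, but the update
    above is the heart of it.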

    Supervised data is not so common with datamining problems, since most
    commercial databases are constructed for some reason other than datamining,
    and will not be classified when you first see them. There is a version of
    RELIEF-F that works with unsupervised data, called SUD. References to these
    algorithms are available in the seminar notes:
    http://www.cs.auckland.ac.nz/~pat/760_2001/seminars/nicks760.html.
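
    To give the flavour of the unsupervised case: methods in this family
    score feature subsets without any labels, e.g. via an entropy measure
    over pairwise similarities (low entropy suggests clearer cluster
    structure). The toy version below is my own sketch of that general
    idea, not the SUD algorithm itself - see the notes above for the real
    thing:

        import numpy as np

        def entropy_score(X, alpha=0.5):
            """Entropy of pairwise similarities; lower values suggest the
            data has clearer structure. alpha is a hypothetical scale."""
            X = np.asarray(X, dtype=float)
            d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
            s = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)
            return -(s * np.log(s) + (1 - s) * np.log(1 - s)).sum()

        def feature_relevance(X):
            """Score each feature by how much dropping it disturbs the
            entropy; a feature that barely changes it adds little."""
            X = np.asarray(X, dtype=float)
            base = entropy_score(X)
            return [abs(base - entropy_score(np.delete(X, f, axis=1)))
                    for f in range(X.shape[1])]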

    Hope that's a bit clearer,

    ;)

    Nick.

    ----- Original Message -----
    From: "Nick Hynes" <admin@1site.co.nz>
    To: <uai@cs.orst.edu>
    Sent: Friday, June 01, 2001 5:18 AM
    Subject: Re: [UAI] Degree of relevance in Bayesian Networks

    > Hi Samuel,
    >
    > I started to look at the issues around feature selection. Most of the
    > current methods used are fairly non-stochastic. The statistical methods
    > are robust to independent features (i.e. features that tell you nothing
    > about the target class/value), and so not many statisticians have
    > considered reducing the number of features.
    >
    > In the more traditional machine learning arena this problem has been
    > looked at by a number of authors. Principal Component Analysis is
    > commonly used - search on http://www.researchindex.org/. A new branch of
    > research claims to work well with highly dependent variables, and uses
    > clustering. I presented a seminar on this branch of work, which you can
    > find at:
    > http://www.cs.auckland.ac.nz/~pat/760_2001/seminars/nicks760.html
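    >
    > As a very rough sketch, PCA itself is only a few lines (the standard
    > eigendecomposition of the covariance matrix, written with numpy -
    > nothing here is specific to the clustering work above):
    >
    >     import numpy as np
    >
    >     def pca(X, k):
    >         """Project X (n_examples x n_features) onto its top-k
    >         principal components."""
    >         Xc = X - X.mean(axis=0)           # centre each feature
    >         cov = np.cov(Xc, rowvar=False)    # feature covariance
    >         vals, vecs = np.linalg.eigh(cov)  # eigh: cov is symmetric
    >         order = np.argsort(vals)[::-1]    # largest variance first
    >         return np.dot(Xc, vecs[:, order[:k]])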
    >
    > I should point out that the mathematical justification for these methods
    > is immature - some shortcomings are highlighted on the website.
    >
    > ;)
    >
    > Regards,
    > Nick Hynes.
    >
    >
