God, I just read my own e-mail & thought (sarcastically) "well that really
cleared that up."
My research is concerned with machine learning algorithms, including
statistical methods.
I have looked at the issue of 'feature selection' - i.e. trying to determine
which input features are necessary to predict the target value, and which
are just adding noise.
Typically these methods can be divided into those that deal with a)
supervised data and b) unsupervised data. By 'supervised' I mean that you
have training examples which already have a target output value associated
with them - as opposed to unsupervised data, where you only have the
inputs and no associated outputs.
I had a look at RELIEF-F - an algorithm that requires supervised data.
Supervised data is not so common in data-mining problems, since most
commercial databases are constructed for some purpose other than data
mining, and will not be class-labelled when you first see them. There is a
version of RELIEF-F that works with unsupervised data, called SUD.
References to these algorithms are available in the seminar notes:
http://www.cs.auckland.ac.nz/~pat/760_2001/seminars/nicks760.html.
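To give a feel for how a Relief-style method uses the supervised labels, here is a minimal sketch of the basic binary-class Relief weighting scheme (RELIEF-F itself generalises this to multiple classes and k nearest neighbours; the function name and parameters are my own, not from the cited notes):

```python
import numpy as np

def relief_weights(X, y, n_iter=100, seed=0):
    """Basic binary-class Relief: weight each feature by how well it
    separates a sampled instance from its nearest 'miss' (nearest
    neighbour of the other class) versus its nearest 'hit' (nearest
    neighbour of the same class). Needs labels y - i.e. supervised data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # normalise per-feature differences by each feature's range
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)  # Manhattan distance to sample i
        dist[i] = np.inf                     # exclude the sample itself
        same = (y == y[i])
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # relevant features differ across classes (miss) but not within (hit)
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return w / n_iter
```

A feature that predicts the class gets a positive weight (large miss-difference, small hit-difference); a pure-noise feature averages out near zero.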
Hope that's a bit clearer,
;)
Nick.
----- Original Message -----
From: "Nick Hynes" <admin@1site.co.nz>
To: <uai@cs.orst.edu>
Sent: Friday, June 01, 2001 5:18 AM
Subject: Re: [UAI] Degree of relevance in Bayesian Networks
> Hi Samuel,
>
> I started to look at the issues around feature selection. Most of the
> current methods in use are fairly non-stochastic. The statistical methods
> are robust to independent features (i.e. features that tell you nothing
> about the target class/value), and so not many statisticians have
> considered reducing the number of features.
>
> In the more traditional machine learning arena this problem has been
> looked at by a number of authors. Principal Component Analysis is
> commonly used - search on http://www.researchindex.org/. A new branch of
> research claims to work well with highly dependent variables, and uses
> clustering. I presented a seminar on this branch of work, which you can
> find at:
> http://www.cs.auckland.ac.nz/~pat/760_2001/seminars/nicks760.html
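[For readers unfamiliar with it: Principal Component Analysis projects the data onto the directions of greatest variance, which is the standard unsupervised way to reduce dimensionality. A minimal sketch via SVD of the centred data matrix - the function name and signature are illustrative, not from any particular library:

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components.

    Returns (Z, components): Z is the (n, k) projection, components the
    (k, d) principal directions. Uses SVD of the centred data, so the
    directions are ordered by decreasing variance explained."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]
```

Note that PCA only looks at the inputs; unlike Relief-style methods it ignores the target entirely, which is why it applies to unsupervised data.]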
>
> I should point out that the mathematical justification for these methods
> is immature - some shortcomings are highlighted on the website.
>
> ;)
>
> Regards,
> Nick Hynes.
>
>
------- End of Forwarded Message
This archive was generated by hypermail 2b29 : Wed Jun 06 2001 - 16:12:11 PDT