The current methodology for applying machine learning can be
depicted as follows:
A machine learning researcher confronts a set of data. Typically,
there is a large gap between the raw data and the desired "prediction
target" (i.e., the variable whose value we wish to predict). The ML
researcher bridges this gap by defining a set of features. To do
this, he or she interviews experts in the domain to understand how the
raw data relates to the prediction target. He or she applies ML
expertise to choose a set of features that reflect this domain
knowledge and that narrow the gap. Then one or more machine learning
algorithms are applied to build a classifier or predictor.
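For concreteness, here is a minimal, hypothetical sketch of this
hand-crafted step; the data, column names, and aggregations are
invented for illustration and are not drawn from any of our domains:

    import pandas as pd

    # Hypothetical raw event log; the columns and values are invented
    # purely to illustrate the hand-crafted feature-engineering step.
    events = pd.DataFrame({
        "user_id":  [1, 1, 2, 2, 2],
        "hour":     [9, 14, 10, 11, 23],
        "duration": [120, 35, 400, 15, 60],
    })

    # Features chosen by the researcher after interviewing domain
    # experts: per-user aggregates meant to narrow the gap between
    # the raw log and the prediction target.
    features = events.groupby("user_id").agg(
        n_events=("duration", "size"),
        mean_duration=("duration", "mean"),
        worked_late=("hour", lambda h: int((h >= 22).any())),
    )
    print(features)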
This methodology has several problems. It is difficult to apply, because it requires someone with a PhD in machine learning or statistics to design the features. It is difficult to maintain: if new sources of data become available, the feature design process must be repeated. In addition, the rationale behind the chosen set of features is typically not captured in any formal or informal way.
We propose a new methodology that replaces the hand-crafting of
features with the hand-crafting of contextual knowledge, as shown below:
In this methodology, contextual knowledge is captured in the form of
an object-relational model and a collection of qualitative causal
relationships. This information is then automatically transformed to
design the features for a learning system and a set of constraints
that can be incorporated into the learning system. The hope is that
the explicit contextual knowledge will be easier for someone without a
PhD in machine learning to maintain. We also believe that if the
contextual knowledge is made explicit, then it can be incorporated
into the learning algorithms to constrain the parameter values of the
fitted predictive models. This should enable fast learning from small
samples, and thus provide higher-performance systems than can be
obtained with the current methodology.
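One way to picture how a qualitative causal relationship could
constrain parameter values is to translate its sign into a bound on
the corresponding coefficient of a linear predictor. The following
sketch is only an illustration of that idea; the feature names,
signs, and data are invented, and the actual transformation in our
methodology may differ:

    import numpy as np
    from scipy.optimize import lsq_linear

    # Hypothetical qualitative causal knowledge: each entry says whether
    # a feature is believed to push the target up (+1) or down (-1).
    # Feature names are invented for illustration.
    qualitative_influence = {"temperature": +1, "rainfall": +1, "pesticide_use": -1}
    feature_names = list(qualitative_influence)

    # Translate the qualitative signs into per-coefficient bounds.
    lower = np.array([0.0 if s > 0 else -np.inf for s in qualitative_influence.values()])
    upper = np.array([0.0 if s < 0 else np.inf for s in qualitative_influence.values()])

    # Small synthetic sample standing in for the scarce labelled data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, len(feature_names)))
    y = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=20)

    # Fit a linear predictor whose coefficients must respect the signs
    # implied by the causal knowledge, even on a small sample.
    fit = lsq_linear(X, y, bounds=(lower, upper))
    print(dict(zip(feature_names, fit.x.round(2))))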
In our methodology, we envision the following steps, which result in a probabilistic relational model (PRM) that can be fit to the available data.
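To give a feel for the object-relational modeling involved, here is a
minimal, hypothetical sketch of a schema in the spirit of a PRM; the
classes and attributes are invented and do not correspond to our
actual domain models:

    from dataclasses import dataclass

    # Hypothetical object-relational schema in the spirit of a PRM:
    # classes with descriptive attributes and reference slots linking
    # objects together.  When compiled into a probabilistic model, the
    # attributes of Site would become candidate parents for the
    # attributes of Observation reached through the `site` slot.

    @dataclass
    class Site:
        name: str
        temperature: float      # descriptive attribute
        rainfall: float         # descriptive attribute

    @dataclass
    class Observation:
        site: Site              # reference slot pointing back to a Site
        week: int
        population_count: int   # the prediction target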
We are developing and testing this methodology in three application domains: (a) TaskTracer (the intelligent desktop), (b) predicting the spread of West Nile Virus, and (c) predicting grasshopper populations in Eastern Oregon.
This project is in the early stages, so we have no technical results to report at this time.
This project has been funded by the following grants and contracts:
The views expressed on this page are those of the principal investigators and do not necessarily reflect the views of the National Science Foundation or the Defense Advanced Research Projects Agency.