Find a partner to work on this project. (Two-person teams are encouraged, though you may work alone. No three-person teams, please.)
Choose an application domain and a learning problem within
it. Turn in a project proposal (1 page maximum) by May 12th. Each
team is required to meet with the instructor at least once to discuss
the project before turning in the proposal. You can come to office
hours or email me to schedule a different meeting time.
As a guideline, you will need to work through the following
questions and make a decision on each one.
Feature design. How should the "raw" data be transformed into proper features (inputs) so that the data is suitable for machine learning? Should the data be aggregated in some way? Should the data be transformed so that its distribution is closer to Gaussian? Can dimensionality reduction or feature subset selection improve learning performance? If so, how should it be done?
Algorithm choice. What learning algorithms would be appropriate for this problem? Factors to consider: data set size, noise level, continuous versus discrete features, missing values, and whether the problem is supervised, unsupervised, or semi-supervised.
Overfitting avoidance. Is there a risk of overfitting? If so, what overfitting avoidance methods should be applied? How should they be tuned?
Performance criterion. How should performance be measured? Error rate? Expected misclassification cost? Cross-validated likelihood? What loss function should be used for training the learning algorithm?
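To make the performance criterion question concrete, here is a minimal sketch (not a required method) that compares error rate with expected misclassification cost under k-fold cross-validation. Everything in it is an illustrative assumption: the toy labels, the cost values, the helper names, and the majority-class "learner", which simply stands in for whatever algorithm you choose.

```python
from collections import Counter

def error_rate(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def expected_cost(y_true, y_pred, cost):
    """Average cost per example; cost[(true, pred)] is the penalty for that pair.
    Pairs missing from the table (e.g. correct predictions) cost 0."""
    total = sum(cost.get((t, p), 0.0) for t, p in zip(y_true, y_pred))
    return total / len(y_true)

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def majority_baseline(train_labels):
    """Placeholder learner: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

# Hypothetical toy data: 8 "healthy" and 2 "sick" examples.
labels = ["healthy"] * 8 + ["sick"] * 2
# Assumed costs: missing a sick case is 10x worse than a false alarm.
cost = {("sick", "healthy"): 10.0, ("healthy", "sick"): 1.0}

fold_errors, fold_costs = [], []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    guess = majority_baseline([labels[i] for i in train_idx])
    y_true = [labels[i] for i in test_idx]
    y_pred = [guess] * len(y_true)
    fold_errors.append(error_rate(y_true, y_pred))
    fold_costs.append(expected_cost(y_true, y_pred, cost))

cv_error = sum(fold_errors) / len(fold_errors)
cv_cost = sum(fold_costs) / len(fold_costs)
print(f"CV error rate: {cv_error:.2f}, CV expected cost: {cv_cost:.2f}")
```

On this toy data the baseline has a cross-validated error rate of 0.20 but an expected cost of 2.00 per example, because every mistake it makes is the expensive kind. The two criteria can rank methods differently, so the choice should follow from the application. Note that the folds here are contiguous and unshuffled; with real data you would normally shuffle or stratify first.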
Perform the work, run the experiments!
Turn in a final report (no longer than 8 pages including
references, figures and tables).
Each team should turn in a single report. Please email me
your report before the deadline. Your report should precisely describe
the following:
The application domain and formulation of your learning task(s).
A precise description of your approach: What algorithm
did you choose and why? What features did you use? Describe any
preprocessing that is involved. What software package was used, and what
was programmed by you? NOTE: there are no restrictions on using existing
software packages, and no restrictions on the programming language you
use if you decide you need to write your own code.
Be creative! Exploring your own interesting ideas and comparing them with the baseline approaches will receive credit whether they beat the baseline or not.