CS 434: Machine Learning and Data Mining

Fall 2008

MWF 15:00 - 15:50 Kelly 1001




Instructor: Xiaoli Fern

Email: xfern@eecs.oregonstate.edu
Office: Kelly 3073
Office hours: MWF 2-3pm, or by appointment
Class email list: cs434-f08@engr.oregonstate.edu


Machine learning and data mining is a subfield of artificial intelligence that develops computer programs that can learn from past experience and find useful patterns in data. The field has produced many tools that are widely used and have made a significant impact in both industrial and research settings. Application domains include personalized spam filters, HIV vaccine design, handwritten digit recognition, face recognition, credit card fraud detection, unmanned vehicle control, medical diagnosis, and intelligent web search.

This course provides a basic introduction to this dynamic and fast-advancing field. Topics cover its three basic branches: (1) supervised learning for prediction problems (learn to predict); (2) unsupervised learning for clustering data and discovering interesting patterns in data (learn to understand); and (3) reinforcement learning for learning to select actions based on positive and negative feedback (learn to act). The course has a special focus on the practical side: students will learn various machine learning and data mining techniques and also how to apply them to real problems.

Syllabus

Course Policy


Course materials


Learning objectives

Upon completing the course, students are expected to be able to:
1) Apply supervised learning algorithms to prediction problems and evaluate the results.
2) Apply unsupervised learning algorithms to data analysis problems and evaluate the results.
3) Apply reinforcement learning algorithms to control problems and evaluate the results.
4) Take a description of a new problem and decide what kind of problem (supervised, unsupervised, or reinforcement learning) it is.


Lecture Schedule

See the previous offering of the class, CS434 Fall 2007, for a rough lecture schedule.
Date | Topics | Lecture Notes | Reading | Assignments
9/29 M | Introduction to basic concepts | slides | TM Chapter 1 |
10/1 W | The perceptron algorithm | slides | notes on perceptron by William Cohen |
10/3 F | The nearest neighbor algorithm | slides | | hw1, due Monday 13th in class; solution
10/6 M | Decision tree algorithm | slides | J. R. Quinlan, Induction of decision trees, Machine Learning 1: 81-106, 1986 |
10/8 W | Decision tree cont. | slides | |
10/10 F | Review of probability theory | slides | |
10/13 M | (Naive) Bayes classifier | slides | | hw2, due Friday Oct 24th in class; solution to the written part
10/15 W | NBC cont., logistic regression | slides | generative model vs discriminative model |
10/17 F | Logistic regression | slides | |
10/20 M | Support vector machines | slides | | Final project information
10/22 W | Support vector machines cont. | slides | |
10/24 F | Ensemble methods, bagging | slides | |
10/27 M | Boosting | slides | A short introduction to boosting |
10/29 W | Feature selection | slides | |
10/31 F | Clustering, HAC | slides | | Assignment 3, due Nov 12th
11/3 M | Clustering cont., K-means | slides | |
11/5 W | Midterm exam | | |
11/7 F | Gaussian mixture modeling | slides | |
11/10 M | Discussion of midterm questions | | |
11/12 W | Canceled class | | |
11/14 F | GMM cont., unsupervised dimension reduction | slides | | Assignment 4, due Nov 24th; cluster.csv; random.csv
11/17 M | Guest lecture on sequence analysis | | |
11/19 W | Markov Decision Processes | slides | |
11/21 F | MDPs cont. | slides | |
11/24 M | Reinforcement learning | slides | | hw5, due 12/03
11/26 W | Reinforcement learning - passive learning | slides | |
11/28 F | No class - Thanksgiving holiday | | |
12/1 M | Reinforcement learning - active learning | slides | |
12/3 W | Reinforcement learning - function approximation | slides | |
12/5 F | Association rules mining | | |