CS534: Machine Learning
Course Description
This course will present an introduction to algorithms for machine
learning and data mining. These algorithms lie at the heart of many
leading-edge computer applications, including optical character
recognition, speech recognition, text mining, document classification,
pattern recognition, computer intrusion detection, and information
extraction from web pages. Every machine learning algorithm has both a
computational aspect (how to compute the answer) and a statistical
aspect (how to ensure that future predictions are accurate). Algorithms
covered include linear classifiers (Gaussian maximum likelihood, Naive
Bayes, and logistic regression) and non-linear classifiers (neural
networks, decision trees, support-vector machines, nearest neighbor
methods). The class will also introduce techniques for learning from
sequential data and advanced ensemble methods such as bagging and
boosting.
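As a small illustration of the two aspects mentioned above, the following sketch pairs a nearest neighbor classifier (the computational aspect: how to compute an answer) with a hold-out accuracy estimate (the statistical aspect: how well predictions carry over to unseen data). It is not taken from the course materials; the function names and the toy data are hypothetical, chosen only for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`."""
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

def holdout_accuracy(train, test):
    """Fraction of held-out examples whose label is predicted correctly."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
# Held-out examples that the classifier never sees during training.
test = [((0.5, 0.5), "a"), ((5.5, 5.5), "b"),
        ((1.0, 1.0), "a"), ((4.0, 4.0), "b")]

accuracy = holdout_accuracy(train, test)
print(accuracy)
```

Measuring accuracy on examples that were kept out of the training set is what makes the estimate meaningful; a nearest neighbor classifier evaluated on its own training points would trivially score perfectly.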
Prerequisites: CS515; basic knowledge of search algorithms,
probability, statistics, calculus, linear algebra. 4 Units.
Class Hours: MWF 9:00-10:00 Bat 250
Office Hours: Thursdays 9:00-10:30 Dear 221C
Grader: Charles Parker
Textbook:
Duda, Hart, and Stork: Pattern Classification. Make sure your
copy is not the first printing (or go to David Stork's web page and
download the bug fixes).
Course Handouts
Software
In this class, we will be using the WEKA package from The University
of Waikato (Hamilton, New Zealand). This is a package of machine
learning algorithms and data sets that is very easy to use and easy to
extend. See the assignment for Homework 2 for information about how
to use WEKA.
Homework Assignments
Solutions and course grades are available on the Blackboard
System.
Please turn in all homework in two forms: (i) as hardcopy at the start
of class and (ii) electronically via the ENGR homework
system. (To submit electronically, first log in to the ENGR Teach
site, and then click on the Submit Assignment item on the left side of
the page.)
Viewgraphs for Lectures
- Part 1: Introduction, Perceptrons, Logistic Regression, Linear
Discriminant Analysis (pdf)
- Part 2: Requirements for Off-The-Shelf Learning Methods. Decision Trees. (pdf)
- Part 3: Neural networks. (pdf)
- Part 4: Nearest neighbor. (pdf)
- Part 5: Support Vector Machines. (pdf)
- Part 6: Bayesian Networks (pdf)
- Part 7: Statistical and Computational Learning Theory (pdf)
- Part 8: Bayesian Learning Theory (pdf) Updated April 29, 2005
- Part 9: Bias/Variance Theory (pdf)
- Part 10: Overfitting and Penalty Methods (pdf)
- Part 11: Hold-Out and Cross-validation methods (pdf)
- Part 12: Sequential Supervised Learning (pdf)
- Part 13: Methodology (pdf)
- Bonus: Unsupervised Learning (pdf)
- Part 14: Course Summary (pdf)
Tom Dietterich, tgd@cs.orst.edu