Trustable Machine Learning

Course Description

This short course considers the problem of obtaining reliable decisions from supervised machine learning. It attempts to summarize the current state of knowledge about how to create machine learning classifiers that, when they make a prediction, can guarantee that the prediction is correct with high probability. Such classifiers reject test queries for which they are not sufficiently confident. The course consists of four lectures, each centered on a few recent papers but also drawing on material from other publications.

Lecture 1: Calibrated Probabilities. This lecture discusses how to obtain calibrated probabilities from supervised classifiers. These are useful for making rejection decisions, but also for cost-sensitive classification, for handling class imbalance, and for serving as a component of a larger AI system.
Lecture 2: Classification with a Reject Option. We do not need calibrated probabilities in order to make reject decisions correctly. This lecture discusses methods for setting a rejection threshold that provide accuracy guarantees. These include standard thresholding methods and the method of conformal prediction.
Lecture 3: Open Category Detection. The first two lectures considered only the closed-world case with i.i.d. training data. In this lecture, we discuss the problem of detecting test queries that belong to classes not present in the training data.
Lecture 4: Anomaly Detection. Most open category methods apply an anomaly detector to flag the novel-class queries. This lecture discusses a benchmark study of eight anomaly detection algorithms. It then presents the Rare Pattern Anomaly Detection theory, a PAC-style theory of anomaly detection developed by Alan Fern, Md. Amran Siddiqui, and me.
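To make the reject option of Lecture 2 concrete, here is a minimal sketch of split conformal prediction used to decide when to abstain. It assumes a classifier that outputs estimated class probabilities and a held-out calibration set that is exchangeable with the test queries. The nonconformity score 1 - p(y|x), the function names, and the rule "answer only when the prediction set contains exactly one label" are illustrative choices, not necessarily the ones used in the lectures. Under exchangeability, the resulting set contains the true label with probability at least 1 - alpha, marginally over calibration and test draws.

```python
import math
from typing import Dict, List, Optional, Sequence


def conformal_threshold(cal_true_probs: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold on nonconformity scores.

    cal_true_probs[i] is the model's estimated probability of the TRUE label
    of calibration example i; its nonconformity score is 1 - that value.
    """
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: admit every label.
        return math.inf
    return scores[k - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> List[str]:
    """All labels whose nonconformity score 1 - p(y|x) is at most qhat."""
    return [y for y, p in probs.items() if 1.0 - p <= qhat]


def classify_or_reject(probs: Dict[str, float], qhat: float) -> Optional[str]:
    """Return the single admitted label, or None (reject) if the set is empty or ambiguous."""
    labels = prediction_set(probs, qhat)
    return labels[0] if len(labels) == 1 else None


if __name__ == "__main__":
    # Hypothetical true-label probabilities from 10 calibration examples.
    cal = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3]
    qhat = conformal_threshold(cal, alpha=0.2)
    print(qhat)
    print(classify_or_reject({"cat": 0.7, "dog": 0.2, "fox": 0.1}, qhat))
    print(classify_or_reject({"cat": 0.5, "dog": 0.45, "fox": 0.05}, qhat))
```

With these made-up numbers, the first query yields a single admitted label and is answered, while the second admits two labels and is rejected. Lowering alpha raises the threshold, which enlarges the prediction sets and therefore increases the rejection rate.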

I was not able to cover ALL of the relevant literature in these presentations. I would be grateful to receive email with pointers to other papers that discuss these topics. Similarly, if you see errors in these presentations, please send me email so that I can correct them.

Tom Dietterich, tgd@cs.orst.edu