Thomas G. Dietterich
Professor and Director of Intelligent Systems
School of Electrical Engineering and Computer Science
1148 Kelley Engineering Center
Oregon State University
Corvallis,
Oregon 97331-5501
E-mail:
tgd@cs.orst.edu
Phone: +1-541-737-5559
Office: KEC 2067
PGP Public Key
Recent changes to this page.
(Last updated November 4, 2009.)
Page Contents:
Research
Prospective Students
Publications
CV
Software
Students and Staff
Course Materials
Bio Sketch
Conferences
Comedy
"If you invent a breakthrough in artificial intelligence,
so machines can learn," Mr. Gates responded, "that is worth
10 Microsofts." (Quoted in NY Times, Monday March 3, 2004)
The focus of my research is machine learning: How can we make
computer systems that adapt and learn from their experience? How can
we combine machine learning with other advances in AI to build
Integrated Intelligent Systems? How can we combine human knowledge
with massive data sets to expand scientific knowledge and build more
useful computer applications? My laboratory combines research on
machine learning and AI fundamentals with applications to problems in
science and engineering.
- Scientific Projects
- Ecosystem Informatics and Computational
Sustainability: Oregon State University is a leader in combining
computer science and the ecological sciences to build the new
discipline of Ecosystem Informatics. Ecosystem Informatics studies
methods for collecting, analyzing, and visualizing data on the
structure and function of ecosystems. It is an instance of an
important new direction in science: Data Exploration Science (see Jim
Gray's 2003
KDD talk).
Oregon State is also part of the NSF Expedition in
Computational Sustainability jointly with Cornell University,
Bowdoin College, Howard University, and the Conservation Fund. This
effort seeks to develop novel computational methods to address
problems in ecosystem science and sustainable management of the
biosphere.
My group is involved in many Ecosystem Informatics and
Computational Sustainability activities:
- Arthropod Identification. Our current understanding of
complex ecosystems is limited by a lack of data. One particularly
useful kind of data is population counts of "bugs" (small
arthropods that live in soils, lakes, streams, and the ocean). We
seek to develop devices for capturing, imaging, and sorting bugs
combined with general image processing/machine learning/pattern
recognition tools for counting and classifying them. We hope to
transform the ability of scientists to measure the health of
forests, streams, and estuaries. More generally, we are
interested in developing a wide range of novel instruments for
expanding the quality, quantity, and spatio-temporal resolution of
ecologically-relevant data. We recently received a grant of
$800,000 from the NSF to support our work in this area. See OSU
News Service Story. Project Pages. Image Databases. Our
research also contributes to computer vision and object
recognition more generally.
- Automated Data Cleaning for Sensor Networks. As large
scale sensor networks are deployed to collect ecological data,
methods are needed for automated quality assurance ("data
cleaning") for this data to detect and remove errors due to sensor
failure. Graduate student Ethan Dereszynski has developed Dynamic
Bayesian Network (DBN) models of sensor network data from the
Andrews Experimental Forest. These models are being applied to
automatically detect and flag anomalies in temperature data.
- Machine Learning for Species Distribution. One of the
central goals of ecology is to understand and predict the
distribution of species (including the bugs that we are studying
in the Arthropod Identification project). Given a data set that
records observations of the presence (or absence) of multiple
species at multiple locations, we wish to develop models that can
predict their presence/absence elsewhere. We are interested not
only in static distribution models, but also in process models
that capture the temporal and spatial dynamics of species distributions
(e.g., bird migration, flight times of moths, spread of invasive
species, survival of endangered species, etc.). Our species
distribution team includes faculty members (Matt Betts, myself,
and Weng-Keen Wong), post-doc Rebecca Hutchinson, and
graduate students Arwen Lettkeman and Paul Wilkins.
- Approximate Optimization for Bio-Economic Models. Many
sustainability applications require solving large spatio-temporal
optimization problems under uncertainty. We are collaborating
with economists Jo Albers and Claire Montgomery on methods for
approximate solution of spatio-temporal optimization problems
involving land management for wildfire control and
counter-measures for controlling invasive species.
- Summer Institute in Ecosystem Informatics. Junior and senior
undergraduates and first-year graduate students can spend 10 weeks
studying and researching at the H. J. Andrews Experimental Forest
under funding from this NSF summer program. The goal is to
introduce students to the research challenges at the intersection
of mathematics, computer science, and the ecosystem sciences.
- Intelligent Desktop Assistants. We are involved in two
large efforts to develop intelligent assistants for the computer desktop.
- TaskTracer. When you come into work in the morning,
you don't want to say to your computer "I want to run Word", but
rather, "I want to work on my CS534 homework". In other words,
you would like a user interface that was organized around your
projects and activities rather than around application programs,
files, folders, etc. You would also like all of your information
in one place rather than scattered across the local file system,
network file systems, web sites, email folders, calendar,
contacts, etc. TaskTracer extends the Windows UI to provide
exactly this functionality. This research is supported by DARPA
through the CALO project and gifts from Intel. See OSU News
Service story. Project Web Site.
- CALO. The goal of the CALO project
was to develop an AI personal assistant that can help you find
relevant documents, prepare for meetings, keep track of what is
going on during meetings, and autonomously execute tasks such as
arranging travel, scheduling meetings, executing administrative
workflows (e.g., purchasing and staffing), and so on. Our work on
CALO focused on developing methods for integrating multiple,
separately-engineered components into a single learning and
reasoning system. We also prototyped a novel system that
employs programming-by-demonstration to define new learning tasks
for CALO to solve autonomously. We are currently editing a book
describing the results of the CALO project.
- Learning via Interaction. Statistical machine learning
has focused primarily on learning from observational data. Human
learning often involves learning from demonstrations, explanations,
and feedback. I am interested in combining all of these methods to
develop intelligent systems that can learn both autonomously and
through interaction with human users and coaches. We have three
efforts in this direction:
- AI for Computer Games. The AI components of most computer
games are hand-authored rule-based systems. We are studying
methods for developing game AI agents via machine learning. One
approach is to apply reinforcement learning to automatically learn
an opponent for games. Another approach is to teach the game AI
by demonstrating how to play the game. A third method is to
provide "coaching feedback" to the learning system. A fourth
approach is to transfer knowledge learned on one game to rapidly
construct an AI agent for a new game. We use the RTS game Wargus as our
experimental platform. Wargus is based on the Stratagus game engine.
This work is funded by ARO under a MURI grant and through the
DARPA Bootstrapped Learning program.
- End-User Debugging of Learned Programs. We are starting
to see end-user applications that incorporate machine learning
components (e.g., adaptive spam filters, adaptive email
management, adaptive user interfaces). How can we empower end
users to get these learning systems to behave properly? For
example, how can end users define new features, provide advice on
feature relevance, and yet also understand that no learning system
can be perfect? How can a learning system explain itself to the
end user? Funding provided by the National Science Foundation.
- Bootstrapped Learning. Under the DARPA Bootstrapped
Learning program, we are studying how a teacher can create a
complex AI system through natural instructional methods that
combine direct instruction, demonstration, examples, and feedback.
The vision of Bootstrapped Learning is that the teacher constructs
a series of lessons where each lesson introduces a new concept or
skill that relies on the lessons previously taught. We are part
of the Student team, led by SRI International. Alan Fern leads
our effort.
- Fundamental Machine Learning Research
- Evidence Trees. We have developed a new approach to
supervised learning in which ensembles of tree classifiers are
applied not to make classification decisions but to select which
training data points provide evidence relevant to making a
decision or prediction. This evidence can then be input to a
second-level decision making process, which could be another
classifier or some form of kernel density estimation. We are
exploring this in the context of computer vision and the Arthropod
Identification project.
- Sequential and Spatial
Supervised Learning (SSSL). Many emerging applications of
machine learning involve sequential or spatial data including the
scientific applications listed above as well as problems of
transaction monitoring, counter-terrorism, and fraud detection.
Most present-day applications of machine learning to
spatio-temporal data require extensive ad hoc tool-building. Can we
build a new generation of generic machine learning tools that can
be applied "off the shelf" to solve these sequential and spatial
supervised learning problems?
- Machine Reading. In collaboration with
researchers at BBN, CMU, University of Washington, ISI, and
Cycorp, we are studying methods for extracting knowledge from text
to support inference. Our focus is on learning rules (e.g., Horn
clauses) from noisy and incomplete training data extracted from
reading text. Funding provided by the DARPA Machine Reading
program.
- Reviews, tutorials, and
books. I have written several review articles and tutorials on
machine learning.
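The Evidence Trees idea described above can be illustrated with a toy sketch. This is not the project's implementation. It assumes that randomized partition trees (random split dimension, midpoint threshold) stand in for learned tree classifiers, and that an evidence-weighted vote stands in for the second-level decision process. All function names, parameters, and data points here are hypothetical.

```python
import random
from collections import Counter


def build_tree(points, idxs, rng, depth=0, max_depth=3, min_leaf=2):
    """Recursively partition training indices with random axis-aligned splits."""
    if depth >= max_depth or len(idxs) <= min_leaf:
        return ("leaf", idxs)
    dim = rng.randrange(len(points[0]))          # random split dimension
    values = [points[i][dim] for i in idxs]
    lo, hi = min(values), max(values)
    if lo == hi:                                 # nothing to split on
        return ("leaf", idxs)
    threshold = (lo + hi) / 2.0                  # midpoint split (assumption)
    left = [i for i in idxs if points[i][dim] < threshold]
    right = [i for i in idxs if points[i][dim] >= threshold]
    if not left or not right:                    # guard against degenerate splits
        return ("leaf", idxs)
    return ("split", dim, threshold,
            build_tree(points, left, rng, depth + 1, max_depth, min_leaf),
            build_tree(points, right, rng, depth + 1, max_depth, min_leaf))


def build_forest(points, n_trees=15, seed=0):
    """Build an ensemble of randomized trees over all training points."""
    rng = random.Random(seed)
    all_idxs = list(range(len(points)))
    return [build_tree(points, all_idxs, rng) for _ in range(n_trees)]


def leaf_members(tree, x):
    """Return the training indices stored in the leaf that x falls into."""
    while tree[0] == "split":
        _, dim, threshold, left, right = tree
        tree = left if x[dim] < threshold else right
    return tree[1]


def gather_evidence(forest, x):
    """Count how many trees place each training point in the same leaf as x."""
    counts = Counter()
    for tree in forest:
        counts.update(leaf_members(tree, x))
    return counts


def predict(forest, labels, x):
    """Second-level decision: evidence-weighted vote over the selected points."""
    votes = Counter()
    for i, weight in gather_evidence(forest, x).items():
        votes[labels[i]] += weight
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

forest = build_forest(points, n_trees=15, seed=0)
query = (0.9, 0.2)
print(gather_evidence(forest, query))   # training indices weighted by evidence
print(predict(forest, labels, query))   # prints "a"
```

The trees here never vote on a class themselves. They only select and weight training points, so the final vote could be replaced by another classifier or by a kernel density estimate over the selected points.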
There is a weekly AI Colloquium (Tuesdays, 3-4pm, KEC 1007) where we
present and discuss research we are doing here at Oregon State and
where visitors also make presentations from time to time. Students
interested in joining my group are strongly encouraged to attend these
meetings. This quarter there are two reading groups: Machine Reading
(Tuesdays 4-5pm KEC 3057) and Learning + Search (Wednesdays 4-5pm KEC
2057).
If you are seeking a research career in machine learning, data mining,
artificial intelligence and related areas, and you have a strong
background in mathematics and programming, please read my Information for Prospective Students
page. To see what courses I expect my Ph.D. students to take, please
see Recommended Courses for Ph.D. Students in
Machine Learning.
Journals and Book Series
Entrepreneurial Activities
- I am a co-founder of Strands
(formerly MyStrands; formerly MusicStrands), a recommendation company.
- I am a co-founder of Smart Desktop. Smart Desktop
is now part of Decho, Inc., which is a "cloud
computing" effort of EMC.
Decho is commercializing technology developed as part of the
TaskTracer system.
Students and Staff
- Christian Baumberger, Graduate Student.
- Ethan Dereszynski, Graduate Student.
- Rebecca Hutchinson, Postdoc.
- Jed Irvine, Software Developer.
- Arwen Lettkeman, Graduate Student.
- Junyuan Lin, Graduate Student.
- Jonathan Mark, REU Student.
- Gibby Reynolds, REU Student.
- Michael Slater, Project Manager.
- Shahed Sorower, Graduate Student.
Former Students and Staff
- Hussein Almuallim,
Oil and Energy Professional, Calgary, Canada.
- Eric Altendorf, Google.
- Adam Ashenfelter, Strands, Inc., Corvallis, Oregon.
- Ghulum Bakiri, Department of Computer Science, Bahrain University
- Xinlong Bao. Google Pittsburgh.
- Brian Breck.
- Waranun Bunjongsat.
- Giuseppe Cerbone. Independent Information Services Professional,
Milan, Italy.
- Martha Chamberlin.
- Hei Chan.
- Richard Charon.
- Eric Chown,
Associate Professor, Bowdoin College.
- Dan Corpron
- Diane Damon,
Damon Consulting, Portland, OR.
- Phuoc Do, Decho.
- Nicholas Flann,
Associate Professor, Utah State University
- Greg Foltz.
- Dan Forrest.
- Tony Fountain, Lab Director/Scientist, San Diego Supercomputer Center.
- Ashit Gandhi, Founder and Vice-President, Prism Gem, LLC - The Art of Diamond Coloring.
- Colin Gerety, Fort Collins, CO.
- Brandon
Harvey, Project Manager, GarageGames, Eugene, Oregon.
- Guohua Hao,
Machine Learning Researcher, Motorola.
- Hermann Hild,
President, SMI Cognitive Software GmbH.
- Saket Joshi, Graduate Student, Tufts University.
- Varad
Joshi, Senior Software Engineering Manager at Arris, Portland,
OR.
- Caroline Koff, Hewlett-Packard Corporation, Fort Collins, CO.
- Victoria
Keiser, Research Programmer, CMU. Masters Thesis (PDF).
- Michael
Kelm, Research Scientist, Siemens Healthcare.
- Eun Bae Kong, Associate Professor and Department Head, Computer Science, Chungnam
National University, South Korea
- Bill Langford, Post-doc at RMIT, Melbourne, Australia.
- Dragos Margineantu, The Boeing Company.
- Gonzalo
Martinez, Assistant Professor, Autonomous University of Madrid.
- Prafulla Mishra
- Avis Ng.
- Soumya Ray, Assistant Professor, Case Western Reserve University.
- Angelo Restificar, ZetaInteractive.
- Ritchey Ruff, Senior SDET, Microsoft.
- Jianqiang
Shen. Research Scientist, Pearson
Knowledge Technologies. Doctoral
Dissertation (PDF).
- Rongkun Shen.
Post-doc, Oregon Health and Science University, Portland.
- Shriprakash Sinha
- Simone
Stumpf. Lecturer, City University London.
- Dan Vega, Senior Software Engineer at Valley Inception, LLC.
- Mark Vulfson. Microsoft Corporation.
- Xin Wang, Microsoft Corporation.
- Dietrich Wettschereck. Recommind.com.
- Pengcheng Wu.
- Michael Wynkoop, Qualcomm.
- Wei Zhang, The Boeing Company.
- Wei
Zhang, Post-doc. HP Labs.
Doctoral Dissertation (PDF).
- Valentina Zubek, Aureon Biosciences.
Course Materials
- CS519/GEO599: Principles of
Ecosystem Informatics, 2004-2005.
- CS 534, Spring 2005, Machine
Learning.
- CS430, Fall 2003, Introduction to
Artificial Intelligence
- CS539, Fall 2003, Seminar: Probabilistic
Relational Models
- CS 533, Applied Artificial
Intelligence for Engineers.
- CS 539, Winter 2000, Selected Topics in
Artificial Intelligence: Probabilistic Agents
- CS 430/530, Fall 1999, Artificial Intelligence
Programming Techniques.
- CS 519, Fall 1996. Research Methods
in Computer Science.
- CS 450/550, Winter 1996, Introduction to Computer Graphics.
Machine Learning Resources
Conferences and Workshops
The 1st Asian
Conference on Machine Learning (ACML-2009), Nanjing,
China. November 2-4, 2009.
Neural Information Processing Systems
Conference (NIPS-2009), Vancouver BC, Canada, December 7-12,
2009.
International Conference on Machine Learning (ICML
2010), Haifa, Israel, 21-24 June 2010.
Recent Conferences
Twenty-first International
Joint Conference on Artificial Intelligence (IJCAI-09), Pasadena,
CA, July 11-17, 2009.
First International Conference on Computational
Sustainability (Comp-Sust 2009), Ithaca, NY, June 8-11, 2009.
International
Conference on Machine Learning (ICML 2009),
Montreal, Canada June 14-18, 2009.
IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR 2009), Miami, Florida, June
20-25, 2009.
Favorite misstatements in papers:
- "Gaussian classifiers and support vector machine
with Gaussian kernels) are wildly used in many areas such
as speech signal processing and pattern recognition." (Lin, Bilmes,
and Crammer, How to Loose Confidence: Probabilistic Linear Machines
for Multiclass Classification).
My Family's Musical Activities
- yOya. My son Noah
writes songs and plays keyboards for this band.
- The Stack. Former band
that my son Noah belonged to. While in college, Noah
also arranged lots of songs for Men's Blue and White (Men's A Cappella
at the Claremont Colleges); watch them on YouTube.
- Jubilate: The
Women's Choir of Corvallis. My wife Carol sings in this choir.
Tom Dietterich, tgd@cs.orst.edu