Thomas G. Dietterich
Professor and Director of Intelligent Systems
School of Electrical Engineering and Computer Science
1148 Kelley Engineering Center
Oregon State University
Corvallis, Oregon 97331-5501
E-mail: tgd@cs.orst.edu
Phone: +1-541-737-5559
Office: KEC 2067
PGP Public Key
Recent changes to this page.
(Last updated July 1, 2009.)
Page Contents:
Research
Prospective Students
Publications
CV
Software
Students and Staff
Course Materials
Bio Sketch
Conferences
"If you invent a breakthrough in artificial intelligence,
so machines can learn," Mr. Gates responded, "that is worth
10 Microsofts." (Quoted in NY Times, Monday, March 1, 2004)
The focus of my research is machine learning: How can we make
computer systems that adapt and learn from their experience? How can
we combine machine learning with other advances in AI to build
Integrated Intelligent Systems? How can we combine human knowledge
with massive data sets to expand scientific knowledge and build more
useful computer applications? My laboratory combines research on
machine learning and AI fundamentals with applications to problems in
science and engineering.
- Scientific Projects
- Ecosystem Informatics and Computational
Sustainability: Oregon State University is a leader in combining
computer science and the ecological sciences to build the new
discipline of Ecosystem Informatics. Ecosystem Informatics studies
methods for collecting, analyzing, and visualizing data on the
structure and function of ecosystems. It is an instance of an
important new direction in science: Data Exploration Science (see Jim
Gray's 2003 KDD talk).
Oregon State is also part of the NSF Expedition in
Computational Sustainability jointly with Cornell University,
Bowdoin College, Howard University, and the Conservation Fund. This
effort seeks to develop novel computational methods to address
problems in ecosystem science and sustainable management of the
biosphere.
My group is involved in many Ecosystem Informatics and
Computational Sustainability activities:
- Arthropod Identification. Our current understanding of
complex ecosystems is limited by a lack of data. One particularly
useful kind of data is population counts of "bugs" (small
arthropods that live in soils, lakes, streams, and the ocean). We
seek to develop devices for capturing, imaging, and sorting bugs
combined with general image processing/machine learning/pattern
recognition tools for counting and classifying them. We hope to
transform the ability of scientists to measure the health of
forests, streams, and estuaries. More generally, we are
interested in developing a wide range of novel instruments for
expanding the quality, quantity, and spatio-temporal resolution of
ecologically relevant data. We recently received an $800,000 grant
from the NSF to support our work in this area. See the OSU News
Service story, the project pages, and the image databases. Our
research also contributes to computer vision and object
recognition more generally.
- Automated Data Cleaning for Sensor Networks. As large-scale
sensor networks are deployed to collect ecological data,
methods are needed for automated quality assurance ("data
cleaning") to detect and remove errors due to sensor
failure. Graduate student Ethan Dereszynski has developed Dynamic
Bayesian Network (DBN) models of sensor network data from the
Andrews Experimental Forest. These models are being applied to
automatically detect and flag anomalies in temperature data.
- Machine Learning for Species Distribution. One of the
central goals of ecology is to understand and predict the
distribution of species (including the bugs that we are studying
in the Arthropod Identification project). Given a data set that
records observations of the presence (or absence) of multiple
species at multiple locations, we wish to develop models that can
predict their presence/absence elsewhere. We are interested not
only in static distribution models, but also in process models
that capture the temporal and spatial dynamics of species distributions
(e.g., bird migration, flight times of moths, spread of invasive
species, and survival of endangered species). Our species
distribution team includes faculty members (Matt Betts, myself,
and Weng-Keen Wong), post-doc Rebecca Hutchinson, and
graduate students Arwen Lettkeman and Paul Wilkins.
- Approximate Optimization for Bio-Economic Models. Many
sustainability applications require solving large spatio-temporal
optimization problems under uncertainty. We are collaborating
with economists Jo Albers and Claire Montgomery on methods for
approximate solution of spatio-temporal optimization problems
involving land management for wildfire control and
counter-measures for controlling invasive species.
- Summer Institute in Ecosystem Informatics. Junior and senior
undergraduates and first-year graduate students can spend 10 weeks
studying and researching at the H. J. Andrews Experimental Forest
under funding from this NSF summer program. As with the IGERT,
the goal is to introduce students to the research challenges at
the intersection of mathematics, computer science, and the
ecosystem sciences.
- Intelligent Desktop Assistants. We are involved in two
large efforts to develop intelligent assistants for the computer desktop.
- TaskTracer. When you come into work in the morning,
you don't want to say to your computer "I want to run Word", but
rather, "I want to work on my CS534 homework". In other words,
you would like a user interface that was organized around your
projects and activities rather than around application programs,
files, folders, etc. You would also like all of your information
in one place rather than scattered across the local file system,
network file systems, web sites, email folders, calendar,
contacts, etc. TaskTracer extends the Windows UI to provide
exactly this functionality. This research is supported by DARPA
through the CALO project and gifts from Intel. See the OSU News
Service story and the project web site. Technology from TaskTracer
is being commercialized at Decho, Inc. You can get the Technology
Preview at Smart Desktop.
- CALO. The CALO project seeks
to develop an AI personal assistant that can help you find
relevant documents, prepare for meetings, keep track of what is
going on during meetings, and autonomously execute tasks such as
arranging travel, scheduling meetings, executing administrative
workflows (e.g., purchasing and staffing), and so on. Our work on
CALO has focused on developing methods for integrating multiple,
separately-engineered components into a single learning and
reasoning system. We have also prototyped a novel system that
employs programming-by-demonstration to define new learning tasks
for CALO to solve autonomously.
- Learning via Interaction. Statistical machine learning
has focused primarily on learning from observational data. Human
learning often involves learning from demonstrations, explanations,
and feedback. I am interested in combining all of these methods to
develop intelligent systems that can learn both autonomously and
through interaction with human users and coaches. We have three
efforts in this direction:
- AI for Computer Games. The AI components of most computer
games are hand-authored rule-based systems. We are studying
methods for developing game AI agents via machine learning. One
approach is to apply reinforcement learning to automatically learn
an opponent for games. Another approach is to teach the game AI
by demonstrating how to play the game. A third method is to
provide "coaching feedback" to the learning system. A fourth
approach is to transfer knowledge learned on one game to rapidly
construct an AI agent for a new game. We use the real-time strategy
(RTS) game Wargus as our experimental platform. Wargus is based on
the Stratagus game engine. This
work is funded by DARPA under the Transfer Learning program and by
ARO under a MURI grant.
- End-User Debugging of Learned Programs. We are starting
to see end-user applications that incorporate machine learning
components (e.g., adaptive spam filters, adaptive email
management, adaptive user interfaces). How can we empower end
users to get these learning systems to behave properly? For
example, how can end users define new features, provide advice on
feature relevance, and yet also understand that no learning system
can be perfect? How can a learning system explain itself to the
end user?
- Bootstrapped Learning. Under the DARPA Bootstrapped
Learning program, we are studying how a teacher can create a
complex AI system through natural instructional methods that
combine direct instruction, demonstration, examples, and feedback.
The vision of Bootstrapped Learning is that the teacher constructs
a series of lessons where each lesson introduces a new concept or
skill that relies on the lessons previously taught. We are part
of the Student team, led by SRI International. Alan Fern leads
our effort.
- Fundamental Machine Learning Research
- Evidence Trees. We have developed a new approach to
supervised learning in which ensembles of tree classifiers are
applied not to make classification decisions but to select which
training data points provide evidence relevant to making a
decision or prediction. This evidence can then be input to a
second-level decision making process, which could be another
classifier or some form of kernel density estimation. We are
exploring this in the context of computer vision and the Arthropod
Identification project.
- Sequential and Spatial
Supervised Learning (SSSL). Many emerging applications of
machine learning involve sequential or spatial data including the
scientific applications listed above as well as problems of
transaction monitoring, counter-terrorism, and fraud detection.
Most present-day applications of machine learning to
spatio-temporal data require extensive ad hoc tool-building. Can we
build a new generation of generic machine learning tools that can
be applied "off the shelf" to solve these sequential and spatial
supervised learning problems?
- Transfer Learning. In collaboration with
researchers at Berkeley, Stanford, and MIT, we are studying new
learning methods that can transfer learned knowledge from one
setting or context to another. This requires learning to occur at
a more abstract and more relational level than in standard supervised
learning algorithms. Our primary application area is learning in
Wargus (see above).
- Integrated Learning. We are studying ways to
combine reasoning and multiple knowledge sources to learn
hierarchical task knowledge from a single demonstration of a
task coupled with rich interaction. Our primary application area
for this project is air traffic flight planning.
- Reviews, tutorials, and
books. I have written several review articles and tutorials on
machine learning.
There is a weekly AI Colloquium where we present and discuss research
we are doing here at Oregon State and where visitors also make
presentations from time to time. Students interested in joining my
group are strongly encouraged to attend these meetings. This quarter,
we are meeting Mondays 4-5pm in KEC 2057.
If you are seeking a research career in machine learning, data mining,
artificial intelligence and related areas, and you have a strong
background in mathematics and programming, please read my Information for Prospective Students
page. To see what courses I expect my Ph.D. students to take, please
see Recommended Courses for Ph.D. Students in
Machine Learning.
Journals and Book Series
Entrepreneurial Activities
- I am a co-founder of Strands (formerly MyStrands and, before that,
MusicStrands), a recommendation company.
- I am a co-founder of Smart Desktop. Smart Desktop
is now part of Decho, Inc., which is a "cloud
computing" effort of EMC.
Decho is commercializing technology developed as part of the
TaskTracer system.
Former Students and Staff
- Hussein Almuallim, Oil and Energy Professional, Calgary, Canada.
- Eric Altendorf, Google.
- Adam Ashenfelter, Strands, Inc., Corvallis, Oregon.
- Ghulum Bakiri, Department of Computer Science, Bahrain University.
- Brian Breck.
- Waranun Bunjongsat.
- Giuseppe Cerbone, Independent Information Services Professional, Milan, Italy.
- Martha Chamberlin.
- Hei Chan.
- Richard Charon.
- Eric Chown, Associate Professor, Bowdoin College.
- Dan Corpron.
- Diane Damon, Damon Consulting, Portland, OR.
- Phuoc Do, Decho.
- Nicholas Flann, Associate Professor, Utah State University.
- Greg Foltz.
- Dan Forrest.
- Tony Fountain, Staff Scientist, San Diego Supercomputer Center.
- Ashit Gandhi, Founder and Vice-President, Prism Gem, LLC - The Art of Diamond Coloring.
- Colin Gerety, Fort Collins, CO.
- Brandon Harvey, Project Manager, GarageGames, Eugene, Oregon.
- Hermann Hild, President, SMI Cognitive Software GmbH.
- Saket Joshi, Graduate Student, Tufts University.
- Varad Joshi, Interconnectix, Portland, OR.
- Caroline Koff, Hewlett-Packard Corporation, Fort Collins, CO.
- Victoria Keiser. Master's Thesis (PDF).
- Michael Kelm, Research Scientist, Siemens Healthcare.
- Eun Bae Kong, Associate Professor and Department Head, Computer Science, Chungnam National University, South Korea.
- Bill Langford, Post-doc at RMIT, Melbourne, Australia.
- Dragos Margineantu, The Boeing Company.
- Prafulla Mishra.
- Avis Ng.
- Soumya Ray, Assistant Professor, Case Western Reserve University.
- Angelo Restificar, ZetaInteractive.
- Ritchey Ruff, Certification Consultant and Lead Tester at Interactive Home Systems.
- Jianqiang Shen, Research Scientist, Pearson Knowledge Technologies. Doctoral Dissertation (PDF).
- Rongkun Shen.
- Shriprakash Sinha.
- Simone Stumpf, Whitehorse.
- Dan Vega, Red Rover Software.
- Mark Vulfson, Microsoft Corporation.
- Xin Wang, Microsoft Corporation.
- Dietrich Wettschereck, Recommind.com.
- Pengcheng Wu.
- Michael Wynkoop, Qualcomm.
- Wei Zhang, The Boeing Company.
- Wei Zhang, Post-doc, HP Labs. Doctoral Dissertation (PDF).
- Valentina Zubek, Aureon Biosciences.
Course Materials
- CS519/GEO599: Principles of Ecosystem Informatics, 2004-2005.
- CS 534, Spring 2005, Machine Learning.
- CS430, Fall 2003, Introduction to Artificial Intelligence.
- CS539, Fall 2003, Seminar: Probabilistic Relational Models.
- CS 533, Applied Artificial Intelligence for Engineers.
- CS 539, Winter 2000, Selected Topics in Artificial Intelligence: Probabilistic Agents.
- CS 430/530, Fall 1999, Artificial Intelligence Programming Techniques.
- CS 519, Fall 1996, Research Methods in Computer Science.
- CS 450/550, Winter 1996, Introduction to Computer Graphics.
Machine Learning Resources
Conferences and Workshops
Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09), Pasadena, CA, July 11-17, 2009.
The 1st Asian Conference on Machine Learning (ACML-2009), Nanjing, China, November 2-4, 2009.
Neural Information Processing Systems Conference (NIPS-2009), Vancouver, BC, Canada, December 7-12, 2009.
Recent Conferences
First International Conference on Computational Sustainability (Comp-Sust 2009), Ithaca, NY, June 8-11, 2009.
International Conference on Machine Learning (ICML 2009), Montreal, Canada, June 14-18, 2009.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, Florida, June 20-25, 2009.
My Family's Musical Activities
- yOya. My son Noah
writes songs and plays keyboards for this band.
- The Stack. My son Noah
writes songs and plays rhythm guitar in this rock-and-roll band. He
also arranged many songs for Men's Blue and White (men's a cappella
at the Claremont Colleges); watch them on YouTube.
- Jubilate: The
Women's Choir of Corvallis. My wife Carol sings in this choir.
Tom Dietterich, tgd@cs.orst.edu