Thomas G. Dietterich
Distinguished Professor (Emeritus) and Director of Intelligent Systems
School of Electrical Engineering and Computer Science
1148 Kelley Engineering Center
Oregon State University
Office: KEC 2067
PGP Public Key
(Last updated September 30, 2015.)
Students and Staff
"If you invent a breakthrough in artificial intelligence,
so machines can learn," Mr. Gates responded, "that is worth
10 Microsofts." (Quoted in NY Times, Monday March 3, 2004)
The focus of my research is machine learning (and the associated
areas of Data Science and Big Data). How can we make computer
systems that adapt and learn from their experience? How can we
combine machine learning with other advances in AI to build
Integrated Intelligent Systems? How can we combine human knowledge
with massive data sets to expand scientific knowledge and build more
useful computer applications? My laboratory combines research on
machine learning and AI fundamentals with applications to problems
in science and engineering.
- Scientific Projects
- Ecosystem Informatics and Computational
Sustainability: Oregon State University is a leader in combining
computer science and the ecological sciences to build the new
discipline of Ecosystem Informatics. Ecosystem Informatics studies
methods for collecting, analyzing, and visualizing data on the
structure and function of ecosystems. It is an instance of an
important new direction in science: Data Exploration Science (see Jim
Oregon State is also part of the Institute for
Computational Sustainability led by Cornell University.
This effort seeks to develop novel computational methods to address
problems in ecosystem science and sustainable management of the
My group is involved in many Ecosystem Informatics and
Computational Sustainability activities:
- Machine Learning for Species Distribution. One of the
central goals of ecology is to understand and predict the
distribution of species (including the bugs that we are studying
in the Insect Identification project). Given a data set that
records observations of the presence (or absence) of multiple
species at multiple locations, we wish to develop models that can
predict their presence/absence elsewhere. We are interested not
only in static distribution models, but also in process models
that capture the temporal and spatial dynamics of species distributions
(e.g., bird migration, flight times of moths, return of salmon,
spread of invasive species, survival of endangered species, etc.).
Our species distribution team includes faculty members (Matt
Betts, myself, and Weng-Keen Wong), post-docs Rebecca Hutchinson
and Selina Chu, and graduate students Arwen Lettkeman and Liping Liu.
We collaborate very closely with
the Cornell Laboratory of
Ornithology and with
the DataONE Datanet. In
particular, we are studying methods for dealing with the many
shortcomings of the citizen science data collected by the Lab of
Ornithology in their
eBird project including (a) partial
detection, (b) wide range of birder expertise, and (c) highly biased
spatial distribution of observations.
- BirdCast. Another special case of species distribution
modeling is understanding bird migration. With the Lab of
Ornithology, we are developing methods for reconstructing and
predicting bird migration across North America. Our goal is to
understand what signals birds use to decide when to migrate and to
provide daily forecasts of bird migration by combining eBird
reports, weather radar, acoustic monitoring of flight calls, and
weather forecasts. The project web site is
- Approximate Optimization for Bio-Economic Models. Many
sustainability applications require solving large spatio-temporal
optimization problems under uncertainty. We are collaborating
with economists Jo
Albers and Claire
Montgomery on methods for approximate solution of
spatio-temporal optimization problems involving land management
for wildfire control and counter-measures for controlling invasive
- Project TAHMO: Deployment, Cleaning, and Analysis of Sensor
Network Data. We are part of
Project TAHMO, which seeks to
construct and deploy a network of 20,000 hydro-meteorological
stations in Africa. We are developing algorithms for sensor
placement, data cleaning, recovery from damaged sensors, and
analysis of the resulting data. We are building on our previous
work with Ethan Dereszynski on dynamic Bayesian network models for
sensor data cleaning.
- Arthropod Identification. Our current understanding of
complex ecosystems is limited by a lack of data. One particularly
useful kind of data is population counts of "bugs" (small
arthropods that live in soils, lakes, streams, and the ocean).
The BugID project seeks to develop devices for
capturing, imaging, and sorting bugs combined with general image
processing/machine learning/pattern recognition tools for counting
and classifying them. We hope to transform the ability of
scientists to measure the health of forests, streams, and
estuaries. More generally, we are interested in developing a wide
range of novel instruments for expanding the quality, quantity,
and spatio-temporal resolution of ecologically-relevant data. Our
research also contributes to computer vision and object
recognition more generally.
- NIPS 2012
Posner Lecture: Challenges for Machine Learning in
- ICML 2011 Tutorial
on Machine Learning in Ecology and Ecosystem
- Intelligent Desktop Assistants. We have been involved in two
large efforts to develop intelligent assistants for the computer desktop.
- TaskTracer. When you come into work in the morning,
you don't want to say to your computer "I want to run Word", but
rather, "I want to work on my CS534 homework". In other words,
you would like a user interface that was organized around your
projects and activities rather than around application programs,
files, folders, etc. You would also like all of your information
in one place rather than scattered across the local file system,
network file systems, web sites, email folders, calendar,
contacts, etc. TaskTracer extends the Windows UI to provide
exactly this functionality. This research is supported by DARPA
with previous support from Google, Intel, and the DARPA CALO project.
News Service story. Project Web
- CALO. The goal of the CALO project
was to develop an AI personal assistant that can help you find
relevant documents, prepare for meetings, keep track of what is
going on during meetings, and autonomously execute tasks such as
arranging travel, scheduling meetings, executing administrative
workflows (e.g., purchasing and staffing), and so on. Our work on
CALO focused on developing methods for integrating multiple,
separately-engineered components into a single learning and
reasoning system. We also prototyped a novel system that
employs programming-by-demonstration to define new learning tasks
for CALO to solve autonomously. We are currently editing a book
describing the results of the CALO project.
- Next Generation Phenomics. An important goal in biology
is to reconstruct the tree of life. As part of
the Project AVATOL team, we are
developing computer vision and machine learning methods to
automatically discover and score phenotype characters (features)
from images of biological specimens. These scores can then be
combined with other information (e.g., genetic sequences,
functional measurements) to reconstruct phylogenetic trees.
Phenomic information is particularly valuable for sets of
closely-related species (where DNA differences may not reflect
functional differences) and for extinct species known only through
The computer science challenges involve learning to score
known characters, which typically include shape, texture, color,
and topological features of specimens, from weakly-labeled data
and discovering new characters that are shared across some
taxonomic groups but not others.
- Fundamental Machine Learning and Artificial Intelligence Research
- Reviews, tutorials, and
books. I have written several review articles and tutorials on
If you are seeking a research career in machine learning, data mining,
artificial intelligence and related areas, and you have a strong
background in mathematics and programming, please read my Information for Prospective Students.
If you are interested in robotics, I encourage you to visit
the Robotics Team
Pages to learn more about our excellent robotics program.
Professional Service, Journals, and Book Series
- I am a co-founder of Strands
(formerly MyStrands; formerly MusicStrands), a recommendation company.
- I am a co-founder of Smart Desktop. Smart Desktop
is now part of Decho, Inc., which is a "cloud
computing" effort of EMC.
Decho was a spinout of the TaskTracer project.
- I am a co-founder and Chief Scientist of BigML. The
goal of this startup is to develop large scale cloud-based machine
- Andrew Emmott, Graduate Student.
- Risheek Garrepalli, Graduate Student (jointly supervised with Alan Fern).
- Jed Irvine,
Senior Faculty Research Assistant (Software Engineer).
- Michael Lam, Graduate Student (jointly advised with Sinisa Todorovic).
- Si Liu, Graduate Student (jointly advised with
Debashis Mondal, Statistics).
- Sean McGregor, Graduate Student.
- Michael Slater,
Graduate Student (On leave from Faculty Research Assistant).
- Pat Sullivan, Assistant and Grants Coordinator.
- Tadesse Zemicheal, Graduate Student.
Former Students and Staff
- Majid Alkaee Taleghan. Machine Learning Scientist at Context Relevant.
- Hussein Almuallim,
Oil and Energy Professional, Calgary, Canada.
- Eric Altendorf, Google.
- Adam Ashenfelter, BigML, Inc., Corvallis, Oregon.
- Ghulum Bakiri, President at MicroCenter, Bahrain.
- Christian Baumberger. Software Engineer at Zuehlke Group
- Xinlong Bao. Google Pittsburgh.
- Brian Breck.
- Waranun Bunjongsat.
- Giuseppe Cerbone. Independent Information Services Professional, Milan, Italy.
- Martha Chamberlin.
- Hei Chan. Assistant Professor / Project Researcher at the
Transdisciplinary Research Integration Center, UCLA.
- Richard Charon.
- Eric Chown, Full Professor, Bowdoin College.
- Selina Chu, JPL, Pasadena, CA.
- Dan Corpron
- Mark Crowley, Assistant Professor, Department of Electrical and
Computer Engineering, University of Waterloo.
- Diane Damon, Damon Consulting, Portland, OR.
- Ethan Dereszynski, Research Scientist, WebTrends, Portland, OR.
- Phuoc Do, Vida Lab.
- Nicholas Flann, Associate Professor, Utah State University
- Greg Foltz.
- Dan Forrest.
- Tony Fountain, Director of the Cyberinfrastructure Lab for Environmental Observing Systems (CLEOS), UC San Diego.
- Ashit Gandhi, Founder and Vice-President, Prism Gem, LLC - The Art of Diamond Coloring.
- Colin Gerety, Fort Collins, CO.
- Brandon Harvey, Symantec and Linn-Benton Community College.
- Arwen Griffioen.
- Guohua Hao, Senior Data Scientist at iHeartRadio.
- Hermann Hild, President, SMI Cognitive Software GmbH.
- Jesse Hostetler
- Rebecca Hutchinson, Assistant Professor of Computer Science and
Fisheries and Wildlife.
- Saket Joshi, Member of Technical Staff at Cycorp.
- Varad Joshi, Director of Engineering at Elemental Technologies.
- Caroline Koff, Hewlett-Packard Corporation, Fort Collins, CO.
Keiser, Research Programmer, CMU. Masters Thesis (PDF).
- Michael Kelm, Research Scientist, Siemens Healthcare.
- Eun Bae Kong, Professor, Computer Science, Chungnam National University, South Korea
- Bill Langford, Research Associate at RMIT, Melbourne, Australia.
Lin, VMWare, Seattle.
- Liping Liu,
Postdoc with David Blei, Columbia University.
- Dragos Margineantu, The Boeing Company.
- Gonzalo Martinez, Assistant Professor, Autonomous University of Madrid.
- Prafulla Mishra, Software Development Manager at eBay.
- Avis Ng.
- Soumya Ray, Assistant Professor, Case Western Reserve University.
- Angelo Restificar, Principal Machine Learning Engineer, eBay, Seattle.
- Ritchey Ruff, Senior SDET, Microsoft.
- Dan Sheldon, Assistant Professor, University of Massachusetts, Amherst.
- Jianqiang Shen. Research Scientist, PARC. Doctoral dissertation.
- Rongkun Shen.
Post-doc, Oregon Health and Science University, Portland.
Shindler, Lecturer at the University of Southern California
- Shriprakash Sinha. Ph.D. student TU Delft.
Sorower, Scientist at Philips Research North America
Stumpf. Senior Lecturer, City University London.
- Tao Sun, Graduate Student at UMass Amherst.
- Dan Vega, Senior Software Engineer at Valley Inception, LLC.
- Mark Vulfson. Microsoft Corporation.
- Kiri Wagstaff, Principal Researcher at JPL.
Wang, Senior Scientist at Inome (Intelius).
- Dietrich Wettschereck. Consultant, Cologne, Germany.
- Pengcheng Wu.
- Michael Wynkoop, Qualcomm.
- Qing Yao, College of Informatics and Electronics. Zhejiang Sci-Tech University. Hangzhou, China.
- Wei Zhang, The Boeing Company.
Zhang. Senior Software Engineer, Google. Doctoral Dissertation (PDF).
Zubek, Principal Statistician, Boehringer Ingelheim.
- CS519/GEO599: Principles of
Ecosystem Informatics, 2004-2005.
- CS 534, Spring 2005, Machine
- CS430, Fall 2003, Introduction to
- CS539, Fall 2003, Seminar: Probabilistic
- CS 533, Applied Artificial
Intelligence for Engineers.
- CS 539, Winter 2000, Selected Topics in
Artificial Intelligence: Probabilistic Agents
- CS 430/530, Fall 1999, Artificial Intelligence
- CS 519, Fall 1996. Research Methods
in Computer Science.
- CS 450/550, Winter 1996, Introduction to Computer Graphics.
Machine Learning Resources
My Family's Musical Activities
Tom Dietterich, firstname.lastname@example.org