Information & Data Management and Analytics (IDEA) Laboratory


The vast amounts of digital data now available, i.e., Big Data, can bring about exciting advancements in various areas of science and technology. We set forth the foundations of and build systems for easy, effective, and efficient data management and analytics. Our research lies primarily in the areas of databases and data management.
  • Email: termehca [at] oregonstate.edu
  • Address: 3053 Kelley Engineering Center, Corvallis, OR 97330-5501
Cape Perpetua

Current projects

  • CHARM: Autonomous Communication of Humans and Information Sources

    One can gain invaluable insights by integrating and analyzing available data sources, such as traditional data systems, sensors, and social media. Data sources must also interact with each other to provide the information needed for many important queries and analyses. Unfortunately, different data sources express information in different forms. Humans also have their own ways of expressing their information and needs. Hence, humans and data sources cannot communicate effectively, which keeps many valuable insights out of our reach. CHARM aims at designing algorithms and systems that enable information sources and humans to automatically develop an effective mutual understanding and common language through interaction. Check out the project webpage for publications and more information.
  • READY: Representation Independent Data Analytics

    The output of data analytics algorithms depends heavily on the structure and representation of their input data. To use current database analytics algorithms, users have to find the desired representation for these algorithms and transform (wrangle) their data into these representations. These tasks are hard and time-consuming, and they are major obstacles to unlocking the value of data. READY aims at developing algorithms that return the desired results no matter how their input data is represented. Check out the project webpage for more information.

Recent News

  • Our paper on the data interaction game received an ACM SIGMOD Research Highlight Award.
  • A couple of new manuscripts:
    • In the first one, we show how to learn accurate models directly over heterogeneous and dirty data without cleaning them.
    • In the second one, we present a graph search algorithm that adapts to the evolution in data representation.
  • At VLDB-Poly 2018, we present the fundamental ideas behind our VDBMS system, which usably manages large-scale variable and heterogeneous data.
  • Ben discusses the foundations of autonomous entity matching and integration at VLDB-Poly 2018.
  • We have a couple of papers in the VLDB Journal 2018: 1) Yodsawalai has the paper Cost-Effective Conceptual Design Using Taxonomies, which addresses the tradeoff between the usability and overhead of organizing data in a structured form; and 2) Jose publishes the paper Logically Scalable and Efficient Relational Learning, which extends his work on designing efficient learning algorithms that are robust against the logical representations of the data.
  • Jose demonstrates CastorX, a system that efficiently learns over multiple heterogeneous databases using novel sampling techniques, at VLDB 2018. He presented a summary of its fundamental ideas at SIGMOD-DEEM 2018.
  • Ben presents his work on helping humans and large-scale data sources to progressively and automatically develop a mutual language for effective communication via reinforcement learning at SIGMOD 2018. His paper is selected as one of the best papers of the conference.
  • Jose demonstrates AutoMode, a system that automatically sets the language bias for learning systems over relational data, at ICDE 2018.
  • People usually believe that to get effective results for vague queries, e.g., ambiguous keyword queries, data systems have to spend a lot of time and explore many potential answers in the data. We present a lightning talk on how to query large databases both effectively and efficiently using caching techniques at ICDE 2018.
  • We present our work on managing evolving and heterogeneous relational databases at DBPL 2017.
  • Ben presents an overview of our work on modeling human users and data sources as rational agents who want to establish a common language, their learning mechanisms, and the interesting equilibria that appear in their interactions at HILDA 2017.
  • Jose will present his paper Schema Independent Relational Learning at SIGMOD 2017. His paper measures the robustness of learning algorithms to data representation and proposes a representationally robust, accurate, and efficient learning algorithm over relational data. Here is the one-slide teaser.
  • Jose presents his paper on automatically setting the language bias of learning systems over relational data, the so-called "black magic" of relational learning, at SIGMOD-DEEM 2017.

Selected awards

  • ACM SIGMOD Highlight Award, 2018.
  • SIGMOD best papers selection, 2018.
  • Distinguished PC member of SIGMOD 2017.
  • Best student paper award, ICDE 2011.
  • Yahoo! Key Scientific Challenges Award, 2011.
  • ICDE best papers selection, 2011.
