My research aims to help people apply human expertise to combine computational resources into new software. Achieving this requires helping people to learn the needed expertise, to acquire code and other resources, to find their way around within that code, and to actually create and run programs. Our research group's main target populations are end-user programmers and novice programmers.

As much as possible, our research group tests ideas in multiple programming contexts, to explore how well results generalize. That way--again, as much as possible--our results are not bound to any particular programming tool or environment. In this sense, the ideas produced by research are much more valuable than the tools we happen to study or create to explore those ideas. We are willing to consider both textual and visual languages, as well as both commercial and academic programming tools.

Support for learning from code examples 

Learning a new programming language or API can be difficult, particularly for end-user programmers, who often cannot afford to spend much time on learning APIs. Many studies show that people learn from code examples (even students learning functional programming!), but we have found that the usual approaches of searching the web, forums, or repositories of examples are sometimes insufficient. Our approach is to integrate selected examples into structured, interactive resources to support learning -- such as resources situated within a development environment and/or intelligent tutors that monitor learning.

Problem: Programmers try learning from online code examples, but the usual approaches are not particularly effective

Solution: Situated learning and interactive scaffolded learning (via intelligent tutors)

Related funding

None yet

CYCU: Code You Can Use 

Our studies show the vast majority of code posted by end-user programmers to repositories is never reused by anybody. Much of this code is too specialized or low-quality to justify reuse. Our approach is to develop heuristics for identifying reusable code. Such heuristics can be used to create repository search engines that filter code examples based on reusability, as well as to create programming tools that inform people about where code needs to be improved during reuse.
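
To illustrate the idea, the following minimal sketch shows how heuristic scores could drive a repository filter. The features (has_comments, num_parameters, hardcoded_constants), the point weights, and the threshold are hypothetical placeholders chosen for illustration; they are not the heuristics developed in this research.

```python
from dataclasses import dataclass


@dataclass
class CodeExample:
    """A repository entry described by a few hypothetical, illustrative features."""
    name: str
    has_comments: bool
    num_parameters: int
    hardcoded_constants: int


def reusability_score(example: CodeExample) -> int:
    """Return an illustrative score from 0 to 5; higher suggests more reusable.

    The weights below are assumptions made for this sketch only.
    """
    score = 0
    if example.has_comments:
        score += 2  # documented code is easier to understand and adapt
    if example.num_parameters > 0:
        score += 2  # parameterized code is less tied to one situation
    if example.hardcoded_constants == 0:
        score += 1  # no embedded constants that would need editing
    return score


def filter_reusable(examples, threshold=3):
    """Keep examples scoring at or above the threshold, best first."""
    scored = [(reusability_score(e), e) for e in examples]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [e.name for _, e in kept]


examples = [
    CodeExample("A", has_comments=True, num_parameters=2, hardcoded_constants=0),
    CodeExample("B", has_comments=False, num_parameters=0, hardcoded_constants=3),
    CodeExample("C", has_comments=True, num_parameters=0, hardcoded_constants=0),
]
print(filter_reusable(examples))
```

A search engine could apply such a filter to its results, while a programming tool could instead report which individual checks failed, pointing to where code would need improvement before reuse.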

We are actively working to combine this CYCU line of research with the Learning line of research above. Specifically, we are using our heuristics to create systems that identify reusable code examples and synthesize those examples into situated intelligent tutors to promote learning.

Problem: End-user programmers post many kinds of code to online repositories, but very little of it is reused

Solution: Heuristics can identify reusable code to guide reuse, refactoring, and learning

Related funding

  • Enabling users to search for LabVIEW code--Phase 1: Prototype, National Instruments 7/14-6/15 (grant: $58,281, my share: $58,281)
  • Helping LabVIEW users to create high-quality VIs, National Instruments 9/13-9/14 (grant: $54,096, my share: $54,096)
  • Guiding the design of effective LabVIEW programs, National Instruments 1/12-5/13 (grant: $46,593, my share: $46,593)
  • Automated detection of problematic code structures in visual programs, National Instruments 9/11-6/12 (grant: $10,000, my share: $10,000)
  • A first empirical test of low ceremony evidence for assessing quality attributes, National Science Foundation 9/11-9/12 (grant: $62,101, my share: $62,101)

Helping programmers find information within code

Professional programmers spend a third of their time during maintenance just navigating through code to find information. In collaboration with Professor Margaret Burnett at OSU and Dr. Rachel Bellamy at IBM, we are investigating how to reduce the time needed for people to find information in code. Our research has shown that Information Foraging Theory (IFT) can be adapted to account for how programmers navigate. Our approach is to develop models that track and predict where programmers need to go in code. We incorporate these models into tools that help programmers quickly obtain this information.
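
As a rough illustration of how such a model might predict where a programmer needs to go, the sketch below ranks code locations by a simplified "information scent": the fraction of words in the programmer's goal that also appear in the text of each location. Word overlap is an assumed stand-in for the similarity measures a real predictive model would use, and the function names and sample data are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def scent(goal, location_text):
    """Fraction of goal words that also occur in the location's text."""
    goal_words = tokenize(goal)
    if not goal_words:
        return 0.0
    return len(goal_words & tokenize(location_text)) / len(goal_words)


def rank_locations(goal, locations):
    """Return location names ordered from strongest to weakest scent."""
    return sorted(locations, key=lambda name: scent(goal, locations[name]), reverse=True)


# Hypothetical code locations, each described by the words it contains.
locations = {
    "load_settings": "read and parse the config file from disk",
    "draw_chart": "render chart axes",
    "save_file": "write file to disk",
}
print(rank_locations("parse config file", locations))
```

A tool built on this kind of ranking could recommend the highest-scoring locations as likely next navigation targets.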

I plan to generalize these models and tools at some point to support other programmer populations. Specifically, if we can track and predict end-user and novice programmers' information needs, then we can proactively acquire relevant code snippets, direct people to visit and/or reuse this code, and even synthesize these snippets into effective resources to aid learning.

Problem: Programmers forage for information by following complex but rational paths through code

Solution: Tools can aid foraging by tracking information needs and offering assistive support

Related funding

  • SHF: Medium: Collaborative Research: Information Foraging Theory: From Scientific Principles to Engineering Practice (REU Supplement), National Science Foundation 9/15-8/16 (grant: $16,000, my share: $8,000)
  • SHF: Medium: Collaborative Research: Information Foraging Theory: From Scientific Principles to Engineering Practice (REU Supplement), National Science Foundation 9/14-8/15 (grant: $14,000, my share: $7,000)
  • SHF: Medium: Collaborative Research: Information Foraging Theory: From Scientific Principles to Engineering Practice, National Science Foundation 9/13-8/16 (grant: $932,620, my share: $268,767)

PexPipe: A sensor+cloud platform for scientific end-user programming

Scientists and engineers often adopt a visual domain-specific language due to its easy learnability, but later encounter problems when trying to create high-performance programs. In response, they typically resort to general-purpose textual languages (e.g., C). Our approach is to provide a cloud+mobile dataflow programming platform that enables scientists to easily deploy sensors, collect data to the cloud, efficiently analyze data, and visualize results. Funded by USDA, we are now actively developing the components of this platform. Funded by National Instruments, and in collaboration with Andrew Dove at NI, we are also studying how to help people find and fix performance problems in existing visual dataflow languages (e.g., LabVIEW).

After the platform is created, we will incorporate support for learning and code reuse by drawing upon the results from the projects above. This will demonstrate that those results apply not only to textual languages but also to visual dataflow languages. Moreover, integrating these results should aid in adoption of our platform via our industrial partners.

Problem: Scientists struggle to create high-performance code with complex dataflow

Solution: A platform for easily collecting sensor data and analyzing it in the cloud

Related funding

  • Personal Environmental Exposure Assessment using Wristbands for Epidemiological Studies in Disadvantaged Communities, NIH 12/14-11/19 (grant: $2,573,097, my share: $43,800)
  • The Wave~Ripples For Change: Obesity Prevention for Active Youth In Afterschool Programs Using Virtual- And Real-World Experiential Learning, USDA 2/13-1/18 (grant: $4,671,604, my share: $767,512)

Discontinued and other lines of work

Other work included lines of research on general end-user population characteristics, techniques for analyzing spreadsheets, and techniques for validating and reformatting strings. Based on colleagues' advice during mid-tenure review (2012), I've selectively discontinued lines of work (including a few that had not yet led to any publications) so I can focus on a smaller number of projects (above).

General descriptions of end-user populations

Aiding spreadsheet programming

Validating and reformatting strings (including with topes)

Other old papers

Funding related to other/discontinued lines of work

  • Efficiently computing spreadsheet differences, Google 9/11-9/12 (grant: $43,476, my share: $21,738)
  • Transforming Spreadsheets into Validated Object Models, LogicBlox Corporation 10/10-9/11 (grant: $11,878, my share: $11,878)
  • VL/HCC 2011 Doctoral Consortium, National Science Foundation 9/11 (grant: $20,903, my share: $20,903)
  • American-European Collaboration Workshop at ISEUD-2011, Air Force Research Laboratory 6/11 (grant: $5,000, my share: $5,000)