End-User Software Engineering - Empirical Results to Date

We have conducted about two dozen empirical studies, some formative to inform our end-user software engineering design work and some summative to evaluate it. Here is a brief synopsis of the more interesting ones.

  1. WYSIWYT testing study, led by Karen Rothermel, Fall 1999. This was the study written up in our ICSE 2000 paper.
  2. WYSIWYT maintenance study, led by Vijay Krishna, Fall 2000: end-user subjects, described in our ICSM'2001 paper.
  3. Time: Fall 2000, end-user participants, led by Miguel Arredondo-Castro. The purpose was to compare end users' comprehension and performance using a graphical approach (termed Model SNF) vs. the traditional textual approach of earlier and fby; SNF was much better. The tie to end-user software engineering is the role of time-based patterns and comprehension in helping with the "oracle" problem. Written up in our JVLC paper: "End-User Programming of Time as an 'Ordinary' Dimension in Grid-Oriented Visual Programming Languages."
  4. WYSIWYT testing given recursion: Winter 2001 study on testing recursive spreadsheets, led by Bing Ren and Andy Ko. This study's purpose was to inform our design about which of two possible alternatives we should choose; as a result, we chose the 'Copy Representative' approach. The paper appeared in IEEE VL/HCC'01: "Visually Testing Recursive Programs in Spreadsheet Languages."
  5. Time and the oracle problem: Winter 2001, end-user participants, again led by Miguel Arredondo-Castro. The purpose was to investigate whether participants using the temporal window could perform better as oracles than participants who did not use it. The answer was yes.
  6. Assertions: Spring 2001 Assertions Think-Aloud Study, led by Christine Wallace, with end users as subjects. One group had assertions and the other group did not (a total of 10 subjects). The study was used to determine whether end users could understand and use assertions; the findings indicated they could. There were also some interesting additional findings about how people want to do testing. Written up in a short paper at IEEE VL/HCC'02.
  7. Assertions: Winter 2002 -- Grid Assertions, led by Laura Beckwith. A think-aloud study with 5 subjects, conducted on paper, used to help design how guards on grids should be handled. The subsequent paper is in the proceedings of IEEE HCC 2002.
  8. Help-Me-Test: Winter 2002, led by Prashant Shah. HMT = "Help Me Test". A think-aloud study with 12 subjects (6 with HMT and 6 without) doing a maintenance task. The HMT subjects were much better at the task. We also learned that people first do as much testing as they can with their own values, and turn to HMT when the going gets tough. Once they do turn to HMT, they seem to really like it, and they turn to it often.
  9. Assertions: Spring 2002 Assertions Study, led by Omkar Pendse. Statistical study, intended as a follow-up to the think-aloud version done by Chris Wallace (see Spring 2001 assertions think-aloud). One group had assertions (pre-planted by us computer scientists), and one group had no assertions. Both had WYSIWYT. The task was debugging. The assertions subjects were significantly better at fixing the bugs. The paper containing this study appeared in ICSE 2003: "End-User Software Engineering with Assertions in the Spreadsheet Paradigm".
  10. Surprise-Explain-Reward + Assertions: Summer 2002 study, by Ledah Casburn, Aaron Wilson, Orion Granatir, and Laura Beckwith. The purpose was to build upon Omkar's study by asking, given our Surprise-Explain-Reward strategy, "will people actually put any assertions in?" The task was debugging, and the design largely replicated Omkar's experiment. The answer was "yes". Written up in our description of Surprise-Explain-Reward that appeared at CHI'03: "Harnessing Curiosity to Increase Correctness in End-User Programming".
  11. Fault localization: Fall 2002, performed by Joey Ruthruff, Rogan Creswick, Shreenivasarao Prabhakararao, Marc Fisher, and Martin Main. The purpose of the experiment was to formatively evaluate three fault localization techniques: a "Blocking" technique, developed by Dusty Reichwein; a "Test-Count" technique, developed by Marc Fisher as a simpler substitute for Dusty's technique that fits in well with our test-reuse capabilities; and a "Nearest-Consumers" technique, developed by Joey Ruthruff, Margaret Burnett, and Rogan Creswick as an inexpensive technique that tries to mimic the blocking capabilities of Dusty's technique. We used the transcripts from the Spring 2001 Help-Me-Test experiment as end-user test suites to evaluate the effectiveness and robustness of each technique. The results appeared at ACM SoftVis (2003): "End-User Software Visualizations for Fault Localization".
  12. Fault localization: Winter 2003. Performed by Shrinu Prabhakararao, Joey Ruthruff, Orion Granatir, Rogan Creswick, Martin Main, and Mike Durham, along with Margaret Burnett and Curtis Cook. This was a think-aloud study, to investigate how visual fault localization techniques affect and interact with the debugging efforts of end-user programmers. The subjects treated fault localization techniques as a resource to be called upon when they had exhausted their own debugging abilities, and when they did turn to fault localization, it often helped. One key way the fault localization technique helped was to lead them into a suitable strategy. The paper about this study appeared in IEEE VL/HCC'03: "Strategies and Behaviors of End-User Programmers with Interactive Fault Localization".
  13. Surprise-Explain-Reward + attention: Summer 2003. Performed by Shrinu Prabhakararao, T.J. Robertson, Joey Ruthruff, Laura Beckwith, and Amit Phalgune, along with Margaret Burnett and Curtis Cook. This was a controlled experiment, to investigate the impact of two interruption styles on end-user debugging. We found several reasons to use negotiated-style interruptions for informing the user about the "surprise" and no reason to use immediate-style interruptions. This study was written up in our CHI'04 paper.

Last modified: May 28, 2004, by Margaret Burnett.