# Publications

## Journal Articles

- Casts and Costs: Harmonizing Safety and Performance in Gradual Typing[Abstract, PDF, Code]Proc. of the ACM on Programming Languages (PACMPL) issue ACM SIGPLAN Int. Conf. on Functional Programming (ICFP), vol. 2, 2018, 98:1–98:30
Gradual typing allows programmers to use both static and dynamic typing in a single program. However, a well-known problem with sound gradual typing is that the interactions between static and dynamic code can cause significant performance degradation. These performance pitfalls are hard to predict and resolve, and discourage users from using gradual typing features. For example, when migrating to a more statically typed program, often adding a type annotation will trigger a slowdown that can be resolved by adding more annotations elsewhere, but since it’s not clear where the additional annotations must be added, the easier solution is to simply remove the annotation.

To address these problems, we develop: (1) a static cost semantics that accurately predicts the overhead of static-dynamic interactions in a gradually typed program, (2) a technique for efficiently inferring such costs for all combinations of inferable type assignments in a program, and (3) a method for translating the results of this analysis into specific recommendations and explanations that can help programmers understand, debug, and optimize the performance of gradually typed programs. We have implemented our approach in Herder, a tool for statically analyzing the performance of different typing configurations for Reticulated Python programs. An evaluation on 15 Python programs shows that Herder can use this analysis to accurately and efficiently recommend type assignments that optimize the performance of these programs without sacrificing type safety.

- Extending Type Inference to Variational Programs[Abstract, PDF]ACM Trans. on Programming Languages and Systems (TOPLAS), vol. 36, num. 1, 2014, 1:1–1:54
Through the use of conditional compilation and related tools, many software projects can be used to generate a huge number of related programs. The problem of typing such variational software is difficult. The brute-force strategy of generating all variants and typing each one individually is (1) usually infeasible for efficiency reasons and (2) produces results that do not map well to the underlying variational program. Recent research has focused mainly on efficiency and addressed only the problem of type checking. In this work we tackle the more general problem of variational type inference and introduce variational types to represent the result of typing a variational program. We introduce the variational lambda calculus (VLC) as a formal foundation for research on typing variational programs. We define a type system for VLC in which VLC expressions are mapped to correspondingly variational types. We show that the type system is correct by proving that the typing of expressions is preserved over the process of variation elimination, which eventually results in a plain lambda calculus expression and its corresponding type. We identify a set of equivalence rules for variational types and prove that the type unification problem modulo these equivalence rules is unitary and decidable; we also present a sound and complete unification algorithm. Based on the unification algorithm, the variational type inference algorithm is an extension of algorithm W. We show that it is sound and complete and computes principal types. We also consider the extension of VLC with sum types, a necessary feature for supporting variational data types, and demonstrate that the previous theoretical results also hold under this extension. Finally, we characterize the complexity of variational type inference and demonstrate the efficiency gains over the brute-force strategy.

- A Visual Language for Explaining Probabilistic Reasoning[Abstract, PDF]Journal of Visual Languages and Computing (JVLC), vol. 24, num. 2, 2013, 88–109
We present an explanation-oriented, domain-specific, visual language for explaining probabilistic reasoning. Explanation-oriented programming is a new paradigm that shifts the focus of programming from the computation of results to explanations of how those results were computed. Programs in this language therefore describe explanations of probabilistic reasoning problems. The language relies on a storytelling metaphor of explanation, where the reader is guided through a series of well-understood steps from some initial state to the final result. Programs can also be manipulated according to a set of laws to automatically generate equivalent explanations from one explanation instance. This increases the explanatory value of the language by allowing readers to cheaply derive alternative explanations if they do not understand the first. The language is comprised of two parts: a formal textual notation for specifying explanation-producing programs and the more elaborate visual notation for presenting those explanations. We formally define the abstract syntax of explanations and define the semantics of the textual notation in terms of the explanations that are produced.

- The Choice Calculus: A Representation for Software Variation[Abstract, PDF]ACM Trans. on Software Engineering and Methodology (TOSEM), vol. 21, num. 1, 2011, 6:1–6:27
Many areas of computer science are concerned with some form of variation in software–from managing changes to software over time, to supporting families of related artifacts. We present the choice calculus, a fundamental representation for software variation that can serve as a common language of discourse for variation research, filling a role similar to the lambda calculus in programming language research. We also develop an associated theory of software variation, including sound transformations of variation artifacts, the definition of strategic normal forms, and a design theory for variation structures, which will support the development of better algorithms and tools.

- A Domain-Specific Language for Experimental Game Theory[Abstract, PDF, Code]Journal of Functional Programming (JFP), vol. 19, 2009, 645–661
Experimental game theory is increasingly important for research in many fields. Unfortunately, it is poorly supported by computer tools. We have created Hagl, a domain-specific language embedded in Haskell, to reduce the development time of game-theoretic experiments and make the definition and exploration of games and strategies simple and fun.

## Peer-Reviewed Book Chapters

- Variation Programming with the Choice Calculus[Abstract, PDF, Code]Generative and Transformational Techniques in Software Engineering IV (GTTSE 2011), Revised and Extended Papers, LNCS vol. 7680, Springer, 2013, 55–100
The choice calculus provides a language for representing and transforming variation in software and other structured documents. Variability is captured in localized choices between alternatives. The space of all variations is organized by dimensions, which provide scoping and structure to choices. The variation space can be reduced through a process of selection, which eliminates a dimension and resolves all of its associated choices by replacing each with one of their alternatives. The choice calculus also allows the definition of arbitrary functions for the flexible construction and transformation of all kinds of variation structures. In this tutorial we will first present the motivation, general ideas, and principles that underlie the choice calculus. This is followed by a closer look at the semantics. We will then present practical applications based on several small example scenarios and consider the concepts of “variation programming” and “variation querying”. The practical applications involve work with a Haskell library that supports variation programming and experimentation with the choice calculus.

- Semantics-Driven DSL Design[Abstract, PDF]Formal and Practical Aspects of Domain-Specific Languages: Recent Developments, (ed. Marjan Mernik), IGI Global, 2012, 56–80
Convention dictates that the design of a language begins with its syntax. We argue that early emphasis should be placed instead on the identification of general, compositional semantic domains, and that grounding the design process in semantics leads to languages with more consistent and more extensible syntax. We demonstrate this semantics-driven design process through the design and implementation of a DSL for defining and manipulating calendars, using Haskell as a metalanguage to support this discussion. We emphasize the importance of compositionality in semantics-driven language design, and describe a set of language operators that support an incremental and modular design process.

## Peer-Reviewed Conference and Workshop Papers

- Should Variation Be Encoded Explicitly in Databases?[Abstract, PDF]Int. Working Conf. on Variability Modelling of Software-Intensive Systems (VaMoS), 2021
Variation occurs in databases in many different forms and contexts. For example, a single database schema evolves over time, data from different sources may be combined, and the various configurations of a software product line (SPL) may have different data needs. While approaches have been developed to deal with many such scenarios, particularly in the fields of database evolution and data integration, there is no solution that treats variation as a general and orthogonal concern in databases. This is a problem when various kinds of variation intersect, such as during the evolution of a SPL. Previously, we have proposed variational databases (VDB) as a general way to represent variation in both the structure and content of databases. Although the model underlying VDB is simple, encoding variation explicitly in databases introduces complexity akin to using preprocessing directives in software. In this paper, we explore the feasibility of VDB and its associated variational query language for encoding different kinds of database variability. We develop two use cases that illustrate how different kinds of variation can be encoded and integrated in VDB, and how the corresponding information needs can be expressed as variational queries. We then use these use cases to discuss the benefits and drawbacks of such a direct encoding of variation in data and queries.

- Variational Satisfiability Solving[Abstract, PDF]ACM SIGSOFT Int. Systems and Software Product Line Conf. (SPLC), 2020, 18:1–18:12
Incremental satisfiability (SAT) solving is an extension of classic SAT solving that allows users to efficiently solve a set of related SAT problems by identifying and exploiting shared terms. However, using incremental solvers effectively is hard since performance is sensitive to a problem’s structure and the order sub-terms are fed to the solver, and the burden to track results is placed on the end user. For analyses that generate sets of related SAT problems, such as those in software product lines, incremental SAT solvers are either not used at all, used but not explicitly stated so in the literature, or used but suffer from the aforementioned usability problems. This paper translates the ordering problem to an encoding problem and automates the use of incremental SAT solving. We introduce variational SAT solving, which differs from incremental SAT solving by accepting all related problems as a single variational input and returning all results as a single variational output. Our central idea is to make explicit the operations of incremental SAT solving, thereby encoding differences between related SAT problems as local points of variation. Our approach automates the interaction with the incremental solver and enables methods to automatically optimize sharing of the input. To evaluate our methods we construct a prototype variational SAT solver and perform an empirical analysis on two real-world datasets that applied incremental solvers to software evolution scenarios. We show, assuming a variational input, that the prototype solver scales better for these problems than naive incremental solving while also removing the need to track individual results.

- Toward Efficient Analysis of Variation in Time and Space[Abstract, PDF]Int. Workshop on Variability and Evolution of Software-Intensive Systems (VariVolution), 2019, 57–64
Variation is central to today’s software development. There are two fundamental dimensions to variation: Variation in time refers to the fact that software exists in numerous revisions that typically replace each other (i.e., a newer version supersedes an older one). Variation in space refers to differences among variants that are designed to coexist in parallel. There are numerous analyses to cope with variation in space (i.e., product-line analyses) and others that cope with variation in time (i.e., regression analyses). The goal of this work is to discuss to which extent product-line analyses can be applied to revisions and, conversely, where regression analyses can be applied to variants. In addition, we discuss challenges related to the combination of product-line and regression analyses. The overall goal is to increase the efficiency of analyses by exploiting the inherent commonality between variants and revisions.

- Declarative GUIs: Simple, Consistent, and Verified[Abstract, PDF, Code]ACM SIGPLAN Int. Symp. on Principles and Practice of Declarative Programming (PPDP), 2018, 4:1–4:15
Graphical user interfaces (GUIs) are ubiquitous in real-world software and a notorious source of bugs that are difficult to catch through software testing. Model checking has been used to prove the absence of certain kinds of bugs, but model checking works on an abstract model of the GUI application, which might be inconsistent with its implementation. We present a library for developing directly verified, state-dependent GUI applications in the dependently typed programming language Agda. In the library, the type of a GUI’s controller depends on a specification of the GUI itself, statically enforcing consistency between them. Arbitrary properties can be defined and proved in terms of user interactions and state transitions. Our library connects to a custom-built Haskell back-end for declarative vector-based GUI elements. Compared to an earlier version of our library built on an existing imperative GUI framework, the more declarative back-end supports simpler definitions and proofs.

As a practical application of our library to a safety-critical domain, we present a case study developed in cooperation with the Medical University of Vienna. The case study implements a healthcare process for prescribing anticoagulants, which is highly error-prone when followed manually. Our implementation generates GUIs from an abstract description of a data-aware business process, making our approach easy to reuse and adapt to other safety-critical processes. We prove medically relevant safety properties about the executable GUI application, such as that given certain inputs, certain states must or must not be reached.

- Developing GUI Applications in a Verified Setting[Abstract, PDF, Code]Int. Symp. on Dependable Software Engineering: Theories, Tools, and Applications (SETTA), Springer, 2018, 89–107
Although there have been major achievements in verified software, work on verifying graphical user interfaces (GUI) applications is underdeveloped relative to their ubiquity and societal importance. In this paper, we present a library for the development of verified, state-dependent GUI applications in the dependently typed programming language Agda. The library uses Agda’s expressive type system to ensure that the GUI, its controller, and the underlying model are all consistent, significantly reducing the scope for GUI-related bugs. We provide a way to specify and prove correctness properties of GUI applications in terms of user interactions and state transitions. Critically, GUI applications and correctness properties are not restricted to finite state machines and may involve the execution of arbitrary interactive programs. Additionally, the library connects to a standard, imperative GUI framework, enabling the development of native GUI applications with expected features, such as concurrency. We present applications of our library to building GUI applications to manage healthcare processes. The correctness properties we consider are the following: (1) That a state can only be reached by passing through a particular intermediate state, for example, that a particular treatment can only be reached after having conducted an X-Ray. (2) That one eventually reaches a particular state, for example, that one eventually decides on a treatment. The specification of such properties is defined in terms of a GUI application simulator, which simulates all possible sequences of interactions carried out by the user.

- Managing Structurally Heterogeneous Databases in Software Product Lines[Abstract, PDF]VLDB Workshop: Polystores and Other Systems for Heterogeneous Data (Poly), Springer, 2018
Data variations are prevalent while developing software product lines (SPLs). A SPL enables a software vendor to quickly produce different variants of their software tailored to variations in their clients’ business requirements, conventions, desired feature sets, and deployment environments. In database-backed software, the database of each variant may have a different schema and content, giving rise to numerous data variants. Users often need to query and/or analyze all variants in a SPL simultaneously. For example, a software vendor wants to perform common tests or inquiries over all variants. Unfortunately, there is no systematic approach to managing and querying data variations and users have to use their intuition to perform such tasks, often resorting to repeating a task for each variant. We introduce

*VDBMS*(Variational Database Management System), a system that provides a compact, expressive, and structured representation of variation in relational databases. In contrast to data integration systems that provide a unified representation for all data sources, VDBMS makes variations explicit in both the schema and query. Although variations can make VDBMS queries more complex than plain queries, a strong static type system ensures that all variants of the query are consistent with the corresponding variants of the database. Additionally,*variational queries*make it possible to compactly represent and efficiently run queries over a huge range of data variations in a single query. This directly supports many tasks that would otherwise be intractable in highly variational database-backed SPLs. - A Domain Analysis of Data Structure and Algorithm Explanations in the Wild[Abstract, PDF, Data]ACM SIGCSE Technical Symp. on Computer Science Education (SIGCSE), 2018, 870–875
Explanations of data structures and algorithms are complex interactions of several notations, including natural language, mathematics, pseudocode, and diagrams. Currently, such explanations are created ad hoc using a variety of tools and the resulting artifacts are static, reducing explanatory value. We envision a domain-specific language for developing rich, interactive explanations of data structures and algorithms. In this paper, we analyze this domain to sketch requirements for our language. We perform a grounded theory analysis to generate a qualitative coding system for explanation artifacts collected online. This coding system implies a common structure among explanations of algorithms and data structures. We believe this structure can be reused as the semantic basis of a domain-specific language for creating interactive explanation artifacts. This work is part of our effort to develop the paradigm of explanation-oriented programming, which shifts the focus of programming from computing results to producing rich explanations of how those results were computed.

- Variational Databases[Abstract, PDF]Int. Symp. on Database Programming Languages (DBPL), ACM, 2017, 11:1–11:4
Data variations are prevalent in real-world applications. For example, software vendors handle variations in the business requirements, conventions, and environmental settings of a software product using hundreds of features each combination of which creates a different version of the product. In database-backed software, the database of each version may have a different schema and different content. Variations in the value and representation of each element in a dataset give rise to numerous variants in these applications. Users often would like to express information needs over all such variants. For example, a software vendor would like to perform common tests over all versions of its product, e.g., whether each relation in a relational database has a primary key. Hence, users need a query interface that hides the variational nature of the data and processes a query over all variations of a dataset. We propose a novel abstraction called a

*variational database*that provides a compact and structured representation of general forms of data variations and enables users to query database variations easily. - How Good are Your Types? Using Mutation Analysis to Evaluate the Effectiveness of Type AnnotationsInt. Workshop on Mutation Analysis (Mutation), IEEE, 2017, 122–127Best presentation[Abstract, PDF]
Software engineers primarily use two orthogonal means to reduce susceptibility to faults: software testing and static type checking. While many strategies exist to evaluate the effectiveness of a test suite in catching bugs, there are few that evaluate the effectiveness of type annotations in a program. This problem is most relevant in the context of gradual or optional typing, where programmers are free to choose which parts of a program to annotate and in what detail. Mutation analysis is one strategy that has proven useful for measuring test suite effectiveness by emulating potential software faults. We propose that mutation analysis can be used to evaluate the effectiveness of type annotations too. We analyze mutants produced by the MutPy mutation framework against both a test suite and against type-annotated programs. We show that, while mutation analysis can be useful for evaluating the effectiveness of type annotations, we require stronger mutation operators that target type information in programs to be an effective mutation analysis tool.

- A Choice of Variational Stacks: Exploring Variational Data Structures[Abstract, PDF]Int. Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ACM, 2017, 28–35
Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make the variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little general understanding of how to design them or what tradeoffs exist among them. In this paper, we strive for a more systematic exploration and analysis of a variational data structure. We want to know how different design decisions affect the performance and scalability of a variational data structure, and what properties of the underlying data and operation sequences need to be considered. Specifically, we study several alternative designs of a variational stack, a data structure that supports efficiently representing and computing with multiple variants of a plain stack, and that is a common building block in many algorithms. The different variational stacks are presented as a small product line organized by three design decisions. We analyze how these design decisions affect the performance of a variational stack with different usage profiles. Finally, we evaluate how these design decisions affect the performance of the variational stack in a real-world scenario: in the interpreter when executing real software containing variability.

- Formula Choice Calculus[Abstract, PDF, Code]Int. Workshop on Feature-Oriented Software Development (FOSD), ACM, 2016, 49–57
The choice calculus is a simple metalanguage and associated theory that has been successfully applied to several problems of interest to the feature-oriented software development community. The formal presentation of the choice calculus essentially restricts variation points, called choices, to vary based on the inclusion or not of a single feature, while in practice variation points may depend on several features. Therefore, in both theoretical applications of the choice calculus, and in tools inspired by the choice calculus, the syntax of choices has often been generalized to depend on an arbitrary propositional formula of features. The purpose of this paper is to put this syntactic generalization on more solid footing by also generalizing the associated theory. Specifically, after defining the syntax and denotational semantics of the formula choice calculus (FCC), we define and prove the soundness of a syntactic equivalence relation on FCC expressions. This effort validates previous work which has implicitly assumed the soundness of rules in the equivalence relation, and also reveals several rules that are specific to FCC. Finally, we describe some further generalizations to FCC and their limits.

- Concepts, Operations, and Feasibility of a Projection-Based Variation Control System[Abstract, PDF, Code/Data]IEEE Int. Conf. on Software Maintenance and Evolution (ICSME), 2016, 323–333
Highly configurable software often uses preprocessor annotations to handle variability. However, understanding, maintaining, and evolving code with such annotations is difficult, mainly because a developer has to work with all variants at a time. Dedicated methods and tools that allow working on a subset of all variants could ease the engineering of highly configurable software. We investigate the potential of one kind of such tools: projection-based variation control systems integrated with projectional editors. For such systems we aim to understand: (i) what end-user operations they need to support, and (ii) whether they can realize the actual evolution of real-world, highly configurable software. We conduct an experiment that investigates variability-related evolution patterns and that evaluates the feasibility of a projection-based variation control system by replaying parts of the history of a real-world 3D printer firmware project. Among others, we show that the prototype variation control system does indeed support the evolution of a highly configurable system and that in general, it does not degrade the code.

- A Calculus for Variational Programming[Abstract, PDF]European Conf. on Object-Oriented Programming (ECOOP), LIPIcs vol. 56, 2016, 6:1–6:28
Variation is ubiquitous in software. Many applications can benefit from making this variation explicit, then manipulating and computing with it directly—a technique we call “variational programming”. This idea has been independently discovered in several application domains, such as efficiently analyzing and verifying software product lines, combining bounded and symbolic model-checking, and computing with alternative privacy profiles. Although these domains share similar core problems, and there are also many similarities in the solutions, there is no dedicated programming language support for variational programming. This makes the various implementations tedious, prone to errors, hard to maintain and reuse, and difficult to compare.

In this paper we present a calculus that forms the basis of a programming language with explicit support for representing, manipulating, and computing with variation in programs and data. We illustrate how such a language can simplify the implementation of variational programming tasks. We present the syntax and semantics of the core calculus, a sound type system, and a type inference algorithm that produces principal types.

- Programs for People: What We Can Learn from Lab Protocols[Abstract, PDF, Data]IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2015, 203–211
Humans play an active role in the execution of certain kinds of programs, such as spreadsheets, workflows, and interactive notebooks. Interacting closely with execution is especially useful when end-users are learning from examples while doing their work. In order to better understand the language features needed to support this kind of use, we investigated a particularly rigid and formalized category of “program” people write for each other: lab protocols. These protocols present a linear, idealized process despite the complex contingencies of the lab work they describe. However, they employ a variety of techniques for limiting or expanding the semantic interpretation of individual steps and for integrating outside protocols. We use these observations to derive implications for the design of interactive and mixed-initiative programming languages.

- Variational Data Structures: Exploring Trade-Offs in Computing With Variability[Abstract, PDF]ACM SIGPLAN Symp. on New Ideas in Programming and Reflections on Software (Onward!), 2014, 213–226
Variation is everywhere, and in the construction and analysis of customizable software it is paramount. In this context, there arises a need for variational data structures for efficiently representing and computing with related variants of an underlying data type. So far, variational data structures have been explored and developed ad hoc. This paper is a first attempt and a call to action for systematic and foundational research in this area. Research on variational data structures will benefit not only customizable software, but many other application domains that must cope with variability. In this paper, we show how support for variation can be understood as a general and orthogonal property of data types, data structures, and algorithms. We begin a systematic exploration of basic variational data structures, exploring the tradeoffs among different implementations. Finally, we retrospectively analyze the design decisions in our own previous work where we have independently encountered problems requiring variational data structures.

- Projectional Editing of Variational SoftwareACM SIGPLAN Int. Conf. on Generative Programming and Component Engineering (GPCE), 2014, 29–38Best paper[Abstract, PDF]
Editing the source code of variational software is complicated by the presence of variation annotations, such as #ifdef statements, and by code that is only included in some configurations. When editing some configurations and not others, it would be easier to edit a simplified version of the source code that includes only the configurations we currently care about. In this paper, we present a projectional editing model for variational software. Using our approach, a programmer can partially configure a variational program, edit this simplified view of the code, and then automatically update the original, fully variational source code. The model is based on an isolation principle where edits affect only the variants that are visible in the view. We show that this principle has several nice properties that are suggested by related work on bidirectional transformations.

- An Abstract Representation of Variational Graphs[Abstract, PDF]Int. Workshop on Feature-Oriented Software Development (FOSD), ACM, 2013, 25–32
In the context of software product lines, there is often a need to represent graphs containing variability. For example, extending traditional modeling techniques or program analyses to variational software requires a corresponding notion of variational graphs. In this paper, we introduce a general model of variational graphs and a theoretical framework for discussing variational graph algorithms. Specifically, we present an abstract syntax based on tagging for succinctly representing variational graphs and other data types relevant to variational graph algorithms, such as variational sets and paths. We demonstrate how (non-variational) graph algorithms can be generalized to operate on variational graphs, to accept variational inputs, and produce variational outputs. Finally, we discuss a filtering operation on variational graphs and how this interacts with variational graph algorithms.

- Adding Configuration to the Choice Calculus[Abstract, PDF]Int. Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ACM, 2013, 13:1–13:8
The choice calculus is a formal language for representing variation in software artifacts. Variability is organized in the choice calculus through the use of dimensions, where each dimension represents a decision that must be made in order to obtain a particular variant. However, the process of selecting alternatives from dimensions was relegated to an external operation. This precludes many interesting variation and reuse patterns, such as nested product lines, and theoretical results, such as a syntactic description of configuration, that would be possible if selection were a part of the language itself.

In this paper we add a selection operation to the choice calculus and illustrate how that increases the expressiveness of the calculus. We investigate some alternative semantics of this operation and study their impact and utility. Specifically, we will examine selection in the context of static and dynamically scoped dimension declarations, as a well as a modest and greedy form of dimension elimination. We also present a design for a type system to ensure configuration safety and modularity of nested product lines.

- A Calculus for Modeling and Implementing Variation[Abstract, PDF]ACM SIGPLAN Int. Conf. on Generative Programming and Component Engineering (GPCE), 2012, 132–140
We present a formal calculus for modeling and implementing variation in software. It unifies the compositional and annotative approaches to feature implementation and supports the development of abstractions that can be used to directly relate feature models to their implementation. Since the compositional and annotative approaches are complementary, the calculus enables implementers to use the best combination of tools for the job and focus on inherent feature interactions, rather than those introduced by biases in the representation. The calculus also supports the abstraction of recurring variational patterns and provides a metaprogramming platform for organizing variation in artifacts.

- An Error-Tolerant Type System for Variational Lambda Calculus[Abstract, PDF]ACM SIGPLAN Int. Conf. on Functional Programming (ICFP), 2012, 29–40
Conditional compilation and software product line technologies make it possible to generate a huge number of different programs from a single software project. Typing each of these programs individually is usually impossible due to the sheer number of possible variants. Our previous work has addressed this problem with a type system for variational lambda calculus (VLC), an extension of lambda calculus with basic constructs for introducing and organizing variation. Although our type inference algorithm is more efficient than the brute-force strategy of inferring the types of each variant individually, it is less robust since type inference will fail for the entire variational expression if any one variant contains a type error. In this work, we extend our type system to operate on VLC expressions containing type errors. This extension directly supports locating ill-typed variants and the incremental development of variational programs. It also has many subtle implications for the unification of variational types. We show that our extended type system possesses a principal typing property and that the underlying unification problem is unitary. Our unification algorithm computes partial unifiers that lead to result types that (1) contain errors in as few variants as possible and (2) are most general. Finally, we perform an empirical evaluation to determine the overhead of this extension compared to our previous work, to demonstrate the improvements over the brute-force approach, and to explore the effects of various error distributions on the inference process.

- #ifdef Confirmed Harmful: Promoting Understandable Software Variation[Abstract, PDF]IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2011, 143–150
Maintaining variation in software is a difficult problem that poses serious challenges for the understanding and editing of software artifacts. Although the C preprocessor (CPP) is often the default tool used to introduce variability to software, because of its simplicity and flexibility, it is infamous for its obtrusive syntax and has been blamed for reducing the comprehensibility and maintainability of software. In this paper, we address this problem by developing a prototype for managing software variation at the source code level. We evaluate the difference between our prototype and CPP with a user study, which indicates that the prototype helps users reason about variational code faster and more accurately than CPP. Our results also support the research of others, providing evidence for the effectiveness of related tools, such as CIDE and FeatureCommander.

- A DSEL for Studying and Explaining Causation[Abstract, PDF, Code]IFIP Working Conf. on Domain-Specific Languages (DSL), 2011, 143–167
We present a domain-specific embedded language (DSEL) in Haskell that supports the philosophical study and practical explanation of causation. The language provides constructs for modeling situations comprised of events and functions for reliably determining the complex causal relationships that emerge between these events. It enables the creation of visual explanations of these causal relationships and a means to systematically generate alternative, related scenarios, along with corresponding outcomes and causes. The DSEL is based on neuron diagrams, a visual notation that is well established in practice and has been successfully employed for causation explanation and research. In addition to its immediate applicability by users of neuron diagrams, the DSEL is extensible, allowing causation experts to extend the notation to introduce special-purpose causation constructs. The DSEL also extends the notation of neuron diagrams to operate over non-boolean values, improving its expressiveness and offering new possibilities for causation research and its applications.

- Semantics First! Rethinking the Language Design Process[Abstract, PDF]ACM SIGPLAN Int. Conf. on Software Language Engineering (SLE), LNCS vol. 6940, Springer, 2011, 243–262
The design of languages is still more of an art than an engineering discipline. Although recently tools have been put forward to support the language design process, such as language workbenches, these have mostly focused on a syntactic view of languages. While these tools are quite helpful for the development of parsers and editors, they provide little support for the underlying design of the languages. In this paper we illustrate how to support the design of languages by focusing on their semantics first. Specifically, we will show that powerful and general language operators can be employed to adapt and grow sophisticated languages out of simple semantics concepts. We use Haskell as a metalanguage and will associate generic language concepts, such as semantics domains, with Haskell-specific ones, such as data types. We do this in a way that clearly distinguishes our approach to language design from the traditional syntax-oriented one. This will reveal some unexpected correlations, such as viewing type classes as language multipliers. We illustrate the viability of our approach with several real-world examples.

- Program Fields for Continuous Software[Abstract, PDF]ACM SIGSOFT Workshop on the Future of Software Engineering Research (FoSER), 2010, 105–108
We propose program fields, a formal representation for groups of related programs, as a new abstraction to support future software engineering research in several areas. We will discuss opportunities offered by program fields and research questions that have to be addressed.

- Causal Reasoning with Neuron Diagrams[Abstract, PDF]IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2010, 101–108
The principle of causation is fundamental to science and society and has remained an active topic of discourse in philosophy for over two millennia. Modern philosophers often rely on “neuron diagrams”, a domain-specific visual language for discussing and reasoning about causal relationships and the concept of causation itself. In this paper we formalize the syntax and semantics of neuron diagrams. We discuss existing algorithms for identifying causes in neuron diagrams, show how these approaches are flawed, and propose solutions to these problems. We separate the standard representation of a dynamic execution of a neuron diagram from its static definition and define two separate, but related semantics, one for the causal effects of neuron diagrams and one for the identification of causes themselves. Most significantly, we propose a simple language extension that supports a clear, consistent, and comprehensive algorithm for automatic causal inference.

- Visual Explanations of Probabilistic Reasoning[Abstract, PDF]IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2009, 23–27
Continuing our research in explanation-oriented language design, we present a domain-specific visual language for explaining probabilistic reasoning. Programs in this language, called explanation objects, can be manipulated according to a set of laws to automatically generate many equivalent explanation instances. We argue that this increases the explanatory power of our language by allowing a user to view a problem from many different perspectives.

- A Formal Representation of Software-Hardware System Design[Abstract, PDF]ASME Int. Design Engineering Technical Conf. & Computers and Information in Engineering Conf. (IDETC), 2009, 1387–1398
The design of hardware-software systems is a complex and difficult task exacerbated by the very different tools used by designers in each field. Even in small projects, tracking the impact, motivation and context of individual design decisions between designers and over time quickly becomes intractable. In an attempt to bridge this gap, we present a general, low-level model of the system design process. We formally define the concept of a design decision, and provide a hierarchical representation of both the design space and the context in which decisions are made. This model can serve as a foundation for software-hardware system design tools which will help designers cooperate more efficiently and effectively. We provide a high-level example of the use of such a system in a design problem provided through collaboration with NASA.

- A DSL for Explaining Probabilistic ReasoningIFIP Working Conf. on Domain-Specific Languages (DSL), LNCS vol. 5658, Springer, 2009, 335–359Best paper[Abstract, PDF]
We propose a new focus in language design where languages provide constructs that not only describe the computation of results, but also produce explanations of how and why those results were obtained. We posit that if users are to understand computations produced by a language, that language should provide explanations to the user. As an example of such an explanation-oriented language we present a domain-specific language for explaining probabilistic reasoning, a domain that is not well understood by non-experts. We show the design of the DSL in several steps. Based on a story-telling metaphor of explanations, we identify generic constructs for building stories out of events, and obtaining explanations by applying stories to specific examples. These generic constructs are then adapted to the particular explanation domain of probabilistic reasoning. Finally, we develop a visual notation for explaining probabilistic reasoning.

- Varying Domain Representations in Hagl – Extending the Expressiveness of a DSL for Experimental Game Theory[Abstract, PDF, Code]IFIP Working Conf. on Domain-Specific Languages (DSL), LNCS vol. 5658, Springer, 2009, 310–334
Experimental game theory is an increasingly important research tool in many fields, providing insight into strategic behavior through simulation and experimentation on game theoretic models. Unfortunately, despite relying heavily on automation, this approach has not been well supported by tools. Here we present our continuing work on Hagl, a domain-specific language embedded in Haskell, intended to drastically reduce the development time of such experiments and support a highly explorative research style. In this paper we present a fundamental redesign of the underlying game representation in Hagl. These changes allow us to better leverage domain knowledge by allowing different classes of games to be represented differently, exploiting existing domain representations and algorithms. In particular, we show how this supports analytical extensions to Hagl, and makes strategies for state-based games vastly simpler and more efficient.

- A Visual Language for Representing and Explaining Strategies in Game Theory[Abstract, PDF]IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2008, 101–108
We present a visual language for strategies in game theory, which has potential applications in economics, social sciences, and in general science education. This language facilitates explanations of strategies by visually representing the interaction of players’ strategies with game execution. We have utilized the cognitive dimensions framework in the design phase and recognized the need for a new cognitive dimension of “traceability” that considers how well a language can represent the execution of a program. We consider how traceability interacts with other cognitive dimensions and demonstrate its use in analyzing existing languages. We conclude that the design of a visual representation for execution traces should be an integral part of the design of visual languages because understanding a program is often tightly coupled to its execution.

## Theses and Other Papers

- The Choice Calculus: A Formal Language of Variation[Abstract, PDF]PhD thesis, Oregon State University, 2013
In this thesis I present the choice calculus, a formal language for representing variation in software and other structured artifacts. The choice calculus is intended to support variation research in a way similar to the lambda calculus in programming language research. Specifically, it provides a simple formal basis for presenting, proving, and communicating theoretical results. It can serve as a common language of discourse for researchers working on different views of similar problems and provide a shared back end in tools.

This thesis collects a large amount of work on the choice calculus. It defines the syntax and denotational semantics of the language along with modular language extensions that add features important to variation research. It presents several theoretical results related to the choice calculus, such as an equivalence relation that supports semantics-preserving transformations of choice calculus expressions, and a type system for ensuring that an expression is well formed. It also presents a Haskell DSEL based on the choice calculus for exploring the concept of variational programming.

- Domain-Specific Language Support for Experimental Game Theory[Abstract, PDF, Code]MS thesis, Oregon State University, 2011
Experimental game theory is the use of game theoretic abstractions—games, players, and strategies—in experiments and simulations. It is often used in cases where traditional, analytical game theory fails or is difficult to apply. This thesis collects three previously published papers that provide domain-specific language (DSL) support for defining and executing these experiments, and for explaining their results.

Despite the widespread use of software in this field, there is a distinct lack of tool support for common tasks like modeling games and running simulations. Instead, most experiments are created from scratch in general-purpose programming languages. We have addressed this problem with Hagl, a DSL embedded in Haskell that allows the concise, declarative definition of games, strategies, and executable experiments. Hagl raises the level of abstraction for experimental game theory, reducing the effort to conduct experiments and freeing experimenters to focus on hard problems in their domain instead of low-level implementation details.

While analytical game theory is most often used as a prescriptive tool, a way to analyze a situation and determine the best course of action, experimental game theory is often applied descriptively to explain why agents interact and behave in a certain way. Often these interactions are complex and surprising. To support this explanatory role, we have designed visual DSL for explaining the interaction of strategies for iterated games. This language is used as a vehicle to introduce the notational quality of traceability and the new paradigm of explanation-oriented programming.

- Features and Feature Models: A Survey of Variation Representations[Abstract]Compendium of Computer Science Doctoral Qualifying Exams, Oregon State University, 2010
This survey explores and organizes existing work on the long-term management of software that varies in multiple dimensions. It focuses, in particular, on the representation of features in software product lines, and on capturing relationships between features in feature models.