Publications

Journal articles

  1. Variational Satisfiability Solving: Efficiently Solving Lots of Related SAT Problems
    Jeffrey M. Young, Paul Maximilian Bittner, Eric Walkingshaw, and Thomas Thüm
    Empirical Software Engineering (EMSE), vol. 28, num. 14, Springer, 2023, 1–53
    [Abstract, PDF, Code]

    Incremental satisfiability (SAT) solving is an extension of classic SAT solving that enables solving a set of related SAT problems by identifying and exploiting shared terms. However, using incremental solvers effectively is hard since performance is sensitive to the input order of subterms and results must be tracked manually. For analyses that generate sets of related SAT problems, such as those in software product lines, incremental solvers are either not used or their use is not clearly described in the literature. This paper translates the ordering problem to an encoding problem and automates the use of incremental solving. We introduce variational SAT solving, which differs from incremental solving by accepting all related problems as a single variational input and returning all results as a single variational output. Variational solving syntactically encodes differences in related SAT problems as local points of variation. With this syntax, our approach automates the interaction with the incremental solver and enables a method to automatically optimize sharing in the input. To evaluate these ideas, we formalize a variational SAT algorithm, construct a prototype variational solver, and perform an empirical analysis on two real-world datasets that applied incremental solvers to software evolution scenarios. We show, assuming a variational input, that the prototype solver scales better for these problems than four off-the-shelf incremental solvers while also automatically tracking individual results.

  2. Migrating Gradual Types
    John Peter Campora III, Sheng Chen, Martin Erwig, and Eric Walkingshaw
    Journal of Functional Programming (JFP), vol. 32, num. e14, 2022, 1–60
    [Abstract, PDF, Code]

    Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically typed to more statically typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of replacing dynamic types with fully static types for a gradually typed program. The typing result supports automatically migrating a program to be as static as possible or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing.

  3. Casts and Costs: Harmonizing Safety and Performance in Gradual Typing
    John Peter Campora III, Sheng Chen, and Eric Walkingshaw
    Proc. of the ACM on Programming Languages (PACMPL) issue ACM SIGPLAN Int. Conf. on Functional Programming (ICFP), vol. 2, 2018, 98:1–98:30
    [Abstract, PDF, Code]

    Gradual typing allows programmers to use both static and dynamic typing in a single program. However, a well-known problem with sound gradual typing is that the interactions between static and dynamic code can cause significant performance degradation. These performance pitfalls are hard to predict and resolve, and discourage users from using gradual typing features. For example, when migrating to a more statically typed program, often adding a type annotation will trigger a slowdown that can be resolved by adding more annotations elsewhere, but since it’s not clear where the additional annotations must be added, the easier solution is to simply remove the annotation.

    To address these problems, we develop: (1) a static cost semantics that accurately predicts the overhead of static-dynamic interactions in a gradually typed program, (2) a technique for efficiently inferring such costs for all combinations of inferable type assignments in a program, and (3) a method for translating the results of this analysis into specific recommendations and explanations that can help programmers understand, debug, and optimize the performance of gradually typed programs. We have implemented our approach in Herder, a tool for statically analyzing the performance of different typing configurations for Reticulated Python programs. An evaluation on 15 Python programs shows that Herder can use this analysis to accurately and efficiently recommend type assignments that optimize the performance of these programs without sacrificing type safety.

  4. Migrating Gradual Types
    John Peter Campora III, Sheng Chen, Martin Erwig, and Eric Walkingshaw
    Proc. of the ACM on Programming Languages (PACMPL) issue ACM SIGPLAN Symp. on Principles of Programming Languages (POPL), vol. 2, 2018, 15:1–15:29
    [Abstract, PDF, Code]

    Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically-typed to more statically-typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared.

    In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of adding or removing type annotations from a gradually typed program. The typing result supports automatically migrating a program to be as static as possible, or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing.

  5. Extending Type Inference to Variational Programs
    Sheng Chen, Martin Erwig, and Eric Walkingshaw
    ACM Trans. on Programming Languages and Systems (TOPLAS), vol. 36, num. 1, 2014, 1:1–1:54
    [Abstract, PDF]

    Through the use of conditional compilation and related tools, many software projects can be used to generate a huge number of related programs. The problem of typing such variational software is difficult. The brute-force strategy of generating all variants and typing each one individually is (1) usually infeasible for efficiency reasons and (2) produces results that do not map well to the underlying variational program. Recent research has focused mainly on efficiency and addressed only the problem of type checking. In this work we tackle the more general problem of variational type inference and introduce variational types to represent the result of typing a variational program. We introduce the variational lambda calculus (VLC) as a formal foundation for research on typing variational programs. We define a type system for VLC in which VLC expressions are mapped to correspondingly variational types. We show that the type system is correct by proving that the typing of expressions is preserved over the process of variation elimination, which eventually results in a plain lambda calculus expression and its corresponding type. We identify a set of equivalence rules for variational types and prove that the type unification problem modulo these equivalence rules is unitary and decidable; we also present a sound and complete unification algorithm. Based on the unification algorithm, the variational type inference algorithm is an extension of algorithm W. We show that it is sound and complete and computes principal types. We also consider the extension of VLC with sum types, a necessary feature for supporting variational data types, and demonstrate that the previous theoretical results also hold under this extension. Finally, we characterize the complexity of variational type inference and demonstrate the efficiency gains over the brute-force strategy.

  6. A Visual Language for Explaining Probabilistic Reasoning
    Martin Erwig and Eric Walkingshaw
    Journal of Visual Languages and Computing (JVLC), vol. 24, num. 2, 2013, 88–109
    [Abstract, PDF]

    We present an explanation-oriented, domain-specific, visual language for explaining probabilistic reasoning. Explanation-oriented programming is a new paradigm that shifts the focus of programming from the computation of results to explanations of how those results were computed. Programs in this language therefore describe explanations of probabilistic reasoning problems. The language relies on a storytelling metaphor of explanation, where the reader is guided through a series of well-understood steps from some initial state to the final result. Programs can also be manipulated according to a set of laws to automatically generate equivalent explanations from one explanation instance. This increases the explanatory value of the language by allowing readers to cheaply derive alternative explanations if they do not understand the first. The language is comprised of two parts: a formal textual notation for specifying explanation-producing programs and the more elaborate visual notation for presenting those explanations. We formally define the abstract syntax of explanations and define the semantics of the textual notation in terms of the explanations that are produced.

  7. The Choice Calculus: A Representation for Software Variation
    Martin Erwig and Eric Walkingshaw
    ACM Trans. on Software Engineering and Methodology (TOSEM), vol. 21, num. 1, 2011, 6:1–6:27
    [Abstract, PDF]

    Many areas of computer science are concerned with some form of variation in software–from managing changes to software over time, to supporting families of related artifacts. We present the choice calculus, a fundamental representation for software variation that can serve as a common language of discourse for variation research, filling a role similar to the lambda calculus in programming language research. We also develop an associated theory of software variation, including sound transformations of variation artifacts, the definition of strategic normal forms, and a design theory for variation structures, which will support the development of better algorithms and tools.

  8. A Domain-Specific Language for Experimental Game Theory
    Eric Walkingshaw and Martin Erwig
    Journal of Functional Programming (JFP), vol. 19, 2009, 645–661
    [Abstract, PDF, Code]

    Experimental game theory is increasingly important for research in many fields. Unfortunately, it is poorly supported by computer tools. We have created Hagl, a domain-specific language embedded in Haskell, to reduce the development time of game-theoretic experiments and make the definition and exploration of games and strategies simple and fun.

Peer-reviewed book chapters

  1. Variation Programming with the Choice Calculus
    Martin Erwig and Eric Walkingshaw
    Generative and Transformational Techniques in Software Engineering IV (GTTSE 2011), Revised and Extended Papers, LNCS vol. 7680, Springer, 2013, 55–100
    [Abstract, PDF, Code]

    The choice calculus provides a language for representing and transforming variation in software and other structured documents. Variability is captured in localized choices between alternatives. The space of all variations is organized by dimensions, which provide scoping and structure to choices. The variation space can be reduced through a process of selection, which eliminates a dimension and resolves all of its associated choices by replacing each with one of their alternatives. The choice calculus also allows the definition of arbitrary functions for the flexible construction and transformation of all kinds of variation structures. In this tutorial we will first present the motivation, general ideas, and principles that underlie the choice calculus. This is followed by a closer look at the semantics. We will then present practical applications based on several small example scenarios and consider the concepts of “variation programming” and “variation querying”. The practical applications involve work with a Haskell library that supports variation programming and experimentation with the choice calculus.

  2. Semantics-Driven DSL Design
    Martin Erwig and Eric Walkingshaw
    Formal and Practical Aspects of Domain-Specific Languages: Recent Developments, (ed. Marjan Mernik), IGI Global, 2012, 56–80
    [Abstract, PDF]

    Convention dictates that the design of a language begins with its syntax. We argue that early emphasis should be placed instead on the identification of general, compositional semantic domains, and that grounding the design process in semantics leads to languages with more consistent and more extensible syntax. We demonstrate this semantics-driven design process through the design and implementation of a DSL for defining and manipulating calendars, using Haskell as a metalanguage to support this discussion. We emphasize the importance of compositionality in semantics-driven language design, and describe a set of language operators that support an incremental and modular design process.

Peer-reviewed conference and workshop papers

  1. A Variational Database Management System
    Parisa Ataei, Fariba Khan, and Eric Walkingshaw
    ACM SIGPLAN Int. Conf. on Generative Programming: Concepts and Experiences (GPCE), 2021, 29–42
    [Abstract, PDF, Code]

    Many problems require working with data that varies in its structure and content. Current approaches, such as schema evolution or data integration tools, are highly tailored to specific kinds of variation in databases. While these approaches work well in their roles, they do not address all kinds of variation and do address the interaction of different kinds of variation in databases. In this paper, we define a framework for capturing variation as a generic and orthogonal concern in relational databases. We define variational schemas, variational databases, and variational queries for capturing variation in the structure, content, and information needs of relational databases, respectively. We define a type system that ensures variational queries are consistent with respect to a variational schema. Finally, we design and implement a variational database management system as an abstraction layer over a traditional relational database management system. Using previously developed use cases, we show the feasibility of our framework and demonstrate the performance of different approaches used in our system.

  2. Should Variation Be Encoded Explicitly in Databases?
    Parisa Ataei, Qiaoran Li, and Eric Walkingshaw
    Int. Working Conf. on Variability Modelling of Software-Intensive Systems (VaMoS), 2021, 3:1–3:9
    [Abstract, PDF]

    Variation occurs in databases in many different forms and contexts. For example, a single database schema evolves over time, data from different sources may be combined, and the various configurations of a software product line (SPL) may have different data needs. While approaches have been developed to deal with many such scenarios, particularly in the fields of database evolution and data integration, there is no solution that treats variation as a general and orthogonal concern in databases. This is a problem when various kinds of variation intersect, such as during the evolution of a SPL. Previously, we have proposed variational databases (VDB) as a general way to represent variation in both the structure and content of databases. Although the model underlying VDB is simple, encoding variation explicitly in databases introduces complexity akin to using preprocessing directives in software. In this paper, we explore the feasibility of VDB and its associated variational query language for encoding different kinds of database variability. We develop two use cases that illustrate how different kinds of variation can be encoded and integrated in VDB, and how the corresponding information needs can be expressed as variational queries. We then use these use cases to discuss the benefits and drawbacks of such a direct encoding of variation in data and queries.

  3. Variational Satisfiability Solving
    Jeffrey M. Young, Eric Walkingshaw, and Thomas Thüm
    ACM SIGSOFT Int. Systems and Software Product Line Conf. (SPLC), 2020, 18:1–18:12
    [Abstract, PDF, Code]

    Incremental satisfiability (SAT) solving is an extension of classic SAT solving that allows users to efficiently solve a set of related SAT problems by identifying and exploiting shared terms. However, using incremental solvers effectively is hard since performance is sensitive to a problem’s structure and the order sub-terms are fed to the solver, and the burden to track results is placed on the end user. For analyses that generate sets of related SAT problems, such as those in software product lines, incremental SAT solvers are either not used at all, used but not explicitly stated so in the literature, or used but suffer from the aforementioned usability problems. This paper translates the ordering problem to an encoding problem and automates the use of incremental SAT solving. We introduce variational SAT solving, which differs from incremental SAT solving by accepting all related problems as a single variational input and returning all results as a single variational output. Our central idea is to make explicit the operations of incremental SAT solving, thereby encoding differences between related SAT problems as local points of variation. Our approach automates the interaction with the incremental solver and enables methods to automatically optimize sharing of the input. To evaluate our methods we construct a prototype variational SAT solver and perform an empirical analysis on two real-world datasets that applied incremental solvers to software evolution scenarios. We show, assuming a variational input, that the prototype solver scales better for these problems than naive incremental solving while also removing the need to track individual results.

  4. Toward Efficient Analysis of Variation in Time and Space
    Thomas Thüm, Leopoldo Teixeira, Klaus Schmid, Eric Walkingshaw, Mukelabai Mukelabai, Mahsa Varshosaz, Goetz Botterweck, Ina Schaefer, and Timo Kehrer
    Int. Workshop on Variability and Evolution of Software-Intensive Systems (VariVolution), 2019, 57–64
    [Abstract, PDF]

    Variation is central to today’s software development. There are two fundamental dimensions to variation: Variation in time refers to the fact that software exists in numerous revisions that typically replace each other (i.e., a newer version supersedes an older one). Variation in space refers to differences among variants that are designed to coexist in parallel. There are numerous analyses to cope with variation in space (i.e., product-line analyses) and others that cope with variation in time (i.e., regression analyses). The goal of this work is to discuss to which extent product-line analyses can be applied to revisions and, conversely, where regression analyses can be applied to variants. In addition, we discuss challenges related to the combination of product-line and regression analyses. The overall goal is to increase the efficiency of analyses by exploiting the inherent commonality between variants and revisions.

  5. Declarative GUIs: Simple, Consistent, and Verified
    Stephan Adelsberger, Anton Setzer, and Eric Walkingshaw
    ACM SIGPLAN Int. Symp. on Principles and Practice of Declarative Programming (PPDP), 2018, 4:1–4:15
    [Abstract, PDF, Code]

    Graphical user interfaces (GUIs) are ubiquitous in real-world software and a notorious source of bugs that are difficult to catch through software testing. Model checking has been used to prove the absence of certain kinds of bugs, but model checking works on an abstract model of the GUI application, which might be inconsistent with its implementation. We present a library for developing directly verified, state-dependent GUI applications in the dependently typed programming language Agda. In the library, the type of a GUI’s controller depends on a specification of the GUI itself, statically enforcing consistency between them. Arbitrary properties can be defined and proved in terms of user interactions and state transitions. Our library connects to a custom-built Haskell back-end for declarative vector-based GUI elements. Compared to an earlier version of our library built on an existing imperative GUI framework, the more declarative back-end supports simpler definitions and proofs.

    As a practical application of our library to a safety-critical domain, we present a case study developed in cooperation with the Medical University of Vienna. The case study implements a healthcare process for prescribing anticoagulants, which is highly error-prone when followed manually. Our implementation generates GUIs from an abstract description of a data-aware business process, making our approach easy to reuse and adapt to other safety-critical processes. We prove medically relevant safety properties about the executable GUI application, such as that given certain inputs, certain states must or must not be reached.

  6. Developing GUI Applications in a Verified Setting
    Stephan Adelsberger, Anton Setzer, and Eric Walkingshaw
    Int. Symp. on Dependable Software Engineering: Theories, Tools, and Applications (SETTA), Springer, 2018, 89–107
    [Abstract, PDF, Code]

    Although there have been major achievements in verified software, work on verifying graphical user interfaces (GUI) applications is underdeveloped relative to their ubiquity and societal importance. In this paper, we present a library for the development of verified, state-dependent GUI applications in the dependently typed programming language Agda. The library uses Agda’s expressive type system to ensure that the GUI, its controller, and the underlying model are all consistent, significantly reducing the scope for GUI-related bugs. We provide a way to specify and prove correctness properties of GUI applications in terms of user interactions and state transitions. Critically, GUI applications and correctness properties are not restricted to finite state machines and may involve the execution of arbitrary interactive programs. Additionally, the library connects to a standard, imperative GUI framework, enabling the development of native GUI applications with expected features, such as concurrency. We present applications of our library to building GUI applications to manage healthcare processes. The correctness properties we consider are the following: (1) That a state can only be reached by passing through a particular intermediate state, for example, that a particular treatment can only be reached after having conducted an X-Ray. (2) That one eventually reaches a particular state, for example, that one eventually decides on a treatment. The specification of such properties is defined in terms of a GUI application simulator, which simulates all possible sequences of interactions carried out by the user.

  7. Managing Structurally Heterogeneous Databases in Software Product Lines
    Parisa Ataei, Arash Termehchy, and Eric Walkingshaw
    VLDB Workshop: Polystores and Other Systems for Heterogeneous Data (Poly), Springer, 2018
    [Abstract, PDF]

    Data variations are prevalent while developing software product lines (SPLs). A SPL enables a software vendor to quickly produce different variants of their software tailored to variations in their clients’ business requirements, conventions, desired feature sets, and deployment environments. In database-backed software, the database of each variant may have a different schema and content, giving rise to numerous data variants. Users often need to query and/or analyze all variants in a SPL simultaneously. For example, a software vendor wants to perform common tests or inquiries over all variants. Unfortunately, there is no systematic approach to managing and querying data variations and users have to use their intuition to perform such tasks, often resorting to repeating a task for each variant. We introduce VDBMS (Variational Database Management System), a system that provides a compact, expressive, and structured representation of variation in relational databases. In contrast to data integration systems that provide a unified representation for all data sources, VDBMS makes variations explicit in both the schema and query. Although variations can make VDBMS queries more complex than plain queries, a strong static type system ensures that all variants of the query are consistent with the corresponding variants of the database. Additionally, variational queries make it possible to compactly represent and efficiently run queries over a huge range of data variations in a single query. This directly supports many tasks that would otherwise be intractable in highly variational database-backed SPLs.

  8. A Domain Analysis of Data Structure and Algorithm Explanations in the Wild
    Jeffrey M. Young and Eric Walkingshaw
    ACM SIGCSE Technical Symp. on Computer Science Education (SIGCSE), 2018, 870–875
    [Abstract, PDF, Data]

    Explanations of data structures and algorithms are complex interactions of several notations, including natural language, mathematics, pseudocode, and diagrams. Currently, such explanations are created ad hoc using a variety of tools and the resulting artifacts are static, reducing explanatory value. We envision a domain-specific language for developing rich, interactive explanations of data structures and algorithms. In this paper, we analyze this domain to sketch requirements for our language. We perform a grounded theory analysis to generate a qualitative coding system for explanation artifacts collected online. This coding system implies a common structure among explanations of algorithms and data structures. We believe this structure can be reused as the semantic basis of a domain-specific language for creating interactive explanation artifacts. This work is part of our effort to develop the paradigm of explanation-oriented programming, which shifts the focus of programming from computing results to producing rich explanations of how those results were computed.

  9. Variational Databases
    Parisa Ataei, Arash Termehchy, and Eric Walkingshaw
    Int. Symp. on Database Programming Languages (DBPL), ACM, 2017, 11:1–11:4
    [Abstract, PDF]

    Data variations are prevalent in real-world applications. For example, software vendors handle variations in the business requirements, conventions, and environmental settings of a software product using hundreds of features each combination of which creates a different version of the product. In database-backed software, the database of each version may have a different schema and different content. Variations in the value and representation of each element in a dataset give rise to numerous variants in these applications. Users often would like to express information needs over all such variants. For example, a software vendor would like to perform common tests over all versions of its product, e.g., whether each relation in a relational database has a primary key. Hence, users need a query interface that hides the variational nature of the data and processes a query over all variations of a dataset. We propose a novel abstraction called a variational database that provides a compact and structured representation of general forms of data variations and enables users to query database variations easily.

  10. How Good are Your Types? Using Mutation Analysis to Evaluate the Effectiveness of Type Annotations
    Rahul Gopinath and Eric Walkingshaw
    Int. Workshop on Mutation Analysis (Mutation), IEEE, 2017, 122–127
    Best presentation
    [Abstract, PDF]

    Software engineers primarily use two orthogonal means to reduce susceptibility to faults: software testing and static type checking. While many strategies exist to evaluate the effectiveness of a test suite in catching bugs, there are few that evaluate the effectiveness of type annotations in a program. This problem is most relevant in the context of gradual or optional typing, where programmers are free to choose which parts of a program to annotate and in what detail. Mutation analysis is one strategy that has proven useful for measuring test suite effectiveness by emulating potential software faults. We propose that mutation analysis can be used to evaluate the effectiveness of type annotations too. We analyze mutants produced by the MutPy mutation framework against both a test suite and against type-annotated programs. We show that, while mutation analysis can be useful for evaluating the effectiveness of type annotations, we require stronger mutation operators that target type information in programs to be an effective mutation analysis tool.

  11. A Choice of Variational Stacks: Exploring Variational Data Structures
    Meng Meng, Jens Meinicke, Chu-Pan Wong, Eric Walkingshaw, and Christian Kästner
    Int. Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ACM, 2017, 28–35
    [Abstract, PDF]

    Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make the variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little general understanding of how to design them or what tradeoffs exist among them. In this paper, we strive for a more systematic exploration and analysis of a variational data structure. We want to know how different design decisions affect the performance and scalability of a variational data structure, and what properties of the underlying data and operation sequences need to be considered. Specifically, we study several alternative designs of a variational stack, a data structure that supports efficiently representing and computing with multiple variants of a plain stack, and that is a common building block in many algorithms. The different variational stacks are presented as a small product line organized by three design decisions. We analyze how these design decisions affect the performance of a variational stack with different usage profiles. Finally, we evaluate how these design decisions affect the performance of the variational stack in a real-world scenario: in the interpreter when executing real software containing variability.

  12. Formula Choice Calculus
    Spencer Hubbard and Eric Walkingshaw
    Int. Workshop on Feature-Oriented Software Development (FOSD), ACM, 2016, 49–57
    [Abstract, PDF, Code]

    The choice calculus is a simple metalanguage and associated theory that has been successfully applied to several problems of interest to the feature-oriented software development community. The formal presentation of the choice calculus essentially restricts variation points, called choices, to vary based on the inclusion or not of a single feature, while in practice variation points may depend on several features. Therefore, in both theoretical applications of the choice calculus, and in tools inspired by the choice calculus, the syntax of choices has often been generalized to depend on an arbitrary propositional formula of features. The purpose of this paper is to put this syntactic generalization on more solid footing by also generalizing the associated theory. Specifically, after defining the syntax and denotational semantics of the formula choice calculus (FCC), we define and prove the soundness of a syntactic equivalence relation on FCC expressions. This effort validates previous work which has implicitly assumed the soundness of rules in the equivalence relation, and also reveals several rules that are specific to FCC. Finally, we describe some further generalizations to FCC and their limits.

  13. Concepts, Operations, and Feasibility of a Projection-Based Variation Control System
    Ștefan Stănciulescu, Thorsten Berger, Eric Walkingshaw, and Andrzej Wąsowski
    IEEE Int. Conf. on Software Maintenance and Evolution (ICSME), 2016, 323–333
    [Abstract, PDF, Code/Data]

    Highly configurable software often uses preprocessor annotations to handle variability. However, understanding, maintaining, and evolving code with such annotations is difficult, mainly because a developer has to work with all variants at a time. Dedicated methods and tools that allow working on a subset of all variants could ease the engineering of highly configurable software. We investigate the potential of one kind of such tools: projection-based variation control systems integrated with projectional editors. For such systems we aim to understand: (i) what end-user operations they need to support, and (ii) whether they can realize the actual evolution of real-world, highly configurable software. We conduct an experiment that investigates variability-related evolution patterns and that evaluates the feasibility of a projection-based variation control system by replaying parts of the history of a real-world 3D printer firmware project. Among others, we show that the prototype variation control system does indeed support the evolution of a highly configurable system and that in general, it does not degrade the code.

  14. A Calculus for Variational Programming
    Sheng Chen, Martin Erwig, and Eric Walkingshaw
    European Conf. on Object-Oriented Programming (ECOOP), LIPIcs vol. 56, 2016, 6:1–6:28
    [Abstract, PDF]

    Variation is ubiquitous in software. Many applications can benefit from making this variation explicit, then manipulating and computing with it directly—a technique we call “variational programming”. This idea has been independently discovered in several application domains, such as efficiently analyzing and verifying software product lines, combining bounded and symbolic model-checking, and computing with alternative privacy profiles. Although these domains share similar core problems, and there are also many similarities in the solutions, there is no dedicated programming language support for variational programming. This makes the various implementations tedious, prone to errors, hard to maintain and reuse, and difficult to compare.

    In this paper we present a calculus that forms the basis of a programming language with explicit support for representing, manipulating, and computing with variation in programs and data. We illustrate how such a language can simplify the implementation of variational programming tasks. We present the syntax and semantics of the core calculus, a sound type system, and a type inference algorithm that produces principal types.

  15. Programs for People: What We Can Learn from Lab Protocols
    Keeley Abbott, Christopher Bogart, and Eric Walkingshaw
    IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2015, 203–211
    [Abstract, PDF, Data]

    Humans play an active role in the execution of certain kinds of programs, such as spreadsheets, workflows, and interactive notebooks. Interacting closely with execution is especially useful when end-users are learning from examples while doing their work. In order to better understand the language features needed to support this kind of use, we investigated a particularly rigid and formalized category of “program” people write for each other: lab protocols. These protocols present a linear, idealized process despite the complex contingencies of the lab work they describe. However, they employ a variety of techniques for limiting or expanding the semantic interpretation of individual steps and for integrating outside protocols. We use these observations to derive implications for the design of interactive and mixed-initiative programming languages.

  16. Variational Data Structures: Exploring Trade-Offs in Computing With Variability
    Eric Walkingshaw, Christian Kästner, Martin Erwig, Sven Apel, and Eric Bodden
    ACM SIGPLAN Symp. on New Ideas in Programming and Reflections on Software (Onward!), 2014, 213–226
    [Abstract, PDF]

    Variation is everywhere, and in the construction and analysis of customizable software it is paramount. In this context, there arises a need for variational data structures for efficiently representing and computing with related variants of an underlying data type. So far, variational data structures have been explored and developed ad hoc. This paper is a first attempt and a call to action for systematic and foundational research in this area. Research on variational data structures will benefit not only customizable software, but many other application domains that must cope with variability. In this paper, we show how support for variation can be understood as a general and orthogonal property of data types, data structures, and algorithms. We begin a systematic exploration of basic variational data structures, exploring the tradeoffs among different implementations. Finally, we retrospectively analyze the design decisions in our own previous work where we have independently encountered problems requiring variational data structures.

  17. Projectional Editing of Variational Software
    Eric Walkingshaw and Klaus Ostermann
    ACM SIGPLAN Int. Conf. on Generative Programming and Component Engineering (GPCE), 2014, 29–38
    Best paper
    [Abstract, PDF]

    Editing the source code of variational software is complicated by the presence of variation annotations, such as #ifdef statements, and by code that is only included in some configurations. When editing some configurations and not others, it would be easier to edit a simplified version of the source code that includes only the configurations we currently care about. In this paper, we present a projectional editing model for variational software. Using our approach, a programmer can partially configure a variational program, edit this simplified view of the code, and then automatically update the original, fully variational source code. The model is based on an isolation principle where edits affect only the variants that are visible in the view. We show that this principle has several nice properties that are suggested by related work on bidirectional transformations.

  18. An Abstract Representation of Variational Graphs
    Martin Erwig, Eric Walkingshaw, and Sheng Chen
    Int. Workshop on Feature-Oriented Software Development (FOSD), ACM, 2013, 25–32
    [Abstract, PDF]

    In the context of software product lines, there is often a need to represent graphs containing variability. For example, extending traditional modeling techniques or program analyses to variational software requires a corresponding notion of variational graphs. In this paper, we introduce a general model of variational graphs and a theoretical framework for discussing variational graph algorithms. Specifically, we present an abstract syntax based on tagging for succinctly representing variational graphs and other data types relevant to variational graph algorithms, such as variational sets and paths. We demonstrate how (non-variational) graph algorithms can be generalized to operate on variational graphs, to accept variational inputs, and produce variational outputs. Finally, we discuss a filtering operation on variational graphs and how this interacts with variational graph algorithms.

  19. Adding Configuration to the Choice Calculus
    Martin Erwig, Klaus Ostermann, Tillmann Rendel, and Eric Walkingshaw
    Int. Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ACM, 2013, 13:1–13:8
    [Abstract, PDF]

    The choice calculus is a formal language for representing variation in software artifacts. Variability is organized in the choice calculus through the use of dimensions, where each dimension represents a decision that must be made in order to obtain a particular variant. However, the process of selecting alternatives from dimensions was relegated to an external operation. This precludes many interesting variation and reuse patterns, such as nested product lines, and theoretical results, such as a syntactic description of configuration, that would be possible if selection were a part of the language itself.

    In this paper we add a selection operation to the choice calculus and illustrate how that increases the expressiveness of the calculus. We investigate some alternative semantics of this operation and study their impact and utility. Specifically, we will examine selection in the context of static and dynamically scoped dimension declarations, as a well as a modest and greedy form of dimension elimination. We also present a design for a type system to ensure configuration safety and modularity of nested product lines.

  20. A Calculus for Modeling and Implementing Variation
    Eric Walkingshaw and Martin Erwig
    ACM SIGPLAN Int. Conf. on Generative Programming and Component Engineering (GPCE), 2012, 132–140
    [Abstract, PDF]

    We present a formal calculus for modeling and implementing variation in software. It unifies the compositional and annotative approaches to feature implementation and supports the development of abstractions that can be used to directly relate feature models to their implementation. Since the compositional and annotative approaches are complementary, the calculus enables implementers to use the best combination of tools for the job and focus on inherent feature interactions, rather than those introduced by biases in the representation. The calculus also supports the abstraction of recurring variational patterns and provides a metaprogramming platform for organizing variation in artifacts.

  21. An Error-Tolerant Type System for Variational Lambda Calculus
    Sheng Chen, Martin Erwig, and Eric Walkingshaw
    ACM SIGPLAN Int. Conf. on Functional Programming (ICFP), 2012, 29–40
    [Abstract, PDF]

    Conditional compilation and software product line technologies make it possible to generate a huge number of different programs from a single software project. Typing each of these programs individually is usually impossible due to the sheer number of possible variants. Our previous work has addressed this problem with a type system for variational lambda calculus (VLC), an extension of lambda calculus with basic constructs for introducing and organizing variation. Although our type inference algorithm is more efficient than the brute-force strategy of inferring the types of each variant individually, it is less robust since type inference will fail for the entire variational expression if any one variant contains a type error. In this work, we extend our type system to operate on VLC expressions containing type errors. This extension directly supports locating ill-typed variants and the incremental development of variational programs. It also has many subtle implications for the unification of variational types. We show that our extended type system possesses a principal typing property and that the underlying unification problem is unitary. Our unification algorithm computes partial unifiers that lead to result types that (1) contain errors in as few variants as possible and (2) are most general. Finally, we perform an empirical evaluation to determine the overhead of this extension compared to our previous work, to demonstrate the improvements over the brute-force approach, and to explore the effects of various error distributions on the inference process.

  22. #ifdef Confirmed Harmful: Promoting Understandable Software Variation
    Duc Le, Eric Walkingshaw, and Martin Erwig
    IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2011, 143–150
    [Abstract, PDF]

    Maintaining variation in software is a difficult problem that poses serious challenges for the understanding and editing of software artifacts. Although the C preprocessor (CPP) is often the default tool used to introduce variability to software, because of its simplicity and flexibility, it is infamous for its obtrusive syntax and has been blamed for reducing the comprehensibility and maintainability of software. In this paper, we address this problem by developing a prototype for managing software variation at the source code level. We evaluate the difference between our prototype and CPP with a user study, which indicates that the prototype helps users reason about variational code faster and more accurately than CPP. Our results also support the research of others, providing evidence for the effectiveness of related tools, such as CIDE and FeatureCommander.

  23. A DSEL for Studying and Explaining Causation
    Eric Walkingshaw and Martin Erwig
    IFIP Working Conf. on Domain-Specific Languages (DSL), 2011, 143–167
    [Abstract, PDF, Code]

    We present a domain-specific embedded language (DSEL) in Haskell that supports the philosophical study and practical explanation of causation. The language provides constructs for modeling situations comprised of events and functions for reliably determining the complex causal relationships that emerge between these events. It enables the creation of visual explanations of these causal relationships and a means to systematically generate alternative, related scenarios, along with corresponding outcomes and causes. The DSEL is based on neuron diagrams, a visual notation that is well established in practice and has been successfully employed for causation explanation and research. In addition to its immediate applicability by users of neuron diagrams, the DSEL is extensible, allowing causation experts to extend the notation to introduce special-purpose causation constructs. The DSEL also extends the notation of neuron diagrams to operate over non-boolean values, improving its expressiveness and offering new possibilities for causation research and its applications.

  24. Semantics First! Rethinking the Language Design Process
    Martin Erwig and Eric Walkingshaw
    ACM SIGPLAN Int. Conf. on Software Language Engineering (SLE), LNCS vol. 6940, Springer, 2011, 243–262
    [Abstract, PDF]

    The design of languages is still more of an art than an engineering discipline. Although recently tools have been put forward to support the language design process, such as language workbenches, these have mostly focused on a syntactic view of languages. While these tools are quite helpful for the development of parsers and editors, they provide little support for the underlying design of the languages. In this paper we illustrate how to support the design of languages by focusing on their semantics first. Specifically, we will show that powerful and general language operators can be employed to adapt and grow sophisticated languages out of simple semantics concepts. We use Haskell as a metalanguage and will associate generic language concepts, such as semantics domains, with Haskell-specific ones, such as data types. We do this in a way that clearly distinguishes our approach to language design from the traditional syntax-oriented one. This will reveal some unexpected correlations, such as viewing type classes as language multipliers. We illustrate the viability of our approach with several real-world examples.

  25. Program Fields for Continuous Software
    Martin Erwig and Eric Walkingshaw
    ACM SIGSOFT Workshop on the Future of Software Engineering Research (FoSER), 2010, 105–108
    [Abstract, PDF]

    We propose program fields, a formal representation for groups of related programs, as a new abstraction to support future software engineering research in several areas. We will discuss opportunities offered by program fields and research questions that have to be addressed.

  26. Causal Reasoning with Neuron Diagrams
    Martin Erwig and Eric Walkingshaw
    IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2010, 101–108
    [Abstract, PDF]

    The principle of causation is fundamental to science and society and has remained an active topic of discourse in philosophy for over two millennia. Modern philosophers often rely on “neuron diagrams”, a domain-specific visual language for discussing and reasoning about causal relationships and the concept of causation itself. In this paper we formalize the syntax and semantics of neuron diagrams. We discuss existing algorithms for identifying causes in neuron diagrams, show how these approaches are flawed, and propose solutions to these problems. We separate the standard representation of a dynamic execution of a neuron diagram from its static definition and define two separate, but related semantics, one for the causal effects of neuron diagrams and one for the identification of causes themselves. Most significantly, we propose a simple language extension that supports a clear, consistent, and comprehensive algorithm for automatic causal inference.

  27. Visual Explanations of Probabilistic Reasoning
    Martin Erwig and Eric Walkingshaw
    IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2009, 23–27
    [Abstract, PDF]

    Continuing our research in explanation-oriented language design, we present a domain-specific visual language for explaining probabilistic reasoning. Programs in this language, called explanation objects, can be manipulated according to a set of laws to automatically generate many equivalent explanation instances. We argue that this increases the explanatory power of our language by allowing a user to view a problem from many different perspectives.

  28. A Formal Representation of Software-Hardware System Design
    Eric Walkingshaw, Paul Strauss, Martin Erwig, John Mueller, and Irem Tumer
    ASME Int. Design Engineering Technical Conf. & Computers and Information in Engineering Conf. (IDETC), 2009, 1387–1398
    [Abstract, PDF]

    The design of hardware-software systems is a complex and difficult task exacerbated by the very different tools used by designers in each field. Even in small projects, tracking the impact, motivation and context of individual design decisions between designers and over time quickly becomes intractable. In an attempt to bridge this gap, we present a general, low-level model of the system design process. We formally define the concept of a design decision, and provide a hierarchical representation of both the design space and the context in which decisions are made. This model can serve as a foundation for software-hardware system design tools which will help designers cooperate more efficiently and effectively. We provide a high-level example of the use of such a system in a design problem provided through collaboration with NASA.

  29. A DSL for Explaining Probabilistic Reasoning
    Martin Erwig and Eric Walkingshaw
    IFIP Working Conf. on Domain-Specific Languages (DSL), LNCS vol. 5658, Springer, 2009, 335–359
    Best paper
    [Abstract, PDF]

    We propose a new focus in language design where languages provide constructs that not only describe the computation of results, but also produce explanations of how and why those results were obtained. We posit that if users are to understand computations produced by a language, that language should provide explanations to the user. As an example of such an explanation-oriented language we present a domain-specific language for explaining probabilistic reasoning, a domain that is not well understood by non-experts. We show the design of the DSL in several steps. Based on a story-telling metaphor of explanations, we identify generic constructs for building stories out of events, and obtaining explanations by applying stories to specific examples. These generic constructs are then adapted to the particular explanation domain of probabilistic reasoning. Finally, we develop a visual notation for explaining probabilistic reasoning.

  30. Varying Domain Representations in Hagl – Extending the Expressiveness of a DSL for Experimental Game Theory
    Eric Walkingshaw and Martin Erwig
    IFIP Working Conf. on Domain-Specific Languages (DSL), LNCS vol. 5658, Springer, 2009, 310–334
    [Abstract, PDF, Code]

    Experimental game theory is an increasingly important research tool in many fields, providing insight into strategic behavior through simulation and experimentation on game theoretic models. Unfortunately, despite relying heavily on automation, this approach has not been well supported by tools. Here we present our continuing work on Hagl, a domain-specific language embedded in Haskell, intended to drastically reduce the development time of such experiments and support a highly explorative research style. In this paper we present a fundamental redesign of the underlying game representation in Hagl. These changes allow us to better leverage domain knowledge by allowing different classes of games to be represented differently, exploiting existing domain representations and algorithms. In particular, we show how this supports analytical extensions to Hagl, and makes strategies for state-based games vastly simpler and more efficient.

  31. A Visual Language for Representing and Explaining Strategies in Game Theory
    Martin Erwig and Eric Walkingshaw
    IEEE Int. Symp. on Visual Languages and Human-Centric Computing (VL/HCC), 2008, 101–108
    [Abstract, PDF]

    We present a visual language for strategies in game theory, which has potential applications in economics, social sciences, and in general science education. This language facilitates explanations of strategies by visually representing the interaction of players’ strategies with game execution. We have utilized the cognitive dimensions framework in the design phase and recognized the need for a new cognitive dimension of “traceability” that considers how well a language can represent the execution of a program. We consider how traceability interacts with other cognitive dimensions and demonstrate its use in analyzing existing languages. We conclude that the design of a visual representation for execution traces should be an integral part of the design of visual languages because understanding a program is often tightly coupled to its execution.

Graduate theses

  1. The Choice Calculus: A Formal Language of Variation
    Eric Walkingshaw
    PhD dissertation, Oregon State University, 2013
    [Abstract, PDF]

    In this thesis I present the choice calculus, a formal language for representing variation in software and other structured artifacts. The choice calculus is intended to support variation research in a way similar to the lambda calculus in programming language research. Specifically, it provides a simple formal basis for presenting, proving, and communicating theoretical results. It can serve as a common language of discourse for researchers working on different views of similar problems and provide a shared back end in tools.

    This thesis collects a large amount of work on the choice calculus. It defines the syntax and denotational semantics of the language along with modular language extensions that add features important to variation research. It presents several theoretical results related to the choice calculus, such as an equivalence relation that supports semantics-preserving transformations of choice calculus expressions, and a type system for ensuring that an expression is well formed. It also presents a Haskell DSEL based on the choice calculus for exploring the concept of variational programming.

  2. Domain-Specific Language Support for Experimental Game Theory
    Eric Walkingshaw
    MS thesis, Oregon State University, 2011
    [Abstract, PDF, Code]

    Experimental game theory is the use of game theoretic abstractions—games, players, and strategies—in experiments and simulations. It is often used in cases where traditional, analytical game theory fails or is difficult to apply. This thesis collects three previously published papers that provide domain-specific language (DSL) support for defining and executing these experiments, and for explaining their results.

    Despite the widespread use of software in this field, there is a distinct lack of tool support for common tasks like modeling games and running simulations. Instead, most experiments are created from scratch in general-purpose programming languages. We have addressed this problem with Hagl, a DSL embedded in Haskell that allows the concise, declarative definition of games, strategies, and executable experiments. Hagl raises the level of abstraction for experimental game theory, reducing the effort to conduct experiments and freeing experimenters to focus on hard problems in their domain instead of low-level implementation details.

    While analytical game theory is most often used as a prescriptive tool, a way to analyze a situation and determine the best course of action, experimental game theory is often applied descriptively to explain why agents interact and behave in a certain way. Often these interactions are complex and surprising. To support this explanatory role, we have designed visual DSL for explaining the interaction of strategies for iterated games. This language is used as a vehicle to introduce the notational quality of traceability and the new paradigm of explanation-oriented programming.

  3. Features and Feature Models: A Survey of Variation Representations
    Eric Walkingshaw
    Compendium of Computer Science Doctoral Qualifying Exams, Oregon State University, 2010
    [Abstract]

    This survey explores and organizes existing work on the long-term management of software that varies in multiple dimensions. It focuses, in particular, on the representation of features in software product lines, and on capturing relationships between features in feature models.

Advised theses

I was the major advisor for the students whose work is included below.

  1. Theory and Implementation of a Variational Database Management System
    Parisa Ataei
    PhD dissertation, Oregon State University, 2021
    [Abstract, PDF]

    In this thesis, I present the variational database management system, a formal framework and its implementation for representing variation in relational databases and managing variational information needs. A variational database is intended to support any kind of variation in a database. Specific kinds of variation in databases have already been studied and are well-supported, for example, schema evolution systems address the variation of a database’s schema over time and data integration systems address variation caused by accessing data from multiple data sources simultaneously. However, many other kinds of variation in databases arise in practice, and different kinds of variation often interact, but these scenarios are not well-supported by the existing work. For example, neither the schema evolution systems nor the database integration systems can address variation that arises when data sources combined in one database evolve over time.

    This thesis collects a large amount of work: It defines the variational database framework and the syntax and denotational semantics of the variational relational algebra, a query language for variational databases. It presents two use cases of the variational database framework that are based on existing datasets and scenarios that are partially supported by existing techniques. It presents the variational database management system which is a prototype of variational databases and variational relational algebra as an abstract layer written in Haskell on top of a traditional RDBMS. It also presents several theoretical results related to the framework and the query language, such as syntax-based equivalence rules that preserve the semantics of a query, a type system for ensuring that a variational query is well-formed with respect to the underlying variational schema, and a confluence property of the variational relational algebra type system with respect to the relational algebra type system and its denotational semantics.

  2. Variational Satisfiability Solving
    Jeffrey M. Young
    PhD dissertation, Oregon State University, 2021
    [Abstract, PDF]

    Over the last two decades, satisfiability and satisfiability-modulo theory (SAT/SMT) solvers have grown powerful enough to be general purpose reasoning engines throughout software engineering and computer science. However, most practical use cases of SAT/SMT solvers require not just solving a single SAT/SMT problem, but solving sets of related SAT/SMT problems. This discrepancy was directly addressed by the SAT/SMT community with the invention of incremental SAT/SMT solving. However, incremental SAT/SMT solvers require users to hand write a program which dictates the terms that are shared between problems and terms which are unique. By placing the onus on users, incremental solvers couple the users’ solution to the users’ exact sequence of SAT/SMT problems–making the solution overly specific–and require the user to write extra infrastructure to coordinate or handle the results.

    This dissertation argues that the aforementioned problems result from accidental complexity produced by solving a problem that is variational without the concept of variation, similar to problematic use of GOTO statements in the absence of WHILE loop constructs. To demonstrate the argument, this thesis applies theory from variational programming to the domain of SAT/SMT solvers to create the first variational SAT solver. To do so, the thesis formalizes a variational propositional logic and specifies variational SAT solving as a compiler that compiles variational SAT problems to instructions for an industrial strength SAT solver.

  3. Formal Verification of the Variational Database Management System
    Fariba Khan
    MS thesis, Oregon State University, 2021
    [Abstract, PDF]

    Variation in data is abundant and ubiquitous in real-world applications. Managing variation in databases is, however, difficult and has been extensively studied by the database community. Schema evolution, data integration, and database versioning are examples of well-studied forms of database variation with effective context-specific solutions. However, variation appears in different forms and contexts in databases, and existing approaches cannot be generalized to handle arbitrary forms of variation irrespective of the context. Moreover, in practice, different forms of variation intersect in a particular context. Variational databases (VDB) provide a fundamental solution to variation management by explicitly encoding variation into relational databases that allows addressing different kinds of variation simultaneously. To support expressing variation in information need, traditional relational algebra (RA) is extended to variational relational algebra (VRA). VRA comes with a static type system that checks the validity of variational queries written in VRA. This thesis extends the formalization and formally verifies properties of the variational database management system (VDBMS). Variational sets and set operations definitions are formally verified and VDBs are formally encoded using them. Then, the correctness of the VRA type system with respect to the RA type system is formally specified and verified. VDBMS also allows writing variational queries without repeating variations that are already encoded in the VDB and sub-queries. These implicitly annotated v-queries get explicitly annotated by the system. Therefore, this thesis further formally verifies the process of explicitly annotating variational queries.

  4. Integrating Side Effects in Variational Programs Using Algebraic Effects
    Ghadeer Alkubaish
    MS thesis, Oregon State University, 2020
    [Abstract, PDF]

    Variational programming supports efficiently executing many related programs at once by encoding all of the programs in one “variational program” that captures the differences among them statically and explicitly. An open problem in variational programming is how to handle side effects–if two program variants perform different side effects, we cannot separate the effect of one variant from the other since the outside world is not variational. A potential solution is to create variation-aware execution environments for variational programs, for example, a variational file system that keeps track of file variants corresponding to program variants. However, it is infeasible to do this for all kinds of effects. Also, there are different ways to handle the interaction of effects and variation that are incompatible with each other, preventing a one-size-fits-all solution. In this thesis, we argue that algebraic effects can be used to resolve the problem of combining variation and effects by enabling programmers to flexibly and incrementally extend a variational programming environment to handle new kinds of effects. We present a proof-of-concept prototype in the Eff programming language that demonstrates how a variational programming environment can be extended to support file input/output. Crucially, such extensions are done at the library level, which enables handling new kinds of effects and handling existing effects in multiple ways, both of which are essential in variational programming applications.

  5. Application of the Variational Database Management System to Schema Evolution and Software Product Lines
    Qiaoran Li
    MS project, Oregon State University, 2019
    [Abstract, PDF]

    As a general solution to the problem of managing structural and content variability in relational databases, in previous work we have introduced the Variational Database Management System (VDBMS). VDBMS consists of a representation of a variational database (VDB) and a corresponding typed query language (v-query). However, since this is a novel database representation, there are no existing instances of VDBs or v-queries that can be used to evaluate the VDBMS. In this project, we present two case studies to demonstrate the use of VDBMS and support its evaluation. The case studies were constructed by systematically encoding variability scenarios from prior work and generating corresponding VDBs by adapting existing widely-used data sets. The first case study shows how to use the VDBMS to manage database variants under a schema evolution scenario. The second case study demonstrates how to integrate the VDBMS with a database-backed software product line. Each case study provides a VDB and a set of v-queries that will be used to evaluate the VDBMS. Additionally, we provide some insights into generating VDBs from relational databases that could assist future VDBMS users.

  6. Imperative Programming with Variational Effects
    Alexander Grasley
    MS thesis, Oregon State University, 2018
    [Abstract, PDF]

    Variation is a commonly encountered element of modern software systems. Recent research into variation has led to increasing interest in the development of variational programming languages and corresponding variational models of execution. Variational imperative languages pose a particular challenge due to the complex interactions between side effects and variation. The development of interpreters for variational imperative languages has produced a number of techniques to address these interactions. This thesis builds upon and formalizes these techniques by defining a formal operational semantics for a simple imperative language that supports both variation and common side effects. We also provide an example of the successful implementation of these techniques in the form of the Resource DSL. One area in which variation is frequently encountered is in defining configurations and resource requirements for the deployment of modern software systems. To this end, we have developed the Resource DSL, a language that aids in the specification of resource requirements for highly configurable software systems.

  7. The Ownership Monad
    Michael G. McGirr
    MS project, Oregon State University, 2018
    [Abstract, PDF]

    The Rust programming language is a systems programming language with a strong static type system. A central feature of Rust’s type system is its unique concept of “ownership”, which enables Rust to give a user safe, low-level control over resources without the overhead of garbage collection. In Haskell, most data is immutable and many of the problems addressed by ownership in Rust simply don’t arise. However, ownership is still useful in Haskell in the context of mutable references and concurrency. This project report introduces the Ownership Monad, a monad that implements aspects of Rust’s ownership system as a library in Haskell. This will demonstrate many of the rules enforced by the ownership system in Rust in order to better understand the ownership-based method of tracking resources. This report will further explore the benefits such a system can provide for tracking resource use between concurrent threads.

  8. Formative Work Toward a Mixed-Initiative Programming Language
    Keeley Abbott
    MS thesis, Oregon State University, 2017
    [Abstract, PDF]

    Mixed-initiative programming entails collaboration between a computer system, and a human to achieve some desired goal or set of goals. Often these goals change or are amended in real time during the course of program execution. As such, the plans these programs are based on must adapt and evolve to accommodate this iterative process. This thesis collects a literature review of research done in the field of mixed-initiative programs that provides an understanding of the problems faced when attempting to integrate computer systems and human users, a previously published paper with formative work in understanding how humans write programs for other humans, and finally some initial work done to develop an embedded domain-specific language for mixed-initiative drone control.

  9. Principles of Variational Databases
    Parisa Ataei
    MS thesis, Oregon State University, 2017
    [Abstract, PDF]

    Data variations are prevalent in real-world applications. For example, software vendors have to handle numerous variations in the business requirements, conventions, and environmental settings of a software product. In database-backed software, the database of each version may have a different schema and content. As another example, data scientists often need to use a subset of the available databases because using non-relevant information may reduce the effectiveness of the results. Such variations give rise to numerous data variants in these applications. Users often would like to query and/or analyze all such variants simultaneously. For example, a software vendor would like to perform common tests over all versions of its product and a data scientist would like to find the subset of information over which the analytics algorithm delivers the most accurate results. Currently, there is not any systematic and principled approach to managing and querying data variations and users have to use their intuition to perform such analyses. We propose a novel abstraction called a variational database that provides a compact and structured representation of general forms of data variations for relational databases. As opposed to data integration approaches that provide a unified representation of all data sources, variational databases make variations explicit in both the schema definition and the query language without introducing too much complexity.

  10. Implementation Techniques for Variational Data Structures
    Meng Meng
    MS thesis, Oregon State University, 2017
    [Abstract, PDF]

    Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little general understanding of how to design them or what tradeoffs exist among them.

    In this thesis, we introduce the concept of holes to represent variational data structures of different sizes and shapes. Moreover, we strive for a more systematic exploration and analysis of a variational data structure. We want to know how different design decisions affect the performance and scalability of a variational data structure, and what properties of the underlying data and operation sequences need to be considered.

    Specifically, we study several alternative designs of a variational stack and analyze how these design decisions affect the performance of a variational stack with different usage profiles. We evaluate variational stacks in a real-world scenario: in the interpreter VarexJ when executing real software containing variability. Finally, we discuss different ways of representing variational priority queues and show how this affects the performance of the variational Dijkstra’s algorithm.

  11. View-Based Editing of Variational Code
    Miles Van de Wetering
    Honors BS thesis, Oregon State University, 2017
    [Abstract, PDF]

    This paper discusses the merits of providing users variational views when editing variational code. I provide a plugin for the popular Atom Integrated Development Environment (IDE) which replaces #ifdef annotations commonly used by the C PreProcessor (CPP) with colored backgrounds, thus reducing code clutter and attempting to help programmers quickly distinguish code that belongs to different features. I also provide a number of helpful features designed to help the programmer create, remove, and refactor feature code. Finally, I present a user study conducted in order to determine how helpful each of the two main features (code folding and background color) are to programmers – it was determined that while there were no significant differences in efficiency or accuracy, the user experience was considerably enhanced.

  12. A Template CoprHD Storage Driver Based on the Southbound SDK
    Shujin Wu
    MS project, Oregon State University, 2017
    [Abstract, PDF]

    CoprHD is an open source software-defined storage and API platform which creates an abstraction layer over multi-vendor heterogeneous storage systems. It offers the ability to discover, pool and automate the management of the storage ecosystem with the help of storage drivers establishing connections between CoprHD and storage systems. On the demand of attracting more attentions from third-party storage companies, CoprHD community proposed a southbound driver SDK to simplify the process of developing a storage driver for CoprHD. ScaleIO storage driver, being the first one based on this southbound SDK, is implemented by us with Intel and EMC to serve the purposes to verify the southbound SDK and explore an effective way for the third-party driver development. This ScaleIO storage driver also acts as a template driver for the CoprHD community.

  13. A Formal Foundation for Variational Programming Using the Choice Calculus
    Spencer Hubbard
    MS thesis, Oregon State University, 2016
    [Abstract, PDF]

    In this thesis, we present semantic equivalence rules for an extension of the choice calculus and sound operations for an implementation of variational lists. The choice calculus is a calculus for describing variation and the formula choice calculus is an extension with formulas. We prove semantic equivalence rules for the formula choice calculus. Variational lists are functional data structures for representing and computing with variation in lists using the choice calculus. We prove map and bind operations are sound for an implementation of variational lists. These proofs are written and verified in the language of the Coq proof assistant.