Abstracts


The State of the Art in End-User Software Engineering

A. J. Ko, R. Abraham, L. Beckwith, A. Blackwell, M. M. Burnett, M. Erwig, J. Lawrence, C. Scaffidi, H. Lieberman, B. Myers, M. B. Rosson, G. Rothermel, M. Shaw, and S. Wiedenbeck. ACM Computing Surveys, 2010, to appear

Most programs today are written not by professional software developers, but by people with expertise in other domains working towards goals supported by computation. For example, a teacher might write a grading spreadsheet to save time grading or an interaction designer might use an interface builder to test some user interface design ideas. Although these end-user programmers may not have the same goals as professional developers, they do face many of the same software engineering challenges, including requirements gathering, de-sign, specification, reuse, testing, and debugging. This article summarizes and classifies research on these activities, defining the area of End-User Software Engineering (EUSE) and related terminology. The article then discusses empirical research about end-user software engineering activities and the technologies designed to support them. The article also addresses challenges in de-signing EUSE tools, including the power of surprise in affecting tool use and the influence of gender.

Paper.pdf (2.1M)


Declarative Scripting in Haskell

Tim Bauer and Martin Erwig. Software Language Engineering, 2009, to appear

We present a domain-specific language embedded within the Haskell programming language to build scripts in a declarative and type- safe manner. We can categorize script components into various orthog- onal dimensions, or concerns, such as IO interaction, configuration, or error handling. In particular, we provide special support for two dimen- sions that are often neglected in scripting languages, namely creating deadlines for computations and tagging and tracing of computations. Arbitrary computations may be annotated with a textual tag explaining its purpose. Upon failure a detailed context for that error is automat- ically produced. The deadline combinator allows one to set a timeout on an operation. If it fails to complete within that amount of time, the computation is aborted. Moreover, this combinator works with the tag combinator so as to produce a contextual trace.

Paper.pdf (164K)


A Domain-Specific Language for Experimental Game Theory

Eric Walkingshaw and Martin Erwig. Journal of Functional Programming Vol. 19, No. 6, 645-661, 2009

Experimental game theory refers to the use of game theoretic models in simulations and experiments to understand strategic behavior. It is an increasingly important research tool in many fields, including economics, biology and many social sciences, but computer support for such projects is primarily found in custom programs written in general purpose languages.

Here we present Hagl (short for "Haskell game language", and loosely intended to evoke the homophone "haggle"), a domain-specific language embedded in Haskell, intended to drastically reduce the development time of such experiments, and make the definition and exploration of games and strategies simple and fun.

Paper.pdf (135K)


Software Engineering for Spreadsheets

Martin Erwig. IEEE Software Vol. 29, No. 5, 25-30, 2009

Spreadsheets are popular end-user programming tools. Many people use spreadsheet-computed values to make critical decisions, so spreadsheets must be correct. Proven software engineering principles can assist the construction and maintenance of dependable spreadsheets. However, how can we make this practical for end users? One way is to exploit spreadsheets' idiosyncratic structure to translate software engineering concepts such as type checking and debugging to an end-user programming domain. The simplified computational model and the spatial embedding of formulas, which provides rich contextual information, can simplify these concepts, leading to effective tools for end users.

Paper.pdf (315K)


Automatic Detection of Dimension Errors in Spreadsheets

Chris Chambers and Martin Erwig. Journal of Visual Languages and Computing Vol. 20, No. 4, 269-283, 2009

Spreadsheets are widely used, and studies have shown that most end-user spreadsheets contain non-trivial errors. Most of the currently available tools that try to mitigate this problem require varying levels of user intervention. This paper presents a system, called UCheck, that detects errors in spreadsheets automatically. UCheck carries out automatic header and unit inference, and reports unit errors to the users. UCheck is based on two static analyses phases that infer header and unit information for all cells in a spreadsheet.

We have tested UCheck on a wide variety of spreadsheets and found that it works accurately and reliably. The system was also used in a continuing education course for high school teachers, conducted through Oregon State University, aimed at making the participants aware of the need for quality control in the creation of spreadsheets.

Paper.pdf (548K)


Combining Spatial and Semantic Label Analysis

Chris Chambers and Martin Erwig. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'09), 225-232, 2009

Labels in spreadsheets can be exploited for finding errors in spreadsheet formulas. Previous approaches have either used the positional information of labels or their interpretation as dimension for checking the consistency of formulas. In this paper we demonstrate how these two approaches can be combined. We have formalized a combined reasoning system and have implemented a corresponding prototype system. We have evaluated the system on the EUSES spreadsheet corpus. The evaluation has demonstrated that adding a syntactic, spatial analysis to a dimension inference can significantly improve the rate of detected errors.

Paper.pdf (1.2M)


Visual Explanations of Probabilistic Reasoning

Martin Erwig and Eric Walkingshaw. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'09), 23-27, 2009

Continuing our research in explanation-oriented language design, we present a domain-specific visual language for explaining probabilistic reasoning. Programs in this language, called explanation objects, can be manipulated according to a set of laws to automatically generate many equivalent explanation instances. We argue that this increases the explanatory power of our language by allowing a user to view a problem from many different perspectives.

Paper.pdf (272K)


Varying Domain Representations in Hagl - Extending the Expressiveness of a DSL for Experimental Game Theory

Eric Walkingshaw and Martin Erwig. IFIP Working Conference on Domain Specific Languages (DSLWC'09), LNCS 5658, 310-334, 2009

Experimental game theory is an increasingly important research tool in many fields, providing insight into strategic behavior through simulation and experimentation on game theoretic models. Unfortunately, despite relying heavily on automation, this approach has not been well supported by tools. Here we present our continuing work on Hagl, a domain-specific language embedded in Haskell, intended to drastically reduce the development time of such experiments and support a highly explorative research style. In this paper we present a fundamental redesign of the underlying game representation in Hagl. These changes allow us to better leverage domain knowledge by allowing different classes of games to be represented differently, exploiting existing domain representations and algorithms. In particular, we show how this supports analytical extensions to Hagl, and makes strategies for state-based games vastly simpler and more efficient.

Paper.pdf (212K)


A DSL for Explaining Probabilistic Reasoning

Martin Erwig and Eric Walkingshaw. IFIP Working Conference on Domain Specific Languages (DSLWC'09), LNCS 5658, 335-359, 2009
Best Paper Award

We propose a new focus in language design where languages provide constructs that not only describe the computation of results, but also produce explanations of how and why those results were obtained. We posit that if users are to understand computations produced by a language, that language should provide explanations to the user. As an example of such an explanation-oriented language we present a domain-specific language for explaining probabilistic reasoning, a domain that is not well understood by non-experts. We show the design of the DSL in several steps. Based on a story-telling metaphor of explanations, we identify generic constructs for building stories out of events, and obtaining explanations by applying stories to specific examples. These generic constructs are then adapted to the particular explanation domain of probabilistic reasoning. Finally, we develop a visual notation for explaining probabilistic reasoning.

Paper.pdf (592K)


A Formal Representation of Software-Hardware System Design

Eric Walkingshaw, Paul Strauss, Martin Erwig, Jonathan Mueller, Irem Tumer ASME Int. Design Engineering Technical Conference & Computers and Information in Engineering Conference (IDETC'09), 2009

The design of hardware-software systems is a complex and difficult task exacerbated by the very different tools used by designers in each field. Even in small projects, tracking the impact, motivation and context of individual design decisions between designers and over time quickly becomes intractable. In an attempt to bridge this gap, we present a general, low-level model of the system design process. We formally define the concept of a design decision, and provide a hierarchical representation of both the design space and the context in which decisions are made. This model can serve as a foundation for software-hardware system design tools which will help designers cooperate more efficiently and effectively. We provide a high-level example of the use of such a system in a design problem provided through collaboration with NASA.

Paper.pdf (380K)


Mutation Operators for Spreadsheets

Robin Abraham and Martin Erwig. IEEE Transactions on Software Engineering, Vol. 35, No. 1, 94-108, 2009

Based on (1) research into mutation testing for general purpose programming languages, and (2) spreadsheet errors that have been reported in literature, we have developed a suite of mutation operators for spreadsheets. We present an evaluation of the mutation adequacy of du-adequate test suites generated by a constraint-based automatic test-case generation system we have developed in previous work. The results of the evaluation suggest additional constraints that can be incorporated into the system to target mutation adequacy.

In addition to being useful in mutation testing of spreadsheets, the operators can be used in the evaluation of error-detection tools and also for seeding spreadsheets with errors for empirical studies. We describe two case studies where the suite of mutation operators helped us carry out such empirical evaluations. The main contribution of this paper is the suite of mutation operators for spreadsheets that can (1) help with systematic mutation testing of spreadsheets, and (2) be used for carrying out empirical evaluations of spreadsheet tools.

Paper.pdf (2.4M)


Spreadsheet Programming

Robin Abraham and Margaret M. Burnett and Martin Erwig. Encyclopedia of Computer Science and Engineering, (ed. B.J. Wah), 2804-2810, 2009

Spreadsheets are among the most widely used programming systems in the world. Individuals and businesses use spreadsheets for a wide variety of applications, ranging from performing simple calculations to building complex financial models. In this article, we first discuss how spreadsheet programs are actually functional programs. We then describe concepts in spreadsheet programming, followed by a brief history of spreadsheet systems. Widespread use of spreadsheets, coupled with their high error-proneness and the impact of spreadsheet errors, has motivated research into techniques aimed at the prevention, detection, and correction of errors in spreadsheets. We present an overview of research effort that seeks to rectify this problem.

Paper.pdf (660K)


Test-Driven Goal-Directed Debugging in Spreadsheets

Robin Abraham and Martin Erwig. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'08), 131-138, 2008

We present an error-detection and -correction approach for spreadsheets that automatically generates questions about input/output pairs and, depending on the feedback given by the user, proposes changes to the spreadsheet that would correct detected errors. This approach combines and integrates previous work on automatic test-case generation and goal-directed debugging. We have implemented this method as an extension to MS Excel. We carried out an evaluation of the system using spreadsheets seeded with faults using mutation operators. The evaluation shows among other things that up to 93% of the first-order mutants and 98% of the second-order mutants were detected by the system using the automatically generated test cases.

Paper.pdf (780K)


Dimension Inference in Spreadsheets

Chris Chambers and Martin Erwig. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'08), 123-130, 2008
Best Paper Award

We present a reasoning system for inferring dimension information in spreadsheets. This system can be used to check the consistency of spreadsheet formulas and can be employed to detect errors in spreadsheets. We have prototypically implemented the system as an add-in to Excel. In an evaluation of this implementation we were able to detect dimension errors in almost 50% of the investigated spreadsheets, which shows (i) that the system works reliably in practice and (ii) that dimension information can be well exploited to uncover errors in spreadsheets.

Paper.pdf (258K)


A Visual Language for Representing and Explaining Strategies in Game Theory

Martin Erwig and Eric Walkingshaw. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'08), 101-108, 2008

We present a visual language for strategies in game theory, which has potential applications in economics, social sciences, and in general science education. This language facilitates explanations of strategies by visually representing the interaction of players' strategies with game execution. We have utilized the cognitive dimensions framework in the design phase and recognized the need for a new cognitive dimension of "traceability" that considers how well a language can represent the execution of a program. We consider how traceability interacts with other cognitive dimensions and demonstrate its use in analyzing existing languages. We conclude that the design of a visual representation for execution traces should be an integral part of the design of visual languages because understanding a program is often tightly coupled to its execution.

Paper.pdf (399K)


The Inverse Ocean Modeling System. Part I: Implementation

Andrew F. Bennett, Boon S. Chua, Martin Erwig, Zhe Fu, Rich D. Loft, Julia C. Muccino, and Ben Pflaum
Journal of Atmospheric and Oceanic Technology, Vol. 25, Issue 9, 1608-1622, 2008

The Inverse Ocean Modeling (IOM) system constructs and runs weak-constraint, four-dimensional variational data assimilation (W4DVAR) for any dynamical model and any observing array. The dynamics and the observing algorithms may be nonlinear but must be functionally smooth. The user need only provide the model and the observing algorithms, together with an interpolation scheme that relates the model numerics to the observer's coordinates. All other model-dependent elements of the Inverse Ocean Modeling assimilation algorithm, including adjoint generators and Monte Carlo estimates of posteriors, have been derived and coded as templates in Parametric FORTRAN. This language has been developed for the IOM but has wider application in scientific programming. Guided by the Parametric FORTRAN templates, and by model information entered via a graphical user interface (GUI), the IOM generates conventional FORTRAN code for each of the many algorithm elements, customized to the user's model. The IOM also runs the various W4DVAR assimilations, which are monitored by the GUI. The system is supported by a Web site that includes interactive tutorials for the assimilation algorithm.

Paper.pdf


A Type System Based on End-User Vocabulary

Robin Abraham, Martin Erwig, and Scott Andrew. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'07), 215-222, 2007

In previous work we have developed a system that automatically checks for unit errors in spreadsheets. In this paper we describe our experiences using the system in a workshop on spreadsheet safety aimed at high school teachers and students. We present the results from a think-aloud study we conducted with five high school teachers and one high school student as the subjects. The study is the first ever to investigate the usability of a type system in spreadsheets. We discovered that end users can effectively use the system to debug a variety of errors in their spreadsheets. This result is encouraging given that type systems are difficult even for programmers. The subjects had difficulty debugging "non-local" unit errors. Guided by the results of the study we devised new methods to improve the error-location inference. We also extended the system to generate change suggestions for cells with unit errors, which when applied, would correct unit errors. These extensions solved the problem that the study revealed in the original system.

Paper.pdf (680K)


ClassSheets - Model-Based, Object-Oriented Design of Spreadsheet Applications

Jan-Christopher Bals, Fabian Christ, Gregor Engels, and Martin Erwig. Journal of Object Technologies Vol. 6, No. 9, 2007

Using spreadsheets is the preferred method to calculate, display or store anything that fits into a table-like structure. They are often used by end users to create applications. But they have one critical drawback - they are very error-prone. To reduce the error-proneness, we purpose a new way of object-oriented modeling of spreadsheets prior to using them. These spreadsheet models, termed ClassSheets, are used to generate concrete spreadsheets on the instance level. By this approach sources of errors are reduced and spreadsheet applications are easier to understand.

Paper.pdf


Exploiting Domain-Specific Structures For End-User Programming Support Tools

Robin Abraham and Martin Erwig. End-User Software Engineering (eds. Burnett, Engels, Myers, and Rothermel), 2007

In previous work we have tried to transfer ideas that have been successful in general-purpose programming languages and mainstream software engineering into the realm of spreadsheets, which is one important example of an end-user programming environment. More specifically, we have addressed the questions of how to employ the concepts of type checking, program generation and maintenance, and testing in spreadsheets. While the primary objective of our work has been to offer improvements for end-user productivity, we have tried to follow two particular principles to guide our research.

In this paper we will illustrate our research approach with several examples.

Paper.pdf


An Update Calculus for Expressing Type-Safe Program Updates

Martin Erwig and Deling Ren. Science of Computer Programming Vol. 67, No. 2-3, 199-222, 2007

The dominant share of software development costs is spent on software maintenance, particularly the process of updating programs in response to changing requirements. Currently, such program changes tend to be performed using text editors, an unreliable method that often causes many errors. In addition to syntax and type errors, logical errors can be easily introduced since text editors cannot guarantee that changes are performed consistently over the whole program. All these errors can cause a correct and perfectly running program to become instantly unusable. It is not surprising that this situation exists because the ``text-editor method'' reveals a low-level view of programs that fails to reflect the structure of programs.

We address this problem by pursuing a programming-language-based approach to program updates. To this end we discuss in this paper the design and requirements of an update language for expressing update programs. We identify as the essential part of any update language a \emph{scope update} that performs coordinated update of the definition and all uses of a symbol. As the underlying basis for update languages, we define an update calculus for updating lambda-calculus programs. We develop a type system for the update calculus that infers the possible type changes that can be caused by an update program. We demonstrate that type-safe update programs that fulfill certain structural constraints preserve the type-correctness of lambda terms. The update calculus can serve as a basis for higher-level update languages, such as for Haskell or Java.

Paper.pdf (384K)


Parametric Fortran: Program Generation in Scientific Computing

Martin Erwig and Zhe Fu and Ben Pflaum Journal of Software Maintenance and Evolution Vol. 19, No. 3, 155-182, 2007

Parametric Fortran is an extension of Fortran that supports defining Fortran program templates by allowing the parameterization of arbitrary Fortran constructs. A Fortran program template can be translated into a regular Fortran program guided by values for the parameters. This paper describes the design, implementation, and applications of Parametric Fortran. Parametric Fortran is particularly useful in scientific computing. The applications include defining generic functions, removing duplicated code, and automatic differentiation. The described techniques have been successfully employed in a project that implements a generic inverse ocean modeling system.

Paper.pdf (415K)


GoalDebug: A Spreadsheet Debugger for End Users

Robin Abraham and Martin Erwig. 29th ACM/IEEE Int. Conf. on Software Engineering (ICSE'07), 251-260, 2007

We present a spreadsheet debugger targeted at end users. Whenever the computed output of a cell is incorrect, the user can supply an expected value for a cell, which is employed by the system to generate a list of change suggestions for formulas that, when applied, would result in the user-specified output. The change suggestions are ranked using a set of heuristics.

In previous work, we had presented the system as a proof of concept. In this paper, we describe a systematic evaluation of the effectiveness of inferred change suggestions and the employed ranking heuristics. Based on the results of the evaluation we have extended both, the change inference process and the ranking of suggestions. An evaluation of the improved system shows that change inference process and the ranking heuristics have both been substantially improved and that the system performs effectively.

Paper.pdf (282K)


UCheck: A Spreadsheet Unit Checker for End Users

Robin Abraham and Martin Erwig. Journal of Visual Languages and Computing Vol. 18, No. 1, 71-95, 2007

Spreadsheets are widely used, and studies have shown that most end-user spreadsheets contain non-trivial errors. Most of the currently available tools that try to mitigate this problem require varying levels of user intervention. This paper presents a system, called UCheck, that detects errors in spreadsheets automatically. UCheck carries out automatic header and unit inference, and reports unit errors to the users. UCheck is based on two static analyses phases that infer header and unit information for all cells in a spreadsheet.

We have tested UCheck on a wide variety of spreadsheets and found that it works accurately and reliably. The system was also used in a continuing education course for high school teachers, conducted through Oregon State University, aimed at making the participants aware of the need for quality control in the creation of spreadsheets.

Paper.pdf (2M)


A Generic Recursion Toolbox for Haskell (Or: Scrap Your Boilerplate Systematically)

Deling Ren and Martin Erwig. ACM SIGPLAN Haskell Workshop, 13-24, 2006

Haskell programmers who deal with complex data types often need to apply functions to specific nodes deeply nested inside of terms. Typically, implementations for those applications require so-called boilerplate code, which recursively visits the nodes and carries the functions to the places where they need to be applied. The scrap-your-boilerplate approach proposed by Lämmel and Peyton Jones tries to solve this problem by defining a general traversal design pattern that performs the traversal automatically so that the programmers can focus on the code that performs the actual transformation.

In practice we often encounter applications that require variations of the recursion schema and call for more sophisticated generic traversals. Defining such traversals from scratch requires a profound understanding of the underlying mechanism and is everything but trivial.

In this paper we analyze the problem domain of recursive traversal strategies, by integrating and extending previous approaches. We then extend the scrap-your-boilerplate approach by rich traversal strategies and by a combination of transformations and accumulations, which leads to a comprehensive recursive traversal library in a statically typed framework.

We define a two-layer library targeted at general programmers and programmers with knowledge in traversal strategies. The high-level interface defines a universal combinator that can be customized to different one-pass traversal strategies with different coverage and different traversal order. The lower-layer interface provides a set of primitives that can be used for defining more sophisticated traversal strategies such as fixpoint traversals. The interface is simple and succinct. Like the original scrap-your-boilerplate approach, it makes use of rank-2 polymorphism and functional dependencies, implemented in GHC.

Paper.pdf (196K)


AutoTest: A Tool for Automatic Test Case Generation in Spreadsheets

Robin Abraham and Martin Erwig. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'06), 43-50, 2006

In this paper we present a system that helps users test their spreadsheets using automatically generated test cases. The system generates the test cases by backward propagation and solution of constraints on cell values. These constraints are obtained from the formula of the cell that is being tested when we try to execute all feasible du-paths within the formula. AutoTest generates test cases that execute all feasible du-pairs. If infeasible du-associations are present in the spreadsheet, the system is capable of detecting and reporting all of these to the user. We also present a comparative evaluation of our approach against the "Help Me Test" mechanism in Forms/3 and show that our approach is faster and produces test suites that give better du-coverage.

Paper.pdf (522K)


Sharing Reasoning about Faults in Spreadsheets: An Empirical Study

Joey Lawrence, Robin Abraham, Margaret Burnett, and Martin Erwig. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'06), 35-42, 2006

Although researchers have developed several ways to reason about the location of faults in spreadsheets, no single form of reasoning is without limitations. Multiple types of errors can appear in spreadsheets, and various fault localization techniques differ in the kinds of errors that they are effective in locating. In this paper, we report empirical results from an emerging system that attempts to improve fault localization for end-user programmers by sharing the results of the reasoning systems found in WYSIWYT and UCheck. By evaluating the visual feedback from each fault localization system, we shed light on where these different forms of reasoning and combinations of them complement - and contradict - one another, and which heuristics can be used to generate the best advice from a combination of these systems.

Paper.pdf (934K)


Type Inference for Spreadsheets

Robin Abraham and Martin Erwig. ACM Int. Symp. on Principles and Practice of Declarative Programming (PPDP'06), 73-84, 2006

Spreadsheets are the most popular programming systems in use today. Since spreadsheets are visual, first-order functional languages, research into the foundations of spreadsheets is therefore a highly relevant topic for the principles and, in particular, the practice, of declarative programming. Since the error rate in spreadsheets is very high and since those errors have significant impact, methods and tools that can help detect and remove errors from spreadsheets are very much needed. Type systems have traditionally played a strong role in detecting errors in programming languages, and it is therefore reasonable to ask whether type systems could not be helpful in improving the current situation of spreadsheet programming. In this paper we introduce a type system and a type inference algorithm for spreadsheets and demonstrate how this algorithm and the underlying typing concept can identify programming errors in spreadsheets. In addition, we also demonstrate how the type inference algorithm can be employed to infer models, or specifications, for spreadsheets, which can be used to prevent future errors in spreadsheets.

Paper.pdf (1M)


Inferring Templates from Spreadsheets

Robin Abraham and Martin Erwig. 28th ACM/IEEE Int. Conf. on Software Engineering (ICSE'06), 182-191, 2006

We present a study investigating the performance of a system for automatically inferring spreadsheet templates. These templates allow users to safely edit spreadsheets, that is, certain kinds of errors such as range, reference, and type errors can be provably prevented. Since the inference of templates is inherently ambiguous, such a study is required to demonstrate the effectiveness of any such automatic system. The study results show that the system considered performs significantly better than subjects with intermediate to expert level programming expertise. These results are important because the translation of existing spreadsheets into a system based on safety-guaranteeing templates cannot be performed without automatic support. We carried out post-hoc analyses of the video recordings of the subjects' interactions with the spreadsheets and found that expert-level subjects spend less time and inspect fewer cells in the spreadsheet and develop more accurate templates than less experienced subjects.

Paper.pdf (1M)


Toward Sharing Reasoning to Improve Fault Localization in Spreadsheets

Joey Lawrence, Robin Abraham, Margaret Burnett, and Martin Erwig. 2nd Workshop on End-User Software Engineering (WEUSE'06), 35-42, 2006

Although researchers have developed several ways to reason about the location of faults in spreadsheets, no single form of reasoning is without limitations. Multiple types of errors can appear in spreadsheets, and various fault localization techniques differ in the kinds of errors that they are effective in locating. Because end users who debug spreadsheets consistently follow the advice of fault localization systems, it is important to ensure that fault localization feedback corresponds as closely as possible to where the faults actually appear.

In this paper, we describe an emerging system that attempts to improve fault localization for end-user programmers by sharing the results of the reasoning systems found in WYSIWYT and UCheck. By understanding the strengths and weaknesses of the reasoning found in each system, we expect to identify where different forms of reasoning complement one another, when different forms of reasoning contradict one another, and which heuristics can be used to select the best advice from each system. By using multiple forms of reasoning in conjunction with heuristics to choose among recommendations from each system, we expect to produce unified fault localization feedback whose combination is better than the sum of the parts.

Paper.pdf (524K)


Gencel: A Program Generator for Correct Spreadsheets

Martin Erwig, Robin Abraham, Irene Cooperstein, and Steve Kollmansberger. Journal of Functional Programming Vol. 16, No. 3, 293-325, 2006

A huge discrepancy between theory and practice exists in one popular application area of functional programming-spreadsheets. Although spreadsheets are the most frequently used (functional) programs, they fall short of the quality level that is expected of functional programs, which is evidenced by the fact that existing spreadsheets contain many errors, some of which have serious impacts.

We have developed a template specification language that allows the definition of spreadsheet templates that describe possible spreadsheet evolutions. This language is based on a table calculus that formally captures the process of creating and modifying spreadsheets. We have developed a type system for this calculus that can prevent type, reference, and omission errors from occurring in spreadsheets. On the basis of the table calculus we have developed Gencel, a system for generating reliable spreadsheets. We have implemented a prototype version of Gencel as an extension of Excel.

Paper.pdf (693K)


Visual Type Inference

Martin Erwig. Journal of Visual Languages and Computing Vol. 17, No. 2, 161-186, 2006

We describe a type-inference algorithm that is based on labeling nodes with type information in a graph that represents type constraints. This algorithm produces the same results as the famous algorithm of Milner, but is much simpler to use, which is of importance especially for teaching type systems and type inference.

The proposed algorithm employs a more concise notation and yields inferences that are shorter than applications of the traditional algorithm. Simplifications result, in particular, from three facts: (1) We do not have to maintain an explicit type environment throughout the algorithm because the type environment is represented implicitly through node labels. (2) The use of unification is simplified through label propagation along graph edges. (3) The typing decisions in our algorithm are dependency-driven (and not syntax-directed), which reduces notational overhead and bookkeeping.

Paper.pdf (217K)


Modeling Genome Evolution with a DSEL for Probabilistic Programming

Martin Erwig and Steve Kollmansberger. 8th Int. Symp. on Practical Aspects of Declarative Languages (PADL'06), LNCS 3819, 134-149, 2006

Many scientific applications benefit from simulation. However, programming languages used in simulation, such as C++ or Matlab, approach problems from a deterministic procedural view, which seems to differ, in general, from many scientists' mental representation. We apply a domain-specific language for probabilistic programming to the biological field of gene modeling, showing how the mental-model gap may be bridged. Our system assisted biologists in developing a model for genome evolution by separating the concerns of model and simulation and providing implicit probabilistic non-determinism.

Paper.pdf (199K)


Generic Programming in Fortran

Martin Erwig, Zhe Fu and Ben Pflaum. ACM SIGPLAN 2006 Workshop on Partial Evaluation and Program Manipulation (PEPM'06), 130-193, 2006

Parametric Fortran is an extension of Fortran that supports the construction of generic programs by allowing the parameterization of arbitrary Fortran constructs. A parameterized Fortran program can be translated into a regular Fortran program guided by values for the parameters. This paper describes the extensions of Parametric Fortran by two new language features, accessors and list parameters. These features particularly address the code duplication problem, which is a major problem in the context of scientific computing. The described techniques have been successfully employed in a project that implements a generic inverse ocean modeling system.

Paper.pdf (147K)


Probabilistic Functional Programming in Haskell

Martin Erwig and Steve Kollmansberger. Journal of Functional Programming, Vol. 16, No. 1, 21-34, 2006

At the heart of functional programming rests the principle of referential transparency, which in particular means that a function f applied to a value x always yields one and the same value y=f(x). This principle seems to be violated when contemplating the use of functions to describe probabilistic events, such as rolling a die: It is not clear at all what exactly the outcome will be, and neither is it guaranteed that the same value will be produced repeatedly. However, these two seemingly incompatible notions can be reconciled if probabilistic values are encapsulated in a data type.

In this paper, we will demonstrate such an approach by describing a probabilistic functional programming (PFP) library for Haskell. We will show that the proposed approach not only facilitates probabilistic programming in functional languages, but in particular can lead to very concise programs and simulations. In particular, a major advantage of our system is that simulations can be specified independently from their method of execution. That is, we can either fully simulate or randomize any simulation without altering the code which defines it.

Paper.pdf (160K)

Haskell Source Code


ClassSheets: Automatic Generation of Spreadsheet Applications from Object-Oriented Specifications

Gregor Engels and Martin Erwig. 20th IEEE/ACM Int. Conf. on Automated Software Engineering (ASE'05), 124-133, 2005

Spreadsheets are widely used in all kinds of business applications. Numerous studies have shown that they contain many errors that sometimes have dramatic impacts. One reason for this situation is the low-level, cell-oriented development process of spreadsheets. We improve this process by introducing and formalizing a higher-level object-oriented model termed ClassSheet. While still following the tabular look-and-feel of spreadsheets, ClassSheets allow the developer to express explicitly business object structures within a spreadsheet, which is achieved by integrating concepts from the UML (Unified Modeling Language). A stepwise automatic transformation process generates a spreadsheet application that is consistent with the ClassSheet model. Thus, by deploying the formal underpinning of ClassSheets, a large variety of errors can be prevented that occur in many existing spreadsheet applications today. The presented ClassSheet approach links spreadsheet applications to the object-oriented modeling world and advocates an automatic model-driven development process for spreadsheet applications of high quality.

Paper.pdf (607K)


Goal-Directed Debugging of Spreadsheets

Robin Abraham and Martin Erwig. IEEE Symp. on Visual Languages and Human-centric Computing (VL/HCC'05), 37-44, 2005

We present a semi-automatic debugger for spreadsheet systems that is specifically targeted at end-user programmers. Users can report expected values for cells that yield incorrect results. The system then generates change suggestions that could correct the error. Users can interactively explore, apply, refine, or reject these change suggestions. The computation of change suggestions is based on a formal inference system that propagates expected values backwards across formulas. The system is fully integrated into Microsoft Excel and can be used to automatically detect and correct various kinds of errors in spreadsheets. Test results show that the system works accurately and reliably.

Paper.pdf (1M)


Visual Specifications of Correct Spreadsheets

Robin Abraham, Martin Erwig, Steve Kollmansberger, and Ethan Seifert. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'05), 189-196, 2005

We introduce a visual specification language for spreadsheets that allows the definition of spreadsheet templates. A spreadsheet generator can automatically create Excel spreadsheets from these templates together with customized update operations. It can be shown that spreadsheets created in this way are free from a large class of errors, such as reference, omission, and type errors. We present a formal definition of the visual language for templates and describe the process of generating spreadsheets from templates. In addition, we present an editor for templates and analyze the editor using the Cognitive Dimensions framework.

Paper.pdf (500K)


How to Communicate Unit Error Messages in Spreadsheets

Robin Abraham and Martin Erwig. 1st Workshop on End-User Software Engineering (WEUSE'05), 52-56, 2005

In previous work we have designed and implemented an automatic reasoning system for spreadsheets, called UCheck, that infers unit information for cells in a spreadsheet. Based on this unit information, UCheck can identify cells in the spreadsheet that contain erroneous formulas. However, the information about an erroneous cell is reported to the user currently in a rather crude way by simply coloring the cell, which does not tell anything about the nature of error and thus offers no help to the user as to how to fix it. In this paper we describe an extension of UCheck, called UFix, which improves the error messages reported to the spreadsheet user dramatically. The approach essentially consists of three steps: First, we identify different categories of spreadsheet errors from an end-user's perspective. Second, we map units that indicate erroneous formulas to these error categories. Finally, we create customized error messages from the unit information and the identified error category. In many cases, these error messages also provide suggestions on how to fix the reported errors.

Paper.pdf (1M)


Automatic Generation and Maintenance of Correct Spreadsheets

Martin Erwig, Robin Abraham, Irene Cooperstein, and Steve Kollmansberger. 27th ACM/IEEE Int. Conf. on Software Engineering (ICSE'05), 136-145, 2005

Existing spreadsheet systems allow users to change cells arbitrarily, which is a major source of spreadsheet errors. We propose a system that prevents errors in spreadsheets by restricting spreadsheet updates to only those that are logically and technically correct. The system is based on the concept of table specifications that describe the principal structure of the initial spreadsheet and all of its future versions. We have developed a program generator that translates a table specification into an initial spreadsheet together with customized update operations for changing cells and inserting/deleting rows and columns for this particular specification. We have designed a type system for table specifications that ensures the following form of "spreadsheet maintenance safety": Update operations that are generated from a type-correct table specification are proved to transform the spreadsheet only according to the specification and to never produce any omission, reference, or type errors. Finally, we have developed a prototype as an extension to Excel, which has been shown by a preliminary usability study to be well accepted by end users.

Paper.pdf (235K)


Software Reuse for Scientific Computing Through Program Generation

Martin Erwig and Zhe Fu. ACM Transactions on Software Engineering and Methodology, Vol. 14, No. 2, 168-198, 2005

We present a program-generation approach to address a software-reuse challenge in the area of scientific computing. More specifically, we describe the design of a program generator for the specification of subroutines that can be generic in the dimensions of arrays, parameter lists, and called subroutines. We describe the application of that approach to a real-world problem in scientific computing, which requires the generic description of inverse ocean modeling tools. In addition to a compiler that can transform generic specifications into efficient Fortran code for models, we have also developed a type system that can identify possible errors already in the specifications. This type system is important for the acceptance of the program generator among scientists because it prevents a large class of errors in the generated code.

Paper.pdf (363K)


Header and Unit Inference for Spreadsheets Through Spatial Analyses

Robin Abraham and Martin Erwig. IEEE Symp. on Visual Languages and Human-Centric Computing (VL/HCC'04), 165-172, 2004
Best Paper Award

This paper describes the design and implementation of a unit and header inference system for spreadsheets. The system is based on a formal model of units that we have described in previous work. Since the unit inference depends on information about headers in a spreadsheet, a realistic unit inference system requires a method for automatically determining headers. The present paper describes (1) several spatial-analysis algorithms for header inference, (2) a framework that facilitates the integration of different algorithms, and (3) the implementation of the system. The combined header and unit inference system is fully integrated into Microsoft Excel and can be used to automatically identify various kinds of errors in spreadsheets. Test results show that the system works accurately and reliably.

Paper.pdf (1M)


Monadification of Functional Programs

Martin Erwig and Deling Ren. Science of Computer Programming, Vol. 52, No. 1-3, 101-129, 2004

The structure of monadic functional programs allows the integration of many different features by just changing the definition of the monad and not the rest of the program, which is a desirable feature from a software engineering and software maintenance point of view. We describe an algorithm for the automatic transformation of a group of functions into such a monadic form. We identify two correctness criteria and argue that the proposed transformation is at least correct in the sense that transformed programs yield the same results as the original programs modulo monad constructors. The translation of a set of functions into monadic form is in most cases only a first step toward an extension of a program by new features. The extended behavior can be realized by choosing an appropriate monad type and by inserting monadic actions into the functions that have been transformed into monadic form. We demonstrate an approach to the integration of monadic actions that is based on the idea of specifying context-dependent rewritings.

Paper.pdf (255K)


Parametric Fortran - A Program Generator for Customized Generic Fortran Extensions

Martin Erwig and Zhe Fu. 6th Int. Symp. on Practical Aspects of Declarative Languages (PADL'04), LNCS 3057, 209-223, 2004

We describe the design and implementation of a program generator that can produce extensions of Fortran that are specialized to support the programming of particular applications. Extensions are specified through parameter structures that can be referred to in Fortran programs to specify the dependency of program parts on these parameters. By providing parameter values, a parameterized Fortran program can be translated into a regular Fortran program. We describe as a real-world application of this program generator the implementation of a generic inverse ocean modeling tool. The program generator is implemented in Haskell and makes use of sophisticated features, such as multi-parameter type classes, existential types, and generic programming extensions and thus represents the application of an advanced applicative language to a real-world problem.

Paper.pdf (150K)


Escape from Zurg: An Exercise in Logic Programming

Martin Erwig. Journal of Functional Programming, Vol. 14, No. 3, 253-261, 2004

In this article we will illustrate with an example that modern functional programming languages like Haskell can be used effectively for programming search problems, in contrast to the widespread belief that Prolog is much better suited for tasks like these.

Paper.pdf (53K)

Haskell Source Code


Toward Spatiotemporal Patterns

Martin Erwig. Spatio-Temporal Databases (eds. De Caluwe et al.), Springer, 29-54, 2004

Existing spatiotemporal data models and query languages offer only basic support to query changes of data. In particular, although these systems often allow the formulation of queries that ask for changes at particular time points, they fall short of expressing queries for sequences of such changes. In this chapter we propose the concept of spatiotemporal patterns as a systematic and scalable concept to query developments of objects and their relationships. Based on our previous work on spatiotemporal predicates, we outline the design of spatiotemporal patterns as a query mechanism to characterize complex object behaviors in space and time. We will not present a fully-fledged design. Instead, we will focus on deriving constraints that will allow spatiotemporal patterns to become well-designed composable abstractions that can be smoothly integrated into spatiotemporal query languages. Spatiotemporal patterns can be applied in many different areas of science, for example, in geosciences, geophysics, meteorology, ecology, and environmental studies. Since users in these areas typically do not have extended formal computer training, it is often difficult for them to use advanced query languages. A visual notation for spatiotemporal patterns can help solving this problem. In particular, since spatial objects and their relationships have a natural graphical representation, a visual notation can express relationships in many cases implicitly where textual notations require the explicit application of operations and predicates. Based on our work on the visualization of spatiotemporal predicates, we will sketch the design of a visual language to formulate spatiotemporal patterns.

Paper.pdf (232K)


Toward the Automatic Derivation of XML Transformations

Martin Erwig. 1st Int. Workshop on XML Schema and Data Management (XSDM'03), LNCS 2814, 342-354, 2003

Existing solutions to data and schema integration require user interaction/input to generate a data transformation between two different schemas. These approaches are not appropriate in situations where many data transformations are needed or where data transformations have to be generated frequently. We describe an approach to an automatic XML-transformation generator that is based on a theory of information-preserving and -approximating XML operations. Our approach builds on a formal semantics for XML operations and their associated DTD transformation and on an axiomatic theory of information preservation and approximation. This combination enables the inference of a sequence of XML transformations by a search algorithm based on the operations' DTD transformations.

Paper.ps.gz (160K), Paper.pdf (135K)


KeyQuery - A Front End for the Automatic Translation of Keywords into Structured Queries

Martin Erwig and Jianglin He. 14th Int. Conf. on Database and Expert Systems Applications (DEXA'03), LNCS 2736, 494-503, 2003

We demonstrate an approach to transform keyword queries automatically into queries that combine keywords appropriately by boolean operations, such as and and or. Our approach is based on an analysis of relationships between the keywords using a taxonomy. The transformed queries will be sent to a search engine, and the returned results will be presented to the user. We evaluate the effectiveness of our approach by comparing the precision of the results returned for the generated query with the precision of the result for the original query. Our experiments indicate that our approach can improve the precision of the results considerably.

Paper.ps.gz (152K), Paper.pdf (256K)


Type-Safe Update Programming

Martin Erwig and Deling Ren. 12th European Symp. on Programming (ESOP'03), LNCS 2618, 269-283, 2003

Many software maintenance problems are caused by using text editors to change programs. A more systematic and reliable way of performing program updates is to express changes with an update language. In particular, updates should preserve the syntax- and type-correctness of the transformed object programs. We describe an update calculus that can be used to update lambda-calculus programs. We develop a type system for the update language that infers the possible type changes that can be caused by an update program. We demonstrate that type-safe update programs that fulfill certain structural constraints preserve the type-correctness of lambda terms.

Paper.ps.gz (164K), Paper.pdf (159K)


Xing: A Visual XML Query Language

Martin Erwig. Journal of Visual Languages and Computing, Vol. 14, No. 1, 5-45, 2003

We present a visual language for querying and transforming XML data. The language is based on a visual document metaphor and the notion of document patterns and rules. It combines a dynamic form-based interface for defining queries and transformation rules with powerful pattern matching capabilities and offers thus a highly expressive visual language. The design of the visual language is specifically targeted at end users.

Paper.ps.gz (114K), Paper.pdf (193K)


Spatio-Temporal Models and Languages: An Approach Based on Data Types

Ralf H. Güting, Mike H. Böhlen, Martin Erwig, Christian S. Jensen, Nikos A. Lorentzos, Enrico Nardelli, and Markus Schneider. Chapter 4 of Spatio-Temporal Databases: The CHOROCHRONOS Approach (eds. T. Sellis et al.), Springer Verlag, LNCS 2520, 97-146, 2003

We develop DBMS data models and query languages to deal with geometries changing over time. In contrast to most of the earlier work on this subject, these models and languages are capable of handling continuously changing geometries, or moving objects. We focus on two basic abstractions called moving point and moving region. A moving point can represent an entity for which only the position in space is relevant. A moving region captures moving as well as growing or shrinking regions. Examples for moving points are people, polar bears, cars, trains, or air planes; examples for moving regions are hurricanes, forest fires, or oil spills in the sea. The main line of research presented in this chapter takes a data type oriented approach. The idea is to view moving points and moving regions as three-dimensional (2D-space + time) or higher-dimensional entities whose structure and behavior is captured by modeling them as abstract data types. These data types can then be integrated as attribute types into relational, object-oriented, or other DBMS data models; they can be implemented as extension packages ("data blades") for suitable extensible DBMSs.


A Visual Language for the Evolution of Spatial Relationships and its Translation into a Spatio-Temporal Calculus

Martin Erwig and Markus Schneider. Journal of Visual Languages and Computing, Vol. 14, No. 2, 181-211, 2003

Queries about objects that change their spatial attributes over time become particularly interesting when they ask for changes in the spatial relationships between different objects. We propose a visual notation that is able to describe scenarios of changing object relationships. The visual language is based on the idea to analyze two-dimensional traces of moving objects to infer a temporal development of their mutual spatial relationships. We motivate the language design by successively simplifying object traces to their intrinsic properties. The notation can be effectively translated in to a calculus of spatio-temporal predicates that formally characterizes the evolution of spatial relationships. We also outline a user interface that supports specifications by menus and a drawing editor. The visual notation can be used directly as a visual query interface to spatio-temporal databases, or it can provide predicate specifications that can be integrated into textual query languages leading to heterogeneous languages.

Paper.ps.gz (258K), Paper.pdf (443K)


A Rule-Based Language for Programming Software Updates

Martin Erwig and Deling Ren. 3rd ACM SIGPLAN Workshop on Rule-Based Programming (RULE'02), 67-77, 2002

We describe the design of a rule-based language for expressing changes to Haskell programs in a systematic and reliable way. The update language essentially offers update commands for all constructs of the object language (a subset of Haskell). The update language can be translated into a core calculus consisting of a small set of basic updates and update combinators. The key construct of the core calculus is a scope update mechanism that allows (and enforces) update specifications for the definition of a symbol together with all of its uses. The type of an update program is given by the possible type changes it can cause for an object programs. We have developed a type-change inference system to automatically infer type changes for updates. Updates for which a type change can be successfully inferred and that satisfy an additional structural condition can be shown to preserve type correctness of object programs. In this paper we define the Haskell Update Language HULA and give a translation into the core update calculus. We illustrate HULA and its translation into the core calculus by several examples.

Paper.ps.gz (71K), Paper.pdf (134K)


Design of Spatio-Temporal Query Languages

Martin Erwig. NSF/BDEI Workshop on Spatio-Temporal Data Models of Biogeophysical Fields for Ecological Forecasting, 2002

In contrast to the field view of spatial data that basically views spatial data as a mapping from points into some features, the object view clusters points by features and their values into spatial objects of type point, line, or region, which can then be integrated as abstract data types into a database system. We can apply the ADT idea to model spatio-temporal data and to integrate such ADTs into data models. The basic idea is very simple and starts from the observation that anything that changes over time can be regarded as a function that maps from time into the data of consideration. The integration of spatio-temporal data types into data models supports the definition of a query languages because the interaction between ADT operations and general querying constructs is constrained to a few well-defined places, so that we are immediately able to pose spatio-temporal queries in a straightforward way.

Paper.ps.gz (96K), Paper.pdf (180K)


STQL: A Spatio-Temporal Query Language

Martin Erwig and Markus Schneider. Chapter 6 of Mining Spatio-Temporal Information Systems, (eds. R. Ladner, K. Shaw, and M. Abdelguerfi), Kluwer Academic Publishers, 105-126, 2002

Integrating spatio-temporal data as abstract data types into already existing data models is a promising approach to creating spatio-temporal query languages. Based on a formal foundation presented elsewhere, we present the main aspects of an SQL-like, spatio-temporal query language, called STQL. As one of its essential features, STQL allows to query and to retrieve moving objects which describe continuous evolutions of spatial objects over time. We consider spatio-temporal operations that are particularly useful in formulating queries, such as the temporal lifting of spatial operations, the projection into space and time, selection, and aggregation. Another important class of queries is concerned with developments, which are changes of spatial relationships over time. Based on the notion of spatio-temporal predicates we provide a framework in STQL that allows a user to build more and more complex predicates starting with a small set of elementary ones. We also describe a visual notation to express developments.

Paper.ps.gz (74K), Paper.pdf (127K)


Visually Customizing Inference Rules About Apples and Orange

Margaret M. Burnett and Martin Erwig. 2nd IEEE Int. Symp. on Human Centric Computing Languages and Environments (HCC'02), 140-148, 2002

We have been working on a unit system for end-user spreadsheets that is based on the concrete notion of units instead of the abstract concept of types. In previous work, we defined such a system formally. In this paper, we describe a visual system to support the formal reasoning in two ways. First, it supports communicating and explaining the unit inference process to users. Second and more important, our approach allows users to change the system's reasoning by adding and customizing the system's inference rules.

Paper.ps.gz (92K), Paper.pdf (93K)


Adding Apples and Oranges

Martin Erwig and Margaret M. Burnett. 4th Int. Symp. on Practical Aspects of Declarative Languages (PADL'02), LNCS 2257, 173-191, 2002

We define a unit system for end-user spreadsheets that is based on the concrete notion of units instead of the abstract concept of types. Units are derived from header information given by spreadsheets. The unit system contains concepts, such as dependent units, multiple units, and unit generalization, that allow the classification of spreadsheet contents on a more fine-grained level than types do. Also, because communication with the end user happens only in terms of objects that are contained in the spreadsheet, our system does not require end users to learn new abstract concepts of type systems.

Paper.ps.gz (225K), Paper.pdf (193K)


Spatio-Temporal Predicates

Martin Erwig and Markus Schneider. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, 881-901, 2002

This paper investigates temporal changes of topological relationships and thereby integrates two important research areas: first, two-dimensional topological relationships that have been investigated quite intensively, and second, the change of spatial information over time. We investigate spatio-temporal predicates, which describe developments of well-known spatial topological relationships. A framework is developed in which spatio-temporal predicates can be obtained by temporal aggregation of elementary spatial predicates and sequential composition. We compare our framework with two other possible approaches: one is based on the observation that spatio-temporal objects correspond to three-dimensional spatial objects for which existing topological predicates can be exploited. The other approach is to consider possible transitions between spatial configurations. These considerations help to identify a canonical set of spatio-temporal predicates.

Paper.ps.gz (239K), Paper.pdf (405K)


Programs are Abstract Data Types

Martin Erwig. 16th IEEE Int. Conf. on Automated Software Engineering (ASE'01), 400-403, 2001

We propose to view programs as abstract data types and to perform program changes by applying well-defined operations on programs. The ADT view of programs goes beyond the approach of syntax-directed editors and proof-editors since it is possible to combine basic update operations into larger update programs that can be stored and reused. It is crucial for the design of update operations and their composition to know which properties they can preserve when they are applied to a program.

In this paper we argue in favor of the abstract data type view of programs, and present a general framework in which different programming languages, update languages, and properties can be studied.

Paper.ps.gz (54K), Paper.pdf (120K)


Inductive Graphs and Functional Graph Algorithms

Martin Erwig. Journal of Functional Programming, Vol. 11, No. 5, 467-492, 2001.

We propose a new style of writing graph algorithms in functional languages which is based on an alternative view of graphs as inductively defined data types. We show how this graph model can be implemented efficiently, and then we demonstrate how graph algorithms can be succinctly given by recursive function definitions based on the inductive graph view. We also regard this as a contribution to the teaching of algorithm and data structures in functional languages since we can use the functional-style graph algorithms instead of the imperative algorithms that are dominant today.

Paper.ps.gz (125K), Paper.pdf (297K)

Haskell Sources


Pattern Guards and Transformational Patterns

Martin Erwig and Simon Peyton Jones. Electronic Notes in Theoretical Computer Science, Vol. 41, No. 1, 12.1-12.27, 2001

We propose three extensions to patterns and pattern matching in Haskell. The first, pattern guards, allows the guards of a guarded equation to match patterns and bind variables, as well as to test boolean condition. For this we introduce a natural generalization of guard expressions to guard qualifiers. A frequently-occurring special case is that a function should be applied to a matched value, and the result of this is to be matched against another pattern. For this we introduce a syntactic abbreviation, transformational patterns, that is particularly useful when dealing with views. These proposals can be implemented with very modest syntactic and implementation cost. They are upward compatible with Haskell; all existing programs will continue to work. We also offer a third, much more speculative proposal, which provides the transformational-pattern construct with additional power to explicitly catch pattern match failure. We demonstrate the usefulness of the proposed extension by several examples, in particular, we compare our proposal with views, and we also discuss the use of the new patterns in combination with equational reasoning.

Paper.dvi.gz (32K), Paper.ps.gz (72K), Paper.pdf (203K)


XML Queries and Transformations for End Users

Martin Erwig. XML 2000, 259-269, 2000

We propose a form-based interface to expresses XML queries and transformations by so-called "document patterns" that describe properties of the requested information and optionally specify how the found results should be reformatted or restructured. The interface is targeted at casual users who want a fast and easy way to find information in XML data resources. By using dynamic forms an intuitive and easy-to-use interface is obtained that can be used to solve a wide spectrum of tasks, ranging from simple selections and projections to advanced data restructuring tasks. The interface is especially suited for end users since it can be used without having to learn a programming or query language and without knowing anything about (query or XML) language syntax, DTDs or schemas. Nevertheless, DTDs can be well exploited, in particular, on the user interface level to support the semi-automatic construction of queries.

Paper.ps.gz (146K), Paper.pdf (108K)


A Visual Language for XML

Martin Erwig. 16th IEEE Symp. on Visual Languages (VL 2000), 47-54, 2000

XML is becoming one of the most influential standards concerning data exchange and Web-presentations. In this paper we present a visual language for querying and transforming XML data. The language is based on a visual document metaphor and the notion of document patterns. It combines an intuitive, dynamic form-based interface for defining queries and transformation rules with powerful pattern matching capabilities and offers thus a highly expressive yet easy to use visual language. Providing visual language support for XML not only helps end users, it is also a big opportunity for the VL community to receive greater attention.

Paper.ps.gz (54K), Paper.pdf (120K)


Random Access to Abstract Data Types

Martin Erwig. 8th Int. Conf. on Algebraic Methodology and Software Technology (AMAST 2000), LNCS 1816, 135-149, 2000

We show how to define recursion operators for random access data types, that is, ADTs that offer random access to their elements, and how algorithms on arrays and on graphs can be expressed by these operators. The approach is essentially based on a representation of ADTs as bialgebras that allows catamorphisms between ADTs to be defined by composing one ADT's algebra with the other ADT's coalgebra. The extension to indexed data types enables the development of specific recursion schemes, which are, in particular, suited to express a large class of graph algorithms.

Paper.ps.gz (127K), Paper.pdf (227K), Long Version.ps.gz (136K), Long Version.pdf (252K)


Formalization of Advanced Map Operations

Martin Erwig and Markus Schneider. 9th Int. Symp. on Spatial Data Handling (SDH 2000), 8a.3-8a.17, 2000

Maps are a fundamental metaphor in GIS. We introduce several new operations on maps that go far beyond well-known operations like overlay or reclassification. In particular, we identify and generalize operations that are of practical interest for spatial analysis and that can be useful in many GIS applications. We give a precise definition of these operations based on a formal model of spatial partitions. This provides a theoretical foundation for maps which also serves as a specification for implementations.

Paper.ps.gz (96K), Paper.pdf (178K)


Query-By-Trace: Visual Predicate Specification in Spatio-Temporal Databases

Martin Erwig and Markus Schneider. 5th IFIP Conf. on Visual Databases (VDB 5), 2000

In this paper we propose a visual interface for the specification of predicates to be used in queries on spatio-temporal databases. The approach is based on a visual specification method for temporally changing spatial situations. This extends existing concepts for visual spatial query languages, which are only capable of querying static spatial situations. We outline a preliminary user interface that supports the specification on an intuitive and easily manageable level, and we describe the design of the underlying visual language. The visual notation can be used directly as a visual query interface to spatio-temporal databases, or it can provide predicate specifications that can be integrated into textual query languages leading to heterogeneous languages.

Paper.ps.gz (72K), Paper.pdf (188K)


A Foundation for Representing and Querying Moving Objects

Ralf H. Güting, Mike H. Böhlen, Martin Erwig, Christian S. Jensen, Nikos A. Lorentzos, Markus Schneider, and Michalis Vazirgiannis.
ACM Transactions on Database Systems, Vol. 25, No. 1, 1-42, 2000
Most Cited TODS Paper in Ten Years (1993-2004)

Spatio-temporal databases deal with geometries changing over time. The goal of our work is to provide a DBMS data model and query language capable of handling such time-dependent geometries, including those changing continuously which describe moving objects. Two fundamental abstractions are moving point and moving region, describing objects for which only the time-dependent position, or position and extent, are of interest, respectively. We propose to represent such time-dependent geometries as attribute data types with suitable operations, that is, to provide an abstract data type extension to a DBMS data model and query language.
This paper presents a design of such a system of abstract data types. It turns out that besides the main types of interest, moving point and moving region, a relatively large number of auxiliary data types is needed. For example, one needs a line type to represent the projection of a moving point into the plane, or a "moving real" to represent the time-dependent distance of two moving points. It then becomes crucial to achieve (i) orthogonality in the design of the type system, i.e., type constructors can be applied uniformly, (ii) genericity and consistency of operations, i.e., operations range over as many types as possible and behave consistently, and (iii) closure and consistency between structure and operations of non-temporal and related temporal types. Satisfying these goals leads to a simple and expressive system of abstract data types that may be integrated into a query language to yield a powerful language for querying spatio-temporal data, including moving objects. The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL. The paper offers a precise and conceptually clean foundation for implementing a spatio-temporal DBMS extension.

Paper.ps.gz (143K), Paper.pdf (518K)


The Graph Voronoi Diagram with Applications

Martin Erwig. Networks, Vol 36, No. 3, 156-163, 2000

The Voronoi diagram is a famous structure of computational geometry. We show that there is a straightforward equivalent in graph theory which can be efficiently computed. In particular, we give two algorithms for the computation of graph Voronoi diagrams, prove a lower bound on the problem, and we identify cases where the algorithms presented are optimal. The space requirement of a graph Voronoi diagram is modest, since it needs no more space than the graph itself.
The investigation of graph Voronoi diagrams is motivated by many applications and problems on networks that can be easily solved with their help. This includes the computation of nearest facilities, all nearest neighbors and closest pairs, some kind of collision free moving, and anti-centers and closest points.

Paper.ps.gz (123K), Paper.pdf (252K)


Visual Graphs

Martin Erwig 15th IEEE Symp. on Visual Languages (VL'99), 122-129, 1999

The formal treatment of visual languages is often based on graph representations. Since the matter of discourse is visual languages, it would be convenient if the formal manipulations could be performed in a visual way. We introduce visual graphs to support this goal. In a visual graph some nodes are shown as geometric figures, and some edges are represented by geometric relationships between these figures. We investigate mappings between visual and abstract graphs and show their application in semantics definitions for visual languages and in formal manipulations of visual programs.

Paper.ps.gz (72K), Paper.pdf (67K)


Visual Specifications of Spatio-Temporal Developments

Martin Erwig and Markus Schneider 15th IEEE Symp. on Visual Languages (VL'99), 187-188, 1999

We introduce a visual language for the specification of temporally changing spatial situations. The main idea is to represent spatio-temporal (ST) objects in a two-dimensional way by their trace. The intersections of these traces with other objects are interpreted and translated into sequences of spatial and spatio-temporal predicates, called developments, that can then be used, for example, to query spatio-temporal databases.

Summary.ps.gz (29K), Summary.pdf (15K)


The Honeycomb Model of Spatio-Temporal Partitions

Martin Erwig and Markus Schneider Int. Workshop on Spatio-Temporal Database Management (STDBM'99), LNCS 1678, 39-59, 1999

We define a formal model of spatio-temporal partitions which can be used to model temporally changing maps.  We investigate new applications and generalizations of operations that are well-known for static, spatial maps.  We then define a small set of operations on spatio-temporal partitions that are powerful enough to express all these tasks and more.  Spatio-temporal partitions combine the general notion of temporal objects and the powerful spatial partition abstraction into a new, highly expressive spatio-temporal data modeling tool.

Paper.ps.gz (73K), Paper.pdf (256K)


Developments in Spatio-Temporal Query Languages

Martin Erwig and Markus Schneider IEEE Int. Workshop on Spatio-Temporal Data Models and Languages, 441-449, 1999

Integrating spatio-temporal data as abstract data types into already existing data models is a promising approach to creating spatio-temporal query languages.  In this context, an important new class of queries can be identified which is concerned with developments of spatial objects over time, that is, queries ask especially for changes in spatial relationships.  Based on a definition of the notion of spatio-temporal predicate we provide a framework which allows to build more and more complex predicates starting with a small set of elementary ones.  These predicates can be well used to characterize developments.  We show how these concepts can be realized within the relational data model.  In particular, we demonstrate how SQL can be extended to enable the querying of developments.

Paper.dvi.gz (27K), Paper.ps.gz (38K), Paper.pdf (133K)


Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases

Martin Erwig, Ralf H. Güting, Markus Schneider, and Michalis Vazirgiannis. GeoInformatica, Vol. 3, No. 3, 269-296, 1999

Spatio-temporal databases deal with geometries changing over time. In general, geometries cannot only change in discrete steps, but continuously, and we are talking about moving objects. If only the position in space of an object is relevant, then moving point is a basic abstraction; if also the extent is of interest, then the moving region abstraction captures moving as well as growing or shrinking regions.  We propose a new line of research where moving points and moving regions are viewed as three-dimensional (2D space + time) or higher-dimensional entities whose structure and behavior is captured by modeling them as abstract data types. Such types can be integrated as base (attribute) data types into relational, object-oriented, or other DBMS data models; they can be implemented as data blades, cartridges, etc. for extensible DBMSs. We expect these spatio-temporal data types to play a similarly fundamental role for spatio-temporal databases as spatial data types have played for spatial databases. The paper explains the approach and discusses several fundamental issues and questions related to it that need to be clarified before delving into specific designs of spatio-temporal algebras.

Paper.ps.gz (86K), Paper.pdf (101K)


Diets for Fat Sets

Martin Erwig. Journal of Functional Programming, Vol. 8, No. 6, 627-632, 1998

In this paper we describe the discrete interval encoding tree for storing subsets of types having a total order and a predecessor and a successor function.  We consider for simplicity only the case for integer sets; the generalization is not difficult. The discrete interval encoding tree is based on the observation that the set of integers { i| a<=i<=b} can be perfectly represented by the closed interval [ a, b]. The general idea is to represent a set by a binary search tree of integers in which maximal adjacent subsets are each represented by an interval. For instance, inserting the sequence of numbers [6, 9, 2, 13, 8, 14, 10, 7, 5] into a binary search tree, respectively, into an discrete interval encoding tree results in the tree structures shown below:

After defining operations for inserting, searching, and deleting elements, we give a best-, worst- and average-case analysis.  We also comment upon some applications and compare actual running times.

Paper.dvi.gz (10K), Paper.ps.gz (30K), Paper.pdf (149K)

ML and Haskell Sources


Abstract Syntax and Semantics of Visual Languages

Martin Erwig. Journal of Visual Languages and Computing, Vol. 9, No. 5, 461-483, 1998

The effective use of visual languages requires a precise understanding of their meaning. Moreover, it is impossible to prove properties of visual languages like soundness of transformation rules or correctness results without having a formal language definition. Although this sounds obvious, it is surprising that only little work has been done about the semantics of visual languages, and even worse, there is no general framework available for the semantics specification of different visual languages. We present such a framework that is based on a rather general notion of abstract visual syntax. This framework allows a logical as well as a denotational approach to visual semantics, and it facilitates the formal reasoning about visual languages and their properties. We illustrate the concepts of the proposed approach by defining abstract syntax and semantics for the visual languages VEX, Show and Tell, and Euler Circles. We demonstrate the semantics in action by proving a rule for visual reasoning with Euler Circles and by showing the correctness of a Show and Tell program.

Paper.ps.gz (87K), Paper.pdf (97K)


Categorical Programming with Abstract Data Types

Martin Erwig. 7th Int. Conf. on Algebraic Methodology and Software Technology (AMAST'98), LNCS 1548, 406-421, 1998

We show how to define fold operators for abstract data types. The main idea is to represent an ADT by a bialgebra, that is, an algebra/coalgebra pair with a common carrier. A program operating on an ADT is given by a mapping to another ADT. Such a mapping, called metamorphism, is basically a composition of the algebra of the second with the coalgebra of the first ADT. We investigate some properties of metamorphisms, and we show that the metamorphic programming style offers far-reaching opportunities for program optimization that cover and even extend those known for algebraic data types.

Paper.dvi.gz (27K), Paper.ps.gz (53K), Paper.pdf (224K)

Haskell Source Files and Related Papers


Temporal Objects for Spatio-Temporal Data Models and a Comparison of Their Representations

Martin Erwig, Markus Schneider, and Ralf H. Güting
Int. Workshop on Advances in Database Technologies, LNCS 1552, 454-465, 1998

Currently, there are strong efforts to integrate spatial and temporal database technology into spatio-temporal database systems.  This paper views the topic from a rather fundamental perspective and makes several contributions.  First, it reviews existing temporal and spatial data models and presents a completely new approach to temporal data modeling based on the very general notion of temporal object.  The definition of temporal objects is centered around the observation that anything that changes over time can be expressed as a function over time.  For the modeling of spatial objects the well known concept of spatial data types is employed.  As specific subclasses, linear temporal and spatial objects are identified.  Second, the paper proposes the database embedding of temporal objects by means of the abstract data type approach to the integration of complex objects into databases.  Furthermore, we make statements about the expressiveness of different temporal and spatial database embeddings.  Third, we consider the combination of temporal and spatial objects into spatio-temporal objects in (relational) databases.  We explain various alternatives for spatio-temporal data models and databases and compare their expressiveness.  Spatio-temporal objects turn out to be specific instances of temporal objects.

Paper.ps.gz (75K), Paper.pdf (69K)


Abstract and Discrete Modeling of Spatio-Temporal Data Types

Martin Erwig, Ralf H. Güting, Markus Schneider, and Michalis Vazirgiannis.
6th ACM Symp. on Geographic Information Systems (ACMGIS'98), 131-136, 1998

Spatio-temporal databases deal with geometries changing over time. In general, geometries cannot only change in discrete steps, but continuously, and we are talking about moving objects. If only the position in space of an object is relevant, then moving point is a basic abstraction; if also the extent is of interest, then the moving region abstraction captures moving as well as growing or shrinking regions. We propose a new line of research where moving points and moving regions are viewed as three-dimensional (2D space + time) or higher-dimensional entities whose structure and behavior is captured by modeling them as abstract data types. Such types can be integrated as base (attribute) data types into relational, object-oriented, or other DBMS data models; they can be implemented as data blades, cartridges, etc. for extensible DBMSs. We expect these spatio-temporal data types to play a similarly fundamental role for spatio-temporal databases as spatial data types have played for spatial databases. In this paper we consider the need for modeling spatio-temporal data types on two different levels of abstraction.

Paper.ps.gz (53K), Paper.pdf (57K)


The Categorical Imperative - Or: How to Hide Your State Monads

Martin Erwig. 10th Int. Workshop on Implementation of Functional Languages (IFL'98), 1-25, 1998

We demonstrate a systematic way of introducing state monads to improve the efficiency of data structures.  When programs are expressed as transformations of abstract data types - which becomes possible by suitable definitions of ADTs and ADT fold operations, we can observe restricted usage patterns for ADTs, and this can be exploited to derive (i) partial data structures correctly implementing the ADTs and (ii) imperative realizations of these partial data structures.  With a library of ADTs together with (some of) their imperative implementations, efficient versions of functional programs can be obtained without being concerned or even knowing about state monads.  As one example we demonstrate the optimization of a graph ADT resulting in an optimal imperative implementation of depth-first search.

Paper.dvi.gz (29K), Paper.ps.gz (55K), Paper.pdf (239K)

Haskell Source Files and Related Papers


Visual Semantics - Or: What You See is What You Compute

Martin Erwig. 14th IEEE Symp. on Visual Languages (VL'98), 96-97, 1998

In this paper we introduce visual graphs as an intermediate representation between concrete visual syntax and abstract graph syntax. In a visual graph some nodes are shown as geometric figures, and some edges are represented by geometric relationships between these figures. By carefully designing visual graphs and corresponding mappings to abstract syntax graphs, semantics definitions can, at least partially, employ a visual notation while still being based on abstract syntax. Visual semantics thus offers the "best of both worlds" by integrating abstract syntax and visual notation. We also demonstrate that these concepts can be used to give visual semantics for traditional textual formalisms. As an example we provide a visual definition of Turing machines.

Summary.ps.gz (31K), Summary.pdf (20K)


Partition and Conquer

Martin Erwig and Markus Schneider. 3rd Int. Conf. on Spatial Information Theory (COSIT'97), LNCS 1329, 389-408, 1997

Although maps and partitions are ubiquitous in geographical information systems and spatial databases, there is only little work investigating their foundations. We give a rigorous definition for spatial partitions and propose partitions as a generic spatial data type that can be used to model arbitrary maps and to support spatial analysis. We identify a set of three powerful operations on partitions and show that the type of partitions is closed under them. These basic operators are sufficient to express all known application-specific operations. Moreover, many map operations will be considerably generalized in our framework. We also indicate that partitions can be effectively used as a meta-model to describe other spatial data types.

Paper.ps.gz (80K), Paper.pdf (92K)


Fully Persistent Graphs - Which One to Choose?

Martin Erwig. 9th Int. Workshop on Implementation of Functional Languages (IFL'97), LNCS 1467, 123-140, 1997

Functional programs, by nature, operate on functional, or persistent, data structures. Therefore, persistent graphs are a prerequisite to express functional graph algorithms. In this paper we describe two implementations of persistent graphs and compare their running times on different graph problems. Both implementations essentially represent graphs as adjacency lists. The first structure uses the version tree implementation of functional arrays to make adjacency lists persistent. An array cache of the newest graph version together with a time stamping technique for speeding up deletions makes it asymptotically optimal for a class of graph algorithms that use graphs in a single-threaded way. The second approach uses balanced search trees to implement adjacency lists. For both structures we also consider several variations, e.g., ignoring edge labels or predecessor information.

Paper.ps.gz (60K), Paper.pdf (68K)


Semantics of Visual Languages

Martin Erwig. 13th IEEE Symp. on Visual Languages (VL'97), 304-311, 1997

The effective use of visual languages requires a precise understanding of their meaning. Moreover, it is impossible to prove properties of visual languages like soundness of transformation rules or correctness results without having a formal language definition. Although this sounds obvious, it is surprising that only little work has been done about the semantics of visual languages, and even worse, there is no general framework available for the semantics specification of different visual languages. We present such a framework that is based on a rather general notion of abstract visual syntax. This framework allows a logical as well as a denotational approach to visual semantics, and it facilitates the formal reasoning about visual languages and their properties. We illustrate the concepts of the proposed approach by defining abstract syntax and semantics for the visual languages VEX, Show and Tell, and Euler Circles. For the latter we also prove a rule for visual reasoning.

Paper.ps.gz (80K), Paper.pdf (95K)


Abstract Visual Syntax

Martin Erwig. 2nd IEEE Int. Workshop on Theory of Visual Languages (TVL'97), 15-25, 1997

We propose a separation of visual syntax into concrete and abstract syntax, much like it is often done for textual languages. Here the focus is on visual programming languages; the impact on visual languages in general is not clear by now. We suggest to use unstructured labeled multi-graphs as abstract visual syntax and show how this facilitates semantics definitions and transformations of visual languages. In particular, disregarding structural constraints on the abstract syntax level makes such manipulations simple and powerful. Moreover, we demonstrate that - in contrast to the traditional monolithic graph definition - an inductive view of graphs provides a very convenient way to move in a structured (and declarative) way through graphs. Again this supports the simplicity of descriptions for transformations and semantics.

Paper.ps.gz (54K), Paper.pdf (57K)

Russian Translation


Vague Regions

Martin Erwig and Markus Schneider. 5th Int. Symp. on Advances in Spatial Databases (SSD'97), LNCS 1262, 298-320, 1997

In many geographical applications there is a need to model spatial phenomena not simply by sharp objects, but rather through indeterminate or vague concepts. To support such applications we present a model of vague regions which covers and extends previous approaches. The formal framework is based on a general exact model of spatial data types. On the one hand, this simplifies the definition of the vague model since we can build upon already existing theory of spatial data types, on the other hand, this approach facilitates the migration from exact to vague models. Moreover, exact spatial data types are subsumed as a special case of the presented vague concepts. We present examples and show how they are represented within our framework. We give a formal definition of basic operations and predicates which particularly allow a more fine-grained investigation of spatial situations than in the pure exact case. We also demonstrate the integration of the presented concepts into an SQL-like query language.

Paper.ps.gz (87K), Paper.pdf (117K)


Functional Programming with Graphs

Martin Erwig. 2nd ACM SIGPLAN Int. Conf. on Functional Programming (ICFP'97), 52-65, 1997
            (also in: ACM SIGPLAN Notices, Vol. 32,No. 8, Aug 1997, pp. 52-65)

Graph algorithms expressed in functional languages often suffer from their inherited imperative, state-based style. In particular, this impedes formal program manipulation. We show how to model persistent graphs in functional languages by graph constructors. This provides a decompositional view of graphs which is very close to that of data types and leads to a ``more functional'' formulation of graph algorithms. Graph constructors enable the definition of general fold operations for graphs. We present a promotion theorem for one of these folds that allows program fusion and the elimination of intermediate results. Fusion is not restricted to the elimination of tree-like structures, and we prove another theorem that facilitates the elimination of intermediate graphs. We describe an ML-implementation of persistent graphs which efficiently supports the presented fold operators. For example, depth-first-search expressed by a fold over a functional graph has the same complexity as the corresponding imperative algorithm.

Paper.ps.gz (62K), Paper.pdf (242K)

ML-Source.tar.gz (7K)    ... for an improved implementation, have a look at the Functional Graph Library


Active Patterns

Martin Erwig. 8th Int. Workshop on Implementation of Functional Languages (IFL'96), LNCS 1268, 21-40, 1996

Active patterns apply preprocessing functions to data type values before they are matched. This is of use for unfree data types where more than one representation exists for an abstract value: In many cases there is a specific representation for which function definitions become very simple, and active patterns just allow to assume this specific representation in function definitions. We define the semantics of active patterns and describe their implementation.

Paper.dvi.gz (30K), Paper.ps.gz (57K), Paper.pdf (245K)


Heterogeneous Visual Languages - Integrating Visual and Textual Programming

Martin Erwig and Bernd Meyer. 11th IEEE Symp. on Visual Languages (VL'95), 318-325, 1995

After more than a decade of research, visual languages have still not become everyday programming tools. On a short term, an integration of visual languages with well-established (textual) programming languages may be more likely to meet the actual requirements of practical software development than the highly ambitious goal of creating purely visual languages. In such an integration each paradigm can support the other where it is superior. Particularly attractive is the use of visual expressions for the description of domain-specific data structures in combination with textual notations for abstract control structures. In addition to a basic framework for heterogeneous languages, we outline the design of a development system that allows rapid prototyping of implementations of heterogeneous languages. Examples will be presented from the domains of logical, functional, and procedural languages.

Paper.ps.gz (66K), Paper.pdf (63K)


Encoding Shortest Paths in Spatial Networks

Martin Erwig. Networks, Vol. 26,291-303, 1995

A new data structure is presented which facilitates the search for shortest paths in spatially embedded planar networks in a worst-case time of O(l log r) where l is the number of edges in the shortest path to be found and r is an upper bound on the number of so-called ``cross edges'' (these are edges connecting, for any node v, different shortest path subtrees rooted at v's successors). The data structure is based on the idea to identify shortest path subtrees with the regions in the plane they cover. In the worst-case the space requirement is O(rn), which, in general, is O(n^2), but for regularly shaped networks it is expected to be only O(n\sqrt(n)). A decomposition of graphs into bi-connected components can be used to reduce the sizes of the trees to be encoded and to reduce the complexity of the regions for these trees. The decomposition also simplifies the algorithm for computing encoding regions, which is based on minimum link paths in polygons. Approximations for region boundaries can effectively be utilized to speed up the process of shortest path reconstruction: For a realistically constrained class of networks, i.e., networks in which the ratio of the shortest path distance between any two points to the Euclidean distance between these points is bounded by a constant, it is shown that an average searching time of O(l) can be achieved.


DEAL - A Language for Depicting Algorithms

Martin Erwig. 10th IEEE Symp. on Visual Languages (VL'94),184-185, 1994

Algorithms can be regarded as instructions for building and/or changing data structures. In order to describe algorithms visually we need (i) a visual representation of data structures, and (ii) a (visual) representation of modifications.
An instructive way for explaining algorithms found in many data structure textbooks is to describe changes in data structures by drawing a picture showing the situation before the change and a picture denoting the modified structure. We believe that following this tradition is a good policy for developing a visual algorithmic language. Thus, we will utilize data type icons to denote objects of a certain type, and we will use action rules for describing changes in pictures and the denoted data structures.
Since typical operations often work on subparts of a larger data structure (e.g., add a node to a tree or exchange two elements of an array) it is convenient to be able to denote parts of data structures. In the presented language this is achieved by attaching labels to data type icons at specific positions. Some label positions denote unique parts of the data type whereas other locations mean collections of subobjects. The latter case is exploited to implicitly describe iterations over objects in complex data structures. Let us convey a concrete impression by an example. The action rule that is shown in the following Figure introduces bindings for variables denoting parts of an array A on the left side and describes changes to be performed on A by relocating labels on the right side (exchange y and z).

Bubblesort Picture

The variables i and j below the array denote indices which are interpreted by (nested) loops over A: Here, by default, the first loop for i ranges from 1 to n, and the inner loop for j runs from n down to i. The lower bound of this loop is given by the condition under the arrow, j> i, which also specifies the direction of the loop (> means descending). The condition above the arrow restricts the action described by the right side: The array elements y and z are only exchanged if y< z. Since the label z is located directly adjacent to y both labels are interpreted as adjacent array elements.
Pattern matching is a very powerful feature used by many modern functional languages, such as ML or Haskell, to define functions on structured data. We will use icon patterns, showing the composition of complex data structures, for distinguishing arguments for different cases of action rules and building new structures. Specifically, in building new array structures the concept of array comprehensions is extremely useful. An array comprehension is an expression [x \in A | P(x)] denoting a new array B consisting of all elements of A for which condition P is true. The order of the elements of B is the same as in A. One or more array comprehensions are displayed by an array icon with one or more variables inside (i.e., not denoting the first or last element in the array) for which the conditions are given above the arrow symbol. Consider, e.g., the recursive function definition of Quicksort:

Quicksort Picture

The icon above the box describes how to apply the defined function. The action rule inside the box initially binds x to A[1] (since the label x is located at the beginning of the array icon) and describes three array comprehensions: [a \in A | a< x], [b \in A | b=x], and [c \in A | c> x]. The pattern on the the right side appends the three new arrays defined by the array comprehensions where the quicksort function is recursively applied to the first and to the third one.
In this paper, we compare our visual language with other approaches, and we provide a formal definition together with describing the translation into a textual algorithmic language.


Explicit Graphs in a Functional Model for Spatial Databases

Martin Erwig and Ralf H. Güting. IEEE Transactions on Knowledge and Data Engineering, Vol. 5,No. 6, 787-804, 1994

Observing that networks are ubiquitous in applications for spatial databases, we define a new data model and query language that especially supports graph structures. This model integrates concepts of functional data modeling with order-sorted algebra. Besides object and data type hierarchies graphs are available as an explicit modeling tool, and graph operations are part of the query language. Graphs have three classes of components, namely nodes, edges, and explicit paths. These are at the same time object types within the object type hierarchy and can be used like any other type. Explicit paths are useful because ``real world'' objects often correspond to paths in a network. Furthermore, a dynamic generalization concept is introduced to handle heterogeneous collections of objects in a query. In connection with spatial data types this leads to powerful modeling and querying capabilities for spatial databases, in particular for spatially embedded networks such as highways, rivers, public transport, and so forth. We use multi-level order-sorted algebra as a formal framework for the specification of our model. Roughly spoken, the first level algebra defines types and operations of the query language whereas the second level algebra defines kinds (collections of types) and type constructors as functions between kinds and so provides the types that can be used at the first level.

Paper.ps.gz (123K), Paper.pdf (147K), Long Version (Report).ps.gz (156K), Long Version (Report).pdf (181K)


Graphs in Spatial Databases

Martin Erwig. Doctoral Thesis, FernUniversität Hagen, 1994

This thesis investigates the integration of graph structures into spatial databases. We consider three different perspectives:
First and foremost, we define a functional data model extended by spatial datatypes to which graphs are added by means of special type constructors. Graphs are defined as an orthogonal, ``first-class'', concept of the model which enables the clear modeling of applications and the direct formulation of graph queries. The latter is achieved by presenting the functional model within an algebraic setting where graph operations are directly available as functions. In order to give a precise account for extending algebras by polymorphic type constructors we introduce the formalism of multi-level algebra which allows the description of type systems as well as languages within one formal framework. Multi-level algebra is capable of expressing very subtle details of type systems, such as preventing type constructors from being nested or describing operations with a variable number of arguments. These features are needed by the functional model which is entirely specified by the multi-level algebra formalism.
Second, realizing that all interesting computations on graph structures cannot be captured by a small set of operators --- at least not in an efficient way --- we develop a programming language for the specification of graph algorithms. This is done by formulating a powerful exploration paradigm which regards a large class of graph algorithms as iterations over graph elements. The exact behavior of explorations is primarily determined by special data structures which are supplied as arguments to exploration operators. This graph language is defined and implemented as an extension of ML. The integration with ML allows an even larger class of graph problems to be addressed. The design of the language is chosen on the one hand to provide very succinct and clear descriptions of graph algorithms and on the other hand to retain an efficient implementation. The second aspect is achieved by introducing recursive array definitions which are the formal backbone of the computations performed during explorations. Recursive arrays are a new means for introducing mutable state in an efficient and at the same time declarative way into functional languages.
Finally, we propose data structures and algorithms supporting the efficient realization of some very important graph operations. The first is a geometric index for encoding shortest paths. The idea is to represent shortest path (sub-) trees by the regions they cover in the plane. Point location in these regions is then used to reconstruct shortest paths. Concerning space and time complexity the proposed structure outperforms previous proposals in some practically relevant classes of graphs. The second structure is the graph Voronoi diagram which is a partition of the nodes in a graph according to shortest paths to a set of reference nodes. We give efficient algorithms for the construction of graph Voronoi diagrams and show their use by presenting numerous example queries which are perfectly supported by this structure.


Specifying Type Systems with Multi-Level Order-Sorted Algebra

Martin Erwig. 3rd Int. Conf. on Algebraic Methodology and Software Technology (AMAST'93),177-184, 1993

We propose to use order-sorted algebras (OSA) on multiple levels to describe languages together with their type systems. It is demonstrated that even advanced aspects can be modeled, including, parametric polymorphism, complex relationships between different sorts of an operation's rank, the specification of a variable number of parameters for operations, and type constructors using values (and not only types) as arguments.
The basic idea is to use a signature to describe a type system where sorts denote sets of type names and operations denote type constructors. The values of an algebra for such a signature are then used as sorts of another signature now describing a language having the previously defined type system. This way of modeling is not restricted to two levels, and we will show useful applications of three-level algebras.

Paper.dvi.gz (16K), Paper.ps.gz (44K), Paper.pdf (221K)


Graph Algorithms = Iteration + Data Structures? The Structure of Graph Algorithms and a Corresponding Style of Programming

Martin Erwig. 18th Int. Workshop on Graph Theoretic Concepts in Computer Science (WG'92), LNCS 657, 277-292, 1992

We encourage a specific view of graph algorithms, which can be paraphrased by "iterate over the graph elements in a specific order and perform computations in each step". Data structures will be used to define the processing order, and we will introduce recursive mapping/array definitions as a vehicle for specifying the desired computations. Being concerned with the problem of implementing graph algorithms, we outline the extension of a functional programming language by graph types and operations. In particular, we explicate an exploration operator that essentially embodies the proposed view of algorithms. Fortunately, the resulting specifications of algorithms, in addition to being compact and declarative, are expected to have an almost as efficient implementation as their imperative counterparts.

Paper.dvi.gz (24K), Paper.ps.gz (55K), Paper.pdf (246K)


A Functional DBPL Revealing High Level Optimizations

Martin Erwig and Udo W. Lipeck. 3rd Int. Workshop on Database Programming Languages (DBPL3),306-321, 1991

We present a functional DBPL in the style of FP that facilitates the definition of precise semantics and opens up opportunities for far-reaching optimizations. The language is integrated into a functional data model, which is extended by arbitrary type hierarchies and complex objects. Thus we are able to provide the clarity of FP-like programs together with the full power of semantic data modeling. To give an impression of the special facilities for optimizing functional database languages, we point out some laws not presented before which enable access path selection already on the algebraic level of optimization. The algebraic way of access path optimization also gives new insights into optimization strategies.

Paper.ps.gz (68K), Paper.pdf (217K)


last change: December 3, 2009 Martin Erwig  erwig@eecs.oregonstate.edu