White Paper on

Formation of the High Performance Debugging Forum

February 7, 1997

Jeff Brown, Los Alamos National Laboratory
Joan Francioni, University of Southwestern Louisiana
Cherri Pancake, Oregon State University


Introduction

The working group on Debugging at the Workshop on Software Tools for High Performance Computing Systems (Cape Cod, 1996) addressed the question of how to evaluate debuggers for high performance computing. The group concluded that this was a difficult task, partly because no standards specify what constitutes a parallel debugger. The group proposed that a debugger standards effort be initiated to address this problem. (Similar proposals had been made by other groups, including a national task force on Guidelines for Requirements on System Software and Tools and the previous Cape Cod workshop.) It was decided that the next step would be to present the proposal to a larger audience at the next Supercomputing conference.

A Birds-of-a-Feather session was held at Supercomputing '96 to explore community interest in a debugging standards effort. It was decided at that session to go ahead with this effort and to hold the first formal meeting of the High Performance Debugging Forum in March, in conjunction with the SIAM Conference on Parallel Processing for Scientific Computing. An initial steering committee for the Forum was created, consisting of Jeff Brown, Joan Francioni, and Cherri Pancake.


General Discussion

An effective debugging tool for high performance computing (HPC) must help users find out what they need to know in order to debug their applications. To meet user needs, it must also be easy to use, either portable or consistently implemented across a range of platforms, scalable (to the extent of supporting a range of application sizes), and extensible (at least in terms of accommodating key parallel languages).

The debugging community has yet to come up with one debugger that meets all of these goals. This is not because the goals aren't understood. Rather, it is because debugging tools are very complex pieces of code that must be compatible with a specific machine, a specific compiler, and a specific operating system. As these three components are constantly changing in the high performance computing arena, it has been difficult for researchers and vendors to have access to stable platforms long enough to develop appropriate tools. Further, there are no accepted standards for what debuggers should do, or for what kinds of support compilers and operating systems should provide to the debugging program.

This lack of standardization affects both users and developers of parallel debuggers. Users find they must learn a new tool for every machine where their applications execute. The problems they experience range from non-standard and inconsistent command semantics (e.g., breakpoint and next) to confusing screens full of too many windows and an inability to do the things they need done. The effort required to learn a new tool also frequently turns out to be a major issue. It is not cost-effective for a user to spend a large amount of time learning a new tool for a platform that may not be in existence for very long, or for a tool that is not very useful.

For debugger implementors, the difficulty of writing a debugger as new systems are introduced is greatly increased by the lack of established standards. They find that they are forever re-developing the same functional code as machines, compilers, and operating systems change under their feet. Tool developers are frequently behind the power curve when it comes to actual tool deployment. New and complex computing platforms and programming models require sophisticated development tools in order to exploit their full power, yet development tools are invariably the last software to arrive on a new system. The tools stabilize just in time for the next generation of system to be introduced. This cycle needs to be broken.

Finally, both users and developers cite the lack of user experimentation and user input during the development process as a serious impediment to building useful debugging tools.


High Performance Debugging Forum

The overall goal of the High Performance Debugging Forum (HPDF) is to attack some of the problems just described, by defining a useful and appropriate set of standards relevant to debugging tools for HPC systems. To this end, HPDF will work to accommodate the needs of both users and tool developers. The Forum will include a combination of users, commercial tool developers, and academic/research tool developers. The need for input from all three of these groups is considered critical to this effort. The Parallel Tools Consortium is being asked to sponsor the Forum, and to work with HPDF to ensure appropriate representation of users and commercial developers.

In general, HPDF's efforts will focus on parallel systems and languages that are being used for research and production in the HPC arena. In particular, the scope of effort will be constrained as follows:

  1. The standards developed will not preclude the use of any particular parallel programming model. Not every debugging command has to be applicable to every programming model, but no command should be in conflict with any model.
  2. The standards developed will not preclude the use of any particular parallel architecture.
  3. The standards will be developed to work with structured programming languages. Functional and interpreted languages will not be explicitly considered. The emphasis will be on languages commonly used in scientific programming, such as F77, F90 and C. (Whether to include HPF and C++ is still an open question.)
  4. Standards efforts will address both user level standards and standards for the underlying software infrastructure, such as a common API to compiler-generated information about a program.
  5. Different standards will be addressed in an orderly sequence. In order to expedite their adoption and implementation, HPDF will capitalize where possible on converging technologies, de facto norms, or other instances of "low-hanging fruit."
  6. Where possible, standards will be defined in a series of levels, such that each successive level builds upon the earlier ones.


Initial Focus of Effort

The initial focus of the group will be on defining the syntax and semantics of a standard command-line interface for parallel debuggers. In view of the disparity among current debuggers, the interface will be defined in a series of levels, each addressing a particular subset of debugger functionality.
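
As a rough illustration of what a leveled command-line interface could mean in practice, the following Python sketch tags each command with the lowest level that requires it and checks a debugger's command set against a given level. Everything in it is an assumption made for illustration: the level assignments, the "watch" command, and the helper names are hypothetical, and only "breakpoint" and "next" are command names mentioned in this paper. It is not a proposal for the actual standard.

```python
# Hypothetical sketch: command names and level assignments are illustrative
# assumptions, not part of any HPDF standard.
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    level: int  # lowest conformance level that requires this command


# Assumed example assignments; only "breakpoint" and "next" appear in the text.
COMMANDS = [
    Command("breakpoint", 1),
    Command("next", 1),
    Command("watch", 2),  # hypothetical Level 2 feature
]


def required_commands(level: int) -> set[str]:
    """Commands a debugger conforming to `level` must support.

    Each level builds on the earlier ones, so Level N includes every
    command assigned to Levels 1 through N.
    """
    return {c.name for c in COMMANDS if c.level <= level}


def conforms(supported: set[str], level: int) -> bool:
    """True if `supported` covers every command required at `level`.

    Extra, non-standard commands do not affect conformance.
    """
    return required_commands(level) <= supported
```

Under these assumptions, a debugger that offers "breakpoint", "next", and some vendor-specific commands would conform at Level 1 but not at Level 2, since the vendor-specific commands are simply outside the standard.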


First Meeting

The first meeting of the Forum will be held March 17-18, 1997 at the Hyatt Regency Hotel in Minneapolis (the conference hotel for the SIAM Conference on Parallel Processing for Scientific Computing). The meeting will begin at 1:00 PM on Monday and continue until noon on Tuesday.

There will be no registration fee for persons who pre-register for the first meeting, which will be sponsored by the Northwest Alliance for Computational Science and Engineering. A registration form is attached to the end of this document. Late and on-site registrants will be charged a $75 registration fee to help defray last-minute expenses.

The objectives of the first meeting are:

Initial Effort for Command-Line Interface Levels

This is an initial attempt to identify the features (1) on which we are most likely to achieve consensus, and (2) which are already part of all (or most) parallel debuggers. The rationale is that if a standard requires adding new features, as opposed to "normalizing" existing ones, there will be a significantly longer lead time before the user community sees any results. We recognize that it is hard to keep users participating in standards efforts on good faith alone, and we want to ensure that their efforts pay off.

For that reason, we are suggesting that initial efforts be directed at debugger control features. There is considerable overlap in this type of functionality from one existing debugger to another, and that functionality is well understood. This appears to be an instance of "low-hanging fruit."

It is important to note that excluding a feature from Level 1 does not mean it will be missing from a debugger that conforms to the standard. The original operations will still be there; they just won't be standard yet. Only the Level 1 features will be certain to be consistent.

[General note: The standards developed will be applicable to processes, threads, and sets of both.]

The target completion date for Level 1 is November 1997, with results to be reported at SC'97. It may be desirable to start on other levels before Level 1 is complete, or to establish additional working groups to begin work on tool interfaces, such as Compiler-to-Debugger, Operating System-to-Debugger, Machine-to-Debugger (e.g. disassembly), Display Tool-to-Debugger, or Performance Tool-to-Debugger.


Registration Form

HIGH PERFORMANCE DEBUGGING FORUM
Inaugural meeting

** Deadline for free registration is Friday, March 7 **
** Deadline for hotel discount is Friday, February 21 **

Dates: 1:00 PM, March 17 - 12:00 PM, March 18

Fee: none, if registration is received by Friday, March 7
Late Fee: $75 for late and on-site registrations

Housing: Hyatt Regency in Minneapolis. If you have a SIAM room reservation, it may be extended for our meeting. For new reservations contact the Hyatt's reservation desk at (612) 370-1475, naming the High Performance Debugging Forum or SIAM. The Hyatt will honor the special room rate arranged for SIAM, of $119 per night.

Please register early so that we can plan room counts!!


Questions and comments to jeffb@lanl.gov, 7 February 1997