White Paper on
Formation of the High Performance Debugging Forum
February 7, 1997
Jeff Brown, Los Alamos National Laboratory
Joan Francioni, University of Southwestern Louisiana
Cherri Pancake, Oregon State University
Introduction
The working group on Debugging at the Workshop on Software Tools for
High Performance Computing Systems (Cape Cod, 1996)
addressed the question of how to evaluate debuggers for high performance
computing. The group concluded that this was a difficult task, partially due
to the lack of any standards specifying what constitutes a parallel debugger.
The group proposed that a debugger standards effort be initiated to address
this problem. (Similar proposals had been made by other groups, including a
national task force on
Guidelines for Requirements on System Software and Tools and the previous
Cape Cod workshop.)
It was decided that the next step would be to present the proposal to
a larger audience at the next Supercomputing conference.
A Birds-of-a-Feather session was held at Supercomputing '96 to explore
community interest in a debugging standards effort. It was
decided at that session to go ahead with this effort and to hold the first
formal meeting of the High Performance Debugging Forum in March, in conjunction
with the SIAM
Conference on Parallel Processing for Scientific Computing. An initial
steering committee for the Forum was created, consisting of Jeff Brown,
Joan Francioni, and Cherri Pancake.
General Discussion
The goal of an effective debugging tool for high performance
computing (HPC) is to help users find out what they need
to know in order to debug their applications. To meet user needs, it
must also be easy to use, either portable or consistently implemented across a
range of platforms, scalable (to the extent of supporting a range of
application sizes), and extensible (at least in terms of accommodating key
parallel languages).
The debugging community has
yet to produce a debugger that meets all of these goals. This is
not because the goals are not understood. Rather, it is because debugging
tools are very complex pieces of code that must be compatible with a
specific machine, a specific compiler, and a specific operating
system. Because these three components are constantly changing in the high
performance computing arena, it has been difficult for researchers and vendors
to have access to stable platforms long enough to develop
appropriate tools. Further, there are no accepted standards for what
debuggers should do, nor for what kinds of support compilers and operating
systems should provide to the debugger.
This lack of standardization affects both users and developers of
parallel debuggers. Users find they must learn a new tool for every machine
where their applications execute. The problems they experience range from
non-standard and inconsistent command semantics (e.g., breakpoint and
next) to confusing screens full of too many windows and an inability to
do the things they need done. The effort required to learn a new tool also
frequently turns out to be a major issue. It is not cost-effective for
a user to spend a large amount of time learning a new tool for a platform
that may not remain in service for long, or for a tool that is not very
useful.
For debugger implementors, the difficulty of writing a debugger as new
systems are introduced is greatly increased by the lack of established
standards. They find that they are forever re-developing the same
functional code as machines, compilers, and operating systems change under
their feet. Tool developers are frequently behind the power curve when it
comes to actual tool deployment. New and complex computing platforms and
programming models require sophisticated development tools in order to
exploit their full power, yet development tools are invariably the last
software to arrive on a new system. The tools stabilize just in time for
the next generation of systems to be introduced. This cycle needs to be broken.
Finally, both users and developers cite the lack of
user experimentation and user input during the development process
as a serious impediment to building useful debugging tools.
High Performance Debugging Forum
The overall goal of the High Performance Debugging Forum (HPDF) is to
attack some of the problems just described by defining a useful and
appropriate set of standards relevant to debugging tools for HPC systems.
To this end, HPDF will work to accommodate the needs of both users and
tool developers. The Forum will include a combination of users, commercial
tool developers, and academic/research tool developers. The need for input
from all three of these groups is considered critical to this effort. The
Parallel Tools Consortium is being asked to sponsor the Forum, and to work
with HPDF to ensure appropriate representation of users and commercial
developers.
In general, HPDF's efforts will focus on parallel systems and
languages that are being used for research and production in the HPC
arena. In particular, the scope of effort will be constrained as follows:
- The standards developed will not preclude the use of any particular
parallel programming model. Not every debugging command has to
be applicable to every programming model, but no command should
conflict with any model.
- The standards developed will not preclude the use of any particular
parallel architecture.
- The standards will be developed to work with structured programming
languages. Functional and interpreted languages will not be explicitly
considered. The emphasis will be on languages commonly used in
scientific programming, such as F77, F90 and C. (Whether HPF and C++
should also be included is still an open question.)
- Standards efforts will address both user level standards and
standards for the underlying software infrastructure, such as a common
API to compiler-generated information about a program.
- Different standards will be addressed in an orderly sequence. In order
to expedite their adoption and implementation, HPDF will capitalize
where possible on converging technologies, de facto norms, or other
instances of "low-hanging fruit."
- Where possible, standards will be defined in a series of levels,
such that each successive level builds upon the earlier ones.
Initial Focus of Effort
The initial focus of the group will be on defining the syntax and
semantics of a standard command-line interface for parallel debuggers.
In view of the disparity among current debuggers, the interface will
be defined in a series of levels, each addressing a particular subset
of debugger functionality.
First Meeting
The first meeting of the Forum will be held March 17-18, 1997 at
the Hyatt Regency Hotel in Minneapolis (the conference hotel for
the SIAM
Conference on Parallel Processing for Scientific Computing).
The meeting will begin at 1PM on Monday and continue until noon
on Tuesday.
There will be no registration fee for persons who pre-register for
the first meeting, which will be
sponsored by the Northwest Alliance
for Computational Science and Engineering. A registration form
appears at the end of this document. Late and on-site registrants will be
charged a $75 registration fee to help defray last-minute expenses.
The objectives of the first meeting are:
- define what debugging features will be included in Level 1 and how
to partition the work
- determine how much consensus already exists among parallel
debuggers in terms of Level 1 features
- lay out a plan to accomplish a Level 1 specification by SC'97
- determine milestones and specific goals
- establish initial working groups
Initial Effort for Command-Line Interface Levels
This is an initial attempt to identify the features (1) on which we are most
likely to achieve consensus, and (2) which are already part of all (or most)
parallel debuggers. The rationale is that if a standard requires
adding new features (as opposed to "normalizing" existing
features), there will be a significantly longer lead time before
any results are felt by the user community. We recognize that it is
hard to keep users participating in standards efforts on good faith alone,
and we want to ensure that their efforts pay off.
For that reason, we are suggesting that initial efforts be directed at
debugger control features. There is considerable overlap in this
type of functionality from one existing debugger to another, and that
functionality is well understood. This appears to be an instance
of "low-hanging fruit."
It is important to note that a feature excluded from Level 1 will not
necessarily be missing from a debugger that conforms to the
standard. The original operations will still be there; they
just won't be standardized yet. Only the Level 1 features are certain
to be consistent across debuggers.
[General note: The standards developed will be applicable to processes,
threads, and sets of both.]
- Level 1: Command-Line Interface of Control Features
    - Execution Control: intermittent execution, including conditionals and
      watchpoints, stepped execution
    - Process/Thread Sets: defining sets, applying sets
    - Process/Thread/Processor States: examining state, modifying state
    - Program State: identifying code location, viewing source code
- Level 2: Command-Line Interface of Output Features
    - Data Display: displaying values of scalars and arrays, selecting
      portions of output (data filtering), targeted variables,
      targeted output values (A[i[j]] = 100), tabular and other display
      formats
    - Data Modification
    - Watchpoints and conditional watchpoints
- Level 3: Communication Features
- Level 4: Optimized/Assembly Code, Code Patching, other more advanced
  features?
The target completion date for Level 1 is November 1997, with results
to be reported at SC'97.
It may be desirable to start on other levels before Level 1 is complete,
or to establish additional working groups to begin work on tool interfaces,
such as Compiler-to-Debugger, Operating System-to-Debugger, Machine-to-Debugger
(e.g. disassembly), Display Tool-to-Debugger, or Performance Tool-to-Debugger.
Registration Form
HIGH PERFORMANCE DEBUGGING FORUM
Inaugural meeting
** Deadline for free registration is Friday, March 7 **
** Deadline for hotel discount is Friday, February 21 **
Dates: 1:00 PM, March 17 - 12:00 PM, March 18
Fee: none, if registration is received by Friday, March 7
Late Fee: $75 for late and on-site registrations
Housing: Hyatt Regency in Minneapolis. If you have a SIAM room
reservation, it may be extended for our meeting. For new reservations, contact
the Hyatt's reservation desk at (612) 370-1475 and mention the High Performance
Debugging Forum or SIAM. The Hyatt will honor the special room rate of
$119 per night arranged for SIAM.
Please register early so that we can plan room counts!
Reply to: Joan Winter (Oregon State University), winter@cs.orst.edu
Questions and comments to
jeffb@lanl.gov, 7 February 1997