Parallel Tools Consortium Projects

Message Queue Manager


One of the most common sources of error in development of parallel applications is message passing. Message passing bugs can lead to deadlock, wrong answers, memory violations, or almost any other type of application failure.

Unfortunately, there are few mechanisms available for isolating such bugs. Those that do exist use somewhat cryptic output to describe the status of an application's message operations. Many users simply add print statements to their application to create a program trace, which they can use to analyze what happened.

The Ptools Message Queue Manager project was formed in response to this need. Its goal is a parallel debugging interface for examining the state of an application's message passing. The interface is general enough to support a variety of hardware platforms and message passing systems, yet specific enough to give the programmer the information needed to find message passing bugs easily.



What the Message Queue Manager Does

The Message Queue Manager (MQM) provides a graphical user interface (GUI) for examining a parallel application's message passing state. MQM provides access to information about pending message operations at various levels of granularity, allowing the user to zoom in to focus on a particular process, or zoom out to see a snapshot view of the entire application. To view message passing status, MQM requires the parallel application to be in a stopped state (e.g., at a debugger breakpoint).
Main display of MQM

The main display shows the number of pending message operations for each process. Note that the controls are very simple: only two windows and a few buttons. The display is scalable, since it grows as the number of processes increases (256 are shown here). The user can view the status of these operations from many perspectives, using any combination of the following:

Supplementary displays allow the user to focus on a subset of the processes or view the specific message operations associated with a particular process.
Filter control window
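
The zoomed-out and zoomed-in views described above can be illustrated with a minimal sketch. The record layout (PendingOp and its fields) and the function names are assumptions made for this example; they are not MQM's actual data format or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingOp:
    """One pending message operation (hypothetical record layout)."""
    process: int  # process that posted the operation
    kind: str     # "send" or "recv"
    peer: int     # process at the other end of the message
    tag: int      # message tag


def summarize(ops):
    """Zoomed-out view: number of pending operations per process."""
    return Counter(op.process for op in ops)


def focus(ops, processes=None, kind=None):
    """Zoomed-in view: keep only operations that match every given filter."""
    return [
        op for op in ops
        if (processes is None or op.process in processes)
        and (kind is None or op.kind == kind)
    ]


# Illustrative data only: processes 0 and 1 each wait to receive from the other.
ops = [
    PendingOp(process=0, kind="recv", peer=1, tag=7),
    PendingOp(process=1, kind="recv", peer=0, tag=7),
    PendingOp(process=1, kind="send", peer=2, tag=3),
    PendingOp(process=2, kind="recv", peer=1, tag=9),
]
counts = summarize(ops)
waiting = focus(ops, processes={0, 1}, kind="recv")
```

Here counts plays the role of the main display (one number per process), while waiting corresponds to narrowing the view to a subset of processes and one kind of operation.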


How the Message Queue Manager Works

MQM consists of a Graphical User Interface (GUI) and a library that provides an interface to the target message passing system. This library, the Query Manager, translates requests for message status from MQM into whatever system-dependent form is required, and translates the resulting data into the standard MQM format.

The Query Manager is designed to provide an interface to a group of daemons in a cluster system, the run-time system of a parallel machine, or to a debugger or monitor that provides adequate functionality. Implementation of a query manager interfacing to the IPD debugger on the Intel Paragon was straightforward and took only a few days.
Structure of MQM
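
The translation role of the Query Manager can be sketched as an adapter: the GUI calls one fixed interface, and each back end converts its own system-dependent output into a common record format. Everything named here (QueryManager, TextBackend, the "kind peer tag" line format, the dictionary keys) is hypothetical and chosen only for illustration; it does not reproduce the real library interface or the output of IPD.

```python
from abc import ABC, abstractmethod


class QueryManager(ABC):
    """Hypothetical front-end-facing interface (not the real MQM library)."""

    @abstractmethod
    def pending_operations(self, process):
        """Return the pending operations of one process as a list of dicts."""


class TextBackend(QueryManager):
    """Stand-in back end whose native output is made-up 'kind peer tag' lines."""

    def __init__(self, raw_output):
        # raw_output maps a process number to the text lines the (imaginary)
        # debugger or daemon printed for that process.
        self._raw = raw_output

    def pending_operations(self, process):
        records = []
        for line in self._raw.get(process, []):
            kind, peer, tag = line.split()
            # Translate the system-dependent text into the common format.
            records.append(
                {"process": process, "kind": kind,
                 "peer": int(peer), "tag": int(tag)}
            )
        return records


backend = TextBackend({0: ["recv 1 7"], 1: ["recv 0 7", "send 2 3"]})
result = backend.pending_operations(1)
```

Under this arrangement, supporting another platform would mean writing another subclass; the code that consumes the records would not change.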

The GUI is implemented as a set of Tcl/Tk scripts that are executed by an embedded Tcl interpreter. These scripts can be configured so that features not supported by a particular target platform can be removed from the interface.
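
One way to picture this kind of configuration is a table that maps each interface feature to the platform capabilities it needs, with features dropped when a requirement is missing. This is only a sketch of the idea; the feature names, capability names, and the enabled_features function are invented for the example, and the actual MQM scripts are Tcl/Tk rather than Python.

```python
# Hypothetical feature table: feature name -> capabilities it depends on.
GUI_FEATURES = {
    "process_grid": set(),                    # needs only per-process counts
    "operation_list": {"per_op_detail"},      # needs individual operations
    "tag_filter": {"per_op_detail", "tags"},  # also needs message tags
}


def enabled_features(platform_capabilities):
    """Return, in sorted order, the features whose needs the platform meets."""
    return sorted(
        name for name, needs in GUI_FEATURES.items()
        if needs <= platform_capabilities  # subset test on sets
    )


minimal = enabled_features(set())
partial = enabled_features({"per_op_detail"})
full = enabled_features({"per_op_detail", "tags"})
```

A platform that reports no optional capabilities would keep only the basic grid, while one that exposes per-operation detail and tags would keep all three features.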


How You Can Participate

The MQM interface implementations for each machine will be made available royalty-free as source code (Tcl/Tk and C). This project is drawing to a close, but help is needed to complete it successfully.

If you are a parallel computer user, you should be aware that significant user input helped to guide every phase of the design of MQM. Although this phase of the project has come to a close, we are still interested in hearing your views on how the design might be improved.

If you work for a parallel computer vendor, we need your help in providing the users with an interface for examining the status of an application's message operations on your company's platform(s). Ptools is willing to make the interface between MQM and your system available "anonymously" to the general community, if your company prefers not to assume responsibility for the accuracy or longevity of the mechanism.

Current Status

A Tcl/Tk graphical interface has been developed in prototype form to show:

The prototype has been successfully integrated with Intel's Interactive Parallel Debugger (IPD) for the Paragon. Functionality to support MQM has been implemented in the AIMS toolset from NASA, which supports PVM clusters, but MQM has not yet been integrated with AIMS.

For More Information

Visit the Web pages at http://www.ptools.org
This document was last updated 27 November '95.