Portable Timing Routines 

                               Proposed by

|                       Cherri Pancake (Oregon State)
|               Reagan Moore (San Diego Supercomputing Center)
|          Sally Haerer (National Center for Atmospheric Research)
|               Allan Porterfield (Tera Computer Corporation)
|                      Wayne Smith (Intel Corporation)

1.   User Requirement to Be Addressed

The limited availability and expense of MPP computer time have
resulted in development scenarios where:

        -   Code is developed on a workstation prior to uploading it to
            a parallel machine;

        -   The application is benchmarked (in serial) on a workstation
            to acquire baseline results and timings;

        -   The application is run in small-scale parallelism across a
            few workstations, or simulated parallelism via multiple
            processes on a single workstation, for testing and debugging
            purposes; or

        -   Small-scale production runs are carried out on a cluster or a
            parallel machine with a very small number of processors, while
            major runs are executed on an MPP.

From the programmer's perspective, one of the most frustrating aspects of
developing a parallel application on a given system, for execution on one
or more other systems, is the difficulty of timing program execution. The
functions available for obtaining timing information, referred to here as
timers, vary widely from one computer to another. In many cases, the routines 
available at the source level (i.e., those not requiring knowledge of assembly
language) have poor resolution, often on the order of 10-50 microseconds. 
The resolution varies significantly from one platform to another. 
Functionality, such as the ability to acquire data on user vs. system CPU
usage, also varies.

Many programmers report having to develop their own timing routines, or
acquire them from acquaintances. At worst, the result can be unreliable
values and/or serious perturbation of performance. At best, the user is
forced to replicate the efforts of system developers (not to mention the 
efforts of other users), who clearly have the needed accurate, minimally 
intrusive timing routines. 

Our solution is to propose a uniform API (Application Programmer's Interface)
for a library of high resolution timers supported across current MPP and
workstation platforms.

2.   Tool Functionality

The Portable Timing Routines (PTR) will provide timers, callable from C
and Fortran, with an API that is standard across MPP and workstation platforms.
Comments from applications programmers and initial surveys indicate that
users are interested in two basic categories of timers: timers that increment
in real time (wall clock) and timers that increment in process virtual time
(CPU time). Interest in CPU time is further focused on time spent executing user
code vs. system code. Consequently, these are the types of timers that will be
provided through the PTR. Moreover, users report employing timers almost
exclusively in situations where they make repeated calls to the timing
routines, save the results, then use them to calculate "delta times" (i.e.,
the difference between subsequent timer values, representing the interval
that elapsed between two points of interest). This usage is so prevalent
that the PTR routines will also offer elapsed-time calls for each of the 
timer types.
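
The delta-time usage pattern described above can be sketched in C as
follows. This is only an illustration built on the POSIX calls
clock_gettime() and getrusage(); the names ptr_sample, ptr_sample_now
and ptr_sample_delta are hypothetical and are not part of any PTR
specification.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose clock_gettime and getrusage in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical type: one reading of each timer category, in seconds. */
typedef struct {
    double wall;    /* real (wall-clock) time */
    double user;    /* CPU time spent executing user code */
    double system;  /* CPU time spent executing system code */
} ptr_sample;

/* Read all three values at a point of interest. Returns 0 on success. */
int ptr_sample_now(ptr_sample *s)
{
    struct timespec ts;
    struct rusage ru;

    assert(s != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    s->wall   = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
    s->user   = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec * 1e-6;
    s->system = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec * 1e-6;
    return 0;
}

/* Delta time: the interval that elapsed between two saved readings. */
ptr_sample ptr_sample_delta(const ptr_sample *start, const ptr_sample *end)
{
    ptr_sample d;

    assert(start != NULL && end != NULL);
    d.wall   = end->wall   - start->wall;
    d.user   = end->user   - start->user;
    d.system = end->system - start->system;
    return d;
}
```

A caller would take one reading before and one after the region of
interest and pass both to ptr_sample_delta(). On platforms without these
POSIX calls, the body of ptr_sample_now() is the part an implementation
would have to replace.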

The accuracy of the timing values returned will be the highest feasible for
the PTR implementation on the specific platform, and therefore may vary
from one platform to another. Utility functions will be supplied for querying
the accuracy of each timer type.
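
As a rough sketch of what such a query function might look like, the C
code below reports the resolution that a POSIX system advertises for a
clock and, separately, the smallest step actually observed between
successive reads. The function names are hypothetical, and the use of
clock_getres() is an assumption about one possible implementation, not
the PTR design.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose clock_getres and clock_gettime */
#endif
#include <assert.h>
#include <time.h>

/* Resolution in seconds as reported by the system, or -1.0 if the
   clock is not supported on this platform. */
double ptr_reported_resolution(clockid_t clock)
{
    struct timespec res;

    if (clock_getres(clock, &res) != 0)
        return -1.0;
    return (double)res.tv_sec + (double)res.tv_nsec * 1e-9;
}

/* Smallest positive step seen over `samples` advances of the clock.
   The result includes the cost of the read itself, so it is an upper
   bound on the usable resolution rather than an exact figure. */
double ptr_observed_step(clockid_t clock, int samples)
{
    struct timespec prev, cur;
    double step, best = -1.0;
    int seen = 0;

    assert(samples > 0);
    if (clock_gettime(clock, &prev) != 0)
        return -1.0;
    while (seen < samples) {
        if (clock_gettime(clock, &cur) != 0)
            return -1.0;
        step = (double)(cur.tv_sec - prev.tv_sec)
             + (double)(cur.tv_nsec - prev.tv_nsec) * 1e-9;
        if (step > 0.0) {          /* the clock advanced */
            if (best < 0.0 || step < best)
                best = step;
            prev = cur;
            seen++;
        }
    }
    return best;
}
```

Passing CLOCK_MONOTONIC queries a real-time clock; where the platform
supports it, CLOCK_PROCESS_CPUTIME_ID would query a CPU-time clock in
the same way.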

Thus, the programmatic interface provided by the PTR will support the
following functionality:

|     o  Local timer incrementing in real time
|     o  Local timer incrementing in process virtual time
|     o  Initializing the timer to a particular value
|     o  Current value of the timer
|     o  Resetting the timer to a particular value
|     o  Elapsed time since the previous call to the timer
|     o  Query functions for determining the accuracy of the timer on
         this platform
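
One way the initialize, read, reset and elapsed operations listed above
could fit together is sketched below in C. The ptr_timer type and the
ptr_* function names are hypothetical, the semantics shown (for example,
that any read updates the "previous call" used by the elapsed-time
operation) are assumptions, and the sketch relies on POSIX
clock_gettime() rather than on the low-level clock access the project
envisions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose clock_gettime in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical local timer. Use CLOCK_MONOTONIC for real time or,
   where available, CLOCK_PROCESS_CPUTIME_ID for process virtual time. */
typedef struct {
    clockid_t clock;   /* underlying system clock */
    double origin;     /* raw clock reading taken when the timer was set */
    double base;       /* value the timer was set to at that moment */
    double last;       /* timer value returned by the previous call */
} ptr_timer;

/* Raw reading of the underlying clock, in seconds (-1.0 on failure). */
static double ptr_raw_seconds(clockid_t clock)
{
    struct timespec ts;

    if (clock_gettime(clock, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Initialize the timer so that it currently reads `value`. */
void ptr_init(ptr_timer *t, clockid_t clock, double value)
{
    assert(t != NULL);
    t->clock  = clock;
    t->origin = ptr_raw_seconds(clock);
    t->base   = value;
    t->last   = value;
}

/* Current value of the timer. */
double ptr_value(ptr_timer *t)
{
    assert(t != NULL);
    t->last = t->base + (ptr_raw_seconds(t->clock) - t->origin);
    return t->last;
}

/* Reset the timer so that it currently reads `value`. */
void ptr_reset(ptr_timer *t, double value)
{
    assert(t != NULL);
    ptr_init(t, t->clock, value);
}

/* Elapsed time since the previous call on this timer. */
double ptr_elapsed(ptr_timer *t)
{
    double previous;

    assert(t != NULL);
    previous = t->last;
    return ptr_value(t) - previous;
}
```

Storing an origin and a base value, rather than modifying the system
clock, keeps the timer local to the calling process.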

The library will make it possible for a parallel application program,
running (concurrently or consecutively) across multiple platforms, to 
invoke timers in a relatively platform independent way -- where "relative" 
is a caveat reflecting the fact that the availability of certain time 
values and their resolution may vary substantially.

| This proposal limits itself to providing local timers and to defining CPU
| time semantics. The need for global timers, multiple timers, and timers
| incrementing in virtual thread time has been established in working group
| discussions. They will be considered in detail in a separate proposal.

3.   Interface to Other Systems on Which the Tool Depends

The purpose of the PTR is to isolate the programmer from the variations 
in timers as he/she executes or migrates a parallel application across 
multiple platforms. Logically, therefore, PTR forms an interface layer to
the underlying system clock. The implementation, however, will in most cases
make use of the direct assembly language instructions (see below) required
to read the appropriate clock in the least intrusive way possible, so there
is no real interface per se to the underlying runtime system.

4.   Plans for Involving Users

Users at the National Center for Atmospheric Research and the San Diego
Supercomputing Center will be involved in rigorous testing of the library.

5.   Plans for Addressing the Needs of Multiple Platforms

To provide comprehensive support for timing parallel applications, we hope 
to implement the PTR on as many as possible of the following platforms:

     o	Convex (C-series, SPP)

     o	Cray (Y/MP, T3D)

     o	DEC (Mips/Alpha running either Ultrix or OSF)

     o	HP (700/800 workstations)

     o	Intel (iPSC/860 and Paragon)

     o	IBM (RS/6000 and PowerPC workstations, SP/2)

     o	MasPar

     o	Meiko (CS/2)

     o	SGI (Indy, PowerChallenge series)

     o	Sun (Sparc1-10, running either SunOS or Solaris)

     o	TMC (CM-5)

Architectural considerations constrain the accuracy of a timer, with clock
accuracy ranging from ~10 nanoseconds to >50 microseconds, depending on the
platform and the model. While the API specification makes such issues 
transparent to a user of the library, the development team must include
strong vendor representation if we are to arrive at the best implementation.

We address the issue by defining three levels at which vendors can support
development of the library:

        Level One:  A vendor provides the complete C- and Fortran-callable
        library that conforms to the syntactic and semantic description of
        the API.

        Level Two:  A vendor provides assembly-level routines that conform to
        the semantics; project personnel will add the necessary wrappers to
        make them compatible with API syntax.

        Level Three:  A vendor provides assembly-level code sequences, or
        advice about what sequences would be best, to offer high accuracy
        and low intrusiveness; such sequences need not conform precisely to
        the semantic specification. Project personnel will make the needed

Vendors may choose to support the project at any level, although implementation
priority will be given to those who provide higher level support. Regardless 
of the level of support, vendors will be asked to provide information about
the accuracy and overhead associated with the functions.

6.   Related Standards That Will Be Studied for Compatibility

The programmatic interface will be compatible with K&R C, ANSI C, Fortran 77
and Fortran 90.
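
To illustrate what callable from both C and Fortran can involve, the
sketch below pairs a C entry point with a Fortran 77 wrapper. It assumes
a convention used by many Unix Fortran compilers (external names in
lower case with a trailing underscore, arguments passed by reference);
other compilers use different conventions, so the wrapper name would
have to be adapted per platform. The names ptr_wall_seconds and PTRWAL
are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose clock_gettime in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* C entry point: real (wall-clock) time in seconds, or -1.0 on failure. */
double ptr_wall_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Fortran 77 entry point, under the assumed naming convention.
   Intended Fortran usage:
         DOUBLE PRECISION T
         CALL PTRWAL(T)
   Fortran passes T by reference, hence the pointer argument. */
void ptrwal_(double *seconds)
{
    assert(seconds != NULL);
    *seconds = ptr_wall_seconds();
}
```

The wrapper is a subroutine rather than a function because conventions
for returning floating-point function results differ between compilers.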

The functionality of the routines will be examined for POSIX compliance
(IEEE Std 1003.4, Portable Operating System Interface [POSIX] Part 4: real
time: timers and clocks).

7.   Project Outcomes

Standard definition of the PTR's API, including syntactic and clearly
defined semantic specifications for the C and Fortran callable routines.

Documentation of the perturbation effects and resolutions of the timer
