Parallel Tools Consortium Projects

Portable Timing Routines


When it's time to tune the performance of a parallel (or serial!) application, the first order of business is to measure the amount of time spent executing key portions of the source code. Timing data can be used to compare the relative performance of alternative algorithms, data distributions, message sizes, and so forth. Timing data also make it possible to measure improvement against some earlier version of the application.

Although all parallel computers and workstations support some type of timer function, the timers vary in resolution (the smallest time increment a measurement can distinguish) and intrusiveness (how long it takes to acquire a measurement). The only timing routines currently supported across a range of computers, the UNIX time-of-day functions, are both coarse and highly intrusive.
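As a rough illustration of these two properties, the following C sketch estimates both for the UNIX time-of-day call gettimeofday. It is not part of PTR, and the function names are hypothetical. Resolution is estimated as the smallest positive step between successive readings. Intrusiveness is estimated as the average cost of one call over a long run of back-to-back calls. The sketch assumes a POSIX/XSI system.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Convert a gettimeofday() reading to microseconds. */
static long long tv_to_usec(const struct timeval *tv) {
    return (long long)tv->tv_sec * 1000000LL + (long long)tv->tv_usec;
}

/* Resolution: spin until the reported time changes, keeping the smallest
   positive step seen over several trials (microseconds). Returns -1 if no
   positive step was observed (e.g. the clock was stepped backward). */
long long tod_resolution_usec(int trials) {
    assert(trials > 0);
    long long best = -1;
    for (int i = 0; i < trials; i++) {
        struct timeval a, b;
        gettimeofday(&a, NULL);
        do {
            gettimeofday(&b, NULL);
        } while (tv_to_usec(&b) == tv_to_usec(&a));
        long long step = tv_to_usec(&b) - tv_to_usec(&a);
        if (step > 0 && (best < 0 || step < best))
            best = step;
    }
    return best;
}

/* Intrusiveness: average cost of one gettimeofday() call (microseconds),
   estimated by timing a long run of back-to-back calls. */
double tod_overhead_usec(int calls) {
    assert(calls > 0);
    struct timeval start, end, scratch;
    gettimeofday(&start, NULL);
    for (int i = 0; i < calls; i++)
        gettimeofday(&scratch, NULL);
    gettimeofday(&end, NULL);
    return (double)(tv_to_usec(&end) - tv_to_usec(&start)) / calls;
}
```

Calls such as tod_resolution_usec(100) and tod_overhead_usec(100000) give only rough figures. The overhead estimate also includes the loop's own bookkeeping.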

There are also machine-specific routines, but their names and arguments vary widely and in some cases they are undocumented. Many users report having to "roll their own" timers, because the UNIX routines are not accurate enough and because they simply do not know how to access the routines provided by the operating system.

The Ptools Portable Timing Routines project was formed in response to this need. Its goal is a timer library that can be called in the same way from applications running on any parallel computer or workstation. The library routines will be implemented differently on each machine, however, so that they can take advantage of the finest resolution, least intrusive timers available.
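One calling interface with machine-specific bodies can be sketched in C using conditional compilation. This is an illustration under stated assumptions, not PTR code. The name hyp_read_wall_ticks is hypothetical. The x86-64 branch uses the GCC/Clang __rdtsc() intrinsic to read the CPU time-stamp counter, and every other platform falls back to the POSIX monotonic clock. Callers see the same function everywhere, but the units of the returned value differ by platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>   /* __rdtsc() */
#define HYP_USE_TSC 1
#endif

/* Hypothetical machine-independent entry point: every platform sees the
   same name and signature; only the body changes. The returned value is
   in platform-specific units and should not be interpreted directly. */
uint64_t hyp_read_wall_ticks(void) {
#ifdef HYP_USE_TSC
    /* x86-64 with GCC or Clang: read the CPU time-stamp counter. */
    return (uint64_t)__rdtsc();
#else
    /* Fallback: POSIX monotonic clock, packed into nanoseconds. */
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}
```

Because the units differ between branches, a caller can only subtract two readings taken on the same machine. Turning that difference into seconds requires knowing the platform's tick rate.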



What Portable Timing Routines Do

The Portable Timing Routines (PTR) library provides a common application programming interface (API) for accessing machine-dependent timers. (Strictly speaking, "universal" describes the library better than "portable": the interface is the same everywhere, but the library code itself differs from one machine to another.) All PTR routines are callable from Fortran and C programs.

The PTR library provides access to the least intrusive timer mechanisms currently available on each machine. Routines are available to retrieve the values of three timers. The wallclock timer measures elapsed "real" time in some fraction of a second. The user CPU timer measures the CPU time attributed to the user's program. The system CPU timer measures the CPU time used for system activities. If the machine does not support one or more of these timers, the corresponding PTR routine returns an appropriate status code.
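On a POSIX system, the three kinds of time correspond roughly to a monotonic clock (wallclock) and to the user and system CPU fields reported by getrusage. The following C sketch reads them and returns a status code when a timer is unavailable. The names hyp_wall_seconds and hyp_cpu_seconds and the HYP_* status codes are hypothetical and are not the PTR API. Unlike PTR, the sketch converts to seconds immediately, for readability.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical status codes standing in for PTR's "appropriate status code". */
enum { HYP_OK = 0, HYP_UNSUPPORTED = 1 };

/* Wallclock: elapsed real time from a monotonic clock, in seconds. */
int hyp_wall_seconds(double *out) {
    assert(out != NULL);
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return HYP_UNSUPPORTED;
    *out = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
    return HYP_OK;
}

/* User and system CPU time consumed so far by this process, in seconds. */
int hyp_cpu_seconds(double *user, double *sys) {
    assert(user != NULL && sys != NULL);
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return HYP_UNSUPPORTED;
    *user = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec * 1e-6;
    *sys  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec * 1e-6;
    return HYP_OK;
}
```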

PTR routines do not provide a timestamp facility for recording precisely when an event occurred. Rather, they are used to determine the amount of time that elapses between any two events in execution. Inserting a call to the PTR wallclock timer immediately before and after a loop, for example, measures the length of time occupied by execution of that loop.

Since each timer reflects the values for a single processing element (the one executing the PTR routine), these timers are actually serial rather than parallel. However, it is possible to derive reasonable estimates of parallel time using the values returned by the routines.
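One common way to turn per-processor (serial) measurements into a parallel estimate is shown here as an assumption, not as the PTR group's method. Collect each processor's elapsed wallclock interval for the same region, then take the maximum as the parallel time, because the region is not finished until the slowest processor finishes. The ratio of maximum to mean gives a simple load-imbalance measure. The estimate is only meaningful if the processors enter the region at about the same time (for example, right after a barrier). How the intervals are gathered, such as with a message-passing collective, is left out, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of per-processor elapsed wallclock intervals (seconds). */
typedef struct {
    double parallel_time; /* slowest processor: when the last one finishes */
    double mean_time;     /* average interval across processors */
    double imbalance;     /* parallel_time / mean_time; 1.0 means balanced */
} hyp_par_summary;

hyp_par_summary hyp_summarize(const double *interval, size_t nprocs) {
    assert(interval != NULL && nprocs > 0);
    hyp_par_summary s = {interval[0], 0.0, 0.0};
    double sum = 0.0;
    for (size_t p = 0; p < nprocs; p++) {
        if (interval[p] > s.parallel_time)
            s.parallel_time = interval[p];
        sum += interval[p];
    }
    s.mean_time = sum / (double)nprocs;
    s.imbalance = s.mean_time > 0.0 ? s.parallel_time / s.mean_time : 1.0;
    return s;
}
```

For intervals of 2, 4, 3, and 3 seconds on four processors, hyp_summarize reports a parallel time of 4 seconds, a mean of 3 seconds, and an imbalance of about 1.33.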

Also, because the library implementation is machine-dependent, timer characteristics (resolution, intrusiveness, and accuracy) vary from one machine to another. The PTR timers are not intended to provide standardized timing measures, nor to produce times that can be compared across machines.
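Because resolution varies by machine, it can help to ask the system what resolution it claims. On POSIX systems, clock_getres reports the nominal resolution of a clock. The following C helper wraps that call under a hypothetical name and is not part of PTR. The reported figure says nothing about intrusiveness or accuracy, which still have to be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Store the resolution the system reports for a POSIX clock, in
   nanoseconds. Returns 0 on success, -1 if the clock is not supported. */
int hyp_clock_resolution_ns(clockid_t id, long long *ns) {
    assert(ns != NULL);
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    *ns = (long long)res.tv_sec * 1000000000LL + (long long)res.tv_nsec;
    return 0;
}
```

Calling it with both CLOCK_MONOTONIC and CLOCK_REALTIME, for example, shows whether those two clocks differ on a given machine.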

How Portable Timing Routines Work

PTR routines are intended to measure time intervals, that is, the amount of time that elapses between two points in execution. Calls to the routines for a given type of timer (wallclock, user CPU, or system CPU) are used in pairs, bracketing the code to be measured. Figure 1 shows how the wallclock and user CPU timers are used to measure the time spent executing one section of an application, in this case a loop.

      INCLUDE "ftimer.h"
      INTEGER rc
      DOUBLE PRECISION wall_time1, wall_time2, wall_interval
      DOUBLE PRECISION usr_time1, usr_time2, usr_interval

C  Initialize Timers
      CALL PTR_INIT_WALL_TIMER(PTR_WALL_NOM_TICK, 0, rc)
      CALL PTR_INIT_USR_TIMER(PTR_USR_NOM_TICK, 0, rc)
C     ... other code ...

C  Get time before loop
      CALL PTR_GET_WALL_TIME (wall_time1)
      CALL PTR_GET_USR_TIME (usr_time1)

      DO 10 I=1,10
C     ... other code ...      
   10 CONTINUE

C  Get time after loop   
      CALL PTR_GET_WALL_TIME (wall_time2)
      CALL PTR_GET_USR_TIME (usr_time2)
C     ... other code ...

C  Get the time interval
      CALL PTR_GET_WALL_INTERVAL 
     *          (wall_time1, wall_time2, wall_interval, rc)
      CALL PTR_GET_USR_INTERVAL
     *          (usr_time1, usr_time2, usr_interval, rc)
      
      PRINT *,"TOTAL ELAPSED WALLCLOCK TIME ", wall_interval
      PRINT *,"TOTAL ELAPSED USER CPU TIME ", usr_interval
      
      END

Figure 1. Example of calls to wallclock and user CPU timers
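For readers working in C, here is a rough analogue of Figure 1. No C calling interface for PTR is given here, so as an assumption for illustration the sketch uses standard calls in place of PTR routines. It uses clock_gettime with CLOCK_MONOTONIC for wallclock time and the ISO C clock() function for processor time. Note that clock() reports user plus system CPU time, not user CPU time alone. The function name time_loop is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds between two monotonic-clock readings. */
static double elapsed_seconds(const struct timespec *t1, const struct timespec *t2) {
    return (double)(t2->tv_sec - t1->tv_sec) + (double)(t2->tv_nsec - t1->tv_nsec) * 1e-9;
}

/* Mirrors Figure 1: bracket a loop with wallclock and CPU readings,
   then compute and print the intervals afterwards. */
void time_loop(double *wall_interval, double *cpu_interval) {
    struct timespec wall_time1, wall_time2;
    volatile double sink = 0.0;          /* stands in for "... other code ..." */

    /* Get time before loop */
    int rc = clock_gettime(CLOCK_MONOTONIC, &wall_time1);
    clock_t cpu_time1 = clock();
    assert(rc == 0 && cpu_time1 != (clock_t)-1);
    (void)rc;

    for (int i = 1; i <= 10; i++)
        sink += i;

    /* Get time after loop */
    clock_gettime(CLOCK_MONOTONIC, &wall_time2);
    clock_t cpu_time2 = clock();

    /* Get the time intervals, in seconds */
    *wall_interval = elapsed_seconds(&wall_time1, &wall_time2);
    *cpu_interval = (double)(cpu_time2 - cpu_time1) / CLOCKS_PER_SEC;

    printf("TOTAL ELAPSED WALLCLOCK TIME %g\n", *wall_interval);
    printf("TOTAL ELAPSED CPU TIME %g\n", *cpu_interval);
}
```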


Unlike the UNIX time routines, PTR timers do not return a value in a particular time unit, such as seconds. Remember that the point of the timers is to be as unintrusive as possible, so that the times reported are as accurate as possible. Reporting times in a meaningful unit would require converting the timer register value, a process that is simply too intrusive. Therefore, the PTR routines return opaque data values. These values are meaningless in themselves (bit patterns counting clock ticks at some machine-specific rate), but they are fast to retrieve.
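The "capture now, convert later" pattern can be sketched in C, assuming a POSIX monotonic clock as the underlying timer. The stored struct timespec plays the role of the opaque value. The names hyp_mark and hyp_seconds_between are hypothetical. During the run, hyp_mark performs one clock read and one store, with no arithmetic or formatting. Conversion to seconds happens only when hyp_seconds_between is called, typically after the measured code has finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define HYP_MAX_STAMPS 1024

/* Raw clock readings, stored but never converted while the program runs. */
static struct timespec hyp_stamps[HYP_MAX_STAMPS];
static int hyp_nstamps = 0;

/* Cheap capture: one clock read and one store. Returns the stamp index,
   or -1 if the buffer is full. */
int hyp_mark(void) {
    if (hyp_nstamps >= HYP_MAX_STAMPS)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &hyp_stamps[hyp_nstamps]);
    return hyp_nstamps++;
}

/* Deferred conversion, typically at program end: seconds from stamp i to stamp j. */
double hyp_seconds_between(int i, int j) {
    assert(i >= 0 && j >= 0 && i < hyp_nstamps && j < hyp_nstamps);
    const struct timespec *a = &hyp_stamps[i];
    const struct timespec *b = &hyp_stamps[j];
    return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) * 1e-9;
}
```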

Figure 2 illustrates how the opaque values are converted into a unit that makes sense to the programmer. At some point in execution when intrusiveness is no longer an issue (typically at the end of the program), the values acquired at the start and end of each interval are passed as arguments to PTR's interval calculation routines. Each interval is calculated from the opaque timer values and then converted to a floating-point value representing seconds. This value can be printed or used by the program as the basis for statistical analyses of performance.



Figure 2. Relationship between opaque timer values and elapsed seconds
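The conversion can be written as elapsed seconds = ((t2 - t1) mod 2^w) / f. Here t1 and t2 are raw readings of a w-bit tick counter, and f is its tick rate in ticks per second. The following C sketch assumes a free-running 32-bit counter with a known, constant tick rate. Both are assumptions, since PTR's internal representation is not specified here. The function name is hypothetical. Unsigned subtraction in C is already modulo 2^32, so a single wraparound between the two readings needs no special case.

```c
#include <assert.h>
#include <stdint.h>

/* Convert two raw readings of a free-running 32-bit tick counter into
   elapsed seconds. Assigning the difference to uint32_t reduces it modulo
   2^32, so one counter wraparound between readings is handled; more than
   one wraparound cannot be detected. */
double hyp_ticks_to_seconds(uint32_t start, uint32_t end, double ticks_per_second) {
    assert(ticks_per_second > 0.0);
    uint32_t elapsed_ticks = end - start;
    return (double)elapsed_ticks / ticks_per_second;
}
```

For example, hyp_ticks_to_seconds(100, 600, 1000.0) returns 0.5. hyp_ticks_to_seconds(0xFFFFFFF0, 0x10, 1.0e6) returns 3.2e-05 (32 ticks), even though the counter wrapped between the readings.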


How You Can Participate

The PTR library implementation for each machine will be made available royalty-free, in the form of source code (C, Fortran, and in some cases assembly language). The assistance of both users and computer vendors is needed to complete this project in a timely fashion.

If you are a potential PTR user, we need your help in reviewing the library API and testing library routines on various target machines. PTR routines have been targeted for all major workstations and parallel machines (see the Web pages for the current list of targets). Users whose needs are not met by these implementations are encouraged to contact the working group with their suggestions.

If you work for a workstation or parallel computer vendor, we need your help in providing the users with the most efficient timing mechanisms for your company's platform(s). Ptools is willing to make your routines or mechanisms available "anonymously" to the general community, if your company prefers not to assume responsibility for the accuracy or longevity of the mechanism.

Current Status

The Fortran calling interface for PTR has been defined. Implementations of the library are being completed incrementally (see the Web pages for an up-to-date list).

The C calling interface is in progress, and initial library implementations should be available soon. The precise status of these should be verified through the Web pages.

Implementation will continue on an ongoing basis, as new machines and new timer support become available.



For More Information

Visit the PTR Web pages at http://www.nero.net/~pancake/ptools/ptr. These provide the most up-to-date information on the PTR project.

The PTR working group is open to all interested participants. The email reflector for working group discussions is ptools-ptr@ptools.org. To subscribe to or unsubscribe from the list, send one of the following lines to majordomo@ptools.org:

subscribe ptools-ptr
unsubscribe ptools-ptr

The Parallel Tools Consortium, ptools@ptools.org
Web pages at http://www.ptools.org