Portable Timing Routines
------------------------

Proposed by:
    Cherri Pancake (Oregon State)
    Reagan Moore (San Diego Supercomputing Center)
    Sally Haerer (National Center for Atmospheric Research)
    Allan Porterfield (Tera Computer Corporation)
    Wayne Smith (Intel Corporation)

1. User Requirement to Be Addressed

The limited availability and the expense of MPP computer time have resulted in development scenarios in which:

 - Code is developed on a workstation prior to uploading it to a parallel machine;
 - The application is benchmarked (in serial) on a workstation to acquire baseline results and timings;
 - The application is run in small-scale parallelism across a few workstations, or in simulated parallelism via multiple processes on a single workstation, for testing and debugging purposes; or
 - Small-scale production runs are carried out on a cluster or on a parallel machine with a very small number of processors, while major runs are executed on an MPP.

From the programmer's perspective, one of the most frustrating aspects of developing a parallel application on one system, for execution on one or more other systems, is the difficulty of timing program execution. The functions available for obtaining timing information, referred to here as timers, vary widely from one computer to another. In many cases, the routines available at the source level (i.e., those not requiring knowledge of assembly language) have poor resolution, often on the order of 10-50 microseconds. The resolution varies significantly from one platform to another. Functionality, such as the ability to acquire data on user vs. system CPU usage, also varies. Many programmers report having to develop their own timing routines, or to acquire them from acquaintances. At worst, the result can be unreliable values and/or serious perturbation of performance.
At best, the user is forced to duplicate the work of system developers (not to mention that of other users), even though the system developers clearly already have the accurate, minimally intrusive timing routines that are needed. Our solution is to propose a uniform API (Application Programmer's Interface) for a library of high-resolution timers supported across current MPP and workstation platforms.

2. Tool Functionality

The Portable Timing Routines (PTR) will provide timers, callable from C and Fortran, with an API that is standard across MPP and workstation platforms. Comments from applications programmers and initial surveys indicate that users are interested in two basic categories of timers: timers that increment in real time (wall-clock time) and timers that increment in process virtual time (CPU time). Within CPU time, interest focuses further on distinguishing time spent executing user code from time spent executing system code. Consequently, these are the types of timers the PTR will provide.

Moreover, users report employing timers almost exclusively in situations where they make repeated calls to the timing routines, save the results, and then use them to calculate "delta times" (i.e., the difference between successive timer values, representing the interval that elapsed between two points of interest). This usage is so prevalent that the PTR routines will also offer elapsed-time calls for each of the timer types.

The accuracy of the timing values returned will be the highest feasible for the PTR implementation on the specific platform, and therefore may vary from one platform to another. Utility functions will be supplied for querying the accuracy of each timer type.
Thus, the programmatic interface provided by the PTR will support the following functionality:

 o Local timer incrementing in real time
 o Local timer incrementing in process virtual time
 o Initializing the timer to a particular value
 o Current value of the timer
 o Resetting the timer to a particular value
 o Elapsed time since the previous call to the timer
 o Query functions for determining the accuracy of the timer on this platform

The library will make it possible for a parallel application program, running (concurrently or consecutively) across multiple platforms, to invoke timers in a relatively platform-independent way; "relatively" is a caveat reflecting the fact that the availability of certain time values, and their resolution, may vary substantially.

This proposal limits itself to providing local timers and to defining CPU time semantics. Working group discussions have established the need for global timers, multiple timers, and timers incrementing in virtual thread time; these will be considered in detail in a separate proposal.

3. Interface to Other Systems on Which the Tool Depends

The purpose of the PTR is to isolate the programmer from variations in timers as he/she executes or migrates a parallel application across multiple platforms. Logically, therefore, the PTR forms an interface layer to the underlying system clock. In most cases, however, the implementation will use the direct assembly language instructions (see Section 5) required to read the appropriate clock in the least intrusive way possible, so there is no real interface, per se, to the underlying runtime system.

4. Plans for Involving Users

Users at the National Center for Atmospheric Research and the San Diego Supercomputing Center will be involved in rigorous testing of the library.

5. Plans for Addressing the Needs of Multiple Platforms

To provide comprehensive support for timing parallel applications, we hope to implement the PTR on as many of the following platforms as possible:

 o Convex (C-series, SPP)
 o Cray (Y/MP, T3D)
 o DEC (Mips/Alpha running either Ultrix or OSF)
 o HP (700/800 workstations)
 o Intel (iPSC/860 and Paragon)
 o IBM (RS/6000 and PowerPC workstations, SP/2)
 o MasPar
 o Meiko (CS/2)
 o SGI (Indy, PowerChallenge series)
 o Sun (Sparc1-10, running either SunOS or Solaris)
 o TMC (CM-5)

Architectural considerations constrain the accuracy of a timer: clock accuracy ranges from ~10 nanoseconds to >50 microseconds, depending on the platform and the model. While the API specification makes such issues transparent to a user of the library, the development team must include strong vendor representation if we are to arrive at the best implementation. We address the issue by defining three levels at which vendors can support development of the library:

 Level One:   A vendor provides the complete C- and Fortran-callable library that conforms to the syntactic and semantic description of the API.

 Level Two:   A vendor provides assembly-level routines that conform to the semantics; project personnel will add the necessary wrappers to make them compatible with the API syntax.

 Level Three: A vendor provides assembly-level code sequences, or advice about which sequences would be best, to offer high accuracy and low intrusiveness; such sequences need not conform precisely to the semantic specification. Project personnel will make the needed changes.

Vendors may choose to support the project at any level, although implementation priority will be given to those who provide higher-level support. Regardless of the level of support, vendors will be asked to provide information about the accuracy and overhead associated with the functions.

6. Related Standards That Will Be Studied for Compatibility

The programmatic interface will be compatible with K&R C, ANSI C, Fortran 77, and Fortran 90. The functionality of the routines will be examined for POSIX compliance (IEEE Std 1003.4, Portable Operating System Interface [POSIX] Part 4: Real Time: Timers and Clocks).

7. Project Outcomes

 o Standard definition of the PTR API, including syntactic and clearly defined semantic specifications for the C- and Fortran-callable routines.
 o Documentation of the perturbation effects and resolutions of the timer routines.