Email Discussions Prior to 23 May '95

Application Programmer's Interface

Donna Bergmark:
Times should always be returned as two 64-bit integers (or, two big words, I don't care): one for the seconds and one for the nanoseconds. This will allow really long jobs to run for weeks without overflowing the seconds timer, yet allow really fine-tuning with the nanoseconds timer.
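
A minimal sketch of the two-integer representation described above, assuming Python's `time.monotonic_ns()` as the underlying clock. The helper names `split_ns` and `diff` are hypothetical and not part of any proposed interface. Python integers are unbounded, so the 64-bit width is only noted in a comment.

```python
import time

NS_PER_SEC = 1_000_000_000

def split_ns(total_ns):
    """Split a nanosecond count into a (seconds, nanoseconds) pair.

    In a C or Fortran interface each half would be a 64-bit integer.
    """
    return divmod(total_ns, NS_PER_SEC)

def diff(start, end):
    """Subtract two (seconds, nanoseconds) pairs, borrowing when needed."""
    sec = end[0] - start[0]
    nsec = end[1] - start[1]
    if nsec < 0:
        # Borrow one second so the nanosecond field stays in [0, 1e9).
        sec -= 1
        nsec += NS_PER_SEC
    return sec, nsec

start = split_ns(time.monotonic_ns())
end = split_ns(time.monotonic_ns())
elapsed = diff(start, end)
```

Keeping the nanosecond field normalized to less than one second means that two readings can be compared field by field.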

James Cownie:
What is the data type of the returned time values ? Possibilities include:

[Double may be OK, but one must be careful about adding lots of small quantities together, e.g. when an elapsed time exceeds around 16,777,216 seconds, adding 1.e-9 to it has no effect (in IEEE double precision). Since this is around 194 days, this is probably OK. When we need ps timers, then it becomes more dangerous (about four and a half hours, by the same arithmetic)].
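
The nanosecond case can be checked directly. The sketch below assumes IEEE double precision, which is what Python's `float` uses on common platforms. The helper name `absorbed` is hypothetical. It also shows that a compensated sum (`math.fsum`) recovers increments that a naive running total loses.

```python
import math

def absorbed(elapsed, increment):
    """True if adding increment to elapsed leaves the double unchanged."""
    return elapsed + increment == elapsed

threshold = float(2 ** 24)      # 16,777,216 seconds
days = threshold / 86_400       # a little over 194 days

at_threshold = absorbed(threshold, 1e-9)         # increment is lost
below_threshold = absorbed(threshold / 2, 1e-9)  # increment still registers

# Naive accumulation of 1000 one-nanosecond steps on top of the threshold.
naive = threshold
for _ in range(1000):
    naive += 1e-9

# Compensated summation of the same values keeps the small terms.
exact = math.fsum([threshold] + [1e-9] * 1000)
```

Here `naive` stays at exactly 2**24 while `exact` ends up above it, which is the accumulation hazard in question.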

Don Breazeal:
A related issue (I think I mentioned this before?) is how well optimized the timer access is. On the one hand, many users just want double precision seconds so they don't have to worry about anything. On the other hand, other users want to have the lowest possible overhead, even if that means converting some system-specific number of ticks to seconds in a post-processing phase. I don't think there is a "right" answer to this issue and would recommend implementing both interfaces.
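
A rough sketch of what "both interfaces" might look like, assuming Python's `time.perf_counter_ns()` stands in for a system-specific tick counter. The names `now_ticks`, `now_seconds`, and `ticks_to_seconds` are hypothetical. In Python the overhead difference between the two reads is negligible, so this only illustrates the shape of the interface, not the performance argument.

```python
import time

def now_ticks():
    """Low-overhead read: raw integer ticks (nanoseconds in this sketch)."""
    return time.perf_counter_ns()

def now_seconds():
    """Convenience read: double-precision seconds."""
    return time.perf_counter()

def ticks_to_seconds(ticks, ticks_per_second=1_000_000_000):
    """Post-processing step: convert a tick count to seconds."""
    return ticks / ticks_per_second

# Collect raw tick deltas in the measured loop, convert afterwards.
samples = []
for _ in range(3):
    t0 = now_ticks()
    sum(range(1000))
    samples.append(now_ticks() - t0)

elapsed_seconds = [ticks_to_seconds(s) for s in samples]
```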

Co-Processor Timers

Donna Bergmark:
Vector timing -- Cornell found this to be very useful in the past. Vector timers returned the time spent executing vector instructions, because this was one of the things you wanted to maximize. Perhaps the generalization of this is to provide co-processor timings, where such co-processors exist for a processor.

Global Timer

James Cownie:
What promises are we expecting (in an MP machine) about
   a) Relative base of clocks on different nodes.
      (i.e. if my clock is at midnight, is yours at midnight too ?)
   b) Constancy of clock rate on different nodes. ("Drift").
      (i.e. if my clock says one o'clock and yours says two o'clock,
       am I guaranteed that when mine says two o'clock yours will say
       three ?)	
   [The simplest and least useful thing to do is say nothing about
   either of these properties, since implementing anything else is
   HARD, particularly on a network of workstations !]

Don Breazeal:
People have certainly done this type of thing before, and if this library doesn't provide it people will continue to have to do it. It would be a great thing to have an implementation of a global timer in this library. Maybe as a "phase II" implementation?

James Cownie:
Don suggests that we do this as an add on. I agree. I wouldn't want this to be in the base definition. (Mostly because it's HARD, and will require some form of communication)
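
To illustrate why communication is needed, here is a small simulated sketch of one well-known approach, a round-trip offset estimate in the style of Cristian's algorithm. It is not something proposed in the discussion. The `FakeClock` class, the fixed offset, and the symmetric one-way delay are all assumptions made so that the example is deterministic. It addresses only the relative base of two clocks, not drift.

```python
class FakeClock:
    """Manually advanced clock, used here to simulate two nodes."""
    def __init__(self, start):
        self.t = start

    def now(self):
        return self.t

    def advance(self, dt):
        self.t += dt

def estimate_offset(local_now, query_remote):
    """Estimate (remote - local) offset from one round trip.

    Assumes the request and reply take equal time, so the remote
    reading is matched against the midpoint of the local interval.
    """
    t0 = local_now()
    remote = query_remote()
    t1 = local_now()
    midpoint = (t0 + t1) / 2
    return remote - midpoint, (t1 - t0) / 2   # offset, error bound

local = FakeClock(100.0)
TRUE_OFFSET = 3600.0     # simulated: remote clock is one hour ahead
ONE_WAY_DELAY = 0.5      # simulated message latency in seconds

def query_remote():
    local.advance(ONE_WAY_DELAY)            # request travels
    reading = local.now() + TRUE_OFFSET     # remote node reads its clock
    local.advance(ONE_WAY_DELAY)            # reply travels
    return reading

offset, uncertainty = estimate_offset(local.now, query_remote)
```

On a real network the delays are neither symmetric nor constant, which is a large part of what makes this hard.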

Handling Compiler Optimizations

Rod Oldehoeft:
An issue that I've seen come up occasionally is compiler treatment of calls to timers (and other counters as well). On high-performance systems the aggressive nature of optimizers can cause lots of unusual things to happen.

On one hand, disallowing optimization near clock calls can severely degrade the performance that one wants to measure. On the other hand, the above phenomena can result in more widely varying measures of performance (across systems) than the application programmer desires.

These things don't seem to come up much in compilers meant for workstations, or in young compiler software. It's not clear what a proposal for timing routines should say about this issue.

Rod Oldehoeft:
Other people tell me that Cray compilers treat calls to timing routines as program points across which stuff cannot flow.

My local action has been to place the stuff to be measured into a function that is marked as not inlineable, and call it between two clock reads. The call overhead is negligible, but it's not as easy as dropping a couple of clock reads into any arbitrary section of code.
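
The call pattern described here, sketched in Python. Python has no inliner or code-motion optimizer, so this shows only the structure (measured code isolated in a function, called between two clock reads). In a compiled language the function would also need to be marked non-inlineable by whatever means the compiler offers. The names `timed_call` and `kernel` are hypothetical.

```python
import time

def timed_call(func, *args, **kwargs):
    """Call func between two clock reads; return (result, elapsed_ns)."""
    start = time.perf_counter_ns()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter_ns() - start
    return result, elapsed

def kernel(n):
    """Stand-in for the code section being measured."""
    total = 0
    for i in range(n):
        total += i * i
    return total

result, elapsed_ns = timed_call(kernel, 10_000)
```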

A language-based solution (begin-measure/end-measure) is clearly impractical.

Multiple Timers

James Cownie:
Is the proposal to have many separate timers, or one per process(/thread) ? I get the feeling that many are being proposed, from the phrase "since previous call to this timer".

Don Breazeal:
Another way of doing this would be to support only the "now()" call, and let the user worry about separating threads and processes.
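
A sketch of how a user might layer named timers on top of a single `now()` call, assuming `time.perf_counter_ns()` as the clock. The `start`/`stop` helpers and the dictionaries are hypothetical user-side bookkeeping, not part of the proposed library.

```python
import time

def now():
    """The only call the library would provide in this design."""
    return time.perf_counter_ns()

# User-side bookkeeping: any number of named timers.
starts = {}
totals = {}

def start(name):
    starts[name] = now()

def stop(name):
    elapsed = now() - starts.pop(name)
    totals[name] = totals.get(name, 0) + elapsed
    return elapsed

start("io")
start("compute")
sum(range(10_000))
stop("compute")
stop("io")
```

A per-thread variant would keep one such pair of dictionaries per thread; the library itself stays unaware of threads.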

James Cownie:
I agree completely. I wasn't trying to propose a system with multiple timers, merely trying to ascertain whether the current proposal was saying that, or not.


What standards should we look at?

Rusty Lusk:
My only concern was that it be compatible with the very elaborate and detailed POSIX standard for timers. Since this standard is for a C interface, one could claim that a Fortran interface need not take any notice of them, but I would like to see that they are aware of the POSIX work. There is also a pair of routines in MPI out of which these routines could easily be synthesized, but these were included in MPI precisely because there was no portable timing library specification.

System Time vs. User Time

How do we define CPU time semantics? What is meant by user time and system time?

James Cownie:
What is the semantics of CPU time (and more particularly of user and system time). Questions which must be resolved here include:

Reagan Moore:
I would like to elaborate on Jim's proposal and suggest the following interpretation for system CPU time:

System CPU time = time spent in kernel. This includes time spent processing system calls, context switch time, job paging/swapping time, time spent processing interrupts. Notice the time spent processing system calls includes all CPU time required to process I/O calls. This does not include time spent blocked with an idle CPU waiting for I/O to complete.

For the CRAY C90, the system time is dominated by the sum of the system call time, the process switch time, and the job swap time.

The wall clock time on a dedicated system would equal the sum of the user CPU time, system CPU time, and idle time.
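
Under that relationship, idle time can be derived from the other three. The sketch below assumes a single-threaded process and uses Python's `os.times()` for user and system CPU time and `time.perf_counter()` for wall clock time. What the operating system actually charges to "user" and "system" is platform-dependent, and on a shared machine the derived figure also includes time spent descheduled.

```python
import os
import time

def snapshot():
    """Return (wall, user_cpu, system_cpu) in seconds."""
    t = os.times()
    return time.perf_counter(), t.user, t.system

w0, u0, s0 = snapshot()
sum(i * i for i in range(200_000))   # some user-mode work
time.sleep(0.1)                      # blocked: consumes no CPU time
w1, u1, s1 = snapshot()

wall = w1 - w0
user = u1 - u0
system = s1 - s0
idle = wall - user - system          # derived, not measured
```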

To evaluate load balancing, these times are needed separately for each node on the system. To track overall usage, the sum of these times is needed across all nodes.

The times are needed per thread, and for the entire process.
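
A sketch of reading CPU time both per thread and for the whole process, assuming a platform where Python's `time.thread_time()` is available (it is not on every system). The `burn` and `worker` functions are hypothetical stand-ins for real work.

```python
import threading
import time

def burn(n):
    """Stand-in CPU-bound work."""
    total = 0
    for i in range(n):
        total += i * i
    return total

per_thread = {}

def worker(name, n):
    # thread_time() counts CPU time of the calling thread only.
    start = time.thread_time()
    burn(n)
    per_thread[name] = time.thread_time() - start

# process_time() counts CPU time of all threads in the process.
proc_start = time.process_time()
threads = [
    threading.Thread(target=worker, args=(f"t{i}", 200_000))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
process_cpu = time.process_time() - proc_start
```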

Don Breazeal:
I don't disagree with Reagan's definition or Jim's issues, but I think that the reality is that Ptools is unable to dictate what any particular target system defines as user or system time, or even CPU time for that matter. The best that the interface can do is take what the system provides, and offer a query function that allows the caller to determine what it is getting.
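
As one existing example of such a query function, Python's `time.get_clock_info()` reports what the platform provides for each named clock. The wrapper `describe_clock` is hypothetical. Note that it reports the underlying implementation and resolution but not the accounting semantics (what is charged to user versus system time), which is the harder part of the question.

```python
import time

def describe_clock(name):
    """Return what the platform says about one of Python's clocks."""
    info = time.get_clock_info(name)
    return {
        "implementation": info.implementation,  # underlying system call
        "resolution": info.resolution,          # seconds
        "monotonic": info.monotonic,            # cannot go backwards
        "adjustable": info.adjustable,          # can be reset by an admin
    }

report = {
    name: describe_clock(name)
    for name in ("time", "monotonic", "perf_counter", "process_time")
}
```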

James Cownie:
Neither do I, though I think that Reagan's definitions fit somewhat uneasily with a per-process (or per-thread) measurement. They seem more relevant to a per-CPU measure of load, something which I don't think this proposal is intended to address.

I absolutely agree that the only things we can provide are what is already provided by the underlying system. The question which it raises, though, is whether it therefore makes sense to specify a standard interface if all one is doing is specifying a standard syntax, without a standard semantics. (This is the Humpty Dumpty world view : "The question is who is to be the master, me or the word. When I use a word it means exactly what I want it to mean.")

While Don's suggestion that we offer a query function so that you can tell what you got is sensible, it seems to me very hard to capture the different possible sets of semantics. (e.g. on VMS it used to be the case (and may still be for all I know) that when a process is descheduled the cost of scheduling is charged to the user time of the process which is being descheduled. This has the amusing effect that raising a process priority reduces its apparent CPU usage !)

It was for reasons like this that MPI decided only to provide a wall clock timer. (We can at least all agree on what that is supposed to mean !)

To reiterate, the danger here is that people will start to make comparisons between apples and oranges, because we're putting both in an identical paper bag, so they can't see inside. This will be particularly the case if this library is used in a parallel program running on a heterogeneous network.

Timers Incrementing in Virtual Thread Time

Should we provide timers incrementing in virtual thread time?


Richard Frost:
My only concern is with the extension to heterogeneous systems. That is, I want to time my program when it is running simultaneously on multiple platforms. By heterogeneous I mean different parallel architecture designs, not different Unix workstation vendors. Think about a MIMD, VP, and Cluster system all coupled in computation space. To make it really interesting, throw a high-performance graphics workstation into the computation loop for computational steering.

I believe that this kind of functionality needs to be considered early on. Without some thoughtful planning it is not something that can be added later without a major re-write.


This document was last updated 20 May '95.
