What is the data type of the returned time values? Possibilities include:
A related issue (I think I mentioned this before?) is how well optimized the timer access is. On the one hand, many users just want double precision seconds so they don't have to worry about anything. On the other hand, other users want to have the lowest possible overhead, even if that means converting some system-specific number of ticks to seconds in a post-processing phase. I don't think there is a "right" answer to this issue and would recommend implementing both interfaces.
vector timing -- Cornell found this to be very useful in the past. Vector timers returned the time spent executing vector instructions, because this was one of the things you wanted to maximize. Perhaps the generalization of this is to provide co-processor timings, where such co-processors exist for a processor.
What promises are we expecting (in an MP machine) about:
a) Relative base of clocks on different nodes (i.e. if my clock is at midnight, is yours at midnight too?)
b) Constancy of clock rate on different nodes ("drift") (i.e. if my clock says one o'clock and yours says two o'clock, am I guaranteed that when mine says two o'clock yours will say three?)
[The simplest and least useful thing to do is to say nothing about either of these properties, since implementing anything else is HARD, particularly on a network of workstations!]

Don Breazeal:
Handling Compiler Optimizations
An issue that I've seen come up occasionally is compiler treatment of calls to timers (and other counters as well). On high-performance systems the aggressive nature of optimizers can cause lots of unusual things to happen.
These things don't seem to come up much in compilers meant for workstations, or in young compiler software. It's not clear what a proposal for timing routines should say about this issue.
Other people tell me that Cray compilers treat calls to timing routines as program points across which code cannot be moved.
My local action has been to place the stuff to be measured into a function that is marked as not inlineable, and call it between two clock reads. The call overhead is negligible, but it's not as easy as dropping a couple of clock reads into any arbitrary section of code.
A language-based solution (begin-measure/end-measure) is clearly impractical.
Is the proposal to have many separate timers, or one per process (or thread)? I get the feeling that many are being proposed, from the phrase "since previous call to this timer".
Another way of doing this would be to support only the "now()" call, and let the user worry about separating threads and processes.
I agree completely. I wasn't trying to propose a system with multiple timers, merely trying to ascertain whether the current proposal was saying that, or not.
What standards should we look at?
My only concern was that it be compatible with the very elaborate and detailed POSIX standard for timers. Since this standard is for a C interface, one could claim that a Fortran interface need not take any notice of them, but I would like to see that they are aware of the POSIX work. There is also a pair of routines in MPI out of which these routines could easily be synthesized, but these were included in MPI precisely because there was no portable timing library specification.
System Time vs. User Time
What are the semantics of CPU time, and more particularly of user time and system time? Questions which must be resolved here include:
System CPU time = time spent in the kernel. This includes time spent processing system calls, context-switch time, job paging/swapping time, and time spent processing interrupts. Notice that the time spent processing system calls includes all CPU time required to process I/O calls; it does not include time spent blocked with an idle CPU waiting for I/O to complete.
For the CRAY C90, the system time is dominated by the sum of the system call time, the process switch time, and the job swap time.
The wall clock time on a dedicated system would equal the sum of the user CPU time, system CPU time, and idle time.
To evaluate load balancing, these times are needed separately for each node on the system. To track overall usage, the sum of these times is needed across all nodes.
The times are needed per thread, and for the entire process.
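On systems that provide BSD-style `getrusage()`, the user/system split defined above is available per process via `RUSAGE_SELF` (and per thread on Linux via the non-portable `RUSAGE_THREAD`). A sketch, with the function name `cpu_times` invented for illustration:

```c
/* One concrete realization of the user/system split defined above. */
#include <sys/time.h>
#include <sys/resource.h>

static double tv_seconds(struct timeval tv) {
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

/* Returns user CPU time in seconds for the calling process;
   stores system CPU time through *sys. */
double cpu_times(double *sys) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    *sys = tv_seconds(ru.ru_stime);   /* time spent in the kernel */
    return tv_seconds(ru.ru_utime);   /* time spent in user code  */
}
```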
I don't disagree with Reagan's definition or Jim's issues, but I think that the reality is that Ptools is unable to dictate what any particular target system defines as user or system time, or even CPU time for that matter. The best that the interface can do is take what the system provides, and offer a query function that allows the caller to determine what it is getting.
Neither do I, though I think that Reagan's definitions fit somewhat uneasily with a per process (or per thread) measurement. They seem more relevant to a per cpu measure of load, something which I don't think this proposal is intended to address.
I absolutely agree that the only things we can provide are what is already provided by the underlying system. The question this raises, though, is whether it therefore makes sense to specify a standard interface if all one is doing is specifying a standard syntax without a standard semantics. (This is the Humpty Dumpty world view: "When I use a word, it means just what I choose it to mean. The question is which is to be master.")
While Don's suggestion that we offer a query function so that you can tell what you got is sensible, it seems to me very hard to capture the different possible sets of semantics. (e.g. on VMS it used to be the case (and may still be for all I know) that when a process is descheduled, the cost of scheduling is charged to the user time of the process being descheduled. This has the amusing effect that raising a process's priority reduces its apparent CPU usage!)
It was for reasons like this that MPI decided only to provide a wall clock timer. (We can at least all agree on what that is supposed to mean!)
To reiterate, the danger here is that people will start to make comparisons between apples and oranges, because we are putting both in an identical paper bag so they cannot see inside. This will be particularly the case if this library is used in a parallel program running on a heterogeneous network.
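Don's query-function idea might be sketched as below. The enum, its categories, and the function name are purely illustrative of how a caller could discover what "CPU time" means on the current system; they are not part of any proposal.

```c
/* Illustrative sketch only: a query interface letting the caller
   discover what the library's "CPU time" actually measures.
   Names and categories are invented for this example. */
enum ptimer_cpu_semantics {
    PTIMER_CPU_UNKNOWN,        /* system provides no useful description   */
    PTIMER_CPU_USER_ONLY,      /* user time only; system time unavailable */
    PTIMER_CPU_USER_PLUS_SYS,  /* user and system time lumped together    */
    PTIMER_CPU_USER_AND_SYS    /* user and system reported separately     */
};

enum ptimer_cpu_semantics ptimer_query_cpu_semantics(void) {
    /* A real implementation would select this per platform at build
       or run time; this sketch hard-codes a getrusage()-style split. */
    return PTIMER_CPU_USER_AND_SYS;
}
```

Even so, as noted above, an enumeration like this struggles to capture oddities such as the VMS scheduling-charge behaviour.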
Timers Incrementing in Virtual Thread Time
Should we provide timers incrementing in virtual thread time?
My only concern is with the extension to heterogeneous systems. That is, I want to time my program when it is running simultaneously on multiple platforms. By heterogeneous I mean different parallel architecture designs, not different Unix workstation vendors. Think about a MIMD, VP, and Cluster system all coupled in computation space. To make it really interesting, throw a high-performance graphics workstation into the computation loop for computational steering.
I believe that this kind of functionality needs to be considered early on. Without some thoughtful planning it is not something that can be added later without a major re-write.
This document was last updated 20 May '95.
For further information, contact firstname.lastname@example.org.