
James Cownie:
What is the data type of the returned time values ?
Possibilities include:
Don Breazeal:
A related issue (I think I mentioned this before?) is how well optimized the
timer access is. On the one hand, many users just want double precision
seconds so they don't have to worry about anything. On the other hand,
other users want to have the lowest possible overhead, even if that means
converting some system-specific number of ticks to seconds in a
post-processing phase. I don't think there is a "right" answer to this
issue and would recommend implementing both interfaces.
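As an illustration of the "both interfaces" suggestion, here is a minimal sketch in Python. The names `now_seconds`, `now_ticks`, and `ticks_to_seconds` are hypothetical and are not taken from the proposal; the sketch assumes the standard library's `time.perf_counter()` and `time.perf_counter_ns()` as the underlying clocks. Python's own call overhead hides any real cost difference, so this only shows the shape of the two interfaces.

```python
import time

# Hypothetical names; the discussion does not define an API.
TICKS_PER_SECOND = 1_000_000_000  # perf_counter_ns() counts nanoseconds


def now_seconds() -> float:
    """Convenience interface: the clock reading as double-precision seconds."""
    return time.perf_counter()


def now_ticks() -> int:
    """Low-overhead interface: raw integer ticks, no float conversion at read time."""
    return time.perf_counter_ns()


def ticks_to_seconds(ticks: int, ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Post-processing step: convert a tick count (or difference) to seconds."""
    return ticks / ticks_per_second


# Record raw ticks around the measured region, convert afterwards.
start = now_ticks()
total = sum(range(10_000))
elapsed_s = ticks_to_seconds(now_ticks() - start)
```

The conversion factor is kept as a parameter because, on a real target, the tick rate would be system-specific and would have to be queried rather than assumed.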
Co-Processor Timers
Donna Bergmark:
vector timing -- Cornell found this to be very useful in the
past. Vector timers returned the time spent executing vector
instructions, because this was one of the things you wanted to
maximize. Perhaps the generalization of this is to provide
co-processor timings, where such co-processors exist for a
processor.
Global Timer
James Cownie:
What promises are we expecting (in an MP machine) about
a) Relative base of clocks on different nodes.
(i.e. if my clock is at midnight, is yours at midnight too ?)
b) Constancy of clock rate on different nodes. ("Drift").
(i.e. if my clock says one o'clock and yours says two o'clock,
am I guaranteed that when mine says two o'clock yours will say
three ?)
[The simplest and least useful thing to do is say nothing about
either of these properties, since implementing anything else is
HARD, particularly on a network of workstations !]
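The two properties can be made concrete with a toy model, sketched below in Python. It assumes each node's clock reads `offset + rate * true_time`; the class `NodeClock` and the function `skew` are hypothetical names for illustration, and the drift rate is deliberately exaggerated. Property (a) concerns `offset`, property (b) concerns `rate`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClock:
    """Toy model: reading = offset + rate * true_time (all in seconds)."""
    offset: float  # property (a): relative base of the clock
    rate: float    # property (b): 1.0 means no drift

    def read(self, true_time: float) -> float:
        return self.offset + self.rate * true_time


def skew(a: NodeClock, b: NodeClock, true_time: float) -> float:
    """Difference between two nodes' readings at the same instant."""
    return b.read(true_time) - a.read(true_time)


mine = NodeClock(offset=0.0, rate=1.0)
yours_offset_only = NodeClock(offset=3600.0, rate=1.0)  # one hour ahead, same rate
yours_drifting = NodeClock(offset=3600.0, rate=1.5)     # also runs fast (exaggerated)

# Same rate: the one-hour gap is still one hour after an hour of true time.
constant_gap = (skew(mine, yours_offset_only, 0.0), skew(mine, yours_offset_only, 3600.0))
# Different rate: the gap grows, so intervals cannot be compared across nodes.
growing_gap = (skew(mine, yours_drifting, 0.0), skew(mine, yours_drifting, 3600.0))
```

With no promise about either property, a caller can only compare readings taken on the same node.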
Don Breazeal:
Handling Compiler Optimizations
Rod Oldehoeft:
An issue that I've seen come up occasionally is compiler treatment
of calls to timers (and other counters as well). On high-performance
systems the aggressive nature of optimizers can cause lots of unusual
things to happen.
These things don't seem to come up much in compilers meant for workstations, or in young compiler software. It's not clear what a proposal for timing routines should say about this issue.
Rod Oldehoeft:
Other people tell me that Cray compilers treat calls to timing
routines as program points across which stuff cannot flow.
My local action has been to place the stuff to be measured into a function that is marked as not inlineable, and call it between two clock reads. The call overhead is negligible, but it's not as easy as dropping a couple of clock reads into any arbitrary section of code.
A language-based solution (begin-measure/end-measure) is clearly impractical.
Multiple Timers
James Cownie:
Is the proposal to have many separate timers, or one per
process(/thread) ? I get the feeling that many are being proposed,
from the phrase "since previous call to this timer".
Don Breazeal:
Another way of doing this would be to support only the "now()" call, and
let the user worry about separating threads and processes.
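A minimal sketch of that alternative, in Python: the library exposes a single `now()` and the caller keeps any per-thread or per-phase bookkeeping. The `Stopwatches` class is a hypothetical user-side helper, not part of any proposal, and `time.perf_counter()` stands in for the system clock.

```python
import time
from typing import Callable, Dict


def now() -> float:
    """The single primitive: current reading of a monotonic clock, in seconds."""
    return time.perf_counter()


class Stopwatches:
    """User-side bookkeeping: any number of named intervals built on one now()."""

    def __init__(self, clock: Callable[[], float] = now) -> None:
        self._clock = clock
        self._started: Dict[str, float] = {}
        self.totals: Dict[str, float] = {}

    def start(self, name: str) -> None:
        # Remember when this named interval began.
        self._started[name] = self._clock()

    def stop(self, name: str) -> float:
        # Close the interval and add it to the running total for this name.
        elapsed = self._clock() - self._started.pop(name)
        self.totals[name] = self.totals.get(name, 0.0) + elapsed
        return elapsed
```

A caller who wants one timer per thread could key the names by thread identifier; the library itself never needs to know how many timers exist.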
James Cownie:
I agree completely. I wasn't trying to propose a system with multiple
timers, merely trying to ascertain whether the current proposal was
saying that, or not.
Standards
What standards should we look at?
Rusty Lusk:
My only concern was that it be compatible with the very elaborate and detailed
POSIX standard for timers. Since this standard is for a C interface, one
could claim that a Fortran interface need not take any notice of it, but
I would like to see that the proposers are aware of the POSIX work. There is also
a pair of routines in MPI out of which these routines could easily be
synthesized, but these were included in MPI precisely because there was no
portable timing library specification.
System Time vs. User Time
How do we define CPU time semantics? What is meant by user time and
system time?
James Cownie:
What are the semantics of CPU time (and more particularly of user
and system time)?
Questions which must be resolved here include:
System CPU time = time spent in the kernel. This includes time spent processing system calls, context switch time, job paging/swapping time, and time spent processing interrupts. Note that the time spent processing system calls includes all CPU time required to process I/O calls. It does not include time spent blocked with an idle CPU waiting for I/O to complete.
For the CRAY C90, the system time is dominated by the sum of the system call time, the process switch time, and the job swap time.
The wall clock time on a dedicated system would equal the sum of the user CPU time, system CPU time, and idle time.
To evaluate load balancing, these times are needed separately for each node on the system. To track overall usage, the sum of these times is needed across all nodes.
The times are needed per thread, and for the entire process.
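The relation "wall clock = user + system + idle" can be sketched for a single process as follows. This is a minimal Python illustration that assumes a POSIX system, where `os.times()` reports user, system, and elapsed time for the calling process; `Snapshot`, `Breakdown`, `snapshot`, and `breakdown` are hypothetical names. What the operating system actually charges to "user" and "system" is whatever that system defines.

```python
import os
from typing import NamedTuple


class Snapshot(NamedTuple):
    user: float    # CPU seconds charged to the process in user mode
    system: float  # CPU seconds charged to the process in the kernel
    wall: float    # elapsed real time, in seconds


class Breakdown(NamedTuple):
    user: float
    system: float
    wall: float
    idle: float    # wall - user - system; meaningful only on a dedicated system


def snapshot() -> Snapshot:
    """Read the process times (assumes POSIX; 'elapsed' is 0 on Windows)."""
    t = os.times()
    return Snapshot(t.user, t.system, t.elapsed)


def breakdown(start: Snapshot, end: Snapshot) -> Breakdown:
    """Split the interval between two snapshots into user, system, and idle."""
    user = end.user - start.user
    system = end.system - start.system
    wall = end.wall - start.wall
    # For a multi-threaded process user + system can exceed wall,
    # in which case this remainder is negative.
    return Breakdown(user, system, wall, wall - user - system)
```

For load balancing, one such breakdown would be collected per node; summing the fields across nodes gives the overall usage.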
Don Breazeal:
I don't disagree with Reagan's definition or Jim's issues, but I think
that the reality is that Ptools is unable to dictate what any particular
target system defines as user or system time, or even CPU time for that
matter. The best that the interface can do is take what the system provides,
and offer a query function that allows the caller to determine what it is
getting.
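As one illustration of such a query function, Python's standard library offers `time.get_clock_info()`, which reports how a named clock is implemented on the current system. The wrapper `describe_timer` below is a hypothetical name used only for this sketch. Note what the result does not say: it gives the implementation and resolution, but nothing about which activities the system charges to user or system time.

```python
import time
from typing import Dict, Union


def describe_timer(name: str) -> Dict[str, Union[str, float, bool]]:
    """Report what the underlying system provides for a named clock.

    Valid names include "monotonic", "perf_counter", "process_time", and "time".
    """
    info = time.get_clock_info(name)
    return {
        "implementation": info.implementation,  # underlying system call or API
        "resolution": info.resolution,          # claimed resolution in seconds
        "monotonic": info.monotonic,            # True if the clock cannot go backwards
        "adjustable": info.adjustable,          # True if it can be reset, e.g. by an administrator
    }
```

A caller could log `describe_timer("process_time")` next to each measurement, so that results gathered on different systems are at least labelled with what produced them.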
James Cownie:
Neither do I, though I think that Reagan's definitions fit somewhat
uneasily with a per process (or per thread) measurement. They seem
more relevant to a per cpu measure of load, something which I don't
think this proposal is intended to address.
I absolutely agree that the only things we can provide are what is already provided by the underlying system. The question which it raises, though, is whether it therefore makes sense to specify a standard interface if all one is doing is specifying a standard syntax, without a standard semantics. (This is the Humpty Dumpty world view : "The question is who is to be the master, me or the word. When I use a word it means exactly what I want it to mean.")
While Don's suggestion that we offer a query function so that you can tell what you got is sensible, it seems to me very hard to capture the different possible sets of semantics. (e.g. on VMS it used to be the case (and may still be for all I know) that when a process is descheduled the cost of scheduling is charged to the user time of the process which is being descheduled. This has the amusing effect that raising a process priority reduces its apparent CPU usage !)
It was for reasons like this that MPI decided only to provide a wall clock timer. (We can at least all agree on what that is supposed to mean !)
To reiterate, the danger here is that people will start to compare apples and oranges, because we're putting both in an identical paper bag, so they can't see inside. This will be particularly the case if this library is used in a parallel program running on a heterogeneous network.
Timers Incrementing in Virtual Thread Time
Should we provide timers incrementing in virtual thread time?
Other
Richard Frost:
My only concern is with the extension to heterogeneous systems. That is, I
want to time my program when it is running simultaneously on multiple
platforms. By heterogeneous I mean different parallel architecture
designs, not different Unix workstation vendors. Think about a MIMD,
VP, and Cluster system all coupled in computation space. To make it really
interesting, throw a high-performance graphics workstation into the
computation loop for computational steering.
I believe that this kind of functionality needs to be considered early on. Without some thoughtful planning it is not something that can be added later without a major re-write.
This document was last updated 20 May '95.
For further information, contact kennino@cs.orst.edu.