The Sixteen Rules of Thumb
Rule of Thumb (1)
- If your application fits the model of perfect parallelism, the
parallelization task is relatively straightforward and likely to achieve
respectable performance.
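
A minimal sketch of why perfect parallelism is straightforward: the tasks share no data and require no coordination, so they map directly onto a pool of workers. The function and task names here are hypothetical stand-ins, not from the original text.

```python
# Perfectly parallel work: independent tasks, no communication between them.
from multiprocessing import Pool

def simulate(seed):
    # Hypothetical stand-in for a real, independent computation.
    return seed * seed

def run_all(n_tasks, workers=4):
    # Each task runs to completion with no interaction with the others,
    # so a simple pool.map captures the whole parallelization.
    with Pool(workers) as pool:
        return pool.map(simulate, range(n_tasks))

if __name__ == "__main__":
    print(run_all(8))
```

Because there are no dependencies between tasks, adding workers requires no restructuring of the computation itself.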
Rule of Thumb (2)
- If your application is an example of pipeline parallelism, you
have to do more work; if you can't balance the computational intensity, it
may not prove worthwhile.
Rule of Thumb (3)
- If your application is fully synchronous, a significant amount of
effort is required and the payoff may be minimal; the decision to
parallelize should be based on how uniform the computational intensity is
likely to be.
Rule of Thumb (4)
- A loosely synchronous application is the most difficult to
parallelize, and probably is not worthwhile unless the points of CPU
interaction are very infrequent.
Rule of Thumb (5)
- A perfectly parallel application will probably perform
reasonably well on any MIMD architecture, but may be difficult to adapt
to a SIMD multicomputer.
Rule of Thumb (6)
- A pipeline style application will probably perform best on
a shared-memory machine or clustered SMP (where a given stage fits on a
single SMP), although it should be adaptable to a distributed-memory
system as well, as long as the communication network is fast enough to
pipe the data sets from one stage to the next.
Rule of Thumb (7)
- A fully synchronous application will perform best on a
SIMD multicomputer, if you can exploit array operations. If the
computations are relatively independent, you might achieve
respectable performance on a shared-memory system (or clustered SMP if a
small number of CPUs is sufficient). Any other match is probably
not worthwhile.
Rule of Thumb (8)
- A loosely synchronous application will perform best on a
shared-memory system (or clustered SMP if a small number of CPUs is
sufficient). If there are many computations between CPU interactions
(see "Setting Realistic Expectations"), you
can probably achieve good performance on a distributed-memory system as well.
Rule of Thumb (9)
- With few exceptions, you don't pick the language; it picks you.
Rule of Thumb (10)
- Timings measured on a baseline (serial) version of your
application provide a solid starting point for estimating potential
payoffs and reliability.
Rule of Thumb (11)
- The debilitating impact of serial content on theoretical
speedup means that you probably shouldn't consider parallelizing a program
with less than 95% parallel content, unless you're already experienced in
parallel programming, or unless you will be able to replace a significant
portion of the serial version with parallel algorithms that have been
proven to be good performers.
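
The 95% threshold in this rule follows directly from Amdahl's law, which bounds theoretical speedup by the serial fraction of the program. A short sketch of the arithmetic:

```python
# Amdahl's law: theoretical speedup on n CPUs for a program whose
# parallel fraction is p (so its serial fraction is 1 - p).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# With 95% parallel content, 16 CPUs yield only about a 9x speedup,
# and no number of CPUs can exceed 1 / 0.05 = 20x.
sixteen_cpus = amdahl_speedup(0.95, 16)
many_cpus = amdahl_speedup(0.95, 10**6)
```

The asymptote at 1/(1 - p) is the "debilitating impact of serial content": below 95% parallel content, even unlimited hardware caps the speedup under 20x.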
Rule of Thumb (12)
- Apply your knowledge of the program to estimate how varying
problem size will affect the theoretical speedup curve.
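
One way problem size shifts the curve: if the serial cost is roughly fixed while the parallel work grows with the problem, larger problems have a smaller serial fraction and hence a higher theoretical speedup. The model below is an illustrative assumption, not a formula from the original text.

```python
# Assumed model: fixed serial cost plus parallel work that grows with
# problem size; larger problems dilute the serial fraction.
def speedup(serial_cost, parallel_work, cpus):
    t_serial = serial_cost + parallel_work           # one-CPU run time
    t_parallel = serial_cost + parallel_work / cpus  # idealized parallel time
    return t_serial / t_parallel

small = speedup(10, 90, 16)    # 90% parallel content: ~6.4x on 16 CPUs
large = speedup(10, 990, 16)   # 99% parallel content: ~13.9x on 16 CPUs
```

Whether your application behaves this way depends on how its serial portions actually scale; that is the knowledge of the program this rule asks you to apply.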
Rule of Thumb (13)
- Theoretical speedup is only an upper bound on what is possible;
the attained performance will almost certainly be much lower.
Rule of Thumb (14)
- Although you can improve concurrency to some extent, it will
largely be dependent on the application itself and the average load on the
machine.
Rule of Thumb (15)
- A coarse-grained program will perform relatively well on any
parallel machine; a medium- or fine-grained one will probably be
respectable only on a SIMD multicomputer.
Rule of Thumb (16)
- To understand the granularity requirements of a
distributed-memory computer, calculate its message-equivalent. To be
worth parallelizing, your program probably needs to perform many
thousands of floating-point operations between each CPU interaction.
Copyright 1996, Cherri M. Pancake