The Sixteen Rules of Thumb


    Rule of Thumb (1)

    If your application fits the model of perfect parallelism, the parallelization task is relatively straightforward and likely to achieve respectable performance.

    Rule of Thumb (2)

    If your application is an example of pipeline parallelism, you have to do more work; if you can't balance the computational intensity, it may not prove worthwhile.

    Rule of Thumb (3)

    If your application is fully synchronous, a significant amount of effort is required and payoff may be minimal; the decision to parallelize should be based on how uniform computational intensity is likely to be.

    Rule of Thumb (4)

    A loosely synchronous application is the most difficult to parallelize, and probably is not worthwhile unless the points of CPU interaction are very infrequent.

    Rule of Thumb (5)

    A perfectly parallel application will probably perform reasonably well on any MIMD architecture, but may be difficult to adapt to a SIMD multicomputer.

    Rule of Thumb (6)

    A pipeline style application will probably perform best on a shared-memory machine or clustered SMP (where a given stage fits on a single SMP), although it should be adaptable to a distributed-memory system as well, as long as the communication network is fast enough to pipe the data sets from one stage to the next.

    Rule of Thumb (7)

    A fully synchronous application will perform best on a SIMD multicomputer, if you can exploit array operations. If the computations are relatively independent, you might achieve respectable performance on a shared-memory system (or clustered SMP if a small number of CPUs is sufficient). Any other match is probably unrealistic.

    Rule of Thumb (8)

    A loosely synchronous application will perform best on a shared-memory system (or clustered SMP if a small number of CPUs is sufficient). If there are many computations between CPU interactions (see "Setting Realistic Expectations"), you can probably achieve good performance on a distributed-memory system as well.

    Rule of Thumb (9)

    With few exceptions, you don't pick the language; it picks you.

    Rule of Thumb (10)

    Timings measured on a baseline (serial) version of your application provide a solid starting point for estimating the potential payoff of parallelization, and for judging how reliable that estimate will be.

    Rule of Thumb (11)

    The debilitating impact of serial content on theoretical speedup means that you probably shouldn't consider parallelizing a program with less than 95% parallel content, unless you're already experienced in parallel programming, or unless you will be able to replace a significant portion of the serial version with parallel algorithms that have been proven to be good performers.
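
    To see why the 95% figure is so demanding, it helps to work Amdahl's law, which bounds theoretical speedup by the serial fraction. The short C sketch below is not part of the original rules; the parallel fractions and CPU counts it uses are arbitrary illustrative values.

        #include <stdio.h>

        /* Amdahl's law: with parallel fraction f and p CPUs,
           theoretical speedup = 1 / ((1 - f) + f / p).        */
        static double amdahl(double f, int p)
        {
            return 1.0 / ((1.0 - f) + f / (double)p);
        }

        int main(void)
        {
            double fractions[] = { 0.90, 0.95, 0.99 };   /* illustrative parallel fractions */
            int    cpus[]      = { 8, 32, 1024 };        /* illustrative CPU counts         */
            int i, j;

            for (i = 0; i < 3; i++)
                for (j = 0; j < 3; j++)
                    printf("f = %.2f, p = %4d  ->  speedup <= %.1f\n",
                           fractions[i], cpus[j], amdahl(fractions[i], cpus[j]));
            return 0;
        }

    Even with an unlimited number of CPUs, a program that is 95% parallel cannot exceed a speedup of 20, and one that is 90% parallel is capped at 10, which is why the rule sets the bar so high.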

    Rule of Thumb (12)

    Apply your knowledge of the program to estimate how varying problem size will affect the theoretical speedup curve.

    Rule of Thumb (13)

    Theoretical speedup is only an upper bound on what is possible; the attained performance will almost certainly be much lower.

    Rule of Thumb (14)

    Although you can improve concurrency to some extent, it depends largely on the application itself and on the average load on the computer.

    Rule of Thumb (15)

    A coarse-grained program will perform relatively well on any parallel machine; a medium- or fine-grained one will probably be respectable only on a SIMD multicomputer.

    Rule of Thumb (16)

    To understand the granularity requirements of a distributed-memory computer, calculate its message-equivalent. To be worth parallelizing, your program probably needs to perform many thousands of floating-point operations between CPU interaction points.
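
    One rough way to estimate a machine's message-equivalent is to divide the time it takes to send one message by the time one floating-point operation takes; the result is the number of flops effectively "spent" on each CPU interaction. The sketch below is illustrative only: the 50-microsecond latency and 100-Mflop/s rate are assumed figures, not measurements of any particular machine, and should be replaced with values measured on your own system.

        #include <stdio.h>

        int main(void)
        {
            /* Assumed, illustrative figures -- substitute measured values
               for your own machine and interconnect.                       */
            double msg_latency_us = 50.0;    /* time to send one small message (microseconds) */
            double flops_per_sec  = 100e6;   /* sustained floating-point rate of one CPU      */

            /* Message-equivalent: how many floating-point operations one CPU
               could have completed in the time one message takes.            */
            double msg_equiv = (msg_latency_us * 1e-6) * flops_per_sec;

            printf("message-equivalent = %.0f flops\n", msg_equiv);
            return 0;
        }

    With these assumed figures, each interaction costs about 5,000 floating-point operations' worth of time, so the program would need well over 5,000 flops of useful work between interactions for communication not to dominate, consistent with the "many thousands" guideline above.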


Copyright 1996, Cherri M. Pancake