CS 475/575 -- Spring Quarter 2019
Test #1 Review
On-campus Edition
This page was last updated: April 25, 2019
This will be a multiple choice test on a Scantron (fill-in-the-bubble) form.
From the OSU Test Scoring User Guide:
"The scanner reads reflective optical marks and only a number 2 or softer (i.e., a smaller number)
lead pencil should be used.
Ball Point Pen and Marker type pencils should not be used."
I hardly ever use pencils anymore, so don't come up and ask to borrow one.
I won't have any.
Don't try to second-guess me about patterns in the answers -- I have a program that randomly tells me what letter to put each right answer under.
On-campus test date and time:
Friday, May 3, 2019, 1:00 - 1:50, LInC 210
Test rules:
- The test is worth 100 points.
- It is closed notes, closed Internet, closed friends.
- You are responsible for
- what is in all handouts
- what was said in class
- what you have done in the projects
- The test is over promptly at 1:50.
(Some people have to leave, and there is another class right behind ours.)
Only the tests that have been turned in by then will be graded.
The test can potentially cover any of the following topics:
- Project notes: drawing the performance graphs correctly.
You don't have to know the scripting.
- Three reasons to study parallel programming:
(1) make existing problems compute faster,
(2) make larger problems compute in the same time, and
(3) programming convenience.
- Kinds of Parallelism:
Instruction Level Parallelism,
Thread Level Parallelism,
Data Level Parallelism.
- Single Program Multiple Data (SPMD).
- Architecture:
memory, control unit, arithmetic logic unit, accumulator, input, output,
stack,
registers, program counter, stack pointer.
- Moore's Law.
Transistor density.
Clock speed.
Power consumption.
Heat dissipation.
Multicore, hyperthreading.
- What is a thread?
Thread State: program counter, stack pointer, registers.
Individual threads' stacks.
Multicore without multithreading.
Multithreading without multicore.
- Flynn's Taxonomy: SISD, SIMD, MIMD.
- Definitions:
atomic,
barrier,
chunksize,
coarse-grain parallelism,
deterministic,
dynamic scheduling,
fine-grain parallelism,
fork-join,
private variable,
race condition,
reduction,
shared variable,
static scheduling,
thread safety.
- Timing:
Speedup (Sn = T1/Tn = Pn/P1),
Speedup Efficiency (Sn/n),
Parallel Fraction (Fp) and Sequential Fraction (Fs),
Amdahl's Law to compute S given Fp and n.
Scalability with number of cores,
Inverse Amdahl's law to compute Fp given S and n.
Using the parallel fraction to compute the maximum speedup possible,
Gustafson's Observation.
You do need to memorize Amdahl's Law.
You do not need to memorize the Inverse Amdahl's Law.
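For reference, here is a minimal C sketch of both computations (the variable names are mine, not from the course notes):

    /* Amdahl's Law and its inverse -- illustrative sketch only. */
    #include <stdio.h>

    int main()
    {
        double Fp = 0.90;        /* parallel fraction */
        int    n  = 8;           /* number of cores   */

        /* Amdahl's Law: S = 1 / ( Fs + Fp/n ), with Fs = 1 - Fp */
        double S = 1. / ((1. - Fp) + Fp / (double)n);

        /* Maximum possible speedup (let n go to infinity): 1 / Fs */
        double Smax = 1. / (1. - Fp);

        /* Inverse Amdahl: recover Fp from a measured speedup S and n */
        double FpBack = (1. - 1. / S) / (1. - 1. / (double)n);

        printf("S = %.2f, max S = %.2f, Fp = %.2f\n", S, Smax, FpBack);
        return 0;
    }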
- OpenMP:
Fork-join model,
pragma,
thread teams.
#pragma omp parallel xxx
The parts of the architecture that are shared among cores (heap, executable, globals).
The parts that aren't (stack, stack pointer, program counter, registers).
What's stored on the stack (local variables, return addresses).
For loops (canonical form).
Declaring shared vs. private variables.
Setting the scheduling (static vs. dynamic),
Setting the chunksize.
Declaring a variable inside the for-loop automatically makes it private.
Reduction, Atomic, Critical: what they are, which is fastest and why (see the sketch at the end of this OpenMP list).
Problems:
Thread safety,
Race Conditions,
Deadlock.
Coarse-grained vs. Fine-grained parallelism: what's the difference.
Synchronization: mutexes, barriers.
Sections: what they are, asking for them, the fact that the number of sections is static.
Tasks: what they are, the fact that the number of tasks is dynamic.
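To tie several of the items above together, here is a minimal sketch of a canonical-form for loop with explicit scheduling, a chunksize, declared shared variables, and a reduction (my own example, not code from a project handout):

    #include <stdio.h>
    #include <omp.h>

    #define NUM 1000000

    float A[NUM], B[NUM];

    int main()
    {
        omp_set_num_threads(4);

        float sum = 0.;
        /* Canonical form: one integer index, a simple test, a simple
           increment. "i" is private automatically because it is
           declared inside the for loop; the arrays are shared; the
           reduction gives each thread its own private copy of "sum"
           and combines the copies at the end. */
        #pragma omp parallel for default(none) shared(A,B) schedule(dynamic,1000) reduction(+:sum)
        for (int i = 0; i < NUM; i++)
        {
            sum += A[i] * B[i];
        }
        /* Doing the same update with atomic or critical would
           serialize every access to one shared "sum", which is why
           reduction is normally the fastest of the three. */
        printf("sum = %f\n", sum);
        return 0;
    }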
- Cache:
L1, L2, L3.
Cache hits and misses
Coherence: spatial, temporal
Cache lines: what they are, how large they are (64 bytes),
N-way set associative.
You don't need to know how to take a memory address and figure out what cache line it will end up in
and what its offset will be.
Array-of-structures vs. Structure-of-arrays
Linked list cache strategy
Modified, Exclusive, Shared, Invalid (MESI) cache states
False sharing: what it is, why it happens, two ways of fixing it.
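As a quick illustration (my own sketch, not code from the course), here are the two standard fixes in one place: pad each thread's element out to a full cache line, or accumulate into a private local variable and touch the shared array only once:

    #include <omp.h>

    #define NUMT      4
    #define NUMTRIES  1000000
    #define CACHELINE 64                    /* cache line size, bytes */

    /* Adjacent array elements normally share a cache line, so threads
       updating "their own" element still invalidate each other. */
    struct padded
    {
        float value;
        char  pad[CACHELINE - sizeof(float)];   /* Fix #1: padding */
    };

    struct padded Sums[NUMT];

    int main()
    {
        omp_set_num_threads(NUMT);

        #pragma omp parallel
        {
            int me = omp_get_thread_num();

            /* Fix #2: accumulate into a private local variable and
               write the shared slot exactly once at the end. */
            float mySum = 0.;
            for (int i = 0; i < NUMTRIES; i++)
                mySum += 2.;
            Sums[me].value = mySum;
        }
        return 0;
    }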
- Functional Decomposition
Using sections (a short sketch follows this item)
Why 3 (or 4) of them?
Using barriers
You don't need to know how WaitBarrier( ) works, just why it is there.
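A minimal way of asking for sections looks like this (an illustrative sketch, not the project code); the number of sections is fixed in the source, which is why it is static:

    #include <stdio.h>
    #include <omp.h>

    int main()
    {
        omp_set_num_threads(3);

        /* One section per functional task; the count is fixed at
           compile time, and the implied barrier at the end of the
           sections block keeps the tasks in step. */
        #pragma omp parallel sections
        {
            #pragma omp section
            { printf("Task A on thread %d\n", omp_get_thread_num()); }

            #pragma omp section
            { printf("Task B on thread %d\n", omp_get_thread_num()); }

            #pragma omp section
            { printf("Task C on thread %d\n", omp_get_thread_num()); }
        }
        return 0;
    }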
- Data Decomposition
Compute-to-communicate ratio (see the sketch after this item),
area-to-perimeter ratio,
volume-to-surface ratio.
Trade-offs of using large shared arrays versus private smaller arrays
Trade-off of using more threads (more computing power vs. smaller C:C ratio)
You don't need to know the heat transfer differential equation!
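As a toy illustration of that trade-off (my own model, with made-up numbers): if each thread owns an n x n block of a 2D grid, compute grows like the area (n*n) while communication grows like the perimeter (4n), so giving the same grid to more threads shrinks n and with it the C:C ratio:

    #include <stdio.h>

    /* Toy model: one unit of compute per cell, one unit of
       communication per perimeter cell. Illustrative only. */
    int main()
    {
        for (int n = 64; n >= 4; n /= 2)   /* smaller n = more threads */
            printf("n = %2d : C:C = %4.1f\n", n, (double)(n*n) / (double)(4*n));
        return 0;
    }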
- You are not responsible for any information in the rabbit.engr.oregonstate.edu notes.
- Hint: You won't need a calculator.
- Projects:
Project 0: Welcome to OpenMP
Project 1: OpenMP Monte Carlo
Project 2: OpenMP Volume integration
Project 3 won't be due yet, so there will be no questions about it.