CS 475/575 -- Spring Quarter 2025
Test #2 Review
This page was last updated: May 20, 2025
Test date and time range:
Test #2 will open in Finals Week on Wednesday, June 11 at 12:01 AM PDT (one minute after midnight).
It will close on Saturday, June 14, at 11:59 PM PDT (one minute before midnight).
This gives you 4 days minus 2 minutes in which to take a 1-hour test.
Test Information:
- This is Test #2, not a comprehensive final.
- It is a multiple-choice test cast as a Canvas "Quiz".
- There will be 40 questions, worth 2.5 points each.
- You will have 60 minutes to complete it.
Once you start, you must finish.
Canvas does not allow you to pause, leave, then come back and resume.
- The test is open notes and closed friends.
Warning! "Open Notes" is not the same as "I don't need to study for it"!
You will run out of time if you have to look up every one of the questions in the notes.
- Clearly, I cannot stop you from accessing information on the Internet.
However, the test has been written against our class notes.
If you miss a particular question, any protest of the form "But somethingsomething.com said that..." will be ignored.
- You are responsible for:
  - what is in the handouts
  - what was said in class and in the videos, including the Live Lectures
  - what was covered on the quizzes
  - what you have done in the projects
Grade Cutoffs
Reminder: we do not use Canvas percentages for final grades.
Go to Canvas and add up your points.
That will show you where you fall in the grade cutoff table.
Ignore whatever percentage Canvas gives you.
The cutoffs are:
Points | Grade
 1060  |  A
 1040  |  A-
 1020  |  B+
 1000  |  B
  980  |  B-
  960  |  C+
  940  |  C
  920  |  C-
  900  |  D+
  880  |  D
  860  |  D-
The test can potentially cover any of the following:
Class Topics:
- GPU 101:
GPU performance vs. CPU performance and why,
"CUDA Cores" vs. "Intel cores",
What GPUs are good at, what GPUs are not good at
Streaming Multiprocessors (SM) / Compute Units (CU),
CUDA Cores (CC) / Processing Elements (PE),
the ubiquitous Yellow Robot.
- CUDA:
general idea,
two programs together in the same file,
the nvcc compiler,
relationship between nvcc and gcc/g++ and Visual Studio,
executing the kernel,
"chevrons",
the GPU consists of a grid of blocks,
each block contains a grid of threads,
1D or 2D,
thousands of lightweight threads,
threads per block,
number of threads in a "Warp" (32 on rabbit and the DGX, 128 on some newer hardware),
built-in variables: gridDim, blockIdx, blockDim, threadIdx
types of memory (and who can share them),
steps in creating and running a CUDA program,
host (CPU) memory vs. device (GPU) memory,
transferring buffers to/from the GPU,
cudaMalloc( ),
cudaMemcpy( ),
why "sizeof(dhits)" doesn't work but "sizeof(hhits)" and "NUMTRIALS*sizeof(float)" do,
performance.
[ You won't need to be able to reproduce exact function syntax. ]
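To make the CUDA items above concrete, here is a minimal sketch (it is not one of the project programs; the kernel name FillHits and the BLOCKSIZE value are assumptions made up for illustration). It shows cudaMalloc( ), cudaMemcpy( ), a chevron launch, the built-in variables, and why sizeof(dhits) does not give the buffer size:

#include <stdio.h>
#include <cuda_runtime.h>

#define NUMTRIALS  1024
#define BLOCKSIZE    64     // threads per block

// kernel: each thread uses the built-in variables to find its global number:
__global__ void FillHits( float *dhits )
{
    unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if( gid < NUMTRIALS )
        dhits[gid] = (float)gid;
}

int main( )
{
    float hhits[NUMTRIALS];     // host (CPU) array
    float *dhits;               // device (GPU) pointer

    // sizeof(hhits) == NUMTRIALS*sizeof(float) because hhits is a real array;
    // sizeof(dhits) is only the size of a pointer (8 bytes), NOT the buffer
    // size, so the byte count must be spelled out explicitly:
    cudaMalloc( (void **)&dhits, NUMTRIALS*sizeof(float) );

    // execute the kernel: a 1D grid of blocks, each holding a 1D grid of threads:
    dim3 grid( NUMTRIALS/BLOCKSIZE, 1, 1 );
    dim3 threads( BLOCKSIZE, 1, 1 );
    FillHits<<< grid, threads >>>( dhits );

    // transfer the buffer from device (GPU) memory back to host (CPU) memory:
    cudaMemcpy( hhits, dhits, NUMTRIALS*sizeof(float), cudaMemcpyDeviceToHost );

    cudaFree( dhits );
    fprintf( stderr, "hhits[%d] = %f\n", NUMTRIALS-1, hhits[NUMTRIALS-1] );
    return 0;
}

Compiling this with nvcc hands the host-side C++ off to gcc/g++ (or to Visual Studio's compiler on Windows) and compiles the kernel itself for the GPU.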
- DGX system:
what it is,
what Slurm is used for,
using sbatch.
- CUDA ↔ OpenCL Transition:
relationship between the CUDA and OpenCL compilers and gcc/g++ and Visual Studio,
the four things that both CUDA and OpenCL must do:
- Allocate data space in GPU memory
- Transfer data from the CPU to the GPU
- Execute a kernel to compute on that data
- Transfer data back from the GPU to the CPU
- OpenCL:
general idea,
two programs, each in a separate file (.cpp and .cl),
the command queue,
work-groups,
work-items,
1D or 2D or 3D,
thousands of lightweight threads,
work-items per work-group,
get_global_id( ),
get_local_id( ),
SIMD parallelism (float2, float4, float8, float16),
types of memory (and who can share them),
steps in creating and running an OpenCL program,
host (CPU) memory vs. device (GPU) memory,
compiling and building .cl code,
where the OpenCL compiler lives (in the OpenCL driver),
enqueuing,
executing a kernel,
transferring buffers to/from the GPU,
clCreateBuffer( ), clEnqueueWriteBuffer( ), clEnqueueNDRangeKernel( ), clEnqueueReadBuffer( ) (see the sketch below),
performance.
[ You won't need to be able to reproduce exact function syntax. ]
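Here is the same kind of sketch for the OpenCL items above. In class the kernel lives in its own .cl file and is read into a string; it is inlined here only so the example is self-contained, and the kernel name ArrayMult, the sizes, and the omitted error checking are assumptions for illustration. The four numbered comments mark the four things that both CUDA and OpenCL must do:

#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

#define NUM_ELEMENTS  (1024*1024)
#define LOCAL_SIZE    64            // work-items per work-group

const char *CL_SOURCE =
"__kernel void ArrayMult( __global const float *dA, __global const float *dB, __global float *dC )\n"
"{\n"
"    int gid = get_global_id( 0 );\n"
"    dC[gid] = dA[gid] * dB[gid];\n"
"}\n";

int main( )
{
    size_t dataSize = NUM_ELEMENTS * sizeof(float);
    float *hA = new float [ NUM_ELEMENTS ];
    float *hB = new float [ NUM_ELEMENTS ];
    float *hC = new float [ NUM_ELEMENTS ];
    for( int i = 0; i < NUM_ELEMENTS; i++ )
        hA[i] = hB[i] = (float)i;

    cl_int status;
    cl_platform_id platform;   clGetPlatformIDs( 1, &platform, NULL );
    cl_device_id   device;     clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL );
    cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, &status );
    cl_command_queue cmdQueue = clCreateCommandQueue( context, device, 0, &status );

    // 1. allocate data space in GPU memory:
    cl_mem dA = clCreateBuffer( context, CL_MEM_READ_ONLY,  dataSize, NULL, &status );
    cl_mem dB = clCreateBuffer( context, CL_MEM_READ_ONLY,  dataSize, NULL, &status );
    cl_mem dC = clCreateBuffer( context, CL_MEM_WRITE_ONLY, dataSize, NULL, &status );

    // 2. transfer data from the CPU to the GPU:
    clEnqueueWriteBuffer( cmdQueue, dA, CL_FALSE, 0, dataSize, hA, 0, NULL, NULL );
    clEnqueueWriteBuffer( cmdQueue, dB, CL_FALSE, 0, dataSize, hB, 0, NULL, NULL );

    // compile and build the .cl code -- the OpenCL compiler lives in the driver:
    cl_program program = clCreateProgramWithSource( context, 1, &CL_SOURCE, NULL, &status );
    clBuildProgram( program, 1, &device, "", NULL, NULL );
    cl_kernel kernel = clCreateKernel( program, "ArrayMult", &status );
    clSetKernelArg( kernel, 0, sizeof(cl_mem), &dA );
    clSetKernelArg( kernel, 1, sizeof(cl_mem), &dB );
    clSetKernelArg( kernel, 2, sizeof(cl_mem), &dC );

    // 3. execute the kernel: a 1D range of work-items in work-groups of LOCAL_SIZE:
    size_t globalWorkSize[1] = { NUM_ELEMENTS };
    size_t localWorkSize[1]  = { LOCAL_SIZE };
    clEnqueueNDRangeKernel( cmdQueue, kernel, 1, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL );

    // 4. transfer data back from the GPU to the CPU (CL_TRUE = blocking read):
    clEnqueueReadBuffer( cmdQueue, dC, CL_TRUE, 0, dataSize, hC, 0, NULL, NULL );

    fprintf( stderr, "hC[100] = %f\n", hC[100] );
    return 0;
}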
- OpenCL Events:
throwing events
waiting for one or more events
creating a kernel-execution graph structure
[ You won't need to be able to reproduce exact function syntax. ]
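A short fragment (reusing the hypothetical names from the OpenCL sketch above) showing one enqueue "throwing" an event, a second enqueue waiting on it, and the host waiting on the result -- a minimal two-node kernel-execution graph:

cl_event writeDone, kernelDone;

// throw an event when this (non-blocking) buffer write completes:
clEnqueueWriteBuffer( cmdQueue, dA, CL_FALSE, 0, dataSize, hA, 0, NULL, &writeDone );

// make the kernel wait for that event, and throw one of its own when it finishes:
clEnqueueNDRangeKernel( cmdQueue, kernel, 1, NULL, globalWorkSize, localWorkSize,
                        1, &writeDone, &kernelDone );

// block the host until the kernel's event fires:
clWaitForEvents( 1, &kernelDone );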
- OpenCL Assembly Language:
difference between sqrt( ), distance( ), length( ), normalize( ) and
fast_sqrt( ), fast_distance( ), fast_length( ), fast_normalize( ),
which you should use when,
registers,
fused multiply-add (FMA).
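FMA computes a*b + c in a single instruction with a single rounding, which is both faster and slightly more accurate than a separate multiply and add. A tiny illustration, written as a CUDA device function to keep this page's examples in one language (fmaf( ) is the standard C/CUDA spelling; the function below is made up):

__device__ float EvaluateLine( float x )
{
    // without FMA: t = 2.f*x (round), then t + 3.f (round again) -- two instructions;
    // with FMA: one fused instruction, one rounding:
    return fmaf( 2.f, x, 3.f );     // 2*x + 3
}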
- GPU Reduction:
general idea,
workgroup-shared memory array,
mask, offset,
barriers and why they are needed.
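The notes present reduction as an OpenCL kernel; this is the same mask/offset tree idea, sketched here as a CUDA kernel so this page's examples stay in one language (__shared__ plays the role of the workgroup-shared local array, and __syncthreads( ) is the barrier):

#define BLOCKSIZE 64

__global__ void SumReduce( const float *dIn, float *dSums )
{
    __shared__ float prods[BLOCKSIZE];          // workgroup-shared memory array

    unsigned int tnum = threadIdx.x;            // thread number within this block
    unsigned int gid  = blockIdx.x * blockDim.x + threadIdx.x;
    prods[tnum] = dIn[gid];

    // each pass, half as many threads add pairs that are "offset" apart;
    // the "mask" selects which threads are still active:
    for( unsigned int offset = 1; offset < BLOCKSIZE; offset *= 2 )
    {
        unsigned int mask = 2*offset - 1;
        __syncthreads( );   // barrier: all stores from the previous pass must finish first
        if( ( tnum & mask ) == 0 )
            prods[tnum] += prods[tnum + offset];
    }

    __syncthreads( );
    if( tnum == 0 )
        dSums[blockIdx.x] = prods[0];           // one partial sum per block
}

Without the barriers, a thread could read prods[tnum + offset] before the thread responsible for that element had finished writing it -- that is why they are needed.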
- OpenCL / OpenGL Interoperability:
general idea,
OpenGL creates a vertex buffer,
in this case, the Vertex Buffer is a table of positions and colors,
OpenCL acquires the buffer,
clCreateFromGLBuffer,
[ You are not responsible for any of the code. ]
- Message Passing Interface (MPI):
general idea,
multiple computers networked together,
single-program-multiple-data (SPMD) programming model,
how many processors are being used on this job: MPI_Comm_size( ),
which one I am: MPI_Comm_rank( ),
broadcast: MPI_Bcast( ),
sending: MPI_Send( ),
receiving: MPI_Recv( ),
the number of items sent should be the same as the number received,
scatter / gather,
reduction,
barriers,
derived types.
[ You won't need to be able to reproduce the exact syntax of any of the function calls. ]
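A minimal SPMD sketch of the MPI calls above (every rank runs this same program; the MASTER name and the sizes are assumptions for illustration, and it needs at least two ranks, e.g. mpiexec -np 4):

#include <stdio.h>
#include <mpi.h>

#define NUMDATA 64
#define MASTER   0      // the rank that owns the data

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );

    int numProcs, myRank;
    MPI_Comm_size( MPI_COMM_WORLD, &numProcs );   // how many processors are on this job
    MPI_Comm_rank( MPI_COMM_WORLD, &myRank );     // which one am I

    // everyone needs the problem size -- broadcast it from the master:
    int n = NUMDATA;
    MPI_Bcast( &n, 1, MPI_INT, MASTER, MPI_COMM_WORLD );

    float data[NUMDATA];
    if( myRank == MASTER )
    {
        for( int i = 0; i < n; i++ )
            data[i] = (float)i;
        // the count sent here must match the count in the matching Recv:
        MPI_Send( data, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD );
    }
    else if( myRank == 1 )
    {
        MPI_Status status;
        MPI_Recv( data, n, MPI_FLOAT, MASTER, 0, MPI_COMM_WORLD, &status );
        fprintf( stderr, "Rank %d of %d received %d floats\n", myRank, numProcs, n );
    }

    MPI_Finalize( );
    return 0;
}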
- Compute-to-communicate ratio:
what it is,
why it is good to make it bigger (more computing accomplished before a communication is needed),
why it is good to make it smaller (bring more simultaneous compute power to bear),
can result in a non-obvious "sweet spot" (N:2, N:4, N:6); see the worked example below.
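For intuition only -- the following cost model is an assumption, not something from the notes. Suppose compute time shrinks as N*C/P as you spread N*C microseconds of total work over P processors, while total communication overhead grows as P*F. Then T(P) = N*C/P + P*F is smallest at P = sqrt(N*C/F). For example, with N*C = 1600 microseconds of work and F = 1 microsecond of per-processor overhead, T(P) bottoms out at P = 40 (T = 80 microseconds): fewer processors waste potential compute power, more processors drown in communication. That intermediate minimum is the non-obvious sweet spot.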
- Combined Parallelism:
The four different types of parallelism you could bring to bear on a specific problem at the same time.
Projects:
- Project #3: Parallelizing the K-Means algorithm
- Project #4: SIMD Array Multiplication and Summing: advantage of SIMD
- Project #5: CUDA -- Monte Carlo Simulation: performance characteristics
- Project #6: OpenCL -- Quadratic Regression: performance characteristics
- Project #7: MPI -- Fourier Analysis: performance characteristics
[ You don't need to know the details of the K-Means or Fourier Analysis algorithms. ]
Hints:
- Hint #1: You won't have to write any code.
- Hint #2: I might give you code and ask what it does.
- Hint #3: I might give you code and ask what is wrong with it and how to fix it.
- Hint #4: Any arithmetic on the test will be things that you can do in your head,
but you can have a calculator handy if you want.