CS 475/575 -- Spring Quarter 2024
Project #4
Vectorized Array Multiplication and Multiplication/Reduction using SSE
60 Points
Due: May 15
This page was last updated: April 4, 2024
Introduction
Many problems in scientific and engineering computing require multiplying
arrays of numbers together element by element: C[i] = A[i] * B[i], or multiplying arrays of numbers together
and adding up all the products to produce a single sum
(Fourier transformation, convolution, autocorrelation, etc.):
sum = ΣA[i]*B[i]
This project tests array multiplication and multiplication/reduction using ordinary C/C++ code and using SIMD.
For the "control group" benchmarks, do not use OpenMP parallel for-loops.
Just use straight C/C++ for-loops.
In the non-extra-credit part of this project, we will only use OpenMP for the timing.
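As a rough illustration of the "control group" side of the experiment, the two plain C/C++ loops might look like the sketch below. The names `PlainMul`, `PlainMulSum`, and `Now` are made up for this example and are not taken from the starter code. `Now` uses `omp_get_wtime()` when the program is compiled with OpenMP and falls back to `std::chrono` otherwise; that fallback is an assumption added only so the sketch compiles either way.

```cpp
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// Plain (non-SIMD) element-by-element multiply: c[i] = a[i] * b[i].
// PlainMul is a hypothetical name, not a function from the starter code.
void PlainMul(const float* a, const float* b, float* c, int len) {
    for (int i = 0; i < len; i++)
        c[i] = a[i] * b[i];
}

// Plain (non-SIMD) multiply/reduction: returns the sum of a[i] * b[i].
// PlainMulSum is a hypothetical name, not a function from the starter code.
float PlainMulSum(const float* a, const float* b, int len) {
    float sum = 0.f;
    for (int i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
}

// Wall-clock time in seconds.
// Uses OpenMP's timer when available; the std::chrono branch is an assumed fallback.
double Now() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using Clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(Clock::now().time_since_epoch()).count();
#endif
}
```

To time one try, read `Now()` just before and just after the call, and divide the array size by the elapsed seconds to get a performance figure.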
Requirements
-
Use the supplied SIMD SSE assembly language code to run an array multiplication timing experiment.
Run the same experiment a second time using your own C/C++ array multiplication code.
-
Use the supplied SIMD SSE assembly language code to run an array multiplication/reduction timing experiment.
Run the same experiment a second time using your own C/C++ array multiplication/reduction code.
-
Use different array sizes from 1K to 8M.
The choice of in-between values is up to you,
but pick enough values that will make for a good graph.
-
Run each array-size test several times.
Record the peak performance across those tries.
-
We will not be graphing performance (e.g., megamults/sec) -- we will be graphing SpeedUp.
-
For the C[i] = A[i] * B[i] experiment, create a table and a graph showing SSE/Non-SSE speed-up as a function of array size.
Speedup in this case will be (P = Performance, T = Elapsed Time):
S = Psse/Pnon-sse = Tnon-sse/Tsse
-
For the sum = ΣA[i]*B[i] experiment, create a table and a graph showing SSE/Non-SSE speed-up as a function of array size.
Speedup in this case will be (P = Performance, T = Elapsed Time):
S = Psse/Pnon-sse = Tnon-sse/Tsse
-
This would normally be 2 graphs with one curve each.
If you want, you can also do this as one graph with 2 curves.
Either way, you need to end up graphing two curves.
-
Note: this is not a multithreading assignment, so you don't need to worry about a NUMT.
Don't use any OpenMP-isms except for getting the timing.
-
The Y-axis performance units in this case will be "Speed-Up", i.e., dimensionless.
Don't use any units that involve xxx/second.
-
Parallel Fraction doesn't apply to SIMD parallelism, so don't compute one.
-
Your commentary write-up (turned in as a PDF file) should tell us:
- What machine you ran this on
- Show the 2 tables of performances for each array size and the corresponding speedups
- Show the graphs (or graph) of SIMD/non-SIMD speedup versus array size (either one graph with two curves, or two graphs each with one curve)
- What patterns are you seeing in the speedups?
- Are they consistent across a variety of array sizes?
- Why or why not, do you think?
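The bookkeeping described in the requirements above (peak performance over several tries, then S = Psse/Pnon-sse) could be sketched as follows. The helper names `MegaMultsPerSecond`, `PeakPerformance`, and `SpeedUp` are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Performance of one try, in megamults/sec, from the array size and elapsed seconds.
double MegaMultsPerSecond(int arraySize, double elapsedSeconds) {
    return (double)arraySize / elapsedSeconds / 1000000.;
}

// Peak (largest) performance seen across all the tries for one array size.
double PeakPerformance(const std::vector<double>& perTryPerformance) {
    assert(!perTryPerformance.empty());
    return *std::max_element(perTryPerformance.begin(), perTryPerformance.end());
}

// Dimensionless speedup: S = Psse / Pnon-sse.
// For the same array size this equals Tnon-sse / Tsse.
double SpeedUp(double peakSse, double peakNonSse) {
    return peakSse / peakNonSse;
}
```

Because both runs use the same array size, dividing the two performances cancels the size and leaves the ratio of elapsed times, which is why the result has no units.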
SSE SIMD code:
-
Find starter code in the file:
all04.cpp.
-
Note that you are linking in the OpenMP library only because we are using it for timing.
-
Because this code uses assembly language, it is not portable.
I know for sure it works on flip, using gcc/g++ 11.4.1.
I know for sure it works on rabbit, using gcc/g++ 4.8.5. (Not sure why this is different.)
It will not work in Visual Studio.
You are welcome to try it other places, but there are no guarantees.
-
You can run the tests one-at-a-time, or you can script them
by making the array size a #define that you set from outside
the program.
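One way to make the array size settable from outside the program is a guarded #define, sketched below. The macro name `ARRAYSIZE`, the array names, and the helper `NumElements` are assumptions made for this example; the starter code may use different names.

```cpp
#include <cstddef>

// ARRAYSIZE is a hypothetical macro name chosen for this sketch.
// The default below applies only when the compile line does not supply -DARRAYSIZE=...
#ifndef ARRAYSIZE
#define ARRAYSIZE 1024
#endif

// Global arrays sized at compile time from the macro.
float A[ARRAYSIZE];
float B[ARRAYSIZE];
float C[ARRAYSIZE];

// Number of elements actually compiled in, so a run can report which size it tested.
std::size_t NumElements() {
    return sizeof(A) / sizeof(A[0]);
}
```

With this in place, a script could loop over the sizes and compile each one with something like `g++ -DARRAYSIZE=4096 all04.cpp -o all04 -lm -fopenmp` (no optimization flags), then run the resulting executable.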
Warning!
Do not use any optimization flags when compiling this code.
Optimization jumbles up the use of the registers.
Extra Credit: Combining SIMD with OpenMP
Combine multithreading and SIMD in a single test.
In this case, you will vary both the array size and the number of threads (NUMT).
Show your table of performances.
Produce a graph similar to the one on Slide #21 of the SIMD Vector notes, using your numbers.
Add a brief discussion of what your curves are showing and why you think it is working this way.
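One possible shape for the combined test is to give each thread its own contiguous slice of the arrays and run a SIMD multiply on that slice. The sketch below is only an illustration under several assumptions: it uses SSE intrinsics from `<xmmintrin.h>` as a stand-in for the supplied assembly code, the names `MulChunk` and `ThreadedSimdMul` are hypothetical, and the preprocessor guards are there only so it still compiles without SSE or OpenMP.

```cpp
#ifdef __SSE__
#include <xmmintrin.h>
#endif
#ifdef _OPENMP
#include <omp.h>
#endif

// Multiply one slice: c[i] = a[i] * b[i] for i in [0, len).
// Uses SSE intrinsics (4 floats at a time) as a stand-in for the supplied assembly.
void MulChunk(const float* a, const float* b, float* c, int len) {
    int i = 0;
#ifdef __SSE__
    int limit = (len / 4) * 4;                 // largest multiple of 4 that fits
    for (; i < limit; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       // unaligned loads, since slices may start anywhere
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_mul_ps(va, vb));
    }
#endif
    for (; i < len; i++)                       // leftover elements, or everything without SSE
        c[i] = a[i] * b[i];
}

// Split the arrays into one contiguous slice per thread and multiply each slice.
// numt plays the role of NUMT; without OpenMP the pragma is ignored and one thread does it all.
void ThreadedSimdMul(const float* a, const float* b, float* c, int n, int numt) {
    (void)numt;                                // unused when compiled without OpenMP
    #pragma omp parallel num_threads(numt)
    {
#ifdef _OPENMP
        int me = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int me = 0;
        int nthreads = 1;
#endif
        int chunk = n / nthreads;
        int first = me * chunk;
        // The last thread also picks up any remainder elements.
        int len = (me == nthreads - 1) ? (n - first) : chunk;
        MulChunk(a + first, b + first, c + first, len);
    }
}
```

Timing a call to `ThreadedSimdMul` for each combination of array size and thread count would give the numbers for the table and graph.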
Grading:
Feature | Points
---|---
Table of Array Multiply performances and speedups | 10
Graph of Array Multiply speedup | 10
Table of Array Multiply/Reduction performances and speedups | 10
Graph of Array Multiply/Reduction speedup curve | 10
Commentary | 20
Extra Credit | +5
Potential Total | 65