CS 475/575  Spring Quarter 2022
Project #4
Vectorized Array Multiplication/Reduction using SSE
60 Points
Due: May 11
This page was last updated: May 11, 2022
Introduction
There are many problems in scientific and engineering computing where you want to multiply
two arrays of numbers together, element by element, and add up all the products to produce a single sum
(Fourier transformation, convolution, autocorrelation, etc.):
sum = ΣA[i]*B[i]
This project tests array multiplication/reduction using SIMD and non-SIMD code.
For the "control group" benchmarks, do not use OpenMP parallel for-loops.
Just use straight C/C++ for-loops.
In this project, we are only using OpenMP for the timing.
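As a minimal sketch of the non-SIMD "control group" version, the sum above can be written as a straight C/C++ for-loop. The function name NonSimdMulSum and its parameter names are only illustrative; use whatever names fit your own code.

```cpp
// Plain (non-SIMD) multiply/reduce: returns the sum of a[i]*b[i] for i in [0, len).
// The name NonSimdMulSum is hypothetical, chosen for this sketch.
float NonSimdMulSum(const float *a, const float *b, int len)
{
    float sum = 0.f;
    for (int i = 0; i < len; i++)
    {
        sum += a[i] * b[i];   // one multiply and one add per element
    }
    return sum;
}
```

Calling it with two arrays of the same length, e.g. NonSimdMulSum(A, B, arraySize), gives the single sum that the timing experiment measures.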
Requirements

Use the supplied SIMD SSE assembly language code to run an array multiplication/reduction timing experiment.
Run the same experiment a second time using your own C/C++ array multiplication/reduction code.

Use different array sizes from 1K to 8M.
The choice of in-between values is up to you,
but pick values that will make for a good graph.

Run each array-size test for a certain number of trials.
Record the peak performance across those trials.
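One way to organize the trials is sketched below. It times each call, converts the elapsed time into a performance number, and keeps the largest one. Several things here are assumptions made for the sketch: the helper names NowSeconds and PeakMegaMultsPerSecond, the choice of MegaMults/sec as the performance unit, and the std::chrono fallback that is used only when the file is compiled without OpenMP.

```cpp
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// Wall-clock time in seconds.
// Uses omp_get_wtime() when compiled with OpenMP; the std::chrono fallback
// is an assumption added so the sketch also builds without OpenMP.
double NowSeconds()
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using Clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(Clock::now().time_since_epoch()).count();
#endif
}

// Calls fn(a, b, len) numTrials times and returns the peak performance seen,
// in MegaMults/sec (an assumed unit). The last computed sum is written to
// *lastSum so the work cannot be discarded.
template <typename Fn>
double PeakMegaMultsPerSecond(Fn fn, const float *a, const float *b,
                              int len, int numTrials, float *lastSum)
{
    double peak = 0.;
    for (int t = 0; t < numTrials; t++)
    {
        double time0 = NowSeconds();
        *lastSum = fn(a, b, len);
        double time1 = NowSeconds();

        double elapsed = time1 - time0;
        if (elapsed <= 0.)
            continue;                 // timer too coarse for this trial; skip it

        double perf = (double)len / elapsed / 1000000.;
        if (perf > peak)
            peak = perf;              // keep the best trial, not the average
    }
    return peak;
}
```

Keeping the peak rather than the average filters out trials that were slowed down by other activity on the machine.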

Create a table and a graph showing SSE/non-SSE speedup as a function of array size.
Speedup in this case will be (P = Performance, T = Elapsed Time):
S = P_sse / P_nonsse = T_nonsse / T_sse
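The two forms of the speedup formula give the same number because performance is work divided by elapsed time and both runs do the same work. A small sketch, with hypothetical function names and made-up example numbers:

```cpp
// Speedup from two performance numbers (both in the same units).
// Function names are hypothetical, chosen for this sketch.
double SpeedupFromPerformance(double perfSse, double perfNonSse)
{
    return perfSse / perfNonSse;
}

// Speedup from two elapsed times (both in the same units).
// Note the order: the non-SSE time goes on top.
double SpeedupFromTime(double timeNonSse, double timeSse)
{
    return timeNonSse / timeSse;
}
```

For example, if the non-SSE run took 4.0 ms and the SSE run took 1.0 ms for the same array size (illustrative values, not measurements), SpeedupFromTime(4.0, 1.0) is 4.0.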

Note: this is not a multithreading assignment, so you don't need to worry about a NUMT.
Don't use any OpenMP-isms except for getting the timing.

The Y-axis performance units in this case will be "SpeedUp", i.e., dimensionless.

Parallel Fraction doesn't apply to SIMD parallelism, so don't compute one.

Your commentary write-up (turned in as a separate PDF file) should cover:
 What machine you ran this on
 A table of the performances for each array size and the corresponding speedups
 A graph of SIMD/non-SIMD speedup versus array size (either one graph with two curves, or two graphs each with one curve)
 What patterns are you seeing in the speedups?
 Are they consistent across a variety of array sizes?
 Why or why not, do you think?
SSE SIMD code:
Warning!
Do not use any optimization flags when compiling this code.
The optimizer jumbles up the use of the registers.
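For orientation only, here is a sketch of what a four-floats-at-a-time multiply/reduce looks like when written with SSE intrinsics from xmmintrin.h. This is not the supplied assembly language code and is not a substitute for it in this project. The function name SimdMulSumSketch is hypothetical, and the sketch assumes an x86 machine.

```cpp
#include <xmmintrin.h>   // SSE intrinsics (x86 only)

const int SSE_WIDTH = 4; // four 32-bit floats per 128-bit SSE register

// Illustrative SSE multiply/reduce; NOT the supplied assembly code.
float SimdMulSumSketch(const float *a, const float *b, int len)
{
    __m128 acc = _mm_setzero_ps();                    // four running partial sums
    int limit = (len / SSE_WIDTH) * SSE_WIDTH;        // largest multiple of 4 <= len

    for (int i = 0; i < limit; i += SSE_WIDTH)
    {
        __m128 va = _mm_loadu_ps(&a[i]);              // load 4 floats (unaligned ok)
        __m128 vb = _mm_loadu_ps(&b[i]);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));    // 4 multiplies, 4 adds
    }

    // Collapse the four partial sums into one.
    float lanes[SSE_WIDTH];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    // Handle any leftover elements that did not fill a whole register.
    for (int i = limit; i < len; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Because the four lanes are summed separately and combined at the end, the floating-point result can differ slightly from a straight for-loop over the same data.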
+5 points Extra Credit
Combine multithreading and SIMD in one test.
In this case, you will vary both the array size and the number of threads (NUMT).
Show your table of performances.
Produce a graph similar to the one on Slide #20 of the SIMD Vector notes, using your numbers.
Add a brief discussion of what your curves are showing and why you think it is working this way.
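One possible way to structure the combined test is to give each thread its own contiguous chunk of the arrays and add the per-chunk sums together. The sketch below makes several assumptions: the names ChunkMulSum and ThreadedMulSum are hypothetical, and ChunkMulSum is a plain for-loop standing in for the SIMD routine you would actually call on each chunk. The OpenMP calls are guarded so the sketch still builds, single-threaded, without OpenMP.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Stand-in for the per-chunk routine. In the real experiment this would be
// the SIMD multiply/reduce; a plain loop is used here only to keep the
// sketch self-contained.
float ChunkMulSum(const float *a, const float *b, int len)
{
    float sum = 0.f;
    for (int i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
}

// Splits the arrays into numt contiguous chunks, one per thread,
// and adds up the per-chunk sums.
float ThreadedMulSum(const float *a, const float *b, int len, int numt)
{
    float total = 0.f;
    int perThread = len / numt;

#ifdef _OPENMP
    omp_set_num_threads(numt);
#endif

    #pragma omp parallel for reduction(+:total)
    for (int t = 0; t < numt; t++)
    {
        int first = t * perThread;
        // The last chunk also picks up any remainder elements.
        int count = (t == numt - 1) ? (len - first) : perThread;
        total += ChunkMulSum(&a[first], &b[first], count);
    }
    return total;
}
```

Timing ThreadedMulSum for each combination of array size and NUMT gives the numbers for the extra-credit table and graph.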
Grading:
Feature                                               Points
Array Multiply/Reduction performances and speedups    20
Array Multiply/Reduction speedup curve                20
Commentary                                            20
Extra Credit                                          +5
Potential Total                                       65

