CS 475/575 -- Spring Quarter 2024

Project #6

OpenCL Linear Regression

100 Points

Due: June 2


This page was last updated: May 22, 2024



Introduction

These days there is a mad rush to be able to analyze massive data sets. In this project, you will perform a linear regression on a dataset of 4M (x,y) pairs. The linear regression will produce the equation of the line that best fits the data points. The line will be of the form y = M*x + B (just like in high school...). Your job will be to run multiple tests to characterize the performance and to determine the values of M and B.

Some Help

Here is proj06.cpp, a skeleton C++ program.
Here is proj06.cl, a skeleton OpenCL kernel program.
Because web browsers don't recognize the .cl file type, they will try to force you to save proj06.cl instead of displaying it.
If you just want to see what is in this file right now, click here.

Requirements:

Determining the M and B for the Line Equation

To complete the linear regression, you need to determine the optimal values of M and B from the line equation: y = M*x + B. To do this, you need to solve a two-equations-two-unknowns linear system. Fortunately, you don't need to figure out how to do this yourself. Call the Solve( ) function (it's in the skeleton code) like this:


float m, b;
Solve( Σx², Σx, Σx, (float)DATASIZE, Σ(x*y), Σy,   &m, &b );

For those new to C/C++, the ampersand (&) means "address of". It gives a function a way to fill values that you want returned. When this function is done executing, m and b will have the computed values in them, ready for you to include in your report.
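To make the call above more concrete, here is a minimal host-side sketch of the whole computation. It assumes, from the argument order in that call, that the two equations are Σx²*m + Σx*b = Σ(x*y) and Σx*m + N*b = Σy, where N is DATASIZE. SolveSketch( ) and FitLine( ) are hypothetical names used only for illustration; the Solve( ) in the skeleton may be written differently, and in the project itself the sums come from your OpenCL results rather than from a CPU loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the skeleton's Solve( ); the real one may differ.
// Solves the two-equations-two-unknowns system
//     A*m + B*b = E
//     C*m + D*b = F
// using Cramer's rule. Returns false if the system has no unique solution.
bool SolveSketch( double A, double B, double C, double D, double E, double F, float *m, float *b )
{
    double det = A*D - B*C;
    if( std::fabs( det ) < 1.e-12 )
        return false;

    *m = (float)( ( E*D - B*F ) / det );
    *b = (float)( ( A*F - E*C ) / det );
    return true;
}

// Hypothetical helper: accumulates the four sums on the CPU and solves for m and b.
// The sums are kept in double precision to limit round-off over many points.
bool FitLine( const std::vector<float> &x, const std::vector<float> &y, float *m, float *b )
{
    double sumx2 = 0., sumx = 0., sumxy = 0., sumy = 0.;
    for( std::size_t i = 0; i < x.size( ); i++ )
    {
        sumx2 += (double)x[i] * (double)x[i];
        sumx  += (double)x[i];
        sumxy += (double)x[i] * (double)y[i];
        sumy  += (double)y[i];
    }

    // same argument order as the Solve( ) call shown above:
    return SolveSketch( sumx2, sumx, sumx, (double)x.size( ), sumxy, sumy, m, b );
}
```

A CPU version like this can serve as a cross-check on the M and B you get from the GPU run: feeding FitLine( ) points that lie exactly on y = 2*x + 1 should give back values very close to m = 2 and b = 1.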

Developing this Project in Linux

You will need the following files:

  1. cl.h
  2. cl_platform.h

If you are on rabbit, compile, link, and run like this:


g++  -o  proj06  proj06.cpp  /usr/local/apps/cuda/10.1/lib64/libOpenCL.so.1  -lm  -fopenmp
./proj06

If you are on the DGX System, put these lines in a bash file:


#!/bin/bash
#SBATCH  -J  Proj06
#SBATCH  -A  cs475-575
#SBATCH  -p  classgputest
#SBATCH  --constraint=v100
#SBATCH  --gres=gpu:1
#SBATCH  -o  proj06.out
#SBATCH  -e  proj06.err
#SBATCH  --mail-type=BEGIN,END,FAIL
#SBATCH  --mail-user=joeparallel@oregonstate.edu

for s in 4096 16384 65536 262144 1048576 4194304
do
        for b in 8 32 64 128 256
        do
                g++ -DDATASIZE=$s -DLOCALSIZE=$b -o proj06 proj06.cpp /usr/local/apps/cuda/11.7/lib64/libOpenCL.so.1  -lm -fopenmp
                ./proj06
        done
done

and then submit that file using the sbatch Slurm command.
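The -DDATASIZE=$s and -DLOCALSIZE=$b flags in the script above define preprocessor macros on the compile line, so each pass through the loops builds a differently-sized program. The sketch below shows one way a source file can pick those macros up. The default values and the NumWorkGroups( ) helper are assumptions for illustration; how the skeleton actually uses the macros is not shown on this page.

```cpp
#include <cstddef>

// Defaults used only when the compile line does not pass -DDATASIZE=... or
// -DLOCALSIZE=... ; these particular default values are assumptions.
#ifndef DATASIZE
#define DATASIZE    (4*1024*1024)
#endif

#ifndef LOCALSIZE
#define LOCALSIZE   64
#endif

// Hypothetical helper: the number of work-groups needed to cover dataSize
// elements, or 0 if localSize does not divide dataSize evenly.
constexpr std::size_t NumWorkGroups( std::size_t dataSize, std::size_t localSize )
{
    return ( localSize != 0 && dataSize % localSize == 0 ) ? dataSize / localSize : 0;
}

// Catch a bad combination at compile time instead of at kernel launch.
static_assert( NumWorkGroups( DATASIZE, LOCALSIZE ) > 0,
               "LOCALSIZE must evenly divide DATASIZE" );
```

Every DATASIZE in the script is a multiple of every LOCALSIZE, so this check passes for all of the combinations the loops generate.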

Developing this Project in Visual Studio

Right-click on this link: VS2022.zip, and save it somewhere. Then, un-zip it, go to the VS2022 folder, and double-click on the VS06.sln file.

Getting the right platform and device number:

OpenCL is capable of running on multiple parts of the system you are on (CPU, GPUs, etc.). So, you must decide which one to use.

The skeleton code contains a function called SelectOpenclDevice( ) which selects what it thinks is the best Platform and Device to run your OpenCL code on. Call it from your main program. (That call is already in the skeleton code.) It will print out what it has decided to do. Feel free to comment out those print statements later so they don't interfere with producing an output file.

On rabbit, it produced:
I have selected Platform #0, Device #0: Vendor = NVIDIA, Type = CL_DEVICE_TYPE_GPU

Grading:

Feature                                                   Points
Performance table                                             10
Graph of Performance versus DATASIZE                          25
Graph of Performance versus LOCALSIZE work-group size         25
Determined the correct M and B values                         15
Commentary                                                    25
Potential Total                                              100

Note: the graph of performance versus DATASIZE needs to have colored curves of constant LOCALSIZE.
Note: the graph of performance versus LOCALSIZE needs to have colored curves of constant DATASIZE.