br. These data sets are in the data subdirectory of the Weka directory (/usr/local/classes/eecs/spring2005/cs534/weka/data). This folder is available on COE Windows machines under drive W:. I have also made the data available on the ENGR web server: http://classes.engr.oregonstate.edu/eecs/spring2005/cs534/data/. Each data set has one or more training data files and one test data file:
br data files:
  br-train.arff     br training data file
  br-test.arff      br test data file

hw2-1 data files:
  hw2-1-10.arff     10 training examples
  hw2-1-20.arff     20 training examples
  hw2-1-50.arff     50 training examples
  hw2-1-100.arff    100 training examples
  hw2-1-200.arff    200 training examples
  hw2-1-test.arff   test data file

hw2-2 data files:
  hw2-2-25.arff     25 training examples
  hw2-2-50.arff     50 training examples
  hw2-2-100.arff    100 training examples
  hw2-2-200.arff    200 training examples
  hw2-2-600.arff    600 training examples
  hw2-2-test.arff   test data file
You will run the three learning algorithms on each training data file and evaluate the results on the corresponding test data files.
For each data set, you should turn in the following:
hw2-1:
  N    Perceptron  NaiveBayesSimple  LogisticRegression
  10   xxx         yyy               zzz
  20   xxx         yyy               zzz
  50   xxx         yyy               zzz
  100  xxx         yyy               zzz
  200  xxx         yyy               zzz

hw2-2:
  N    Perceptron  NaiveBayesSimple  LogisticRegression
  25   xxx         yyy               zzz
  50   xxx         yyy               zzz
  100  xxx         yyy               zzz
  200  xxx         yyy               zzz
  600  xxx         yyy               zzz

br:
  N    Perceptron  NaiveBayesSimple  LogisticRegression
  614  xxx         yyy               zzz

where xxx gives the error rate of the Perceptron, yyy gives the error rate of NaiveBayesSimple, and zzz gives the error rate of LogisticRegression. We will measure error rates on separate files of test points.
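To be concrete about what these table entries mean, here is a minimal sketch of how an error rate is computed: the fraction of test examples whose predicted label differs from the true label. The label lists below are hypothetical placeholders, not real WEKA output.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of examples whose predicted label differs from the true label."""
    assert len(true_labels) == len(predicted_labels)
    mistakes = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return mistakes / len(true_labels)

# Hypothetical labels for a five-example test set:
true_y = [0, 0, 1, 1, 1]
pred_y = [0, 1, 1, 1, 0]
print(error_rate(true_y, pred_y))  # 2 mistakes out of 5 -> 0.4
```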
For hw2-2, you should also turn in a plot of the performance of the three algorithms as a function of the size of the training data set (known as a "learning curve"). I recommend using gnuplot or Excel for constructing the graphs -- I don't think WEKA provides an easy way to do this.
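If you use gnuplot, one convenient approach is to write your measured error rates into a whitespace-separated data file and plot its columns. The sketch below uses hypothetical placeholder error rates and a hypothetical file name; substitute the values you actually measured.

```python
# Prepare a learning-curve data file for gnuplot.
# All error rates below are placeholders, not real results.
sizes = [10, 20, 50, 100, 200]                 # hw2-1 training-set sizes
perceptron  = [0.30, 0.28, 0.25, 0.22, 0.20]   # placeholder error rates
naive_bayes = [0.27, 0.26, 0.24, 0.23, 0.22]
logistic    = [0.29, 0.26, 0.23, 0.21, 0.20]

with open("hw2-1-curve.dat", "w") as f:
    for row in zip(sizes, perceptron, naive_bayes, logistic):
        f.write("%d %.3f %.3f %.3f\n" % row)

# In gnuplot you could then plot all three curves with:
#   plot "hw2-1-curve.dat" using 1:2 with linespoints title "Perceptron", \
#        "" using 1:3 with linespoints title "NaiveBayesSimple", \
#        "" using 1:4 with linespoints title "LogisticRegression"
```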
For hw2-1-20, turn in a plot with lines showing the decision boundary learned by Logistic Regression. That is, you should plot the data as points in the x/y plane and then plot the decision boundary learned by the algorithm. (Computing the boundary for Naive Bayes is more difficult, so you do not have to do that. Computing the boundary for the Voted Perceptron is even more difficult.) I recommend gnuplot for this, since it can plot equations as well as data points. To compute the decision boundary for Logistic Regression, recall that the logistic regression model has the form
log [ P(y=1|X) / P(y=0|X) ] = w0 + w1*x1 + w2*x2

WEKA produces a table of the fitted coefficients that looks like

  Variable   Coeff.
  1          w1
  2          w2
  Intercept  w0

The decision boundary is the set of points where the log-odds is zero, that is, where w0 + w1*x1 + w2*x2 = 0; solving for x2 gives a line you can plot as a function of x1.
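As a small sketch, here is how WEKA's fitted coefficients turn into a plottable boundary line. The coefficient values are hypothetical placeholders; read the real w0, w1, w2 from WEKA's coefficient table.

```python
# Setting the log-odds to zero, w0 + w1*x1 + w2*x2 = 0, and solving
# for x2 gives x2 = -(w0 + w1*x1) / w2 (assuming w2 != 0).
w0, w1, w2 = -1.5, 2.0, 4.0   # Intercept and coefficients (placeholders)

def boundary_x2(x1):
    """Return the x2 value on the decision boundary at a given x1."""
    return -(w0 + w1 * x1) / w2

# In gnuplot you could plot this line directly as an equation, e.g.:
#   plot "points.dat", -(-1.5 + 2.0*x)/4.0
print(boundary_x2(0.0))   # -(-1.5)/4 = 0.375
```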
For hw2-2-50, turn in a plot with a line showing the learned decision boundary for Logistic Regression.
Compute the best possible error rate for the hw2-1 and hw2-2 data sets. The best possible error rate is sometimes called the Bayes Rate. We can only know the Bayes Rate for artificial data sets for which we know the procedure that generated the data.
The data set hw2-1 is generated from two Gaussian distributions. One is centered at (1,0) and the other at (0,1). Both have the same covariance matrix:

  [ 2  0 ]
  [ 0  1 ]
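For intuition, this generating procedure can be sketched directly. Because the covariance matrix is diagonal, the two coordinates are independent, so each can be sampled with a one-dimensional Gaussian. (Illustrative only; the actual course data files were generated separately.)

```python
import math
import random

def sample(label, rng):
    """Draw one hw2-1-style point: mean (1,0) for class 1, (0,1) for class 0,
    shared diagonal covariance diag(2, 1)."""
    mean = (1.0, 0.0) if label == 1 else (0.0, 1.0)
    x = rng.gauss(mean[0], math.sqrt(2.0))  # variance 2 -> std sqrt(2)
    y = rng.gauss(mean[1], 1.0)             # variance 1 -> std 1
    return x, y, label

rng = random.Random(0)
points = [sample(rng.randrange(2), rng) for _ in range(5)]
```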
The data set hw2-2 is generated as follows. The x coordinate is generated from an exponential distribution with parameter 1.0. The y coordinate is generated from a uniform random variable in the interval [0,1]. The class is assigned as follows: if x > 0.5, the example belongs to the positive class; otherwise, to the negative class. However, this class label is flipped with probability 0.1 (so-called "10% label noise").
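The steps above can be sketched as follows. (Again illustrative only; the course data files were generated separately.)

```python
import random

def generate(n, rng):
    """Draw n hw2-2-style points: x ~ Exponential(1.0), y ~ Uniform[0,1],
    label positive iff x > 0.5, then flipped with probability 0.1."""
    data = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        y = rng.uniform(0.0, 1.0)
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.1:      # 10% label noise
            label = 1 - label
        data.append((x, y, label))
    return data
```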
To understand how to compute the Bayes rate, consider a simpler problem where there is only one feature x and two equally-likely classes. Suppose data points for class 1 are drawn from a one-dimensional gaussian distribution with mean 1 and variance 1, while data points for class 0 are drawn from a one-dimensional gaussian distribution with mean -1 and variance 1. The optimal decision boundary will be at x = 0. Points where x > 0 will be classified as class 1, and points where x <= 0 will be classified as class 0. What will be the error rate of this optimal classifier? Let's consider class 1 first. The data points from class 1 have a gaussian distribution, so some of them will end up at x < 0 and be misclassified. What is the probability that a data point belonging to class 1 has x < 0? It is precisely the area under the tail of the standard normal distribution from -infinity up to -1 (because the threshold (0) minus the true mean (1) is -1).
You can look in any statistics book to find that this is 0.1587. You can also compute this using the R statistical package (which is installed on the research Suns; type R at a shell prompt; you can also download and install it from The R Project). Suppose we want the area under the standard normal curve from -infinity up to "a". You just enter

  pnorm(a, 0, 1)

where the 0 is the mean and the 1 is the standard deviation.
In Matlab, you can type

  0.5*erfc(-a/sqrt(2))

So this tells us that the probability of misclassifying a data point from class 1 is 0.1587. By symmetry, the same is true for data points from class 0. Hence, the probability of error is
  P(y=1) * 0.1587 + P(y=0) * 0.1587 = [P(y=1) + P(y=0)] * 0.1587 = 1 * 0.1587 = 0.1587

Now let's return to the 2-dimensional gaussians of data set hw2-1. The way to solve this problem is to convert it into a one-dimensional problem and then use the method I've just presented. The idea is to take the "optimal projection" view of LDA. LDA computes a decision boundary. If we project the two gaussians onto a line perpendicular to that decision boundary (and integrate away the dimension along the decision boundary), we will obtain our one-dimensional problem.
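As a check on the one-dimensional calculation, the standard normal CDF (what pnorm(a, 0, 1) computes in R) can be written in Python with math.erfc, exactly as in the Matlab expression above.

```python
import math

def pnorm(a):
    """Area under the standard normal curve from -infinity to a,
    matching R's pnorm(a, 0, 1) and Matlab's 0.5*erfc(-a/sqrt(2))."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

# Probability that a class-1 point (mean 1, variance 1) falls below the
# threshold 0, i.e. the tail up to (0 - 1) = -1:
print(round(pnorm(-1.0), 4))  # 0.1587
```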
A few hints:
You can obtain WEKA by visiting the WEKA Project Webpage and clicking on the appropriate link for your operating system. Alternatively, if you are on one of the CS systems, you can access WEKA by connecting to the Weka directory and executing the command run-weka.bat. I have verified that this works from COE Windows machines (under drive W:).
These instructions will describe how to apply the learning algorithms to the BR data set. The others can be processed in exactly the same way, of course. When you start up Weka, you will first see the WEKA GUI Chooser, which has a picture of a bird (a weka) and four buttons. You should click on the Explorer button. This opens a large panel with several tabs, and the Preprocess tab will already be selected.
Click on "Open file...", then click on the "data" folder, and then select the "br-train.arff" file. The "Current relation" window should now show "Relation" as BR with 614 instances and 17 attributes. The table and bar plot on the right-hand side of the window will show 316 examples in class 0 and 298 in class 1.
Now click on the "Classify" tab of the Explorer window and examine the "Test options" panel. First we will load in the test data. Click on the radio button "Supplied test set". Then click on the "Set..." button. A small "Test Instances" pop-up window should appear. Click on "Open file...", navigate to the "data" folder, and select "br-test.arff". The Test Instances window should now show the relation "BR" with 613 instances and 17 attributes. You may close this window at this point.
Now we will tell Weka which of the 17 attributes is the class variable. Below the Test options panel, there is a drop-down menu with the entry "(Num) x16" selected. Click on this and choose "(Nom) class" instead. [Num means numeric; Nom means nominal, i.e., discrete.]
Now we need to select the learning algorithm to apply. Go to the "Classifier" panel (near the top) which initially shows two buttons: "Choose" and "ZeroR". ZeroR is a very simple rule-learning algorithm (which we do not want). The general idea of this user interface is that if you click on "Choose" you can choose a different algorithm. If you click on "ZeroR" (or whatever algorithm name is displayed there), you can set the parameters for the algorithm.
Click on "Choose", and you will see a hierarchical display whose top level is "weka", whose second level is "classifiers", and whose third level contains the general kinds of classifiers, including "bayes", "functions", "lazy", "meta", "trees", and "rules". To choose NaiveBayesSimple, click on the "bayes" entry and then select "NaiveBayesSimple". To select Logistic Regression, choose "functions" and then "Logistic". To select the Perceptron algorithm, choose "functions" and then "VotedPerceptron".
Once we have chosen an algorithm, we are ready to run it. Click on the "Start" button, and the Classifier Output window will show the output from the classifier. For Naive Bayes, this output consists of several sections: