CS434 Assignment 3 Due Wed Nov 12th in class
Part I: Experiments with Support Vector Machines
- Learning Algorithm. We will experiment with SMO (sequential minimal
optimization, Weka's support vector machine implementation). We will
use an internal validation set to decide on the best settings of the
SVM parameters and choices. You can do this by selecting the Percentage
Split option in the Classify panel of the Weka Explorer to use 66% of
the data for training and 34% as the internal validation set. Then
you must perform a series of runs in which you vary the different
parameters and choices to find a setting that minimizes the
error on the internal validation set. Once you have chosen the best
parameter settings based on the internal validation set, choose the
"Supplied test set" option and use the br-test.arff data. Train your
algorithm on the entire training set using your chosen parameter
values, and evaluate on the BR test set.
For SMO, there are
three parameters that we must consider:
- the value of C which controls the tradeoff between fitting the
training data (large values) and maximizing the separating margin
(small values). Values of C outside the range from 0.01 to 100
usually don't work well.
- the choice of the kernel: polynomial (the default) or RBF
(gaussian, chosen by setting "useRBF" to true).
- the kernel parameters. For the polynomial kernel, the only
parameter is "exponent", which controls the degree of the polynomial.
Set to 1 for the linear kernel (i.e., no kernel at all, just a dot
product). Set to 2 for the quadratic kernel and 3 for the cubic
kernel. With polynomial kernels you can include low order terms by
setting "lowOrderTerms" to true. By default, the kernel is computed
as (x * y)^exponent. If you include low order terms, you get the
kernels we discussed in class, which are computed as (x * y +
1)^exponent.
For the RBF kernel, the parameter is "gamma", which controls
the width
of the RBF kernel. Values in the range from 1 to 10 usually work
well, but sometimes values as small as 0.1 or as large as 50 give good
results.
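As a quick intuition check, both kernels are easy to compute by hand. Here is a pure-Python sketch of what the two kernel choices and their parameters compute (the helper names and example vectors are illustrative, not Weka code):

```python
import math

def poly_kernel(x, y, exponent, low_order_terms=False):
    """Polynomial kernel: (x.y)^exponent, or (x.y + 1)^exponent
    when low-order terms are included."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (dot + 1) ** exponent if low_order_terms else dot ** exponent

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).
    Larger gamma = narrower kernel (values fall off faster with distance)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [3.0, 0.5]
print(poly_kernel(x, y, 2))                        # (1*3 + 2*0.5)^2 = 16.0
print(poly_kernel(x, y, 2, low_order_terms=True))  # (4 + 1)^2 = 25.0
# Two points at distance 1: a larger gamma drives the kernel value
# toward 0 much faster.
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], 0.1))     # ~0.905
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], 10.0))    # ~4.5e-05
```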
Note that SMO will run out of memory with the default Java
parameters. I use the command "java -Xmx200m -jar weka.jar"
to request 200 megabytes of memory for the Java VM.
- You need to design a set of experiments using the internal
validation set to choose the most appropriate parameter settings for
the BR data set.
- In your report, you need to
describe:
- The experimental procedure that you used for model selection.
- The validation error of the different models
that you have investigated in the following table format:
C      kernel        kernel-params    Validation error
ccc    kkk           ppp              eee
where kkk is "polynomial" or "rbf" and ppp is the parameter value of
the kernel (exponent for polynomial and gamma for rbf). Include one
line for each combination of C, kernel,
and kernel parameters that you tried.
- Your
chosen parameters and the test set error when training on the entire
training set.
- A discussion about the sensitivity of SVM performance to the
choice of these parameters.
Part II: Experiments with Bagging and Boosting
- Learning Algorithms. Bagging and AdaBoostM1 are available
under the "Meta"
category in Weka. Please use the following settings:
- Bagging: set numIterations to 30. You will run experiments
with the classifier set to trees.J48, functions.Logistic, and
bayes.NaiveBayesSimple.
- AdaboostM1: set maxIterations to 30. Set weightThreshold to
1000. You will run experiments with the classifier set to the same
three algorithms as for Bagging.
For J48, set the "unpruned" option to True. You can use the
default
settings for all
other parameters of J48, NaiveBayesSimple, and Logistic Regression.
Optional: Rerun the
experiments with pruning turned on and see if it makes any difference.
In addition to running Bagging and AdaBoostM1, you should rerun
a single decision tree, a
single Naive Bayes, and a single logistic regression.
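For intuition about what Bagging is doing under the hood, here is a minimal sketch of its two ingredients, bootstrap resampling and majority voting (pure Python for illustration, not Weka's actual implementation):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples *with replacement* from the training set;
    each of Bagging's base classifiers is trained on one such sample."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Bagging's final prediction: the most common base-classifier vote."""
    return max(set(predictions), key=predictions.count)

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
# A bootstrap sample typically repeats some examples and omits others,
# which is what decorrelates the 30 base classifiers.
print(sorted(sample))
print(majority_vote(["yes", "no", "yes"]))  # -> "yes"
```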
- Data Sets. We will apply these three algorithms to the same data
sets that we have been using before: hw2-1, hw2-2, and br.
However, we will not construct learning curves this time. Instead, you
should just train
on the following three files:
Domain Training Data File Test Data File
BR br-train.arff br-test.arff
hw2-1 hw2-1-200.arff hw2-1-test.arff
hw2-2 hw2-2-200.arff hw2-2-test.arff
- In your report, you should report the following:
- The results in the following
format:
- A discussion about the following
questions:
- Which algorithms+data sets are improved by
Bagging? Can you explain these results in terms of the bias and
variance of the
learning algorithms applied to these domains?
- Which algorithms+data sets are improved by Boosting?
Can
you provide possible explanations for why boosting can sometimes lead
to worsened performance?
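For the boosting discussion, it may help to look at AdaBoost's reweighting step in one common formulation (a sketch, not Weka's AdaBoostM1 source): misclassified examples get their weights multiplied up each round, so a few noisy or mislabeled examples can come to dominate the training distribution.

```python
import math

def adaboost_reweight(weights, correct):
    """One AdaBoost round: up-weight misclassified examples,
    down-weight correct ones, then renormalize.
    weights: current example weights; correct: one bool per example."""
    err = sum(w for w, c in zip(weights, correct) if not c) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # base classifier's vote weight
    new = [w * math.exp(alpha if not c else -alpha)
           for w, c in zip(weights, correct)]
    z = sum(new)
    return [w / z for w in new], alpha

w = [0.25] * 4
w, alpha = adaboost_reweight(w, [True, True, True, False])
print(w)  # the one misclassified example now carries half the total weight
```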