**Learning Algorithms**. We will experiment with J48, Logistic Regression, and SMO (support vector machines). In each case, we will use an internal holdout set to choose the best settings of the overfitting parameters. You can do this by selecting the Percentage Split option in the Classify panel of the Weka Explorer, which uses 66% of the data for training and 34% as the internal validation set. Then perform a series of runs in which you vary the overfitting parameters to find the values that minimize the error on the internal validation set. Once you have chosen the values of the overfitting parameters, choose the "Supplied test set" option and supply the `br-test.arff` data. Train your algorithm on the entire training set using your chosen parameter values, and evaluate on the BR test set.

**Overfitting Parameters for Each Algorithm**

- For J48, we control the amount of overfitting with the "confidenceFactor" parameter. Values in the range from 0.01 to 0.50 are sensible.
- For Logistic Regression, overfitting is controlled by the "ridge" parameter, which controls the size of a squared penalty on the weights. Values in the range from 0.01 to 100.0 are sensible.
- For SMO, there are three parameters that we must consider:
  - the value of C, which controls the tradeoff between fitting the training data (large values) and maximizing the separating margin (small values). Values of C in the range from 0.01 to 100 are usually worth checking. Values outside this range usually don't work well.
  - the choice of the kernel: polynomial (the default) or RBF (Gaussian, chosen by setting "useRBF" to true).
  - the kernel parameters. For the polynomial kernel, the only parameter is "exponent", which controls the degree of the polynomial. Set it to 1 for the linear kernel (i.e., no kernel at all, just a dot product), to 2 for the quadratic kernel, and to 3 for the cubic kernel. With polynomial kernels you can include low-order terms by setting "lowOrderTerms" to true. By default, the kernel is computed as (x * y)^exponent. If you include low-order terms, you get the kernels we discussed in class, which are computed as (x * y + 1)^exponent. For the RBF kernel, the parameter is "gamma", which controls the width of the RBF kernel. Values in the range from 1 to 10 usually work well, but sometimes values as small as 0.1 or as large as 50 give good results.
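To sanity-check these kernel formulas, here is a minimal sketch in plain Python (not Weka code; the function names are my own, and the parameters mirror Weka's "exponent", "lowOrderTerms", and "gamma" options):

```python
import math

def poly_kernel(x, y, exponent=1, low_order_terms=False):
    """Polynomial kernel as described above: (x . y)^exponent,
    or (x . y + 1)^exponent when low-order terms are included."""
    dot = sum(a * b for a, b in zip(x, y))
    if low_order_terms:
        dot += 1.0
    return dot ** exponent

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

For example, with x = (1, 2) and y = (3, 4), the quadratic kernel gives (3 + 8)^2 = 121, and including low-order terms gives (11 + 1)^2 = 144.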

**Results**. You should turn in the following:

- For J48, please construct a table of the form

      confidence level   tree size   validation error
      ppp                sss         eee

  with one row for each confidence level that you tried. The tree size is the total number of nodes, and it is reported by the algorithm. Finally, report your chosen confidence level, the resulting tree size (when trained on the entire training set), and the test set error.

- For Logistic, please construct a table of the form

      ridge parameter   sum of abs(coef)   validation error
      ppp               sss                eee

  with one row for each ridge value that you tried. The second column is the sum of the absolute values of the coefficients (not including the intercept term). You will need to compute this from the output produced by the algorithm. Finally, report your chosen ridge parameter, the resulting sum of abs(coef), and the test set error when training on the entire training set.

- For SMO, please construct a table of the form

      C     kernel   kernel-params   validation error
      ccc   kkk      ppp             eee

  where `kkk` is "polynomial" or "rbf" and `ppp` is the parameter value of the kernel (exponent for polynomial and gamma for rbf). Include one line for each combination of C, kernel, and kernel parameters that you tried. Finally, of course, report your chosen parameters and the test set error when training on the entire training set.

- Answer the following questions:
  - Which algorithm and parameter set gave the lowest final error rate?
  - For which algorithm is the setting of overfitting parameters the most difficult (e.g., because it is time consuming, because the algorithm is very sensitive to the parameter values, or both)?
  - Can you explain these results in terms of the bias and variance of the learning algorithms applied to these domains?
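The overall tuning workflow described above (sweep an overfitting parameter, keep the value with the lowest internal-validation error, then retrain on the full training set) can be sketched as follows. This is plain Python, not Weka code; `train_and_validate` is a hypothetical stand-in for one Weka run that returns the validation error for a given parameter setting:

```python
def pick_best_parameter(candidates, train_and_validate):
    """Return (best_value, best_error) over the candidate settings.

    `train_and_validate` maps one parameter value to its error on the
    internal validation set (in Weka, one Percentage Split run).
    """
    # Pair each candidate with its validation error and take the minimum.
    results = [(train_and_validate(c), c) for c in candidates]
    best_error, best_value = min(results)
    return best_value, best_error

# Illustrative use with made-up validation errors for J48's
# confidenceFactor sweep (the numbers here are not real results):
errors = {0.01: 0.30, 0.05: 0.25, 0.25: 0.28, 0.50: 0.33}
best, err = pick_best_parameter(sorted(errors), errors.get)
```

After this step you would retrain once on the entire training set with `best` and report the error on the supplied test set.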

- Note that SMO will run out of memory with the default Java parameters. I use the command `java -Xmx200m -jar weka.jar` to request 200 megabytes of memory for the Java VM.
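If you prefer the command line to the Explorer, a run might look like the following sketch. The `-Xmx` flag is from the note above; `br-train.arff` is a hypothetical training-file name, and the SMO class path and scheme options shown here are assumptions that can differ between Weka versions, so check them against your installation:

```shell
# Run SMO from the command line with 200 MB of heap.
# -t names the training file, -T the supplied test set,
# and -C sets SMO's complexity constant.
java -Xmx200m -cp weka.jar weka.classifiers.functions.SMO \
  -t br-train.arff -T br-test.arff -C 1.0
```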