Machine Learning, HW1
--------------------------------------
Due Sat March 9, 11:59pm, on Blackboard.
--------------------------------------

Part I - Theory (please LaTeX your solutions; starred problems are optional)

1. Prove that the perceptron converges regardless of the initial weight
   vector and the learning rate. For the rest of this HW, you may assume
   the initial weight vector is the zero vector and the learning rate is 1.

*2. Prove that p-aggressive MIRA (0 <= p < 1) converges.

*3. Based on 2, formulate a conjecture about the final geometric margin of
   p-aggressive MIRA after convergence (assuming separability), and prove it.
   Hint: how does the final geometric margin compare to the optimal
   geometric margin (i.e., SVM's)?

4. Prove that the multiclass perceptron converges.

5. Do you still need the augmented space in the multiclass perceptron?
   Include a drawing to support your answer.

6. Work out the MIRA update formula *geometrically* for the
   negative-example case. Include a figure.

7. In p-aggressive MIRA, if we replaced the functional margin in the update
   trigger by the geometric margin, would it work? Why or why not?

Part II - Programming (python+numpy)

1. Adult-income-50k dataset from UCI. I've cleaned up the "adult income"
   dataset here:

     http://acl.cs.qc.edu/~lhuang/teaching/machine-learning/hw1/adult-50k.tgz

   There is a README.lhuang file that you need to read first.

a) Your basic job is to train a perceptron classifier on the training data,
   use the dev set to figure out the best number of iterations (to prevent
   overfitting), and finally report the accuracy on test using the model
   that does best on dev. You'll need to do some feature preprocessing
   (see README.lhuang). You'll also need to make a plot where the x axis is
   the number of epochs, the y axis is accuracy on dev (yes, accuracy, not
   error rate), and the y2 axis is the number of updates on train
   (i.e., two curves). Are these curves bumpy or monotonic? Why?

   The basic accuracy isn't very good, right? Now you'll explore various
   ways to improve your perceptron.
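   To fix ideas, here is a minimal sketch of the kind of training loop
   part (a) asks for, in python+numpy. It assumes the data have already
   been preprocessed into numpy arrays (an `X` with the bias folded in as
   an augmented feature, and `y` with +/-1 labels); all names are
   illustrative, not part of the dataset's interface.

```python
import numpy as np

def train_perceptron(X, y, epochs=5):
    """Vanilla perceptron in the augmented space.
    X: (n, d) float array (bias feature included); y: (n,) array of +/-1."""
    n, d = X.shape
    w = np.zeros(d)                      # zero initial weights, learning rate 1
    updates_per_epoch = []
    for _ in range(epochs):
        updates = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:     # mistake (or on the boundary): update
                w += y_i * x_i
                updates += 1
        updates_per_epoch.append(updates)
    return w, updates_per_epoch

def accuracy(w, X, y):
    """Fraction of examples classified correctly by weight vector w."""
    return np.mean(np.sign(X @ w) == y)
```

   For part (a) you would call `train_perceptron` once per epoch count,
   evaluate `accuracy` on dev each time, and keep the epoch count (and
   weights) that score best on dev before touching test.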
b) The very first trick is the averaged perceptron (not the voted
   perceptron!). Does it help? Redo the plots and report the new accuracy
   on test.

   *** update: as demonstrated by Kai, you should report/plot accuracy on
   dev after training on every 1000 examples; see Kai's plot at:
   http://acl.cs.qc.edu/~lhuang/teaching/machine-learning/hw1/adult-50k/kai-avgperc.png

*c) The next simple trick you'd always use is (re-)shuffling the training
   data before each epoch. But now the results are non-deterministic, so
   run it 10 times, pick the model that does best on dev, and report that
   model's accuracy on test. Also report the average dev accuracy of the
   10 models.

*d) The next thing you could try is a variable learning rate. Try
   1/(total #errors) and 1/(total #examples), redo the plots, and if it
   helps on dev, report on test. You can also propose your own formula.
   Use the best learning-rate scheme you found and stick to it for the
   rest of this HW.

The next few tricks are more advanced and less commonly used in practice.

e) p-aggressive MIRA. Figure out the best p using the dev set, and report
   on test. Does it converge more slowly than the perceptron?

*f) Feature scaling and translation (unit-Gaussian): for each feature,
   shift it so that its median is 0 (at the origin), and rescale it so
   that its variance is 1 (which is analogous to a unit Gaussian).

*g) Feature engineering: bucketing, converting categorical features to
   binary, and concatenation. Try your own ideas here. Report the best
   accuracy on dev and the corresponding accuracy on test.

*** update: new extra-credit problem added:

*h) What are the most positive/negative features/values for the following
   categories? Do they make intuitive sense (e.g., age >= 65 and
   age <= 25 are negative, and "Married" is positive)?
     Age:
     Occupation:
     Marital-Status:
     Capital Gain:
     Capital Loss:

PLEASE INCLUDE A FILE "report.pdf" FOR BOTH THE THEORY PART AND THE
EXPERIMENTAL RESULTS (including plots).

Part III - Debrief

Please include a separate "debrief.txt" answering these questions:

1.
Did you work alone, or did you discuss with other students? If the
   latter, please write down their names. Note: in general, only
   high-level discussions are allowed; each student still has to write
   the code, perform the experiments, and write the report entirely by
   him/herself.

2. How many hours did you spend on this assignment?

3. Would you rate it as easy, moderate, or difficult?

4. Are the lectures too fast, too slow, or at just the right pace?

5. Any other comments?

Note: you'll get 5 pts off for not including the debrief. If you have
submitted but forgot to include it before the deadline, email debrief.txt
alone to TA Kai (kzhao@gc.cuny.edu) and the professor.

--------------- THE END OF HW1 ------------------------------------------