CS430/530: Homework Assignment 6 (Friday, December 1, 2000)

This is a combined written and programming assignment.

1. Generate a learning curve for the restaurant problem as follows. The "true" decision tree for the restaurant problem is the value of the global variable *target*, defined in learning/domains/restaurant.lisp. That file also defines the variables *attributes* and *goal*. The *attributes* list has the form

        ((atr-1-name val1 val2) (atr-2-name val3 val4) ... (atr-n-name val20 val21))

These are the "input" features for the learning algorithm. The *goal* list has the form

        (willwait yes no)

This tells which attribute is the "goal" attribute---the one we are trying to predict. A training example has the following format:

        ((atr-1-name . val-1) (atr-2-name . val-2) ... (atr-n-name . val-n) (atr-goal-name . val-g))

The order in which the attributes appear is unimportant. An example includes all of the input attributes and the goal attribute.

To construct a decision tree, you call the function (dtl examples attributes goal). It returns a data structure describing a decision tree. You can print the decision tree using the function (dtprint tree). You can use a decision tree to classify a new example by calling (dtpredict tree example). Note that it returns the answer as a 1-element list, such as (yes).
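Put together, the basic calls might look like the following sketch (here examples stands for any list of classified examples in the format described above):

```lisp
;; Sketch of the basic calls.  EXAMPLES is a placeholder for any list of
;; classified examples in the format described above.
(let ((tree (dtl examples *attributes* *goal*)))
  (dtprint tree)                       ; print the learned tree
  (dtpredict tree (first examples)))   ; returns a 1-element list, e.g. (yes)
```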

You should use the *target* tree to generate a list of 1000 examples to use as the test set and 100 examples to use as the training set. The function (random-examples n attributes) will generate a list of n random examples. The function (classify examples goals h performance-element) will add the goal attribute to each of these random examples. For example, you could evaluate

(classify
        (random-examples 50 *attributes*)
        (list *goal*)
        *target*
        #'dtpredict)

to generate a list of 50 classified training examples for the restaurant problem.
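Following the same pattern, the two data sets for this problem could be built as follows (the variable names *test-set* and *train-set* are my own, not part of the assignment code):

```lisp
;; Generate and classify the 1000-example test set and the 100-example
;; training set, using the "true" tree *target* as the performance element.
(defparameter *test-set*
  (classify (random-examples 1000 *attributes*) (list *goal*)
            *target* #'dtpredict))

(defparameter *train-set*
  (classify (random-examples 100 *attributes*) (list *goal*)
            *target* #'dtpredict))
```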

You will then run the decision tree learning algorithm for 10 trials. On the i-th trial, you will give it a training set consisting of the first 10 * i examples from the training set. You will then test the accuracy of the resulting decision tree on the 1000 test examples. You will need to write a function to compute the proportion of examples correctly classified.
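One possible sketch of the accuracy function and the trial loop (the names accuracy and learning-curve are hypothetical; dtl, dtpredict, and the goal format come from the assignment):

```lisp
;; Fraction of EXAMPLES whose goal value TREE predicts correctly.  Assumes
;; each example is an alist containing a pair for the goal attribute, and
;; that DTPREDICT returns a 1-element list such as (yes).
(defun accuracy (tree examples goal)
  (let ((correct (count-if
                  (lambda (ex)
                    (eq (first (dtpredict tree ex))
                        (cdr (assoc (first goal) ex))))
                  examples)))
    (/ correct (float (length examples)))))

;; On trial I, train on the first (* 10 I) training examples and report
;; accuracy on the test set.
(defun learning-curve (train test attributes goal ntrials)
  (loop for i from 1 to ntrials
        for n = (* 10 i)
        for tree = (dtl (subseq train 0 n) attributes goal)
        do (format t "~8D~12,2F~%" n (accuracy tree test goal))))
```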

You should produce a table of the following form (the numbers are fake---your numbers will be different). The first column is the size of the training set and the second column is the fraction of test examples correctly classified.

        10      0.50
        20      0.80
        30      0.90
        40      0.95
        50      0.95
        60      0.94
        70      0.96
        80      0.95
        90      0.95
       100      0.96

If you store this in a file named table, you can use gnuplot to plot the learning curve by giving the commands

  % gnuplot
  gnuplot> set xlabel "Size of training set"
  gnuplot> set ylabel "Proportion correct"
  gnuplot> plot "table" with lines

If you repeat the experiment with a different training set, how much do the results change?

2. One of the earliest decision tree algorithms was developed by Ross Quinlan to learn a value function for chess end games. The file /usr/local/classes/cs/cs430/code/learning/domains/krkp.lisp contains 3,196 training examples from the chess endgame King-and-rook vs. King-and-pawn. The examples are stored as the value of *krkp-examples*. The variable *krkp-attributes* gives the attribute information, and the variable *krkp-goal* gives the goal information. The file is very large (1.4 MB). You should load it into your Lisp system to work this problem. NOTE: It will not fit into the student edition of LispWorks, so you will need to do this under GCL.

This data set was developed by Alen Shapiro in 1983. Each example describes a chess board position with the white side (king-and-rook) to move. The goal attribute is class, and its value is won if the white side can win from this position, and nowin if the white side cannot win from this position. The attributes have been carefully chosen to be useful for predicting the outcome of a chess game.

Apply the decision-tree learning algorithm to generate a learning curve for this problem. Randomly divide the available data into a training set of 200 points and a test set containing all of the remaining points (hint: use the function random-choose from the first programming assignment to draw the training set). After you have selected the training set, you should also randomly permute it, because the examples are given in a non-random order. A simple way to generate a random permutation of a list is illustrated by the following example. Given the list (a b c d e), create a list of the form ((0.2345 . a) (0.1232 . b) (0.6847 . c) (0.1728 . d) (0.8476 . e)) by consing a random number between 0 and 1 onto each element of the original list; to generate such a number, use (random 1.0). Next, sort this list with (sort list #'< :key #'car), which sorts it into ascending order using the car of each element as the sort key. Finally, strip off the floating-point numbers to obtain the permuted list: (b d a c e).
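The decorate/sort/strip recipe above can be packaged into a small function (random-permute is a name I am assuming; it is not part of the assignment code):

```lisp
;; Randomly permute LIST by consing a random key in [0,1) onto each element,
;; sorting on the keys, and stripping the keys off again.
(defun random-permute (list)
  (mapcar #'cdr
          (sort (mapcar (lambda (x) (cons (random 1.0) x)) list)
                #'< :key #'car)))

;; e.g. (random-permute '(a b c d e)) => some reordering such as (b d a c e)
```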

Once you have permuted the training examples, you should run the decision tree learning algorithm for 20 trials. In trial i, you will give the learning algorithm the first 10 * i training examples and evaluate the resulting tree on the test set (the remaining examples). Construct a table like the one in problem 1.
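Under the same assumptions as in problem 1, the 20-trial loop might be sketched like this (train stands for the permuted 200-example training set and test for the remaining examples; both names are placeholders):

```lisp
;; 20-trial learning curve for KRKP.  The accuracy computation is inlined:
;; compare DTPREDICT's 1-element answer against the example's goal value.
(loop for i from 1 to 20
      for n = (* 10 i)
      for tree = (dtl (subseq train 0 n) *krkp-attributes* *krkp-goal*)
      for correct = (count-if (lambda (ex)
                                (eq (first (dtpredict tree ex))
                                    (cdr (assoc (first *krkp-goal*) ex))))
                              test)
      do (format t "~8D~12,2F~%" n (/ correct (float (length test)))))
```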

3. Run the decision tree learning algorithm on the entire KRKP data set and print the resulting decision tree. (This will take several CPU minutes.) The resulting decision tree should fit the training data perfectly. Notice that the tree is much smaller than the data set from which it was constructed. Explain how you could use this tree to construct an agent that could play perfect chess on the KRKP endgame (ignoring the difference between a draw and a loss). You may assume that the agent has 36 sensors that provide the values of the 36 features, plus a sensor that indicates when the game is over.

4. CS530 only. Work problem 18.8. Assume that you have a hypothesis space containing |H| hypotheses. The question is really asking for the probability that, after m examples have been generated and classified at random, none of the hypotheses in the hypothesis space is consistent with all of the examples. Hint: because the examples are labeled randomly, every hypothesis in H has an error rate of 0.5.