1.1, 1.4, 2 (all), 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 4.1, 4.2, 4.4, 6.1, 6.2, 7.5, 7.6, 7.7, 7.8.
We covered the following material since the midterm:
14 (all), 15.1, 15.2, 15.3 (530 only), 15.4 (530 only), 16.1, 16.3, 16.5, 16.6 (530 only), 17 (all), 18.1, 18.2, 18.3, 18.4, 19.2, 19.3, 19.4, 19.6, 19.7, 20.1, 20.4, 20.5, 20.6.
You are responsible for all of these except for the sections in Chapter 7.
Here is an outline of the most important points in the second half of the course.
1. Implementing agents using probability

Key ideas:
* Use the power of probability to represent uncertainty about the environment.
* Use probabilistic inference to implement the functions of the agent.
* Use a utility function to represent the utility of different states.
* Combine with dynamic programming search to find optimal policies.
* Probability theory
  - Random variables
  - Algebraic rules of probability
* Probabilistic inference
  - Belief networks (semantics, syntax, conditional independence, D-separation)
  - SPI algorithm (a small inference sketch follows this outline)
* Dynamic Programming Algorithms
  - Value iteration (sketched in code after this outline)
  - Policy iteration
  - Modified policy iteration
* Dynamic decision networks
  - Updating the belief about the current state based on the chosen action and observation.
  - Performing lookahead search a fixed number of steps to choose the optimal action.

2. Learning for probabilistic agents

Key ideas:
* Each state-action-result-reward step provides training examples for learning P(S'|S,A) and R(S'|S,A).
* A learning agent must explore (try actions not currently believed to be optimal) in order to learn more about the environment.
* Exploration strategies include random exploration, weighted random exploration (Boltzmann exploration), and optimism under uncertainty.
* Optimism under uncertainty is similar to A* search and avoids the need for exhaustive exploration in some cases.
* Dynamic programming can be applied after each step to derive the current best policy.
* Q learning is an alternative approach that avoids learning a model of the environment. It generally requires many more interactions with the environment to reach an optimal policy. Q learning relies on temporal averaging to compute expected values. (A Q-learning sketch with Boltzmann exploration follows this outline.)

Limitations:
* Each action must be performed in each state many times in order to learn the model. No generalization!
* Q learning can be very slow.

3. Supervised Learning

Key ideas:
* Supervised learning involves learning the definition of an unknown function from examples of that function.
* Learning algorithms use heuristic search through large spaces of potential hypotheses.
* There is a fundamental tradeoff in machine learning between the amount of data, the size of the hypothesis space, and the expected accuracy of the resulting hypothesis.
* Decision tree algorithms "grow" a decision tree using a 1-step greedy heuristic (sketched after this outline).
* Linear threshold units ("perceptrons") are learned by gradient descent search (sketched after this outline).
* Multilayer neural networks are also learned by gradient descent. Early stopping (using a halting set) is used to control the number of hypotheses explored.
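To illustrate the belief-network semantics listed above, here is a small Python sketch. It uses inference by enumeration (multiply CPT entries to get the joint, then sum out the hidden variables and normalize) rather than the SPI algorithm itself, and the Rain/Sprinkler/WetGrass network and all of its probability values are invented for illustration.

    # Inference by enumeration on a tiny, hypothetical belief network:
    # Rain -> WetGrass <- Sprinkler.  All CPT numbers are made up.
    from itertools import product

    P_rain = {True: 0.2, False: 0.8}
    P_sprinkler = {True: 0.1, False: 0.9}
    # P(WetGrass = true | Rain, Sprinkler)
    P_wet = {(True, True): 0.99, (True, False): 0.9,
             (False, True): 0.8, (False, False): 0.0}

    def joint(r, s, w):
        """Joint probability = product of CPT entries: P(R) P(S) P(W | R, S)."""
        pw = P_wet[(r, s)] if w else 1.0 - P_wet[(r, s)]
        return P_rain[r] * P_sprinkler[s] * pw

    # Query P(Rain | WetGrass = true): sum out Sprinkler, then normalize.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    print("P(Rain | WetGrass = true) =", num / den)

SPI organizes the same computation by choosing an order in which to multiply factors and sum out variables, but it returns the same answer as this brute-force enumeration.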
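Next, a minimal value-iteration sketch on a tiny, hypothetical tabular MDP. The states, actions, transition probabilities, rewards, discount factor, and convergence threshold are all assumptions for illustration, not course data.

    # Value iteration on a hypothetical two-state MDP.
    GAMMA = 0.9   # discount factor
    THETA = 1e-6  # convergence threshold

    states = ["s0", "s1"]
    actions = ["stay", "move"]

    # Transition model: P[(s, a)] = list of (next state, probability).
    P = {
        ("s0", "stay"): [("s0", 1.0)],
        ("s0", "move"): [("s1", 0.8), ("s0", 0.2)],
        ("s1", "stay"): [("s1", 1.0)],
        ("s1", "move"): [("s0", 0.8), ("s1", 0.2)],
    }
    # Rewards R[(s, a, s')]; unlisted transitions give reward 0.
    R = {("s0", "move", "s1"): 1.0}

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: V(s) <- max_a sum_s' P(s'|s,a) [R(s,a,s') + gamma V(s')]
            best = max(sum(p * (R.get((s, a, s2), 0.0) + GAMMA * V[s2])
                           for s2, p in P[(s, a)])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THETA:
            break

    # Read off the greedy (optimal) policy from the converged values.
    policy = {s: max(actions,
                     key=lambda a: sum(p * (R.get((s, a, s2), 0.0) + GAMMA * V[s2])
                                       for s2, p in P[(s, a)]))
              for s in states}
    print({s: round(v, 2) for s, v in V.items()}, policy)

Updating V in place (rather than from a separate copy) is still correct; it only changes the order in which value information propagates between states.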
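The following sketch shows Q learning combined with Boltzmann (weighted random) exploration. The two-state environment, learning rate, discount factor, and temperature are hypothetical; the point is the temporal-difference update toward a one-step sample of the Bellman target and the exponentially weighted action choice.

    # Q learning with Boltzmann exploration on a hypothetical environment.
    import math
    import random

    states = ["s0", "s1"]
    actions = ["stay", "move"]
    ALPHA, GAMMA, TAU = 0.1, 0.9, 0.5   # learning rate, discount, temperature

    Q = {(s, a): 0.0 for s in states for a in actions}

    def step(s, a):
        """Hypothetical environment: 'move' flips the state; reward 1 for landing in s1."""
        s2 = ("s1" if s == "s0" else "s0") if a == "move" else s
        return s2, (1.0 if s2 == "s1" else 0.0)

    def boltzmann(s):
        """Pick an action with probability proportional to exp(Q(s, a) / TAU)."""
        weights = [math.exp(Q[(s, a)] / TAU) for a in actions]
        return random.choices(actions, weights=weights)[0]

    s = "s0"
    for _ in range(5000):
        a = boltzmann(s)
        s2, r = step(s, a)
        # Temporal averaging: move Q(s, a) a little toward r + gamma * max_a' Q(s', a').
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

    print({k: round(v, 2) for k, v in Q.items()})

Note that no transition model P(S'|S,A) is ever built; the Q table is learned directly from sampled transitions, which is why Q learning typically needs many more interactions than the model-based approach.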
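Here is a sketch of the 1-step greedy heuristic for growing a decision tree: compute the information gain of each candidate attribute and split on the best one. The four-example dataset and attribute names are invented for illustration.

    # Choosing the best attribute to split on by information gain.
    import math

    def entropy(labels):
        """Entropy (in bits) of a list of boolean class labels."""
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def information_gain(examples, attr):
        """Reduction in entropy obtained by splitting the examples on attr."""
        gain = entropy([label for _, label in examples])
        for value in {features[attr] for features, _ in examples}:
            subset = [label for features, label in examples if features[attr] == value]
            gain -= len(subset) / len(examples) * entropy(subset)
        return gain

    # Each example is ({attribute: value}, class label).  The data are made up.
    examples = [
        ({"outlook": "sunny", "windy": False}, False),
        ({"outlook": "sunny", "windy": True},  False),
        ({"outlook": "rain",  "windy": False}, True),
        ({"outlook": "rain",  "windy": True},  True),
    ]
    attributes = ["outlook", "windy"]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    print("best attribute to split on:", best)

The full algorithm recurses on each subset with the chosen attribute removed; this sketch only shows the greedy choice made at a single node.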
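Finally, a sketch of gradient-descent weight updates for a linear threshold unit, using the delta (LMS) rule on the unthresholded linear output. The boolean-OR training set, learning rate, and number of epochs are assumptions for illustration.

    # Training a linear threshold unit by gradient descent on squared error.
    ETA = 0.1            # learning rate
    w = [0.0, 0.0, 0.0]  # bias weight plus one weight per input

    # Training examples for boolean OR: ([x1, x2], target).
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

    def linear_output(x):
        """Weighted sum including the bias term (its input is fixed at 1)."""
        return w[0] + w[1] * x[0] + w[2] * x[1]

    def predict(x):
        """Threshold the linear output to get a 0/1 classification."""
        return 1 if linear_output(x) >= 0.5 else 0

    for _ in range(200):                       # epochs of stochastic updates
        for x, target in data:
            error = target - linear_output(x)
            # Gradient step on squared error: w <- w + eta * error * input.
            w[0] += ETA * error
            w[1] += ETA * error * x[0]
            w[2] += ETA * error * x[1]

    print([round(wi, 2) for wi in w], [predict(x) for x, _ in data])

A multilayer network is trained the same way, except that the error is propagated backwards through the hidden units and training is halted early when accuracy on the held-out halting set stops improving.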
Skills you should be able to demonstrate during the exam:
1. Know how to infer the joint distribution and conditional independencies from the structure of a belief network.
2. Be able to hand-execute the SPI algorithm.
3. Be able to hand-simulate value iteration, policy iteration (both value determination and policy improvement), and Q learning (a policy-iteration sketch follows this list).
4. Be able to compute the optimism-under-uncertainty policy.
5. Be able to hand-simulate decision tree and linear threshold unit algorithms.
6. Be able to write down belief networks and decision diagrams for simple situations.
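As a worked illustration of skill 3, the sketch below alternates value determination (iteratively evaluating the current policy) with policy improvement (acting greedily on the evaluated values). The two-state MDP, transition probabilities, and rewards are invented for illustration.

    # Policy iteration on a hypothetical two-state MDP.
    GAMMA = 0.9

    states = ["s0", "s1"]
    actions = ["stay", "move"]
    P = {
        ("s0", "stay"): [("s0", 1.0)],
        ("s0", "move"): [("s1", 0.8), ("s0", 0.2)],
        ("s1", "stay"): [("s1", 1.0)],
        ("s1", "move"): [("s0", 0.8), ("s1", 0.2)],
    }
    R = {("s0", "move", "s1"): 1.0}   # all other rewards are 0

    def q_value(s, a, V):
        """Expected value of doing a in s and then receiving V thereafter."""
        return sum(p * (R.get((s, a, s2), 0.0) + GAMMA * V[s2]) for s2, p in P[(s, a)])

    policy = {s: "stay" for s in states}
    while True:
        # Value determination: evaluate the current policy by iterating its backups.
        V = {s: 0.0 for s in states}
        for _ in range(200):
            V = {s: q_value(s, policy[s], V) for s in states}
        # Policy improvement: act greedily with respect to the evaluated values.
        new_policy = {s: max(actions, key=lambda a: q_value(s, a, V)) for s in states}
        if new_policy == policy:
            break
        policy = new_policy

    print(policy, {s: round(v, 2) for s, v in V.items()})

The loop stops when policy improvement leaves the policy unchanged, which is the same stopping condition to use when hand-simulating the algorithm.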
The most important items from the first half of the course are:
1. Definitions of different kinds of agents.
2. Definitions of different kinds of environments.
3. Key functions that must be implemented in a general agent.
Consult the midterm study guide for more details.