CS533: Program 5

(Due Friday, December 1, 10:00am)


Purpose

In this assignment, you will get some experience with Bayesian methods for diagnosis. You will do three things:

Downloading the Program


Compiling the Program

The program is configured for compiling with the GNU g++ compiler. The command
 make dx 
will compile all of the files in the lib subdirectory and also the diagnose program.


Input Files Describing the Diagnosis Problem

There are two files in the p5 directory that describe an automobile diagnosis problem: the network file car.net and the diagnosis file car.dx.

The format of the network file is as follows. The first entry gives the number of nodes. This is followed by a description of each node. For example, consider the lines
    6 BatteryState ( ok weak ) ( 5 ) (11)
    (6 5) .99 .80 .01 .20
Each node (also called a variable or a component) has an index number. This is node 6. The nodes must appear in numerical order in the network file. The name of the node is BatteryState. All nodes are assumed to take on two values. Internally, the values are represented by the integers 0 and 1. Externally, the values have the names ok (for 0) and weak (for 1). We have followed the rule that correct functioning is represented by the value 0 and faulty functioning by the value 1 for all nodes.

The next entry is a list of the incoming arrows to this node. There is an incoming arrow from node 5. The next entry is a list of the outgoing arrows from this node. There is a single outgoing arrow to node 11.
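If you want to inspect car.net programmatically (for example, to collect the outgoing-arrow lists), the node header line can be read as shown below. This is a minimal sketch, assuming each header fits on one line and has exactly three parenthesized groups (value names, incoming arrows, outgoing arrows), as in the BatteryState example. The names NodeHeader and parse_node_header are hypothetical and are not part of the distributed program.

```python
from dataclasses import dataclass

@dataclass
class NodeHeader:
    index: int
    name: str
    values: list[str]    # value names: position 0 = correct, 1 = faulty
    parents: list[int]   # incoming arrows
    children: list[int]  # outgoing arrows

def parse_node_header(line: str) -> NodeHeader:
    """Parse a header such as '6 BatteryState ( ok weak ) ( 5 ) (11)'."""
    # Pad parentheses so "(11)" and "( 5 )" split into the same tokens.
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    index, name = int(tokens[0]), tokens[1]
    groups: list[list[str]] = []
    current: list[str] = []
    for tok in tokens[2:]:
        if tok == "(":
            current = []          # start a new group
        elif tok == ")":
            groups.append(current)  # close the current group
        else:
            current.append(tok)
    if len(groups) != 3:
        raise ValueError(f"expected 3 parenthesized groups, got {len(groups)}")
    values, parents, children = groups
    return NodeHeader(index, name, values,
                      [int(p) for p in parents], [int(c) for c in children])

h = parse_node_header("6 BatteryState ( ok weak ) ( 5 ) (11)")
print(h.parents, h.children)  # [5] [11]
```

Padding the parentheses with spaces lets one split() call handle both the spaced form "( 5 )" and the compact form "(11)" that appear in the same line.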

On the next line, we have the probability table. The first entry is a list (6 5) of the index numbers of the variables involved in this table (this is a conditional probability table giving the probability of node 6 given node 5). The four numbers are the four probabilities. They appear in the order corresponding to the following:

 
     P(node6=0 | node5=0) P(node6=0 | node5=1)
     P(node6=1 | node5=0) P(node6=1 | node5=1)
In other words, the node with the smallest index is varied most rapidly (counting in binary).
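The ordering rule can be made concrete by computing where a given assignment lands in the flat list of numbers. The sketch below treats the node with the smallest index as bit 0, the next smallest as bit 1, and so on. The helper names cpt_position and parse_table_line are hypothetical.

```python
def cpt_position(variables: list[int], assignment: dict[int, int]) -> int:
    """Offset of one entry in a flattened probability table.

    Counting in binary with the smallest node index varying most rapidly:
    the smallest index is bit 0, the next smallest is bit 1, and so on.
    """
    position = 0
    for bit, var in enumerate(sorted(variables)):
        position |= assignment[var] << bit
    return position

def parse_table_line(line: str) -> tuple[list[int], list[float]]:
    """Split '(6 5) .99 .80 .01 .20' into ([6, 5], [0.99, 0.8, 0.01, 0.2])."""
    head, _, rest = line.partition(")")
    variables = [int(v) for v in head.replace("(", " ").split()]
    probs = [float(p) for p in rest.split()]
    if len(probs) != 2 ** len(variables):
        raise ValueError("table size does not match the number of variables")
    return variables, probs

variables, probs = parse_table_line("(6 5) .99 .80 .01 .20")
print(probs[cpt_position(variables, {6: 1, 5: 0})])  # P(node6=1 | node5=0) -> 0.01
```

For the BatteryState table, node 5 is bit 0 and node 6 is bit 1, so P(node6=1 | node5=0) sits at position 2, the third number (.01). This matches the two-row layout above.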

You should not need to modify the network file at all. However, you will need to modify the diagnosis file car.dx to complete the third part of the assignment. The contents of this file are as follows:

18 13
0 SparkPlugs      1   15        1  10
1 Distributor     1   15        1  20
2 FuelPump        1   30        1  40
3 Leak2           1    5        1  60
4 Starter         1   10        1  40
5 BatteryAge      1    5        1  20
6 BatteryState    0    0        0   0
7 Alternator      0    0        1  50
8 FanBelt         1    5        1  15
9 Leak            1   30        1  60
10 Charge         1   10        0   0
11 BatteryPower   1   10        0   0
12 EngineCranks   1    2        0   0
13 Starts         1    2        0   0
14 Radio          1    1        0  50
15 GasInTank      0    0        1  10
16 GasGauge       1    1        0   0
17 Lights         1    1        0  20
The first line gives the number of nodes and the index of the problem-defining node (i.e., the node that has the value 0 if the whole device is functioning properly).
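When editing car.dx (for example, when changing a cost), it can help to load it and sanity-check the result. The sketch below is hedged: the meanings it gives the four numeric columns (observable flag, observation cost, repairable flag, repair cost) are an inference from the data, not a specification. The names DxEntry and parse_dx are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DxEntry:
    index: int
    name: str
    observable: bool   # ASSUMED meaning of column 3
    obs_cost: int      # ASSUMED meaning of column 4
    repairable: bool   # ASSUMED meaning of column 5
    repair_cost: int   # ASSUMED meaning of column 6

def parse_dx(text: str) -> tuple[int, list[DxEntry]]:
    """Return (problem-defining node index, one DxEntry per node)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n_nodes, problem_node = int(rows[0][0]), int(rows[0][1])
    entries = []
    for idx, name, obs, ocost, rep, rcost in rows[1:]:
        entries.append(DxEntry(int(idx), name, obs == "1", int(ocost),
                               rep == "1", int(rcost)))
    if len(entries) != n_nodes:
        raise ValueError(f"header says {n_nodes} nodes, found {len(entries)}")
    return problem_node, entries
```

For example, problem_node, entries = parse_dx(open("car.dx").read()) should give problem_node == 13 (Starts). The rows whose fifth column is 1 are exactly the ten breakable components listed in Part 1, which is some evidence for the assumed column meanings.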

On each of the remaining lines, the following fields appear:


Running the diagnose Program

You can run a diagnosis as follows:
 diagnose car.net car.dx 
The program will ask a series of questions and eventually declare the device to be working properly. In its present state, the program can only handle variables that are repairable or are observable and repairable. It does not handle multiple faults or purely observable variables.

The program accepts two flag arguments, -d and -t. The -d flag turns on some tracing that shows the estimated probability of each fault. This is useful for understanding the reasoning of the program. When a component has probability 0, the program has proved to itself that the component cannot possibly be faulty. Similarly, you sometimes see a component with probability 1, meaning that the program has inferred that this component must be faulty. The -t flag gives an internal trace of the probability calculations. This is extremely verbose, and you probably don't ever want to see it.


The Assignment: Part 1

The purpose of the first part is to run the diagnosis process through all possible single-fault scenarios. You will run 10 scenarios on the program.

In each scenario you should assume that exactly one component is broken. There are ten components that can break (BatteryAge, Alternator, FanBelt, Leak, GasInTank, FuelPump, Distributor, SparkPlugs, Starter, Leak2). To generate a scenario, choose one of these. Suppose you choose BatteryAge. If the BatteryAge is bad (i.e., old), then you should assume that the BatteryState is bad, the BatteryPower is bad, and therefore the Radio and Lights don't work. Also, the GasGauge will not show any gas and EngineCranks will be false. This is what will prevent the car from starting. In constructing this line of reasoning, you should assume that if a node is bad, it causes all of its children (the nodes on its outgoing links) to be bad also.
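The rule "a bad node makes all of its children bad" amounts to collecting every node reachable from the broken component along outgoing arrows. The sketch below does this with a breadth-first search. The function name bad_nodes is hypothetical. In the usage, only the arrows 5→6 and 6→11 come from this handout. The other arrows are hypothetical, chosen to reproduce the BatteryAge reasoning above. Build the real dictionary from the outgoing-arrow lists in car.net.

```python
from collections import deque

def bad_nodes(broken: int, children: dict[int, list[int]]) -> set[int]:
    """Every node that reads "bad" when `broken` fails, assuming a bad node
    makes each child (node on its outgoing links) bad as well."""
    bad = {broken}
    queue = deque([broken])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in bad:
                bad.add(child)
                queue.append(child)
    return bad

# 5 -> 6 and 6 -> 11 are stated in this handout; the remaining arrows are
# HYPOTHETICAL stand-ins for the real lists in car.net.
children = {5: [6], 6: [11], 11: [12, 14, 16, 17], 12: [13]}
print(sorted(bad_nodes(5, children)))  # [5, 6, 11, 12, 13, 14, 16, 17]
```

The visited set also guards against counting a node twice when it is reachable along more than one path.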

You will then run the diagnose program and answer the questions that it asks according to this line of reasoning. When it eventually finds and repairs the fault, it will report the total cost. Enter that cost into a table. The rows of the table should correspond to the nodes of the network, followed by a Cost row. The columns should correspond to the various scenarios. For example, a scenario in which the SparkPlugs are broken and this causes the car not to start should look something like this (where ???? should be replaced with the actual repair cost for this scenario):

Component:      Scenarios:
0 SparkPlugs    1  
1 Distributor   0  
2 FuelPump      0  
3 Leak2         0  
4 Starter       0  
5 BatteryAge    0  
6 BatteryState  0  
7 Alternator    0  
8 FanBelt       0  
9 Leak          0  
10 Charge       0  
11 BatteryPower 0  
12 EngineCranks 0  
13 Starts       1  
14 Radio        0  
15 GasInTank    0  
16 GasGauge     0  
17 Lights       0  
Cost          ????
You should add nine more columns to this table, one for each of the other nine possible broken components. Within each column, enter a 1 if the corresponding variable should give a "bad" observation in that scenario, and a 0 otherwise. Run each scenario through the diagnose program.
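If you keep the bad-node sets for each scenario, the columns and the table layout can be generated instead of typed by hand. This is a minimal sketch. The names scenario_column, format_table and NAMES are hypothetical. The cost entries are strings, so you can leave "????" until you have run diagnose.

```python
def scenario_column(bad: set[int], n_nodes: int) -> list[int]:
    """1 for each node that should give a "bad" observation, 0 otherwise."""
    return [1 if i in bad else 0 for i in range(n_nodes)]

def format_table(names: list[str], columns: list[list[int]], costs: list[str]) -> str:
    """Rows are nodes plus a final Cost row; each column is one scenario."""
    labels = [f"{i} {name}" for i, name in enumerate(names)]
    width = max(len(label) for label in labels + ["Cost"]) + 2
    lines = [label.ljust(width) + "  ".join(str(col[i]) for col in columns)
             for i, label in enumerate(labels)]
    lines.append("Cost".ljust(width) + "  ".join(costs))
    return "\n".join(lines)

NAMES = ["SparkPlugs", "Distributor", "FuelPump", "Leak2", "Starter",
         "BatteryAge", "BatteryState", "Alternator", "FanBelt", "Leak",
         "Charge", "BatteryPower", "EngineCranks", "Starts", "Radio",
         "GasInTank", "GasGauge", "Lights"]

# SparkPlugs scenario from the example table: nodes 0 and 13 are bad.
spark = scenario_column({0, 13}, len(NAMES))
print(format_table(NAMES, [spark], ["????"]))
```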


The Assignment: Part 2

The second part of the assignment is intended to test how sensitive the diagnosis algorithm is to the costs of each action. Specifically, find the smallest change that you can make in the repair cost for GasInTank so that the program will not choose this as the first action to perform.
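One way to organize the Part 2 search is to try changes of increasing size, in both directions, until GasInTank is no longer chosen first. In the sketch below, the predicate gasintank_chosen_first is hypothetical. In practice you would implement it by hand: edit the GasInTank repair cost in car.dx, run diagnose, and note its first action. The sketch assumes integer costs and does not assume anything about how the program ranks actions. The function name smallest_cost_change is also hypothetical.

```python
from typing import Callable, Optional

def smallest_cost_change(
    base_cost: int,
    gasintank_chosen_first: Callable[[int], bool],
    max_delta: int = 1000,
) -> Optional[int]:
    """Smallest signed integer change d (by |d|, trying +d before -d) such that
    GasInTank is no longer the first action at repair cost base_cost + d."""
    for delta in range(1, max_delta + 1):
        for d in (delta, -delta):
            cost = base_cost + d
            if cost >= 0 and not gasintank_chosen_first(cost):
                return d
    return None  # no change within max_delta altered the first action

# Made-up predicate for illustration only; it is NOT the program's behavior.
print(smallest_cost_change(10, lambda cost: cost < 23))  # 13
```

A linear scan is slower than a binary search, but it does not require the first choice to change monotonically with the cost, which this handout does not guarantee.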


What to Turn In

You must turn in your solution before 10:00am Friday, December 1, 2000.


Tom Dietterich, tgd@cs.orst.edu