CS533: Program 5

(Due Friday, December 1, 10:00am)

Purpose

This assignment gives you some experience with Bayesian methods for diagnosis. You will do three things:

Download and compile a program for Bayesian diagnosis.
Test the program by running it on all single-fault scenarios and building a table giving the actual cost of repair for each scenario.
Modify the cost of the repair action for filling the gas tank to find the smallest change that causes a change in the diagnostic policy.

Downloading the Program

Download the following tar file and install it in a subdirectory p5 so that the path ../lib refers to the directory containing the Budd library routines.
- p5.tar

Compiling the Program

The program is configured for compiling with the GNU g++ compiler. The command

 make dx

will compile all of the files in the lib directory and also the diagnose program.

Input Files Describing the Diagnosis Problem

There are two files in the p5 directory that describe an automobile diagnosis problem:

car.net gives the Bayesian network structure corresponding to the network shown in the slides.
car.dx indicates which nodes are observable and repairable and what the observation and repair costs are.

The format of the network file is as follows. The first entry tells the number of nodes. This is followed by a description of each node. For example, consider the lines

    6 BatteryState ( ok weak ) ( 5 ) (11)
    (6 5) .99 .80 .01 .20

Each node (also called a variable or a component) has an index number. This is node 6. The nodes must appear in numerical order in the network file. The name of the node is BatteryState. All nodes are assumed to take on two values. Internally, the values are represented by the integers 0 and 1. Externally, the values have the names ok (for 0) and weak (for 1). We have followed the rule that correct functioning is represented by the value 0 and faulty functioning by the value 1 for all nodes.

The next entry is a list of the incoming arrows to this node. There is an incoming arrow from node 5. The next entry is a list of the outgoing arrows from this node. There is a single outgoing arrow to node 11.
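The header line format is regular enough to parse mechanically, which can help when listing each node's children while working out the Part 1 scenarios. The C++ sketch below reads one header line. The names NodeHeader, parseNodeHeader, padParens and readList are hypothetical and not part of the distributed code; the sketch assumes every list is parenthesized and tokens are separated by whitespace, with parentheses allowed to touch their contents as in (11).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One node header line from a .net file, e.g.
//   6 BatteryState ( ok weak ) ( 5 ) (11)
struct NodeHeader {
    int index = -1;
    std::string name;
    std::vector<std::string> valueNames;  // external names for values 0 and 1
    std::vector<int> parents;             // incoming arrows
    std::vector<int> children;            // outgoing arrows
};

// Put spaces around parentheses so "(11)" and "( 5 )" tokenize the same way.
static std::string padParens(const std::string& line) {
    std::string out;
    for (char c : line) {
        if (c == '(' || c == ')') {
            out += ' ';
            out += c;
            out += ' ';
        } else {
            out += c;
        }
    }
    return out;
}

// Read a parenthesized list of tokens such as "( a b c )"; "( )" gives an empty list.
static std::vector<std::string> readList(std::istringstream& in) {
    std::vector<std::string> items;
    std::string tok;
    in >> tok;
    if (tok != "(") return items;  // malformed: no opening parenthesis
    while (in >> tok && tok != ")") items.push_back(tok);
    return items;
}

NodeHeader parseNodeHeader(const std::string& line) {
    std::istringstream in(padParens(line));
    NodeHeader h;
    in >> h.index >> h.name;
    h.valueNames = readList(in);
    for (const std::string& s : readList(in)) h.parents.push_back(std::stoi(s));
    for (const std::string& s : readList(in)) h.children.push_back(std::stoi(s));
    return h;
}
```

For the example line, parseNodeHeader("6 BatteryState ( ok weak ) ( 5 ) (11)") yields index 6, parents {5} and children {11}. It does not read the probability-table line that follows the header.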

On the next line, we have the probability table. The first entry is a list (6 5) of the index numbers of the variables involved in this table (this is a conditional probability table giving the probability of node 6 given node 5). The four numbers are the four probabilities. They appear in the order corresponding to the following:

 
     P(node6=0 | node5=0) P(node6=0 | node5=1)
     P(node6=1 | node5=0) P(node6=1 | node5=1)

In other words, the node with the smallest index is varied most rapidly (counting in binary).
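The ordering rule amounts to a binary offset into the table. In the C++ sketch below, cptOffset is a hypothetical helper: it sorts the table's variables by index and treats the smallest index as the least significant bit. For tables with more than two variables it assumes the same binary counting continues, with larger indices in the higher bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Offset of an entry in a probability table such as
//   (6 5) .99 .80 .01 .20
// vars lists the table's variables; value maps each variable to 0 or 1.
// The smallest index is the least significant bit, so it varies most rapidly.
// value.at() throws std::out_of_range if a variable has no assigned value.
std::size_t cptOffset(std::vector<int> vars, const std::map<int, int>& value) {
    std::sort(vars.begin(), vars.end());
    std::size_t offset = 0;
    for (std::size_t bit = 0; bit < vars.size(); ++bit) {
        offset |= static_cast<std::size_t>(value.at(vars[bit])) << bit;
    }
    return offset;
}
```

For the BatteryState table, cptOffset({6, 5}, {{6, 1}, {5, 0}}) is 2, which selects .01, that is, P(node6=1 | node5=0).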

You should not need to modify the network file at all. However, you will need to modify the diagnosis file car.dx for Part 2 of the assignment. The contents of this file are as follows:

18 13
0 SparkPlugs      1   15        1  10
1 Distributor     1   15        1  20
2 FuelPump        1   30        1  40
3 Leak2           1    5        1  60
4 Starter         1   10        1  40
5 BatteryAge      1    5        1  20
6 BatteryState    0    0        0   0
7 Alternator      0    0        1  50
8 FanBelt         1    5        1  15
9 Leak            1   30        1  60
10 Charge         1   10        0   0
11 BatteryPower   1   10        0   0
12 EngineCranks   1    2        0   0
13 Starts         1    2        0   0
14 Radio          1    1        0  50
15 GasInTank      0    0        1  10
16 GasGauge       1    1        0   0
17 Lights         1    1        0  20

The first line gives the number of nodes and the index of the problem-defining node (i.e., the node that has the value 0 if the whole device is functioning properly).

On each of the remaining lines, the following fields appear:

Index number (the variables must appear in ascending order by index number)
Node name (this is not checked, but it should match against the names in the network file)
A 0 or 1 indicating whether the node is observable (1 = observable).
The observation cost of the node (0 for unobservable nodes).
A 0 or 1 indicating whether the node is repairable (1 = repairable).
The repair cost of the node (0 for unrepairable nodes).
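For experimenting with Part 2 it can help to load car.dx programmatically. The sketch below (DxEntry, DxFile and readDx are hypothetical names) reads the header line and then one six-field line per node. It checks that the indices run 0, 1, 2, ... as they do in car.dx, which is stricter than the ascending-order rule above, and it stores costs as integers because every cost in car.dx is an integer.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One line of a .dx file:
//   index name observable observationCost repairable repairCost
struct DxEntry {
    int index = 0;
    std::string name;
    bool observable = false;
    int observeCost = 0;
    bool repairable = false;
    int repairCost = 0;
};

struct DxFile {
    int numNodes = 0;
    int problemNode = -1;  // node that is 0 when the whole device works
    std::vector<DxEntry> entries;
};

// Returns false if the input is malformed or the indices are not 0..n-1.
bool readDx(std::istream& in, DxFile& dx) {
    if (!(in >> dx.numNodes >> dx.problemNode)) return false;
    dx.entries.clear();
    for (int i = 0; i < dx.numNodes; ++i) {
        DxEntry e;
        int obs = 0;
        int rep = 0;
        if (!(in >> e.index >> e.name >> obs >> e.observeCost >> rep >> e.repairCost)) {
            return false;
        }
        if (e.index != i) return false;  // assumes indices 0, 1, 2, ...
        e.observable = (obs == 1);
        e.repairable = (rep == 1);
        dx.entries.push_back(e);
    }
    return true;
}
```

To read the real file, pass readDx an std::ifstream opened on car.dx.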

Running the `diagnose` Program

You can run a diagnosis as follows:

 diagnose car.net car.dx

The program will ask a series of questions and eventually declare the device to be working properly. In its present state, the program can only handle variables that are repairable or are observable and repairable. It does not handle multiple faults or purely observable variables.

The program accepts two flag arguments, -d and -t. The -d flag turns on tracing that shows the estimated probability of each fault, which is useful for understanding the program's reasoning. When a component has probability 0, the program has proved that the component cannot be faulty; similarly, a component with probability 1 is one the program has inferred must be faulty. The -t flag gives an internal trace of the probability calculations. It is extremely verbose, and you will probably never want to see it.

The Assignment: Part 1

The purpose of the first part is to run the diagnosis process through all single-fault scenarios. You will run 10 scenarios on the program, one for each component that can break.

In each scenario you should assume that exactly one component is broken. There are ten components that can break (BatteryAge, Alternator, FanBelt, Leak, GasInTank, FuelPump, Distributor, SparkPlugs, Starter, Leak2). To generate a scenario, choose one of these. Suppose you choose BatteryAge. If the BatteryAge is bad (i.e., old), then you should assume that the BatteryState is bad, the BatteryPower is bad, and therefore the Radio and Lights don't work. Also, the GasGauge will not show any gas and EngineCranks will be false. This is what will prevent the car from starting. In constructing this line of reasoning, you should assume that if a node is bad, it causes all of its children (the nodes on its outgoing links) to be bad also.
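The rule that a bad node makes all of its children bad is a reachability computation over the outgoing arrows. The minimal C++ sketch below (badNodes is a hypothetical helper) does a breadth-first search from the broken component; it assumes the children lists have already been collected from the network file.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <vector>

// children[n] lists the nodes on n's outgoing arrows, as given in car.net.
using Children = std::map<int, std::vector<int>>;

// Returns every node that is bad when `fault` is the single broken
// component, under the rule that a bad node makes all of its children bad.
std::set<int> badNodes(const Children& children, int fault) {
    std::set<int> bad{fault};
    std::queue<int> frontier;
    frontier.push(fault);
    while (!frontier.empty()) {
        int n = frontier.front();
        frontier.pop();
        auto it = children.find(n);
        if (it == children.end()) continue;  // node has no outgoing arrows
        for (int c : it->second) {
            if (bad.insert(c).second) frontier.push(c);  // enqueue on first visit only
        }
    }
    return bad;
}
```

For example, with arrows from 5 to 6, 6 to 11, 11 to 12, 14, 16 and 17, and 12 to 13 (a partial edge set reconstructed from the BatteryAge example above, so check it against car.net), badNodes(children, 5) returns {5, 6, 11, 12, 13, 14, 16, 17}: BatteryAge, BatteryState, BatteryPower, EngineCranks, Starts, Radio, GasGauge and Lights.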

You will then run the diagnose program and answer its questions according to this line of reasoning. When it eventually finds and repairs the fault, it reports the total cost; enter that cost in the table. The rows of the table correspond to the nodes of the network, followed by a Cost row, and the columns correspond to the scenarios. For example, the column for a scenario in which the SparkPlugs are broken, so that the car does not start, should look something like this (replace ???? with the actual cost of repair for this scenario):

Component:      Scenarios:
0 SparkPlugs    1  
1 Distributor   0  
2 FuelPump      0  
3 Leak2         0  
4 Starter       0  
5 BatteryAge    0  
6 BatteryState  0  
7 Alternator    0  
8 FanBelt       0  
9 Leak          0  
10 Charge       0  
11 BatteryPower 0  
12 EngineCranks 0  
13 Starts       1  
14 Radio        0  
15 GasInTank    0  
16 GasGauge     0  
17 Lights       0  
Cost          ????

Add nine more columns to this table, one for each of the other nine breakable components. Within each column, enter a 1 if the corresponding variable should give a "bad" observation in that scenario and a 0 otherwise. Run each scenario on the diagnose program and enter the reported cost in the Cost row.

The Assignment: Part 2

The second part of the assignment tests how sensitive the diagnosis algorithm is to the cost of each action. Specifically, find the smallest change you can make to the repair cost for GasInTank so that the program no longer chooses repairing GasInTank as its first action.
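Under one simple model the sensitivity question has a closed form. A well-known rule for single-fault troubleshooting orders actions by decreasing ratio of fault probability to cost, so raising the GasInTank repair cost eventually pushes it below the next-best action. Whether diagnose follows exactly this rule, and which cost (observation or repair) it charges for each competing component, is an assumption here: the probabilities would have to come from the -d trace, and any prediction should be confirmed by editing car.dx and rerunning the program. The names Action and smallestCostToDisplace are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One competing candidate for the first action (hypothetical model).
struct Action {
    double prob;  // fault probability, e.g. as reported by diagnose -d
    double cost;  // the cost the ratio rule would charge for this action
};

// Assumes the first action is the one with the largest prob / cost ratio.
// Returns the smallest integer cost >= currentCost at which a target with
// fault probability targetProb ranks strictly below the best competitor,
// or -1 if no competitor has a positive ratio.
int smallestCostToDisplace(double targetProb,
                           const std::vector<Action>& competitors,
                           int currentCost) {
    double best = 0.0;
    for (const Action& a : competitors) {
        if (a.cost > 0 && a.prob / a.cost > best) best = a.prob / a.cost;
    }
    if (best <= 0.0) return -1;
    int cost = currentCost > 0 ? currentCost : 1;
    // Strict inequality: a tie might still be broken in the target's favor.
    while (targetProb / cost >= best) ++cost;
    return cost;
}
```

With toy numbers, a target of probability 0.2 facing a single competitor with ratio 0.3/20 = 0.015 is displaced at cost 14, since 0.2/13 is still above 0.015 and 0.2/14 is below it.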

What to Turn In

Your table showing the cost of each scenario.
Your modified car.dx file showing the smallest modification that changed the first step of the repair policy.

You must turn in your solution before 10:00am Friday, December 1, 2000.

Tom Dietterich, tgd@cs.orst.edu