In this assignment, you will learn how to apply BUGS to perform some simple data analysis of data on human heights.
sex[i]should be constant,
sigmashould be logical, and all of the other nodes should be stochastic. You change the node type by click on the
typefield at the upper left part of the Doodle window.
When you change
mean[i] to be logical, you should then enter its
formula into the field named
Unlike in the model I presented in class, you should use an
exponential prior to ensure that
alpha1 is positive and define
mean[i] using the formula
mean[i] ~ alpha0 -
alpha1*sex[i]. As you create the model, you should ensure that
all of the parameters of each distribution are defined. The only
exception is that you should leave the
lower bound and
upper bound parameters blank in all of the distributions.
When your model is finished, click on Doodle > Write Code. This will
open a window with the textual version of your model. Delete the
semicolon at the end of the first line in this file. I will call this
textual window the Text Model Window.
list(...)defines a list of variables with values.
c(a1,a2,a3...)defines a 1-dimensional array whose contents are the values
a2, and so on.
tau. Type a second "list" command to define a second set of initial values for these variables. Now double click on the first "list", move to the Specification Tool and click on the "Load Inits" button. Notice that the small box labeled "for chain" has the value 1, and that it automatically increments to 2 when the initial values are loaded. The words "chain initialized but other chain(s) contain uninitialized values" should appear in the lower left edge of the main WinBUGS window.
Now double click on the second "list" expression that you wrote, and then click on the "Load Inits" button in the Specification Tool to initialize the second chain. The words "model is initialized" should appear in the lower left edge.
If all goes well, WinBUGS should rapidly perform the 1000 iterations.
However, if you get a run-time error, check to make sure that you have
the proper probability densities defined and that the initial values
make sense. I made the mistake of making
tau a gaussian. This
allowed the variance of
height[i] to become negative, and that caused
an error. Under Help > User Manual, there is a section on Tips and
Troubleshooting that provides some advice on how to fix problems. The
BUGS web page also has some pointers to other useful information.
alpha0from this menu and click on the "history" button. This will show the values of
alpha0during the execution for both Markov chains. If these values seem to be staying within the same range and that range looks reasonable, this is comforting (but not particularly illuminating). Click on the "quantiles" button, and you will see the 95% prediction interval as a function of time. This hopefully shows the two Markov chains converging toward each other. You should inspect this display for
tauto decide when the chains have converged. Suppose you decide that they've converged after 400 iterations. Type 400 into the "beg" box in the Sample Monitor Tool. Now any subsequent displays will ignore the first 399 data points.
alpha0) from the node drop-down menu and click on the "auto cor" button. This displays a bar graph showing the degree of autocorrelation between the sampled value of
tand the sampled values at times
t - lag, for lag from 0 to 50. Suppose you decide that there are clear autocorrelations up to lag 10. In that case, type 10 into the "thin" box. This tells WinBUGS to compute its statistics using 1 out of every 10 samples from the the Markov chain. You can click "auto cor" again, and now you should see significant autocorrelations only at lag 0.
With a convergence at iteration 400 and a thinning of 10, we will have only 60 samples of each variable, so that may not been enough. You can run the Markov chain for another 1000 steps by clicking on the "Update" button in the Update tool. This will give us another 100 samples of each variable, which will give nicer estimates of the posterior distributions.
tau). Similarly, click on the "stats" button to see the mean, median, and 95% prediction interval for each parameter.
sex[i]a stochastic node with the
dbern(Bernoulli) distribution, define a parent node
thetato specify the
proportionparameter of the Bernoulli distribution, and give
dbetadistribution with parameters a=1 and b=1 (for a uniform prior).
sexvariable from the "list".
sex[i]variables and for
usex[i], I recommend tossing a coin (or generating random numbers some way) and setting
sex[i]to either 0 or 1 (at random) separately for each
i. You can write your initial values as
sexvariables, it is impractical to use the Sample Monitor Tool. Instead, you should click on Inference > Summary... to open the Summary Monitor Tool. Enter the "sex" variable into the "node" field and click the "set" button. This collects the statistics for
i, but it does not collect the other more detailed information. After you have run the markov chains, you can select "sex" from the "node" drop down menu in the Summary Monitor Tool and then click on "Stats". One trick here is that the data are collected from the start of the simulation, so you can't tell WinBUGS when to start or what value of "thin" to use. To accomplish this, you need to use the Update Tool. First, run the Markov chains until they have converged. Then click "clear" on the Summary Monitor Tool. Then go to the Update Tool and enter an appropriate value for "thin". Now each update iteration will perform "thin" number of actual updates and collect one sample. Hence, if you are sampling every 10th value and you want 200 samples, you should set "updates" to 200, and "thin" to 10. This will actually perform 2000 steps and draw 200 points. Now the Summary Monitor Tool will give you good statistics. This is explained in the Tutorial section of Help > User Manual.
If you have time and are curious, try to figure out how to extend
the latent variable model to have more than 2 possible sexes. You
should start by defining
sex[i] to have a
dcat (categorical) distribution. A categorical
distribution is a multinomial distribution with K possible outcomes
(e.g., rolling a die with 6 possible outcomes). You can see an
example of this in the "Eyes" example from Help > Examples Volume II.
dcat requires a vector-valued parameter "proportions". This vector of
values is provided by a Dirichlet distribution
The dirichlet distribution is the generalization of the beta
distribution to a vector of probabilities that must sum to 1. So
P ~ ddirch(alpha) where
c(1,1,1,1,1,1) would generate a vector of 6 probability values
that sum to 1.
The tricky part of having more than 2 sexes is that we can't just
use the model
mean[i] = alpha0 - alpha1 * sex[i], because
we don't expect each sex k to be a constant amount taller than sex
k-1. One way to do this is to define a set of constant vectors
T5 as follows:
list( T1 = c(0, 1, 1, 1, 1, 1), T2 = c(0, 0, 1, 1, 1, 1), T3 = c(0, 0, 0, 1, 1, 1), T4 = c(0, 0, 0, 0, 1, 1), T5 = c(0, 0, 0, 0, 0, 1))
Now we can write the model as
mean[i] = alpha0 + alpha1*T1[sex[i]] + alpha2*T2[sex[i]] + alpha3*T3[sex[i]] + alpha4*T4[sex[i]] + alpha5*T5[sex[i]]In this case,
alpha0is the mean height of sex 1,
alpha0+alpha1is the mean height of sex 2,
alpha0+alpha1+alpha2is the mean height of sex 3, and so on. I'm not convinced this is the best way to do this, because if the Gibbs sampler increases
alpha1, this causes all of the remaining sexes to get taller so
alpha2then needs to be decreased to compensate. But it does ensure that each sex is well-defined (i.e., they are defined in increasing height order).
To fit more than 2 sexes, you will need more data. The following file gives height data for all of the 21-25 year olds in the NIH study: height-21-25.txt. You will need to reformat it for WinBUGS.
For the two height models, you should be able to cut and paste the Bayesian network and the relevant graphs into a Word file and then print it out. I would like to see the following:
alpha0and your decision about the number of iterations at which the chains have converged
alpha0and your decision about how often it is safe to sample from the chains
theta(for the latent variable model)
sex[i]produced by the Summary Monitor Tool (for the latent variable model). It would be nice to convert this to a plot like the ones I showed in class.
Please give Bruce D'Ambrosio a hardcopy printout and email me the Word file. I will be in Washington DC Tues-Wed-Thurs of next week, but I'll look at them during my flight back across the country.