Hello,
I am a graduate student in industrial psychology working on my
dissertation: an adaptive job analysis survey questionnaire that
classifies people into one of 1152 discrete “job type” categories
(i.e., job titles) using Bayesian belief networks (BBNs). For those
without time to read a long-winded question, I will summarize it now
and elaborate below for those who want to read on.
The network is a naive Bayes model with 438 children (i.e., the 438
survey questions, each with seven states) and one parent (i.e., the
job type category, with 1152 states), with no links among the
children. Using Netica’s “Sensitivity to Findings” I select the most
informative question to present next, which eliminates questions that
would provide little additional information about the person’s job
type. After each response, the network is updated and the next most
informative question is selected. Each time the person responds, I
query the parent (i.e., job type) node to find the most probable
state (of the 1152) and its probability value. After administering
about 30-35 questions (out of the 438), the probability of the most
probable state of the parent node (i.e., job type) often exceeds .8,
and as more questions are administered, that value exceeds .95 (the
point at which I stop administering questions). However, the BBN’s
accuracy at predicting a person’s ACTUAL job type is around one in
four (roughly one in four times it guesses the correct state from the
1152 job types). Why would the computer be 95% confident that a node
is in a particular state, yet only be 25% accurate at predicting the
actual state? Any suggestions to improve the accuracy of the
prediction?
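
To make the procedure concrete, here is a rough Python sketch of what
I believe is happening under the hood. The array names are my own
invention (Netica does all of this internally), and I am assuming its
Sensitivity to Findings ranks questions by expected entropy reduction
in the job node, i.e., their mutual information with it:

    import numpy as np

    # Hypothetical shapes, for illustration only:
    #   prior[j]     = P(job = j),                    shape (1152,)
    #   cpt[q, j, a] = P(answer to question q = a | job = j),
    #                                                 shape (438, 1152, 7)
    # The CPTs are assumed smoothed, so no entry is exactly zero.

    def posterior(prior, cpt, answers):
        """Naive Bayes posterior over job types given (question, answer) pairs."""
        log_p = np.log(prior)
        for q, a in answers:
            log_p += np.log(cpt[q, :, a])  # independence assumption: just add
        log_p -= log_p.max()               # stabilize before exponentiating
        p = np.exp(log_p)
        return p / p.sum()

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    def most_informative_question(p_job, cpt, asked):
        """Pick the unasked question with the largest expected entropy
        reduction in the job node (its mutual information with the job)."""
        h_now = entropy(p_job)
        best_q, best_gain = None, -1.0
        for q in range(cpt.shape[0]):
            if q in asked:
                continue
            p_answer = p_job @ cpt[q]      # P(a) = sum_j P(j) * P(a | j)
            expected_h = sum(
                p_answer[a] * entropy(p_job * cpt[q, :, a] / p_answer[a])
                for a in range(cpt.shape[2]) if p_answer[a] > 0
            )
            gain = h_now - expected_h
            if gain > best_gain:
                best_q, best_gain = q, gain
        return best_q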
MORE INFORMATION:
This questionnaire is a job analysis instrument created by the
government to measure the knowledge, skills, abilities, and
activities needed for all types of work. The questions are
seven-point Likert-scale questions (i.e., strongly agree to strongly
disagree). There is an existing database of 6000 cases (people who
responded to all 438 questions) across all 1152 job types. In other
words, roughly 5 people in each job type responded to all the
questions, and this dataset is what I used to build the BBN.
Obviously, since there are multiple people in each job (and multiple
jobs may have similar responses to several questions), the data is
noisy. Before creating the network, I randomly set aside 50 of the
6000 cases (as simulated participants) and built the network on the
remaining 5950.
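
For what it is worth, my understanding of what learning the network
from those 5950 cases amounts to is roughly the following (my own
reconstruction in Python, not Netica’s actual code). Note the
arithmetic: with roughly 5 cases per job type spread over 7 answer
states, the estimated P(answer | job) tables lean heavily on whatever
pseudo-counts or priors the learner uses:

    import numpy as np

    N_JOBS, N_QUESTIONS, N_ANSWERS = 1152, 438, 7

    def learn_cpts(X, y, pseudo=1.0):
        """X is (5950, 438) with answers coded 0-6; y is (5950,) with
        job-type indices 0-1151. Returns P(answer | job) per question."""
        counts = np.full((N_QUESTIONS, N_JOBS, N_ANSWERS), pseudo)
        for answers, job in zip(X, y):
            for q, a in enumerate(answers):
                counts[q, job, a] += 1
        # Normalize over answers; with ~5 real cases per job, the 7
        # pseudo-counts contribute 7 of the ~12 in each denominator.
        return counts / counts.sum(axis=2, keepdims=True)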
I used Netica’s Sensitivity to Findings to select the question that
provides the most information about the person’s job type, each time
updating the network with their response. Using the 50 held-out cases
(whose correct job types I know), I simulated people answering the
questions as they actually responded and observed the job type node’s
probability values. I keep administering questions until a state
within the parent node (i.e., job type) exceeds .95. Keep in mind
that there are 1152 states, and the probability of any one state,
given no information, is about .00087. It is pretty strange that,
given 30-35 of the most informative findings (out of 438), the
probability of a particular state would exceed .8 or .9. Nonetheless,
I have checked the accuracy of the network at predicting the actual
job type (of the simulated participants), and it is slightly less
than .25. Why would the BBN insist that it is over 95% confident that
the job type node is in a given state, yet be less than 25% accurate
in its predictions?
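
One thing I can at least illustrate numerically (with made-up
numbers; this may be part of the answer to my own question): if any
of the 438 questions are correlated with one another, the naive Bayes
structure multiplies their likelihoods as if each were independent
evidence, so the same underlying fact gets counted many times and the
posterior sharpens far faster than it should:

    import numpy as np

    prior = np.array([0.5, 0.5])    # two job types, equal priors
    p_agree = np.array([0.8, 0.4])  # P("agree" | job) for one question

    def nb_posterior(prior, likelihoods):
        p = prior * np.prod(likelihoods, axis=0)
        return p / p.sum()

    print(nb_posterior(prior, np.array([p_agree])))
    # -> [0.667 0.333]  (one question: modest confidence)

    # Now pretend 10 questions are near-duplicates of that question; a
    # naive Bayes model treats them as 10 independent pieces of evidence.
    print(nb_posterior(prior, np.tile(p_agree, (10, 1))))
    # -> [0.999 0.001]  (extreme confidence from one real fact)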
Also, although 25% accuracy is noteworthy, and MUCH better than a
human could do given the same information, it isn’t as high as I
would like (it is hard to convince people that being 75% wrong is
acceptable). When it is wrong, it is usually not far off (the BBN
will guess the person is a Chemical Engineer when they are really a
Chemist). Any suggestions to further improve its accuracy rate?
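
Since the wrong guesses are usually near misses, one thing I have
been considering is scoring top-k accuracy on the held-out cases
rather than only the single most probable state. Sketched below with
hypothetical arrays: posteriors is (50, 1152), one row per case after
questioning stopped, and truth holds the 50 correct job-type indices:

    import numpy as np

    def top_k_accuracy(posteriors, truth, k=1):
        """Fraction of cases whose true job type is among the k most
        probable states of the job node."""
        top_k = np.argsort(posteriors, axis=1)[:, -k:]  # k best per case
        return np.mean([t in row for t, row in zip(truth, top_k)])

    # top_k_accuracy(posteriors, truth, k=1)  -> the ~.25 I reported
    # top_k_accuracy(posteriors, truth, k=10) -> makes "not far off" measurable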
Thanks for your time,
Scott Bublitz
NC State University