[UAI] Bbn Probability Accuracy

From: Scott T. Bublitz (stb@mindless.com)
Date: Wed Feb 07 2001 - 13:45:15 PST

    Hello,

    I am a graduate student in industrial psychology working on my
    dissertation to create an adaptive job analysis survey questionnaire
    that classifies people into one of 1152 discrete “job type”
    categories (i.e., job titles) using Bbns. For those without time to
    read a long-winded question, I will summarize it first and then
    elaborate for anyone who wants to read on.

    The network is a naive Bayes model with 438 children (i.e., the 438
    survey questions, each with seven states) and one parent (i.e., the
    job type category, with 1152 states) with no links among
    children. Using Netica’s “Sensitivity to Findings,” I select
    the most informative questions to present, so as to skip
    questions that don’t provide much additional information about
    the person’s job type. That is, after the person responds to a
    question, the network is updated and the next most informative
    question is selected. Each time they respond, I query the parent
    (i.e., job type) node to find the most probable state (of the 1152) and
    its probability value. After administering about 30-35 questions (out
    of the 438), the probability value of the most probable state of the
    parent node (i.e., job type) often exceeds .8, and as more questions
    are administered, that value exceeds .95 (the point at which I stop
    administering questions). However, the Bbn predicts a person’s ACTUAL
    job type correctly only about one time in four (that is, its top guess
    matches the correct state among the 1152 job types roughly 25% of the
    time). Why would the computer be 95% confident that a
    node is a particular state, yet only be 25% accurate at predicting the
    actual state? Any suggestions to improve the accuracy of the
    prediction?
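
    As a minimal sketch of the procedure summarized above, the following
    Python reimplements the loop outside Netica. It assumes the selection
    criterion is the mutual information between the job type node and each
    unanswered question, which is a stand-in for “Sensitivity to Findings”
    and not Netica’s API. The function names, the toy sizes, and the
    randomly generated probability tables are all hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits along the last axis; zero entries contribute 0."""
    safe = np.where(p > 0, p, 1.0)  # log2(1) = 0
    return -(p * np.log2(safe)).sum(axis=-1)

def posterior(prior, cpts, answers):
    """Naive Bayes posterior over job types.

    prior:   shape (J,), P(job)
    cpts:    shape (Q, J, S), cpts[q, j, s] = P(answer s to question q | job j)
    answers: dict {question index: observed state index}
    """
    log_post = np.log(prior)
    for q, s in answers.items():
        log_post = log_post + np.log(cpts[q, :, s])
    log_post = log_post - log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def mutual_information(post, cpt_q):
    """I(job; answer to one question) under the current belief. cpt_q: (J, S)."""
    p_answer = post @ cpt_q                      # marginal P(answer)
    return entropy(p_answer) - post @ entropy(cpt_q)

def administer(prior, cpts, respond, threshold=0.95):
    """Ask the most informative unanswered question until one state exceeds threshold."""
    n_questions = cpts.shape[0]
    answers = {}
    post = prior.copy()
    while post.max() <= threshold and len(answers) < n_questions:
        remaining = [q for q in range(n_questions) if q not in answers]
        best = max(remaining, key=lambda q: mutual_information(post, cpts[q]))
        answers[best] = respond(best)
        post = posterior(prior, cpts, answers)
    return int(post.argmax()), float(post.max()), answers

# Toy example (assumed sizes; the real network has J=1152, Q=438, S=7).
rng = np.random.default_rng(0)
J, Q, S = 6, 10, 7
cpts = rng.dirichlet(np.ones(S), size=(Q, J))    # random tables, shape (Q, J, S)
prior = np.full(J, 1.0 / J)
true_job = 2
respond = lambda q: int(rng.choice(S, p=cpts[q, true_job]))
guess, confidence, answers = administer(prior, cpts, respond)
```

    On the real data, `cpts` would be estimated from the training cases
    and `respond` would look up the held-out participant’s recorded answer.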

    MORE INFORMATION:
    This questionnaire is a job analysis instrument created by the government to
    measure the knowledge, skills, abilities, and activities needed for all
    types of work. The questions are seven-point Likert-scale questions (i.e.,
    strongly agree to strongly disagree). There is an existing database of 6000
    cases (people who responded to all 438 questions) across all 1152 job types.
    In other words, roughly 5 people in each job type responded to all
    questions, and this dataset is what I used to make the Bbn. Obviously, since
    there are multiple people in each job (and multiple jobs may have similar
    responses to several questions), the data is noisy. Before creating the
    network, I randomly selected 50 cases out of the 6000 (as simulated
    participants) and made the network on the remaining 5950. I used Netica’s
    Sensitivity to Findings to select a question that provides the most
    information for the person’s job type, each time updating the network with
    their response. Using the 50 cases (whose correct job types I know),
    I simulated people answering the questions as they would have responded and
    observed the job type node probability value. I kept administering questions
    until a state within the parent node (i.e., job type) exceeded .95. Keep in
    mind that there are 1152 states, and the probability of any one state, given
    no information, is .00087. It is pretty strange that, given 30-35 of the most
    informative findings (out of 438), the probability of a particular state
    would exceed .8 or .9. Nonetheless, I have checked the accuracy of the
    network in predicting the actual job type (of the simulated participants), and
    it is slightly less than .25. Why would the Bbn insist that it is over 95%
    confident that the job type node is a given state, yet be less than 25%
    accurate in prediction?
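
    To make the mismatch concrete, here is a small sketch of the arithmetic
    and of the hold-out comparison described above. The function name and
    the four toy result tuples are hypothetical and for illustration only;
    they are not the actual simulation results.

```python
def calibration_summary(results):
    """Compare how confident the network was with how often it was right.

    results: list of (predicted_job, confidence, actual_job) tuples,
             one per held-out participant.
    Returns (accuracy, mean_confidence, gap).
    """
    n = len(results)
    accuracy = sum(pred == actual for pred, _, actual in results) / n
    mean_confidence = sum(conf for _, conf, _ in results) / n
    return accuracy, mean_confidence, mean_confidence - accuracy

# Figures taken from the description above.
n_job_types = 1152
n_cases = 6000
uniform_prior = 1 / n_job_types        # about .00087 per state
cases_per_job = n_cases / n_job_types  # about 5.2 respondents per job type

# Hypothetical hold-out results (NOT real data): each case stopped at a
# confidence of .95 or more, but only one of the four guesses is correct.
toy_results = [(3, 0.96, 3), (7, 0.97, 8), (1, 0.95, 2), (4, 0.99, 9)]
accuracy, mean_confidence, gap = calibration_summary(toy_results)
```

    If the reported probabilities were well calibrated, the accuracy over
    cases stopped above .95 would be close to the mean confidence, so the
    gap would be near zero rather than around .7.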

    Also, although 25% accuracy is noteworthy, and MUCH better than a human
    could do given the same information, it isn’t as high as I would like (hard
    to convince people that 75% wrong is acceptable). When it is wrong, it is
    usually not far off (the Bbn will guess the person is a Chemical Engineer
    when they are really a Chemist). Any suggestions to further improve its
    accuracy rate?
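
    One way to quantify “usually not far off” is to check whether the
    actual job type is among the k most probable states rather than only
    the single top state. The sketch below assumes the full posterior over
    job types is saved for each simulated participant; the function name
    and the toy numbers are hypothetical.

```python
import numpy as np

def top_k_accuracy(posteriors, actual, k):
    """Fraction of cases whose actual job is among the k most probable states.

    posteriors: array of shape (n_cases, n_jobs)
    actual:     sequence of n_cases true job indices
    """
    top_k = np.argsort(posteriors, axis=1)[:, -k:]   # indices of the k largest
    hits = (top_k == np.asarray(actual)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy posteriors over 4 job types for 3 participants (illustrative only).
toy_post = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.25, 0.05],
    [0.05, 0.15, 0.30, 0.50],
])
toy_actual = [0, 2, 2]
top1 = top_k_accuracy(toy_post, toy_actual, k=1)
top2 = top_k_accuracy(toy_post, toy_actual, k=2)
```

    In this toy case only the first participant is classified correctly by
    the top state, but all three are covered once the second most probable
    state is allowed.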

    Thanks for your time,

    Scott Bublitz
    NC State University


