AgendaManager Evaluation

The AgendaManager: Evaluation

Synopsis: This page describes an experimental study to compare the effectiveness of the AgendaManager in facilitating Agenda Management with that of a model of the Engine Indication and Crew Alerting System (EICAS).

Keywords: Agenda Management, AgendaManager, evaluation, Engine Indication and Crew Alerting System, EICAS.

Last update: 27 Jun 97

Objective

The purpose of the experiment was to determine whether Agenda Management (AMgt) performance differed between use of the AgendaManager (AMgr) and use of a model, developed in our lab, of a conventional monitoring and alerting system, the Engine Indication and Crew Alerting System (EICAS).

Method

Subjects

A total of ten airline pilots participated in the experiment, with the first two being used to refine the scenarios and identify and correct problems with software and procedures.

Apparatus

The apparatus consisted of the following components:

a part-task flight simulator running on two Silicon Graphics Indigo 2 workstations,
the AgendaManager running on one of the two workstations,
an experimenter's console running on a third workstation, and
a Verbex VAT31 automatic speech recognition system on a 486 personal computer, connected to the workstation running the AMgr by an RS-232 serial connection.

Procedure

Prior to the experiment, each subject was given a brief introduction to the study, filled out a pre-experiment questionnaire, and read and signed an informed consent document. The following forty minutes were used to train the Verbex speech recognition system to recognize the subject's voice so that altitude, speed, and heading goals could be determined from ATC clearance acknowledgements. After a short break, the subject learned how to fly the flight simulator using the Mode Control Panel (MCP, the autoflight system interface), recognize and correct experimenter-induced goal conflicts and subsystem faults, interpret EICAS and AMgr displays, and alter programmed flightpaths. After a lunch break, the subject flew two 30-minute scenarios (one with EICAS, one with the AMgr), separated by a five-minute break. Upon completion of the experiment, the subject answered a post-experiment questionnaire.

Experimental Design

The primary factor investigated in the experiment was monitoring and alerting system condition (whether AMgr or EICAS was used). The experimental design was balanced in regard to the monitoring and alerting system used and the scenario (1 or 2).
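
A balanced design of this kind can be sketched as follows. This is only an illustration: the page does not give the actual assignment scheme, so the function name `balanced_assignments`, the cycling order, and the choice of eight subjects in the test (ten pilots minus the two used for refinement) are assumptions.

```python
from itertools import product

SYSTEMS = ("AMgr", "EICAS")
SCENARIOS = (1, 2)


def balanced_assignments(n_subjects):
    """Return one (first session, second session) plan per subject.

    Hypothetical scheme: cycle through the four combinations of
    (system flown first, scenario flown first). The second session
    always uses the other system and the other scenario.
    """
    combos = list(product(SYSTEMS, SCENARIOS))  # four starting combinations
    plan = []
    for i in range(n_subjects):
        first_system, first_scenario = combos[i % len(combos)]
        second_system = "EICAS" if first_system == "AMgr" else "AMgr"
        second_scenario = 2 if first_scenario == 1 else 1
        plan.append(
            ((first_system, first_scenario), (second_system, second_scenario))
        )
    return plan
```

With a subject count that is a multiple of four, each system is flown first equally often and is paired with each scenario equally often, which is one way to separate the effect of the system from the effects of scenario and order.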

Data Collection

We collected data for each subject on:

how correctly the subject prioritized within concurrent subsystem functions;
the average subsystem fault correction time;
the average time to properly program the autoflight system;
the percentage of goal conflicts detected and corrected;
the average time to resolve goal conflicts;
how correctly the subject prioritized concurrent subsystem and aviate functions;
the average number of unsatisfactory functions at any time;
the percentage of time all functions were satisfactory; and
the subject's rating of the effectiveness of each monitoring and alerting system: -5 (great hindrance) to +5 (great help).

The raw data for the first eight variables were recorded by the AMgr itself. GoalConflict objects recorded goal conflicts, and FunctionAgents, which assess function status as part of their roles, recorded function performance data.
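
The two time-based function measures (the average number of unsatisfactory functions and the percentage of time all functions were satisfactory) could be derived from such recordings roughly as shown here. The log format is hypothetical, since the page does not describe how the AMgr stored its data, and treating the average as time-weighted is likewise an assumption.

```python
def summarize_function_status(samples, end_time):
    """Summarize a log of function status over one scenario.

    samples:  list of (time_in_seconds, number_of_unsatisfactory_functions),
              sorted by time; each value holds until the next sample.
              (Hypothetical format, not the AMgr's actual recording.)
    end_time: time in seconds at which the scenario ended.

    Returns (time-weighted average number of unsatisfactory functions,
             percentage of time with no unsatisfactory function).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    total = end_time - samples[0][0]
    if total <= 0:
        raise ValueError("end_time must be later than the first sample")

    weighted_sum = 0.0
    all_ok_time = 0.0
    # Pair each sample with the start time of the next one (or end_time).
    next_times = [t for t, _ in samples[1:]] + [end_time]
    for (t, n_unsat), t_next in zip(samples, next_times):
        dt = t_next - t
        weighted_sum += n_unsat * dt
        if n_unsat == 0:
            all_ok_time += dt
    return weighted_sum / total, 100.0 * all_ok_time / total
```

For example, with the made-up log `[(0, 0), (10, 2), (30, 1), (40, 0)]` and `end_time=100`, the function returns an average of 0.5 unsatisfactory functions and 70% of the time with all functions satisfactory. These numbers are illustrative only and are not experimental data.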

Results

The data were analyzed using analysis of variance (ANOVA). The following table summarizes the results obtained for each of these variables, with links to histograms.

AgendaManager evaluation results: mean values (all times in seconds), p-values, and levels of statistical significance of the differences.
Response variable	AgendaManager	EICAS	p-value	level of significance
within subsystem correct prioritization	100%	100%	NA	not significant
subsystem fault correction time	19.5	19.6	.9809	not significant
autoflight system programming time	7.0	5.9	.1399	not significant

goal conflicts corrected percentage	100%	70%	.0572	0.10
goal conflict resolution time	34.7	53.6	.0821	0.10
subsystem/aviate correct prioritization	72%	46%	.0308	0.05
average number of unsatisfactory functions	0.64	0.85	.0466	0.05
percentage of time all functions satisfactory	65%	52%	.0254	0.05

subject effectiveness rating (-5 to 5)	4.8	2.5	.0006	0.05
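
The last column of the table follows from the p-values by a simple threshold rule, sketched here. The p-values are copied from the table; the function names and the exact rule (strict comparison against 0.05 and 0.10) are assumptions inferred from the table, not a description of the authors' analysis software.

```python
# (response variable, p-value) pairs copied from the results table.
# None stands for the "NA" entry.
RESULTS = [
    ("within subsystem correct prioritization", None),
    ("subsystem fault correction time", 0.9809),
    ("autoflight system programming time", 0.1399),
    ("goal conflicts corrected percentage", 0.0572),
    ("goal conflict resolution time", 0.0821),
    ("subsystem/aviate correct prioritization", 0.0308),
    ("average number of unsatisfactory functions", 0.0466),
    ("percentage of time all functions satisfactory", 0.0254),
    ("subject effectiveness rating", 0.0006),
]


def significance_level(p):
    """Map a p-value to the label used in the table's last column."""
    if p is None:
        return "not significant"  # no test was possible (NA)
    if p < 0.05:
        return "0.05"
    if p < 0.10:
        return "0.10"
    return "not significant"


def label_results(results=RESULTS):
    """Return {response variable: significance label}."""
    return {name: significance_level(p) for name, p in results}
```

Applied to the table, this rule labels the two goal conflict variables as significant at the 0.10 level only, and the last four variables as significant at the 0.05 level.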

The first three variables (within subsystem correct prioritization, subsystem fault correction time, and autoflight system programming time) show no statistically significant differences (p-values > 0.10) between the AMgr and EICAS conditions. This is critical for interpreting the results, because it supports the hypothesis that the AMgr is the only cause of the significant differences in the remaining variables. For example, within subsystem prioritization performance does not differ between the two conditions. Also, once a subsystem fault is detected, the process of correcting it is identical in the two conditions, as is programming the autoflight system. However, we did observe a minor practice effect between the two scenarios: each subject showed significant improvement in programming the autoflight system from the first scenario to the second.

A key objective of the AMgr is to support the pilot in recognizing goal conflicts and to help resolve them in a timely manner. The next two variables, goal conflicts corrected percentage and goal conflict resolution time, directly reflect this objective, and the results indicate how successfully the AMgr achieved it (suggestive evidence of differences, with 0.05 < p < 0.10). Whenever a goal conflict existed, the AMgr helped the subject identify it (100%), whereas with EICAS the subjects identified only 70% of the conflicts (p = .0572, significant at the 0.10 level but not at the 0.05 level). Also, with the AMgr the subjects resolved conflicts nearly 19 seconds faster on average (34.7 vs. 53.6 seconds). This may have helped them achieve a lower overall level of unsatisfactory functions (AMgr: 0.64; EICAS: 0.85; a statistically significant difference, p < 0.05) by making more time available to them.

It is crucial for the pilot to recognize that primary flight control functions (i.e., aviate functions) are usually more critical than subsystem-related functions. The AMgr clearly showed its strength here: with it, the pilots prioritized correctly in 72% of the cases, whereas with EICAS they did so in only 46% (a statistically significant difference). Last, but not least, with the AMgr the subjects achieved a higher percentage of time in which all functions were performed satisfactorily (AMgr: 65%; EICAS: 52%; a statistically significant difference).

Independent of how well an individual can perform under a given condition, it is also important that he or she subjectively finds the condition acceptable. Based on our results, the subjects' effectiveness ratings strongly support the AMgr (4.8 vs. 2.5, a statistically significant difference).