
Comput Math Organ Theory DOI 10.1007/s10588-011-9093-7 SI: BRIMS 2010

Adapting the task-taxon-task methodology to model the impact of chemical protective gear

Shane T. Mueller · Benjamin Simpkins · George Anno · Corey K. Fallon · Owen Price · Gene E. McClellan

© Springer Science+Business Media, LLC 2011

S.T. Mueller · B. Simpkins · G. Anno · C.K. Fallon · O. Price · G.E. McClellan
Applied Research Associates, Inc., 1750 Commerce Center Blvd, Fairborn, OH 45324, USA
e-mail: [email protected]

Abstract The Task-Taxon-Task method (Anno et al. DNA-TR-95-115, 1996) is a statistical modeling approach to predict performance decrements on behavioral tasks in response to various stressors. We describe the basics of the T3 method and our approach to adapting it to handle more acute stressors, which can require decomposition into task networks via logical or empirical analysis. We provide an illustrative example showing how the method can be used to account for performance decrements in manual tasks associated with wearing protective gloves. This illustration provides a substantive application in which the current T3 method can be augmented to account for performance decrements in a new sub-domain, while additionally providing lessons for extending the method to new stressors, performance domains, and behavior modeling systems.

Keywords Performance prediction · Cognitive moderators · Performance modifiers · Performance under stress

Introduction

Many cognitive and behavioral models aim to predict performance under new conditions, such as predicting how a person might perform a new task based on previously measured parameters, predicting how future technologies may change the way a known task is performed, or predicting how performance on a known task will be impacted by some new stressors. The research we report here was conducted in service of this last kind of prediction: to measure and model the impact that chemical protective gear (also known as Mission-Oriented Protective Posture, or MOPP) worn by U.S. warfighters has on operational performance. However, the lessons we have learned are useful more broadly; they may apply specifically to other types of protective systems (including industrial protection, fire protection, SCUBA equipment, etc.), but also to the general goal of developing predictive models of cognitive performance in the face of physical and physiological stressors (which, broadly construed, might include fatigue, sleep loss, drug or alcohol intoxication, age-related decline, and the like).

The impact of stressors on human performance has been studied for over a century in the human factors and applied psychology literature (e.g., Yerkes and Dodson 1908; Broadhurst 1959; Sanders 1983; Ritter et al. 2007), and researchers such as Wickens (e.g., 2002) and Silverman (e.g., 2006) have offered modeling approaches for making predictions about general human performance in the face of stressors or “performance modifiers”. Our approach, the Task-Taxon-Task (T3) Methodology (Anno et al. 1996), is rooted in statistical analysis of task performance, but offers a well-grounded method that could be integrated into a number of modern modeling approaches that at present do not incorporate the impact of various stressors on performance. In this report, we will describe the method, including an illustrative example in the context of manual dexterity tasks. We will conclude with some suggestions about how the method could be integrated into a number of classes of human behavior models.

Taxonomies of skill

Fleishman and his collaborators (e.g., Fleishman et al. 1968; Fleishman 1975; Fleishman and Quaintance 1984) have conducted the best-known and most comprehensive research on skill taxonomies, culminating in a taxonomy with five major areas (cognitive, perceptual, sensory, physical, and psychomotor) and, in one version, a total of 37 abilities (including skills such as verbal comprehension, deductive reasoning, reaction time, and manual dexterity). At the lowest level of any taxonomy is a specific skill or ability that is useful for accomplishing more complex tasks. In this context, a skill taxon is a division of the skill taxonomy that captures a basic psychological ability that is broadly deployed for numerous tasks.

Hogan et al. (1990) noted the importance of developing skill taxonomies, and argued that taxonomies provide a means for assessing task demands that goes beyond case-by-case task analysis. In this view, once the taxonomic structure of a task is developed, all skills that fall under relevant taxa are represented, making prediction easier.

Fleishman and Quaintance (1984) cataloged a number of taxonomies that had been previously developed. In Table 1 we highlight a few that have been developed since that time to illustrate their range and purpose. Many of the taxonomies in Table 1 are actually nested hierarchies, and so when appropriate, we indicate the two primary taxon levels in the third column. For example, Davis et al. (1990) specified ten primary taxa with a total of 42 subordinate taxa. Most of the taxonomies involving ten or more taxa are based on Fleishman and Quaintance (1984), and are typically categorized into seven to ten broad areas.

Table 1 Description of several skill taxonomies. Taxonomies with a two-level hierarchical structure are described in terms of their primary/subordinate numbers of taxa

| Source | Name | Number of taxa/subtaxa | Purpose |
| --- | --- | --- | --- |
| Wickens (2002) | Multiple resource theory | 4 “dimensions” | Make predictions about Human-System interaction |
| Klein et al. (2003) | Macrocognitive theory | 5 | Describe high-level skills required for expert performance |
| Mueller et al. (2007) | Cognitive decathlon | 6 | Identify specific measures that can be used to evaluate embodied artificial intelligence |
| Anno et al. (1996) | T3 methodology | 5 | Predict performance decrements from stressors related to chemical, biological, and nuclear weapons |
| Salvi (2001) | IMPRINT | 9 | Evaluate and predict human performance in military systems and equipment |
| Davis et al. (1990) | P2NBC2 database | 10/42 | Characterize impact of chemical protective gear across a range of field tasks |
| Ramirez (1997) | Human abilities taxonomy for CWTSAR | 8/35 | Characterize impact of chemical protective gear across a wide range of field tasks |
| Neades et al. (1998) | ORCA elemental capability | 6/24 | Model team capabilities following injury |
| Parks (2006) | Hawaii Early-Learning Profile (HELP) | 6/58 | Identify learning objectives and goals for pre-school children |
| Fleishman and Quaintance (1984) | Taxonomy of human performance | 21–52 | Characterize broadly the skills required for advanced performance |

The Task-Taxon-Task methodology

As the name implies, skill taxa play a central role in the Task-Taxon-Task methodology. The name of the method refers to the steps taken to produce predictive models of stressors: one begins with measured effects of a stressor on tasks, then uses statistical inference to identify the impact of those stressors on hypothesized skill taxa, and finally uses those taxa to make predictions about new tasks. The method allows predictions about stressors to be made without having to measure each particular task in both stressed and unstressed conditions. Taxa serve as a set of latent predictor variables through which stressed performance is predicted.

The T3 method works by computing the impact of a stressor on each of the latent taxa. If we consider a stressor that has just two levels, one would collect data for a set of tasks under both stressed and unstressed conditions. These tasks are then coded according to skill taxa (through task analysis or other means), and after some simple transformations, a regression is performed that estimates the impact that the stressor has on each skill taxon. For example, one might find that physical fatigue (a stressor) largely impacts gross motor skill, or sleep loss (another stressor) impacts attention. The hypothesized impact of the stressor on a new task can be computed by simulation or, in many cases, simple arithmetic.

For the best results, the taxa must be chosen to cover the types of cognitive and physical capabilities that a stressor is likely to impact, and for statistical parsimony, relatively few taxa should be used to avoid overfitting data. For example, one can identify multiple taxa related to vision, and indeed Fleishman (1975) identified seven: near vision, far vision, visual color discrimination, night vision, peripheral vision, depth perception, and glare sensitivity. However, for some stressors only one or two visual taxa may be necessary (the impact of wearing protective goggles might be captured by just knowing which tasks require vision in the periphery, and so require only a peripheral vision taxon).

In the T3 method, any task can be described by the taxa that are important for performing the task, and similarly any stressor can be defined by the amount it impairs processes related to each taxon. A predicted performance decrement for a particular stressor on a particular task can be computed by summing the taxon-related decrements from the stressor, weighted by the relative importance of each taxon for the task.

This statistical modeling approach is substantially less detailed than many agent-based modeling systems, but has advantages because it can be tied fairly closely to data, and because the effort for modeling new tasks or systems is fairly minimal (essentially a process of performing task analysis to develop ratings across skill taxa).
This simplicity is important for our goal, because the stressors we are interested in impact performance of nearly every operational activity in the military, and so a simple model that can make rough predictions across many tasks is preferred over a complex model that makes detailed predictions about a small range of tasks with the same effort.

Figure 1 depicts at a highly simplified level the basic process involved with determining taxon-related decrements for a particular stressor. Suppose that one has data from four basic tasks, performed under both presence and absence of a stressor (as shown in the second column, measured across a series of trials). Each row on the left side of Fig. 1 depicts one such task. Through some method of task analysis, the taxon weights for each task are identified (first column). Next, logarithmic transformations and linear regression are used to identify the impact that the stressor has on each taxon. That is, for each condition, log(time_impaired / time_baseline) is computed and regressed against the taxon weightings to produce regression weights relating the stressor to each taxon. The outcome is a set of regression weights that characterize how a one-unit ‘dose’ of a particular stressor impacts each taxon. For categorical stressors (such as wearing or not wearing protective gear), the weight represents the direct impact of the stressor; for stressors that can change continuously (e.g., fatigue, sleep loss, drug intoxication, etc.), the weight represents a functional relationship between dose and response.

Fig. 1 Depiction of the data analysis phase of the Task-Taxon-Task method. Data describing the notional decrements a stressor places on performance for multiple tasks are analyzed to infer the decrements that each latent skill taxon is responsible for

Once this statistical analysis has been performed, decrements on a new task or new level of stress can be computed by reversing the process, as with any linear regression model. For example, consider the task of walking, which we represent as task T_i. Walking is a highly automated physical task that requires little fine motor or attentional control. Thus, the psychomotor and attention taxa would be given ratings of 0 out of 5. One might expect slight impairments from stressors that impact cognitive and perceptual resources, and so those taxa would be given ratings of 1 out of 5. Finally, because it is primarily a physical task, it might be given a moderate rating (of 3) on the physical taxon. This task, T_i, would then be represented as a set of weights (between 0 and 5) relating to the relative importance of the five taxa (attention, perception, physical, psychomotor, and cognitive):

\[ T_i = [0, 1, 3, 0, 1] \tag{1} \]

Similarly, a stressor may impair those taxa to different degrees (with 0 representing no impact, and values smaller than 0 representing greater slowing, in the form of an increase in the log(RT_1/RT_0) ratio). Suppose the stressor S_j primarily impacts physical and cognitive taxa, with a smaller impact on attention, psychomotor, and perception:

\[ S_j = [-0.05, -0.01, -0.2, -0.05, -0.1] \tag{2} \]

Table 2 Example weightings (on a 0 to 5 scale) across taxa for a hypothetical task, and hypothetical log-inverse regression coefficients for a particular stressor

| Taxon | Task T_i importance weight | Stressor S_j weight |
| --- | --- | --- |
| Attention | 0 | −0.05 |
| Perception | 1 | −0.01 |
| Physical | 3 | −0.2 |
| Psychomotor | 0 | −0.05 |
| Cognitive | 1 | −0.1 |

Both sets of weights are displayed in Table 2, in rows associated with their appropriate taxon. The ultimate prediction of performance impact is computed as a combination of the stressor and the taxon weight. Here, if task T_i is assumed to take one unit of time in a non-stressed condition, then the T3 model would assume that under stressor S_j, the task would be impacted by a factor of 1/exp(0(−.05) + 1(−.01) + 3(−.2) + 0(−.05) + 1(−.1)) = 2.03 (i.e., the reciprocal of the exponentiated weighted sum of the stressor-taxon weights and the task-taxon weights). Thus, the high importance of the physical taxon, coupled with the large impact of the stressor on physical abilities, would predict that performance time would essentially double under the stressor. The predicted performance for task i under stressor j can be computed via (3), where j = 0 indicates unstressed performance, and where k indexes different taxa:

\[ \mathrm{time}_{i,j} = \exp\Big(-\sum_{k} T_{i,k}\, S_{j,k}\Big)\, \mathrm{time}_{i,0} \tag{3} \]
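The arithmetic of the worked example can be checked with a few lines of code. The following is a minimal sketch of Eq. (3) applied to the Table 2 weights; the function and variable names are illustrative assumptions and are not part of the T3 method or of any existing tool.

```python
import math

# Taxon order follows Table 2: attention, perception, physical, psychomotor, cognitive.
task_weights = [0, 1, 3, 0, 1]                         # T_i, Eq. (1)
stressor_weights = [-0.05, -0.01, -0.2, -0.05, -0.1]   # S_j, Eq. (2)


def predicted_time(task, stressor, baseline_time=1.0):
    """Eq. (3): time_{i,j} = exp(-sum_k T_{i,k} * S_{j,k}) * time_{i,0}."""
    weighted_sum = sum(t * s for t, s in zip(task, stressor))
    return math.exp(-weighted_sum) * baseline_time


slowdown = predicted_time(task_weights, stressor_weights)
print(f"{slowdown:.2f}")  # 2.03
```

Because the model is multiplicative in the baseline time, the same function gives the stressed time for any unstressed duration, for example `predicted_time(task_weights, stressor_weights, baseline_time=30.0)` for a task that normally takes 30 s.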

The benefit of this method is that once careful assessment of the taxonomic weights is performed for a set of tasks, the impact of a particular stressor can be assessed using standard regression techniques (assuming a wide enough range of input tasks is available). Although the data fitting is a statistical process, the decrements obtained could be used in other types of models, including agent-based simulations, task networks, statistical models, and even cognitive architectures. However, there are several ways in which the T3 method (as previously implemented) is deficient, and could possibly be augmented to provide more accurate predictions about performance under stress.

For example, one limitation stems from the fact that the T3 method was originally designed to predict behavioral decrements from toxic chemicals and radiation, based on a set of mediating symptomatology (such as nausea, disorientation, and headache). Such stressors have large-scale effects that may be well captured by global skill taxa. However, the physical and cognitive strain associated with chemical protective gear might have more acute impacts on performance. For example, one part of the MOPP suit is the gas mask and goggle assembly, which has a well-understood impact on peripheral vision. Another component is butyl-rubber gloves, which impact a number of dexterous behaviors (as we will show in the example). For such stressors, global taxa such as ‘psychomotor’ or ‘perceptual’ may be too crude to make useful predictions about performance decrements, and more detailed sets of taxa might be needed.

In addition, as tasks become more complex and the stressors more acute, one may need better representations of tasks to make useful predictions about performance decrements. Describing arbitrary tasks simply as a weighting over a set of taxa may be insufficient, especially if a stressor does not have log-linear effects on task performance. Indeed, there are a number of reasons why a stressor such as protective gear might produce slowing, and not all of them are consistent with the slowing model implicit in the T3 method. A partial list includes:

1. The additional mass of protective gear may simply slow down motor movement.
2. Limited range-of-motion or perception may require new sets of actions (e.g., moving head to see in periphery; avoiding rough terrain).
3. Reduced precision of movement may lead to more errors that need to be corrected (e.g., mistaken key entry on a keyboard).
4. Wearing gear may place the wearer into a ‘novice’ performance mode, eliminating automaticity gains and requiring more deliberate control.
5. Discomfort and self-monitoring (e.g., to ensure seals have not been broken) may draw attentional resources and slow performance in primary tasks.
6. Biophysical metabolic processes (heat, oxygen, blood flow, CO2 or O2 maintenance, etc.) may produce neurophysiological inefficiency or physical fatigue that impacts task performance.
7. The wearer may intentionally and strategically slow down to avoid costly immediate error correction or long-term fatigue.
8. Wearing protective gear may cause loss of consciousness (e.g., through heat strain associated with overworking) or may increase the chances of working until the wearer physically collapses, which (along with potentially permanent health consequences) prevents task completion.

This partial list highlights the many ways in which stressors associated with protective gear may impact task performance. Other stressors may share some or all of these impacts on task performance speed, and so it might serve as a basis for a more general list. But overall, many of the ways a stressor can impact performance are poorly suited for the type of slowing model assumed by the T3 method. For example, intentional strategic slowing may work to even out performance over a long period of time, avoiding fast initial performance followed by slow later performance. So, one may observe early slowing on a task in response to wearing MOPP gear, but the source of that slowing is strategic rather than physical.

More critically, there may also be strategic shifts in task performance that stem from limited mobility or limited sensory input. This type of shift may change the operators associated with performing a task, or may change the critical path in a task network, neither of which is consistent with the T3 slowing model. For example, a standard trick for soldiers who operate computer equipment in MOPP gear is to use a pencil to do keyboard entry (because they make too many errors when wearing gloves). Thus, the data entry method fundamentally changes, even though pencil-typing itself is probably unimpaired by wearing gloves. In addition, stressors that impact accuracy may produce highly non-linear effects on certain aspects of a task, because slowing could stem primarily from error correction rather than slowed operation itself.

These two deficiencies suggest ways in which the standard T3 method could be augmented to better capture the impact of stressors. First, a more accurate set of taxa might be adopted; and second, more detailed task models might be used, especially ones that incorporate task structure and dependencies. Indeed, several simulation models offer some improvements in these directions, which offer suggestions or even possible implementation platforms for an improved T3 method.

Relationship of the T3 method to other modeling tools

The initial models developed for the T3 method (e.g., Anno et al. 1996) formed the basis for how the IMproved Performance Research INtegration Tool (IMPRINT) predicts performance decrements (Allender et al. 1997). IMPRINT and its predecessor MANPRINT (see Booher and Minninger 2003) are tools that have been employed by the United States Army Research Laboratory to identify personnel-related constraints on system design and to predict performance during extreme conditions. IMPRINT predicts performance decrements for a number of stressors (MOPP, heat, cold, noise, and sleeplessness), and currently uses a set of nine taxa:

1. Perceptual (visual recognition/discrimination)
2. Cognitive (numerical analysis)
3. Cognitive (information processing/problem solving)
4. Fine motor discrete
5. Fine motor continuous
6. Gross motor light
7. Gross motor heavy
8. Communication (reading and writing)
9. Communication (oral)

It should be noted that IMPRINT does not implement the full T3 statistical modeling process. In the case of its pre-programmed stressors, it can simulate task-based decrements that are mediated by stressor-related impacts on taxa, which is essentially the transition between the Taxon and Task of the Task-Taxon-Task method. It also enables users to define custom stressors, which are essentially the output of a complete Task-Taxon analysis. However, IMPRINT will not perform the regression, transformation, or data handling needed to identify the impact of a stressor across skill taxa. It is also important to note that IMPRINT’s skill taxa are fixed, and not customizable by users. Although IMPRINT uses more skill taxa than originally developed by Anno et al. (1996), they may be no better suited to modeling the impact of MOPP gear than the original five.

However, IMPRINT has more to offer with respect to its ability to represent complex task structure. In particular, it allows task networks to be defined that specify dependencies among tasks, so that large complex tasks with multiple dependencies can be represented. This task representation may enable many of the more complex impacts of MOPP stressors to be captured by a behavior simulation model.

Although IMPRINT implements a degradation model consistent with the T3 method and offers a means to represent complex task structure, researchers might prefer to implement T3 degradation models within their preferred modeling environments. Indeed, there are many modeling platforms that represent tasks in sufficient detail that incorporating T3 degradation would be relatively simple. These include the Stochastic Activity Network Laboratory for Cognitive Modeling (SANLab-CM; Patton et al. 2009), a recent task network model that is more easily adaptable than IMPRINT; and even cognitive architectures such as ACT-R (e.g., Anderson and Lebiere 1998) and the Executive Process/Interactive Control Architecture (EPIC; Kieras and Meyer 1997), which might be adapted to integrate T3 stressor models. We review some of these possibilities in the Discussion section.

Hierarchical and dependency-based task representations can be used even without a fully-implemented cognitive architecture or task network model (see Schweickert et al. 2003). One can use standard hierarchical task analysis methods to identify tasks, and represent the ultimate sequence of operators in a spreadsheet or statistical computing package.
Furthermore, if one only wants to compute performance decrements via the T3 model (rather than a detailed cognitive architecture), the task analysis can sometimes be simplified. In the next section, we describe our approach to this type of task analysis.
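As a rough illustration of how T3 slowing might be combined with a dependency-based task representation outside of a dedicated tool, the sketch below applies the log-linear slowing factor to each operator in a small network and then takes the longest (critical) path. The network, the two taxa, the ratings, and the stressor weights are all hypothetical values chosen for illustration; this is not how IMPRINT or SANLab-CM is implemented.

```python
import math


def stressed_time(baseline, taxon_ratings, stressor):
    """T3 slowing for one operator: baseline * exp(-sum(rating * weight))."""
    return baseline * math.exp(-sum(r * s for r, s in zip(taxon_ratings, stressor)))


def critical_path_time(operators, stressor):
    """Finish time of the longest path through a dependency network.

    `operators` maps name -> (baseline_time, taxon_ratings, prerequisites) and
    must list every prerequisite before the operators that depend on it.
    """
    finish = {}
    for name, (baseline, ratings, prereqs) in operators.items():
        start = max((finish[p] for p in prereqs), default=0.0)
        finish[name] = start + stressed_time(baseline, ratings, stressor)
    return max(finish.values())


# Hypothetical network; ratings are given in the order [physical, cognitive].
network = {
    "read_order": (2.0, [0, 3], []),
    "fetch_tool": (3.0, [3, 0], []),
    "apply_tool": (4.0, [2, 1], ["read_order", "fetch_tool"]),
}
no_stress = [0.0, 0.0]
heavy_gear = [-0.2, -0.05]  # hypothetical stressor weights per taxon

baseline_total = critical_path_time(network, no_stress)
stressed_total = critical_path_time(network, heavy_gear)
print(baseline_total, round(stressed_total, 2))
```

Because operators on different branches slow by different amounts, a stressor can in principle change which branch is critical, which a single whole-task weighting cannot express.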

Task-Goal-Operator-Taxon analysis

To develop models that can capture some of the additional complexity of tasks that might be impacted by a stressor, we advocate using structured task analysis methods akin to Goals, Operators, Methods, and Selection rules analysis (GOMS; Card et al. 1983; John and Kieras 1996; Gray et al. 1993), by which a task can be represented by its critical path in a subgoal network. In such networks, high-level task goals are decomposed into logical sets of subgoals, so that eventually each subgoal can be accomplished by a primitive cognitive “operator” whose time and resource properties are established by theory or empirical measurement.

We advocate an approach to GOMS that decomposes tasks into taxon-pure subtask operators (see Mueller et al. 2009, for more detail), which we refer to as Task-Goal-Operator-Taxon (TGOT) analysis, because it focuses on deriving taxon weightings rather than primitive operators for a particular cognitive architecture. It is similar to other versions of GOMS analysis in that it is based on logical analysis of goals and subgoals which are traced to a set of operators. However, it differs from flavors of GOMS based on particular sets of operators tied to a cognitive architecture (such as the Keystroke-Level Model; Card et al. 1980) because its goal is to define a set of bottom-level operators that are appropriately represented by a small number of skill taxa, such that a stressor will have a log-linear impact on the time taken to perform that operator. Thus, for many versions of GOMS, an operator may be thought of as a molecule that cannot be broken down further without changing its essence. For TGOT, an operator is more like a mineral sample: it is the highest level of description such that any further subdivision will lead only to parts that are identical in terms of the taxon distribution.

The benefit of using a hierarchical task model along with the T3 method is that it addresses one of the major deficiencies we identified earlier: it has the potential to represent strategic changes in tasks in response to stressors. This, in turn, may help identify a simpler set of skill taxa that are responsible for task slowing. Next, we offer an example where we show how this type of analysis can provide a better model for stressor-related slowing, in the context of human dexterity.
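The ‘mineral sample’ property can be made concrete with a small sketch. Assuming strictly serial execution and the log-linear slowing model, splitting a taxon-pure operator into parts that carry the same taxon ratings leaves the predicted task time unchanged. The goal tree, the two taxa, the ratings, and the stressor weights below are hypothetical and serve only as an illustration.

```python
import math


def operator_time(baseline, ratings, stressor):
    """Stressed time for one operator under the T3 log-linear slowing model."""
    return baseline * math.exp(-sum(r * s for r, s in zip(ratings, stressor)))


def goal_time(node, stressor):
    """Sum operator times beneath a goal, assuming strictly serial execution."""
    if "operator" in node:
        baseline, ratings = node["operator"]
        return operator_time(baseline, ratings, stressor)
    return sum(goal_time(child, stressor) for child in node["subgoals"])


# Hypothetical ratings on two hypothetical taxa, in the order [motor, visual].
whole = {"subgoals": [
    {"operator": (2.0, [3, 0])},   # one taxon-pure motor operator
    {"operator": (1.0, [0, 2])},   # one taxon-pure visual operator
]}
# The same task with the motor operator split into two halves with identical ratings.
split = {"subgoals": [
    {"subgoals": [{"operator": (1.0, [3, 0])}, {"operator": (1.0, [3, 0])}]},
    {"operator": (1.0, [0, 2])},
]}

stressor = [-0.1, -0.05]  # hypothetical stressor weights per taxon
time_whole = goal_time(whole, stressor)
time_split = goal_time(split, stressor)
print(round(time_whole, 3), round(time_split, 3))
```

The equality would not hold if the two halves had different taxon ratings, which is the situation in which further decomposition is informative.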

Example: impact of chemical protective gloves on human dexterity

As an illustrative example, we will examine how the T3 method can be deployed to model human dexterity data. The original method included only one taxon (psychomotor) that can reasonably be used to describe performance in dexterity tasks. IMPRINT incorporates two taxa (fine motor discrete and fine motor continuous), and assumes that only discrete action is impacted by protective gear. These may be insufficient to represent the way in which gloves slow performance across tasks.

As a first step, we present in Table 3 a set of proportional decrements for 18 motor dexterity tasks. In this table, the performance decrement represents the ratio (time with gloves)/(time in bare hands), so that a value of 1.0 would indicate no slowing from gloves, and larger values would indicate larger impacts. The tasks were collected from a large number of sources, and their order in Table 3 is arbitrary. Furthermore, the data sources vary considerably in their reliability, methods, designs, participants, and even the particular version of chemical protective gloves. Some of these factors could be incorporated in a more robust meta-analysis, but for the present demonstration we will treat them each as independent and equal data sources.

What can be said about the skill taxa necessary to capture these decrements? First, the one relevant taxon used previously (psychomotor) is probably insufficient. Certainly, one could assume that those tasks with greater decrements simply have higher psychomotor loadings. However, this is probably at odds with the ratings one would give a priori, and so may be of little use.
For instance, it is probably unrealistic to say that those manual tasks which see little or no impact from protective gloves do not require psychomotor skill, and it would be difficult to predict a priori which types of tasks will have greater or lesser decrements, especially when decrements for similar tasks can vary so much. The two taxa used by IMPRINT are somewhat better, but they simply assume that ‘continuous’ tasks do not slow, which could capture the small effects on the pursuit rotor and mouse aimed movement, but would miss the mouse tracking impact. We propose that a way to capture these impacts would

Adapting the T3 methodology Table 3 Performance decrements of various tasks. Taxon weights are provided for two skill taxa (grasp and touch), and predicted performance decrements from the resulting model are shown in the final column Taxon rating: touch

Predicted performance decrement

5

1

1.29

5

5

1.6

2

3

1.27

1.2–1.37

3

3

1.33

1.05

1

1

1.09

1.24

3

3

1.33

7. M16A1 Assembly5

1.24

3

3

1.33

8. Find page in book3

1.25

3

3

1.33

9. 1–5 number keypad entry3

1.09

1

1

1.1

10. Hunt-and-peck word typing3

1.22

1

3

1.23

11. Touch word-typing3

2.07

1

5

1.37

12. Typing response3

1.70

1

5

1.37

13. Mouse tracking3

1.15

1

3

1.23

14. Mouse-aimed movement3

1.01

1

1

1.1

15. Cord and Cylinder2,4

1.5–1.76

5

3

1.44

16. Bennet Dexterity test4

1.0–1.09

1

1

1.1

17. Pick up cylinder (20 mm+)3

1.05

1

1

1.1

18. Pick up cylinder (1 to 20 mm)3

1.25

3

3

1.3

Empirical test

Observed performance decrement

1. O’Connor Finger Test1,2,4,5,6

1.14–1.72

2. Purdue Pegboard1,2,6

2.4–3.4

3. Minnesota Dexterity 1 hand3

1.17

4. Minnesota Dexterity-2 hand4 5. Manual Pursuit Rotor1 6. M16A1 Dis-Assembly5

Taxon rating: grasp

1 Bensel et al. (1987); 2 Teixeira and Bensel (1990); 3 Unpublished data by present authors; 4 McGinnis

et al. (1973); 5 Garrett et al. (2006); 6 Johnson and Kobrick (1997). Note: Model fit excluded Purdue pegboard and touch-typing, which we assumed would have strategy shifts in response to protective gloves

be to hypothesize two taxa: one related to grasping, and one related to the sense of touch. Initial ratings on the task for these taxa are provided in Table 3. These ratings were developed in a qualitative manner, and simply represent a reasonable hypothesis based on reading the descriptions of tasks and performing simple task analysis. Further detailed experimental analysis would be required to determine the impacts and separability of these two taxa. The grasping taxon is important because gloves appear to have a moderate impact on picking up small objects (see Task 18, with a 25% decrement), which is a component that is present in many of the tasks in Table 3. Loss of touch-sense could also have a large impact depending on the context, because it may require costly error correction or strategy shifts. We hypothesize that this is partly responsible for the large decrements seen in typing (Tasks 10–12), and indirectly, the Purdue Pegboard Test (Task 2). Here, loss of touch-sense is devastating. It can prevent the alignment of fingers on the keyboard that enables touch-typing, which means that either errors increase from misalignments, or the typist must look at the keyboard instead of the monitor so that the errors one makes are not seen until they are very costly to correct. A typist must choose to either type, check for errors, and then correct errors, or

S.T. Mueller et al.

Fig. 2 Hypothesized subgoals to perform Purdue Pegboard task. Top panel shows performance when not impaired by protective gloves, during which tasks can be overlapped because picking up can be guided by touch rather than vision. Bottom panel shows performance while impaired, during which subtasks must be performed serially because they all require visual attention

slow down to a degree such that errors are not made (perhaps relying on visual and auditory feedback instead of touch sense). Either way, performance will slow substantially. The smallest impact seen on typing tasks was for Task 9: number keypad entry (9% slowing): these were done hunt-and-peck style in both conditions, and the spacing of the number pad is big enough to avoid many mistakes. In essence, numberkeypad entry depends little on touch sense, whereas touch-typing relies heavily on it to know whether one’s fingers are on the correct keys. The Purdue Pegboard Test is interesting because it contains many of the same components measured in other tests, such as picking up small cylinders and placing them in holes or posts, which typically have a performance decrement of about 25%. Yet the Purdue test had a substantial decrement at least ten times larger than these. What, then, can account for the difference? To answer this question, we need to better understand what the task involves. The basic Purdue Pegboard task involves four consecutive operations: 1. pick up and insert post; 2. pick up and insert washer; 3. pick up and insert sleeve; 4. pick up and insert second washer. Each consecutive step is performed by a different hand, so performance may be able to overlap substantially: The top panel of Fig. 2 illustrates how these four tasks may overlap because they use different hands. Total time to perform this task could be modeled as the sum (with p indicating pick-up time and i indicating insert time) of roughly p1 + i1 + i2 + i3 + i4. For performance like this to occur, one needs to assume that these two tasks can be easily overlapped. Without protective gloves, the ’pick up’ subtask might be thought of as performed by two operators, such as: move hand to tray; grasp object by feel. If we were to make a prediction about the performance decrement based on these


operators using the standard T3 methodology, we would find that the overall task decrement should be driven by the individual decrement for either the insert or the pick-up subtask (whichever requires more time). If we assume these operators have decrements of about 25% (consistent with the other similar tasks), the time to perform the overall sequence would increase by about 25%. This, of course, does not match the empirical finding that performance is slowed by 100–200%. However, task overlapping may not be possible with protective gloves, because limited sensory input will prevent the pick-up subtask from being done without visual monitoring. Thus, slowing in this task may stem from a shift to a non-overlapping performance strategy necessitated by reduced sensory input. The sequence would be stretched out, as shown in the bottom panel of Fig. 2. Now, each pick-up/insert subgoal must be achieved serially, and each of those subcomponents may slow as well. A reasonable estimate for the strategy shift alone would be that the task time doubles. In addition, each component may slow by 25%, producing an estimated performance impact of 2 × 1.25 = 2.5, instead of the 1.25 estimated from each individual operation. This more closely aligns with the observed decrements.

To assess the extent to which the two dexterity taxa can account for performance decrements, we applied the T3 method as described by Anno et al. (1996). To estimate the impact I for each task, −log(I) was computed, ensuring that all decrements would be negative numbers. Next, a linear regression model was fit to predict −log(I) based on the two performance taxa (“grasp” and “touch”), excluding the Purdue and touch typing tasks because they were thought to involve strategy shifts.
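The overlap argument for the Purdue task can be checked with a short calculation. The sketch below is illustrative only: the function names and the pick-up and insert durations are hypothetical placeholders rather than values measured in the studies reviewed here, and the 25% component slowing is the rough figure assumed in the text.

```python
def overlapped_time(pick, insert):
    """Unimpaired strategy: only the first pick-up is exposed; later
    pick-ups are hidden behind the other hand's insert (p1 + i1 + ... + i4)."""
    return pick[0] + sum(insert)


def serial_time(pick, insert, slowing=1.0):
    """Gloved strategy: every pick-up and insert runs one after another,
    and each component is stretched by the same slowing factor."""
    return slowing * (sum(pick) + sum(insert))


# Hypothetical durations (seconds) for the four pick-up/insert steps.
pick = [1.0, 1.0, 1.0, 1.0]
insert = [1.0, 1.0, 1.0, 1.0]

baseline = overlapped_time(pick, insert)
strategy_only = serial_time(pick, insert) / baseline
with_slowing = serial_time(pick, insert, slowing=1.25) / baseline

print(f"strategy shift alone: x{strategy_only:.2f}")
print(f"strategy shift plus 25% slowing: x{with_slowing:.2f}")
```

With equal placeholder durations the combined impact comes out near 2, which is in the same range as the rough factor of 2.5 estimated in the text. The exact value depends on how long the pick-up subtask is assumed to take relative to the insert subtask.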
Ratings on the performance taxa were obtained by fairly informal task analysis methods, restricted to just three levels of each (1, 3, or 5) in all cases except one, where we used a 2 to maintain an ordinal relationship between two versions of the Minnesota dexterity test. The intercept of the model was set to 0, as a non-zero intercept would simply amount to a generic decrement for all tasks. This regression was reliable, F(2, 14) = 55, p < .01, with an adjusted R² = .87. The two predictors were reliable (grasp = −.04, t(14) = −2.8, p = .01; touch = −.054, t(14) = −3.9, p < .01). These coefficient values indicate that each rating unit of the taxon reduces log-inverse-proportional performance by about .04 to .05. A useful engineering approximation for small positive values of p states that exp(p) approximates 1 + p, which implies that each integer increase of either rating scale slows performance by about 5%. Predicted performance values for each task are also printed in Table 3, along with the predictions for the two excluded tasks (shown in bold).

This model tended to underestimate tasks that had large decrements from wearing protective gloves. This is because the slowing model has an upper limit, with a maximum decrement of about 1.6 times baseline (if both taxa were given ratings of 5, 5(−.04) + 5(−.054) = −0.47, and exp(.47) = 1.6). Most likely, to accommodate larger impacts, one would need to incorporate simple notions of strategy shifts (such as those described in the Purdue task), or costly error-recovery processes that are outside the simple slowing model used in the T3 process. As a rough guide, to predict a decrement of 3.0, the Purdue task would need a touch rating value of about 22, which is well beyond the end of our 0–5 scale. With more data or better estimates that might be produced by conducting new experiments under more uniform conditions, one could explore whether superadditive interactions exist between stressor decrements,


or whether additional or alternative stressors would be useful. As such, we have supported our hypothesis and provided useful rules of thumb to inform design decisions, but the results are certainly not definitive or all-encompassing, and additional investigation is needed to more thoroughly test and validate the dexterity model we propose here.
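The fitted dexterity model can be turned into a small prediction routine. The following is a minimal sketch that assumes only the two reported coefficients and a zero intercept; the function name and the example rating pairs are hypothetical and chosen for illustration.

```python
import math

# Coefficients reported for the dexterity regression (intercept fixed at 0).
GRASP_COEF = -0.04
TOUCH_COEF = -0.054


def predicted_impact(grasp, touch):
    """Return the predicted slowing factor I, where -log(I) is a weighted
    sum of the two taxon ratings."""
    neg_log_impact = GRASP_COEF * grasp + TOUCH_COEF * touch
    return math.exp(-neg_log_impact)


# Hypothetical rating pairs (grasp, touch) on the rating scale used above.
for grasp, touch in [(1, 1), (3, 3), (5, 5)]:
    print(grasp, touch, round(predicted_impact(grasp, touch), 2))
```

The (5, 5) case reproduces the ceiling of about 1.6 times baseline noted in the text, which is why tasks with much larger observed decrements cannot be captured by ratings alone.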

Time and accuracy in the T3 method

The majority of tasks we examined in this analysis might be considered to have inelastic accuracy but elastic time properties. That is, under normal circumstances (barring catastrophic failure or deliberate violation of instructions), it is essentially impossible for one to fail at accomplishing a task, but the time taken to complete the task can vary, depending upon factors such as effort, training, and quality of outcome. Tasks that are time-inelastic also exist (for example, “monitor radar for suspicious signals for one hour”). Most interesting time-inelastic tasks are accuracy-elastic; otherwise it is difficult to consider them tasks at all (can “Sit here and do nothing for half an hour” be a task?). However, many tasks are both time-elastic and accuracy-elastic. Such tasks can produce what is known as the speed-accuracy trade-off, which has been studied extensively in basic and applied settings (e.g., Pachella and Pew 1968; Wickelgren 1977; MacKay 1982; Meyer et al. 1990). The main problem posed by speed-accuracy trade-offs is that a stressor may induce basic processing limitations that are managed strategically by performers, so that either response time or accuracy might suffer in response to a stressor.

Approaches adopting the T3 method have incorporated accuracy in a number of ways. One approach is to model the speed-accuracy function explicitly. Anno (1997) described methods to model the speed-accuracy trade-off in logistic space, so that the relationship between a stressor and the speed-accuracy function could be identified directly. This method requires collecting speed-accuracy trade-off functions, which is often not practical. Similarly, the IMPRINT tool allows stressors to impact both time and accuracy. This is useful for task network simulators, where the large-scale impact of a stressor can be assessed, but it sidesteps the role of strategy in selecting a point along the speed-accuracy function.
One can also deal with the problem empirically, designing accuracy-inelastic tasks or transforming accuracy-elastic tasks into accuracy-inelastic tasks. For example, many standard response time tasks can be made accuracy-inelastic by requiring the participant to make the correct response before continuing. This essentially transforms accuracy into time in a procedural way. We advocate an approach akin to this last method based on a concept of work throughput which we call adjusted response time. The notion is similar to concepts advocated by Thorne et al. (1983), Glenn and Parsons (1990, 1992), and Williams et al. (1997), and has its roots in information-theoretic notions of information transfer in skilled tasks. Such work is also rooted in Fitts’s (1966) studies of the speed-accuracy trade-off in aimed movement, and has been applied to understand cognitive stress involved with alcoholism (see Jennings et al. 1976; Rundell and Williams 1979). First, consider tasks that are time-elastic but accuracy-inelastic. If one can define an objective criterion by which one can determine if a task is complete, one can


measure work throughput or productivity: the number of times the activity can be completed in a given unit of time. This measure is analogous to a factory piecework measure, where one might want to know how many widgets that pass a quality control test can be made by a worker during a shift. Such a notion is especially important for physical stressors such as heat stress, which provide an upper limit to the time that work can be performed before rest is needed or the worker collapses. For example, if a soldier wearing chemical protective gear can perform a task with no performance decrement (either time or accuracy) for up to one half hour, but then must rest for 30 minutes to avoid heat stress, their effective productivity is cut in half. If the task takes one minute to perform, but it can only be performed for 30 out of every 60 minutes, its adjusted response time would be 2 minutes, penalizing the necessary rest periods. In this way, adjusted response time handles immediate speed-accuracy trade-offs and long-term work-rest trade-offs in the same fashion. In general, adjusted response time is computed by determining time-on-task, and dividing by the number of successes in that time:

\[ RT_{adj} = \frac{\text{Time on task}}{\text{Number of successes}} \tag{4} \]

The choice of how time on task should be measured depends upon the goals of the researcher, but might include both work and rest time, or it could include just the sum of the individual response times for different trials of the task, ignoring inter-trial delays, instructions, breaks, and other non-performance times. In a typical laboratory experiment, RT_adj is equivalent to the ratio of mean response time and accuracy, which is often used without justification to combine speed and accuracy into an ad hoc performance measure:

\[ RT_{adj} = \frac{\text{Time on task}/N}{\text{Number of successes}/N} \tag{5} \]

where N is the number of trials, so that the numerator is the mean response time and the denominator is the proportion of correct trials.
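Equations (4) and (5) amount to a single division, as the following minimal sketch shows. The function name and the laboratory numbers are hypothetical; the work-rest case uses the one-minute task with 30 minutes of work and 30 minutes of rest described in the text.

```python
def adjusted_rt(time_on_task, successes):
    """Adjusted response time: time on task divided by number of successes."""
    if successes <= 0:
        raise ValueError("adjusted RT is undefined without any successes")
    return time_on_task / successes


# Work-rest example from the text: a one-minute task performed without error
# for 30 minutes, followed by 30 minutes of mandatory rest (60 minutes total).
work_rest = adjusted_rt(time_on_task=60.0, successes=30)

# Hypothetical laboratory block: 100 trials, mean RT of 0.5 s, 80% correct.
n_trials, mean_rt, accuracy = 100, 0.5, 0.8
lab_total = adjusted_rt(n_trials * mean_rt, n_trials * accuracy)
lab_ratio = mean_rt / accuracy  # Eq. (5): mean RT divided by accuracy

print(work_rest, lab_total, lab_ratio)
```

The work-rest case gives the 2 minutes per completed task stated in the text, and the two laboratory computations agree because the number of trials cancels from numerator and denominator.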

This simple notion of adjusted RT can be useful in laboratory tasks to produce a measure that combines speed and accuracy, but it is equally useful in complex real-world tasks, where error recovery time can contribute substantially to total task performance time. For example, compare typing on today’s computer systems, in which a mistaken key entry can be fixed immediately at about the same cost as making the error, with typing on a manual typewriter, where fixing an error can take as long as retyping an entire page. On a typewriter, the cost of an error is substantial, whereas on a computer, errors have relatively little cost. As a consequence, typing styles have likely changed substantially, moving to an equilibrium that allows greater total throughput with less effort by trading precision for speed. For the same reason, manipulations that impact accuracy (such as using an alternative keyboard layout, or wearing protective gloves) might have much smaller impacts on modern computers than they did on typewriters. Here, the best measure of throughput would be correct words per minute, which can be estimated with adjusted RT if the speed and accuracy of typing are known. Overall, adjusted RT can be useful for simple simulations of cognitive decrements, and may be a useful way to model accuracy-elastic tasks within frameworks that focus on time profiles, such as SANLab.


Applications and conclusions

The T3 method offers a simple statistical method for predicting coarse decrements across tasks in response to a number of stressors. Although predictions needing finer precision may require agent-based modeling with systems such as EPIC (e.g., Meyer et al. 2001, in the context of age-related stressors), we are developing ways to adapt the process to enable prediction for acute stressors related to MOPP gear, especially those that impact perceptual, motor, and cognitive tasks. The adaptations we advocate here take two forms. First, we are beginning to hypothesize new performance taxa that can be used to account for some of the decrements produced by protective gear. Second, we believe that a more detailed task representation needs to be used, which can at least help identify whether a stressor will induce strategic shifts or costly error recovery processes. These advances allow the T3 method to have broader applicability in a number of distinct modeling and simulation approaches. We next describe how the T3 method could be integrated into four classes of modeling and simulation approaches.

Case 1: Statistical modeling

At its core, the T3 method is a statistical modeling approach. Thus, once a set of skill taxa has been defined and the stressor-related task decrements have been inferred, one can use standard regression techniques to infer both the estimated impact and, to some extent, the confidence bounds of those predictions. Such an approach may be useful for helping to make specific design or planning decisions, and especially for validation studies of the approach. Importantly, our approach allows the statistical models to be embedded within other models, because it is not tied to a particular modeling framework.
Other simulation approaches (see Cases 2–4) that might embed taxon-stressor mechanics will add levels of assumptions that are difficult to test, and so the statistical model may still be preferred, but there may be a number of advantages to embedding T3 degradation models within more complex simulations.

Case 2: Task-network models

Although our modeling framework is closely aligned with the one used by IMPRINT, IMPRINT has a set of hard-coded skill taxa that form the basis of its performance moderation mechanisms. Thus, to the extent that our approach identifies new taxa that are thought to moderate human performance, IMPRINT could be changed to incorporate those taxa. However, we feel a useful improvement to the tool would be to allow custom skill taxa to be defined by the analyst, alongside the custom stressors that are already present. IMPRINT’s approach to modeling performance degradation using a T3 mechanism is a blueprint for applying it to other task networks. However, some alternative approaches may also be useful. As we discussed before, IMPRINT allows one to specify both time and accuracy degradation functions, but not all task networks model successful completion of a task node. Some models (e.g., SANLab-CM; Patton et al. 2009) focus on time to complete nodes, treating tasks as what we refer to as accuracy-inelastic. Thus, our approach of using adjusted


RT could be useful in these contexts, because it incorporates the recovery time stemming from increased errors. SANLab-CM itself supports the type of hierarchical task analysis that we advocate, and because its source code is available and modifiable, it offers an ideal system to support flexible simulation of stressors using the T3 method.

Case 3: Cognitive architecture models

In general, cognitive architecture-based models have not been designed to systematically understand the impact of general psychological stressors. For example, none of the 25 architectures reviewed by Samsonovich (2010a, 2010b) is described as being intended specifically to accommodate psychological stressors. CoJACK (Evertsz et al. 2008), which appears in the review, was nonetheless developed to handle psychological modifiers using the Belief-Desire-Intent (BDI) modeling paradigm, and a number of other architectures listed there also incorporate means to model stressors, so these abilities may be underappreciated. Silverman et al.’s PMFServ (2006) is another exemplar of an architecture-level model that accounts for a broad range of “performance modifiers”. There are examples of adapting architectures to model particular stressors (e.g., age-related stressors in EPIC, Meyer et al. 2001; sleep-deprivation as a stressor in ACT-R, Gunzelmann et al. 2009), but few approaches identify ways to systematically embed physiological and psychological stressors into widely-used architectures, so that broad-scale impacts of stressors can be assessed based on running previous models under new stressor conditions. Architectures could make different assumptions about how to handle stressor-based decrements, but there are at least two methods we would consider: rule-based stressors and processor-based stressors. For rule-based stressors, one could simply require that every production rule or operator is coded with a weighting across skill taxa, just as tasks in a task network are weighted.
The architecture would need to base its calculation of local cycle time on the current stressor configuration, so that cycling may slow or quicken based on the ensemble of rules currently active. There may be mechanisms within different cognitive architectures that can handle this type of processor cycle variability, but even then the approach would have several drawbacks: past models would have to be re-coded to add stressor distributions to each rule, and the onus of stressor-based modeling would fall on the modeler, whereas these stressors should typically be handled at the level of the architecture. A processor-based approach could allow much more generality while maintaining acceptable precision. In such an approach, individual production rules would not be tagged with respect to taxa, because the skill taxa presumably map onto inherent processing structures within the architecture. Instead, principled models of degraded performance could be created such that stressors impact the processors directly. Some of the stressors associated with MOPP gear are tightly coupled to physical systems that already have detailed models in various cognitive architectures. For example, detailed measures of how protective eyewear occludes peripheral vision have been made, and such a stressor can be modeled by adjusting or masking zones of a simulated visual field. Perceptual and motor processors could be slowed proportionately based on hypotheses about how particular stressors impact those processors. Attentional lapses and memory loss could be handled by changing decay or activation parameters in ways parametrically related to specific taxon dimensions. Such an approach


could also be useful if models of physiological stress were incorporated into the architecture, so that simulations of performance over long periods of time could both simulate the stressors and apply them to cognitive performance. Importantly, the first stage of the T3 method would remain unchanged, but rather than simulating task degradations, one would simulate processor degradation.

Case 4: Large-scale agent-based simulations

A number of agent-based simulations are used in various planning and training capacities within the military. These include OneSAF (Wittman and Harrison 2001), the FLexible Analysis, Modeling, and Exercise System (FLAMES), dTANK (Morgan et al. 2005), and other similar systems. Typically, these agents are modeled at a much coarser level of fidelity than in architecture-based models. However, many could benefit by integrating simple taxon-based decrements to help predict things like performance under duress, long-term fatigue, environmental conditions, Nuclear/Biological/Chemical attacks, and so on. To simulate the impact of a stressor in large-scale simulations, one would need to embed information within the task or activity database regarding the skill taxon coding for the tasks. In addition, software that computes modified performance based on (3) would have to be implemented, so that performance under stress can be simulated. The T3 method is well-suited for this type of application, because it can be used to quickly calculate the performance impacts for hundreds or thousands of simultaneous agents embedded within such simulations.

Summary

In this paper, we examined issues related to modeling the psychological and cognitive impacts that chemical protective gear places on individuals. We described a statistical methodology, called the Task-Taxon-Task Method, that has been designed to capture and predict these decrements, and identified several ways in which past applications of the method have been inadequate.
We provided an example, in the domain of manual dexterity, of how the T3 method can be applied. We also discussed several ways in which the outcome of the T3 model can be applied to other modeling and simulation frameworks, including a proposed performance measure that simultaneously incorporates both time and accuracy to measure work throughput. Our hope is that the work described here will both inform the design of future chemical protective suits and serve as a useful framework to incorporate physiological and physical stressors across broader modeling and simulation systems.

Acknowledgements

This work was supported by the Defense Threat Reduction Agency/JSTO Project Number CB-08-PRO-05. We thank Mr. Salvatore Clemente and LTC Tim Rittenhouse of DTRA/CBT and Dr. Richard Gonzalez for oversight and support. Cleared for public release, distribution unlimited.

References

Allender L, Salvi L, Promisel D (1997) Evaluation of human performance under diverse conditions via modeling technology. In: Proceedings of the workshop on emerging technologies in human engineering testing and evaluation. NATO Research Study Group 24, Brussels
Anderson JR, Lebiere C (1998) The atomic components of thought. Erlbaum, Mahwah
Anno GH, Dore MA, Roth TJ (1996) Taxonomic model for performance degradation in combat tasks. DNA-TR-95-115, Defense Nuclear Agency, Alexandria, VA
Anno GH (1997) Speed-accuracy measures for distributed interactive simulation of nuclear weapons effects. Technical report for Defense Special Weapons Agency, DSWA01-97-M-0309
Bensel CK, Teixeira RA, Kaplan DB (1987) The effects of US Army chemical protective clothing on speech intelligibility, visual field, body mobility, and psychomotor coordination of men. United States Army Natick Research, Development, and Engineering Center technical report TR-87-037, Natick, MA
Booher HR, Minninger J (2003) Human systems integration in army systems acquisition. In: Booher HR (ed) Handbook of human systems integration. Wiley-Interscience, Hoboken, pp 663–698
Broadhurst PL (1959) The interaction of task difficulty and motivation: The Yerkes-Dodson Law revived. Acta Psychol 16:321–338
Card SK, Moran TP, Newell A (1980) The keystroke-level model for user performance time with interactive systems. Commun ACM 23(7):396–410
Card SK, Moran TP, Newell A (1983) The psychology of human computer interaction. Lawrence Erlbaum Associates, Hillsdale. ISBN 0-89859-859-1
Davis EG, Wick CH, Salvi L, Kash HM (1990) Soldier performance of military operational tasks conducted while wearing chemical individual protective equipment (IPE): Data analysis in support of the revision of the U.S. Army Field Manual on NBC Protection (FM 3–4). Ballistic Research Laboratory technical report BRL-TR-3155, Aberdeen Proving Ground, MD. DTIC Document ADA230157
Evertsz R, Busetta P, Pedrotti M, Ritter FE, Bittner JL (2008) CoJACK-Achieving principled behaviour variation in a moderated cognitive architecture. In: Proceedings of the 17th conference on behavior representation in modeling and simulation, 08-BRIMS-025. University of Central Florida, Orlando, pp 80–89
Fitts PM (1966) Cognitive aspects of information processing: III. Set for speed versus accuracy. J Exp Psychol 71(6):849–857
Fleishman EA, Kinkade RG, Chambers AN (1968) Development of a taxonomy of human performance: A review of the first year’s progress. Technical progress report for Advanced Research Projects Agency Order #1032. DTIC Document AD0684583
Fleishman EA (1975) Toward a taxonomy of human performance. Am Psychol 30(12):1127–1149
Fleishman EA, Quaintance MK (1984) Taxonomies of human performance. Academic Press, Orlando. With the assistance of Broedling LA
Garrett L, Jarboe N, Patton DJ, Mullins LL (2006) The effects of encapsulation on dismounted warrior performance. Army Research Laboratory technical report ARL-TR-3789, Aberdeen Proving Ground, MD. DTIC Document ADA448610
Glenn SW, Parsons OA (1990) The role of time in neuropsychological performance: investigating and application in an alcoholic population. Clin Neuropsychol 4:344–354
Glenn SW, Parsons OA (1992) Neuropsychological efficiency measures in male and female alcoholics. J Stud Alcohol 53(6):546–552
Gray WD, John BE, Atwood ME (1993) Project Ernestine: validating a GOMS analysis for predicting and explaining real-world task performance. Hum-Comput Interact 8(3):237–309
Gunzelmann G, Gross JB, Gluck KA, Dinges DF (2009) Sleep deprivation and sustained attention performance: integrating mathematical and cognitive modeling. Cogn Sci 33(5):880–910
Hogan J, Broach D, Salas E (1990) Development of a task information taxonomy for human performance systems. Mil Psychol 2(1):1–19
Jennings JR, Wood CC, Lawrence BE (1976) Effects of graded doses of alcohol on speed-accuracy tradeoff in choice reaction time. Percept Psychophys 19(1):85–91
John BE, Kieras DE (1996) The GOMS family of user interface analysis techniques: comparison and contrast. ACM Trans Comput-Hum Interact 3(4):320–351
Johnson R, Kobrick J (1997) Effects of wearing chemical protective clothing on rifle marksmanship and on sensory and psychomotor tasks. Mil Psychol 9(4):301–314
Kieras D, Meyer DE (1997) An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Hum-Comput Interact 12:391–438
Klein GA, Ross KG, Moon BM, Klein DE, Hoffman RR, Hollnagel E (2003) Macrocognition. IEEE Intell Syst May–June:81–85
McGinnis JS, Bensel CK, Lockhart JM (1973) Dexterity afforded by CB protective gloves. Technical report 75-35-PR, US Army Natick Laboratories, Natick, MA, DTIC Document AD0759123
MacKay DG (1982) The problems of flexibility, fluency, and speed-accuracy trade-off in skilled behavior. Psychol Rev 89(5):483–506
Meyer DE, Glass JM, Mueller ST, Seymour TL, Kieras DE (2001) Executive-process interactive control: a unified computational theory for answering 20 questions (and more) about cognitive ageing. Eur J Cogn Psychol 13(1):123–164
Meyer DE, Smith JEK, Kornblum S, Abrams RA, Wright CE (1990) Speed-accuracy tradeoffs in aimed movements: toward a theory of rapid voluntary action. In: Attention and performance XIII, pp 173–226
Morgan GP, Ritter FE, Stevenson WE, Schenck IN, Cohen MA (2005) dTank: an environment for architectural comparisons of competitive agents. In: Proceedings of the 14th conference on behavior representation in modeling and simulation, 05-BRIMS-043. University of Central Florida, Orlando, pp 133–140
Mueller ST, Jones M, Minnery BS, Hiland JMH (2007) The BICA Cognitive Decathlon: a test suite for biologically-inspired cognitive agents. In: Proceedings of the 16th conference on behavior representation in modeling and simulation, 07-BRIMS-015. Simulation Interoperability Standards Organization, Orlando
Mueller ST, Zimmerman LA, McClellan GE, Crary DJ, Cheng K (2009) Cognitive performance prediction with the T3 methodology. Technical progress report for Defense Threat Reduction Agency, contract # HDTRA-1-08-C-0025
Neades M, Klopcic JT, Davis EG (1998) New methodology for the assessment of battlefield insults and injuries on the performance of Army, Navy, and Air Force military tasks. In: Proceedings of the RTO HFM meeting on models for aircrew safety assessment, RTO MP-20, p 28
Pachella RG, Pew RW (1968) Speed-accuracy tradeoff in reaction time: effect of discrete criterion times. J Exp Psychol 76:19–24
Parks S (2006) Inside HELP, Administrative and reference manual. VORT Corp, Palo Alto. ISBN 0-89718097-6
Patton EW, Gray WD, Schoelles MJ (2009) SANLab-CM: the stochastic activity network laboratory for cognitive modeling. In: Human factors and ergonomics society annual meeting proceedings, vol 53, pp 1654–1658
Ramirez T (1997) Modeling military task performance for Army and Air Force personnel wearing chemical protective clothing. Mil Psychol 9(4):375–393
Ritter FE, Reifers AL, Klein LC, Schoelles MJ (2007) Lessons from defining theories of stress. In: Gray WD (ed) Integrated Models of Cognitive Systems (IMoCS). Oxford University Press, New York, pp 254–262
Rundell OH, Williams HL (1979) Alcohol and speed-accuracy tradeoff. Hum Factors 21(4):433–443
Salvi L (2001) Development of Improved Performance Research Integration Tool (IMPRINT) performance degradation factors for the Air Warrior Program. Army Research Laboratory Technical Report, Aberdeen Proving Ground, MD. DTIC Document ADA387840
Samsonovich A (2010a) Comparative table of cognitive architectures. Retrieved from http://bicasociety.org/cogarch/
Samsonovich AV (2010b) Toward a unified catalog of implemented cognitive architectures. In: Biologically inspired cognitive architectures 2010. IOS Press, Amsterdam, pp 195–244
Sanders AF (1983) Towards a model of stress and human performance. Acta Psychol 53(1):61–97
Schweickert R, Fisher DL, Proctor RW (2003) Steps toward building mathematical and computer models from cognitive task analyses. Hum Factors 45(1):77–103
Silverman BG, Johns M, Cornwell J, O’Brien K (2006) Human behavior models for agents in simulators and games: part I: enabling science with PMFserv. Presence 15(2):139–162
Teixeira RA, Bensel CK (1990) The effects of chemical protective gloves and glove liners on manual dexterity. US Army Natick Research, Development and Engineering Center Technical Report NATICK/TR/91/002, Natick, MA. DTIC Document ADA231250
Thorne D, Genser S, Sing H, Hegge F (1983) Plumbing human performance limits during 72 hours of high task load. In: Forshaw SE (ed) Proceedings of the 24th DRG seminar on the human as a limiting element in military systems, vol 1. NATO Defense Research Group, Toronto, pp 17–40 (NATO-DRG Rep No SD-A-DR(83) 170)
Wickelgren WA (1977) Speed-accuracy tradeoff and information processing dynamics. Acta Psychol 41(1):67–85
Wickens CD (2002) Multiple resources and performance prediction. Theor Issues Ergon 3(2):159–177
Wittman J, Harrison CT (2001) OneSAF: a product line approach to simulation development. Technical Report 01E-SIW-061, MITRE Corporation and US Army Simulation, Training, and Instrumentation Command. DTIC Document ADA460127
Williams D, Englund C, Suces A, Overson M (1997) Effects of chemical protective clothing, exercise, and diphenhydramine on cognitive performance during sleep deprivation. Mil Psychol 9(4):329–358
Yerkes RM, Dodson JD (1908) The relation of strength of stimulus to rapidity of habit-formation. J Comp Neurol Psychol 18:459–482

Shane T. Mueller is Senior Research Scientist at Applied Research Associates in Fairborn, OH. He specializes in measuring and modeling human behavior, with emphasis in decision making, human performance, and memory. He is the developer of the PEBL test battery for measuring psychology performance.

Benjamin Simpkins is a Research Scientist at Applied Research Associates in Fairborn, OH. He specializes in experimentation, cultural modeling, and human-centered system development.

George Anno is an independent consultant on this research effort. He is the originator of the T3 method, and specializes in modeling the impacts of chemical and biological stressors on human performance.

Corey K. Fallon is a Staff Research Scientist at Applied Research Associates in Fairborn, OH. He specializes in applied human factors, with an emphasis on understanding the impact of new technology on individual and team workflow.

Owen Price is Senior Research Scientist at Applied Research Associates in Raleigh, NC, where he develops the Multiple-Path Particle Dosimetry (MPPD) model, and assists with modeling and software development for other projects in the Health Effects and Medical Response group.

Gene E. McClellan is Principal Research Scientist at Applied Research Associates in Arlington, VA. He is Director of the Health Effects/Medical Response group, and has led efforts for estimating battle casualties from NBC attacks in support of U.S. and NATO medical defense planners, and was Program Manager for the development of the medical NBC Casualty and Resource Estimation Support Tool (NBC CREST) for medical planning.