University of Virginia
Adam M. Houser and Matthew L. Bolton
Informa UK Limited
Matthew L. Bolton, Elliot Biltekoff, and Laura Humphrey
Institute of Electrical and Electronics Engineers (IEEE)
Human mental workload can profoundly impact human performance and is thus an important consideration in the design and operation of many systems. The standard method for assessing human mental workload is the NASA Task Load Index (NASA-TLX). This involves a human operator subjectively rating a task based on six dimensions. These dimensions are combined into a single workload score using one of two methods: scaling and summing the dimensions (where scales are derived from a paired comparisons procedure) or averaging dimensions together. Despite its widespread use, the level of measurement of NASA-TLX's dimensions and its computed workload score has not been investigated. Additionally, no prior research has examined whether NASA-TLX's two approaches for computing overall workload are mathematically meaningful with respect to the constituent dimensions' levels of measurement. This is a serious deficiency. Knowing the level of measurement for NASA-TLX scores will determine what mathematics can be meaningfully applied to them. Furthermore, if NASA-TLX workload syntheses are mathematically meaningless, then the measure lacks construct validity. The research presented in this article used a previously developed method to evaluate the level of measurement of NASA-TLX workload and its dimensions. Results show that the dimensions can, in most situations, be treated as interval in population analyses and ordinal for individuals. Our results also suggest that the methods for combining dimensions into workload scores are meaningless. We recommend that analysts evaluate the dimensions of NASA-TLX without combining them.
Matthew L. Bolton, Skye Solace Taylor, and Laura Humphrey
IEEE
Mental workload extremes are associated with poor human performance and safety problems across safety-critical domains. Mental workload is a complex, difficult-to-predict phenomenon, where issues may only arise due to concurrency between resource-conflicting tasks. This research addresses this difficulty by presenting a novel method for using model checking (for performing formal proofs about concurrent systems) to predict mental workload. Our method combines multiple resource theory and formal methods based on hierarchical task analysis to identify mental workload extremes in a complex system. This paper presents this method and shows preliminary validation using a texting and driving task. Implications of our results and future research are discussed.
Matthew L. Bolton, Svetlana Riabova, Yeonbin Son, and Eunsuk Kang
IEEE
Previous research has shown how statistical model checking can be used with human task behavior modeling and human reliability analysis to make realistic predictions about human errors and error rates. However, these efforts have not accounted for the impact that design changes can have on human reliability. In this research, we address this deficiency by using similarity theory from human cognitive modeling. This replicates how negative transfer can cause people to perform old task behaviors on modified systems. We present details about how this approach was realized with the PRISM model checker and the enhanced operator function model. We report results of a validation exercise using an application from the literature. We discuss the implications of our results and describe future research.
Changjian Zhang, Tarang Saluja, Rômulo Meira-Góes, Matthew Bolton, David Garlan, and Eunsuk Kang
IEEE
Modern software systems are deployed in a highly dynamic, uncertain environment. Ideally, a system that is robust should be capable of establishing its most critical requirements even in the presence of possible deviations in the environment. We propose a technique called behavioral robustification, which involves systematically and rigorously improving the robustness of a design against potential deviations. Given behavioral models of a system and its environment, along with a set of user-specified deviations, our robustification method produces a redesign that is capable of satisfying a desired property even when the environment exhibits those deviations. In particular, we describe how the robustification problem can be formulated as a multi-objective optimization problem, where the goal is to restrict the deviating environment from causing a violation of a desired property, while maximizing the amount of existing functionality and minimizing the cost of changes to the original design. We demonstrate the effectiveness of our approach on case studies involving the robustness of an electronic voting machine and safety-critical interfaces.
Matthew L. Bolton
Institute of Electrical and Electronics Engineers (IEEE)
Matthew Bolton, Elliot Biltekoff, and Laura Humphrey
Institute of Electrical and Electronics Engineers (IEEE)
Situation awareness (SA), a measure of how well a person understands the situation, is frequently used to evaluate the safety and effectiveness of critical systems that depend on human behavior. While there are objective ways of measuring SA, subjective assessments, such as the SA rating technique (SART), are still widely used. However, it is not clear what the level of measurement is for SART-measured SA or its constituent dimensions. This is a significant gap because the level of measurement determines what mathematics and statistics can be meaningfully used to synthesize and evaluate measures. This research uses a previously developed method for determining the level of measurement of psychometric ratings to evaluate the level of measurement of SART and its elements. Results show that all of the dimensions of SA can be treated as interval in most situations, but that each is on a separate interval scale. This result casts doubt on the validity of the formula SART uses to compute SA from its subcomponents. We ultimately discuss our results and explore future research directions.
Matthew L. Bolton, Judy R. Edworthy, and Andrew D. Boyd
SAGE Publications
Objective In this work, we systematically evaluated the reserved alarm sounds of the IEC 60601-1-8 international medical alarm standard to determine when and how they can be totally and partially masked. Background IEC 60601-1-8 gives engineers instructions for creating human-perceivable auditory medical alarms. This includes reserved alarm sounds: common types of alarms where each is a tonal melody. Even when this standard is honored, practitioners still fail to hear alarms, causing practitioner nonresponse and, thus, potential patient harm. Simultaneous masking, a condition where one or more alarms is imperceptible in the presence of other concurrently sounding alarms due to limitations of the human sensory system, is partially responsible for this. Methods In this research, we use automated proof techniques to determine if masking can occur in a modeled configuration of medical alarms. This allows us to determine when and how reserved alarm sounds can mask other reserved alarms and to explore parameters to address discovered problems. Results We report the minimum number of other alarm sounds it takes to both totally and partially mask each of the high-, medium-, and low-priority alarm sounds from the standard. Conclusions Significant masking problems were found for both the total and partial masking of high-, medium-, and low-priority reserved alarm sounds. Application We show that discovered problems can be mitigated by setting alarm volumes to standard values based on priority level and by randomizing the timing of alarm tones.
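To make the notion of simultaneous masking concrete, the following is a toy sketch, not the paper's formal proof method: it flags when one pure tone might mask another using the Zwicker/Terhardt critical-bandwidth approximation and an assumed fixed 10 dB level margin. The function names and the margin are illustrative assumptions only; real psychoacoustic masking models are considerably more detailed.

```python
# Toy check for whether one pure tone may simultaneously mask another.
# Assumptions (illustrative only): a target tone is flagged as potentially
# masked when it lies within half the masker's critical band and the
# masker is at least `margin_db` louder than the target.

def critical_bandwidth(freq_hz: float) -> float:
    """Zwicker/Terhardt approximation of critical bandwidth in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

def may_mask(masker_hz, masker_db, target_hz, target_db, margin_db=10.0):
    """Return True if the masker may render the target inaudible."""
    in_band = abs(target_hz - masker_hz) <= critical_bandwidth(masker_hz) / 2
    louder = masker_db >= target_db + margin_db
    return in_band and louder

# A 70 dB tone at 440 Hz versus a quiet neighbor at 450 Hz: flagged.
print(may_mask(440, 70, 450, 50))   # True
# The same masker versus a tone an octave away at 880 Hz: not flagged.
print(may_mask(440, 70, 880, 50))   # False
```

A model checker can exhaustively apply a predicate like this across all reachable combinations of concurrently sounding alarms, which is what makes the proof-based approach exhaustive where testing is not.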
Matthew L. Bolton, Elliot Biltekoff, and Kevin Byrne
IEEE
Formal human-machine analyses based around human mental models have shown great utility for discovering when and how people may develop mode confusion and be surprised or disoriented by automated behavior. Such analyses represent mental models with finite state machine formalisms. These are limited in that they assume unrealistic precision in how people think about states and input events. This paper proposes a new formalism called Fuzzy Mental Model Finite State Machines (FMMFSMs). FMMFSMs combine state machine modeling with fuzzy logic to allow for precise reasoning about the imprecision of human mental model states and inputs. This has the potential to enable formal mental model analyses to support traditional mode confusion detection while also accounting for phenomena like drift: where the human's mental model changes over time due to stagnant or slowly changing conditions. This paper presents the FMMFSM formalism and illustrates its potential for finding mode confusion, automation surprise, and trust disruption with an automobile automation application. Implications of these developments and future research are discussed.
Matthew L. Bolton
SAGE Publications
There is currently significant research and industry interest in engineering machines and algorithms that humans will trust. This is justified as a means for facilitating the adoption of developing technology. However, there are many problems with trust that directly relate to its epistemological validity, usefulness, ethical implications, and potential for human disempowerment. This article explores trust from this perspective in the hopes of encouraging the human factors engineering community to de-emphasize trust as an end goal and replace it with more objective measures and good human factors engineering practices.
Matthew L. Bolton
Springer International Publishing
Matthew L. Bolton, Xi Zheng, and Eunsuk Kang
Elsevier BV
Pengyuan Wan and Matthew L. Bolton
IEEE
A forcing function is an intervention for constraining human behavior. However, the literature describing forcing functions provides little guidance for when and how to apply forcing functions or their associated trade-offs. In this paper, we address these shortcomings by introducing a novel taxonomy of forcing functions. This taxonomy extends the previous methods in four ways. First, it identifies two levels of forcing function solidity: hard forcing functions, which explicitly enforce constraints through the system, and soft forcing functions, which convey or communicate constraints. Second, each solidity level is decomposed into specific types. Third, the taxonomy hierarchically ranks forcing function solidities and types based on trade-offs of constraint and resilience. Fourth, for hard forcing functions, our taxonomy offers formal guidance for identifying the minimally constraining intervention that will prevent a specific error from occurring. We validated the ability of our method to identify effective error interventions by applying it to systems with known errors from the literature. We then compared the solutions offered by our method to known, effective interventions. We discuss our results and offer suggestions for further developments in future research.
Xi Zheng, Matthew L. Bolton, and Christopher Daly
Elsevier BV
Xi Zheng, Matthew L. Bolton, Christopher Daly, and Elliot Biltekoff
Elsevier BV
Matthew L. Bolton, Xi Zheng, Meng Li, Judy Reed Edworthy, and Andrew D. Boyd
SAGE Publications
Objective This research investigated whether the psychoacoustics of simultaneous masking, which are integral to a model-checking-based method, previously developed for detecting perceivability problems in alarm configurations, could predict when IEC 60601-1-8-compliant medical alarm sounds are audible. Background The tonal nature of sounds prescribed by IEC 60601-1-8 makes them potentially susceptible to simultaneous masking: where concurrent sounds render one or more inaudible due to human sensory limitations. No work has experimentally assessed whether the psychoacoustics of simultaneous masking accurately predict IEC 60601-1-8 alarm perceivability. Method In two signal detection experiments, 28 nursing students judged whether alarm sounds were present in collections of concurrently sounding standard-compliant tones. The first experiment used alarm sounds with single-frequency (primary harmonic) tones. The second experiment’s sounds included the additional, standard-required frequencies (often called subharmonics). T tests compared miss, false alarm, sensitivity, and bias measures between masking and nonmasking conditions and between the two experiments. Results Miss rates were significantly higher and sensitivity was significantly lower for the masking condition than for the nonmasking one. There were no significant differences between the measures of the two experiments. Conclusion These results validate the predictions of the psychoacoustics of simultaneous masking for medical alarms and the masking detection capabilities of our method that relies on them. The results also show that masking of an alarm’s primary harmonic is sufficient to make an alarm sound indistinguishable. Application Findings have profound implications for medical alarm design, the international standard, and masking detection methods.
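For readers unfamiliar with the signal-detection measures named above, the standard equal-variance computations of sensitivity (d′) and response bias (c) from hit and false-alarm rates can be sketched as follows. The counts are invented for illustration and are not the study's data.

```python
# Standard equal-variance signal detection measures from hit and
# false-alarm rates: sensitivity d' = z(H) - z(F), bias c = -(z(H)+z(F))/2.
# The counts below are invented for illustration; they are not study data.
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion

d, c = sdt_measures(hits=40, misses=10, false_alarms=5, correct_rejections=45)
print(round(d, 2), round(c, 2))  # 2.12 0.22
```

Note that perfect hit or false-alarm rates (0 or 1) make z undefined, so applied analyses typically apply a correction (e.g., a log-linear adjustment to the counts) before computing these measures.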
Xiaomei Wang, Ann M. Bisantz, Matthew L. Bolton, Lora Cavuoto, and Varun Chandola
SAGE Publications
The reach of artificial intelligence continues to grow, particularly with the expansion of machine learning techniques that capitalize on increased computing power. Such systems could have tremendous benefits by providing predictions and suggestions. However, they are limited by the fact that they offer incomplete explanations of their predictions to human decision makers. The objective of this work was to summarize general information that could help users make judgments about whether a system is trustworthy and whether the system’s training “makes sense.” A preliminary study was summarized to show the importance of iterative design and testing for visualizing explanations.
Jiajun Wei, Matthew L. Bolton, and Laura Humphrey
Informa UK Limited
Abstract Psychometrics are increasingly used to evaluate trust in the automation of systems, many of them safety-critical. There is no consensus on what the highest level of measurement is for trust. This is important as the level of measurement determines what mathematics and statistics can be meaningfully applied to ratings. In this work, we introduce a new method for determining the maximum level of measurement for a psychometrically assessed phenomenon. We use this to determine the level of measurement of trust in automation using human ratings about the behaviour of unmanned aerial systems performing search tasks. Results show that trust is best represented at an ordinal level and that it can be treated as interval in most situations. It is unlikely that trust in automation can be considered ratio. We discuss these results, their implications, and future research.
Changxu Wu, L. Rothrock, and M. Bolton
Institute of Electrical and Electronics Engineers (IEEE)
Human performance modeling (HPM) is a method of quantifying human behavior, cognition, and processes, and a tool used by human factors researchers and practitioners for both the analysis of human function and the development of systems designed for optimal user experience and interaction. Different from data-driven approaches (e.g., neural networks), most HPM approaches use “top-down” modeling methods based on the fundamental mechanisms of human cognition and behavior. This special issue introduces several human performance modeling articles that make use of mathematical modeling, production systems, and formal methods. We hope that readers of the special issue can benefit from the variety of modeling articles using different modeling approaches. In the following paragraphs, we summarize the features of each modeling approach and briefly introduce the articles in this special issue.
Matthew L. Bolton, Kylie A. Molinaro, and Adam M. Houser
Elsevier BV
Jiajun Wei, Matthew L. Bolton, and Laura Humphrey
SAGE Publications
Psychometrics are increasingly being used to evaluate trust in the automation of safety-critical systems. There is no consensus on what the highest level of measurement is for psychometric trust. This is important as the level of measurement determines what mathematics and statistics can be meaningfully applied to ratings. In this work, we introduce a new method for determining what the maximum level of measurement is for psychometric ratings. We use this to assess the level of measurement of trust in automation using human ratings about the behavior of unmanned aerial systems performing search tasks. Results show that trust is best represented at an ordinal level and that it can be treated as interval in most situations. It is unlikely that trust in automation ratings are ratio. We discuss these results, their implications, and future research.
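Why the level of measurement matters can be illustrated with a small, self-contained example (the ratings below are invented): comparisons of means are not invariant under order-preserving transformations of ordinal ratings, while comparisons of medians are. This is the sense in which treating ordinal trust ratings as interval can change substantive conclusions.

```python
# Invented ratings illustrating why level of measurement matters: applying
# an order-preserving (monotone) transformation can reverse a comparison
# of means, while the corresponding comparison of medians is unaffected.
from statistics import mean, median

group_a = [1, 2, 9]
group_b = [3, 3, 3]

f = lambda x: -1.0 / x  # strictly increasing for x > 0, so order-preserving

print(mean(group_a) > mean(group_b))                          # True
print(mean(f(x) for x in group_a) > mean(f(x) for x in group_b))  # False

print(median(group_a) < median(group_b))                          # True
print(median(f(x) for x in group_a) < median(f(x) for x in group_b))  # True
```

Because every monotone transformation of an ordinal scale is an equally valid representation of the same data, any conclusion that flips under such a transformation (here, the ordering of the means) is not meaningful at the ordinal level.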
Meng Li and Matthew L. Bolton
SAGE Publications
Testing is an effective approach for finding discrepancies between intended and actual system behavior. However, the complexity of modern systems can make it difficult for analysts to anticipate all the interactions that need to be tested. This is particularly true for human-interactive systems where humans may do things that were not anticipated by analysts. We address this by introducing a novel approach to automated test case generation for human-machine interaction. We do this by combining formal models of human-machine interfaces with formal models of human task behavior. We then use the robust search capabilities of model checking to generate test sequences guaranteed to satisfy test coverage criteria. We demonstrate the capabilities of our approach with a pod-based coffee machine. Results and future research are discussed.
Kylie A. Molinaro and Matthew L. Bolton
SAGE Publications
With the growing threat of phishing emails and the limited effectiveness of current mitigation approaches, there is an urgent need to better understand what leads to phishing victimization. There is a limited body of phishing research that identified cognitive automaticity as a potential factor, but more research on the relationship between user cognition and victimization is needed. Additionally, the current phishing research has not considered the characteristics of the environment in which phishing judgments are made. To fill these gaps, this work used the analysis capabilities afforded by the double system lens model (a judgment analysis technique) and the cognitive continuum theory, specifically the task continuum index and the cognitive continuum index. The task continuum index score identified the cognition best suited for the email sorting task, indicating that more analytical cognition was most effective. The cognitive continuum index score evaluated the participants' cognition level while making judgments. The relationships between these measures and achievement were evaluated. Results indicated that more analytical cognition was associated with lower rates of phishing victimization. This work provides a deeper insight into the phishing problem and has implications for combating phishing.
Lanssie M. Ma, Adam Houser, Karen M. Feigh, and Matthew Bolton
American Institute of Aeronautics and Astronautics