University of Virginia
Adam M. Houser and Matthew L. Bolton
Informa UK Limited
Matthew L. Bolton, Elliot Biltekoff, and Laura Humphrey
Institute of Electrical and Electronics Engineers (IEEE)
Human mental workload can profoundly impact human performance and is thus an important consideration in the design and operation of many systems. The standard method for assessing human mental workload is the NASA Task Load Index (NASA-TLX). This involves a human operator subjectively rating a task based on six dimensions. These dimensions are combined into a single workload score using one of two methods: scaling and summing the dimensions (where scales are derived from a paired comparisons procedure) or averaging dimensions together. Despite its widespread use, the level of measurement of NASA-TLX's dimensions and its computed workload score has not been investigated. Additionally, no prior research has examined whether NASA-TLX's two approaches for computing overall workload are mathematically meaningful with respect to the constituent dimensions' levels of measurement. This is a serious deficiency. Knowing the level of measurement for NASA-TLX scores will determine what mathematics can be meaningfully applied to them. Furthermore, if NASA-TLX workload syntheses are mathematically meaningless, then the measure lacks construct validity. The research presented in this article used a previously developed method to evaluate the level of measurement of NASA-TLX workload and its dimensions. Results show that the dimensions can, in most situations, be treated as interval in population analyses and ordinal for individuals. Our results also suggest that the methods for combining dimensions into workload scores are meaningless. We recommend that analysts evaluate the dimensions of NASA-TLX without combining them.
Matthew L. Bolton, Skye Solace Taylor, and Laura Humphrey
IEEE
Mental workload extremes are associated with poor human performance and safety problems across safety-critical domains. Mental workload is a complex, difficult-to-predict phenomenon, where issues may only arise due to concurrency between resource-conflicting tasks. This research addresses this difficulty by presenting a novel method for using model checking (for performing formal proofs about concurrent systems) to predict mental workload. Our method combines multiple resource theory and formal methods based on hierarchical task analysis to identify mental workload extremes in a complex system. This paper presents this method and shows preliminary validation using a texting and driving task. Implications of our results and future research are discussed.
Matthew L. Bolton, Svetlana Riabova, Yeonbin Son, and Eunsuk Kang
IEEE
Previous research has shown how statistical model checking can be used with human task behavior modeling and human reliability analysis to make realistic predictions about human errors and error rates. However, these efforts have not accounted for the impact that design changes can have on human reliability. In this research, we address this deficiency by using similarity theory from human cognitive modeling. This replicates how negative transfer can cause people to perform old task behaviors on modified systems. We present details about how this approach was realized with the PRISM model checker and the enhanced operator function model. We report results of a validation exercise using an application from the literature. We discuss the implications of our results and describe future research.
Changjian Zhang, Tarang Saluja, Rômulo Meira-Góes, Matthew Bolton, David Garlan, and Eunsuk Kang
IEEE
Modern software systems are deployed in a highly dynamic, uncertain environment. Ideally, a system that is robust should be capable of establishing its most critical requirements even in the presence of possible deviations in the environment. We propose a technique called behavioral robustification, which involves systematically and rigorously improving the robustness of a design against potential deviations. Given behavioral models of a system and its environment, along with a set of user-specified deviations, our robustification method produces a redesign that is capable of satisfying a desired property even when the environment exhibits those deviations. In particular, we describe how the robustification problem can be formulated as a multi-objective optimization problem, where the goal is to restrict the deviating environment from causing a violation of a desired property, while maximizing the amount of existing functionality and minimizing the cost of changes to the original design. We demonstrate the effectiveness of our approach on case studies involving the robustness of an electronic voting machine and safety-critical interfaces.
Matthew L. Bolton
Institute of Electrical and Electronics Engineers (IEEE)
Matthew Bolton, Elliot Biltekoff, and Laura Humphrey
Institute of Electrical and Electronics Engineers (IEEE)
Situation awareness (SA), a measure of how well a person understands the situation, is frequently used to evaluate the safety and effectiveness of critical systems that depend on human behavior. While there are objective ways of measuring SA, subjective assessments, such as the SA rating technique (SART), are still widely used. However, it is not clear what the level of measurement is for SART-measured SA or its constituent dimensions. This is a significant gap because the level of measurement determines what mathematics and statistics can be meaningfully used to synthesize and evaluate measures. This research uses a previously developed method for determining the level of measurement of psychometric ratings to evaluate the level of measurement of SART and its elements. Results show that all of the dimensions of SA can be treated as interval in most situations, but that each is on a separate interval scale. This result casts doubt on the validity of the formula SART uses to compute SA from its subcomponents. We ultimately discuss our results and explore future research directions.
Matthew L. Bolton, Judy R. Edworthy, and Andrew D. Boyd
SAGE Publications
Objective In this work, we systematically evaluated the reserved alarm sounds of the IEC 60601-1-8 international medical alarm standard to determine when and how they can be totally and partially masked. Background IEC 60601-1-8 gives engineers instructions for creating human-perceivable auditory medical alarms. This includes reserved alarm sounds: common types of alarms where each is a tonal melody. Even when this standard is honored, practitioners still fail to hear alarms, causing practitioner nonresponse and, thus, potential patient harm. Simultaneous masking, a condition where one or more alarms is imperceptible in the presence of other concurrently sounding alarms due to limitations of the human sensory system, is partially responsible for this. Methods In this research, we use automated proof techniques to determine if masking can occur in a modeled configuration of medical alarms. This allows us to determine when and how reserved alarm sounds can mask other reserved alarms and to explore parameters to address discovered problems. Results We report the minimum number of other alarm sounds it takes to both totally and partially mask each of the high-, medium-, and low-priority alarm sounds from the standard. Conclusions Significant masking problems were found for both the total and partial masking of high-, medium-, and low-priority reserved alarm sounds. Application We show that discovered problems can be mitigated by setting alarm volumes to standard values based on priority level and by randomizing the timing of alarm tones.
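To make the notion of simultaneous masking concrete, the following is a toy sketch, not the paper's formal proof method: it flags when one pure tone might mask another using the Zwicker/Terhardt critical-bandwidth approximation and an assumed fixed 10 dB level margin. The function names and the margin are illustrative assumptions only; real psychoacoustic masking models are considerably more detailed.

```python
# Toy check for whether one pure tone may simultaneously mask another.
# Assumptions (illustrative only): a target tone is flagged as potentially
# masked when it lies within half the masker's critical band and the
# masker is at least `margin_db` louder than the target.

def critical_bandwidth(freq_hz: float) -> float:
    """Zwicker/Terhardt approximation of critical bandwidth in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

def may_mask(masker_hz, masker_db, target_hz, target_db, margin_db=10.0):
    """Return True if the masker may render the target inaudible."""
    in_band = abs(target_hz - masker_hz) <= critical_bandwidth(masker_hz) / 2
    louder = masker_db >= target_db + margin_db
    return in_band and louder

# A 70 dB tone at 440 Hz versus a quiet neighbor at 450 Hz: flagged.
print(may_mask(440, 70, 450, 50))   # True
# The same masker versus a tone an octave away at 880 Hz: not flagged.
print(may_mask(440, 70, 880, 50))   # False
```

A model checker can exhaustively apply a predicate like this across all reachable combinations of concurrently sounding alarms, which is what makes the proof-based approach exhaustive where testing is not.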
Matthew L. Bolton, Elliot Biltekoff, and Kevin Byrne
IEEE
Formal human-machine analyses based around human mental models have shown great utility for discovering when and how people may develop mode confusion and be surprised or disoriented by automated behavior. Such analyses represent mental models with finite state machine formalisms. These are limited in that they assume unrealistic precision in how people think about states and input events. This paper proposes a new formalism called Fuzzy Mental Model Finite State Machines (FMMFSMs). FMMFSMs combine state machine modeling with fuzzy logic to allow for precise reasoning about the imprecision of human mental model states and inputs. This has the potential to enable formal mental model analyses to support traditional mode confusion detection while also accounting for phenomena like drift: where the human's mental model changes over time due to stagnant or slowly changing conditions. This paper presents the FMMFSM formalism and illustrates its potential for finding mode confusion, automation surprise, and trust disruption with an automobile automation application. Implications of these developments and future research are discussed.
Matthew L. Bolton
SAGE Publications
There is currently significant research and industry interest in engineering machines and algorithms that humans will trust. This is justified as a means for facilitating the adoption of developing technology. However, there are many problems with trust that directly relate to its epistemological validity, usefulness, ethical implications, and potential for human disempowerment. This article explores trust from this perspective in the hopes of encouraging the human factors engineering community to de-emphasize trust as an end goal and replace it with more objective measures and good human factors engineering practices.
Matthew L. Bolton
Springer International Publishing
Matthew L. Bolton, Xi Zheng, and Eunsuk Kang
Elsevier BV
Pengyuan Wan and Matthew L. Bolton
IEEE
A forcing function is an intervention for constraining human behavior. However, the literature describing forcing functions provides little guidance for when and how to apply forcing functions or their associated trade-offs. In this paper, we address these shortcomings by introducing a novel taxonomy of forcing functions. This taxonomy extends the previous methods in four ways. First, it identifies two levels of forcing function solidity: hard forcing functions, which explicitly enforce constraints through the system, and soft forcing functions, which convey or communicate constraints. Second, each solidity level is decomposed into specific types. Third, the taxonomy hierarchically ranks forcing function solidities and types based on trade-offs of constraint and resilience. Fourth, for hard forcing functions, our taxonomy offers formal guidance for identifying the minimally constraining intervention that will prevent a specific error from occurring. We validated the ability of our method to identify effective error interventions by applying it to systems with known errors from the literature. We then compared the solutions offered by our method to known, effective interventions. We discuss our results and offer suggestions for further developments in future research.
Xi Zheng, Matthew L. Bolton, and Christopher Daly
Elsevier BV
Xi Zheng, Matthew L. Bolton, Christopher Daly, and Elliot Biltekoff
Elsevier BV
Matthew L. Bolton, Xi Zheng, Meng Li, Judy Reed Edworthy, and Andrew D. Boyd
SAGE Publications
Objective This research investigated whether the psychoacoustics of simultaneous masking, which are integral to a model-checking-based method, previously developed for detecting perceivability problems in alarm configurations, could predict when IEC 60601-1-8-compliant medical alarm sounds are audible. Background The tonal nature of sounds prescribed by IEC 60601-1-8 makes them potentially susceptible to simultaneous masking: where concurrent sounds render one or more inaudible due to human sensory limitations. No work has experimentally assessed whether the psychoacoustics of simultaneous masking accurately predict IEC 60601-1-8 alarm perceivability. Method In two signal detection experiments, 28 nursing students judged whether alarm sounds were present in collections of concurrently sounding standard-compliant tones. The first experiment used alarm sounds with single-frequency (primary harmonic) tones. The second experiment’s sounds included the additional, standard-required frequencies (often called subharmonics). T tests compared miss, false alarm, sensitivity, and bias measures between masking and nonmasking conditions and between the two experiments. Results Miss rates were significantly higher and sensitivity was significantly lower for the masking condition than for the nonmasking one. There were no significant differences between the measures of the two experiments. Conclusion These results validate the predictions of the psychoacoustics of simultaneous masking for medical alarms and the masking detection capabilities of our method that relies on them. The results also show that masking of an alarm’s primary harmonic is sufficient to make an alarm sound indistinguishable. Application Findings have profound implications for medical alarm design, the international standard, and masking detection methods.
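For readers unfamiliar with the signal-detection measures named above, the standard equal-variance computations of sensitivity (d′) and response bias (c) from hit and false-alarm rates can be sketched as follows. The counts are invented for illustration and are not the study's data.

```python
# Standard equal-variance signal detection measures from hit and
# false-alarm rates: sensitivity d' = z(H) - z(F), bias c = -(z(H)+z(F))/2.
# The counts below are invented for illustration; they are not study data.
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion

d, c = sdt_measures(hits=40, misses=10, false_alarms=5, correct_rejections=45)
print(round(d, 2), round(c, 2))  # 2.12 0.22
```

Note that perfect hit or false-alarm rates (0 or 1) make z undefined, so applied analyses typically apply a correction (e.g., a log-linear adjustment to the counts) before computing these measures.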
Xiaomei Wang, Ann M. Bisantz, Matthew L. Bolton, Lora Cavuoto, and Varun Chandola
SAGE Publications
The reach of artificial intelligence continues to grow, particularly with the expansion of machine learning techniques that capitalize on increased computing power. Such systems could have tremendous benefits by providing predictions and suggestions. However, they are limited by the fact that they offer incomplete explanations of their predictions to human decision makers. The objective of this work was to summarize general information that could help users make judgments about whether a system is trustworthy and whether the system’s training “makes sense.” A preliminary study was summarized to show the importance of iterative design and testing for visualizing explanations.
Jiajun Wei, Matthew L. Bolton, and Laura Humphrey
Informa UK Limited
Abstract Psychometrics are increasingly used to evaluate trust in the automation of systems, many of them safety-critical. There is no consensus on what the highest level of measurement is for trust. This is important as the level of measurement determines what mathematics and statistics can be meaningfully applied to ratings. In this work, we introduce a new method for determining the maximum level of measurement for a psychometrically assessed phenomenon. We use this to determine the level of measurement of trust in automation using human ratings about the behaviour of unmanned aerial systems performing search tasks. Results show that trust is best represented at an ordinal level and that it can be treated as interval in most situations. It is unlikely that trust in automation can be considered ratio. We discuss these results, their implications, and future research.
Changxu Wu, L. Rothrock, and M. Bolton
Institute of Electrical and Electronics Engineers (IEEE)
Human performance modeling (HPM) is a method of quantifying human behavior, cognition, and processes, and a tool used by human factors researchers and practitioners for both the analysis of human function and the development of systems designed for optimal user experience and interaction. Different from data-driven approaches (e.g., neural networks), most HPM approaches use “top-down” modeling methods based on the fundamental mechanisms of human cognition and behavior. This special issue introduces several human performance modeling articles that make use of mathematical modeling, production systems, and formal methods. We hope that readers of the special issue can benefit from the variety of modeling articles using different modeling approaches. In the following paragraphs, we summarize the features of each modeling approach and briefly introduce the articles in this special issue.
Matthew L. Bolton, Kylie A. Molinaro, and Adam M. Houser
Elsevier BV
Jiajun Wei, Matthew L. Bolton, and Laura Humphrey
SAGE Publications
Psychometrics are increasingly being used to evaluate trust in the automation of safety-critical systems. There is no consensus on what the highest level of measurement is for psychometric trust. This is important as the level of measurement determines what mathematics and statistics can be meaningfully applied to ratings. In this work, we introduce a new method for determining what the maximum level of measurement is for psychometric ratings. We use this to assess the level of measurement of trust in automation using human ratings about the behavior of unmanned aerial systems performing search tasks. Results show that trust is best represented at an ordinal level and that it can be treated as interval in most situations. It is unlikely that trust in automation ratings are ratio. We discuss these results, their implications, and future research.
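Why the level of measurement matters can be illustrated with a small, self-contained example (the ratings below are invented): comparisons of means are not invariant under order-preserving transformations of ordinal ratings, while comparisons of medians are. This is the sense in which treating ordinal trust ratings as interval can change substantive conclusions.

```python
# Invented ratings illustrating why level of measurement matters: applying
# an order-preserving (monotone) transformation can reverse a comparison
# of means, while the corresponding comparison of medians is unaffected.
from statistics import mean, median

group_a = [1, 2, 9]
group_b = [3, 3, 3]

f = lambda x: -1.0 / x  # strictly increasing for x > 0, so order-preserving

print(mean(group_a) > mean(group_b))                          # True
print(mean(f(x) for x in group_a) > mean(f(x) for x in group_b))  # False

print(median(group_a) < median(group_b))                          # True
print(median(f(x) for x in group_a) < median(f(x) for x in group_b))  # True
```

Because every monotone transformation of an ordinal scale is an equally valid representation of the same data, any conclusion that flips under such a transformation (here, the ordering of the means) is not meaningful at the ordinal level.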
Meng Li and Matthew L. Bolton
SAGE Publications
Testing is an effective approach for finding discrepancies between intended and actual system behavior. However, the complexity of modern systems can make it difficult for analysts to anticipate all the interactions that need to be tested. This is particularly true for human-interactive systems where humans may do things that were not anticipated by analysts. We address this by introducing a novel approach to automated test case generation for human-machine interaction. We do this by combining formal models of human-machine interfaces with formal models of human task behavior. We then use the robust search capabilities of model checking to generate test sequences guaranteed to satisfy test coverage criteria. We demonstrate the capabilities of our approach with a pod-based coffee machine. Results and future research are discussed.
Kylie A. Molinaro and Matthew L. Bolton
SAGE Publications
With the growing threat of phishing emails and the limited effectiveness of current mitigation approaches, there is an urgent need to better understand what leads to phishing victimization. There is a limited body of phishing research that identified cognitive automaticity as a potential factor, but more research on the relationship between user cognition and victimization is needed. Additionally, the current phishing research has not considered the characteristics of the environment in which phishing judgments are made. To fill these gaps, this work used the analysis capabilities afforded by the double system lens model (a judgment analysis technique) and the cognitive continuum theory, specifically the task continuum index and the cognitive continuum index. The task continuum index score identified the cognition best suited for the email sorting task, indicating that more analytical cognition was most effective. The cognitive continuum index score evaluated the participants' cognition level while making judgments. The relationships between these measures and achievement were evaluated. Results indicated that more analytical cognition was associated with lower rates of phishing victimization. This work provides a deeper insight into the phishing problem and has implications for combating phishing.
Lanssie M. Ma, Adam Houser, Karen M. Feigh, and Matthew Bolton
American Institute of Aeronautics and Astronautics