frapebe@doctor.upv.es

@iti.es

Associate Professor and Researcher
Universitat Politècnica de València / Instituto Tecnológico de la Informática

EDUCATION

Ph.D. Mathematics and Technical Engineering in Computers

Scopus Publications: 14
Scholar Citations: 202
Scholar h-index: 9
Scholar i10-index: 8

SCOPUS PUBLICATIONS

  • Breast cancer risk assessment for screening: a hybrid artificial intelligence approach
    Raquel Tendero, Andrés Larroza, Francisco Javier Pérez-Benito, Juan Carlos Perez-Cortes, Marta Román, Rafael Llobet
    European Radiology, 2026
    Objectives: This study evaluates whether integrating clinical data with mammographic features using artificial intelligence (AI) improves 2-year breast cancer risk prediction compared to using either data type alone. Materials and methods: This retrospective nested case-control study included 2193 women (mean age, 59 ± 5 years) screened at Hospital del Mar, Spain (2013–2020), with 418 cases (mammograms taken 2 years before diagnosis) and 1775 controls (cancer-free for ≥ 2 years). Three models were evaluated: (1) ERTpd + im, based on Extremely Randomized Trees (ERT), split into sub-models for personal data (ERTpd) and image features (ERTim); (2) an image-only model (CNN); and (3) a hybrid model (ERTpd + im + CNN). Five-fold cross-validation, area under the receiver operating characteristic curve (AUC), bootstrapping for confidence intervals, and DeLong tests for paired data assessed performance. Robustness was evaluated across breast density quartiles and detection type (screen-detected vs. interval cancers). Results: The hybrid model achieved an AUC of 0.75 (95% CI: 0.71–0.76), significantly outperforming the CNN model (AUC, 0.74; 95% CI: 0.70–0.75; p < 0.05) and slightly surpassing ERTpd + im (AUC, 0.74; 95% CI: 0.70–0.76). Sub-models ERTpd and ERTim had AUCs of 0.59 and 0.73, respectively. The hybrid model performed consistently across breast density quartiles (p > 0.05) and better for screen-detected (AUC, 0.79) than interval cancers (AUC, 0.59; p < 0.001). Conclusions: This study shows that integrating clinical and mammographic data with AI improves 2-year breast cancer risk prediction, outperforming single-source models. The hybrid model demonstrated higher accuracy and robustness across breast density quartiles, with better performance for screen-detected cancers. Key points: Question: Current breast cancer risk models have limitations in accuracy. Can integrating clinical and mammographic data using artificial intelligence (AI) improve short-term risk prediction? Findings: A hybrid model combining clinical and imaging data achieved the highest accuracy in predicting 2-year breast cancer risk, outperforming models using either data type alone. Clinical relevance: Integrating clinical and mammographic data with AI improves breast cancer risk prediction. This approach enables personalized screening strategies and supports early detection. It helps identify high-risk women and optimizes the use of additional assessments within screening programs.
  • Three-Blind Validation Strategy of Deep Learning Models for Image Segmentation
    Andrés Larroza, Francisco Javier Pérez-Benito, Raquel Tendero, Juan Carlos Perez-Cortes, Marta Román, Rafael Llobet
    Journal of Imaging, 2025
    Image segmentation plays a central role in computer vision applications such as medical imaging, industrial inspection, and environmental monitoring. However, evaluating segmentation performance can be particularly challenging when ground truth is not clearly defined, as is often the case in tasks involving subjective interpretation. These challenges are amplified by inter- and intra-observer variability, which complicates the use of human annotations as a reliable reference. To address this, we propose a novel validation framework—referred to as the three-blind validation strategy—that enables rigorous assessment of segmentation models in contexts where subjectivity and label variability are significant. The core idea is to have a third independent expert, blind to the labeler identities, assess a shuffled set of segmentations produced by multiple human annotators and/or automated models. This allows for the unbiased evaluation of model performance and helps uncover patterns of disagreement that may indicate systematic issues with either human or machine annotations. The primary objective of this study is to introduce and demonstrate this validation strategy as a generalizable framework for robust model evaluation in subjective segmentation tasks. We illustrate its practical implementation in a mammography use case involving dense tissue segmentation while emphasizing its potential applicability to a broad range of segmentation scenarios.
  • Breast Delineation in Full-Field Digital Mammography Using the Segment Anything Model
    Andrés Larroza, Francisco Javier Pérez-Benito, Raquel Tendero, Juan Carlos Perez-Cortes, Marta Román, Rafael Llobet
    Diagnostics, 2024
    Breast cancer is a major health concern worldwide. Mammography, a cost-effective and accurate tool, is crucial in combating this issue. However, low contrast, noise, and artifacts can limit the diagnostic capabilities of radiologists. Computer-Aided Diagnosis (CAD) systems have been developed to overcome these challenges, with the accurate outlining of the breast being a critical step for further analysis. This study introduces the SAM-breast model, an adaptation of the Segment Anything Model (SAM) for segmenting the breast region in mammograms. This method enhances the delineation of the breast and the exclusion of the pectoral muscle in both mediolateral oblique (MLO) and craniocaudal (CC) views. We trained the models using a large, multi-center proprietary dataset of 2492 mammograms. The proposed SAM-breast model achieved the highest overall Dice Similarity Coefficient (DSC) of 99.22% ± 1.13 and Intersection over Union (IoU) of 98.48% ± 2.10 over independent test images from five different datasets (two proprietary and three publicly available). The results are consistent across the different datasets, regardless of the vendor or image resolution. Compared with other baseline and deep learning-based methods, the proposed method exhibits enhanced performance. The SAM-breast model demonstrates the power of the SAM to adapt when it is tailored to specific tasks, in this case, the delineation of the breast in mammograms. Comprehensive evaluations across diverse datasets, both private and public, attest to the method’s robustness, flexibility, and generalization capabilities.
  • Breast Dense Tissue Segmentation with Noisy Labels: A Hybrid Threshold-Based and Mask-Based Approach
    Andrés Larroza, Francisco Javier Pérez-Benito, Juan-Carlos Perez-Cortes, Marta Román, Marina Pollán, Beatriz Pérez-Gómez, Dolores Salas-Trejo, María Casals, Rafael Llobet
    Diagnostics, 2022
    Breast density assessed from digital mammograms is a known biomarker related to a higher risk of developing breast cancer. Supervised learning algorithms have been implemented to determine this. However, the performance of these algorithms depends on the quality of the ground-truth information, which expert readers usually provide. These expert labels are noisy approximations to the ground truth, as there is both intra- and inter-observer variability among them. Thus, it is crucial to provide a reliable method to measure breast density from mammograms. This paper presents a fully automated method based on deep learning to estimate breast density, including breast detection, pectoral muscle exclusion, and dense tissue segmentation. We propose a novel confusion matrix (CM)-YNet model for the segmentation step. This architecture includes networks to model each radiologist’s noisy label and gives the estimated ground-truth segmentation as well as two parameters that allow interaction with a threshold-based labeling tool. A multi-center study involving 1785 women whose “for presentation” mammograms were obtained from 11 different medical facilities was performed. A total of 2496 mammograms were used as the training corpus, and 844 formed the testing corpus. Additionally, we included a totally independent dataset from a different center, composed of 381 women with one image per patient. Each mammogram was labeled independently by two expert radiologists using a threshold-based tool. The implemented CM-YNet model achieved the highest Dice score averaged over both test datasets (0.82 ± 0.14) when compared to the closest dense-tissue segmentation assessment from both radiologists. The level of concordance between the two radiologists showed a Dice score of 0.76 ± 0.17. An automatic breast density estimator based on deep learning exhibited higher performance when compared with two experienced radiologists. This suggests that modeling each radiologist’s label allows for better estimation of the unknown ground-truth segmentation. The advantage of the proposed model is that it also provides the threshold parameters that enable user interaction with a threshold-based tool.
  • A deep learning framework to classify breast density with noisy labels regularization
    Hector Lopez-Almazan, Francisco Javier Pérez-Benito, Andrés Larroza, Juan-Carlos Perez-Cortes, Marina Pollan, Beatriz Perez-Gomez, Dolores Salas Trejo, María Casals, Rafael Llobet
    Computer Methods and Programs in Biomedicine, 2022
    BACKGROUND AND OBJECTIVE: Breast density assessed from digital mammograms is a biomarker for higher risk of developing breast cancer. Experienced radiologists assess breast density using the Breast Imaging Reporting and Data System (BI-RADS) categories. Supervised learning algorithms have been developed with this objective in mind; however, the performance of these algorithms depends on the quality of the ground-truth information, which is usually labeled by expert readers. These labels are noisy approximations of the ground truth, as there is often intra- and inter-reader variability among labels. Thus, it is crucial to provide a reliable method to obtain digital mammograms matching BI-RADS categories. This paper presents RegL (Labels Regularizer), a methodology that includes different image pre-processes to allow both a correct breast segmentation and the enhancement of image quality through an intensity adjustment, thus allowing the use of deep learning to classify the mammograms into BI-RADS categories. The Confusion Matrix (CM)-CNN network used implements an architecture that models each radiologist's noisy label. The final methodology pipeline was determined after comparing the performance of image pre-processes combined with different DL architectures. METHODS: A multi-center study composed of 1395 women whose mammograms were classified into the four BI-RADS categories by three experienced radiologists is presented. A total of 892 mammograms were used as the training corpus, 224 formed the validation corpus, and 279 the test corpus. RESULTS: The combination of five networks implementing the RegL methodology achieved the best results among all the models in the test set. The ensemble model obtained an accuracy of 0.85 and a kappa index of 0.71. CONCLUSIONS: The proposed methodology has a similar performance to the experienced radiologists in the classification of digital mammograms into BI-RADS categories. This suggests that the pre-processing steps and modelling of each radiologist's label allow for a better estimation of the unknown ground-truth labels.
  • Extended a Priori Probability (EAPP): A Data-Driven Approach for Machine Learning Binary Classification Tasks
    Vicent Ortiz Castello, Francisco Javier Perez-Benito, Omar Del Tejo Catala, Ismael Salvador Igual, Rafael Llobet, Juan-Carlos Perez-Cortes
    IEEE Access, 2022
    The a priori probability of a dataset is usually used as a baseline for comparing a particular algorithm’s accuracy in a given binary classification task. ZeroR is the simplest algorithm for this, predicting the majority class for all examples. However, this is an extremely simple approach that has no predictive power and does not describe other dataset features that could lead to a more demanding baseline. In this paper, we present the Extended A Priori Probability (EAPP), a novel semi-supervised baseline metric for binary classification tasks that considers not only the a priori probability but also some possible bias present in the dataset as well as other features that could provide a relatively trivial separability of the target classes. The approach is based on the area under the ROC curve (AUC ROC), known to be quite insensitive to class imbalance. The procedure involves multiobjective feature extraction and a clustering stage in the input space with autoencoders and a subsequent combinatory weighted assignation from clusters to classes depending on the distance to nearest clusters for each class. Class labels are then assigned to establish the combination that maximizes AUC ROC for each number of clusters considered. To avoid overfit in the combined feature extraction and clustering method, a cross-validation scheme is performed in each case. EAPP is defined for different numbers of clusters, starting from the inverse of the minority class proportion, which is useful for a fair comparison among diversely imbalanced datasets. A high EAPP usually relates to an easy binary classification task, but it also may be due to a significant coarse-grained bias in the dataset, when the task is previously known to be difficult. This metric represents a baseline beyond the a priori probability to assess the actual capabilities of binary classification models.
  • Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients
    Omar Del Tejo Catala, Ismael Salvador Igual, Francisco Javier Perez-Benito, David Millan Escriva, Vicent Ortiz Castello, Rafael Llobet, Juan-Carlos Perez-Cortes
    IEEE Access, 2021
    Chest X-ray images are useful for early COVID-19 diagnosis with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Some datasets containing X-ray images with cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mainly composed of different sources coming from pre-COVID-19 datasets and COVID-19 datasets. Particularly, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which might imply that the results published are optimistic and may overestimate the actual predictive capacity of the techniques proposed. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, to avoid them if possible and to report results that are more representative of the actual predictive power of the methods under analysis.
  • A deep learning system to obtain the optimal parameters for a threshold-based breast and dense tissue segmentation
    Francisco Javier Pérez-Benito, François Signol, Juan-Carlos Perez-Cortes, Alejandro Fuster-Baggetto, Marina Pollan, Beatriz Pérez-Gómez, Dolores Salas-Trejo, Maria Casals, Inmaculada Martínez, Rafael Llobet
    Computer Methods and Programs in Biomedicine, 2020
  • Community detection-based deep neural network architectures: A fully automated framework based on Likert-scale data
    Francisco Javier Pérez‐Benito, Juan Miguel García‐Gómez, Esperanza Navarro‐Pardo, J. Alberto Conejero
    Mathematical Methods in the Applied Sciences, 2020
    Deep neural networks (DNNs) have emerged as a state‐of‐the‐art tool in very different research fields due to their adaptive power to the decision space, since they do not presuppose any linear relationship between data. Some of the main disadvantages of these trending models are that the choice of the network's underlying architecture profoundly influences the performance of the model and that the architecture design requires prior knowledge of the field of study. The use of questionnaires is hugely extended in social/behavioral sciences. The main contribution of this work is to automate the process of a DNN architecture design by using an agglomerative hierarchical algorithm that mimics the conceptual structure of such surveys. Although the training had regression purposes, it is easily convertible to deal with classification tasks. Our proposed methodology will be tested with a database containing socio‐demographic data and the responses to five psychometric Likert scales related to the prediction of happiness. These scales have been already used to design a DNN architecture based on the subdimension of the scales. We show that our new network configurations outperform the previous existing DNN architectures.
  • Subgrouping Factors Influencing Migraine Intensity in Women: A Semi-automatic Methodology Based on Machine Learning and Information Geometry
    Francisco J. Pérez‐Benito, J. Alberto Conejero, Carlos Sáez, Juan M. García‐Gómez, Esperanza Navarro‐Pardo, Lidiane L. Florencio, César Fernández‐de‐las‐Peñas
    Pain Practice, 2020
    Migraine is a heterogeneous condition with multiple clinical manifestations. Machine learning algorithms permit the identification of population groups, providing analytical advantages over other modeling techniques.
  • Divisibility patterns within Pascal divisibility networks
    Pedro A. Solares-Hernández, Fernando A. Manzano, Francisco J. Pérez-Benito, J. Alberto Conejero
    Mathematics, 2020
  • Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years
    Francisco Javier Pérez-Benito, Carlos Sáez, J. Alberto Conejero, Salvador Tortajada, Bernardo Valdivieso, Juan M. García-Gómez
    PLoS One, 2019
  • Global parenchymal texture features based on histograms of oriented gradients improve cancer development risk estimation from healthy breasts
    Francisco Javier Pérez-Benito, Francois Signol, Juan-Carlos Pérez-Cortés, Marina Pollán, Beatriz Pérez-Gómez, Dolores Salas-Trejo, María Casals, Inmaculada Martínez, Rafael Llobet
    Computer Methods and Programs in Biomedicine, 2019
  • A happiness degree predictor using the conceptual data structure for deep learning architectures
    Francisco Javier Pérez-Benito, Patricia Villacampa-Fernández, J. Alberto Conejero, Juan M. García-Gómez, Esperanza Navarro-Pardo
    Computer Methods and Programs in Biomedicine, 2019

RECENT SCHOLAR PUBLICATIONS

  • Breast cancer risk assessment for screening: a hybrid artificial intelligence approach
    R Tendero, A Larroza, FJ Pérez-Benito, JC Perez-Cortes, M Román, ...
    European Radiology 36 (3), 1932-1942, 2026
    Citations: 2
  • Three-Blind Validation Strategy of Deep Learning Models for Image Segmentation
    A Larroza, FJ Pérez-Benito, R Tendero, JC Perez-Cortes, M Román, ...
    Journal of Imaging 11 (5), 170, 2025
    Citations: 4
  • Breast delineation in full-field digital mammography using the segment anything model
    A Larroza, FJ Pérez-Benito, R Tendero, JC Perez-Cortes, M Román, ...
    Diagnostics 14 (10), 1015, 2024
    Citations: 9
  • Extended A Priori Probability (EAPP): a data-driven approach for machine learning binary classification tasks
    FJ Pérez-Benito, ODT Catalá, IS Igual, R Llobet, JC Perez-Cortes
    IEEE Access 10, 120074-120085, 2022
  • Breast dense tissue segmentation with noisy labels: A hybrid threshold-based and mask-based approach
    A Larroza, FJ Pérez-Benito, JC Perez-Cortes, M Román, M Pollán, ...
    Diagnostics 12 (8), 1822, 2022
    Citations: 14
  • A deep learning framework to classify breast density with noisy labels regularization
    H Lopez-Almazan, FJ Pérez-Benito, A Larroza, JC Perez-Cortes, M Pollan, ...
    Computer Methods and Programs in Biomedicine 221, 106885, 2022
    Citations: 19
  • Bias analysis on public X-ray image datasets of pneumonia and COVID-19 patients
    ODT Catalá, IS Igual, FJ Pérez-Benito, DM Escrivá, VO Castelló, R Llobet, ...
    IEEE Access 9, 42370-42383, 2021
    Citations: 37
  • A deep learning system to obtain the optimal parameters for a threshold-based breast and dense tissue segmentation
    FJ Pérez-Benito, F Signol, JC Perez-Cortes, A Fuster-Baggetto, M Pollan, ...
    Computer Methods and Programs in Biomedicine 195, 105668, 2020
    Citations: 38
  • Community detection‐based deep neural network architectures: A fully automated framework based on Likert‐scale data
    FJ Pérez‐Benito, JM García‐Gómez, E Navarro‐Pardo, JA Conejero
    Mathematical Methods in the Applied Sciences 43 (14), 8290-8301, 2020
    Citations: 4
  • Subgrouping factors influencing migraine intensity in women: a semi‐automatic methodology based on machine learning and information geometry
    FJ Pérez‐Benito, JA Conejero, C Saez, JM García‐Gómez, ...
    Pain Practice 20 (3), 297-309, 2020
    Citations: 15
  • Divisibility patterns within Pascal divisibility networks
    PA Solares-Hernández, FA Manzano, FJ Pérez-Benito, JA Conejero
    Mathematics 8 (2), 254, 2020
    Citations: 7
  • Healthcare data heterogeneity and its contribution to machine learning performance
    FJ Pérez Benito
    Healthcare data heterogeneity and its contribution to machine learning …, 2020
    Citations: 1
  • Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years
    FJ Pérez-Benito, C Sáez, JA Conejero, S Tortajada, B Valdivieso, ...
    PLoS One 14 (8), e0220369, 2019
    Citations: 12
  • Global parenchymal texture features based on histograms of oriented gradients improve cancer development risk estimation from healthy breasts
    FJ Pérez-Benito, F Signol, JC Perez-Cortes, M Pollan, B Perez-Gomez, ...
    Computer Methods and Programs in Biomedicine 177, 123-132, 2019
    Citations: 14
  • A happiness degree predictor using the conceptual data structure for deep learning architectures
    FJ Pérez-Benito, P Villacampa-Fernández, JA Conejero, ...
    Computer Methods and Programs in Biomedicine 168, 59-68, 2019
    Citations: 26

MOST CITED SCHOLAR PUBLICATIONS

  • A deep learning system to obtain the optimal parameters for a threshold-based breast and dense tissue segmentation
    FJ Pérez-Benito, F Signol, JC Perez-Cortes, A Fuster-Baggetto, M Pollan, ...
    Computer Methods and Programs in Biomedicine 195, 105668, 2020
    Citations: 38
  • Bias analysis on public X-ray image datasets of pneumonia and COVID-19 patients
    ODT Catalá, IS Igual, FJ Pérez-Benito, DM Escrivá, VO Castelló, R Llobet, ...
    IEEE Access 9, 42370-42383, 2021
    Citations: 37
  • A happiness degree predictor using the conceptual data structure for deep learning architectures
    FJ Pérez-Benito, P Villacampa-Fernández, JA Conejero, ...
    Computer Methods and Programs in Biomedicine 168, 59-68, 2019
    Citations: 26
  • A deep learning framework to classify breast density with noisy labels regularization
    H Lopez-Almazan, FJ Pérez-Benito, A Larroza, JC Perez-Cortes, M Pollan, ...
    Computer Methods and Programs in Biomedicine 221, 106885, 2022
    Citations: 19
  • Subgrouping factors influencing migraine intensity in women: a semi‐automatic methodology based on machine learning and information geometry
    FJ Pérez‐Benito, JA Conejero, C Saez, JM García‐Gómez, ...
    Pain Practice 20 (3), 297-309, 2020
    Citations: 15
  • Breast dense tissue segmentation with noisy labels: A hybrid threshold-based and mask-based approach
    A Larroza, FJ Pérez-Benito, JC Perez-Cortes, M Román, M Pollán, ...
    Diagnostics 12 (8), 1822, 2022
    Citations: 14
  • Global parenchymal texture features based on histograms of oriented gradients improve cancer development risk estimation from healthy breasts
    FJ Pérez-Benito, F Signol, JC Perez-Cortes, M Pollan, B Perez-Gomez, ...
    Computer Methods and Programs in Biomedicine 177, 123-132, 2019
    Citations: 14
  • Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years
    FJ Pérez-Benito, C Sáez, JA Conejero, S Tortajada, B Valdivieso, ...
    PLoS One 14 (8), e0220369, 2019
    Citations: 12
  • Breast delineation in full-field digital mammography using the segment anything model
    A Larroza, FJ Pérez-Benito, R Tendero, JC Perez-Cortes, M Román, ...
    Diagnostics 14 (10), 1015, 2024
    Citations: 9
  • Divisibility patterns within Pascal divisibility networks
    PA Solares-Hernández, FA Manzano, FJ Pérez-Benito, JA Conejero
    Mathematics 8 (2), 254, 2020
    Citations: 7
  • Three-Blind Validation Strategy of Deep Learning Models for Image Segmentation
    A Larroza, FJ Pérez-Benito, R Tendero, JC Perez-Cortes, M Román, ...
    Journal of Imaging 11 (5), 170, 2025
    Citations: 4
  • Community detection‐based deep neural network architectures: A fully automated framework based on Likert‐scale data
    FJ Pérez‐Benito, JM García‐Gómez, E Navarro‐Pardo, JA Conejero
    Mathematical Methods in the Applied Sciences 43 (14), 8290-8301, 2020
    Citations: 4
  • Breast cancer risk assessment for screening: a hybrid artificial intelligence approach
    R Tendero, A Larroza, FJ Pérez-Benito, JC Perez-Cortes, M Román, ...
    European Radiology 36 (3), 1932-1942, 2026
    Citations: 2
  • Healthcare data heterogeneity and its contribution to machine learning performance
    FJ Pérez Benito
    Healthcare data heterogeneity and its contribution to machine learning …, 2020
    Citations: 1
  • Extended A Priori Probability (EAPP): a data-driven approach for machine learning binary classification tasks
    FJ Pérez-Benito, ODT Catalá, IS Igual, R Llobet, JC Perez-Cortes
    IEEE Access 10, 120074-120085, 2022