Ensemble Machine Learning on Bulk RNA-Seq Identifies 17-Gene Signature Predicting Neoadjuvant Chemotherapy Response in Breast Cancer
Stelios Lamprou, Styliana Georgiou, Triantafyllos Stylianopoulos, Chrysovalantis Voutouri
Current Issues in Molecular Biology, 2026
Predicting neoadjuvant chemotherapy (NAC) response in breast cancer remains critical for optimizing treatment strategies, yet robust predictive biomarkers are lacking. This study implemented an ensemble machine learning approach to identify a gene expression signature predicting pathological complete response (pCR) versus residual disease (RD) using bulk RNA-sequencing data from GSE163882 (138 RD, 80 pCR). We employed TMM normalization with differential expression analysis (250 genes, FDR < 0.05, |log2FC| ≥ 1), ensemble feature selection across five classifiers (Random Forest, Gradient Boosting, SVM, k-NN, and Neural Network) with 10-fold repeated cross-validation, and stacked ensemble development. Consensus selection identified a 17-gene signature consistently ranked across algorithms. The stacked ensemble achieved an AUC of 0.97 on hold-out test data. External validation on the independent GSE240671 cohort (37 pCR, 25 RD) following ComBat batch correction achieved a ROC AUC of 0.78 and a PR AUC of 0.85 with isotonic calibration, with a balanced accuracy of 0.71 and a sensitivity of 0.86 for pCR detection. Pathway enrichment revealed associations with cell cycle regulation (E2F3, MKI67), DNA repair (BRCA2), and transcriptional control (MED1), with six priority genes (MED1, BRCA2, E2F3, PITPNB, H1-1, and FARP2) showing established breast cancer relevance. This externally validated 17-gene signature provides a biologically grounded tool for NAC response prediction in precision oncology.
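The consensus selection step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the exact voting rule, so the rule used here (keep a gene if it appears in the top `top_k` of at least `min_votes` classifiers) is an assumption, as are the function name `consensus_features` and the toy rankings, which are illustrative and not taken from the study.

```python
from collections import Counter


def consensus_features(rankings, top_k, min_votes):
    """Return features ranked in the top_k by at least min_votes models.

    rankings maps a model name to a list of unique feature names, best first.
    This voting rule is an assumption, not the published procedure.
    """
    votes = Counter()
    for ranked in rankings.values():
        votes.update(ranked[:top_k])  # one vote per model per top-k feature
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical per-classifier rankings; the orderings are made up.
rankings = {
    "random_forest": ["MED1", "BRCA2", "E2F3", "MKI67"],
    "gradient_boosting": ["BRCA2", "MED1", "FARP2", "E2F3"],
    "svm": ["E2F3", "PITPNB", "MED1", "BRCA2"],
}

selected = consensus_features(rankings, top_k=3, min_votes=2)
print(selected)
```

Raising `min_votes` tightens the signature toward genes that every classifier agrees on, at the cost of discarding features that only some model families find informative.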
Early Prediction of Adverse Stroke Outcomes Using Nonclinical Factors and Missing Data: A Machine Learning Study
Lucie Tvrda, Kalliopi Mavromati, Stelios Lamprou, Katryna Cisek, Esra Zihni, et al.
Cerebrovascular Diseases, 2026
Introduction: Early prediction of stroke outcomes using prognostic tools may help clinical decision-making and inform resource allocation. However, the clinical information that prediction tools require is often missing. We evaluated the performance of machine learning (ML) prediction models of adverse stroke outcome at 90 days post-admission that exploit nonclinical data and missingness alongside traditional clinical and demographic predictors.
Methods: We used routine hospital data from UK clinical sites (NHS SafeHaven) to train three gradient-boosted models. We compared baseline clinical features with nonclinical features and missingness to predict a composite 90-day adverse stroke outcome: mortality, stroke recurrence, or new care-home discharge. Model validation used 10% of the data. Model performance was evaluated by accuracy (correct predictions/total predictions) and area under the receiver operating characteristic curve (AUC), and DeLong's test was used to compare the performance of the three models. We used the Brier score to evaluate model calibration. SHapley Additive exPlanations (SHAP) analyses determined the contribution of each model feature to predicting adverse stroke outcome.
Results: The final sample included 3,530 stroke patients, 51% of whom were male (mean age = 72 years; SD = 14). Clinical data were incomplete, with five clinical features having >63% missing values. The performance of the three models was not significantly different (p = 0.5–0.9). The model with nonclinical and missingness features demonstrated 71% accuracy and an AUC of 0.76, with a Brier score of 0.19. Nonclinical factors, such as time to clinical assessment and time to admission, were among the five most important predictors of adverse stroke outcome (mean |SHAP| = 0.03 and 0.05), alongside Glasgow Coma Scale (0.08), age (0.03), and temperature (0.02). Missing clinical values (pulse and LDL) predicted adverse stroke outcome (mean |SHAP| = 0.02 and 0.02) and were correlated with age (ρ = 0.2), arrival by ambulance (ρ = 0.3), length of stay (ρ = −0.3), and transient ischaemic attack (ρ = 0.3).
Conclusion: We demonstrate that nonclinical factors and missingness of data can assist in early predictions of 90-day adverse stroke outcomes. As these factors are often well documented in electronic health systems, they could complement or supplement traditional clinical predictive factors.
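The evaluation metrics named in this abstract, and the idea of treating missingness as a feature, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the helper names, the `_missing` suffix, the 0.5 decision threshold, and the toy records are all hypothetical.

```python
def add_missing_indicators(rows, fields):
    """Add a 0/1 '<field>_missing' flag per field (hypothetical naming)."""
    out = []
    for row in rows:
        new = dict(row)
        for f in fields:
            new[f + "_missing"] = 1 if row.get(f) is None else 0
        out.append(new)
    return out


def accuracy(y_true, y_prob, threshold=0.5):
    """Correct predictions / total predictions at an assumed threshold."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy records and predictions, invented for illustration only.
rows = add_missing_indicators(
    [{"pulse": 80, "ldl": None}, {"pulse": None, "ldl": 2.5}],
    ["pulse", "ldl"],
)
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
print(rows)
print(accuracy(y_true, y_prob), roc_auc(y_true, y_prob),
      brier_score(y_true, y_prob))
```

A lower Brier score indicates better-calibrated probabilities, whereas AUC depends only on how the predictions are ranked, which is why the two are reported side by side.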
Identification of hypertension subtypes using microRNA profiles and machine learning
Smarti Reel, Parminder S Reel, Josie Van Kralingen, Casper K Larsen, Stacy Robertson, et al.
European Journal of Endocrinology, 2025
Objective: Hypertension is a major cardiovascular risk factor affecting about 1 in 3 adults. Although the majority of hypertension cases (∼90%) are classified as "primary hypertension" (PHT), endocrine hypertension (EHT) accounts for ∼10% of cases and is caused by underlying conditions such as primary aldosteronism (PA), Cushing's syndrome (CS), and pheochromocytoma or paraganglioma (PPGL). EHT is often misdiagnosed as PHT, leading to delays in treatment for the underlying condition, reduced quality of life, and costly, often ineffective, antihypertensive treatment. MicroRNA (miRNA) circulating in the plasma is emerging as an attractive potential biomarker for various clinical conditions due to its ease of sampling, the accuracy of its measurement, and the correlation of particular disease states with circulating levels of specific miRNAs.
Methods: This study systematically presents the most discriminating circulating miRNA features for classifying and distinguishing EHT and its subtypes (PA, PPGL, and CS) from PHT, using 8 different supervised machine learning (ML) methods for the prediction.
Results: The trained models successfully classified PPGL, CS, and EHT from PHT with an area under the curve (AUC) of 0.9, and PA from PHT with an AUC of 0.8, on the test set. The most prominent circulating miRNA features for hypertension identification across the different disease combinations were hsa-miR-15a-5p and hsa-miR-32-5p.
Conclusions: This study confirms the potential of circulating miRNAs to serve as diagnostic biomarkers for EHT and the viability of ML as a tool for identifying the most informative miRNA species.
Early Prediction of Adverse Stroke Outcomes using Non-clinical Factors and Missing Data: A Machine Learning Study L Tvrda, K Mavromati, S Lamprou, K Cisek, E Zihni, JD Kelleher, TJ Quinn Cerebrovascular Diseases, 2026
Predicting Adverse Stroke Outcomes From Admission Factors and Data Missingness: A Machine Learning Study L Tvrda, K Mavromati, S Lamprou, K Cisek, E Zihni, J Kelleher, T Quinn International Journal of Stroke 20 (3_suppl), 2025
Predicting Amyloid Positivity Through Proteomic and Machine Learning Approaches S Lamprou, FJ Gunn-Moore, K Mavromati, TJ Quinn medRxiv, 2025.10.20.25337968, 2025
Identification of hypertension subtypes using microRNA profiles and machine learning S Reel, PS Reel, J Van Kralingen, CK Larsen, S Robertson, ... European Journal of Endocrinology 192 (4), 418-428, 2025 Citations: 8
Circulating microRNAs as diagnostic biomarkers for endocrine hypertension S Lamprou University of Glasgow, 2025
MicroRNAs in aldosterone production and action SM MacKenzie, LA Birch, S Lamprou, P Rezvanisanijouybari, M Fayad, ... Vitamins and Hormones 124, 137-163, 2024 Citations: 1
FRI135 Differential Expression Of Circulating MicroRNAs In Primary Aldosteronism: Potential Biomarkers For Improved Diagnosis S Lamprou, SM MacKenzie, JD McClure, S Robertson, A Riddell, ... Journal of the Endocrine Society 7 (Suppl 1), A350-A351, 2023