Subset binding enables detection of multimodal patient subgroup patterns and drug target discovery in idiopathic pulmonary fibrosis Yayoi Natsume-Kitatani, Mari N Itoh, Yoshito Takeda, Masataka Kuroda, Haruhiko Hirata, et al. Briefings in Bioinformatics, 2026 Idiopathic pulmonary fibrosis (IPF) is an intractable lung disease with limited therapeutic options that belongs to the idiopathic interstitial pneumonias (IIPs). Conventional patient stratification approaches often fail to integrate diverse data modalities, particularly heterogeneous electronic medical records (EMR) containing mixed discrete and continuous values, with omics data, or fail to extract the interpretable many-to-many relationships crucial for precision medicine. We introduce subset binding (SB), a novel unsupervised algorithm that extends fuzzy association rule mining to robustly integrate heterogeneous clinical data (EMR) and omics data. This framework is uniquely designed to identify clinically meaningful patient subgroup patterns and discover associated molecular signatures based on observable symptoms rather than relying on ambiguous conventional diagnostic categories, such as IIPs. Applying SB to a dataset including 602 samples (from 403 patients with IIPs, including IPF, and 39 healthy controls), we successfully identified 20 proteins linked with key IPF clinical features. Network-based pathway analysis nominated tyrosine kinases as critical drug target candidates, leading to the proposal of ponatinib, a multi-kinase inhibitor, as a candidate therapeutic. Functional validation using a TGF-β-induced epithelial-mesenchymal transition (EMT) model confirmed ponatinib’s ability to at least partially suppress TGF-β-induced EMT. This inhibitory effect is consistent with the anti-fibrotic mechanism of the existing IPF drug, nintedanib, and reinforces prior evidence supporting ponatinib’s anti-fibrotic property. 
This study demonstrates that SB enables transparent, reproducible, and robust molecularly defined patient stratification from multimodal patient data. By establishing a data-driven framework that focuses on observation-based rules, this work lays the critical foundation for future prognostic validation and tailored treatment strategies, offering clinically actionable insights and therapeutic discovery in diagnostically ambiguous diseases like IPF, with ponatinib emerging as a compelling repurposing candidate. Significance statement Idiopathic pulmonary fibrosis (IPF) is a progressive lung disease with limited therapeutic options. IPF is classified as an idiopathic interstitial pneumonia (IIP), but distinguishing it from other similar IIPs is not straightforward. The ambiguities in distinguishing IPF from other IIPs necessitate the identification of molecules associated with specific clinical features, rather than relying solely on diagnosis. Existing methods for multi-omics data analysis often fail to effectively integrate heterogeneous data – such as EMR (containing mixed discrete and continuous values) and omics – or to extract many-to-many molecular-phenotypic relationships. We developed subset binding (SB), a novel, interpretable unsupervised machine learning method to specifically address these technical limitations by integrating EMR and omics data. Our approach successfully detected proteins in serum extracellular vesicles associated with IPF-related features, highlighted several tyrosine kinases as potential drug targets, and proposed the multi-kinase inhibitor ponatinib as a compelling candidate for drug repurposing. This data-driven framework establishes a scalable and interpretable foundation for biomarker and drug target discovery for intractable diseases whose mechanisms are not fully understood.
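The abstract above describes SB as an extension of fuzzy association rule mining over mixed discrete and continuous values. The sketch below is not the SB algorithm, whose details are not given here; it only illustrates the generic fuzzy support and confidence measures such methods commonly build on. The function names, the ramp membership function, the min t-norm, and the toy values are all assumptions made for illustration.

```python
def ramp_membership(value, low, high):
    """Degree in [0, 1] to which a continuous value counts as 'high'.

    Hypothetical membership function: 0 at or below `low`,
    1 at or above `high`, linear in between.
    """
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_support(items):
    """Mean over samples of the minimum membership across items (min t-norm)."""
    n_samples = len(items[0])
    total = sum(min(item[i] for item in items) for i in range(n_samples))
    return total / n_samples


def fuzzy_confidence(antecedent, consequent):
    """Fuzzy confidence of the rule antecedent -> consequent."""
    base = fuzzy_support(antecedent)
    if base == 0:
        return 0.0
    return fuzzy_support(antecedent + consequent) / base


# Toy values, invented for illustration only (not study data).
fine_crackles = [1.0, 1.0, 0.0, 0.0]   # discrete clinical finding (present/absent)
protein_level = [8.0, 5.0, 2.0, 6.0]   # continuous omics measurement
protein_high = [ramp_membership(v, low=2.0, high=8.0) for v in protein_level]

support = fuzzy_support([fine_crackles, protein_high])
confidence = fuzzy_confidence([fine_crackles], [protein_high])
print(f"support={support:.3f} confidence={confidence:.3f}")
```

Treating a discrete finding as a membership of 0 or 1 and a continuous measurement as a graded membership is one simple way to put both data types on the same scale; other membership functions or t-norms would be equally valid choices.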
Serum vesicle biomarkers reflect the disease activity of idiopathic pulmonary fibrosis Yuya Shirai, Takatoshi Enomoto, Yoshito Takeda, Ryuya Edahiro, Miho Takahashi-Itoh, et al. Journal of Translational Medicine, 2025 Background Idiopathic pulmonary fibrosis (IPF) is a heterogeneous disease caused by an interplay of genetic and environmental factors. Biomarkers that reflect the progression of fibrosis are required for the management of IPF. Methods We extracted serum extracellular vesicles from a discovery cohort (127 IPF patients and 34 controls) and a validation cohort (20 IPF patients and 22 controls). Non-targeted proteomic analysis was performed by a data-independent acquisition method. We investigated the proteomic profiles in relation to multiple clinical parameters associated with IPF. To further evaluate the biological relevance of the identified biomarkers, we analyzed publicly available single-cell RNA sequencing datasets of lung tissue and conducted immunochemical validation using our collected lung samples. Results We obtained 2420 protein profiles in serum extracellular vesicles and identified 19 IPF-associated proteins; their expressions were significantly lung-specific. Protein module analyses revealed that the upstream components of the complement system were increased in IPF. These IPF-associated proteins were involved in various IPF-associated genes and heterogeneously increased in IPF patients. Notably, surfactant protein B (SFTPB) not only showed superior diagnostic performance over the existing marker but was also significantly associated with progressive disease activity, such as the extent of fibrosis and decline in lung function. Furthermore, single-cell RNA-sequencing analysis revealed that SFTPB was associated with the TGF-β/SMAD pathway in SCGB3A2 + cells in IPF lungs. SFTPB expression in SCGB3A2 + cells was confirmed by immunostaining. 
Conclusions Serum extracellular vesicles could capture heterogenetic fibrotic profiles in IPF, and SFTPB can be a promising biomarker reflecting the disease activity.
Homology-feature-assisted quantification of fibrotic lesions in computed tomography images: a proof of concept for CT image feature-based prediction for gene-expression-distribution Kentaro Doi, Hodaka Numasaki, Yusuke Anetai, Yayoi Natsume-Kitatani International Journal of Computer Assisted Radiology and Surgery, 2025 Purpose Computed tomography (CT) imaging is promising for diagnosing idiopathic interstitial pneumonias (IIPs); however, quantification of IIP lesions in CT images is required. This study aimed to quantitatively evaluate fibrotic lesions in CT images using homology-based image analysis. Methods We collected publicly available CT images comprising 47 fibrotic images and 36 non-fibrotic images. The homology-profile (HP) image analysis method provides b0 and b1 profiles, indicating the number of isolated components and holes in a binary image. We locally applied the HP method to the CT image and generated homology-based feature (HF) maps as resultant images. The collected images were randomly divided into the tuning dataset and the testing dataset. The cut-off value for classifying the HF map for fibrotic or non-fibrotic images was defined using receiver operating characteristic (ROC) analysis with the tuning dataset. This cut-off value was evaluated using the testing dataset with accuracy, sensitivity, specificity, and precision. Results We successfully visualized the quantification of fibrotic lesions in the HF map. The b0 HF map was more suitable for quantifying fibrotic lesions than b1. The mean cut-off value of the b0 HF map was 199, with all performance metrics reaching 1.0. Furthermore, the classification of the b0 HF map for fibrotic or lung cancer images also reached 1.0 on all performance metrics. Conclusion This study demonstrated the feasibility of using the HF in quantitatively evaluating fibrotic lesions in CT images. 
Our proposed HP-based method is also promising for quantifying the fibrotic lesions of patients with IIPs and could assist in the diagnosis of IIPs.
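The abstract above defines b0 and b1 as the number of isolated components and holes in a binary image. The following minimal sketch counts both for a single small binary image with a flood fill. It is not the authors' HP implementation: the function names are hypothetical, and the connectivity convention (8-connected foreground, 4-connected background) is an assumption.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, target, neighbors):
    """Count connected regions of cells equal to `target` using BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(binary):
    """Return (b0, b1) for a 2D binary image given as a list of 0/1 rows.

    Assumed convention: foreground is 8-connected, background is 4-connected.
    The image is padded with background so the outside forms one region;
    every other background region is an enclosed hole.
    """
    cols = len(binary[0])
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in binary]
              + [[0] * (cols + 2)])
    b0 = count_components(padded, 1, N8)
    b1 = count_components(padded, 0, N4) - 1
    return b0, b1


# Toy image: one ring (a component enclosing a hole) plus one isolated pixel.
image = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
print(betti_numbers(image))
```

Applying such a count inside a sliding window, and over a range of binarization thresholds, is one plausible way to obtain local profiles like the HF maps described above, but that step is not shown here.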
Idiopathic pulmonary fibrosis-specific Bayesian network integrating extracellular vesicle proteome and clinical information Mei Tomoto, Yohei Mineharu, Noriaki Sato, Yoshinori Tamada, Mari Nogami-Itoh, et al. Scientific Reports, 2024 Idiopathic pulmonary fibrosis (IPF) is a progressive disease characterized by severe lung fibrosis and a poor prognosis. Although the biomolecules related to IPF have been extensively studied, molecular mechanisms of the pathogenesis and their association with serum biomarkers and clinical findings have not been fully elucidated. We constructed a Bayesian network using multimodal data consisting of a proteome dataset from serum extracellular vesicles, laboratory examinations, and clinical findings from 206 patients with IPF and 36 controls. Differential protein expression analysis was also performed by edgeR and incorporated into the constructed network. We have successfully visualized the relationship between biomolecules and clinical findings with this approach. The IPF-specific network included modules associated with TGF-β signaling (TGFB1 and LRC32), fibrosis-related (A2MG and PZP), myofibroblast and inflammation (LRP1 and ITIH4), complement-related (SAA1 and SAA2), as well as serum markers, and clinical symptoms (KL-6, SP-D and fine crackles). Notably, it identified SAA2 associated with lymphocyte counts and PSPB connected with the serum markers KL-6 and SP-D, along with fine crackles as clinical manifestations. These results contribute to the elucidation of the pathogenesis of IPF and potential therapeutic targets.
Correlation of CT-based radiomics analysis with pathological cellular infiltration in fibrosing interstitial lung diseases Akira Haga, Tae Iwasawa, Toshihiro Misumi, Koji Okudela, Tsuneyuki Oda, et al. Japanese Journal of Radiology, 2024 Purpose We aimed to identify computed tomography (CT) radiomics features that are associated with cellular infiltration and construct CT radiomics models predictive of cellular infiltration in patients with fibrotic ILD. Materials and methods CT images of patients with ILD who underwent surgical lung biopsy (SLB) were analyzed. Radiomics features were extracted using artificial intelligence-based software and PyRadiomics. We constructed a model predicting cell counts in histological specimens, and another model predicting two classifications of higher or lower cellularity. We tested these models using external validation. Results Overall, 100 patients (mean age: 62 ± 8.9 [standard deviation] years; 61 men) were included. The CT radiomics model used to predict cell count in 140 histological specimens predicted the actual cell count in 59 external validation specimens (root-mean-square error: 0.797). The two-classification model’s accuracy was 70% and the F1 score was 0.73 in the external validation dataset including 30 patients. Conclusion The CT radiomics-based model developed in this study provided useful information regarding the cellular infiltration in the ILD with good correlation with SLB specimens.
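The abstract above reports a root-mean-square error for the cell-count model and accuracy and F1 score for the two-class model. As a reminder of how these quantities are defined, here is a minimal sketch using the standard formulas; the function names and the toy labels are assumptions for illustration and are not the study's data.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy_and_f1(actual, predicted):
    """Accuracy and F1 score for binary labels (1 = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    correct = sum(1 for a, p in pairs if a == p)
    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy values, invented for illustration only.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
acc, f1 = accuracy_and_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(f"rmse={error:.3f} accuracy={acc:.2f} f1={f1:.3f}")
```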
SFTPB in serum extracellular vesicles as a biomarker of progressive pulmonary fibrosis Takatoshi Enomoto, Yuya Shirai, Yoshito Takeda, Ryuya Edahiro, Shigeyuki Shichino, et al. JCI Insight, 2024 Progressive pulmonary fibrosis (PPF), defined as the worsening of various interstitial lung diseases (ILDs), currently lacks useful biomarkers. To identify novel biomarkers for early detection of patients at risk of PPF, we performed a proteomic analysis of serum extracellular vesicles (EVs). Notably, the identified candidate biomarkers were enriched for lung-derived proteins participating in fibrosis-related pathways. Among them, pulmonary surfactant-associated protein B (SFTPB) in serum EVs could predict ILD progression better than the known biomarkers, serum KL-6 and SP-D, and it was identified as a prognostic factor independent of the ILD-gender-age-physiology index. Subsequently, the utility of SFTPB for predicting ILD progression was evaluated further in 2 cohorts using serum EVs and serum, respectively, suggesting that SFTPB in serum EVs but not in serum was helpful. Among SFTPB forms, pro-SFTPB levels were increased in both serum EVs and lungs of patients with PPF compared with those of the control. Consistently, in a mouse model, the levels of pro-SFTPB, primarily originating from alveolar epithelial type 2 cells, were increased similarly in serum EVs and lungs, reflecting pro-fibrotic changes in the lungs, as supported by single-cell RNA sequencing. SFTPB, especially its pro-form, in serum EVs could serve as a biomarker for predicting ILD progression.
BiomedCurator: Data Curation for Biomedical Literature Mohammad Golam Sohrab, Khoa N.A. Duong, Ikeda Masami, Goran Topić, Yayoi Natsume-Kitatani, et al. Proceedings of the 2nd Conference of the Asia Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing Long Paper AACL-IJCNLP 2022, 2022