@purdue.edu
Research Assistant Professor, Weldon School of Biomedical Engineering
Purdue University
Surgery, Computer Science Applications, Biomedical Engineering
Matthias Carstens, Shubha Vasisht, Zheyuan Zhang, Iulia Barbur, Annika Reinke, Lena Maier-Hein, Daniel A. Hashimoto, and Fiona R. Kolbinger
Springer Science and Business Media LLC
Abstract: Surgical scene understanding (SSU) uses artificial intelligence (AI) to interpret visual data from surgeries, such as laparoscopic videos. Despite promising foundational research on instrument and anatomy recognition, clinical adoption remains minimal. This systematic review and meta-analysis (PROSPERO: CRD420251005301) evaluates current SSU research in minimally invasive abdominal surgery, focusing on data curation, model design, validation, reporting standards, and clinical relevance. A total of 188 studies were reviewed. Most relied on small, single-center datasets (70.7%), primarily laparoscopic cholecystectomies (59.0%), reflecting an overall narrow topical breadth. Validation practices were often weak, rarely involving external datasets (10.1%) or clinical experts. Few studies addressed clinical translation (5.9%), estimated model performance variability (38.3%), or made code available (29.8%). Overall, limited progress toward clinical integration has been made over the past decade. Our findings highlight the need for diverse, multi-institutional datasets, robust validation practices, and clinically driven development to unlock the full potential of SSU in surgical practice.
Rick H. Overwijk, Fiona R. Kolbinger, Mark Eijgelsheim, Olaf M. Dekkers, Andreas Kronbichler, and Ingeborg M. Bajema
Elsevier BV
Danush Kumar Venkatesh, Isabel Funke, Micha Pfeiffer, Fiona Kolbinger, Hanna Maria Schmeiser, Marius Distler, Jürgen Weitz, and Stefanie Speidel
Springer Nature Switzerland
Fiona R. Kolbinger, Omar S. M. El Nahhas, Maja Carina Nackenhorst, Christine Brostjan, Wolf Eilenberg, Albert Busch, and Jakob Nikolas Kather
Springer Science and Business Media LLC
Fiona R. Kolbinger and Jakob Nikolas Kather
Springer Science and Business Media LLC
Matthias Carstens, Micha Pfeiffer, Stefanie Speidel, Marius Distler, Jürgen Weitz, and Fiona R. Kolbinger
Springer Science and Business Media LLC
Abstract: Artificial intelligence (AI) holds enormous potential for surgery. Fields of application range from interdisciplinary therapy stratification and support for surgical planning to decision support in the operating room, which is the focus of this article. Artificial neural networks for the analysis of surgical videos can improve surgical safety, efficiency, and predictability. Prerequisites for this are high-quality, diverse (meta)data, whose annotation, training, and validation pose complex requirements. Despite technical progress, clinical implementation has so far often failed due to a lack of data standardization, insufficient infrastructure, regulatory hurdles, and ethical uncertainties. Many models remain black boxes, which hampers acceptance and trust. In addition, systems must be robust, transparent, and practical to integrate into clinical workflows. To advance the clinical translation of AI in surgery, consistent data collection strategies, privacy-compliant learning methods, explainable AI, and human-in-the-loop approaches are crucial. Regulatory frameworks such as the EU Medical Device Regulation, the German Medical Devices Implementation Act (Medizinprodukterecht-Durchführungsgesetz), and the EU AI Act must also be further developed with AI-specific provisions for the medical and, in particular, the interventional domain in order to enable safe, interdisciplinary assistance technologies in the operating room that meaningfully complement everyday surgical practice.
Sophie-Caroline Schwarzkopf, Jean-Paul Bereuter, Mark Enrik Geissler, Jürgen Weitz, Marius Distler, and Fiona R. Kolbinger
Public Library of Science (PLoS)
Managing postoperative complications is an essential part of surgical care and largely depends on the medical team’s experience. Large Language Models (LLMs) have demonstrated immense potential in supporting medical professionals. To evaluate the potential of LLMs in surgical patient care, we compared the performance of three state-of-the-art LLMs in managing postoperative complications to that of a panel of medical professionals based on six postsurgical patient cases. Six realistic postoperative patient cases were queried using GPT-3, GPT-4, and Gemini-Advanced and presented to human surgical caregivers. Humans and LLMs provided a triage assessment, an initial suspected diagnosis, and an acute management plan, including initial diagnostic and therapeutic measures. Responses were compared based on medical contextual correctness, coherence, and completeness. In comparison to human caregivers, GPT-3 and GPT-4 demonstrated considerable competence in correctly identifying postoperative complications (humans: 76.3% vs. GPT-3: 75.0% vs. GPT-4: 96.7%, p = 0.47) as well as in triaging patients accordingly (humans: 84.8% vs. GPT-3: 50% vs. GPT-4: 38.3%, p = 0.19). With regard to diagnostic and therapeutic management of postoperative complications, GPT-3 and GPT-4 provided comprehensive management plans. Gemini-Advanced often provided no diagnostic or therapeutic recommendations and censored its outputs. In summary, LLMs can accurately interpret postoperative care scenarios and provide comprehensive management recommendations. These results showcase improvements in LLM performance on postoperative surgical use cases and provide evidence of their potential value in supporting and augmenting routine surgical care.
Muhammad Ibtsaam Qadir, Jackson A. Baril, Michele T. Yip-Schneider, Duane Schonlau, Thi Thanh Thoa Tran, C. Max Schmidt, and Fiona R. Kolbinger
Public Library of Science (PLoS)
Based on the Fukuoka and Kyoto international consensus guidelines, the current clinical management of intraductal papillary mucinous neoplasm (IPMN) largely depends on imaging features. While these criteria are highly sensitive in detecting high-risk IPMN, they lack specificity, resulting in surgical overtreatment. Artificial Intelligence (AI)-based medical image analysis has the potential to augment the clinical management of IPMNs by improving diagnostic accuracy. Based on a systematic review of the academic literature on AI in IPMN imaging, 1041 publications were identified, of which 25 published studies were included in the analysis. The studies were stratified based on prediction target, underlying data type and imaging modality, patient cohort size, and stage of clinical translation and were subsequently analyzed to identify trends and gaps in the field. Research on AI in IPMN imaging has been increasing in recent years. The majority of studies utilized CT imaging to train computational models. Most studies presented computational models developed on single-center datasets (n = 11, 44%) and included fewer than 250 patients (n = 18, 72%). Methodologically, convolutional neural network (CNN)-based algorithms were most commonly used. Thematically, most studies reported models augmenting differential diagnosis (n = 9, 36%) or risk stratification (n = 10, 40%) rather than IPMN detection (n = 5, 20%) or IPMN segmentation (n = 2, 8%). This systematic review provides a comprehensive overview of the research landscape of AI in IPMN imaging. Computational models have the potential to enhance the accurate and precise stratification of patients with IPMN. Multicenter collaboration and datasets comprising various modalities are necessary to fully realize this potential, alongside concerted efforts towards clinical translation.
Sebastian Hempel, Fiona R. Kolbinger, Florian Oehme, Olga Radulova-Mauersberger, Janine Schmid, Undine Schubert, Florian Schepp, Stefan Bornstein, Sandra Korn, Evelyn Trips, et al.
Public Library of Science (PLoS)
Introduction: Pancreatic surgery remains associated with significant morbidity. Pancreatoduodenectomy (PD) with high-risk stigmata for postoperative pancreatic fistula (POPF) may delay or hinder administration of adjuvant therapy. Total pancreatectomy (TP) prevents POPF-associated complications but implies permanent exocrine and endocrine insufficiency. Islet autotransplantation (IAT) has the potential to compensate for the loss of endocrine function. Methods and analysis: XANDTX is a single-centre randomized controlled pilot trial comparing high-risk PD with TP and simultaneous IAT in patients with periampullary cancer. After screening for eligibility and obtaining informed consent, a total of 32 adult patients will be intraoperatively randomized in a 1:1 ratio. The primary hypothesis is that TP with IAT prevents POPF-associated complications and leads to a shorter interval before initiation of adjuvant therapy as well as a higher overall rate of adjuvant therapy administration. Secondary endpoints include perioperative morbidity and mortality, metabolic outcome, quality of life (QoL), and long-term oncological outcome. Each patient will be followed up for 5 years. Discussion: The XANDTX pilot trial aims to provide surgical and oncological feasibility and safety data on total pancreatectomy with simultaneous islet autotransplantation in the management of resectable periampullary cancer. The results will form the basis for a further confirmatory controlled study. Trial registration: This study was registered on ClinicalTrials.gov (NCT05843877) on February 27, 2023, and on EudraCT (2023-507773-17-00) on April 18, 2024.
Fiona R. Kolbinger, Nithya Bhasker, Felix Schön, Daniel Cser, Alex Zwanenburg, Steffen Löck, Sebastian Hempel, André Schulze, Nadiia Skorobohach, Hanna M. Schmeiser, et al.
Ovid Technologies (Wolters Kluwer Health)
Background: The risk of postoperative pancreatic fistula (POPF), one of the most dreaded complications after pancreatic surgery, can be predicted from preoperative imaging and tabular clinical routine data. However, existing studies suffer from limited clinical applicability due to a need for manual data annotation and a lack of external validation. We propose AutoFRS (automated fistula risk score software), an externally validated end-to-end prediction tool for POPF risk stratification based on multimodal preoperative data. Materials and methods: We trained AutoFRS on preoperative contrast-enhanced computed tomography imaging and clinical data from 108 patients undergoing pancreatic head resection and validated it on an external cohort of 61 patients. Prediction performance was assessed using the area under the receiver operating characteristic curve (AUC) and balanced accuracy. In addition, model performance was compared to the updated alternative fistula risk score (ua-FRS), the current clinical gold standard method for intraoperative POPF risk stratification. Results: AutoFRS achieved an AUC of 0.81 and a balanced accuracy of 0.72 in internal validation, and an AUC of 0.79 and a balanced accuracy of 0.70 in external validation. In a patient subset with documented intraoperative POPF risk factors, AutoFRS (AUC: 0.84 ± 0.05) performed on par with the ua-FRS (AUC: 0.85 ± 0.06). The AutoFRS web application facilitates annotation-free prediction of POPF from preoperative imaging and clinical data based on the AutoFRS prediction model. Conclusion: POPF can be predicted from multimodal clinical routine data without human data annotation, automating the risk prediction process. We provide additional evidence of the clinical feasibility of preoperative POPF risk stratification and introduce a software pipeline for future prospective evaluation.
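The two evaluation metrics reported above can be illustrated with a minimal sketch: the snippet below shows how AUC and balanced accuracy would typically be computed with scikit-learn for a binary POPF risk classifier. The labels, scores, and the 0.5 decision threshold are hypothetical and are not taken from the study.

```python
# Illustrative only: computing AUC and balanced accuracy for a hypothetical
# binary POPF risk classifier. Labels, scores, and threshold are assumptions.
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

y_true = [0, 0, 1, 1, 0, 1]                    # hypothetical POPF labels (1 = fistula)
y_score = [0.2, 0.4, 0.8, 0.6, 0.3, 0.9]       # hypothetical predicted risk scores

auc = roc_auc_score(y_true, y_score)           # threshold-free ranking metric
y_pred = [int(s >= 0.5) for s in y_score]      # assumed 0.5 decision threshold
bacc = balanced_accuracy_score(y_true, y_pred) # mean of sensitivity and specificity
print(f"AUC = {auc:.2f}, balanced accuracy = {bacc:.2f}")
```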
Jean-Paul Bereuter, Mark Enrik Geissler, Anna Klimova, Robert-Patrick Steiner, Kevin Pfeiffer, Fiona R. Kolbinger, Isabella C. Wiest, Hannah Sophie Muti, and Jakob Nikolas Kather
Elsevier BV
Muhammad Ibtsaam Qadir, Ravi S. Hira, and Fiona R. Kolbinger
Oxford University Press (OUP)
Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, and Stefanie Speidel
IEEE
Mark Enrik Geissler, Merle Goeben, Kira A. Glasmacher, Jean-Paul Bereuter, Rona Berit Geissler, Isabella C. Wiest, Fiona R. Kolbinger, and Jakob Nikolas Kather
Deutscher Ärzte-Verlag GmbH
Fiona R. Kolbinger, Gregory P. Veldhuizen, Jiefu Zhu, Daniel Truhn, and Jakob Nikolas Kather
Springer Science and Business Media LLC
Background: The field of Artificial Intelligence (AI) holds transformative potential in medicine. However, the lack of universal reporting guidelines poses challenges in ensuring the validity and reproducibility of published research studies in this field. Methods: Based on a systematic review of academic publications and of reporting standards demanded by international consortia, regulatory stakeholders, and leading journals in the fields of medicine and medical informatics, 26 reporting guidelines published between 2009 and 2023 were included in this analysis. Guidelines were stratified by breadth (general or specific to medical fields), underlying consensus quality, and target research phase (preclinical, translational, clinical) and subsequently analyzed regarding the overlap and variations in guideline items. Results: AI reporting guidelines for medical research vary with respect to the quality of the underlying consensus process, breadth, and target research phase. Some guideline items, such as reporting of study design and model performance, recur across guidelines, whereas other items are specific to particular fields and research stages. Conclusions: Our analysis highlights the importance of reporting guidelines in clinical AI research and underscores the need for common standards that address the identified variations and gaps in current guidelines. Overall, this comprehensive overview could help researchers and public stakeholders reinforce quality standards for increased reliability, reproducibility, clinical validity, and public trust in AI research in healthcare. This could facilitate the safe, effective, and ethical translation of AI methods into clinical applications that will ultimately improve patient outcomes.
Fiona R. Kolbinger, Sebastian Bodenstedt, Matthias Carstens, Stefan Leger, Stefanie Krell, Franziska M. Rinner, Thomas P. Nielen, Johanna Kirchberg, Johannes Fritzmann, Jürgen Weitz, et al.
Elsevier BV
Omar S. M. El Nahhas, Chiara M. L. Loeffler, Zunamys I. Carrero, Marko van Treeck, Fiona R. Kolbinger, Katherine J. Hewitt, Hannah S. Muti, Mara Graziani, Qinghe Zeng, Julien Calderaro, et al.
Springer Science and Business Media LLC
Omar S. M. El Nahhas, Chiara M. L. Loeffler, Zunamys I. Carrero, Marko van Treeck, Fiona R. Kolbinger, Katherine J. Hewitt, Hannah S. Muti, Mara Graziani, Qinghe Zeng, Julien Calderaro, et al.
Springer Science and Business Media LLC
Abstract: Deep Learning (DL) can predict biomarkers from cancer histopathology. Several clinically approved applications use this technology. Most approaches, however, predict categorical labels, whereas biomarkers are often continuous measurements. We hypothesize that regression-based DL outperforms classification-based DL. Therefore, we develop and evaluate a self-supervised attention-based weakly supervised regression method that predicts continuous biomarkers directly from 11,671 images of patients across nine cancer types. We test our method for multiple clinically and biologically relevant biomarkers: the homologous recombination deficiency score, a clinically used pan-cancer biomarker, as well as markers of key biological processes in the tumor microenvironment. Using regression significantly enhances the accuracy of biomarker prediction, while also improving the predictions’ correspondence to regions of known clinical relevance over classification. In a large cohort of colorectal cancer patients, regression-based prediction scores provide a higher prognostic value than classification-based scores. Our open-source regression approach offers a promising alternative for continuous biomarker analysis in computational pathology.
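The abstract describes attention-based, weakly supervised regression of continuous biomarkers from tile-level image features. As a rough, hypothetical sketch (not the authors' implementation), the PyTorch snippet below shows attention-weighted pooling of tile features followed by a linear regression head trained with a mean-squared-error loss; the feature dimension, layer sizes, and the example target value are assumptions.

```python
# Hypothetical sketch of attention-based weakly supervised regression over tile
# features. Dimensions and the regression target are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionRegressor(nn.Module):
    def __init__(self, feat_dim=768, hidden=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.head = nn.Linear(feat_dim, 1)  # continuous biomarker output

    def forward(self, tiles):  # tiles: (n_tiles, feat_dim) for one slide
        weights = torch.softmax(self.attention(tiles), dim=0)   # (n_tiles, 1)
        slide_embedding = (weights * tiles).sum(dim=0)          # attention-weighted pooling
        return self.head(slide_embedding).squeeze(-1), weights

model = AttentionRegressor()
tiles = torch.randn(500, 768)        # hypothetical self-supervised tile features
pred, attn = model(tiles)
loss = nn.functional.mse_loss(pred, torch.tensor(42.0))  # hypothetical continuous label
loss.backward()
```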
Alexander C. Jenke, Sebastian Bodenstedt, Fiona R. Kolbinger, Marius Distler, Jürgen Weitz, and Stefanie Speidel
Springer Science and Business Media LLC
Purpose: Understanding surgical scenes is crucial for computer-assisted surgery systems to provide intelligent assistance functionality. One way of achieving this is via scene segmentation using machine learning (ML). However, such ML models require large amounts of annotated training data, containing examples of all relevant object classes, which are rarely available. In this work, we propose a method to combine multiple partially annotated datasets, providing complementary annotations, into one model, enabling better scene segmentation and the use of multiple readily available datasets. Methods: Our method aims to combine available data with complementary labels by leveraging mutual exclusivity to maximize information. Specifically, we propose to use positive annotations of other classes as negative samples and to exclude background pixels of these binary annotations, as we cannot tell whether a positive prediction by the model is correct in these regions. Results: We evaluate our method by training a DeepLabV3 model on the publicly available Dresden Surgical Anatomy Dataset, which provides multiple subsets of binary segmented anatomical structures. Our approach successfully combines six classes into one model, significantly increasing the overall Dice score by 4.4% compared to an ensemble of models trained on the classes individually. By including information on multiple classes, we were able to reduce the confusion between classes, e.g., a 24% drop for stomach and colon. Conclusion: By leveraging multiple datasets and applying mutual exclusion constraints, we developed a method that improves surgical scene segmentation performance without the need for fully annotated datasets. Our results demonstrate the feasibility of training a model on multiple complementary datasets. This paves the way for future work that further alleviates the need for a single large, fully segmented dataset by instead making use of already existing datasets.
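To illustrate the mutual-exclusion idea described above (pixels annotated positive for one class serve as negatives for all other classes, while unlabeled background pixels are excluded from the loss for those other classes), the following is a minimal, hypothetical PyTorch loss. The tensor shapes, the six-class setting, and the binary cross-entropy formulation are assumptions for this sketch, not the published implementation.

```python
# Hypothetical sketch of a loss for combining partially (binary) annotated data.
import torch
import torch.nn.functional as F

def partial_label_loss(logits, binary_mask, annotated_class, n_classes=6):
    """logits: (n_classes, H, W); binary_mask: (H, W), 1 where annotated_class is present."""
    positive = binary_mask == 1
    loss = torch.tensor(0.0)
    for c in range(n_classes):
        if c == annotated_class:
            # the binary annotation is complete for its own class: use all pixels
            per_pixel = F.binary_cross_entropy_with_logits(
                logits[c], positive.float(), reduction="none")
            loss = loss + per_pixel.mean()
        else:
            # positive pixels of the annotated class are reliable negatives for every
            # other class; unlabeled background pixels are excluded (true class unknown)
            per_pixel = F.binary_cross_entropy_with_logits(
                logits[c], torch.zeros_like(logits[c]), reduction="none")
            loss = loss + per_pixel[positive].mean()
    return loss / n_classes

logits = torch.randn(6, 64, 64, requires_grad=True)  # hypothetical per-class logits
mask = (torch.rand(64, 64) > 0.7).long()             # hypothetical binary annotation
partial_label_loss(logits, mask, annotated_class=2).backward()
```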
Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, Marius Distler, Jürgen Weitz, and Stefanie Speidel
Springer Science and Business Media LLC
Purpose: In surgical computer vision applications, data privacy and expert annotation challenges impede the acquisition of labeled training data. Unpaired image-to-image translation techniques have been explored to automatically generate annotated datasets by translating synthetic images into a realistic domain. The preservation of structure and semantic consistency, i.e., the per-class distribution during translation, poses a significant challenge, particularly in cases of semantic distributional mismatch. Method: This study empirically investigates various translation methods for generating data in surgical applications, explicitly focusing on semantic consistency. Through our analysis, we introduce a novel and simple combination of effective approaches, which we call ConStructS. The defined losses within this approach operate on multiple image patches and spatial resolutions during translation. Results: Various state-of-the-art models were extensively evaluated on two challenging surgical datasets. Using two different evaluation schemes, we assessed the semantic consistency of the translated images and their usefulness for downstream semantic segmentation tasks. The results demonstrate the effectiveness of ConStructS in minimizing semantic distortion, with images generated by this model showing superior utility for downstream training. Conclusion: In this study, we tackle semantic inconsistency in unpaired image translation for surgical applications with minimal labeled data. The simple model (ConStructS) enhances consistency during translation and serves as a practical way of generating fully labeled and semantically consistent datasets at minimal cost. Our code is available at https://gitlab.com/nct_tso_public/constructs.
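The abstract only states that the ConStructS losses operate on multiple image patches and spatial resolutions; the snippet below is a generic, hypothetical sketch of such a multi-scale patch loss (here a simple L1 distance between randomly sampled source and translated patches), not the actual ConStructS objective, which is available in the linked repository. The patch size, number of patches, scales, and the L1 criterion are all assumptions.

```python
# Generic, hypothetical multi-scale patch loss; not the ConStructS implementation.
import torch
import torch.nn.functional as F

def multi_scale_patch_loss(source, translated, patch_size=32, n_patches=8, scales=(1.0, 0.5)):
    """source, translated: (B, C, H, W) tensors of the same spatial size."""
    total = torch.tensor(0.0)
    for scale in scales:
        s = F.interpolate(source, scale_factor=scale, mode="bilinear", align_corners=False)
        t = F.interpolate(translated, scale_factor=scale, mode="bilinear", align_corners=False)
        _, _, h, w = s.shape
        for _ in range(n_patches):
            y = torch.randint(0, h - patch_size + 1, (1,)).item()
            x = torch.randint(0, w - patch_size + 1, (1,)).item()
            total = total + F.l1_loss(s[:, :, y:y + patch_size, x:x + patch_size],
                                      t[:, :, y:y + patch_size, x:x + patch_size])
    return total / (len(scales) * n_patches)

src = torch.randn(2, 3, 128, 128)                       # hypothetical source images
trans = torch.randn(2, 3, 128, 128, requires_grad=True) # hypothetical translated images
multi_scale_patch_loss(src, trans).backward()
```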
Xinne Zhao, Fiona R. Kolbinger, Marius Distler, Jürgen Weitz, Denys Makarov, Michael Bachmann, and Larysa Baraban
Elsevier BV
Fiona R. Kolbinger, Jiangpeng He, Jinge Ma, and Fengqing Zhu
IEEE
Reuben Docea, Jinjing Xu, Wei Ling, Alexander C. Jenke, Fiona R. Kolbinger, Marius Distler, Carina Riediger, Jürgen Weitz, Stefanie Speidel, and Micha Pfeiffer
Institute of Electrical and Electronics Engineers (IEEE)
Dominik Rivoir, Martin Wagner, Sebastian Bodenstedt, Keno März, Fiona Kolbinger, Lena Maier-Hein, Silvia Seidlitz, Johanna Brandenburg, Beat Peter Müller-Stich, Marius Distler, et al.
Springer Nature Switzerland
Philipp Gauckler, Jana S. Kesenheimer, Johannes Leierer, Maren Kruus, Michael Schreinlechner, Fabian Theurl, Axel Bauer, Sara Denicolò, Alexander Egger, Beata Seeber, et al.
Elsevier BV