Andre Lamurias

Scopus Publications

Text Mining for Bioinformatics Using Biomedical Literature
Andre Lamurias, Diana F. Sousa, Francisco M. Couto
Encyclopedia of Bioinformatics and Computational Biology, 2025
Semantic Similarity Definition
Francisco M. Couto, Andre Lamurias, Pedro Ruas
Encyclopedia of Bioinformatics and Computational Biology, 2025
Team NOVA LINCS @ BIOASQ12 MultiCardioNER Track: Entity Recognition with Additional Entity Types
CEUR Workshop Proceedings, 2024
DGH-GO: dissecting the genetic heterogeneity of complex diseases using gene ontology
Muhammad Asif, Hugo F. M. C. Martiniano, Andre Lamurias, Samina Kausar, Francisco M. Couto
BMC Bioinformatics, 2023
Background: Complex diseases such as neurodevelopmental disorders (NDDs) exhibit multiple etiologies. The multi-etiological nature of complex diseases emerges from distinct but functionally similar groups of genes. Different diseases sharing genes from such groups show related clinical outcomes, which further restricts our understanding of disease mechanisms and thus limits the application of personalized medicine approaches to complex genetic disorders. Results: Here, we present an interactive and user-friendly application called DGH-GO. DGH-GO allows biologists to dissect the genetic heterogeneity of complex diseases by stratifying the putative disease-causing genes into clusters that may contribute to the development of distinct disease outcomes. It can also be used to study the shared etiology of complex diseases. DGH-GO creates a semantic similarity matrix for the input genes using the Gene Ontology (GO). The resulting matrix can be visualized in 2D plots using different dimension reduction methods (t-SNE, principal component analysis, UMAP and principal coordinate analysis). In the next step, clusters of functionally similar genes are identified from the genes' functional similarities as assessed through GO. This is achieved by employing four different clustering methods (K-means, hierarchical, fuzzy and PAM). The user may change the clustering parameters and immediately explore their effect on the stratification. DGH-GO was applied to genes disrupted by rare genetic variants in Autism Spectrum Disorder (ASD) patients. The analysis confirmed the multi-etiological nature of ASD by identifying four clusters of genes that were enriched for distinct biological mechanisms and clinical outcomes. In the second case study, the analysis of genes shared by different NDDs showed that genes causing multiple disorders tend to aggregate in similar clusters, indicating a possible shared etiology.
Conclusion: DGH-GO is a user-friendly application that allows biologists to study the multi-etiological nature of complex diseases by dissecting their genetic heterogeneity. In summary, functional similarities, dimension reduction and clustering methods, coupled with interactive visualization and control over the analysis, allow biologists to explore and analyze their datasets without requiring expert knowledge of these methods. The source code of the proposed application is available at https://github.com/Muh-Asif/DGH-GO
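The two core steps described in the DGH-GO abstract, building a gene-by-gene similarity matrix and then clustering genes on it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DGH-GO's implementation: Jaccard overlap between hypothetical GO term sets stands in for a proper GO semantic similarity measure, and average-linkage agglomerative clustering stands in for the four clustering methods the application offers.

```python
from itertools import combinations


def jaccard(terms_a: set, terms_b: set) -> float:
    """Simplified stand-in for a GO semantic similarity measure (term-set overlap)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def similarity_matrix(annotations: dict) -> dict:
    """Pairwise similarity for every ordered pair of genes, keyed by (gene, gene)."""
    return {(g, h): jaccard(annotations[g], annotations[h])
            for g in annotations for h in annotations}


def average_linkage(sim: dict, genes: list, k: int) -> list:
    """Merge the two closest clusters (distance = 1 - similarity) until k remain."""
    clusters = [[g] for g in genes]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(1 - sim[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical genes annotated with hypothetical GO term identifiers.
annotations = {
    "G1": {"termA", "termB", "termC"},
    "G2": {"termA", "termB"},
    "G3": {"termA", "termC"},
    "G4": {"termX", "termY"},
    "G5": {"termX", "termY", "termZ"},
    "G6": {"termY", "termZ"},
}
sim = similarity_matrix(annotations)
clusters = average_linkage(sim, sorted(annotations), k=2)
print(clusters)  # [['G1', 'G2', 'G3'], ['G4', 'G5', 'G6']]
```

Changing `k` here plays the role of the interactive clustering parameters the abstract mentions: re-running with a different value immediately yields a different stratification of the same similarity matrix.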
Multilingual bi-encoder models for biomedical entity linking
Zekeriya Anil Guven, Andre Lamurias
Expert Systems, 2023
Natural language processing (NLP) is a field of study that focuses on analysing text data with computational methods. NLP includes tasks such as sentiment analysis, spam detection, entity linking, and question answering, to name a few. Entity linking is an NLP task that maps mentions in a text to the entities of a knowledge base. In this study, we analysed the efficacy of bi-encoder entity linking models for multilingual biomedical texts. Using surface-based, approximate nearest neighbour search and embedding approaches during the candidate generation phase, accuracy and recall were measured for language representation models such as BERT, SapBERT, BioBERT, and RoBERTa according to language and domain. The proposed entity linking framework was analysed on the BC5CDR and Cantemist datasets for English and Spanish, respectively. The framework achieved 76.75% accuracy on BC5CDR and 60.19% on Cantemist. In addition, the proposed framework was compared with previous studies. The results highlight the challenges that come with domain-specific multilingual datasets.
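Embedding-based candidate generation in a bi-encoder setup can be illustrated as ranking knowledge-base entities by cosine similarity to a mention embedding and then measuring recall over the candidate lists. The sketch below is a toy under explicit assumptions: the three-dimensional vectors and the entity identifiers are hypothetical (in practice an encoder such as SapBERT would produce the vectors), and it uses exact search, whereas approximate nearest neighbour indexes trade some recall for speed on large knowledge bases.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(mention_vec: list, entity_vecs: dict, k: int) -> list:
    """Exact nearest-neighbour search: the k entities most similar to the mention."""
    ranked = sorted(entity_vecs,
                    key=lambda ent: cosine(mention_vec, entity_vecs[ent]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(gold: list, candidates: list) -> float:
    """Fraction of mentions whose gold entity appears among their candidates."""
    hits = sum(1 for g, cands in zip(gold, candidates) if g in cands)
    return hits / len(gold)


# Hypothetical entity and mention embeddings (normally produced by the bi-encoder).
entity_vecs = {"ENT1": [1.0, 0.0, 0.0], "ENT2": [0.7, 0.7, 0.0], "ENT3": [0.0, 0.0, 1.0]}
mentions = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9]]
gold = ["ENT1", "ENT2"]

candidates = [top_k_candidates(m, entity_vecs, k=2) for m in mentions]
print(candidates)                     # [['ENT1', 'ENT2'], ['ENT3', 'ENT1']]
print(recall_at_k(gold, candidates))  # 0.5
```

Because mentions and entities are encoded independently, entity vectors can be precomputed once, which is what makes nearest-neighbour search over a large knowledge base practical in a bi-encoder design.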
Metagenomic Binning using Connectivity-constrained Variational Autoencoders
Proceedings of Machine Learning Research, 2023
BinChill: A Metagenomic Binning Ensemble Method
Oliver S. Bak, Marcus D. Jensen, Frederik M. Trudslev, Andreas Windfeld, Andre Lamurias
IEEE Access, 2023
The goal of metagenomic binning is to reconstruct genomes from a mixture of DNA sequences by grouping them into genomic bins, which can be considered a clustering task. Multiple methods have been proposed for this task, including distance-based metrics, machine learning, and ensemble approaches. We propose BinChill, a metagenomic binning ensemble method based on the generic co-occurrence ensemble method ACE. BinChill incorporates domain information in the form of single-copy genes (SCGs) into a co-occurrence strategy. This strategy combines multiple clustering partitions according to how often two items co-occur in the same cluster. On a smaller simulated dataset, BinChill reconstructed as many or more high- and medium-quality bins than other metagenomics-specific methods, with an equal or faster runtime. On larger datasets, both simulated and real-world, BinChill outperformed other methods in reconstructing high-quality bins, at the cost of increased processing time compared with generic ensemble clustering algorithms. This is due to the domain-specific steps that our method implements. Our results show that the strengths of multiple partitions can be combined to generate a partition of higher quality.
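The co-occurrence idea can be illustrated with a co-association count: for each pair of contigs, the fraction of input partitions that place them in the same bin. The sketch below is a generic evidence-accumulation consensus that assumes a simple majority threshold and takes connected components of the resulting pair graph; it is not ACE or BinChill, and it ignores BinChill's single-copy gene information entirely. The partitions are hypothetical.

```python
from itertools import combinations


def co_occurrence(partitions: list) -> dict:
    """Fraction of partitions in which each pair of items shares a cluster label."""
    n = len(partitions[0])
    counts = {}
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: c / len(partitions) for pair, c in counts.items()}


def consensus(partitions: list, threshold: float = 0.5) -> list:
    """Link pairs co-occurring in more than `threshold` of partitions; return components."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), frac in co_occurrence(partitions).items():
        if frac > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical binning results for six contigs (labels are arbitrary per partition).
partitions = [
    [0, 0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1, 2],
    [5, 5, 5, 7, 7, 7],
]
print(consensus(partitions))  # [[0, 1, 2], [3, 4], [5]]
```

Because only pair counts matter, the label values themselves never need to be matched across partitions, which is the main appeal of co-occurrence ensembles.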
Metagenomic binning with assembly graph embeddings
Andre Lamurias, Mantas Sereika, Mads Albertsen, Katja Hose, Thomas Dyhre Nielsen
Bioinformatics, 2022
Motivation: Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning. Results: We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities and compare its performance with other binners in terms of the number of high-quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets and more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins compared with state-of-the-art binners and 13.7% more when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning. Availability and implementation: GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB. Supplementary information: Supplementary data are available at Bioinformatics online.
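The general idea of combining contig features with assembly-graph structure can be illustrated by one round of neighbourhood aggregation, in which each contig's feature vector is blended with the mean of its graph neighbours' vectors. This is a deliberately simplified, parameter-free sketch with hypothetical contigs, edges and two-dimensional features; a trained graph neural network would additionally learn weight matrices and apply nonlinearities, and this is not GraphMB's architecture.

```python
from collections import defaultdict


def aggregate(features: dict, edges: list, self_weight: float = 0.5) -> dict:
    """One round of mean-neighbour aggregation over an undirected assembly graph."""
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for contig, vec in features.items():
        nbrs = neighbours.get(contig)
        if not nbrs:
            # Isolated contigs keep their own features.
            updated[contig] = list(vec)
            continue
        mean = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(len(vec))]
        updated[contig] = [self_weight * x + (1 - self_weight) * m
                           for x, m in zip(vec, mean)]
    return updated


# Hypothetical contig features and a single assembly-graph edge between c1 and c2.
features = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]}
edges = [("c1", "c2")]
print(aggregate(features, edges))
# {'c1': [0.5, 0.5], 'c2': [0.5, 0.5], 'c3': [1.0, 1.0]}
```

Connected contigs are pulled towards each other in feature space, so a downstream clustering step is more likely to place them in the same bin; stacking several such rounds propagates information further along the graph.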
Using Neural Networks for Relation Extraction from Biomedical Literature
Diana Sousa, Andre Lamurias, Francisco M. Couto
Methods in Molecular Biology, 2021
Information Retrieval Using Machine Learning for Biomarker Curation in the Exposome-Explorer
Andre Lamurias, Sofia Jesus, Vanessa Neveu, Reza M. Salek, Francisco M. Couto
Frontiers in Research Metrics and Analytics, 2021
Objective: In 2016, the International Agency for Research on Cancer, part of the World Health Organization, released the Exposome-Explorer, the first database dedicated to biomarkers of exposure to environmental risk factors for diseases. The database contents resulted from a manual literature search that yielded over 8,500 citations, but only a small fraction of these publications were used in the final database. Manually curating a database is time-consuming and requires domain expertise to gather relevant data scattered throughout millions of articles. This work proposes a supervised machine learning pipeline to assist the manual literature retrieval process. Methods: The manually retrieved corpus of scientific publications used in the Exposome-Explorer was used as training and testing sets for the machine learning models (classifiers). Several parameters and algorithms were evaluated to predict an article's relevance based on different datasets made of titles, abstracts and metadata. Results: The top-performing classifier was built with the Logistic Regression algorithm using the title and abstract set, achieving an F2-score of 70.1%. Furthermore, we extracted 1,143 entities from these articles with a classifier trained for biomarker entity recognition. Of these, we manually validated 45 new candidate entries to the database. Conclusion: Our methodology reduced the number of articles to be manually screened by the database curators by nearly 90%, while misclassifying only 22.1% of the relevant articles. We expect that this methodology can also be applied to similar biomarker datasets or be adapted to assist the manual curation process of similar chemical or disease databases.
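The F2-score reported in the Exposome-Explorer abstract is the F-beta measure with beta = 2, which weights recall more heavily than precision and so suits a screening task where missing a relevant article is costlier than reviewing an extra one. A minimal sketch computing it from confusion-matrix counts, F_beta = (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP); the counts in the example are hypothetical, not the study's.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta > 1 favours recall over precision."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical screening outcome: 8 relevant kept, 12 irrelevant kept, 2 relevant missed.
print(round(f_beta(8, 12, 2), 4))          # 0.6667 (precision 0.4, recall 0.8)
print(round(f_beta(8, 12, 2, beta=1), 4))  # 0.5333 (F1 on the same counts)
```

With these counts, the low precision (0.4) costs much less under F2 than under F1 because recall (0.8) dominates the F2 score.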
Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature
Pedro Ruas, Andre Lamurias, Francisco M. Couto
Journal of Cheminformatics, 2020
Extreme Multi-Label Classification applied to the Biomedical and Multilingual Panorama
CEUR Workshop Proceedings, 2020
A hybrid approach toward biomedical relation extraction training corpora: Combining distant supervision with crowdsourcing
Diana Sousa, Andre Lamurias, Francisco M Couto
Database, 2020
Improving accessibility and distinction between negative results in biomedical relation extraction
Diana Sousa, Andre Lamurias, Francisco M. Couto
Genomics and Informatics, 2020
Towards a multilingual corpus for named entity linking evaluation in the clinical domain
CEUR Workshop Proceedings, 2020
Generating Biomedical Question Answering Corpora from QA Forums
Andre Lamurias, Diana Sousa, Francisco M. Couto
IEEE Access, 2020
LasigeBioTM team at CLEF2020 ChEMU evaluation lab: Named Entity Recognition and Event extraction from chemical reactions described in patents using BioBERT NER and RE
CEUR Workshop Proceedings, 2020
Biomedical question answering using extreme multi-label classification and ontologies in the multilingual panorama
CEUR Workshop Proceedings, 2020
An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents
CEUR Workshop Proceedings, 2020
PPR-SSM: Personalized PageRank and semantic similarity measures for entity linking
Andre Lamurias, Pedro Ruas, Francisco M. Couto
BMC Bioinformatics, 2019
BO-LSTM: Classifying relations via long short-term memory networks along biomedical ontologies
Andre Lamurias, Diana Sousa, Luka A. Clarke, Francisco M. Couto
BMC Bioinformatics, 2019
Semantic Similarity Definition
Francisco M. Couto, Andre Lamurias
Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, 2019
LasigeBioTM at MEDIQA 2019: Biomedical question answering using bidirectional transformers and named entity recognition
Andre Lamurias, Francisco M Couto
BioNLP 2019 SIGBioMed Workshop on Biomedical Natural Language Processing: Proceedings of the 18th BioNLP Workshop and Shared Task, 2019
Text Mining for Bioinformatics Using Biomedical Literature
Andre Lamurias, Francisco M. Couto
Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, 2019
A silver standard corpus of human phenotype-gene relations
Diana Sousa, Andre Lamurias, Francisco M. Couto
NAACL HLT 2019: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 2019
MER: A shell script and annotation server for minimal named entity recognition and linking
Francisco M. Couto, Andre Lamurias
Journal of Cheminformatics, 2018
Text mining for bioinformatics using biomedical literature
Andre Lamurias, Francisco M. Couto
Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, 2018
Semantic similarity definition
Francisco M. Couto, Andre Lamurias
Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, 2018
Generating a tolerogenic cell therapy knowledge graph from literature
Andre Lamurias, João D. Ferreira, Luka A. Clarke, Francisco M. Couto
Frontiers in Immunology, 2017
Extracting microRNA-gene relations from biomedical literature using distant supervision
Andre Lamurias, Luka A. Clarke, Francisco M. Couto
PLOS ONE, 2017
Identifying human phenotype terms by combining machine learning and validation rules
Manuel Lobo, Andre Lamurias, Francisco M. Couto
BioMed Research International, 2017
ULISBOA at SemEval-2017 Task 12: Extraction and classification of temporal expressions and events
Andre Lamurias, Diana Sousa, Sofia Pereira, Luka Clarke, Francisco M Couto
Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2017
Extraction of Regulatory Events using Kernel-based Classifiers and Distant Supervision
Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2016
ULISBOA at SemEval-2016 task 12: Extraction of temporal expressions, clinical events and relations using IBEnt
Marcia Barros, André Lamúrias, Gonçalo Figueiró, Marta Antunes, Joana Teixeira, Alexandre Pinheiro, Francisco M. Couto
SemEval 2016: 10th International Workshop on Semantic Evaluation, Proceedings, 2016
IICE: Web tool for automatic identification of chemical entities and interactions
Andre Lamurias, Luka A. Clarke, Francisco M. Couto
Lecture Notes in Computer Science, 2015
Improving chemical entity recognition through h-index based semantic similarity
Andre Lamurias, João D Ferreira, Francisco M Couto
Journal of Cheminformatics, 2015
Annotating biomedical ontology terms in electronic health records using crowd-sourcing
CEUR Workshop Proceedings, 2015
The CHEMDNER corpus of chemicals and drugs and its annotation principles
Martin Krallinger, Obdulia Rabal, Florian Leitner, Miguel Vazquez, David Salgado, Zhiyong Lu, Robert Leaman, Yanan Lu, Donghong Ji, Daniel M Lowe, Roger A Sayle, Riza Theresa Batista-Navarro, Rafal Rak, Torsten Huber, Tim Rocktäschel, Sérgio Matos, David Campos, Buzhou Tang, Hua Xu, Tsendsuren Munkhdalai, Keun Ho Ryu, SV Ramanan, Senthil Nathan, Slavko Žitnik, Marko Bajec, Lutz Weber, Matthias Irmer, Saber A Akhondi, Jan A Kors, Shuo Xu, Xin An, Utpal Kumar Sikdar, Asif Ekbal, Masaharu Yoshioka, Thaer M Dieb, Miji Choi, Karin Verspoor, Madian Khabsa, C Lee Giles, Hongfang Liu, Komandur Elayavilli Ravikumar, Andre Lamurias, Francisco M Couto, Hong-Jie Dai, Richard Tzong-Han Tsai, Caglar Ata, Tolga Can, Anabel Usié, Rui Alves, Isabel Segura-Bedmar, Paloma Martínez, Julen Oyarzabal, Alfonso Valencia
Journal of Cheminformatics, 2015
Chemical named entity recognition: Improving recall using a comprehensive list of lexical features
Andre Lamurias, João Ferreira, Francisco M. Couto
Advances in Intelligent Systems and Computing, 2014
Identifying interactions between chemical entities in biomedical text
André Lamúrias, Francisco M. Couto, João D. Ferreira
Journal of Integrative Bioinformatics, 2014