Joao Correia

@uminho.pt

University of Minho

RESEARCH, TEACHING, or OTHER INTERESTS

Artificial Intelligence

9

Scopus Publications

368

Scholar Citations

6

Scholar h-index

5

Scholar i10-index

SCOPUS PUBLICATIONS

  • DeepMol: an automated machine and deep learning framework for computational chemistry
    João Correia, João Capela, Miguel Rocha
    Journal of Cheminformatics, 2024
  • Combining evolutionary algorithms with reaction rules towards focused molecular design
    João Correia, Vítor Pereira, Miguel Rocha
    GECCO 2023: Proceedings of the 2023 Genetic and Evolutionary Computation Conference, 2023
    Designing novel small molecules with desirable properties and feasible synthesis continues to pose a significant challenge in drug discovery, particularly in the realm of natural products. Reaction-based gradient-free methods are promising approaches for designing new molecules as they ensure synthetic feasibility and provide potential synthesis paths. However, it is important to note that the novelty and diversity of the generated molecules highly depend on the availability of comprehensive reaction templates. To address this challenge, we introduce ReactEA, a new open-source evolutionary framework for computer-aided drug discovery that solely utilizes biochemical reaction rules. ReactEA optimizes molecular properties using a comprehensive set of 22,949 reaction rules, ensuring chemical validity and synthetic feasibility. ReactEA is versatile, as it can virtually optimize any objective function and track potential synthetic routes during the optimization process. To demonstrate its effectiveness, we apply ReactEA to various case studies, including the design of novel drug-like molecules and the optimization of pre-existing ligands. The results show that ReactEA consistently generates novel molecules with improved properties and reasonable synthetic routes, even for complex tasks such as improving binding affinity against the PARP1 enzyme when compared to existing inhibitors.
  • Evaluating molecular representations in machine learning models for drug response prediction and interpretability
    Delora Baptista, João Correia, Bruno Pereira, Miguel Rocha
    Journal of Integrative Bioinformatics, 2022
    Machine learning (ML) is increasingly being used to guide drug discovery processes. When applying ML approaches to chemical datasets, molecular descriptors and fingerprints are typically used to represent compounds as numerical vectors. However, in recent years, end-to-end deep learning (DL) methods that can learn feature representations directly from line notations or molecular graphs have been proposed as alternatives to using precomputed features. This study set out to investigate which compound representation methods are the most suitable for drug sensitivity prediction in cancer cell lines. Twelve different representations were benchmarked on 5 compound screening datasets, using DeepMol, a new chemoinformatics package developed by our research group, to perform these analyses. The results of this study show that the predictive performance of end-to-end DL models is comparable to, and at times surpasses, that of models trained on molecular fingerprints, even when less training data is available. This study also found that combining several compound representation methods into an ensemble can improve performance. Finally, we show that a post hoc feature attribution method can boost the explainability of the DL models.
  • Development of Deep Learning approaches to predict relationships between chemical structures and sweetness
    João Capela, João Correia, Vítor Pereira, Miguel Rocha
    Proceedings of the International Joint Conference on Neural Networks, 2022
    The non-caloric sweeteners market is catching up with the market of conventionally used sugars due to the benefits of preventing obesity, tooth decay and other health problems. Developing strategies for designing novel molecules that are easier to produce, taste sweet and are less toxic is a current motivation for the food industry. In this sense, Machine Learning (ML) approaches have been reported as cutting-edge technologies to guide the design of new molecules towards specific objectives, including sweet taste. We provide here the largest known dataset of sweet molecules. The dataset integrates 9541 sweeteners and 1141 bitterants from FooDB, FlavorDB and the literature. This robust dataset allowed the development of standard Machine and Deep Learning pipelines for deriving Structure-Activity Relationships (SAR) between molecules and sweetness. In this work, we show that Textual Convolutional Neural Networks (TextCNN), Graph Convolutional Networks (GCN), and Deep Neural Networks (DNNs) outperformed most traditional “shallow” learning approaches. These Deep Learning (DL) models produced platforms to guide the design of new sweeteners and the repurposing of existing compounds. Sixty million compounds from PubChem were evaluated using these models. Herein, we deliver a dataset of 67724 compounds that present high probabilities of being sweet. Quick searches in the literature allowed us to find 13 molecules reported as potent sweetening agents, revealing that our approach is suitable for finding new sweeteners and valuable for expanding food chemistry databases, repurposing existing chemicals and designing novel molecules with a sweet taste.
  • Predicting the number of biochemical transformations needed to synthesize a compound
    João Correia, Rafael Carreira, Vítor Pereira, Miguel Rocha
    Proceedings of the International Joint Conference on Neural Networks, 2022
    Exploiting the natural metabolic abilities of microorganisms for the production of bioactive compounds has been a research problem of great interest. The economic and environmental costs associated with petrochemical-derived industries have promoted the emergence of biochemical processes from renewable carbon sources. However, optimally rewiring microbial metabolism in a competitive and sustainable manner is still a challenge. Recently, some retrobiosynthesis tools for the design of de novo biosynthetic pathways have been proposed. These tools generate a large number of intermediate compounds that are beyond experimental feasibility. Thus, effective methods to reduce the number of compounds by selecting the most promising ones are still needed. Here, we propose the use of classification and regression deep learning models, such as fully-connected neural networks and 1D convolutional neural networks, to predict the number of biochemical transformations needed to produce a compound. The data to train and evaluate the models was generated using a set of 13055 reaction rules and 673 compounds from Escherichia coli metabolism as starting compounds. The data was generated up to 5 steps, resulting in a dataset of over 2.6 million compounds. This approach can be effectively used in biochemical applications, including retrobiosynthesis, to prioritize compounds that can be produced using fewer biochemical transformations.
  • A Comparison of Different Compound Representations for Drug Sensitivity Prediction
    Delora Baptista, João Correia, Bruno Pereira, Miguel Rocha
    Lecture Notes in Networks and Systems, 2022
  • Generative Deep Learning for Targeted Compound Design
    Tiago Sousa, João Correia, Vítor Pereira, Miguel Rocha
    Journal of Chemical Information and Modeling, 2021
    In the past few years, de novo molecular design has increasingly been using generative models from the emergent field of Deep Learning, proposing novel compounds that are likely to possess desired properties or activities. De novo molecular design finds applications in different fields ranging from drug discovery and materials sciences to biotechnology. A panoply of deep generative models, including architectures such as Recurrent Neural Networks, Autoencoders, and Generative Adversarial Networks, can be trained on existing data sets and provide for the generation of novel compounds. Typically, the new compounds follow the same underlying statistical distributions of properties exhibited in the training data set. Additionally, different optimization strategies, including transfer learning, Bayesian optimization, reinforcement learning, and conditional generation, can direct the generation process toward desired aims, regarding their biological activities, synthesis processes or chemical features. Given the recent emergence of these technologies and their relevance, this work presents a systematic and critical review of deep generative models and related optimization methods for targeted compound design, and their applications.
  • Combining Multi-objective Evolutionary Algorithms with Deep Generative Models Towards Focused Molecular Design
    Tiago Sousa, João Correia, Vítor Pereira, Miguel Rocha
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021
  • Artificial Intelligence in Biological Activity Prediction
    João Correia, Tiago Resende, Delora Baptista, Miguel Rocha
    Advances in Intelligent Systems and Computing, 2020

RECENT SCHOLAR PUBLICATIONS

  • NeuroScaler: Towards Energy-Optimal Autoscaling for Container-Based Services
    AO Chaves, R Moreira, LFR Moreira, J Correia, D Santos, R Silva, ...
    arXiv preprint arXiv:2602.08191, 2026
  • DeepMol: an automated machine and deep learning framework for computational chemistry
    J Correia, J Capela, M Rocha
    Journal of Cheminformatics 16 (1), 136, 2024
    Citations: 25
  • DeepRetro: a computational framework for retrosynthesis and pathway design towards optimizing compound bioproduction
    JFS Correia
    PQDT-Global, 2024
  • Combining Evolutionary Algorithms with Reaction Rules Towards Focused Molecular Design
    J Correia, V Pereira, M Rocha
    Proceedings of the Genetic and Evolutionary Computation Conference, 900-909, 2023
    Citations: 2
  • Evaluating molecular representations in machine learning models for drug response prediction and interpretability
    D Baptista, J Correia, B Pereira, M Rocha
    Journal of Integrative Bioinformatics 19 (3), 20220006, 2022
    Citations: 72
  • Predicting the number of biochemical transformations needed to synthesize a compound
    J Correia, R Carreira, V Pereira, M Rocha
    2022 International Joint Conference on Neural Networks (IJCNN), 1-8, 2022
    Citations: 3
  • Development of Deep Learning approaches to predict relationships between chemical structures and sweetness
    J Capela, J Correia, V Pereira, M Rocha
    2022 International Joint Conference on Neural Networks (IJCNN), 1-8, 2022
    Citations: 8
  • Generative deep learning for targeted compound design
    T Sousa, J Correia, V Pereira, M Rocha
    Journal of Chemical Information and Modeling 61 (11), 5343-5361, 2021
    Citations: 226
  • A comparison of different compound representations for drug sensitivity prediction
    D Baptista, J Correia, B Pereira, M Rocha
    International Conference on Practical Applications of Computational Biology …, 2021
    Citations: 5
  • Combining multi-objective evolutionary algorithms with deep generative models towards focused molecular design
    T Sousa, J Correia, V Pereira, M Rocha
    International Conference on the Applications of Evolutionary Computation …, 2021
    Citations: 12
  • Artificial intelligence in biological activity prediction
    J Correia, T Resende, D Baptista, M Rocha
    International Conference on Practical Applications of Computational Biology …, 2019
    Citations: 15
  • HIV-TB-Host Protein Interaction Network
    JFS Correia
    PQDT-Global, 2018

MOST CITED SCHOLAR PUBLICATIONS

  • Generative deep learning for targeted compound design
    T Sousa, J Correia, V Pereira, M Rocha
    Journal of Chemical Information and Modeling 61 (11), 5343-5361, 2021
    Citations: 226
  • Evaluating molecular representations in machine learning models for drug response prediction and interpretability
    D Baptista, J Correia, B Pereira, M Rocha
    Journal of Integrative Bioinformatics 19 (3), 20220006, 2022
    Citations: 72
  • DeepMol: an automated machine and deep learning framework for computational chemistry
    J Correia, J Capela, M Rocha
    Journal of Cheminformatics 16 (1), 136, 2024
    Citations: 25
  • Artificial intelligence in biological activity prediction
    J Correia, T Resende, D Baptista, M Rocha
    International Conference on Practical Applications of Computational Biology …, 2019
    Citations: 15
  • Combining multi-objective evolutionary algorithms with deep generative models towards focused molecular design
    T Sousa, J Correia, V Pereira, M Rocha
    International Conference on the Applications of Evolutionary Computation …, 2021
    Citations: 12
  • Development of Deep Learning approaches to predict relationships between chemical structures and sweetness
    J Capela, J Correia, V Pereira, M Rocha
    2022 International Joint Conference on Neural Networks (IJCNN), 1-8, 2022
    Citations: 8
  • A comparison of different compound representations for drug sensitivity prediction
    D Baptista, J Correia, B Pereira, M Rocha
    International Conference on Practical Applications of Computational Biology …, 2021
    Citations: 5
  • Predicting the number of biochemical transformations needed to synthesize a compound
    J Correia, R Carreira, V Pereira, M Rocha
    2022 International Joint Conference on Neural Networks (IJCNN), 1-8, 2022
    Citations: 3
  • Combining Evolutionary Algorithms with Reaction Rules Towards Focused Molecular Design
    J Correia, V Pereira, M Rocha
    Proceedings of the Genetic and Evolutionary Computation Conference, 900-909, 2023
    Citations: 2
  • NeuroScaler: Towards Energy-Optimal Autoscaling for Container-Based Services
    AO Chaves, R Moreira, LFR Moreira, J Correia, D Santos, R Silva, ...
    arXiv preprint arXiv:2602.08191, 2026
  • DeepRetro: a computational framework for retrosynthesis and pathway design towards optimizing compound bioproduction
    JFS Correia
    PQDT-Global, 2024
  • HIV-TB-Host Protein Interaction Network
    JFS Correia
    PQDT-Global, 2018