Francisco S Melo

@inesc-id.pt

Associate Professor, Department of Computer Science
INESC-ID and Instituto Superior Técnico, University of Lisbon

https://researchid.co/fmelo

RESEARCH INTERESTS

Artificial Intelligence; Machine Learning; Reinforcement Learning

143

Scopus Publications

3693

Scholar Citations

33

Scholar h-index

75

Scholar i10-index

Scopus Publications

  • A Comparative Study of Continual Backpropagation
    Jacopo Silvestrin, Francisco S. Melo, and Manuel Lopes

    Springer Nature Switzerland

  • Preface
    Steven Davy and Danyal Aftab

    IOS Press
    Large language models demonstrate impressive proficiency in language understanding and generation. Nonetheless, training these models from scratch, even the least complex billion-parameter variant, demands significant computational resources, rendering it economically impractical for many organizations. With large language models functioning as general-purpose task solvers, this paper investigates their task-specific fine-tuning. We employ task-specific datasets and prompts to fine-tune two pruned LLaMA models having 5 billion and 4 billion parameters. This process utilizes the pre-trained weights and focuses on a subset of weights using the LoRA method. One challenge in fine-tuning the LLaMA model is crafting a precise prompt tailored to the specific task. To address this, we propose a novel approach to fine-tune the LLaMA model under two primary constraints: task specificity and prompt effectiveness. Our approach, Tailored LLaMA, initially employs structural pruning to reduce the model sizes from 7B to 5B and 4B parameters. Subsequently, it applies a carefully designed prompt specific to the task and utilizes the LoRA method to accelerate the fine-tuning process. Moreover, fine-tuning a model pruned by 50% for less than one hour restores the mean accuracy of classification tasks to 95.68% at a 20% compression ratio and to 86.54% at a 50% compression ratio through few-shot learning with 50 shots. Our validation of Tailored LLaMA on these two pruned variants demonstrates that even when compressed to 50%, the models maintain over 65% of the baseline model accuracy in few-shot classification and generation tasks. These findings highlight the efficacy of our tailored approach in maintaining high performance with significantly reduced model sizes.

  • The impact of data distribution on Q-learning with function approximation
    Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, and Francisco S. Melo

    Springer Science and Business Media LLC
    We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results, being primarily focused on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution in the performance of Q-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution such as entropy, coverage, and data quality (closeness to optimal policy).

  • When a Robot Is Your Teammate
    Filipa Correia, Francisco S. Melo, and Ana Paiva

    Wiley
    Creating effective teamwork between humans and robots involves not only addressing their performance as a team but also sustaining the quality and sense of unity among teammates, also known as cohesion. This paper explores the research problem of: how can we endow robotic teammates with social capabilities to improve the cohesive alliance with humans? By defining the concept of a human-robot cohesive alliance in the light of the multidimensional construct of cohesion from the social sciences, we propose to address this problem through the idea of multifaceted human-robot cohesion. We present our preliminary effort from previous works to examine each of the five dimensions of cohesion: social, collective, emotional, structural, and task. We finish the paper with a discussion on how human-robot cohesion contributes to the key questions and ongoing challenges of creating robotic teammates. Overall, cohesion in human-robot teams might be a key factor to propel team performance and it should be considered in the design, development, and evaluation of robotic teammates.

  • HOTSPOT: An ad hoc teamwork platform for mixed human-robot teams
    João G. Ribeiro, Luis Müller Henriques, Sérgio Colcher, Julio Cesar Duarte, Francisco S. Melo, Ruy Luiz Milidiú, and Alberto Sardinha

    Public Library of Science (PLoS)
    Ad hoc teamwork is a research topic in multi-agent systems whereby an agent (the “ad hoc agent”) must successfully collaborate with a set of unknown agents (the “teammates”) without any prior coordination or communication protocol. However, research in ad hoc teamwork is predominantly focused on agent-only teams, but not on agent-human teams, which we believe is an exciting research avenue and has enormous application potential in human-robot teams. This paper will tap into this potential by proposing HOTSPOT, the first framework for ad hoc teamwork in human-robot teams. Our framework comprises two main modules, addressing the two key challenges in the interaction between a robot acting as the ad hoc agent and human teammates. First, a decision-theoretic module that is responsible for all task-related decision-making (task identification, teammate identification, and planning). Second, a communication module that uses natural language processing to parse all communication between the robot and the human. To evaluate our framework, we use a task where a mobile robot and a human cooperatively collect objects in an open space, illustrating the main features of our framework in a real-world task.

  • “Guess what I'm doing”: Extending legibility to sequential decision tasks
    Miguel Faria, Francisco S. Melo, and Ana Paiva

    Elsevier BV

  • Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning


  • Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
    Rustam Zayanov, Francisco Melo, and Manuel Lopes

    SCITEPRESS - Science and Technology Publications

  • NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks


  • TEAMSTER: Model-based reinforcement learning for ad hoc teamwork
    João G. Ribeiro, Gonçalo Rodrigues, Alberto Sardinha, and Francisco S. Melo

    Elsevier BV

  • Theoretical Remarks on Feudal Hierarchies and Reinforcement Learning
    Diogo S. Carvalho, Francisco S. Melo, and Pedro A. Santos

    IOS Press
    Hierarchical reinforcement learning is an increasingly demanded resource for learning to make sequential decisions towards long term goals. Feudal hierarchies are among the most deployed frameworks. However, there are few theoretical results for hierarchical structures. In this work, we formalize the common two-level feudal hierarchy as two Markov decision processes, with the one on the high level being dependent on the policy executed at the low level. Despite the non-stationarity raised by the dependency, we show that each of the processes presents stable behavior. We then build on the first result to show that, regardless of the convergent learning algorithm used for the low level, convergence of both prediction and control algorithms at the high-level is guaranteed. Our results contribute with theoretical support for the use of feudal hierarchies in combination with standard reinforcement learning methods at each level.

  • Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability
    João G. Ribeiro, Cassandro Martinho, Alberto Sardinha, and Francisco S. Melo

    IOS Press
    This paper introduces a formal definition of the setting of ad hoc teamwork under partial observability and proposes a first-principled model-based approach which relies only on prior knowledge and partial observations of the environment in order to perform ad hoc teamwork. We make three distinct assumptions that set it apart from previous works, namely: i) the state of the environment is always partially observable, ii) the actions of the teammates are always unavailable to the ad hoc agent and iii) the ad hoc agent has no access to a reward signal which could be used to learn the task from scratch. Our results in 70 POMDPs from 11 domains show that our approach is not only effective in assisting unknown teammates in solving unknown tasks but is also robust in scaling to more challenging problems. Supplementary material is available at https://github.com/jmribeiro/adhoc-teamwork-under-partial-observability.

  • Pre-training with Augmentations for Efficient Transfer in Model-Based Reinforcement Learning
    Bernardo Esteves, Miguel Vasco, and Francisco S. Melo

    Springer Nature Switzerland

  • Learning to Perceive in Deep Model-Free Reinforcement Learning


  • Robotic Gaze Responsiveness in Multiparty Teamwork
    Filipa Correia, Joana Campos, Francisco S. Melo, and Ana Paiva

    Springer Science and Business Media LLC

  • “Sequencing Matters”: Investigating Suitable Action Sequences in Robot-Assisted Autism Therapy
    Kim Baraka, Marta Couto, Francisco S. Melo, Ana Paiva, and Manuela Veloso

    Frontiers Media SA
    Social robots have been shown to be promising tools for delivering therapeutic tasks for children with Autism Spectrum Disorder (ASD). However, their efficacy is currently limited by a lack of flexibility of the robot’s social behavior to successfully meet therapeutic and interaction goals. Robot-assisted interventions are often based on structured tasks where the robot sequentially guides the child towards the task goal. Motivated by a need for personalization to accommodate a diverse set of children profiles, this paper investigates the effect of different robot action sequences in structured socially interactive tasks targeting attention skills in children with different ASD profiles. Based on an autism diagnostic tool, we devised a robotic prompting scheme on a NAO humanoid robot, aimed at eliciting goal behaviors from the child, and integrated it in a novel interactive storytelling scenario involving screens. We programmed the robot to operate in three different modes: diagnostic-inspired (Assess), personalized therapy-inspired (Therapy), and random (Explore). Our exploratory study with 11 young children with ASD highlights the usefulness and limitations of each mode according to different possible interaction goals, and paves the way towards more complex methods for balancing short-term and long-term goals in personalized robot-assisted therapy.

  • Leveraging hierarchy in multimodal generative models for effective cross-modality inference
    Miguel Vasco, Hang Yin, Francisco S. Melo, and Ana Paiva

    Elsevier BV
    This work addresses the problem of cross-modality inference (CMI), i.e., inferring missing data of unavailable perceptual modalities (e.g., sound) using data from available perceptual modalities (e.g., image). We overview single-modality variational autoencoder methods and discuss three problems of computational cross-modality inference, arising from recent developments in multimodal generative models. Inspired by neural mechanisms of human recognition, we contribute the Nexus model, a novel hierarchical generative model that can learn a multimodal representation of an arbitrary number of modalities in an unsupervised way. By exploiting hierarchical representation levels, Nexus is able to generate high-quality, coherent data of missing modalities given any subset of available modalities. To evaluate CMI in a natural scenario with a high number of modalities, we contribute the "Multimodal Handwritten Digit" (MHD) dataset, a novel benchmark dataset that combines image, motion, sound and label information from digit handwriting. We assess the key role of hierarchy in enabling high-quality samples during cross-modality inference and discuss how a novel training scheme enables Nexus to learn a multimodal representation robust to missing modalities at test time. Our results show that Nexus outperforms current state-of-the-art multimodal generative models with regard to their cross-modality inference capabilities.

  • Geometric Multimodal Contrastive Representation Learning


  • Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories
    Fabio Vital, Miguel Vasco, Alberto Sardinha, and Francisco Melo

    IEEE
    We present Perceive-Represent-Generate (PRG), a novel three-stage framework that maps perceptual information of different modalities (e.g., visual or sound), corresponding to a series of instructions, to a sequence of movements to be executed by a robot. In the first stage, we perceive and preprocess the given inputs, isolating individual commands from the complete instruction provided by a human user. In the second stage we encode the individual commands into a multimodal latent space, employing a deep generative model. Finally, in the third stage we convert the latent samples into individual trajectories and combine them into a single dynamic movement primitive, allowing its execution by a robotic manipulator. We evaluate our pipeline in the context of a novel robotic handwriting task, where the robot receives as input a word through different perceptual modalities (e.g., image, sound), and generates the corresponding motion trajectory to write it, creating coherent and high-quality handwritten words.

  • Preface


  • FIT: Using Feature Importance to Teach Classification Tasks to Unknown Learners
    Carla Guerra, Francisco S. Melo, and Manuel Lopes

    Springer International Publishing

  • Cooperation and Learning Dynamics under Wealth Inequality and Diversity in Individual Risk Perception
    Ramona Merhej, Fernando P. Santos, Francisco S. Melo, and Francisco C. Santos

    AI Access Foundation
    We examine how wealth inequality and diversity in the perception of risk of a collective disaster impact cooperation levels in the context of a public goods game with uncertain and non-linear returns. In this game, individuals face a collective-risk dilemma where they may contribute or not to a common pool to reduce their chances of future losses. We draw our conclusions based on social simulations with populations of independent reinforcement learners with diverse levels of risk and wealth. We find that both wealth inequality and diversity in risk assessment can hinder cooperation and augment collective losses. Additionally, wealth inequality further exacerbates long term inequality, causing rich agents to become richer and poor agents to become poorer. On the other hand, diversity in risk only amplifies inequality when combined with bias in group assortment—i.e., high probability that agents from the same risk class play together. Our results also suggest that taking wealth inequality into account can help to design effective policies aiming at leveraging cooperation in large group sizes, a configuration where collective action is harder to achieve. Finally, we characterize the circumstances under which risk perception alignment is crucial and those under which reducing wealth inequality constitutes a deciding factor for collective welfare.

  • How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents


  • Cooperation and Learning Dynamics under Risk Diversity and Financial Incentives


  • Socially Reactive Navigation Models for Mobile Robots
    Francisco Melo and Plinio Moreno

    IEEE
    This work considers socially acceptable behaviors in traditional reactive navigation systems, allowing a robot to approach a group of humans in a socially acceptable manner by considering the personal space and the group space. In contrast to the fixed parameters of social distancing, this work presents an adaptive model; that is, the parameters of the personal and group space’s cost functions adapt according to the arrangement of the group and space constraints, avoiding the choice of initial parameters. A socially aware navigation system capable of approaching groups is implemented for a general-purpose mobile robot. The adaptive personal and group space algorithm is integrated with the standard navigation system of ROS, representing their information in a costmap layer. The adaptation of spaces is tested using fixed and adaptive parameters for different groups provided by three datasets. The navigation system is evaluated through simulation experiments, demonstrating that the robot is capable of approaching groups and, at the same time, provides a more realistic space modeling adapted to the context.

RECENT SCHOLAR PUBLICATIONS

  • Implicit Repair with Reinforcement Learning in Emergent Communication
    F Vital, A Sardinha, FS Melo
    arXiv preprint arXiv:2502.12624 2025

  • Distributed Value Decomposition Networks with Networked Agents
    GS Varela, A Sardinha, FS Melo
    arXiv preprint arXiv:2502.07635 2025

  • Networked Agents in the Dark: Team Value Learning under Partial Observability
    GS Varela, A Sardinha, FS Melo
    arXiv preprint arXiv:2501.08778 2025

  • The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
    PP Santos, A Sardinha, FS Melo
    arXiv preprint arXiv:2409.15128 2024

  • A Comparative Study of Continual Backpropagation
    J Silvestrin, FS Melo, M Lopes
    EPIA Conference on Artificial Intelligence, 324-334 2024

  • The impact of data distribution on Q-learning with function approximation
    PP Santos, DS Carvalho, A Sardinha, FS Melo
    Machine Learning 113 (9), 6141-6163 2024

  • When a robot is your teammate
    F Correia, FS Melo, A Paiva
    Topics in Cognitive Science 16 (3), 527-553 2024

  • HOTSPOT: An ad hoc teamwork platform for mixed human-robot teams
    JG Ribeiro, LM Henriques, S Colcher, JC Duarte, FS Melo, RL Milidiú, ...
    Plos one 19 (6), e0305705 2024

  • “Guess what I'm doing”: Extending legibility to sequential decision tasks
    M Faria, FS Melo, A Paiva
    Artificial Intelligence 330, 104107 2024

  • TEAMSTER: model-based reinforcement learning for ad hoc teamwork (abstract reprint)
    JG Ribeiro, G Rodrigues, A Sardinha, FS Melo
    Proceedings of the AAAI Conference on Artificial Intelligence 38 (20), 22708 2024

  • NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks
    B Esteves, M Vasco, FS Melo
    arXiv preprint arXiv:2402.15393 2024

  • NeuralThink: Algorithm Synthesis that Extrapolates in General Tasks
    B Esteves, M Vasco, FS Melo
    arXiv e-prints, arXiv: 2402.15393 2024

  • TEAMSTER: Model-based reinforcement learning for ad hoc teamwork
    JG Ribeiro, G Rodrigues, A Sardinha, FS Melo
    Artificial Intelligence 324, 104013 2023

  • Multi-Bellman operator for convergence of Q-learning with linear function approximation
    DS Carvalho, PA Santos, FS Melo
    arXiv preprint arXiv:2309.16819 2023

  • Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
    R Zayanov, FS Melo, M Lopes
    arXiv preprint arXiv:2309.09095 2023

  • Pre-training with Augmentations for Efficient Transfer in Model-Based Reinforcement Learning
    B Esteves, M Vasco, FS Melo
    EPIA Conference on Artificial Intelligence, 133-145 2023

  • Learning to Perceive in Deep Model-Free Reinforcement Learning
    G Querido, A Sardinha, FS Melo
    arXiv preprint arXiv:2301.03730 2023

  • Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability
    JG Ribeiro, C Martinho, A Sardinha, FS Melo
    ECAI 2023, 1954-1961 2023

  • Theoretical remarks on feudal hierarchies and reinforcement learning
    DS Carvalho, FS Melo, PA Santos
    ECAI 2023, 351-356 2023

  • Robotic gaze responsiveness in multiparty teamwork
    F Correia, J Campos, FS Melo, A Paiva
    International Journal of Social Robotics 15 (1), 27-36 2023

MOST CITED SCHOLAR PUBLICATIONS

  • An analysis of reinforcement learning with function approximation
    FS Melo, SP Meyn, MI Ribeiro
    Proceedings of the 25th international conference on Machine learning, 664-671 2008
    Citations: 347

  • Active learning for reward estimation in inverse reinforcement learning
    M Lopes, F Melo, L Montesano
    Joint European conference on machine learning and knowledge discovery in 2009
    Citations: 254

  • Affordance-based imitation learning in robots
    M Lopes, FS Melo, L Montesano
    2007 IEEE/RSJ international conference on intelligent robots and systems 2007
    Citations: 171

  • Q-Learning with Linear Function Approximation
    FS Melo, MI Ribeiro
    International Conference on Computational Learning Theory, 308-322 2007
    Citations: 146

  • Decentralized MDPs with sparse interactions
    FS Melo, M Veloso
    Artificial Intelligence 175 (11), 1757-1789 2011
    Citations: 133

  • Exploring the impact of fault justification in human-robot trust
    F Correia, C Guerra, S Mascarenhas, FS Melo, A Paiva
    Proceedings of the 17th international conference on autonomous agents and 2018
    Citations: 116

  • Learning of coordination: Exploiting sparse interactions in multiagent systems
    FS Melo, M Veloso
    Proceedings of The 8th International Conference on Autonomous Agents and 2009
    Citations: 112

  • Interaction-driven Markov games for decentralized multiagent planning under uncertainty
    MTJ Spaan, FS Melo
    Proceedings of the 7th international joint conference on Autonomous agents 2008
    Citations: 110

  • Empathic robot for group learning: A field study
    P Alves-Oliveira, P Sequeira, FS Melo, G Castellano, A Paiva
    ACM Transactions on Human-Robot Interaction (THRI) 8 (1), 1-34 2019
    Citations: 99

  • Group-based emotions in teams of humans and robots
    F Correia, S Mascarenhas, R Prada, FS Melo, A Paiva
    Proceedings of the 2018 ACM/IEEE international conference on human-robot 2018
    Citations: 98

  • Just follow the suit! trust in human-robot interactions during card game playing
    F Correia, P Alves-Oliveira, N Maia, T Ribeiro, S Petisca, FS Melo, ...
    2016 25th IEEE international symposium on robot and human interactive 2016
    Citations: 69

  • Personalized assistance for dressing users
    SD Klee, BQ Ferreira, R Silva, JP Costeira, FS Melo, M Veloso
    Social Robotics: 7th International Conference, ICSR 2015, Paris, France 2015
    Citations: 69

  • Geometric multimodal contrastive representation learning
    P Poklukar, M Vasco, H Yin, FS Melo, A Paiva, D Kragic
    International Conference on Machine Learning, 17782-17800 2022
    Citations: 64

  • Monte carlo tree search experiments in hearthstone
    A Santos, PA Santos, FS Melo
    2017 IEEE conference on computational intelligence and games (CIG), 272-279 2017
    Citations: 61

  • Emotion-based intrinsic motivation for reinforcement learning agents
    P Sequeira, FS Melo, A Paiva
    Affective Computing and Intelligent Interaction: 4th International 2011
    Citations: 60

  • Project INSIDE: towards autonomous semi-unstructured human–robot social interaction in autism therapy
    FS Melo, A Sardinha, D Belo, M Couto, M Faria, A Farias, H Gamboa, ...
    Artificial intelligence in medicine 96, 198-216 2019
    Citations: 58

  • Abstraction levels for robotic imitation: Overview and computational approaches
    M Lopes, F Melo, L Montesano, J Santos-Victor
    From Motor Learning to Interaction Learning in Robots, 313-355 2010
    Citations: 56

  • An empathic robotic tutor for school classrooms: Considering expectation and satisfaction of children as end-users
    P Alves-Oliveira, T Ribeiro, S Petisca, E Di Tullio, FS Melo, A Paiva
    Social Robotics: 7th International Conference, ICSR 2015, Paris, France 2015
    Citations: 54

  • Exploring prosociality in human-robot teams
    F Correia, SF Mascarenhas, S Gomes, P Arriaga, I Leite, R Prada, ...
    2019 14th ACM/IEEE international conference on human-robot interaction (HRI 2019
    Citations: 52

  • Convergence of Q-learning with linear function approximation
    FS Melo, MI Ribeiro
    2007 European control conference (ECC), 2671-2678 2007
    Citations: 52