Francisco S Melo

@inesc-id.pt

Associate Professor, Department of Computer Science
INESC-ID and Instituto Superior Técnico, University of Lisbon

https://researchid.co/fmelo

RESEARCH INTERESTS

Artificial Intelligence; Machine Learning; Reinforcement Learning

143

Scopus Publications

3693

Scholar Citations

Scholar h-index

Scholar i10-index

Scopus Publications

A Comparative Study of Continual Backpropagation
Jacopo Silvestrin, Francisco S. Melo, and Manuel Lopes
Springer Nature Switzerland

Preface
Steven Davy and Danyal Aftab
IOS Press
Large language models demonstrate impressive proficiency in language understanding and generation. Nonetheless, training these models from scratch, even the least complex billion-parameter variant, demands significant computational resources, rendering it economically impractical for many organizations. With large language models functioning as general-purpose task solvers, this paper investigates their task-specific fine-tuning. We employ task-specific datasets and prompts to fine-tune two pruned LLaMA models having 5 billion and 4 billion parameters. This process utilizes the pre-trained weights and focuses on a subset of weights using the LoRA method. One challenge in fine-tuning the LLaMA model is crafting a precise prompt tailored to the specific task. To address this, we propose a novel approach to fine-tune the LLaMA model under two primary constraints: task specificity and prompt effectiveness. Our approach, Tailored LLaMA, initially employs structural pruning to reduce the model sizes from 7B to 5B and 4B parameters. Subsequently, it applies a carefully designed prompt specific to the task and utilizes the LoRA method to accelerate the fine-tuning process. Moreover, fine-tuning a model pruned by 50% for less than one hour restores the mean accuracy of classification tasks to 95.68% at a 20% compression ratio and to 86.54% at a 50% compression ratio through few-shot learning with 50 shots. Our validation of Tailored LLaMA on these two pruned variants demonstrates that even when compressed to 50%, the models maintain over 65% of the baseline model accuracy in few-shot classification and generation tasks. These findings highlight the efficacy of our tailored approach in maintaining high performance with significantly reduced model sizes.

The impact of data distribution on Q-learning with function approximation
Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, and Francisco S. Melo
Springer Science and Business Media LLC
We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results, being primarily focused on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution on the performance of Q-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution such as entropy, coverage, and data quality (closeness to optimal policy).
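
As a rough illustration of why coverage of the data distribution matters, the sketch below runs offline Q-learning with linear features on a toy problem. Everything in it is an assumption made for illustration: the two-state MDP, the one-hot features (which make the linear approximator equivalent to a table), the step size, and the two sampling distributions are hypothetical and are not the four-state MDP or the algorithms studied in the paper.

```python
import random

GAMMA = 0.9
STATES, ACTIONS = (0, 1), (0, 1)
PAIRS = [(s, a) for s in STATES for a in ACTIONS]


def step(s, a):
    """Hypothetical deterministic MDP: the chosen action becomes the next state."""
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return reward, a


def phi(s, a):
    """One-hot features over (state, action) pairs (tabular special case)."""
    vec = [0.0] * len(PAIRS)
    vec[PAIRS.index((s, a))] = 1.0
    return vec


def q_value(w, s, a):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, phi(s, a)))


def offline_q_learning(mu, n_samples=20000, alpha=0.1, seed=0):
    """Q-learning on (s, a) pairs drawn i.i.d. with relative weights mu."""
    rng = random.Random(seed)
    w = [0.0] * len(PAIRS)
    for _ in range(n_samples):
        s, a = rng.choices(PAIRS, weights=mu)[0]
        r, s_next = step(s, a)
        target = r + GAMMA * max(q_value(w, s_next, b) for b in ACTIONS)
        td_error = target - q_value(w, s, a)
        w = [wi + alpha * td_error * xi for wi, xi in zip(w, phi(s, a))]
    return {pair: q_value(w, *pair) for pair in PAIRS}


# Uniform data covers every pair; the second distribution never samples the
# only rewarding pair (1, 1), so no reward information ever enters the data.
q_uniform = offline_q_learning(mu=(1, 1, 1, 1))
q_partial = offline_q_learning(mu=(1, 1, 1, 0))
print(q_uniform)
print(q_partial)
```

In this toy setting the uniform (maximum-entropy) distribution recovers the optimal action values, whereas the distribution that omits one state-action pair leaves every estimate at its initial value of zero.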

When a Robot Is Your Teammate
Filipa Correia, Francisco S. Melo, and Ana Paiva
Wiley
Creating effective teamwork between humans and robots involves not only addressing their performance as a team but also sustaining the quality and sense of unity among teammates, also known as cohesion. This paper explores the research problem of: how can we endow robotic teammates with social capabilities to improve the cohesive alliance with humans? By defining the concept of a human-robot cohesive alliance in the light of the multidimensional construct of cohesion from the social sciences, we propose to address this problem through the idea of multifaceted human-robot cohesion. We present our preliminary effort from previous works to examine each of the five dimensions of cohesion: social, collective, emotional, structural, and task. We finish the paper with a discussion on how human-robot cohesion contributes to the key questions and ongoing challenges of creating robotic teammates. Overall, cohesion in human-robot teams might be a key factor to propel team performance and it should be considered in the design, development, and evaluation of robotic teammates.

HOTSPOT: An ad hoc teamwork platform for mixed human-robot teams
João G. Ribeiro, Luis Müller Henriques, Sérgio Colcher, Julio Cesar Duarte, Francisco S. Melo, Ruy Luiz Milidiú, and Alberto Sardinha
Public Library of Science (PLoS)
Ad hoc teamwork is a research topic in multi-agent systems whereby an agent (the “ad hoc agent”) must successfully collaborate with a set of unknown agents (the “teammates”) without any prior coordination or communication protocol. However, research in ad hoc teamwork is predominantly focused on agent-only teams, but not on agent-human teams, which we believe is an exciting research avenue and has enormous application potential in human-robot teams. This paper will tap into this potential by proposing HOTSPOT, the first framework for ad hoc teamwork in human-robot teams. Our framework comprises two main modules, addressing the two key challenges in the interaction between a robot acting as the ad hoc agent and human teammates. First, a decision-theoretic module that is responsible for all task-related decision-making (task identification, teammate identification, and planning). Second, a communication module that uses natural language processing to parse all communication between the robot and the human. To evaluate our framework, we use a task where a mobile robot and a human cooperatively collect objects in an open space, illustrating the main features of our framework in a real-world task.

“Guess what I'm doing”: Extending legibility to sequential decision tasks
Miguel Faria, Francisco S. Melo, and Ana Paiva
Elsevier BV

Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning

Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
Rustam Zayanov, Francisco Melo, and Manuel Lopes
SCITEPRESS - Science and Technology Publications

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks

TEAMSTER: Model-based reinforcement learning for ad hoc teamwork
João G. Ribeiro, Gonçalo Rodrigues, Alberto Sardinha, and Francisco S. Melo
Elsevier BV

Theoretical Remarks on Feudal Hierarchies and Reinforcement Learning
Diogo S. Carvalho, Francisco S. Melo, and Pedro A. Santos
IOS Press
Hierarchical reinforcement learning is an increasingly demanded resource for learning to make sequential decisions towards long term goals. Feudal hierarchies are among the most deployed frameworks. However, there are few theoretical results for hierarchical structures. In this work, we formalize the common two-level feudal hierarchy as two Markov decision processes, with the one on the high level being dependent on the policy executed at the low level. Despite the non-stationarity raised by the dependency, we show that each of the processes presents stable behavior. We then build on the first result to show that, regardless of the convergent learning algorithm used for the low level, convergence of both prediction and control algorithms at the high-level is guaranteed. Our results contribute with theoretical support for the use of feudal hierarchies in combination with standard reinforcement learning methods at each level.

Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability
João G. Ribeiro, Cassandro Martinho, Alberto Sardinha, and Francisco S. Melo
IOS Press
This paper introduces a formal definition of the setting of ad hoc teamwork under partial observability and proposes a first-principled model-based approach which relies only on prior knowledge and partial observations of the environment in order to perform ad hoc teamwork. We make three distinct assumptions that set it apart from previous works, namely: i) the state of the environment is always partially observable, ii) the actions of the teammates are always unavailable to the ad hoc agent and iii) the ad hoc agent has no access to a reward signal which could be used to learn the task from scratch. Our results in 70 POMDPs from 11 domains show that our approach is not only effective in assisting unknown teammates in solving unknown tasks but is also robust in scaling to more challenging problems. Supplementary material is available at https://github.com/jmribeiro/adhoc-teamwork-under-partial-observability.

Pre-training with Augmentations for Efficient Transfer in Model-Based Reinforcement Learning
Bernardo Esteves, Miguel Vasco, and Francisco S. Melo
Springer Nature Switzerland

Learning to Perceive in Deep Model-Free Reinforcement Learning

Robotic Gaze Responsiveness in Multiparty Teamwork
Filipa Correia, Joana Campos, Francisco S. Melo, and Ana Paiva
Springer Science and Business Media LLC

“Sequencing Matters”: Investigating Suitable Action Sequences in Robot-Assisted Autism Therapy
Kim Baraka, Marta Couto, Francisco S. Melo, Ana Paiva, and Manuela Veloso
Frontiers Media SA
Social robots have been shown to be promising tools for delivering therapeutic tasks for children with Autism Spectrum Disorder (ASD). However, their efficacy is currently limited by a lack of flexibility of the robot’s social behavior to successfully meet therapeutic and interaction goals. Robot-assisted interventions are often based on structured tasks where the robot sequentially guides the child towards the task goal. Motivated by a need for personalization to accommodate a diverse set of children profiles, this paper investigates the effect of different robot action sequences in structured socially interactive tasks targeting attention skills in children with different ASD profiles. Based on an autism diagnostic tool, we devised a robotic prompting scheme on a NAO humanoid robot, aimed at eliciting goal behaviors from the child, and integrated it in a novel interactive storytelling scenario involving screens. We programmed the robot to operate in three different modes: diagnostic-inspired (Assess), personalized therapy-inspired (Therapy), and random (Explore). Our exploratory study with 11 young children with ASD highlights the usefulness and limitations of each mode according to different possible interaction goals, and paves the way towards more complex methods for balancing short-term and long-term goals in personalized robot-assisted therapy.

Leveraging hierarchy in multimodal generative models for effective cross-modality inference
Miguel Vasco, Hang Yin, Francisco S. Melo, and Ana Paiva
Elsevier BV
This work addresses the problem of cross-modality inference (CMI), i.e., inferring missing data of unavailable perceptual modalities (e.g., sound) using data from available perceptual modalities (e.g., image). We overview single-modality variational autoencoder methods and discuss three problems of computational cross-modality inference, arising from recent developments in multimodal generative models. Inspired by neural mechanisms of human recognition, we contribute the Nexus model, a novel hierarchical generative model that can learn a multimodal representation of an arbitrary number of modalities in an unsupervised way. By exploiting hierarchical representation levels, Nexus is able to generate high-quality, coherent data of missing modalities given any subset of available modalities. To evaluate CMI in a natural scenario with a high number of modalities, we contribute the "Multimodal Handwritten Digit" (MHD) dataset, a novel benchmark dataset that combines image, motion, sound and label information from digit handwriting. We assess the key role of hierarchy in enabling high-quality samples during cross-modality inference and discuss how a novel training scheme enables Nexus to learn a multimodal representation robust to missing modalities at test time. Our results show that Nexus outperforms current state-of-the-art multimodal generative models with regard to their cross-modality inference capabilities.

Geometric Multimodal Contrastive Representation Learning

Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories
Fabio Vital, Miguel Vasco, Alberto Sardinha, and Francisco Melo
IEEE
We present Perceive-Represent-Generate (PRG), a novel three-stage framework that maps perceptual information of different modalities (e.g., visual or sound), corresponding to a series of instructions, to a sequence of movements to be executed by a robot. In the first stage, we perceive and preprocess the given inputs, isolating individual commands from the complete instruction provided by a human user. In the second stage we encode the individual commands into a multimodal latent space, employing a deep generative model. Finally, in the third stage we convert the latent samples into individual trajectories and combine them into a single dynamic movement primitive, allowing its execution by a robotic manipulator. We evaluate our pipeline in the context of a novel robotic handwriting task, where the robot receives as input a word through different perceptual modalities (e.g., image, sound), and generates the corresponding motion trajectory to write it, creating coherent and high-quality handwritten words.

Preface

FIT: Using Feature Importance to Teach Classification Tasks to Unknown Learners
Carla Guerra, Francisco S. Melo, and Manuel Lopes
Springer International Publishing

Cooperation and Learning Dynamics under Wealth Inequality and Diversity in Individual Risk Perception
Ramona Merhej, Fernando P. Santos, Francisco S. Melo, and Francisco C. Santos
AI Access Foundation
We examine how wealth inequality and diversity in the perception of risk of a collective disaster impact cooperation levels in the context of a public goods game with uncertain and non-linear returns. In this game, individuals face a collective-risk dilemma where they may contribute or not to a common pool to reduce their chances of future losses. We draw our conclusions based on social simulations with populations of independent reinforcement learners with diverse levels of risk and wealth. We find that both wealth inequality and diversity in risk assessment can hinder cooperation and augment collective losses. Additionally, wealth inequality further exacerbates long term inequality, causing rich agents to become richer and poor agents to become poorer. On the other hand, diversity in risk only amplifies inequality when combined with bias in group assortment—i.e., high probability that agents from the same risk class play together. Our results also suggest that taking wealth inequality into account can help to design effective policies aiming at leveraging cooperation in large group sizes, a configuration where collective action is harder to achieve. Finally, we characterize the circumstances under which risk perception alignment is crucial and those under which reducing wealth inequality constitutes a deciding factor for collective welfare.
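
For readers unfamiliar with collective-risk dilemmas, the following sketch computes expected payoffs in a threshold formulation that is common in the literature. It is an assumption made for illustration and not necessarily the exact game used in the paper: the function name, the contribution threshold, the cost fraction, and the example parameter values are all hypothetical.

```python
def expected_payoffs(n_contributors, threshold, risk,
                     endowment=1.0, cost_fraction=0.1):
    """Expected payoffs (contributor, non-contributor) in one group.

    Assumed rules: if at least `threshold` members contribute, everyone keeps
    their endowment; otherwise everyone loses it with probability `risk`.
    Contributors additionally pay `cost_fraction * endowment` up front.
    """
    goal_reached = n_contributors >= threshold
    keep_probability = 1.0 if goal_reached else 1.0 - risk
    non_contributor = endowment * keep_probability
    contributor = non_contributor - cost_fraction * endowment
    return contributor, non_contributor


# Hypothetical group that needs 3 contributors and faces a 90% disaster risk.
print(expected_payoffs(n_contributors=3, threshold=3, risk=0.9))
print(expected_payoffs(n_contributors=2, threshold=3, risk=0.9))
```

Under these assumptions a non-contributor always earns more than a contributor within the same group, yet everyone earns far more when the threshold is met, which is the tension the dilemma captures. Differences in `endowment` or in perceived `risk` across individuals would correspond to the wealth inequality and risk diversity discussed in the abstract.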

How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents

Cooperation and Learning Dynamics under Risk Diversity and Financial Incentives

Socially Reactive Navigation Models for Mobile Robots
Francisco Melo and Plinio Moreno
IEEE
This work considers socially acceptable behaviors in traditional reactive navigation systems, allowing a robot to approach a group of humans in a socially acceptable manner by considering the personal space and the group space. In contrast to the fixed parameters of social distancing, this work presents an adaptive model; that is, the parameters of the personal and group space’s cost functions adapt according to the arrangement of the group and space constraints, avoiding the choice of initial parameters. A socially aware navigation system capable of approaching groups is implemented for a general-purpose mobile robot. The adaptive personal and group space algorithm is integrated with the standard navigation system of ROS, representing their information in a costmap layer. The adaptation of spaces is tested using fixed and adaptive parameters for different groups provided by three datasets. The navigation system is evaluated through simulation experiments, demonstrating that the robot is capable of approaching groups and, at the same time, provides a more realistic space modeling adapted to the context.

RECENT SCHOLAR PUBLICATIONS

Implicit Repair with Reinforcement Learning in Emergent Communication
F Vital, A Sardinha, FS Melo
arXiv preprint arXiv:2502.12624 2025

Distributed Value Decomposition Networks with Networked Agents
GS Varela, A Sardinha, FS Melo
arXiv preprint arXiv:2502.07635 2025

Networked Agents in the Dark: Team Value Learning under Partial Observability
GS Varela, A Sardinha, FS Melo
arXiv preprint arXiv:2501.08778 2025

The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
PP Santos, A Sardinha, FS Melo
arXiv preprint arXiv:2409.15128 2024

A Comparative Study of Continual Backpropagation
J Silvestrin, FS Melo, M Lopes
EPIA Conference on Artificial Intelligence, 324-334 2024

The impact of data distribution on Q-learning with function approximation
PP Santos, DS Carvalho, A Sardinha, FS Melo
Machine Learning 113 (9), 6141-6163 2024

When a robot is your teammate
F Correia, FS Melo, A Paiva
Topics in Cognitive Science 16 (3), 527-553 2024

HOTSPOT: An ad hoc teamwork platform for mixed human-robot teams
JG Ribeiro, LM Henriques, S Colcher, JC Duarte, FS Melo, RL Milidiú, ...
Plos one 19 (6), e0305705 2024

“Guess what I'm doing”: Extending legibility to sequential decision tasks
M Faria, FS Melo, A Paiva
Artificial Intelligence 330, 104107 2024

TEAMSTER: model-based reinforcement learning for ad hoc teamwork (abstract reprint)
JG Ribeiro, G Rodrigues, A Sardinha, FS Melo
Proceedings of the AAAI Conference on Artificial Intelligence 38 (20), 22708 2024

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks
B Esteves, M Vasco, FS Melo
arXiv preprint arXiv:2402.15393 2024

NeuralThink: Algorithm Synthesis that Extrapolates in General Tasks
B Esteves, M Vasco, FS Melo
arXiv e-prints, arXiv: 2402.15393 2024

TEAMSTER: Model-based reinforcement learning for ad hoc teamwork
JG Ribeiro, G Rodrigues, A Sardinha, FS Melo
Artificial Intelligence 324, 104013 2023

Multi-Bellman operator for convergence of Q-learning with linear function approximation
DS Carvalho, PA Santos, FS Melo
arXiv preprint arXiv:2309.16819 2023

Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback
R Zayanov, FS Melo, M Lopes
arXiv preprint arXiv:2309.09095 2023

Pre-training with Augmentations for Efficient Transfer in Model-Based Reinforcement Learning
B Esteves, M Vasco, FS Melo
EPIA Conference on Artificial Intelligence, 133-145 2023

Learning to Perceive in Deep Model-Free Reinforcement Learning
G Querido, A Sardinha, FS Melo
arXiv preprint arXiv:2301.03730 2023

Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability
JG Ribeiro, C Martinho, A Sardinha, FS Melo
ECAI 2023, 1954-1961 2023

Theoretical remarks on feudal hierarchies and reinforcement learning
DS Carvalho, FS Melo, PA Santos
ECAI 2023, 351-356 2023

Robotic gaze responsiveness in multiparty teamwork
F Correia, J Campos, FS Melo, A Paiva
International Journal of Social Robotics 15 (1), 27-36 2023

MOST CITED SCHOLAR PUBLICATIONS

An analysis of reinforcement learning with function approximation
FS Melo, SP Meyn, MI Ribeiro
Proceedings of the 25th international conference on Machine learning, 664-671 2008
Citations: 347

Active learning for reward estimation in inverse reinforcement learning
M Lopes, F Melo, L Montesano
Joint European conference on machine learning and knowledge discovery in 2009
Citations: 254

Affordance-based imitation learning in robots
M Lopes, FS Melo, L Montesano
2007 IEEE/RSJ international conference on intelligent robots and systems 2007
Citations: 171

Q-Learning with Linear Function Approximation
FS Melo, MI Ribeiro
International Conference on Computational Learning Theory, 308-322 2007
Citations: 146

Decentralized MDPs with sparse interactions
FS Melo, M Veloso
Artificial Intelligence 175 (11), 1757-1789 2011
Citations: 133

Exploring the impact of fault justification in human-robot trust
F Correia, C Guerra, S Mascarenhas, FS Melo, A Paiva
Proceedings of the 17th international conference on autonomous agents and 2018
Citations: 116

Learning of coordination: Exploiting sparse interactions in multiagent systems
FS Melo, M Veloso
Proceedings of The 8th International Conference on Autonomous Agents and 2009
Citations: 112

Interaction-driven Markov games for decentralized multiagent planning under uncertainty
MTJ Spaan, FS Melo
Proceedings of the 7th international joint conference on Autonomous agents 2008
Citations: 110

Empathic robot for group learning: A field study
P Alves-Oliveira, P Sequeira, FS Melo, G Castellano, A Paiva
ACM Transactions on Human-Robot Interaction (THRI) 8 (1), 1-34 2019
Citations: 99

Group-based emotions in teams of humans and robots
F Correia, S Mascarenhas, R Prada, FS Melo, A Paiva
Proceedings of the 2018 ACM/IEEE international conference on human-robot 2018
Citations: 98

Just follow the suit! trust in human-robot interactions during card game playing
F Correia, P Alves-Oliveira, N Maia, T Ribeiro, S Petisca, FS Melo, ...
2016 25th IEEE international symposium on robot and human interactive 2016
Citations: 69

Personalized assistance for dressing users
SD Klee, BQ Ferreira, R Silva, JP Costeira, FS Melo, M Veloso
Social Robotics: 7th International Conference, ICSR 2015, Paris, France 2015
Citations: 69

Geometric multimodal contrastive representation learning
P Poklukar, M Vasco, H Yin, FS Melo, A Paiva, D Kragic
International Conference on Machine Learning, 17782-17800 2022
Citations: 64

Monte carlo tree search experiments in hearthstone
A Santos, PA Santos, FS Melo
2017 IEEE conference on computational intelligence and games (CIG), 272-279 2017
Citations: 61

Emotion-based intrinsic motivation for reinforcement learning agents
P Sequeira, FS Melo, A Paiva
Affective Computing and Intelligent Interaction: 4th International 2011
Citations: 60

Project INSIDE: towards autonomous semi-unstructured human–robot social interaction in autism therapy
FS Melo, A Sardinha, D Belo, M Couto, M Faria, A Farias, H Gamboa, ...
Artificial intelligence in medicine 96, 198-216 2019
Citations: 58

Abstraction levels for robotic imitation: Overview and computational approaches
M Lopes, F Melo, L Montesano, J Santos-Victor
From Motor Learning to Interaction Learning in Robots, 313-355 2010
Citations: 56

An empathic robotic tutor for school classrooms: Considering expectation and satisfaction of children as end-users
P Alves-Oliveira, T Ribeiro, S Petisca, E Di Tullio, FS Melo, A Paiva
Social Robotics: 7th International Conference, ICSR 2015, Paris, France 2015
Citations: 54

Exploring prosociality in human-robot teams
F Correia, SF Mascarenhas, S Gomes, P Arriaga, I Leite, R Prada, ...
2019 14th ACM/IEEE international conference on human-robot interaction (HRI 2019
Citations: 52

Convergence of Q-learning with linear function approximation
FS Melo, MI Ribeiro
2007 European control conference (ECC), 2671-2678 2007
Citations: 52