Multimodal Human Detection using RGB, Thermal and LiDAR modalities for Robotic Perception
Kennedy O. S. Mota, Luís Garrote, Cristiano Premebida
IEEE International Conference on Automation Science and Engineering, 2024
People detection is a relevant research topic in artificial perception, with a wide range of applications spanning security, surveillance, robotics, and autonomous driving. Overcoming challenges in this field involves advanced algorithms, combinations of machine learning approaches, and the use of sensory data, e.g., from cameras and LiDARs. This work addresses the problem of people detection using YOLO, a state-of-the-art object detection method, trained on three distinct data sources: LiDAR, RGB (color) and ‘thermal’ (long-wave infrared) images. The rationale for combining multiple sensory representations relies on the assumption that each sensor has its own advantages and disadvantages, but together they normally complement each other, especially in real-world conditions. LiDAR contributes to a physically interpretable mapping of the environment, providing precise information regarding the size/dimension and location of objects, while RGB and thermal provide relevant textural features. The sensors have been calibrated with respect to each other, thus allowing the LiDAR’s point clouds to be projected into the image plane, followed by an up-sampling step, to create dense depth maps (DM) that enable direct use of the YOLO framework. To support the experiments, a new multi-sensory dataset has been collected using a mobile robot. Besides single-modality models, this paper also explores early and late-fusion strategies. Finally, the new dataset has been made available in a GitHub repository.
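The projection step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a pinhole camera model with known intrinsics `K` and a LiDAR-to-camera extrinsic transform `T_cam_lidar`. The function name, parameter names, and the nearest-point-wins rule are illustrative assumptions rather than the authors' implementation, and the up-sampling step that turns the sparse map into a dense one is not shown.

```python
import numpy as np


def project_lidar_to_depth_map(points_lidar, T_cam_lidar, K, image_size):
    """Project LiDAR points into the image plane as a sparse depth map.

    Hypothetical helper (not from the paper):
      points_lidar : (N, 3) XYZ points in the LiDAR frame
      T_cam_lidar  : (4, 4) homogeneous transform, LiDAR frame -> camera frame
      K            : (3, 3) pinhole intrinsic matrix
      image_size   : (height, width) of the target image
    Pixels that receive no LiDAR return are left at 0.
    """
    height, width = image_size

    # Move the points into the camera frame using homogeneous coordinates.
    ones = np.ones((points_lidar.shape[0], 1))
    pts_cam = (T_cam_lidar @ np.hstack([points_lidar, ones]).T).T[:, :3]

    # Discard points behind the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Perspective projection onto integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = pts_cam[:, 2]

    # Keep only projections that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[inside], v[inside], depth[inside]

    # Write far points first so the nearest point wins when several
    # points fall on the same pixel (the last assignment is kept).
    order = np.argsort(-depth)
    depth_map = np.zeros((height, width), dtype=np.float32)
    depth_map[v[order], u[order]] = depth[order]
    return depth_map
```

The resulting map is sparse, since most pixels receive no LiDAR return. An interpolation or filtering stage would still be needed to obtain the dense depth maps the abstract refers to.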
Exploiting 3D Grids for Indoor SLAM in Featureless Scenarios
Luís Garrote, Ulisses Reverendo, Urbano J. Nunes
2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC 2024), 2024
Accurate multi-sensor localization is a challenging task in the navigation of autonomous mobile robots (AMRs). Precise localization strategies are essential for AMRs to carry out their missions safely in their surrounding environments. This work proposes a novel, modular, ROS-based framework built on a 3D grid-based particle filter that can be used for Simultaneous Localization and Mapping (SLAM) or as a standalone robust localization strategy. The framework uses odometry and 3D LiDAR data as inputs for localization and SLAM. To further improve localization and representation alignment, a pose refinement stage is employed using Levenberg-Marquardt minimization. The refinement stage considers keypoints in the environment to improve localization and uses the raw 3D point cloud for map maintenance. A pyramid-like 3D grid resolution is used to aid the refinement of the representation, improving pose estimates in featureless scenarios. Experimental validation was carried out with data acquired using an in-house platform, in a set of indoor and semi-structured scenarios containing critical featureless areas. The obtained results highlight the robustness of the proposed framework in both SLAM and localization tasks. The code (ROS package) is made available in a GitHub repository.
Multimodal Human Detection Using YOLO and Representation Learning for Robot Perception
Kennedy O. S. Mota, Diogo S. De Oliveira, Luís Garrote, Cristiano Premebida
2024 7th Iberian Robotics Conference (ROBOT 2024), 2024
This work concentrates on the problem of multisensor people detection using YOLO trained on four distinct modalities: depth and intensity LiDAR maps, RGB, and ‘thermal’ images. RGB cameras, ubiquitous in this application domain, offer high resolution but struggle with adverse lighting conditions, resulting in overexposed or underexposed images that degrade the performance of the algorithms. Thermal (long-wave infrared) cameras are more resilient to varying light conditions and provide complementary textural features, although with lower resolution than RGB cameras. LiDAR sensors, while having a significantly lower resolution, contribute to a physically interpretable mapping of the environment, providing precise information regarding the size/dimension and location of objects. The main goal of this work is to tackle people detection using deep models trained on single- and multi-modality representations. To support the experimental part, this work introduces a new multimodal dataset (called MID-3K). MID-3K allows the development of data fusion strategies by leveraging four modalities (obtained from three distinct exteroceptive sensors mounted on a mobile robot). Building on a single-modality YOLO framework, we propose a multimodal representation learning approach to improve the baseline performance and to capture more relevant features across all input modalities. The proposed detection pipeline is evaluated on the MID-3K dataset, and the reported results are grounded in state-of-the-art performance measures. The new MID-3K dataset is available in a GitHub repository: https://kennedyk1.github.io/MID-3K/.
Two-Stream Architecture with Contrastive and Self-Supervised Attention Feature Fusion for Error-related Potentials Classification
Luís Garrote, João Perdiz, Mine Yasemin, Gabriel Pires, Urbano J. Nunes
IEEE International Workshop on Robot and Human Communication (RO-MAN), 2024
Error-related potentials (ErrPs) extracted from electroencephalographic (EEG) signals hold potential for application in Brain-Machine Interfaces, in contexts such as robot teleoperation or shared control in assistive platforms. Due to difficulties in signal classification, caused in part by the non-stationary and noisy nature of the signal, their use has not yet been fully realized. This work proposes a new approach to ErrP classification based on a two-stream deep learning architecture with three training stages. The first stage is a self-supervised autoencoder architecture with a multi-head attention layer that provides relevant latent features. The second stage comprises a supervised contrastive learning approach with two backbone networks, where one inherits weights from the first stage and the other is updated by considering the distribution of feature embeddings. The final stage comprises supervised classification, where the two backbones are fused and used to classify the input EEG signal. At the end of the three stages, a data-driven two-stream ErrP model is obtained. Twenty-five variants of the proposed approach, using the Deep Convolutional Network, Shallow Convolutional Network and EEGNet backbones, were tested in an ablation study and benchmarked against a large number of classical classification methods, using data from the BNCI dataset intended to assess cross-subject generalization capabilities. The proposed approach obtained the best results overall, highlighting its capability to capture relevant representations of the EEG signal.
DepthCN: Vehicle detection using 3D-LIDAR and ConvNet
Alireza Asvadi, Luis Garrote, Cristiano Premebida, Paulo Peixoto, Urbano J. Nunes
IEEE Conference on Intelligent Transportation Systems, Proceedings (ITSC), 2017
Attention-Based Multimodal Fusion for Robust 6D Pose Estimation in Cluttered Industrial Environments M Abreu, E Borges, J Perdiz, L Garrote, A Mendes, UJ Nunes 2026 IEEE International Conference on Autonomous Robot Systems and …, 2026
Distilling Apple DepthPro for RGB-LiDAR depth estimation M Abreu, L Garrote, UJ Nunes Robotics and Autonomous Systems, 105437, 2026. Citations: 1
Generalization of Machine and Deep Learning Models for Brain-Computer Interfaces Across Sessions and Paradigms in a Completely Locked-In Patient L Garrote, R Bettencourt, J Perdiz, G Pires, UJ Nunes 2025 34th IEEE International Conference on Robot and Human Interactive …, 2025
Multimodal Human Detection Using YOLO and Representation Learning for Robot Perception KOS Mota, DS de Oliveira, L Garrote, C Premebida 7th Iberian Robotics Conference (ROBOT2024), 2024. Citations: 2
Multimodal 6D Detection of Industrial Pallets, in Real and Virtual Environments, with Applications in Industrial AMRs J Lourenço, G Arsénio, L Garrote, UJ Nunes Proceedings of the 21st International Conference on Informatics in Control …, 2024
A Modular Multimodal Multi-Object Tracking-by-Detection Approach, with Applications in Outdoor and Indoor Environments E Borges, L Garrote, UJ Nunes Proceedings of the 21st International Conference on Informatics in Control …, 2024
PointNetPGAP-SLC: A 3D LiDAR-based place recognition approach with segment-level consistency training for mobile robots in horticulture T Barros, L Garrote, P Conde, MJ Coombes, C Liu, C Premebida, ... IEEE Robotics and Automation Letters 9 (11), 10471-10478, 2024. Citations: 9
Multimodal human detection using RGB, thermal and LiDAR modalities for robotic perception KOS Mota, L Garrote, C Premebida 2024 IEEE 20th International Conference on Automation Science and …, 2024. Citations: 1
Two-Stream Architecture with Contrastive and Self-Supervised Attention Feature Fusion for Error-related Potentials Classification L Garrote, J Perdiz, M Yasemin, G Pires, UJ Nunes 2024 33rd IEEE International Conference on Robot and Human Interactive …, 2024. Citations: 2
Exploiting 3D grids for indoor SLAM in featureless scenarios L Garrote, U Reverendo, UJ Nunes 2024 IEEE International Conference on Autonomous Robot Systems and …, 2024. Citations: 3
2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC) C Santos, E Pedrosa, JL Lima, L Garrote, L Louro, P Fonseca, S Paiva, ... 2024. Citations: 1
Exploiting object-based and segmentation-based semantic features for deep learning-based indoor scene classification R Pereira, L Garrote, T Barros, A Lopes, UJ Nunes arXiv preprint arXiv:2404.07739, 2024. Citations: 4
A deep learning-based global and segmentation-based semantic feature fusion approach for indoor scene classification R Pereira, T Barros, L Garrote, A Lopes, UJ Nunes Pattern Recognition Letters 179, 24-30, 2024. Citations: 27
DeepRL-Based Robot Local Motion Planning in Unknown Dynamic Indoor Environments G Gonçalves, D Palaio, L Garrote, UJ Nunes Robot 2023: Sixth Iberian Robotics Conference: Advances in Robotics, Volume …, 2024
TReR: A lightweight transformer re-ranking approach for 3D LiDAR place recognition T Barros, L Garrote, M Aleksandrov, C Premebida, UJ Nunes 2023 IEEE 26th International Conference on Intelligent Transportation …, 2023. Citations: 6
Late-fusion multimodal human detection based on RGB and thermal images for robotic perception E Sousa, KOS Mota, IP Gomes, L Garrote, DF Wolf, C Premebida 2023 European Conference on Mobile Robots (ECMR), 1-6, 2023. Citations: 11
Costmap-based local motion planning using deep reinforcement learning L Garrote, J Perdiz, UJ Nunes 2023 32nd IEEE International Conference on Robot and Human Interactive …, 2023. Citations: 3
ORCHNet: A robust global feature aggregation approach for 3D LiDAR-based place recognition in orchards T Barros, L Garrote, P Conde, MJ Coombes, C Liu, C Premebida, ... arXiv preprint arXiv:2303.00477, 2023. Citations: 3
AttDLNet: Attention-based deep network for 3D LiDAR place recognition T Barros, L Garrote, R Pereira, C Premebida, UJ Nunes Iberian Robotics conference, 309-320, 2022. Citations: 36
Dynamic environment-based visual user interface system for intuitive navigation target selection for brain-actuated wheelchairs R Pereira, A Cruz, L Garrote, G Pires, A Lopes, UJ Nunes 2022 31st IEEE International Conference on Robot and Human Interactive …, 2022. Citations: 7
MOST CITED SCHOLAR PUBLICATIONS
Multimodal vehicle detection: fusing 3D-LIDAR and color camera data A Asvadi, L Garrote, C Premebida, P Peixoto, UJ Nunes Pattern Recognition Letters 115, 20-29, 2018. Citations: 305
SORT and Deep-SORT based multi-object tracking for mobile robotics: Evaluation with new data association metrics R Pereira, G Carvalho, L Garrote, UJ Nunes Applied Sciences 12 (3), 1319, 2022. Citations: 160
DepthCN: Vehicle detection using 3D-LIDAR and ConvNet A Asvadi, L Garrote, C Premebida, P Peixoto, UJ Nunes 2017 IEEE 20th international conference on intelligent transportation …, 2017. Citations: 142
High-resolution LIDAR-based depth mapping using bilateral filter C Premebida, L Garrote, A Asvadi, AP Ribeiro, U Nunes 2016 IEEE 19th international conference on intelligent transportation …, 2016. Citations: 92
AttDLNet: Attention-based deep network for 3D LiDAR place recognition T Barros, L Garrote, R Pereira, C Premebida, UJ Nunes Iberian Robotics conference, 309-320, 2022. Citations: 36
Autonomous electric vehicle: Steering and path-following control systems M Silva, L Garrote, F Moita, M Martins, U Nunes 2012 16th IEEE Mediterranean electrotechnical conference, 442-445, 2012. Citations: 35
Place recognition survey: An update on deep learning approaches T Barros, R Pereira, L Garrote, C Premebida, UJ Nunes arXiv preprint arXiv:2106.10458, 2021. Citations: 32
Real-time deep ConvNet-based vehicle detection using 3D-LIDAR reflection intensity data A Asvadi, L Garrote, C Premebida, P Peixoto, UJ Nunes Iberian Robotics conference, 475-486, 2017. Citations: 30
An RRT-based navigation approach for mobile robots and automated vehicles L Garrote, C Premebida, M Silva, U Nunes 2014 12th IEEE International Conference on Industrial Informatics (INDIN …, 2014. Citations: 29
A deep learning-based global and segmentation-based semantic feature fusion approach for indoor scene classification R Pereira, T Barros, L Garrote, A Lopes, UJ Nunes Pattern Recognition Letters 179, 24-30, 2024. Citations: 27
Test and evaluation of connected and autonomous vehicles in real-world scenarios J Pereira, C Premebida, A Asvadi, F Cannata, L Garrote, UJ Nunes 2019 IEEE Intelligent Vehicles Symposium (IV), 14-19, 2019. Citations: 24
3D point cloud downsampling for 2D indoor scene modelling in mobile robotics L Garrote, J Rosa, J Paulo, C Premebida, P Peixoto, UJ Nunes 2017 IEEE international conference on autonomous robot systems and …, 2017. Citations: 24
Modular software architecture for human-robot interaction applied to the InterBot mobile robot R Cruz, L Garrote, A Lopes, UJ Nunes 2018 IEEE International Conference on Autonomous Robot Systems and …, 2018. Citations: 22
Deep-learning based global and semantic feature fusion for indoor scene classification R Pereira, N Gonçalves, L Garrote, T Barros, A Lopes, UJ Nunes 2020 IEEE international conference on autonomous robot systems and …, 2020. Citations: 20
Mobile robot localization with reinforcement learning map update decision aided by an absolute indoor positioning system L Garrote, M Torres, T Barros, J Perdiz, C Premebida, UJ Nunes 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2019. Citations: 18
Robot-assisted navigation for a robotic walker with aided user intent L Garrote, J Paulo, J Perdiz, P Peixoto, UJ Nunes 2018 27th IEEE international symposium on robot and human interactive …, 2018. Citations: 18
A Deep Learning-based Indoor Scene Classification Approach Enhanced with Inter-Object Distance Semantic Features R Pereira, L Garrote, T Barros, A Lopes, UJ Nunes 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2021. Citations: 17
Reinforcement learning aided robot-assisted navigation: A utility and RRT two-stage approach L Garrote, J Paulo, UJ Nunes International Journal of Social Robotics 12 (3), 689-707, 2020. Citations: 17
A reinforcement learning assisted eye-driven computer game employing a decision tree-based approach and CNN classification J Perdiz, L Garrote, G Pires, UJ Nunes IEEE Access 9, 46011-46021, 2021. Citations: 15
Absolute indoor positioning-aided laser-based particle filter localization with a refinement stage L Garrote, T Barros, R Pereira, UJ Nunes IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society …, 2019. Citations: 14