Artificial Intelligence, Computer Vision and Pattern Recognition, Animal Science and Zoology
Scopus Publications: 18
Scholar Citations: 400
Scholar h-index: 9
Scholar i10-index: 8
Scopus Publications
CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement Rita Pucci, Niki Martinel Proceedings 2025 IEEE Winter Conference on Applications of Computer Vision WACV 2025, 2025 Unmanned underwater image analysis for marine monitoring faces two key challenges: (i) degraded image quality due to light attenuation and (ii) hardware storage constraints limiting high-resolution image collection. Existing methods primarily address image enhancement with approaches that hinge on storing the full-size input. In contrast, we introduce the Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images. Our attention-aware image encoder projects the input image onto a latent space representation while being able to run online on a remote device. The only information that needs to be stored on the device or sent to a beacon is a compressed representation. A dual-decoder module performs offline, full-size enhanced image generation. One branch reconstructs spatial details from the compressed latent space, while the second branch utilizes a capsule-clustering layer to capture entity-level structures and complex spatial relationships. This parallel decoding strategy enables the model to balance fine-detail preservation with context-aware enhancements. CE-VAE achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets, providing up to 3× higher compression efficiency than existing approaches. Code available at https://github.com/iN1k1/ce-vae-underwater-image-enhancement.
Performance of Computer Vision Algorithms for Fine-Grained Classification Using Crowdsourced Insect Images Rita Pucci, Vincent J. Kalkman, Dan Stowell IET Computer Vision, 2025 With fine‐grained classification, we identify unique characteristics to distinguish among classes of the same super‐class. We focus on species recognition in Insecta, as insects are critical for biodiversity monitoring and at the base of many ecosystems. With citizen science campaigns, billions of images are collected in the wild. Once these are labelled, experts can use them to create distribution maps. However, the labelling process is time consuming, which is where computer vision comes in. The field of computer vision offers a wide range of algorithms, each with its strengths and weaknesses; how do we identify the algorithm that is in line with our application? To answer this question, we provide a full and detailed evaluation of nine algorithms among deep convolutional networks (CNN), vision transformers (ViT) and locality‐based vision transformers (LBVT) on four different aspects: classification performance, embedding quality, computational cost and gradient activity. We offer insights not previously available in this domain, showing to what extent these algorithms solve fine‐grained tasks in Insecta. We found that ViT performs the best on inference speed and computational cost, whereas LBVT outperforms the others on performance and embedding quality; the CNNs provide a trade‐off among the metrics.
Pro-CCaps: Progressively Teaching Colourisation to Capsules Rita Pucci, Christian Micheloni, Gian Luca Foresti, Niki Martinel Proceedings 2022 IEEE CVF Winter Conference on Applications of Computer Vision WACV 2022, 2022 Automatic image colourisation studies how to colourise greyscale images. Existing approaches exploit convolutional layers that extract image-level features, learning the colourisation on the entire image, but miss entity-level ones due to pooling strategies. We believe that entity-level features are of paramount importance to deal with the intrinsic multimodality of the problem (i.e., the same object can have different colours, and the same colour can have different properties). Models based on capsule layers aim to identify entity-level features in the image from different points of view, but they do not keep track of global features. Our network architecture integrates entity-level features into the image-level features to generate a plausible image colourisation. We observed that results obtained with direct integration of these two representations are largely dominated by the image-level features, thus resulting in unsaturated colours for the entities. To limit this issue, we propose a gradual growth of the reconstruction phase of the model while training. By taking advantage of prior knowledge from each growing step, we obtain a stable collaboration between image-level and entity-level features that ultimately generates stable and vibrant colourisations. Experimental results on three benchmark datasets, and a user study, demonstrate that our approach has competitive performance with respect to the state-of-the-art and provides more consistent colourisation.
Lord of the Rings: Hanoi Pooling and Self-Knowledge Distillation for Fast and Accurate Vehicle Reidentification Niki Martinel, Matteo Dunnhofer, Rita Pucci, Gian Luca Foresti, Christian Micheloni IEEE Transactions on Industrial Informatics, 2022 Vehicle reidentification has seen increasing interest, thanks to its fundamental impact on intelligent surveillance systems and smart transportation. The visual data acquired from monitoring camera networks come with severe challenges, including occlusions, color and illumination changes, as well as orientation issues (a vehicle can be seen from the side/front/rear due to different camera viewpoints). To deal with such challenges, the community has spent much effort in learning robust feature representations that hinge on additional visual attributes and part-driven methods, but with the side effects of requiring extensive human annotation labor as well as increasing computational complexity. In this article, we propose an approach that learns a feature representation robust to vehicle orientation issues without the need for extra-labeled data and while adding negligible computational overhead. The former objective is achieved through the introduction of a Hanoi pooling layer exploiting ring regions and the image pyramid approach, yielding a multiscale representation of vehicle appearance. The latter is tackled by transferring the accuracy of a deep network to its first layers, thus reducing the inference effort by the early stop of a test example. This is obtained by means of a self-knowledge distillation framework encouraging multiexit network decisions to agree with each other. Results demonstrate that the proposed approach significantly improves the accuracy of early (i.e., very fast) exits while maintaining the same accuracy as a deep (slow) baseline. Moreover, our solution obtains the best existing performance on three benchmark datasets. Code available at https://github.com/iN1k1/.
Collaborative image and object level features for image colourisation Rita Pucci, Christian Micheloni, Niki Martinel IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2021 Image colourisation is an ill-posed problem, with multiple correct solutions which depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intense user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible and CNNs alone are not able to learn entity-level semantics, unless multiple models pre-trained with supervision are considered. In this work, we propose a single network, named UCapsNet, that takes into consideration the image-level features obtained through convolutions and entity-level features captured by means of capsules. Then, by skip connections over different layers, we enforce collaboration between the convolutional and entity factors to produce a high-quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully unsupervised approach, thus requiring no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large scale user study shows that our method is preferred over existing solutions. Code available at https://github.com/Riretta/Image_Colourisation_WiCV_2021.
Self-Attention Agreement among Capsules Rita Pucci, Christian Micheloni, Niki Martinel Proceedings of the IEEE International Conference on Computer Vision, 2021 At the state of the art, Capsule Networks (CapsNets) have been shown to be a promising alternative to Convolutional Neural Networks (CNNs) in many computer vision tasks, due to their ability to encode object viewpoint variations. Network capsules provide maps of votes that focus on the presence of entities in the image and their pose. Each map is the point of view of a given capsule. To compute such votes, CapsNets rely on the routing-by-agreement mechanism. This computationally costly iterative algorithm selects the most appropriate parent capsule to have nodes in a parse tree for all the active capsules, but this behaviour is not ensured by the routing, hence it possibly causes vanishing weights during training. We hypothesise that an attention-like mechanism will help capsules to select the predominant regions among the maps to focus on, hence introducing a more reliable way of learning the agreement between the capsules in a single pass. We propose the Attention Agreement Capsule Networks (AA-Caps) architecture that builds upon CapsNet by introducing a self-attention layer to suppress irrelevant capsule votes, thus keeping only the ones that are useful for capsule agreement on a specific entity. The generated capsule attention map is then assigned to a classification layer responsible for emitting the predicted image class. The proposed AA-Caps model has been evaluated on five benchmark datasets to validate its ability in dealing with the diverse and complex data that CapsNet often fails with. The achieved results demonstrate that AA-Caps outperforms existing methods without the need for more complex architectures or model ensembles.
Stir to pour: Efficient calibration of liquid properties for pouring actions Tatiana Lopez-Guevara, Rita Pucci, Nicholas K. Taylor, Michael U. Gutmann, Subramanian Ramamoorthy, et al. IEEE International Conference on Intelligent Robots and Systems, 2020 Humans use simple probing actions to develop intuition about the physical behavior of common objects. Such intuition is particularly useful for adaptive estimation of favorable manipulation strategies of those objects in novel contexts. For example, observing the effect of tilt on a transparent bottle containing an unknown liquid provides clues on how the liquid might be poured. It is desirable to equip general-purpose robotic systems with this capability because it is inevitable that they will encounter novel objects and scenarios. In this paper, we teach a robot to use a simple, specified probing strategy, stirring with a stick, to reduce spillage when pouring unknown liquids. In the probing step, we continuously observe the effects of a real robot stirring a liquid, while simultaneously tuning the parameters of a model (simulator) until the two outputs are in agreement. We obtain optimal simulation parameters, characterizing the unknown liquid, via a Bayesian Optimizer that minimizes the discrepancy between real and simulated outcomes. Then, we optimize the pouring policy conditioning on the optimal simulation parameters determined via stirring. We show that using stirring as a probing strategy results in reduced spillage for three qualitatively different liquids when executed on a UR10 Robot, compared to probing via pouring. Finally, we provide quantitative insights into the reason for stirring being a suitable calibration task for pouring, a step towards automatic discovery of probing strategies.
Fixed simplex coordinates for angular margin loss in CapsNet Rita Pucci, Christian Micheloni, Gian Luca Foresti, Niki Martinel Proceedings International Conference on Pattern Recognition, 2020 A more stationary and discriminative embedding is necessary for robust classification of images. We focus our attention on the novel CapsNet model and we propose the angular margin loss function in composition with margin loss. We define a fixed classifier implemented with fixed weight vectors obtained from the vertex coordinates of a simplex polytope. The advantage of using a simplex polytope is that we obtain the maximal symmetry for stationary, angularly centred features. Each weight vector is to be considered as the centroid of a class in the dataset. The embedding of an image is obtained through the capsule network encoding phase, which is identified as the digitcaps matrix. Based on the centroids from the simplex coordinates and the embedding from the model, we compute the angular distance between the image embedding and the centroid of the corresponding class of the image. We take this angular distance as the angular margin loss. We keep the computation proposed for margin loss in the original architecture of CapsNet. We train the model to minimise the angular distance between the embedding and the centroid of the class and maximise the magnitude of the embedding for the predicted class. The experiments on different datasets demonstrate that the angular margin loss improves the capability of capsule networks with complex datasets.
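As a minimal sketch of the angular term described in the abstract above (not the authors' implementation), the NumPy code below builds fixed class centroids as the vertices of a regular simplex and measures the angle between an embedding and a centroid. The function names `simplex_vertices` and `angular_distance` are hypothetical, the choice of K-dimensional coordinates for K classes is an assumption made for brevity, and the margin loss term kept from the original CapsNet is omitted.

```python
import numpy as np


def simplex_vertices(num_classes: int) -> np.ndarray:
    """Return num_classes unit vectors forming a regular simplex centred at the origin.

    Assumption: vertices live in R^num_classes (standard basis minus its centroid),
    which is one convenient construction and not necessarily the paper's.
    """
    if num_classes < 2:
        raise ValueError("a simplex classifier needs at least two classes")
    eye = np.eye(num_classes)
    centred = eye - eye.mean(axis=0, keepdims=True)  # move the centroid to the origin
    return centred / np.linalg.norm(centred, axis=1, keepdims=True)  # unit length


def angular_distance(embedding: np.ndarray, centroid: np.ndarray) -> float:
    """Angle in radians between an embedding and a fixed class centroid."""
    cos = embedding @ centroid / (np.linalg.norm(embedding) * np.linalg.norm(centroid))
    # Clip guards against values marginally outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Hypothetical usage: 4 classes, one embedding pulled towards class 2.
centroids = simplex_vertices(4)
embedding = 0.9 * centroids[2] + 0.1 * centroids[0]
loss_angle = angular_distance(embedding, centroids[2])
```

With this construction every pair of centroids has cosine similarity -1/(K-1), so the fixed classes are equally and maximally separated in angle; a training loop would minimise `loss_angle` for the true class.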
Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification J van Bijsterveld, D Avitabile, FJ Verbeek, R Pucci arXiv preprint arXiv:2605.04904, 2026
Colour Extraction Pipeline for Odonates using Computer Vision MMS Rajaraman, FJ Verbeek, VJ Kalkman, R Pucci arXiv preprint arXiv:2604.18725, 2026
Colour Extraction Pipeline for Odonates using Computer Vision M Mirnalini Sundaram Rajaraman, FJ Verbeek, VJ Kalkman, R Pucci arXiv e-prints, arXiv:2604.18725, 2026
Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement N Martinel, R Pucci arXiv preprint arXiv:2506.04753, 2025. Citations: 5
CE-VAE: Capsule enhanced variational autoencoder for underwater image enhancement R Pucci, N Martinel 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV …, 2025. Citations: 7
by Subject TT Høye, F Skov, CJ Topping, M Lagisz, J Gevers, M Glemnitz, ... ele 70189, 18-08, 2025
Performance of computer vision algorithms for fine‐grained classification using crowdsourced insect images R Pucci, VJ Kalkman, D Stowell IET Computer Vision 19 (1), e70006, 2025. Citations: 8
AI species identification using image and sound recognition for citizen science, collection management and biomonitoring: From training pipeline to large-scale models L Hogeweg, N Yan, D Brunink, K Ezzaki-Chokri, W Gerritsen, R Pucci, ... Biodiversity Information Science and Standards 8, 6, 2024. Citations: 6
Capsule enhanced variational autoencoder for underwater image reconstruction R Pucci, N Martinel arXiv preprint arXiv:2406.01294, 2024. Citations: 7
Comparison between transformers and convolutional models for fine-grained classification of insects R Pucci, VJ Kalkman, D Stowell arXiv preprint arXiv:2307.11112, 2023. Citations: 6
UW-ProCCaps: Underwater progressive colourisation with capsules R Pucci, N Martinel arXiv preprint arXiv:2307.01091, 2023. Citations: 2
UW-CVGAN: UnderWater image enhancement with capsules vectors quantization R Pucci, C Micheloni, N Martinel arXiv preprint arXiv:2302.01144, 2023. Citations: 2
CVGAN: Image Generation with Capsule Vector-VAE R Pucci, C Micheloni, GL Foresti, N Martinel International Conference on Image Analysis and Processing, 536-547, 2022. Citations: 2
Pro-CCaps: Progressively teaching colourisation to capsules R Pucci, C Micheloni, GL Foresti, N Martinel Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022. Citations: 4
TUCaN: Progressively Teaching Colourisation to Capsules R Pucci, N Martinel arXiv preprint arXiv:2106.15176, 2021
Lord of the rings: Hanoi pooling and self-knowledge distillation for fast and accurate vehicle reidentification N Martinel, M Dunnhofer, R Pucci, GL Foresti, C Micheloni IEEE Transactions on Industrial Informatics 18 (1), 87-96, 2021. Citations: 24
Collaboration among Image and Object Level Features for Image Colourisation R Pucci, C Micheloni, N Martinel arXiv preprint arXiv:2101.07576, 2021
Fixed simplex coordinates for angular margin loss in CapsNet R Pucci, C Micheloni, GL Foresti, N Martinel 2020 25th International Conference on Pattern Recognition (ICPR), 3042-3049, 2021. Citations: 6
Self-attention agreement among capsules R Pucci, C Micheloni, N Martinel Proceedings of the IEEE/CVF International Conference on Computer Vision, 272-280, 2021. Citations: 14
Collaborative image and object level features for image colourisation R Pucci, C Micheloni, N Martinel Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021. Citations: 10
MOST CITED SCHOLAR PUBLICATIONS
Human activity recognition using multisensor data fusion based on reservoir computing F Palumbo, C Gallicchio, R Pucci, A Micheli Journal of Ambient Intelligence and Smart Environments 8 (2), 87-107, 2016. Citations: 194
Lord of the rings: Hanoi pooling and self-knowledge distillation for fast and accurate vehicle reidentification N Martinel, M Dunnhofer, R Pucci, GL Foresti, C Micheloni IEEE Transactions on Industrial Informatics 18 (1), 87-96, 2021. Citations: 24
Stir to pour: Efficient calibration of liquid properties for pouring actions T Lopez-Guevara, R Pucci, NK Taylor, MU Gutmann, S Ramamoorthy, ... 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2020. Citations: 20
A comparative analysis of SVM and IDNN for identifying penguin activities S Chessa, A Micheli, R Pucci, J Hunter, G Carroll, R Harcourt Applied Artificial Intelligence 31 (5-6), 453-471, 2017. Citations: 16
Deep interactive encoding with capsule networks for image classification R Pucci, C Micheloni, GL Foresti, N Martinel Multimedia Tools and Applications 79 (43), 32243-32258, 2020. Citations: 15
Self-attention agreement among capsules R Pucci, C Micheloni, N Martinel Proceedings of the IEEE/CVF International Conference on Computer Vision, 272-280, 2021. Citations: 14
Collaborative image and object level features for image colourisation R Pucci, C Micheloni, N Martinel Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021. Citations: 10
Localizing tortoise nests by neural networks R Barbuti, S Chessa, A Micheli, R Pucci PLoS ONE 11 (3), e0151168, 2016. Citations: 10
Activity Recognition system based on Multisensor data fusion (AReM) F Palumbo, C Gallicchio, R Pucci, A Micheli UCI Machine Learning Repository: Irvine, CA, USA, 2016. Citations: 9
Performance of computer vision algorithms for fine‐grained classification using crowdsourced insect images R Pucci, VJ Kalkman, D Stowell IET Computer Vision 19 (1), e70006, 2025. Citations: 8
WhoAmI: An automatic tool for visual recognition of tiger and leopard individuals in the wild R Pucci, J Shankaraiah, D Jathanna, U Karanth, K Subr arXiv preprint arXiv:2006.09962, 2020. Citations: 8
CE-VAE: Capsule enhanced variational autoencoder for underwater image enhancement R Pucci, N Martinel 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV …, 2025. Citations: 7
Capsule enhanced variational autoencoder for underwater image reconstruction R Pucci, N Martinel arXiv preprint arXiv:2406.01294, 2024. Citations: 7
To stir or not to stir: Online estimation of liquid properties for pouring actions TL Guevara, R Pucci, NK Taylor, M Gutmann, S Ramamoorthy, K Subr Robotics: Science and Systems Workshop on Learning and Inference in Robotics …, 2018. Citations: 7
AI species identification using image and sound recognition for citizen science, collection management and biomonitoring: From training pipeline to large-scale models L Hogeweg, N Yan, D Brunink, K Ezzaki-Chokri, W Gerritsen, R Pucci, ... Biodiversity Information Science and Standards 8, 6, 2024. Citations: 6
Comparison between transformers and convolutional models for fine-grained classification of insects R Pucci, VJ Kalkman, D Stowell arXiv preprint arXiv:2307.11112, 2023. Citations: 6
Fixed simplex coordinates for angular margin loss in CapsNet R Pucci, C Micheloni, GL Foresti, N Martinel 2020 25th International Conference on Pattern Recognition (ICPR), 3042-3049, 2021. Citations: 6
Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement N Martinel, R Pucci arXiv preprint arXiv:2506.04753, 2025. Citations: 5
Human activity recognition using multisensor data fusion based on reservoir computing. J Ambient Intell Smart Environ 8 (2): 87–107 F Palumbo, C Gallicchio, R Pucci, A Micheli, 2016. Citations: 5
Identification of nesting phase in tortoise populations by neural networks. Extended abstract R Barbuti, S Chessa, A Micheli, R Pucci The 50th Anniversary Convention of the AISB, selected papers, 62-65, 2013. Citations: 5