@postdoc
Naturalis Biodiversity Center
Artificial Intelligence, Computer Vision and Pattern Recognition, Animal Science and Zoology
Rita Pucci, Christian Micheloni, Gian Luca Foresti, and Niki Martinel
Springer International Publishing
Rita Pucci, Christian Micheloni, Gian Luca Foresti, and Niki Martinel
IEEE
Automatic image colourisation studies how to colourise greyscale images. Existing approaches exploit convolutional layers that extract image-level features, learning the colourisation on the entire image, but miss entity-level ones due to pooling strategies. We believe that entity-level features are of paramount importance to deal with the intrinsic multimodality of the problem (i.e., the same object can have different colours, and the same colour can have different properties). Models based on capsule layers aim to identify entity-level features in the image from different points of view, but they do not keep track of global features. Our network architecture integrates entity-level features into the image-level features to generate a plausible image colourisation. We observed that results obtained with direct integration of these two representations are largely dominated by the image-level features, thus resulting in unsaturated colours for the entities. To limit this issue, we propose a gradual growth of the reconstruction phase of the model while training. By taking advantage of prior knowledge from each growing step, we obtain a stable collaboration between image-level and entity-level features that ultimately generates stable and vibrant colourisations. Experimental results on three benchmark datasets, and a user study, demonstrate that our approach has competitive performance with respect to the state of the art and provides more consistent colourisation.
Niki Martinel, Matteo Dunnhofer, Rita Pucci, Gian Luca Foresti, and Christian Micheloni
Institute of Electrical and Electronics Engineers (IEEE)
Vehicle reidentification has seen increasing interest, thanks to its fundamental impact on intelligent surveillance systems and smart transportation. The visual data acquired from monitoring camera networks come with severe challenges, including occlusions, color and illumination changes, as well as orientation issues (a vehicle can be seen from the side/front/rear due to different camera viewpoints). To deal with such challenges, the community has spent much effort in learning robust feature representations that hinge on additional visual attributes and part-driven methods, but with the side effects of requiring extensive human annotation labor as well as increasing computational complexity. In this article, we propose an approach that learns a feature representation robust to vehicle orientation issues without the need for extra-labeled data and adding negligible computational overheads. The former objective is achieved through the introduction of a Hanoi pooling layer exploiting ring regions and the image pyramid approach yielding a multiscale representation of vehicle appearance. The latter is tackled by transferring the accuracy of a deep network to its first layers, thus reducing the inference effort by the early stop of a test example. This is obtained by means of a self-knowledge distillation framework encouraging multiexit network decisions to agree with each other. Results demonstrate that the proposed approach significantly improves the accuracy of early (i.e., very fast) exits while maintaining the same accuracy of a deep (slow) baseline. Moreover, our solution obtains the best existing performance on three benchmark datasets. Available online: https://github.com/iN1k1/.
Rita Pucci, Christian Micheloni, and Niki Martinel
IEEE
Image colourisation is an ill-posed problem, with multiple correct solutions which depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intense user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible, and CNNs alone are not able to learn entity-level semantics unless multiple models pre-trained with supervision are considered. In this work, we propose a single network, named UCapsNet, that takes into consideration the image-level features obtained through convolutions and the entity-level features captured by means of capsules. Then, by skip connections over different layers, we enforce collaboration between the convolutional and entity factors to produce a high-quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully unsupervised approach, thus requiring no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large-scale user study shows that our method is preferred over existing solutions. Code available at https://github.com/Riretta/Image_Colourisation_WiCV_2021.
Rita Pucci, Christian Micheloni, and Niki Martinel
IEEE
Capsule Networks (CapsNets) have been shown to be a promising alternative to Convolutional Neural Networks (CNNs) in many computer vision tasks, due to their ability to encode object viewpoint variations. Network capsules provide maps of votes that focus on the presence of entities in the image and on their pose. Each map is the point of view of a given capsule. To compute such votes, CapsNets rely on the routing-by-agreement mechanism. This computationally costly iterative algorithm selects the most appropriate parent capsule to have nodes in a parse tree for all the active capsules, but this behaviour is not ensured by the routing, hence it possibly causes vanishing weights during training. We hypothesise that an attention-like mechanism will help capsules to select the predominant regions among the maps to focus on, hence introducing a more reliable way of learning the agreement between the capsules in a single pass. We propose the Attention Agreement Capsule Networks (AA-Caps) architecture that builds upon CapsNet by introducing a self-attention layer to suppress irrelevant capsule votes, thus keeping only the ones that are useful for capsule agreement on a specific entity. The generated capsule attention map is then passed to the classification layer responsible for emitting the predicted image class. The proposed AA-Caps model has been evaluated on five benchmark datasets to validate its ability to deal with the diverse and complex data that CapsNet often fails with. The achieved results demonstrate that AA-Caps outperforms existing methods without the need for more complex architectures or model ensembles.
Rita Pucci, Christian Micheloni, Gian Luca Foresti, and Niki Martinel
Springer Science and Business Media LLC
Tatiana Lopez-Guevara, Rita Pucci, Nicholas K. Taylor, Michael U. Gutmann, Subramanian Ramamoorthy, and Kartic Subr
IEEE
Humans use simple probing actions to develop intuition about the physical behavior of common objects. Such intuition is particularly useful for adaptive estimation of favorable manipulation strategies of those objects in novel contexts. For example, observing the effect of tilt on a transparent bottle containing an unknown liquid provides clues on how the liquid might be poured. It is desirable to equip general-purpose robotic systems with this capability because it is inevitable that they will encounter novel objects and scenarios. In this paper, we teach a robot to use a simple, specified probing strategy (stirring with a stick) to reduce spillage when pouring unknown liquids. In the probing step, we continuously observe the effects of a real robot stirring a liquid, while simultaneously tuning the parameters of a model (simulator) until the two outputs are in agreement. We obtain optimal simulation parameters, characterizing the unknown liquid, via a Bayesian Optimizer that minimizes the discrepancy between real and simulated outcomes. Then, we optimize the pouring policy conditioned on the optimal simulation parameters determined via stirring. We show that using stirring as a probing strategy results in reduced spillage for three qualitatively different liquids when executed on a UR10 Robot, compared to probing via pouring. Finally, we provide quantitative insights into the reason for stirring being a suitable calibration task for pouring, a step towards automatic discovery of probing strategies.
Rita Pucci, Christian Micheloni, Gian Luca Foresti, and Niki Martinel
IEEE
A more stationary and discriminative embedding is necessary for robust classification of images. We focus our attention on the novel CapsNet model and we propose the angular margin loss function in composition with margin loss. We define a fixed classifier implemented with fixed weight vectors obtained from the vertex coordinates of a simplex polytope. The advantage of using a simplex polytope is that we obtain the maximal symmetry for stationary, angularly centred features. Each weight vector is to be considered as the centroid of a class in the dataset. The embedding of an image is obtained through the capsule network encoding phase, and is identified as the digitcaps matrix. Based on the centroids from the simplex coordinates and the embedding from the model, we compute the angular distance between the image embedding and the centroid of the corresponding class of the image. We take this angular distance as the angular margin loss. We keep the computation proposed for margin loss in the original architecture of CapsNet. We train the model to minimise the angular distance between the embedding and the centroid of the class and maximise the magnitude of the embedding for the predicted class. The experiments on different datasets demonstrate that the angular margin loss improves the capability of capsule networks with complex datasets.
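The fixed simplex classifier and the angular distance described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes, purely for illustration, that the embedding has as many dimensions as there are classes.

```python
import math

def simplex_centroids(num_classes):
    """Return num_classes unit vectors forming a regular simplex centred at the origin.

    Assumption for illustration: the vectors live in R^num_classes.
    Requires num_classes >= 2.
    """
    k = num_classes
    vertices = []
    for i in range(k):
        # Standard basis vector e_i, shifted so that all vertices sum to zero.
        v = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        vertices.append([x / norm for x in v])
    return vertices

def angular_distance(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos)

def angular_margin_loss(embedding, label, centroids):
    """Angular distance between an embedding and the fixed centroid of its class."""
    return angular_distance(embedding, centroids[label])
```

With this construction every pair of centroids has the same cosine similarity, -1/(K-1) for K classes, which is the symmetry property the abstract refers to. An embedding aligned with its class centroid gives a loss close to zero.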
Rita Pucci, Christian Micheloni, Vito Roberto, Gian Luca Foresti, and Niki Martinel
ACM
Image recognition has been an open challenge in computer vision since its early stages. The application of deep neural networks yielded significant improvements towards its solution. Despite their classification abilities, deep networks need datasets with thousands of labelled images and prohibitive computational capabilities to achieve good performance. To address some of these challenges, the CapsNet neural architecture has been recently proposed as a promising machine learning model for image classification based on the idea of capsules. A capsule is a group of neurons whose output represents the presence of features of the same entity. In this paper, we start from the CapsNet architecture to explore and analyse the interaction between the presence of features within certain, similar classes. This is achieved by means of feature interaction techniques working on the outputs of two independent capsule-based models. To understand the importance of the interaction between capsules, extensive experiments have been carried out on four challenging datasets. Results show that exploiting capsule interaction yields performance improvements.
Stefano Chessa, Alessio Micheli, Rita Pucci, Jane Hunter, Gemma Carroll, and Rob Harcourt
Informa UK Limited
Where, when and how much animals eat provide valuable insights into their ecology. In this paper, we present a comparative analysis between Support Vector Machine (SVM) and Input Delay Neural Network (IDNN) models to identify prey capture events from penguin accelerometry data. A pre-classified dataset of 3D time-series data from back-mounted accelerometers was used. We trained both models to classify the penguins’ behavior at intervals as either ‘prey handling’ or ‘swimming’. The aim was to determine whether IDNN could achieve the same level of classification accuracy as SVM, but with reduced memory demands. This would enable the IDNN model to be embedded on the accelerometer micro-system itself, and hence reduce the magnitude of the output data to be uploaded. Based on the classification results, this paper provides an analysis of the two models from both an accuracy and applicability point of view. The experimental results show that both models achieve an equivalent accuracy of approx. 85% using the featured data, with a memory demand of 0.5 kB for IDNN and 0.7 Mb for SVM. Using the raw accelerometer data improves the generalizability of the models, at a slightly lower accuracy of around 80%. This indicates that the IDNN model can be embedded on the accelerometer itself, reducing problems associated with raw time-series data retrieval and loss.
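An input delay neural network feeds a feedforward network with the current sample together with a fixed number of past samples. The sketch below shows only that windowing step for 3-axis accelerometer data. The function name and the delay length are illustrative assumptions, not values taken from the paper.

```python
def delayed_inputs(series, delay):
    """Build input-delay windows from a multi-axis time series.

    series: list of samples, each a list of axis readings, e.g. [x, y, z].
    delay:  number of past samples concatenated with the current one
            (hypothetical parameter; the paper's value is not given here).
    Returns one flat input vector per time step t >= delay.
    """
    windows = []
    for t in range(delay, len(series)):
        window = []
        # Concatenate samples t-delay .. t, oldest first.
        for sample in series[t - delay : t + 1]:
            window.extend(sample)
        windows.append(window)
    return windows

# Five 3-axis samples with a delay of 2 give three windows of 9 values each.
readings = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8],
            [0.9, 1.0, 1.1], [1.2, 1.3, 1.4]]
inputs = delayed_inputs(readings, delay=2)
```

Each window would then be passed to a small feedforward classifier. Because the window length is fixed, the memory needed at inference time depends only on the delay and the network size, not on the length of the recording.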
Roberto Barbuti, Stefano Chessa, Alessio Micheli, and Rita Pucci
Public Library of Science (PLoS)
The goal of this research is to recognize the nest digging activity of tortoises using a device mounted atop the tortoise carapace. The device classifies tortoise movements in order to discriminate between nest digging, and non-digging activity (specifically walking and eating). Accelerometer data was collected from devices attached to the carapace of a number of tortoises during their two-month nesting period. Our system uses an accelerometer and an activity recognition system (ARS) which is modularly structured using an artificial neural network and an output filter. For the purpose of experiment and comparison, and with the aim of minimizing the computational cost, the artificial neural network has been modelled according to three different architectures based on the input delay neural network (IDNN). We show that the ARS can achieve very high accuracy on segments of data sequences, with an extremely small neural network that can be embedded in programmable low power devices. Given that digging is typically a long activity (up to two hours), the application of ARS on data segments can be repeated over time to set up a reliable and efficient system, called Tortoise@, for digging activity recognition.
Filippo Palumbo, Claudio Gallicchio, Rita Pucci, and Alessio Micheli
IOS Press
Activity recognition plays a key role in providing activity assistance and care for users in smart homes. In this work, we present an activity recognition system that classifies in near real time a set of common daily activities, exploiting both the data sampled by sensors embedded in a smartphone carried by the user and the reciprocal Received Signal Strength (RSS) values coming from worn wireless sensor devices and from sensors deployed in the environment. In order to achieve an effective and responsive classification, a decision tree based on the multisensor data stream is applied, fusing data coming from embedded sensors on the smartphone and environmental sensors before processing the RSS stream. To this end, we model the RSS stream, obtained from a Wireless Sensor Network (WSN), using Recurrent Neural Networks (RNNs) implemented as efficient Echo State Networks (ESNs), within the Reservoir Computing (RC) paradigm. We targeted the system for the EvAAL scenario, an international competition that aims at establishing benchmarks and evaluation metrics for comparing Ambient Assisted Living (AAL) solutions. In this paper, the performance of the proposed activity recognition system is assessed on a purposely collected real-world dataset, also taking into account a competitive neural network approach for performance comparison. Our results show that, with an appropriate configuration of the information fusion chain, the proposed system reaches a very good accuracy with a low deployment cost.
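In an Echo State Network the recurrent weights are fixed at random and only a readout on top of the reservoir states is trained. The sketch below shows a generic reservoir state update, not the configuration used in the paper. The sizes, the scaling factor, and the function names are assumptions, and the readout is omitted.

```python
import math
import random

def make_reservoir(n_inputs, n_units, scale=0.9, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)]
    w = [[rng.uniform(-1, 1) for _ in range(n_units)] for _ in range(n_units)]
    # Rescale so the largest absolute row sum equals `scale` (< 1).
    # This bounds the spectral radius by `scale` and makes the tanh
    # update contractive, a simple sufficient condition chosen here
    # for illustration.
    max_row_sum = max(sum(abs(x) for x in row) for row in w)
    w = [[x * scale / max_row_sum for x in row] for row in w]
    return w_in, w

def run_reservoir(w_in, w, inputs):
    """Return the reservoir state after each input vector."""
    state = [0.0] * len(w)
    states = []
    for u in inputs:
        # x(t) = tanh(W_in u(t) + W x(t-1)); the comprehension reads the
        # previous state before `state` is rebound.
        state = [
            math.tanh(
                sum(wi * ui for wi, ui in zip(w_in[i], u))
                + sum(wj * sj for wj, sj in zip(w[i], state))
            )
            for i in range(len(w))
        ]
        states.append(state)
    return states

# Hypothetical stream of two RSS-like readings per time step.
w_in, w = make_reservoir(n_inputs=2, n_units=5)
states = run_reservoir(w_in, w, [[0.1, -0.2]] * 4)
```

A linear readout trained on these states, for example by least squares, would produce the activity class. Training only the readout is what keeps this family of models cheap to fit.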
R. Barbuti and D. Pallini