Rubén Solera-Ureña

@inesc-id.pt

INESC-ID Lisboa



                 

https://researchid.co/rubensoleraurea
11 Scopus Publications
299 Scholar Citations
7 Scholar h-index
6 Scholar i10-index

Scopus Publications

  • Assessment of Parkinson's disease medication state through automatic speech analysis
    Anna Pompili, Rubén Solera-Ureña, Alberto Abad, Rita Cardoso, Isabel Guimarães, Margherita Fabbri, Isabel P. Martins, and Joaquim Ferreira

    ISCA
    Parkinson's disease (PD) is a progressive degenerative disorder of the central nervous system characterized by motor and non-motor symptoms. As the disease progresses, patients alternate between periods in which motor symptoms are mitigated due to medication intake (ON state) and periods with motor complications (OFF state). The time that patients spend in the OFF condition is currently the main parameter employed to assess pharmacological interventions and to evaluate the efficacy of different active principles. In this work, we present a system that combines automatic speech processing and deep learning techniques to classify the medication state of PD patients by leveraging personal speech-based biomarkers. We devise a speaker-dependent approach and investigate the relevance of different acoustic-prosodic feature sets. Results show an accuracy of 90.54% in a test task with mixed speech and an accuracy of 95.27% in a semi-spontaneous speech task. Overall, the experimental assessment shows the potential of this approach for the development of reliable, remote daily monitoring and scheduling of medication intake for PD patients.
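    A minimal, illustrative sketch of the speaker-dependent ON/OFF classification idea described above, using scikit-learn and randomly generated stand-in features. The actual system relies on acoustic-prosodic features extracted from each patient's speech and deep learning models; the array shapes, feature dimensionality, and MLP architecture below are assumptions, not the paper's configuration.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical per-utterance acoustic-prosodic functionals for ONE patient
    # (speaker-dependent setting): rows are utterances, columns are features;
    # labels: 1 = ON (medicated), 0 = OFF.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 88))          # stand-in for eGeMAPS-style functionals
    y = rng.integers(0, 2, size=200)

    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500))
    print("per-speaker ON/OFF accuracy:", cross_val_score(clf, X, y, cv=5).mean())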

  • Project INSIDE: towards autonomous semi-unstructured human–robot social interaction in autism therapy
    Francisco S. Melo, Alberto Sardinha, David Belo, Marta Couto, Miguel Faria, Anabela Farias, Hugo Gambôa, Cátia Jesus, Mithun Kinarullathil, Pedro Lima, et al.

    Elsevier BV
    This paper describes the INSIDE system, a networked robot system designed to allow the use of mobile robots as active players in the therapy of children with autism spectrum disorders (ASD). While a significant volume of work has explored the impact of robots in ASD therapy, most such work comprises remotely operated robots and/or well-structured interaction dynamics. In contrast, the INSIDE system allows for complex, semi-unstructured interaction in ASD therapy while featuring a fully autonomous robot. In this paper we describe the hardware and software infrastructure that supports such a rich form of interaction, as well as the design methodology that guided the development of the INSIDE system. We also present some results on the use of our system both in a pilot study and in a long-term study comprising multiple therapy sessions with children at Hospital Garcia de Orta, in Portugal, highlighting the robustness and autonomy of the system as a whole.

  • A semi-supervised learning approach for acoustic-prosodic personality perception in under-resourced domains
    Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramon Fernandez Astudillo, Joana Campos, Ana Paiva, and Isabel Trancoso

    ISCA
    Automatic personality analysis has gained attention in recent years as a fundamental dimension in human-to-human and human-to-machine interaction. However, it still suffers from the limited number and size of speech corpora for specific domains, such as the assessment of children's personality. This paper investigates a semi-supervised training approach to tackle this scenario. We devise an experimental setup with age and language mismatch and two training sets: a small labeled training set from the Interspeech 2012 Personality Sub-challenge, containing French adult speech labeled with personality OCEAN traits, and a large unlabeled training set of Portuguese children's speech. As test set, a corpus of Portuguese children's speech labeled with OCEAN traits is used. Based on this setting, we investigate a weak supervision approach that iteratively refines an initial model trained on the labeled dataset using the unlabeled dataset. We also investigate knowledge-based features, which leverage expert knowledge of acoustic-prosodic cues and thus need no extra data. Results show that, despite the large mismatch imposed by language and age differences, it is possible to attain improvements with these techniques, pointing to the benefits of using both weak supervision and expert-based acoustic-prosodic features across age and language.
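    The weak-supervision loop described above (train on the small labeled set, pseudo-label the unlabeled set, retrain) can be sketched as a standard self-training procedure. The data below are random placeholders, and the classifier, confidence threshold, and number of rounds are assumptions rather than the paper's exact settings.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_lab, y_lab = rng.normal(size=(80, 60)), rng.integers(0, 2, size=80)   # small labeled set (adult speech)
    X_unlab = rng.normal(size=(1000, 60))                                   # large unlabeled set (child speech)

    model = LogisticRegression(max_iter=1000)
    for _ in range(5):                                   # a few refinement rounds
        model.fit(X_lab, y_lab)
        proba = model.predict_proba(X_unlab)
        keep = proba.max(axis=1) > 0.9                   # only very confident pseudo-labels
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])        # grow the training set with pseudo-labeled data
        y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
        X_unlab = X_unlab[~keep]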

  • Acoustic-prosodic automatic personality trait assessment for adults and children
    Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Ramón F. Astudillo, Joana Campos, Ana Paiva, and Isabel Trancoso

    Springer International Publishing
    This paper investigates the use of heterogeneous speech corpora for the automatic assessment of personality traits in terms of the Big-Five OCEAN dimensions. The motivation for this work is twofold: the need to develop methods to overcome the lack of children's speech corpora, particularly severe when targeting personality traits, and the interest in cross-age comparisons of acoustic-prosodic features to build robust paralinguistic detectors. For this purpose, we devise an experimental setup with age mismatch utilizing the Interspeech 2012 Personality Sub-challenge, containing adult speech, as training data. As test data, we use a corpus of children's European Portuguese speech. We investigate various feature sets such as the Sub-challenge baseline features, the recently introduced eGeMAPS features, and our own knowledge-based features. The preliminary results bring insights into cross-age and cross-language detection of personality traits in spontaneous speech, pointing to a stable set of acoustic-prosodic features for Extraversion and Agreeableness in both adult and child speech.
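    For reference, eGeMAPS functionals such as those mentioned above can nowadays be extracted with the openSMILE Python wrapper. This is only a convenience sketch, not the toolchain used in the paper (which also relied on the Sub-challenge baseline and knowledge-based feature sets); "utterance.wav" is a placeholder path.

    import opensmile

    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,      # acoustic-prosodic functionals per utterance
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    features = smile.process_file("utterance.wav")        # one row of functionals per audio file
    print(features.shape)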

  • Real-time robust automatic speech recognition using compact support vector machines
    R. Solera-Ureña, A. I. García-Moral, C. Peláez-Moreno, Manel Martínez-Ramón, and F. Díaz-de-María

    Institute of Electrical and Electronics Engineers (IEEE)
    In recent years, support vector machines (SVMs) have shown excellent performance in many applications, especially in the presence of noise. In particular, SVMs offer several advantages over artificial neural networks (ANNs) that have attracted the attention of the speech processing community. Nevertheless, their high computational requirements prevent them from being used in practice in automatic speech recognition (ASR), where ANNs have proven to be successful. The high complexity of SVMs in this context arises from the use of huge speech training databases with millions of samples and highly overlapped classes. This paper suggests the use of a weighted least squares (WLS) training procedure that makes it possible to impose a compact semiparametric model on the SVM, which results in a dramatic complexity reduction. Such a complexity reduction with respect to conventional SVMs, which is between two and three orders of magnitude, allows the proposed hybrid WLS-SVC/HMM system to perform real-time speech decoding on a connected-digit recognition task (SpeechDat Spanish database). The experimental evaluation of the proposed system shows encouraging performance levels in clean and noisy conditions, although further improvements are required to reach the maturity level of current context-dependent HMM-based recognizers.
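    The toy sketch below illustrates only the semiparametric "compact model" idea (a fixed, small budget of kernel basis functions whose weights are fitted by regularised least squares), not the WLS-SVC algorithm of the paper; the data, kernel width, and number of centroids are arbitrary assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 39))                # stand-in for MFCC-like frame features
    y = rng.integers(0, 2, size=5000) * 2 - 1      # binary frame labels in {-1, +1}

    # Fix the model size up front: 100 kernel basis functions instead of thousands of
    # support vectors, so the decoding cost no longer grows with the training-set size.
    centroids = KMeans(n_clusters=100, n_init=4, random_state=0).fit(X).cluster_centers_
    Phi = rbf_kernel(X, centroids, gamma=0.05)     # design matrix, one column per basis function
    model = Ridge(alpha=1.0).fit(Phi, y)           # regularised least-squares fit of the weights

    scores = rbf_kernel(X[:5], centroids, gamma=0.05) @ model.coef_ + model.intercept_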

  • Data balancing for efficient training of hybrid ANN/HMM automatic speech recognition systems
    Ana Isabel García-Moral, Rubén Solera-Ureña, Carmen Peláez-Moreno, and Fernando Díaz-de-María

    Institute of Electrical and Electronics Engineers (IEEE)
    Hybrid speech recognizers, in which the estimation of the emission pdf of the states of hidden Markov models (HMMs), usually carried out using Gaussian mixture models (GMMs), is replaced by artificial neural networks (ANNs), have several advantages over classical systems. However, to obtain performance improvements, the computational requirements are heavily increased because of the need to train the ANN. Starting from the observation that speech data are remarkably skewed across classes, this paper proposes sifting the training set and balancing the number of samples per class. With this method, the training time has been reduced by a factor of 18 while obtaining performance similar to or even better than that obtained with the whole database, especially in noisy environments. However, the application of these reduced sets is not straightforward. To avoid the mismatch between training and testing conditions created by the modification of the distribution of the training data, a proper scaling of the a posteriori probabilities obtained and a resizing of the context window need to be performed, as demonstrated in this paper.
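    The posterior-scaling step mentioned above follows from Bayes' rule: dividing the ANN posteriors by the state priors yields scaled likelihoods for the HMM decoder, and taking the priors from the original (unbalanced) data undoes the artificial class balance imposed during training. A minimal sketch with random stand-in values (shapes and priors are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    T, S = 100, 40                                   # frames, HMM states (hypothetical sizes)
    logits = rng.normal(size=(T, S))
    posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # ANN outputs p(state | frame)
    priors = rng.dirichlet(np.ones(S) * 5)           # state priors estimated on the full, unbalanced data

    # p(frame | state) is proportional to p(state | frame) / p(state);
    # used in the log domain by Viterbi decoding.
    log_scaled_likelihoods = np.log(posteriors) - np.log(priors)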

  • UC3M high level feature extraction at TRECVID 2008
    I. González-Díaz, D. García-García, R. Solera-Ureña, J. Madrid-Sánchez, et al.

    TRECVID 2008 Workshop

  • Robust ASR using Support Vector Machines
    R. Solera-Ureña, D. Martín-Iglesias, A. Gallardo-Antolín, C. Peláez-Moreno, and F. Díaz-de-María

    Elsevier BV
    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives, due to their max-margin training paradigm, have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM-SVM solution where a fixed number of frames is selected by means of an HMM segmentation, and second, a normalisation kernel called the Dynamic Time Alignment Kernel (DTAK), first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841-1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisation and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to severe noise conditions, also outperforming the results of the hybrid system.
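    A simplified sketch of the DTAK idea referenced above: a dynamic-programming recursion that accumulates frame-level kernel similarities along the best alignment path between two variable-length utterances, normalised by path length. It follows the spirit of Shimodaira et al. (2001) but is not a verified reimplementation; the RBF frame kernel and its width are assumptions.

    import numpy as np

    def dtak(X, Y, gamma=0.5):
        """Dynamic Time Alignment Kernel between sequences X (Tx, d) and Y (Ty, d)."""
        K = np.exp(-gamma * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))  # frame kernel matrix
        Tx, Ty = K.shape
        G = np.full((Tx + 1, Ty + 1), -np.inf)
        G[0, 0] = 0.0
        for i in range(1, Tx + 1):
            for j in range(1, Ty + 1):
                k = K[i - 1, j - 1]
                G[i, j] = max(G[i - 1, j] + k,           # vertical step
                              G[i - 1, j - 1] + 2 * k,   # diagonal step, weighted twice
                              G[i, j - 1] + k)           # horizontal step
        return G[Tx, Ty] / (Tx + Ty)                     # path-length normalisation

    a = np.random.default_rng(0).normal(size=(30, 13))   # two utterances of different length
    b = np.random.default_rng(1).normal(size=(45, 13))
    print(dtak(a, b))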

  • SVMs for automatic speech recognition: A survey
    R. Solera-Ureña, J. Padrell-Sendra, D. Martín-Iglesias, A. Gallardo-Antolín, C. Peláez-Moreno, and F. Díaz-de-María

    Springer Berlin Heidelberg
    Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the preponderance of Markov Models is nowadays a fact. During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: the Support Vector Machine (SVM). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected-digit recognition or continuous speech recognition using SVMs. Finally, we draw some conclusions and outline several ongoing lines of research.

  • Hybrid models for automatic speech recognition: A comparison of classical ANN and kernel based methods
    Ana I. García-Moral, Rubén Solera-Ureña, Carmen Peláez-Moreno, and Fernando Díaz-de-María

    Springer Berlin Heidelberg
    Support Vector Machines (SVMs) are state-of-the-art methods for machine learning but share with more classical Artificial Neural Networks (ANNs) the difficulty of their application to input patterns of non-fixed dimension. This is the case in Automatic Speech Recognition (ASR), in which the duration of the speech utterances is variable. In this paper we recall the hybrid (ANN/HMM) solutions provided in the past for ANNs, apply them to SVMs, and compare the two. We have experimentally assessed both hybrid systems with respect to the standard HMM-based ASR system for several noisy environments. On the one hand, the ANN/HMM system provides better results than the HMM-based system. On the other, the results achieved by the SVM/HMM system are slightly lower than those of the HMM system. Nevertheless, such results are encouraging given the current limitations of the SVM/HMM system.

RECENT SCHOLAR PUBLICATIONS

  • Using Self-Supervised Feature Extractors with Attention for Automatic COVID-19 Detection from Speech
    J Mendonça, R Solera-Ureña, A Abad, I Trancoso
    arXiv preprint arXiv:2107.00112, 2021

  • Transfer Learning-Based Cough Representations for Automatic Detection of COVID-19
    R Solera-Ureña, C Botelho, F Teixeira, T Rolland, A Abad, I Trancoso
    Proceedings of INTERSPEECH 2021, 436-440, 2021

  • Assessment of Parkinson's Disease Medication State through Automatic Speech Analysis
    A Pompili, R Solera-Ureña, A Abad, R Cardoso, I Guimarães, M Fabbri, ...
    Proceedings of INTERSPEECH 2020, 4591-4595, 2020

  • Uma abordagem de aprendizagem semissupervisionada para a classificação automática de personalidade baseada em pistas acústico-prosódicas
    R Solera-Ureña, H Moniz, F Batista, V Cabarrão, A Pompili, ...
    Revista da Associação Portuguesa de Linguística, 348-364, 2019

  • Affective analysis of customer service calls
    V Cabarrão, M Julião, R Solera-Ureña, H Moniz, F Batista, I Trancoso, ...
    10th International Conference of Experimental Linguistics (ExLing 2019), 37-40, 2019

  • Affective computing based on acoustic-prosodic cues
    H Moniz, R Solera-Ureña, V Cabarrão, M Julião, F Batista, I Trancoso
    14th Annual INGRoup Conference (Interdisciplinary Network for Group Research), 2019

  • Project INSIDE: towards autonomous semi-unstructured human–robot social interaction in autism therapy
    FS Melo, A Sardinha, D Belo, M Couto, M Faria, A Farias, H Gambôa, ...
    Artificial Intelligence in Medicine 96, 198-216, 2019

  • Uma abordagem de aprendizagem semi-supervisionada para a percepção automática de personalidade, baseada em pistas acústico-prosódicas em domínios com poucos recursos
    R Solera-Ureña, H Moniz, F Batista, V Cabarrão, A Pompili, ...
    XXXIV Encontro Nacional da Associação Portuguesa de Linguística, 2018

  • A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains
    R Solera-Ureña, H Moniz, F Batista, V Cabarrão, A Pompili, ...
    Proceedings of INTERSPEECH 2017, 929-933, 2017

  • Acoustic-Prosodic Automatic Personality Trait Assessment for Adults and Children
    R Solera-Ureña, H Moniz, F Batista, R Fernández Astudillo, J Campos, ...
    Advances in Speech and Language Technologies for Iberian Languages, 2016

  • Human-Robotic Agent Speech Interaction
    R Solera-Ureña, H Moniz
    2016

  • Real-time Robust Automatic Speech Recognition Using Compact Support Vector Machines
    R Solera-Ureña, AI García-Moral, C Peláez-Moreno, M Martínez-Ramón, ...
    IEEE Transactions on Audio, Speech, and Language Processing 20 (4), 1347-1361, 2012

  • Máquinas de vectores soporte para reconocimiento robusto de habla
    R Solera-Ureña
    Universidad Carlos III de Madrid, Spain, 2011

  • Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
    AI García-Moral, R Solera-Ureña, C Peláez-Moreno, F Díaz-de-María
    IEEE Transactions on Audio, Speech, and Language Processing 19 (3), 468-481, 2011

  • UC3M high level feature extraction at TRECVID 2008
    I González-Díaz, D García-García, R Solera-Ureña, J Madrid-Sánchez, ...
    TREC Video Retrieval Evaluation Workshop (TRECVID 2008), 2008

  • Hybrid models for automatic speech recognition: a comparison of classical ANN and kernel based methods
    AI García-Moral, R Solera-Ureña, C Peláez-Moreno, F Díaz-de-María
    Advances in Nonlinear Speech Processing (NOLISP 2007) 4885, 152-160, 2007

  • Robust ASR using Support Vector Machines
    R Solera-Ureña, D Martín-Iglesias, A Gallardo-Antolín, C Peláez-Moreno, ...
    Speech Communication 49 (4), 253-267, 2007

  • SVMs for Automatic Speech Recognition: A Survey
    R Solera-Ureña, J Padrell-Sendra, D Martín-Iglesias, A Gallardo-Antolín, ...
    Progress in Nonlinear Speech Processing 4391, 190-216, 2007

  • Estimación de probabilidades a posteriori en SVMs multiclase para reconocimiento de habla continua
    R Solera-Ureña, F Pérez-Cruz, F Díaz-de-María
    IV Jornadas en Tecnologías del Habla (JTH 2006), 2006

  • Equipo de comentarista digital basado en OFDM
    JJ Escudero Garzás, A Sanfiz Pinto, R Solera-Ureña, J Miranda Andrade, ...
    XIV Jornadas Telecom I+D, 2004

MOST CITED SCHOLAR PUBLICATIONS

  • Robust ASR using Support Vector Machines
    R Solera-Ureña, D Martín-Iglesias, A Gallardo-Antolín, C Peláez-Moreno, ...
    Speech Communication 49 (4), 253-267, 2007
    Citations: 68

  • SVMs for Automatic Speech Recognition: A Survey
    R Solera-Ureña, J Padrell-Sendra, D Martín-Iglesias, A Gallardo-Antolín, ...
    Progress in Nonlinear Speech Processing 4391, 190-216, 2007
    Citations: 58

  • Project INSIDE: towards autonomous semi-unstructured human–robot social interaction in autism therapy
    FS Melo, A Sardinha, D Belo, M Couto, M Faria, A Farias, H Gambôa, ...
    Artificial Intelligence in Medicine 96, 198-216, 2019
    Citations: 51

  • Real-time Robust Automatic Speech Recognition Using Compact Support Vector Machines
    R Solera-Ureña, AI García-Moral, C Peláez-Moreno, M Martínez-Ramón, ...
    IEEE Transactions on Audio, Speech, and Language Processing 20 (4), 1347-1361, 2012
    Citations: 36

  • Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
    AI García-Moral, R Solera-Ureña, C Peláez-Moreno, F Díaz-de-María
    IEEE Transactions on Audio, Speech, and Language Processing 19 (3), 468-481, 2011
    Citations: 34

  • Transfer Learning-Based Cough Representations for Automatic Detection of COVID-19
    R Solera-Ureña, C Botelho, F Teixeira, T Rolland, A Abad, I Trancoso
    Proceedings of INTERSPEECH 2021, 436-440, 2021
    Citations: 13

  • Acoustic-Prosodic Automatic Personality Trait Assessment for Adults and Children
    R Solera-Ureña, H Moniz, F Batista, R Fernández Astudillo, J Campos, ...
    Advances in Speech and Language Technologies for Iberian Languages, 2016
    Citations: 8

  • Assessment of Parkinson's Disease Medication State through Automatic Speech Analysis
    A Pompili, R Solera-Ureña, A Abad, R Cardoso, I Guimarães, M Fabbri, ...
    Proceedings of INTERSPEECH 2020, 4591-4595, 2020
    Citations: 7

  • A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains
    R Solera-Ureña, H Moniz, F Batista, V Cabarrão, A Pompili, ...
    Proceedings of INTERSPEECH 2017, 929-933, 2017
    Citations: 7

  • Máquinas de vectores soporte para reconocimiento robusto de habla
    R Solera-Ureña
    Universidad Carlos III de Madrid, Spain, 2011
    Citations: 5

  • Hybrid models for automatic speech recognition: a comparison of classical ANN and kernel based methods
    AI García-Moral, R Solera-Ureña, C Peláez-Moreno, F Díaz-de-María
    Advances in Nonlinear Speech Processing (NOLISP 2007) 4885, 152-160, 2007
    Citations: 5

  • Estimación de probabilidades a posteriori en SVMs multiclase para reconocimiento de habla continua
    R Solera-Ureña, F Pérez-Cruz, F Díaz-de-María
    IV Jornadas en Tecnologías del Habla (JTH 2006), 2006
    Citations: 4

  • Affective analysis of customer service calls
    V Cabarrão, M Julião, R Solera-Ureña, H Moniz, F Batista, I Trancoso, ...
    10th International Conference of Experimental Linguistics (ExLing 2019), 37-40, 2019
    Citations: 2

  • Using Self-Supervised Feature Extractors with Attention for Automatic COVID-19 Detection from Speech
    J Mendonça, R Solera-Ureña, A Abad, I Trancoso
    arXiv preprint arXiv:2107.00112, 2021
    Citations: 1