Department of Computer Science and Engineering
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
Diogo A.P. Nunes, Joana Ferreira-Gomes, Daniela Oliveira, Carlos Vaz, Sofia Pimenta, Fani Neto, and David Martins de Matos
IEEE
We demonstrate a proof-of-concept for the analysis of the language of chronic pain for pain intensity estimation. Importantly, we show that focus on specific words/themes is especially correlated with specific pain intensity categories. We interviewed chronic pain patients and collected demographic and clinical data. 65 patients (40 females), averaging $56.4 \pm 12.7$ years of age, participated in the study. Patients reported their current pain intensity on a Visual Analogue Scale, which we discretized into 3 classes: mild, moderate, and severe pain. We extracted language features from the transcribed interview of each patient and used them to classify their pain intensity category. We measured performance with the weighted $F_1$ score. Finally, we analyzed potential confounding variables for internal validity. The best performing model was the Support Vector Machine with an Early Fusion of select language features, with an $F_1$ of 0.60, improving 39.5% upon the baseline. Patients with mild pain focused more on verbs, whereas patients with moderate pain focused on adverbs, and patients with severe pain on nouns and adjectives. We show that language features from patient narratives indeed convey information relevant for pain intensity estimation, and that our models can take advantage of it.
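The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the VAS cut-points are a common clinical convention assumed here, and the feature matrices are synthetic stand-ins for the lexical/POS features extracted from the transcripts.

```python
# Hedged sketch: VAS discretization + early-fusion SVM classification.
# Cut-points, feature groups, and data are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

def discretize_vas(vas):
    """Map a 0-10 Visual Analogue Scale score to 3 pain classes."""
    if vas <= 3:
        return 0  # mild
    if vas <= 6:
        return 1  # moderate
    return 2      # severe

# Early fusion: concatenate per-interview feature vectors into one input.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(65, 10)),   # stand-in "lexical" features
               rng.normal(size=(65, 5))])   # stand-in "POS" features
y = rng.integers(0, 3, size=65)             # stand-in pain classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
score = f1_score(y, clf.predict(X), average="weighted")
print(score)
```

The weighted $F_1$ averaging matches the metric reported in the abstract; everything else (kernel, feature dimensions) is a placeholder.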
Francisco Afonso Raposo, David Martins de Matos, and Ricardo Ribeiro
Springer Science and Business Media LLC
Embodied Cognition (EC) states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience, making this biological semantic machinery noisy with respect to semantics inherent to media, such as music and language. We propose to represent media semantics using low-dimensional vector embeddings by jointly modeling the functional Magnetic Resonance Imaging (fMRI) activity of several brains via Generalized Canonical Correlation Analysis (GCCA). We evaluate the semantic richness of the resulting latent space in appropriate semantic classification tasks: music genres and language topics. We show that the resulting unsupervised representations outperform the original high-dimensional fMRI voxel spaces in these downstream tasks while being more computationally efficient. Furthermore, we show that joint modeling of several subjects increases the semantic richness of the learned latent vector spaces as the number of subjects increases. Quantitative results and corresponding statistical significance testing demonstrate the instantiation of music and language semantics in the brain, thereby providing further evidence for multimodal embodied cognition as well as a method for extraction of media semantics from multi-subject brain dynamics.
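The joint modeling step can be illustrated with a MAXVAR-style GCCA: whiten each subject's fMRI view and take an SVD of the concatenated whitened views to obtain a shared low-dimensional embedding. This is one standard GCCA formulation, assumed here as a sketch; the dimensions and data are illustrative, not the paper's.

```python
# Hedged sketch of MAXVAR-style GCCA over multiple "subject" views.
import numpy as np

def gcca(views, k, eps=1e-8):
    """views: list of (n_samples, n_features) arrays; k: latent dim.
    Returns the shared embedding G of shape (n_samples, k)."""
    whitened = []
    for X in views:
        X = X - X.mean(axis=0)
        # Whiten via thin SVD: U has orthonormal columns per view.
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        whitened.append(U[:, s > eps])
    # The left singular vectors of the stacked whitened views span the
    # shared (consensus) subspace across all views.
    M = np.hstack(whitened)
    G, _, _ = np.linalg.svd(M, full_matrices=False)
    return G[:, :k]

# Toy use: 3 "subjects", 50 time points, different voxel counts.
rng = np.random.default_rng(0)
views = [rng.normal(size=(50, d)) for d in (200, 150, 180)]
Z = gcca(views, k=10)
print(Z.shape)  # (50, 10)
```

Adding more views (subjects) changes only the width of the stacked matrix, which mirrors the paper's observation that the latent space is learned jointly across any number of brains.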
Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos
AI Access Foundation
From the perspective of a dialog system, it is important to identify the intention behind the segments in a dialog, since it provides an important cue regarding the information that is present in the segments and how they should be interpreted. ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions which correspond to different intentions that are relevant in the context of a dialog. We explore the automatic recognition of these communicative functions in the DialogBank, which is a reference set of dialogs annotated according to this standard. To do so, we propose adaptations of existing approaches to flat dialog act recognition that allow them to deal with the hierarchical classification problem. More specifically, we propose the use of an end-to-end hierarchical network with cascading outputs and maximum a posteriori path estimation to predict the communicative function at each level of the hierarchy, preserve the dependencies between the functions in the path, and decide at which level to stop. Furthermore, since the amount of dialogs in the DialogBank is small, we rely on transfer learning processes to reduce overfitting and improve performance. The results of our experiments show that our approach outperforms both a flat one and hierarchical approaches based on multiple classifiers and that each of its components plays an important role towards the recognition of general-purpose communicative functions.
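The path-prediction idea can be sketched in a few lines. The version below is a greedy per-level simplification of the maximum a posteriori path estimation: given per-level probability distributions over communicative functions (plus a STOP option for deciding where to end), it walks down the hierarchy keeping only children allowed by the chosen parent. The hierarchy and probabilities are made up for illustration.

```python
# Hedged sketch: greedy path selection down a function hierarchy with
# cascading per-level outputs and an explicit STOP decision.
import math

def map_path(level_probs, hierarchy):
    """level_probs: list of dicts label -> p(label), one per level.
    hierarchy: dict parent -> set of allowed child labels."""
    path, parent, logp = [], "ROOT", 0.0
    for probs in level_probs:
        allowed = hierarchy.get(parent, set()) | {"STOP"}
        label = max((l for l in probs if l in allowed), key=probs.get)
        if label == "STOP":
            break  # no deeper function: stop at this level
        path.append(label)
        logp += math.log(probs[label])
        parent = label
    return path, logp

hierarchy = {"ROOT": {"inform", "question"},
             "question": {"setQuestion", "propositionalQuestion"}}
probs = [{"inform": 0.3, "question": 0.6, "STOP": 0.1},
         {"setQuestion": 0.5, "propositionalQuestion": 0.2, "STOP": 0.3}]
print(map_path(probs, hierarchy))
```

Constraining each level to the children of the previous label is what preserves the dependencies between functions along the path, as described in the abstract.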
Francisco Afonso Raposo, David Martins de Matos, and Ricardo Ribeiro
Springer Science and Business Media LLC
Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We explore this aspect of cognition, by considering dance as an overt expression of semantic aspects of music related to motor intention, in an artificial deep recurrent neural network that learns correlations between music audio and dance video. We claim that, just like human semantic cognition is based on multimodal statistical structures, joint statistical modeling of music and dance artifacts is expected to capture semantics of these modalities. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task, including dance styles in an unsupervised fashion. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and demonstrate the model can recommend music audio for dance video queries and vice versa.
Diogo Vaz, David Matos, Miguel Pardal, and Miguel Correia
ACM
Many popular mobile applications rely on the Backend-as-a-Service (BaaS) cloud computing model to simplify the development and management of services like data storage, user authentication and notifications. However, vulnerabilities and other issues may lead to malicious operations on the mobile application client-side and malicious requests being sent to the backend, corrupting the state of the application in the cloud. To deal with these attacks after they happen and are successful, it is necessary to remove the immediate effects created by the malicious requests and the subsequent effects derived from later requests. In this paper, we present MIRES, an intrusion recovery service for mobile applications based on BaaS. MIRES uses a two-phase recovery process that restores the integrity of the mobile application and minimizes its unavailability. We implemented MIRES on Android with the Firebase platform and evaluated it with 3 mobile applications: in our experiments, MIRES reverted 1000 operations in less than 1 minute, with the mobile application inaccessible for less than 15 seconds.
Eugénio Ribeiro, Andreia Sofia Teixeira, Ricardo Ribeiro, and David Martins de Matos
Springer Science and Business Media LLC
Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that maps into the evoked frames, are important for several NLP tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame and verb arguments that play the same semantic role. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances or arguments as nodes connected by edges if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of SemEval 2019, we outperformed all of the previous approaches to the task, achieving the current state-of-the-art performance.
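The thresholded-graph construction can be illustrated as follows: verb instances become nodes, an edge joins two nodes when the cosine distance between their contextualized embeddings falls below the threshold, and clusters stand in for induced frames. The paper applies a graph-clustering algorithm; connected components are used here as a simplified proxy, and the embeddings are toy values.

```python
# Hedged sketch: build a distance-thresholded graph over embeddings and
# read off connected components as induced "frames" (union-find).
import numpy as np

def induce_frames(emb, threshold):
    """emb: (n, d) array of embeddings. Returns a cluster id per node."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    dist = 1.0 - norm @ norm.T            # pairwise cosine distance
    n = len(emb)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < threshold:
                parent[find(i)] = find(j)  # union the two components
    roots = [find(i) for i in range(n)]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(induce_frames(emb, threshold=0.2))  # [0, 0, 1]
```

Raising the threshold merges more instances into the same frame, which is exactly the granularity knob the abstract describes.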
Anna Pompili, Alberto Abad, David Martins de Matos, and Isabel Pavao Martins
Institute of Electrical and Electronics Engineers (IEEE)
Clinical literature provides convincing evidence that language deficits in Alzheimer's disease (AD) allow for distinguishing patients with dementia from healthy subjects. Currently, computational approaches have widely investigated lexicosemantic aspects of discourse production, while pragmatic aspects, like cohesion and coherence, are still mostly unexplored. In this article, we aim at providing a more comprehensive characterization of language abilities for the automatic identification of AD in narrative description tasks by also incorporating pragmatic aspects of speech production. To this end, we investigate the relevance of a recently proposed set of pragmatic features extracted from an automatically generated topic hierarchy graph in combination with a complementary set of state-of-the-art features encoding lexical, syntactic and semantic cues. Experimental results on the DementiaBank corpus show an accuracy improvement from 82.6% to 85.5% in identifying AD patients when pragmatic features are incorporated into the set of lexicosemantic features. Nevertheless, these results are obtained relying on manual transcriptions, which strongly limits the applicability of computational analysis to clinical settings. Thus, in this work we additionally carry out an analysis of the errors introduced by a speech recognition system and the way in which they impact the performance of the proposed method. In spite of the high word error rates obtained on these data (∼40%), automatic AD identification accuracy decreased only to 79.7%, which is a remarkable result when compared with solutions based on manual transcriptions.
Ricardo Moura, David R. Matos, Miguel L. Pardal, and Miguel Correia
Springer International Publishing
TLS ensures confidentiality, integrity, and authenticity of communications. However, design, implementation, and cryptographic vulnerabilities can make TLS communication channels insecure. We need mechanisms that allow the channels to be kept secure even when a new vulnerability is discovered. We present MultiTLS, a middleware based on diversity and tunneling mechanisms that allows keeping communication channels secure even when new vulnerabilities are discovered. MultiTLS creates a secure communication channel through the encapsulation of k TLS channels, where each one uses a different cipher suite. We evaluated the performance of MultiTLS and concluded that it has the advantage of being easy to use and maintain since it does not modify any of its dependencies.
Eugénio Ribeiro, Andreia Sofia Teixeira, Ricardo Ribeiro, and David Martins de Matos
Springer International Publishing
Resources such as FrameNet provide semantic information that is important for multiple tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances as nodes connected by an edge if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of the SemEval shared task we outperformed all the previous approaches to the task.
Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos
AI Access Foundation
Automatic dialog act recognition is a task that has been widely explored over the years. In recent works, most approaches to the task explored different deep neural network architectures to combine the representations of the words in a segment and generate a segment representation that provides cues for intention. In this study, we explore means to generate more informative segment representations, not only by exploring different network architectures, but also by considering different token representations, not only at the word level, but also at the character and functional levels. At the word level, in addition to the commonly used uncontextualized embeddings, we explore the use of contextualized representations, which are able to provide information concerning word sense and segment structure. Character-level tokenization is important to capture intention-related morphological aspects that cannot be captured at the word level. Finally, the functional level provides an abstraction from words, which shifts the focus to the structure of the segment. Additionally, we explore approaches to enrich the segment representation with context information from the history of the dialog, both in terms of the classifications of the surrounding segments and the turn-taking history. This kind of information has already been proved important for the disambiguation of dialog acts in previous studies. Nevertheless, we are able to capture additional information by considering a summary of the dialog history and a wider turn-taking context. By combining the best approaches at each step, we achieve performance results that surpass the previous state-of-the-art on generic dialog act recognition on both the Switchboard Dialog Act Corpus (SwDA) and the ICSI Meeting Recorder Dialog Act Corpus (MRDA), which are two of the most widely explored corpora for the task. 
Furthermore, by considering both past and future context, similarly to what happens in an annotation scenario, our approach achieves a performance similar to that of a human annotator on SwDA and surpasses it on MRDA.
Miguel Vasco, Francisco S. Melo, David Martins de Matos, Ana Paiva, and Tetsunari Inamura
IEEE
Humans interact in rich and diverse ways with the environment. However, the representation of such behavior by artificial agents is often limited. In this work we present motion concepts, a novel multimodal representation of human actions in a household environment. A motion concept encompasses a probabilistic description of the kinematics of the action along with its contextual background, namely the location and the objects held during the performance. We introduce a novel algorithm which learns and recognizes motion concepts from action demonstrations, named Online Motion Concept Learning (OMCL). The algorithm is evaluated on a virtual-reality household environment with the presence of a human avatar. OMCL outperforms standard motion recognition algorithms on a one-shot recognition task, attesting to its potential for sample-efficient recognition of human actions.
Francisco Afonso Raposo, David Martins de Matos, and Ricardo Ribeiro
Elsevier BV
Music summarization allows for higher efficiency in processing, storage, and sharing of datasets. Machine-oriented approaches, being agnostic to human consumption, optimize these aspects even further. Such summaries have already been successfully validated in some MIR tasks. We now generalize previous conclusions by evaluating the impact of generic summarization of music from a probabilistic perspective. We estimate Gaussian distributions for original and summarized songs and compute their relative entropy, in order to measure information loss incurred by summarization. Our results suggest that relative entropy is a good predictor of summarization performance in the context of tasks relying on a bag-of-features model. Based on this observation, we further propose a straightforward yet expressive summarizer, which minimizes relative entropy with respect to the original song, that objectively outperforms previous methods and is better suited to avoid potential copyright issues.
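The information-loss measure described above has a closed form: the relative entropy (KL divergence) between two multivariate Gaussians, one fitted to the features of the original song and one to its summary. The formula below is the standard closed form; the feature data is synthetic, not actual audio features.

```python
# Hedged sketch: KL divergence between Gaussians fitted to "full song"
# and "summary" feature frames, as a proxy for summarization loss.
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ), in nats."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

rng = np.random.default_rng(0)
full = rng.normal(size=(1000, 4))   # stand-in full-song feature frames
summary = full[:300]                # stand-in summary frames
kl = gaussian_kl(summary.mean(0), np.cov(summary.T),
                 full.mean(0), np.cov(full.T))
print(kl)
```

A summarizer in the spirit of the paper would pick the subset of frames minimizing this quantity with respect to the original song's distribution.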
Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos
University of Minho
Dialog acts reveal the intention behind the uttered words. Thus, their automatic recognition is important for a dialog system trying to understand its conversational partner. The study presented in this article approaches that task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems that have not been explored in recent studies. In addition to the hierarchical problem, the two lower levels pose multi-label classification problems. Furthermore, each level of the hierarchy refers to a different aspect of the speaker's intention, both in terms of the structure of the dialog and of the task. Moreover, since the dialogs are in Spanish, this corpus allows us to assess whether the best approaches on English data generalize to a different language. More specifically, we compare the performance of different segment representation approaches, focusing on both word sequences and word patterns, and assess the importance of the dialog history and of the relations between the multiple levels of the hierarchy. Concerning the single-label classification problem posed by the top level, we show that the conclusions drawn on English data also hold on Spanish data. Furthermore, we show that the approaches can be adapted to multi-label scenarios. Finally, by hierarchically combining the best classifiers for each level, we achieve the best results reported for this corpus.
Eugénio Ribeiro, Ricardo Ribeiro, and David de Matos
MDPI AG
Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by its conversational partners. Although most approaches on the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We explored the use of multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages—English, Spanish, and German—which have different morphological characteristics. Furthermore, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The achieved results not only show that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
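The multiple-window idea can be shown with a toy tokenizer: for each window size, slide a character n-gram over the whole segment, crossing word boundaries, which can expose affixes and inter-word cues. The window sizes below are illustrative, not the ones tuned in the paper.

```python
# Hedged sketch: character windows of several sizes over a segment.
def char_windows(segment, sizes=(3, 5)):
    """Return, per window size, the character n-grams sliding over the
    whole segment (note: windows cross word boundaries)."""
    out = {}
    for w in sizes:
        out[w] = [segment[i:i + w] for i in range(len(segment) - w + 1)]
    return out

grams = char_windows("can you", sizes=(3,))
print(grams[3])  # ['can', 'an ', 'n y', ' yo', 'you']
```

Windows like `'n y'` and `' yo'` carry the inter-word information the abstract mentions, which a word-level tokenizer discards; keeping punctuation and capitalization in `segment` is what makes their contribution measurable.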
David R. Matos, Miguel L. Pardal, Georg Carle, and Miguel Correia
ACM
Hugo Rosa, David Matos, Ricardo Ribeiro, Luisa Coheur, and Joao P. Carvalho
IEEE
As cyberbullying becomes more and more frequent in social networks, automatically detecting it and pro-actively acting upon it becomes of the utmost importance. In this work, a detailed look at the current state of the art in cyberbullying detection reveals that deep learning techniques have seldom been used to tackle this problem, despite their growing reputation in other text-based classification tasks. Motivated by the documented success of neural networks, three architectures are implemented from similar works: a simple CNN, a hybrid CNN-LSTM, and a mixed CNN-LSTM-DNN. In addition, three text representations are trained from three different sources via the word2vec model: Google News, Twitter, and Formspring. The experiments show that these models, combined with one of the above embeddings, beat other benchmark classifiers (Support Vector Machines and Logistic Regression) on both an unbalanced and a balanced version of the same dataset.
David R. Matos, Miguel L. Pardal, Pedro Adão, António Rito Silva, and Miguel Correia
ACM
Health care institutions gather and store sensitive information from patients with the goal of providing the best care. The medical history of a patient is essential to guarantee that the right diagnosis is achieved and to help the clinical staff act in the shortest time possible. This information is highly sensitive and must be kept private, accessible to the responsible staff only. At the same time, the medical records should be accessible by any health care institution to ensure that a patient can be attended anywhere. To guarantee data availability, health care institutions rely on data repositories accessible through the internet. This poses a threat, since patient data can be accessed by unauthorized personnel. It is also extremely difficult to manage access to data using standard access control mechanisms, due to the vast number of users, groups and patients and the constant adjustment of privileges required to maintain confidentiality. This paper proposes a solution to the difficulty of managing user access control over a complex universe of user data, guaranteeing confidentiality while using cloud computing services to store medical records.
Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos
ISCA
The three-level dialog act annotation scheme of the DIHANA corpus poses a multi-level classification problem in which the bottom levels allow multiple or no labels for a single segment. We approach automatic dialog act recognition on the three levels using an end-to-end approach, in order to implicitly capture relations between them. Our deep neural network classifier uses a combination of word- and character-based segment representation approaches, together with a summary of the dialog history and information concerning speaker changes. We show that it is important to specialize the generic segment representation in order to capture the most relevant information for each level. On the other hand, the summary of the dialog history should combine information from the three levels to capture dependencies between them. Furthermore, the labels generated for each level help in the prediction of those of the lower levels. Overall, we achieve results which surpass those of our previous approach using the hierarchical combination of three independent per-level classifiers. Furthermore, the results even surpass the results achieved on the simplified version of the problem approached by previous studies, which neglected the multi-label nature of the bottom levels and only considered the label combinations present in the corpus.
Anna Pompili, Alberto Abad, David Martins de Matos, and Isabel Pavão Martins
ISCA
Language impairment in Alzheimer’s disease is characterized by a decline in the semantic and pragmatic levels of language processing that manifests since the early stages of the disease. While semantic deficits have been widely investigated using linguistic features, pragmatic deficits are still mostly unexplored. In this work, we present an approach to automatically classify Alzheimer’s disease using a set of pragmatic features extracted from a discourse production task. Following the clinical practice, we consider an image representing a closed domain as a discourse’s elicitation form. Then, we model the elicited speech as a graph that encodes a hierarchy of topics. To do so, the proposed method relies on the integration of various NLP techniques: syntactic parsing for sentence segmentation into clauses, coreference resolution for capturing dependencies among clauses, and word embeddings for identifying semantic relations among topics. According to the experimental results, pragmatic features are able to provide promising results distinguishing individuals with Alzheimer’s disease, comparable to solutions based on other types of linguistic features.
Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos
Springer International Publishing
Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches on the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function of the words and, thus, their intention. We also explore the use of different context windows around each token, which are able to capture important elements, such as affixes. Furthermore, we assess the importance of punctuation and capitalization. We performed experiments on both the Switchboard Dialog Act Corpus and the DIHANA Corpus. In both cases, the experiments not only show that character-level tokenization leads to better performance than the typical word-level approaches, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
David R. Matos, Miguel L. Pardal, and Miguel Correia
ACM
Web applications hosted on the cloud are exposed to cyberattacks and can be compromised by HTTP requests that exploit vulnerabilities. Platform as a Service (PaaS) offerings often provide a backup service that allows restoring application state after a serious attack, but all valid state changes since the last backup are lost. We propose Rectify, a new approach to recover from intrusions on applications running in a PaaS. Rectify is a service designed to be deployed alongside the application in a PaaS container. It does not require modifications to the software and the recovery can be performed by a system administrator. Machine learning techniques are used to associate the requests received by the application to the statements issued to the database. Rectify was evaluated using three widely used web applications - Wordpress, LimeSurvey and MediaWiki - and the results show that the effects of malicious requests can be removed whilst preserving the valid application data.