David Martins de Matos

@ulisboa.pt

Department of Computer Science and Engineering
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa

https://researchid.co/david.m.matos

Scopus Publications: 77
Scholar Citations: 1147
Scholar h-index: 19
Scholar i10-index: 37

SCOPUS PUBLICATIONS

  • Automatic Recognition of the General-Purpose Communicative Functions Defined by the ISO 24617-2 Standard for Dialog Act Annotation (Extended Abstract)


  • Chronic Pain Patient Narratives Allow for the Estimation of Current Pain Intensity
    Diogo A.P. Nunes, Joana Ferreira-Gomes, Daniela Oliveira, Carlos Vaz, Sofia Pimenta, Fani Neto, and David Martins de Matos

    IEEE
    We demonstrate a proof-of-concept for the analysis of the language of chronic pain for pain intensity estimation. Importantly, we show that focus on specific words/themes is especially correlated with specific pain intensity categories. We interviewed chronic pain patients and collected demographic and clinical data. Sixty-five patients (40 females), averaging 56.4 ± 12.7 years of age, participated in the study. Patients reported their current pain intensity on a Visual Analogue Scale, which we discretized into 3 classes: mild, moderate, and severe pain. We extracted language features from the transcribed interview of each patient and used them to classify their pain intensity category. We measured performance with the weighted F1 score. Finally, we analyzed potential confounding variables for internal validity. The best-performing model was the Support Vector Machine with an Early Fusion of select language features, with an F1 of 0.60, improving 39.5% upon the baseline. Patients with mild pain focused more on verbs, whilst moderate and severe pain patients focused on adverbs, and on nouns and adjectives, respectively. We show that language features from patient narratives indeed convey information relevant for pain intensity estimation, and that our models can take advantage of that.

  • Learning Low-Dimensional Semantics for Music and Language via Multi-Subject fMRI
    Francisco Afonso Raposo, David Martins de Matos, and Ricardo Ribeiro

    Springer Science and Business Media LLC
    Embodied Cognition (EC) states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience, making this biological semantic machinery noisy with respect to semantics inherent to media, such as music and language. We propose to represent media semantics using low-dimensional vector embeddings by jointly modeling the functional Magnetic Resonance Imaging (fMRI) activity of several brains via Generalized Canonical Correlation Analysis (GCCA). We evaluate the semantic richness of the resulting latent space in appropriate semantic classification tasks: music genres and language topics. We show that the resulting unsupervised representations outperform the original high-dimensional fMRI voxel spaces in these downstream tasks while being more computationally efficient. Furthermore, we show that joint modeling of several subjects increases the semantic richness of the learned latent vector spaces as the number of subjects increases. Quantitative results and corresponding statistical significance testing demonstrate the instantiation of music and language semantics in the brain, thereby providing further evidence for multimodal embodied cognition as well as a method for extraction of media semantics from multi-subject brain dynamics.

  • Automatic Recognition of the General-Purpose Communicative Functions defined by the ISO 24617-2 Standard for Dialog Act Annotation
    Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos

    AI Access Foundation
    From the perspective of a dialog system, it is important to identify the intention behind the segments in a dialog, since it provides an important cue regarding the information that is present in the segments and how they should be interpreted. ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions which correspond to different intentions that are relevant in the context of a dialog. We explore the automatic recognition of these communicative functions in the DialogBank, which is a reference set of dialogs annotated according to this standard. To do so, we propose adaptations of existing approaches to flat dialog act recognition that allow them to deal with the hierarchical classification problem. More specifically, we propose the use of an end-to-end hierarchical network with cascading outputs and maximum a posteriori path estimation to predict the communicative function at each level of the hierarchy, preserve the dependencies between the functions in the path, and decide at which level to stop. Furthermore, since the number of dialogs in the DialogBank is small, we rely on transfer learning processes to reduce overfitting and improve performance. The results of our experiments show that our approach outperforms both a flat one and hierarchical approaches based on multiple classifiers and that each of its components plays an important role towards the recognition of general-purpose communicative functions.

  • Assessing kinetic meaning of music and dance via deep cross-modal retrieval
    Francisco Afonso Raposo, David Martins de Matos, and Ricardo Ribeiro

    Springer Science and Business Media LLC
    Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We explore this aspect of cognition by considering dance as an overt expression of semantic aspects of music related to motor intention, in an artificial deep recurrent neural network that learns correlations between music audio and dance video. We claim that, just like human semantic cognition is based on multimodal statistical structures, joint statistical modeling of music and dance artifacts is expected to capture semantics of these modalities. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task, including dance styles in an unsupervised fashion. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and demonstrate that the model can recommend music audio for dance video queries and vice versa.

  • MIRES: Recovering mobile applications based on backend-as-a-service from cyber attacks
    Diogo Vaz, David Matos, Miguel Pardal, and Miguel Correia

    ACM
    Many popular mobile applications rely on the Backend-as-a-Service (BaaS) cloud computing model to simplify the development and management of services like data storage, user authentication and notifications. However, vulnerabilities and other issues may lead to malicious operations on the mobile application client-side and malicious requests being sent to the backend, corrupting the state of the application in the cloud. To deal with these attacks after they happen and are successful, it is necessary to remove the immediate effects created by the malicious requests and subsequent effects derived from later requests. In this paper, we present MIRES, an intrusion recovery service for mobile applications based on BaaS. MIRES uses a two-phase recovery process that restores the integrity of the mobile application and minimizes its unavailability. We implemented MIRES for Android with the Firebase platform and ran experiments with 3 mobile applications, which showed 1000 operations reverted in less than 1 minute, with the mobile application inaccessible for less than 15 seconds.

  • Semantic frame induction through the detection of communities of verbs and their arguments
    Eugénio Ribeiro, Andreia Sofia Teixeira, Ricardo Ribeiro, and David Martins de Matos

    Springer Science and Business Media LLC
    Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that maps into the evoked frames, are important for several NLP tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame and verb arguments that play the same semantic role. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances or arguments as nodes connected by edges if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of SemEval 2019, we outperformed all of the previous approaches to the task, achieving the current state-of-the-art performance.

  • Pragmatic Aspects of Discourse Production for the Automatic Identification of Alzheimer's Disease
    Anna Pompili, Alberto Abad, David Martins de Matos, and Isabel Pavão Martins

    Institute of Electrical and Electronics Engineers (IEEE)
    Clinical literature provides convincing evidence that language deficits in Alzheimer's disease (AD) allow for distinguishing patients with dementia from healthy subjects. Currently, computational approaches have widely investigated lexicosemantic aspects of discourse production, while pragmatic aspects like cohesion and coherence are still mostly unexplored. In this article, we aim at providing a more comprehensive characterization of language abilities for the automatic identification of AD in narrative description tasks by also incorporating pragmatic aspects of speech production. To this end, we investigate the relevance of a recently proposed set of pragmatic features extracted from an automatically generated topic hierarchy graph in combination with a complementary set of state-of-the-art features encoding lexical, syntactic and semantic cues. Experimental results on the DementiaBank corpus show an accuracy improvement from 82.6% to 85.5% in identifying AD patients when pragmatic features are incorporated into the set of lexicosemantic features. Nevertheless, these results are obtained relying on manual transcriptions, which strongly limits the applicability of computational analysis to clinical settings. Thus, in this work we additionally carry out an analysis of the errors introduced by a speech recognition system and the way in which they impact the performance of the proposed method. In spite of the high word error rates obtained on these data (∼40%), automatic AD identification accuracy decreased only to 79.7%, which is considered a remarkable result when compared with solutions based on manual transcriptions.

  • Mapping the dialog act annotations of the LEGO corpus into ISO 24617-2 communicative functions


  • MultiTLS: Secure Communication Channels with Cipher Suite Diversity
    Ricardo Moura, David R. Matos, Miguel L. Pardal, and Miguel Correia

    Springer International Publishing
    TLS ensures confidentiality, integrity, and authenticity of communications. However, design, implementation, and cryptographic vulnerabilities can make TLS communication channels insecure. We need mechanisms that allow the channels to be kept secure even when a new vulnerability is discovered. We present MultiTLS, a middleware based on diversity and tunneling mechanisms that allows keeping communication channels secure even when new vulnerabilities are discovered. MultiTLS creates a secure communication channel through the encapsulation of k TLS channels, where each one uses a different cipher suite. We evaluated the performance of MultiTLS and concluded that it has the advantage of being easy to use and maintain since it does not modify any of its dependencies.

  • Semantic Frame Induction as a Community Detection Problem
    Eugénio Ribeiro, Andreia Sofia Teixeira, Ricardo Ribeiro, and David Martins de Matos

    Springer International Publishing
    Resources such as FrameNet provide semantic information that is important for multiple tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances as nodes connected by an edge if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of the SemEval shared task, we outperformed all the previous approaches to the task.

  • Deep dialog act recognition using multiple token, segment, and context information representations
    Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos

    AI Access Foundation
    Automatic dialog act recognition is a task that has been widely explored over the years. In recent works, most approaches to the task explored different deep neural network architectures to combine the representations of the words in a segment and generate a segment representation that provides cues for intention. In this study, we explore means to generate more informative segment representations, not only by exploring different network architectures, but also by considering different token representations at the word, character, and functional levels. At the word level, in addition to the commonly used uncontextualized embeddings, we explore the use of contextualized representations, which are able to provide information concerning word sense and segment structure. Character-level tokenization is important to capture intention-related morphological aspects that cannot be captured at the word level. Finally, the functional level provides an abstraction from words, which shifts the focus to the structure of the segment. Additionally, we explore approaches to enrich the segment representation with context information from the history of the dialog, both in terms of the classifications of the surrounding segments and the turn-taking history. This kind of information has already been proven important for the disambiguation of dialog acts in previous studies. Nevertheless, we are able to capture additional information by considering a summary of the dialog history and a wider turn-taking context. By combining the best approaches at each step, we achieve performance results that surpass the previous state-of-the-art on generic dialog act recognition on both the Switchboard Dialog Act Corpus (SwDA) and the ICSI Meeting Recorder Dialog Act Corpus (MRDA), which are two of the most widely explored corpora for the task.
Furthermore, by considering both past and future context, similarly to what happens in an annotation scenario, our approach achieves a performance similar to that of a human annotator on SwDA and surpasses it on MRDA.

  • Learning Multimodal Representations for Sample-efficient Recognition of Human Actions
    Miguel Vasco, Francisco S. Melo, David Martins de Matos, Ana Paiva, and Tetsunari Inamura

    IEEE
    Humans interact in rich and diverse ways with the environment. However, the representation of such behavior by artificial agents is often limited. In this work we present motion concepts, a novel multimodal representation of human actions in a household environment. A motion concept encompasses a probabilistic description of the kinematics of the action along with its contextual background, namely the location and the objects held during the performance. We introduce a novel algorithm which learns and recognizes motion concepts from action demonstrations, named Online Motion Concept Learning (OMCL). The algorithm is evaluated on a virtual-reality household environment with the presence of a human avatar. OMCL outperforms standard motion recognition algorithms on a one-shot recognition task, attesting to its potential for sample-efficient recognition of human actions.

  • An information-theoretic approach to machine-oriented music summarization
    Francisco Afonso Raposo, David Martins de Matos, and Ricardo Ribeiro

    Elsevier BV
    Music summarization allows for higher efficiency in processing, storage, and sharing of datasets. Machine-oriented approaches, being agnostic to human consumption, optimize these aspects even further. Such summaries have already been successfully validated in some MIR tasks. We now generalize previous conclusions by evaluating the impact of generic summarization of music from a probabilistic perspective. We estimate Gaussian distributions for original and summarized songs and compute their relative entropy, in order to measure information loss incurred by summarization. Our results suggest that relative entropy is a good predictor of summarization performance in the context of tasks relying on a bag-of-features model. Based on this observation, we further propose a straightforward yet expressive summarizer, which minimizes relative entropy with respect to the original song, that objectively outperforms previous methods and is better suited to avoid potential copyright issues.

  • Online motion concept learning: A novel algorithm for sample-efficient learning and recognition of human actions


  • L2F/INESC-ID at SemEval-2019 task 2: Unsupervised lexical semantic frame induction using contextualized word representations


  • Hierarchical multi-label dialog act recognition on Spanish data
    Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos

    University of Minho
    Dialog acts reveal the intention behind the uttered words. Therefore, their automatic recognition is important for a dialog system that tries to understand its conversational partner. The study presented in this article approaches that task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems that have not been explored in recent studies. In addition to the hierarchical problem, the two lower levels pose multi-label classification problems. Furthermore, each level of the hierarchy refers to a different aspect concerning the intention of the speaker, both in terms of the structure of the dialog and of the task. On the other hand, since the dialogs are in Spanish, this corpus allows us to assess whether the best approaches for English data generalize to a different language. More specifically, we compare the performance of different segment representation approaches, focusing on both sequences and patterns of words, and assess the importance of the dialog history and of the relations between the multiple levels of the hierarchy. Regarding the single-label classification problem posed by the top level, we show that the conclusions drawn from English data also hold for Spanish data. Furthermore, we show that the approaches can be adapted to multi-label scenarios. Finally, by hierarchically combining the best classifiers for each level, we achieve the best results reported for this corpus.

  • A multilingual and multidomain study on dialog act recognition using character-level tokenization
    Eugénio Ribeiro, Ricardo Ribeiro, and David de Matos

    MDPI AG
    Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by its conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We explored the use of multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages—English, Spanish, and German—which have different morphological characteristics. Furthermore, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The achieved results not only show that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.

  • RockfS: Cloud-backed file system resilience to client-side attacks
    David R. Matos, Miguel L. Pardal, Georg Carle, and Miguel Correia

    ACM

  • A 'Deeper' Look at Detecting Cyberbullying in Social Networks
    Hugo Rosa, David Matos, Ricardo Ribeiro, Luisa Coheur, and Joao P. Carvalho

    IEEE
    As cyberbullying becomes more and more frequent in social networks, automatically detecting it and pro-actively acting upon it becomes of the utmost importance. In this work, a detailed look at the current state-of-the-art in cyberbullying detection reveals that deep learning techniques have seldom been used to tackle this problem, despite their growing reputation in other text-based classification tasks. Motivated by neural networks' documented success, three architectures are implemented from similar works: a simple CNN, a hybrid CNN-LSTM and a mixed CNN-LSTM-DNN. In addition, three text representations are trained from three different sources, via the word2vec model: Google-News, Twitter and Formspring. The experiment shows that these models, with one of the above embeddings, beat other benchmark classifiers (Support Vector Machines and Logistic Regression) on both unbalanced and balanced versions of the same dataset.

  • Securing electronic health records in the cloud
    David R. Matos, Miguel L. Pardal, Pedro Adão, António Rito Silva, and Miguel Correia

    ACM
    Health care institutions gather and store sensitive information from patients with the goal of providing the best care. The medical history of a patient is essential to guarantee that the right diagnosis is achieved and to help the clinical staff act in the shortest time possible. This information is highly sensitive and must be kept private, accessible only to the responsible staff. At the same time, the medical records should be accessible by any health care institution to ensure that a patient can be attended anywhere. To guarantee data availability, health care institutions rely on data repositories accessible through the internet. This exposes a threat since patient data can be accessed by unauthorized personnel. It is also extremely difficult to manage access to data using standard access control mechanisms due to the vast number of users, groups and patients and the constant adjustment in privileges that must be made to maintain confidentiality. This paper proposes a solution to the difficulty of managing user access control over a complex universe of user data while guaranteeing confidentiality when using cloud computing services to store medical records.

  • End-to-End Multi-Level Dialog Act Recognition
    Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos

    ISCA
    The three-level dialog act annotation scheme of the DIHANA corpus poses a multi-level classification problem in which the bottom levels allow multiple or no labels for a single segment. We approach automatic dialog act recognition on the three levels using an end-to-end approach, in order to implicitly capture relations between them. Our deep neural network classifier uses a combination of word- and character-based segment representation approaches, together with a summary of the dialog history and information concerning speaker changes. We show that it is important to specialize the generic segment representation in order to capture the most relevant information for each level. On the other hand, the summary of the dialog history should combine information from the three levels to capture dependencies between them. Furthermore, the labels generated for each level help in the prediction of those of the lower levels. Overall, we achieve results which surpass those of our previous approach using the hierarchical combination of three independent per-level classifiers. Furthermore, the results even surpass the results achieved on the simplified version of the problem approached by previous studies, which neglected the multi-label nature of the bottom levels and only considered the label combinations present in the corpus.

  • Topic coherence analysis for the classification of Alzheimer's disease
    Anna Pompili, Alberto Abad, David Martins de Matos, and Isabel Pavão Martins

    ISCA
    Language impairment in Alzheimer’s disease is characterized by a decline in the semantic and pragmatic levels of language processing that manifests from the early stages of the disease. While semantic deficits have been widely investigated using linguistic features, pragmatic deficits are still mostly unexplored. In this work, we present an approach to automatically classify Alzheimer’s disease using a set of pragmatic features extracted from a discourse production task. Following clinical practice, we use an image representing a closed domain as the form of discourse elicitation. Then, we model the elicited speech as a graph that encodes a hierarchy of topics. To do so, the proposed method relies on the integration of various NLP techniques: syntactic parsing for sentence segmentation into clauses, coreference resolution for capturing dependencies among clauses, and word embeddings for identifying semantic relations among topics. According to the experimental results, pragmatic features are able to provide promising results distinguishing individuals with Alzheimer’s disease, comparable to solutions based on other types of linguistic features.

  • A study on dialog act recognition using character-level tokenization
    Eugénio Ribeiro, Ricardo Ribeiro, and David Martins de Matos

    Springer International Publishing
    Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches on the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function of the words and, thus, their intention. We also explore the use of different context windows around each token, which are able to capture important elements, such as affixes. Furthermore, we assess the importance of punctuation and capitalization. We performed experiments on both the Switchboard Dialog Act Corpus and the DIHANA Corpus. In both cases, the experiments not only show that character-level tokenization leads to better performance than the typical word-level approaches, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.

  • Rectify: Black-box intrusion recovery in PaaS clouds
    David R. Matos, Miguel L. Pardal, and Miguel Correia

    ACM
    Web applications hosted on the cloud are exposed to cyberattacks and can be compromised by HTTP requests that exploit vulnerabilities. Platform as a Service (PaaS) offerings often provide a backup service that allows restoring application state after a serious attack, but all valid state changes since the last backup are lost. We propose Rectify, a new approach to recover from intrusions on applications running in a PaaS. Rectify is a service designed to be deployed alongside the application in a PaaS container. It does not require modifications to the software and the recovery can be performed by a system administrator. Machine learning techniques are used to associate the requests received by the application with the statements issued to the database. Rectify was evaluated using three widely used web applications (WordPress, LimeSurvey, and MediaWiki), and the results show that the effects of malicious requests can be removed whilst preserving the valid application data.

RECENT SCHOLAR PUBLICATIONS

  • Chronic pain patient narratives allow for the estimation of current pain intensity
    DAP Nunes, J Ferreira-Gomes, D Oliveira, C Vaz, S Pimenta, F Neto, ...
    2023 IEEE 36th International Symposium on Computer-Based Medical Systems 2023

  • Modeling chronic pain experiences from online reports using the Reddit Reports of Chronic Pain dataset
    DAP Nunes, J Ferreira-Gomes, F Neto, D Martins de Matos
    Information 14 (4), 237 2023

  • Transfer-learning for video classification: Video Swin Transformer on multiple domains
    D Oliveira, DM de Matos
    arXiv preprint arXiv:2210.09969 2022

  • Learning low-dimensional semantics for music and language via multi-subject fMRI
    FA Raposo, D Martins de Matos, R Ribeiro
    Neuroinformatics 20 (2), 451-461 2022

  • Towards Learning Through Open-Domain Dialog
    E Ribeiro, R Ribeiro, DM de Matos
    arXiv preprint arXiv:2202.03040 2022

  • Automatic recognition of the general-purpose communicative functions defined by the ISO 24617-2 standard for dialog act annotation
    E Ribeiro, R Ribeiro, DM De Matos
    Journal of Artificial Intelligence Research 73, 397-436 2022

  • Active Learning Improves the Teacher’s Experience: A Case Study in a Language Grounding Scenario
    F Reynaud, E Ribeiro, DM de Matos
    Proc. IberSPEECH 2022, 141-145 2022

  • Assessing kinetic meaning of music and dance via deep cross-modal retrieval
    FA Raposo, D Martins de Matos, R Ribeiro
    Neural Computing and Applications 33 (21), 14481-14493 2021

  • Chronic pain and language: A topic modelling approach to personal pain descriptions
    DAP Nunes, JF Gomes, F Neto, DM de Matos
    arXiv preprint arXiv:2109.00402 2021

  • Analysis of chronic pain descriptions for base-pathology prediction: the case of rheumatoid arthritis versus spondylitis pathology prediction based on pain descriptions
    DAP Nunes, J Ferreira-Gomes, C Vaz, D Oliveira, S Pimenta, F Neto, ...
    PERMANYER PORTUGAL, 78 2021

  • Analysis of Chronic Pain Experiences Based on Online Reports: the RRCP Dataset for quality-of-life assessment
    DAP Nunes, D Martins de Matos, J Ferreira-Gomes, F Neto
    arXiv preprint arXiv:2108.10218 2021

  • Analysis of Chronic Pain Experiences Based on Online Reports: the RRCP Dataset.
    DAP Nunes, DM de Matos, J Ferreira-Gomes, F Neto
    CoRR 2021

  • Semantic frame induction through the detection of communities of verbs and their arguments
    E Ribeiro, AS Teixeira, R Ribeiro, D Martins de Matos
    Applied Network Science 5, 1-32 2020

  • Pragmatic aspects of discourse production for the automatic identification of Alzheimer's disease
    A Pompili, A Abad, DM de Matos, IP Martins
    IEEE Journal of Selected Topics in Signal Processing 14 (2), 261-271 2020

  • Mapping the dialog act annotations of the LEGO corpus into ISO 24617-2 communicative functions
    E Ribeiro, R Ribeiro, DD Matos
    2020

  • General-Purpose Communicative Function Recognition using a Hierarchical Network with Cascading Outputs and Maximum a Posteriori Path Estimation.
    E Ribeiro, R Ribeiro, DM de Matos
    CoRR 2020

  • Semantic frame induction as a community detection problem
    E Ribeiro, AS Teixeira, R Ribeiro, D Martins de Matos
    Complex Networks and Their Applications VIII: Volume 1 Proceedings of the 2020

  • Deep dialog act recognition using multiple token, segment, and context information representations
    E Ribeiro, R Ribeiro, DM de Matos
    Journal of Artificial Intelligence Research 66, 861-899 2019

  • Learning multimodal representations for sample-efficient recognition of human actions
    M Vasco, FS Melo, DM de Matos, A Paiva, T Inamura
    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems 2019

  • Hierarchical Multi-Label Dialog Act Recognition on Spanish Data
    E Ribeiro, R Ribeiro, DM de Matos
    arXiv preprint arXiv:1907.12316 2019

MOST CITED SCHOLAR PUBLICATIONS

  • Automatic keyword extraction on Twitter
    L Marujo, W Ling, I Trancoso, C Dyer, AW Black, A Gershman, ...
    Proceedings of the 53rd Annual Meeting of the Association for Computational 2015
    Citations: 75

  • A “deeper” look at detecting cyberbullying in social networks
    H Rosa, D Matos, R Ribeiro, L Coheur, JP Carvalho
    2018 international joint conference on neural networks (IJCNN), 1-8 2018
    Citations: 74

  • Improving a hybrid literary book recommendation system through author ranking
    PC Vaz, D Martins de Matos, B Martins, P Calado
    Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries 2012
    Citations: 63

  • Exploring events and distributed representations of text in multi-document summarization
    L Marujo, W Ling, R Ribeiro, A Gershman, J Carbonell, DM de Matos, ...
    Knowledge-Based Systems 94, 33-42 2016
    Citations: 49

  • The influence of context on dialogue act recognition
    E Ribeiro, R Ribeiro, DM de Matos
    arXiv preprint arXiv:1506.00839 2015
    Citations: 37

  • Centrality-as-relevance: support sets and similarity as geometric proximity
    R Ribeiro, DM de Matos
    Journal of Artificial Intelligence Research 42, 275-308 2011
    Citations: 35

  • Multicore SIMD ASIP for next-generation sequencing and alignment biochip platforms
    N Neves, N Sebastião, D Matos, P Tomás, P Flores, N Roma
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23 (7) 2014
    Citations: 33

  • Summarization of films and documentaries based on subtitles and scripts
    M Aparício, P Figueiredo, F Raposo, DM de Matos, R Ribeiro, L Marujo
    Pattern Recognition Letters 73, 7-12 2016
    Citations: 29

  • Using generic summarization to improve music information retrieval tasks
    F Raposo, R Ribeiro, DM de Matos
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (6), 1119 2016
    Citations: 28

  • Influence of Peak Selection Methods on Onset Detection.
    C Rosão, R Ribeiro, DM De Matos
    ISMIR, 517-522 2012
    Citations: 28

  • Fairy Tale Corpus Organization Using Latent Semantic Mapping and an Item-to-item Top-n Recommendation Algorithm.
    PV Lobo, DM De Matos
    LREC 10, 1472-1475 2010
    Citations: 28

  • Pragmatic aspects of discourse production for the automatic identification of Alzheimer's disease
    A Pompili, A Abad, DM de Matos, IP Martins
    IEEE Journal of Selected Topics in Signal Processing 14 (2), 261-271 2020
    Citations: 23

  • Understanding temporal dynamics of ratings in the book recommendation scenario
    PC Vaz, R Ribeiro, DM de Matos
    Proceedings of the 2013 international conference on information systems and 2013
    Citations: 23

  • Document retrieval for question answering: a quantitative evaluation of text preprocessing
    G Carvalho, DM de Matos, V Rocio
    Proceedings of the ACM first Ph. D. workshop in CIKM, 125-130 2007
    Citations: 22

  • Stylometric relevance-feedback towards a hybrid book recommendation algorithm
    PC Vaz, D Martins de Matos, B Martins
    Proceedings of the fifth ACM workshop on Research advances in large digital 2012
    Citations: 21

  • Extractive summarization of broadcast news: Comparing strategies for European Portuguese
    R Ribeiro, DM de Matos
    International Conference on Text, Speech and Dialogue, 115-122 2007
    Citations: 21

  • Event-based summarization using a centrality-as-relevance model
    L Marujo, R Ribeiro, A Gershman, DM de Matos, JP Neto, J Carbonell
    Knowledge and Information Systems 50, 945-968 2017
    Citations: 20

  • A library for implementing the multiple hypothesis tracking algorithm
    DM Antunes, DM de Matos, J Gaspar
    arXiv preprint arXiv:1106.2263 2011
    Citations: 20

  • Extending a single-document summarizer to multi-document: a hierarchical approach
    L Marujo, R Ribeiro, DM de Matos, JP Neto, A Gershman, J Carbonell
    arXiv preprint arXiv:1507.02907 2015
    Citations: 19

  • Key phrase extraction of lightly filtered broadcast news
    L Marujo, R Ribeiro, DM de Matos, JP Neto, A Gershman, J Carbonell
    Text, Speech and Dialogue: 15th International Conference, TSD 2012, Brno 2012
    Citations: 19