Arnaldo Candido Junior

@.unesp.br

Institute of Biosciences, Humanities and Exact Sciences; Computing and Statistics Department
São Paulo State University (UNESP)


https://researchid.co/arnaldocan

RESEARCH, TEACHING, or OTHER INTERESTS

Computer Science, Artificial Intelligence

Scopus Publications: 4
Scholar Citations: 1064
Scholar h-index: 16
Scholar i10-index: 18

Scopus Publications

  • CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
    Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Frederico Santos de Oliveira, Lucas Oliveira, Ricardo Corso Fernandes Junior, Daniel Peixoto Pinto da Silva, Fernando Gorgulho Fayet, Bruno Baldissera Carlotto, Lucas Rafael Stefanel Gris, et al.

    Springer Science and Business Media LLC
    Abstract: Automatic speech recognition (ASR) is a complex and challenging task, and the area has seen significant advances in recent years. For Brazilian Portuguese (BP) in particular, only around 376 h of audio were publicly available for the ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 h. The existing resources, however, consist of audio containing only read and prepared speech. Datasets that include spontaneous speech, which is essential in several ASR applications, are lacking. This paper presents CORAA (Corpus of Annotated Audios) ASR, a publicly available 290 h dataset for ASR in BP containing validated audio-transcription pairs. CORAA ASR also contains European Portuguese audio (4.6 h). We also present a public ASR model based on Wav2Vec 2.0 XLSR-53, fine-tuned on CORAA ASR. Our model achieved a Word Error Rate (WER) of 24.18% on the CORAA ASR test set and 20.08% on the Common Voice test set. Measuring the Character Error Rate (CER), we obtained 11.02% and 6.34% for CORAA ASR and Common Voice, respectively. The CORAA ASR corpora were assembled both to improve ASR models in BP with phenomena from spontaneous speech and to motivate young researchers to start their studies on ASR for Portuguese. All the corpora are publicly available at https://github.com/nilc-nlp/CORAA under the CC BY-NC-ND 4.0 license.

  • Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
    Lucas Rafael Stefanel Gris, Edresson Casanova, Frederico Santos de Oliveira, Anderson da Silva Soares, and Arnaldo Candido Junior

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer International Publishing

  • Leaf-Based Species Recognition Using Convolutional Neural Networks
    Willian Oliveira Pires, Ricardo Corso Fernandes, Pedro Luiz de Paula Filho, Arnaldo Candido Junior, and João Paulo Teixeira

    Springer International Publishing

  • Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models
    Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Frederico Santos de Oliveira, Lucas Rafael Stefanel Gris, Hamilton Pereira da Silva, Sandra Maria Aluísio, and Moacir Antonelli Ponti

    Springer International Publishing

RECENT SCHOLAR PUBLICATIONS

  • Portal NURC-SP: Design, Development, and Speech Processing Corpora Resources to Support the Public Dissemination of Portuguese Spoken Language
    AC Rodrigues, AA Macedo, A Candido Jr, FRF Svartman, GM Craveiro, ...
    Proceedings of the 16th International Conference on Computational Processing 2024

  • Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
    GL Garcia, PH Paiola, LH Morelli, G Candido, AC Júnior, DS Jodas, ...
    arXiv preprint arXiv:2401.02909 2024

  • Accent Classification is Challenging but Pre-training Helps: a case study with novel Brazilian Portuguese datasets
    AN Matos, GE Araújo, A Candido Junior, MA Ponti
    Proceedings 2024

  • Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites
    AS da Rosa, FS de Oliveira, AS Soares, AC Junior
    arXiv preprint arXiv:2310.16148 2023

  • CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
    A Candido Junior, E Casanova, A Soares, FS de Oliveira, L Oliveira, ...
    Language Resources and Evaluation 57 (3), 1139-1171 2023

  • CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
    FS Oliveira, E Casanova, AC Junior, AS Soares, AR Galvão Filho
    International Conference on Text, Speech, and Dialogue, 188-199 2023

  • Evaluation of Speech Representations for MOS prediction
    FS Oliveira, E Casanova, AC Junior, LRS Gris, AS Soares, ...
    International Conference on Text, Speech, and Dialogue, 270-282 2023

  • Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese
    MM Gauy, LC Berti, A Cândido Jr, AC Neto, A Goldman, ASS Levin, ...
    International Conference on Artificial Intelligence in Medicine, 271-275 2023

  • Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
    LRS Gris, R Marcacini, AC Junior, E Casanova, A Soares, SM Aluísio
    arXiv preprint arXiv:2305.14580 2023

  • Determination of harmonic parameters in pathological voices—efficient algorithm
    JFT Fernandes, D Freitas, AC Junior, JP Teixeira
    Applied Sciences 13 (4), 2333 2023

  • Recursos para o processamento de fala
    E Casanova, VG Santos, FRF Svartman, MQ Leite, A Candido Junior, ...
    Processamento de linguagem natural: conceitos, técnicas e aplicações em 2023

  • Interpretability analysis of deep models for COVID-19 detection
    DPP da Silva, E Casanova, LRS Gris, AC Junior, M Finger, F Svartman, ...
    arXiv preprint arXiv:2211.14372 2022

  • Estimativa do Coeficiente de Uniformidade de Microaspersores por Meio da Aplicação de Técnicas de Redes Neurais Artificiais
    ES dos Santos, AC Junior, PL de Menezes
    Anais do XIX Congresso Latino-Americano de Software Livre e Tecnologias 2022

  • PTL-AI Furnas Dataset: A public dataset for fault detection in power transmission lines using aerial images
    FS De Oliveira, M De Carvalho, PHT Campos, ADS Soares, AC Júnior, ...
    2022 35th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) 1 2022

  • Smart data driven system for pathological voices classification
    J Fernandes, AC Junior, D Freitas, JP Teixeira
    International Conference on Optimization, Learning Algorithms and 2022

MOST CITED SCHOLAR PUBLICATIONS

  • YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone
    E Casanova, J Weber, CD Shulby, AC Junior, E Gölge, MA Ponti
    International Conference on Machine Learning, 2709-2720 2022
    Citations: 203

  • Facilita: reading assistance for low-literacy readers
    WM Watanabe, AC Junior, VR Uzêda, RPM Fortes, TAS Pardo, ...
    Proceedings of the 27th ACM international conference on Design of 2009
    Citations: 146

  • Supporting the adaptation of texts for poor literacy readers: a text simplification editor for Brazilian Portuguese
    A Candido Jr, EG Maziero, L Specia, C Gasperin, T Pardo, S Aluisio
    Proceedings of the Fourth Workshop on Innovative Use of NLP for Building 2009
    Citations: 100

  • SC-GlowTTS: An efficient zero-shot multi-speaker text-to-speech model
    E Casanova, C Shulby, E Gölge, NM Müller, FS De Oliveira, AC Junior, ...
    arXiv preprint arXiv:2104.05557 2021
    Citations: 70

  • Harmonic to noise ratio measurement-selection of window and length
    J Fernandes, F Teixeira, V Guedes, A Junior, JP Teixeira
    Procedia computer science 138, 280-285 2018
    Citations: 61

  • Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora.
    C Dayrell, A Candido Jr, G Lima, D Machado Jr, AA Copestake, ...
    LREC, 1604-1609 2012
    Citations: 40

  • SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments
    C Scarton, M Oliveira, A Candido Jr, C Gasperin, S Aluísio
    Proceedings of the NAACL HLT 2010 Demonstration Session, 41-44 2010
    Citations: 40

  • Automatic detection and classification of honey bee comb cells using deep learning
    TS Alves, MA Pinto, P Ventura, CJ Neves, DG Biron, AC Junior, ...
    Computers and Electronics in Agriculture 170, 105244 2020
    Citations: 38

  • Adapting web content for low-literacy readers by using lexical elaboration and named entities labeling
    WM Watanabe, A Candido Jr, MA Amâncio, M De Oliveira, TAS Pardo, ...
    Proceedings of the 2010 international cross disciplinary conference on web 2010
    Citations: 38

  • Automatic detection of spelling variation in historical corpus: An application to build a Brazilian Portuguese spelling variants dictionary
    R Giusti, A Candido Jr, M Muniz, L Cucatto, SM Aluísio
    Proceedings of the Corpus Linguistics Conference, 1-20 2007
    Citations: 37

  • Transfer learning with audioset to voice pathologies identification in continuous speech
    V Guedes, F Teixeira, A Oliveira, J Fernandes, L Silva, A Junior, ...
    Procedia Computer Science 164, 662-669 2019
    Citations: 32

  • TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
    E Casanova, AC Junior, C Shulby, FS Oliveira, JP Teixeira, MA Ponti, ...
    Language Resources and Evaluation 56 (3), 1043-1055 2022
    Citations: 24

  • Transfer Learning and Data Augmentation Techniques to the COVID-19 Identification Tasks in ComParE 2021.
    E Casanova, A Candido Jr, RCF Junior, M Finger, LRS Gris, MA Ponti, ...
    Interspeech, 446-450 2021
    Citations: 20

  • Reducing efforts of software engineering systematic literature reviews updates using text classification
    WM Watanabe, KR Felizardo, A Candido Jr, F de Souza, ...
    Information and Software Technology 128, 106395 2020
    Citations: 19

  • Long short term memory on chronic laryngitis classification
    V Guedes, A Junior, J Fernandes, F Teixeira, JP Teixeira
    Procedia Computer Science 138, 250-257 2018
    Citations: 19

  • Classification of control/pathologic subjects with support vector machines
    F Teixeira, J Fernandes, V Guedes, A Junior, JP Teixeira
    Procedia computer science 138, 272-279 2018
    Citations: 17

  • CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
    AC Junior, E Casanova, A Soares, FS de Oliveira, L Oliveira, RCF Junior, ...
    arXiv preprint arXiv:2110.15731 2021
    Citations: 13

  • Deep learning against COVID-19: respiratory insufficiency detection in Brazilian Portuguese speech
    E Casanova, L Gris, A Camargo, D da Silva, M Gazzola, E Sabino, ...
    Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 2021
    Citations: 12

  • Facilita: reading assistance for low-literacy readers
    WM Watanabe, AC Junior, VR Uzêda, RPM Fortes, TAS Pardo, SM Aluísio
    2009
    Citations: 8

  • Determination of harmonic parameters in pathological voices—efficient algorithm
    JFT Fernandes, D Freitas, AC Junior, JP Teixeira
    Applied Sciences 13 (4), 2333 2023
    Citations: 7