Shikha Baghel

@iitg.ac.in

Research Scholar, EEE Department
Indian Institute of Technology Guwahati

RESEARCH INTERESTS

Speech processing, Speech & audio processing, Audio processing, Digital signal processing, Machine learning, Pattern recognition

18

Scopus Publications

139

Scholar Citations

8

Scholar h-index

6

Scholar i10-index

Scopus Publications

  • Summary of the DISPLACE challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
    Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy
    Speech Communication, 2024
  • The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
    Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S.R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy
    Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2024
  • Driver Speech Detection in Real Driving Scenario
    Mrinmoy Bhattacharjee, Shikha Baghel, S. R. Mahadeva Prasanna
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2023
  • The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
    Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy
    Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2023
    In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches with code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.
  • Under-resourced dialect identification in Ao using source information
    Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. Mahadeva Prasanna
    Journal of the Acoustical Society of America, 2022
    This paper reports the findings of an automatic dialect identification (DID) task conducted on Ao speech data using source features. Considering that Ao is a tone language, in this study for DID, the gammatonegram of the linear prediction residual is proposed as a feature. As Ao is an under-resourced language, data augmentation was carried out to increase the size of the speech corpus. The results showed that data augmentation improved DID by 14%. A perception test conducted on Ao speakers showed better DID by the subjects when utterance duration was 3 s. Accordingly, automatic DID was conducted on utterances of various duration. A baseline DID system with the Slms feature attained an average F1-score of 53.84% in a 3 s long utterance. Inclusion of source features, Silpr and [Formula: see text], improved the F1-score to 60.69%. In a final system, with a combination of Silpr, [Formula: see text], Slms, and Mel frequency cepstral coefficient features, the F1-score increased to 61.46%.
  • Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
    Shikha Baghel, S. R. M. Prasanna, Prithwijit Guha
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2022
  • Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
    Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. M. Prasanna
    2022 National Conference on Communications NCC 2022, 2022
    Ao is a language spoken in Nagaland in the North-East of India. It is a low-resource tone language under the Tibeto-Burman language family. It consists of three tones, namely, high, mid and low. It has three distinct dialects of the language viz. Chungli, Mongsen and Changki. This paper presents an automatic dialect identification in Ao using the excitation source feature. The objective of a dialect identification system is to identify a speech variety within a language. The goal of this study is to determine if the excitation source features such as Residual Mel Frequency Cepstral Coefficient (RMFCC) can be exploited to discriminate the three dialects in Ao automatically. In addition, vocal tract system features, namely Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Cepstral (SDC) coefficients, are used as the baseline methods. The RMFCC features are obtained from the Linear Prediction (LP) residual signal, while MFCC features are derived from the smooth spectrum of the speech signal. SDC coefficients are explored to provide additional temporal information. This work is evaluated on trisyllabic words uttered by 36 speakers for the three dialects of Ao. A Gaussian Mixture Model (GMM) based classifier is used for classification. The performance of the system yields a better dialect identification accuracy rate when all three features are combined.
  • Overlapped speech detection using phase features
    Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    Journal of the Acoustical Society of America, 2021
    Simultaneous speech of multiple speakers is known as overlapped speech, which causes problems for speech recognition and speaker diarization systems. The present work uses previously less utilized signal phase information in the task of overlapped speech detection. In this context, Instantaneous Frequency Cosine Coefficient (IFCC) and Modified Group Delay Cepstral Coefficient (MGDCC) features are explored. IFCC captures the time-varying phase characteristics, while MGDCC represents the frequency-varying information of the phase spectrum. A Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM)-based classifier is used for the classification. The present work uses synthetically generated overlapped speech from the GRID corpus. The proposed method is benchmarked against three baseline approaches that use magnitude spectrum features. It is observed that the combination of IFCC and MGDCC features with CNN-LSTM classifier provides better performance than the baselines. The combination of phase features with magnitude-based MFCC feature provides the best performance, indicating the importance of complementary information. The present study also investigates the effect of segment duration, genders, and number of simultaneous speakers on the overlapped speech detection system. Finally, the proposed method is also evaluated on real overlapped data from the AMI corpus.
  • Effect of high-energy voiced speech segments and speaker gender on shouted speech detection
    Shikha Baghel, S. R. M. Prasanna, Prithwijit Guha
    2021 National Conference on Communications NCC 2021, 2021
    Shouted speech detection is an essential preprocessing task in many conventional speech processing systems. Mostly, shouted speech has been studied in terms of the characterization of vocal tract and excitation source features. Previous works have also established the significance of voiced segments in shouted speech detection. This work posits that a significant emphasis is given to a portion of the voiced segments during shouted speech production. These emphasized voiced regions have significant energy. This work analyzes the effect of high-energy voiced segments on shouted speech detection. Moreover, fundamental frequency is a crucial characteristic of both shouted speech and speaker gender. Authors believe that gender has a significant effect on shouted speech detection. Therefore, the present work also studies the impact of gender on the current task. The classification between normal and shouted speech is performed using a DNN based classifier. A statistical significance test of the features extracted from high-energy voiced segments is also performed. The results support the claim that high-energy voiced segments carry highly discriminating information. Additionally, classification results of gender experiments show that gender has a notable effect on shouted speech detection.
  • Excitation source feature based dialect identification in Ao - A low resource language
    Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S.R. Mahadeva Prasanna
    Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2021
    Ao is an under-resourced Tibeto-Burman tonal language spoken in Nagaland, India. There are three distinct dialects of the language, namely, Chungli, Mongsen and Changki. The objective of dialect identification is to identify one dialect from the other within the same language family. The goal of this study is to ascertain the potential of excitation source features for automatic dialect identification in Ao. In this direction, Integrated Linear Prediction Residual (ILPR), an approximate representation of source signal, is explored. The log Mel spectrogram of ILPR (SExt) signal is used to exploit the time-frequency characteristics of the excitation source. This work proposes attention based CNN-BiGRU architecture for automatic dialect identification tasks. Additionally, log Mel spectrogram (SVT), extracted from the pre-emphasized speech signal, is used as a baseline method. The SVT contains the vocal-tract characteristics of the speech signal. A significant performance improvement of (nearly) 6% accuracy is observed when the excitation source feature (SExt) is combined with the vocal tract representation (SVT). To analyse the effect of segment duration, dialect identification performance is reported for three different durations, viz., 1 sec, 3 sec and 6 sec. The effect of gender in dialect identification task for Ao is also studied in this work.
  • Automatic detection of shouted speech segments in Indian news debates
    Shikha Baghel, Mrinmoy Bhattacharjee, S.R. Mahadeva Prasanna, Prithwijit Guha
    Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2021
  • Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words
    Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    SPCOM 2020 International Conference on Signal Processing and Communications, 2020
  • Exploration of excitation source information for shouted and normal speech classification
    Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    Journal of the Acoustical Society of America, 2020
  • Analysis of excitation source characteristics for shouted and normal speech classification
    Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    26th National Conference on Communications NCC 2020, 2020
  • Shouted and Normal Speech Classification Using 1D CNN
    Shikha Baghel, Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2019
  • Excitation source feature for discriminating shouted and normal speech
    Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    SPCOM 2018 12th International Conference on Signal Processing and Communications, 2018
  • Classification of multi speaker shouted speech and single speaker normal speech
    Shikha Baghel, S. R. Mahadeva Prasanna, Prithwijit Guha
    IEEE Region 10 Annual International Conference Proceedings TENCON, 2017
  • Shouted/normal speech classification using speech-specific features
    Shikha Baghel, Banriskhem K. Khonglah, S.R. Mahadeva Prasanna, Prithwijit Guha
    IEEE Region 10 Annual International Conference Proceedings TENCON, 2017

RECENT SCHOLAR PUBLICATIONS

  • The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
    SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni, S Baghel, P Hegde, ...
    arXiv preprint arXiv:2406.09494, 2024
    Citations: 11
  • Summary of the DISPLACE Challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
    S Baghel, S Ramoji, S Jain, PR Chowdhuri, P Singh, D Vijayasenan, ...
    Speech Communication, 2024
    Citations: 15
  • Driver Speech Detection in Real Driving Scenario
    M Bhattacharjee, S Baghel, SRM Prasanna
    International Conference on Speech and Computer, 189-199, 2023
    Citations: 1
  • The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
    SG Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil ...
    INTERSPEECH-2023, 3562-3566, 2023
    Citations: 11
  • DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
    S Baghel, S Ramoji, P Singh, S Jain, PR Chowdhuri, K Kulkarni, S Padhi, ...
    INTERSPEECH at https://www.isca-speech.org/archive/interspeech_2023 …, 2023
    Citations: 6
  • Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
    S Baghel, SRM Prasanna, P Guha
    International Conference on Speech and Computer, 33-43, 2022
  • Under-resourced dialect identification in Ao using source information
    M Tzudir, S Baghel, P Sarmah, SRM Prasanna
    The Journal of the Acoustical Society of America 152 (3), 1755-1766, 2022
    Citations: 7
  • Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
    M Tzudir, S Baghel, P Sarmah, SRM Prasanna
    2022 National Conference on Communications (NCC), 308-313, 2022
    Citations: 16
  • Shouted, Overlapped and Competitive Speech Detection in Indian Television News Debates
    S Baghel
    2022
  • Overlapped speech detection using phase features
    S Baghel, SRM Prasanna, P Guha
    The Journal of the Acoustical Society of America 150 (4), 2770-2781, 2021
    Citations: 3
  • Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection
    S Baghel, SRM Prasanna, P Guha
    2021 National Conference on Communications (NCC), 1-6, 2021
    Citations: 1
  • Excitation Source Feature Based Dialect Identification in Ao-A Low Resource Language.
    M Tzudir, S Baghel, P Sarmah, SRM Prasanna
    Interspeech, 1524-1528, 2021
    Citations: 9
  • Automatic Detection of Shouted Speech Segments in Indian News Debates
    S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
    Proc. Interspeech 2021, 4179-4183, 2021
    Citations: 6
  • Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words
    S Baghel, SRM Prasanna, P Guha
    2020 International Conference on Signal Processing and Communications (SPCOM …, 2020
  • Exploration of excitation source information for shouted and normal speech classification
    S Baghel, SRM Prasanna, P Guha
    The Journal of the Acoustical Society of America 147 (2), 1250-1261, 2020
    Citations: 17
  • Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification
    S Baghel, SRM Prasanna, P Guha
    2020 National Conference on Communications (NCC), 1-6, 2020
    Citations: 2
  • Shouted and normal speech classification using 1D CNN
    S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
    Pattern Recognition and Machine Intelligence: 8th International Conference …, 2019
    Citations: 11
  • Excitation Source Feature for Discriminating Shouted and Normal Speech
    S Baghel, SRM Prasanna, P Guha
    2018 International Conference on Signal Processing and Communications (SPCOM …, 2018
    Citations: 7
  • Classification of multi speaker shouted speech and single speaker normal speech
    S Baghel, SRM Prasanna, P Guha
    TENCON 2017-2017 IEEE Region 10 Conference, 2388-2392, 2017
    Citations: 8
  • Shouted/normal speech classification using speech-specific features
    S Baghel, BK Khonglah, SRM Prasanna, P Guha
    2016 IEEE Region 10 Conference (TENCON), 1655-1659, 2016
    Citations: 8

MOST CITED SCHOLAR PUBLICATIONS

  • Exploration of excitation source information for shouted and normal speech classification
    S Baghel, SRM Prasanna, P Guha
    The Journal of the Acoustical Society of America 147 (2), 1250-1261, 2020
    Citations: 17
  • Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
    M Tzudir, S Baghel, P Sarmah, SRM Prasanna
    2022 National Conference on Communications (NCC), 308-313, 2022
    Citations: 16
  • Summary of the DISPLACE Challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments
    S Baghel, S Ramoji, S Jain, PR Chowdhuri, P Singh, D Vijayasenan, ...
    Speech Communication, 2024
    Citations: 15
  • The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
    SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni, S Baghel, P Hegde, ...
    arXiv preprint arXiv:2406.09494, 2024
    Citations: 11
  • The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
    SG Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil ...
    INTERSPEECH-2023, 3562-3566, 2023
    Citations: 11
  • Shouted and normal speech classification using 1D CNN
    S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
    Pattern Recognition and Machine Intelligence: 8th International Conference …, 2019
    Citations: 11
  • Excitation Source Feature Based Dialect Identification in Ao-A Low Resource Language.
    M Tzudir, S Baghel, P Sarmah, SRM Prasanna
    Interspeech, 1524-1528, 2021
    Citations: 9
  • Classification of multi speaker shouted speech and single speaker normal speech
    S Baghel, SRM Prasanna, P Guha
    TENCON 2017-2017 IEEE Region 10 Conference, 2388-2392, 2017
    Citations: 8
  • Shouted/normal speech classification using speech-specific features
    S Baghel, BK Khonglah, SRM Prasanna, P Guha
    2016 IEEE Region 10 Conference (TENCON), 1655-1659, 2016
    Citations: 8
  • Under-resourced dialect identification in Ao using source information
    M Tzudir, S Baghel, P Sarmah, SRM Prasanna
    The Journal of the Acoustical Society of America 152 (3), 1755-1766, 2022
    Citations: 7
  • Excitation Source Feature for Discriminating Shouted and Normal Speech
    S Baghel, SRM Prasanna, P Guha
    2018 International Conference on Signal Processing and Communications (SPCOM …, 2018
    Citations: 7
  • DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
    S Baghel, S Ramoji, P Singh, S Jain, PR Chowdhuri, K Kulkarni, S Padhi, ...
    INTERSPEECH at https://www.isca-speech.org/archive/interspeech_2023 …, 2023
    Citations: 6
  • Automatic Detection of Shouted Speech Segments in Indian News Debates
    S Baghel, M Bhattacharjee, SRM Prasanna, P Guha
    Proc. Interspeech 2021, 4179-4183, 2021
    Citations: 6
  • Overlapped speech detection using phase features
    S Baghel, SRM Prasanna, P Guha
    The Journal of the Acoustical Society of America 150 (4), 2770-2781, 2021
    Citations: 3
  • Analysis of Excitation Source Characteristics for Shouted and Normal Speech Classification
    S Baghel, SRM Prasanna, P Guha
    2020 National Conference on Communications (NCC), 1-6, 2020
    Citations: 2
  • Driver Speech Detection in Real Driving Scenario
    M Bhattacharjee, S Baghel, SRM Prasanna
    International Conference on Speech and Computer, 189-199, 2023
    Citations: 1
  • Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection
    S Baghel, SRM Prasanna, P Guha
    2021 National Conference on Communications (NCC), 1-6, 2021
    Citations: 1
  • Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
    S Baghel, SRM Prasanna, P Guha
    International Conference on Speech and Computer, 33-43, 2022
  • Shouted, Overlapped and Competitive Speech Detection in Indian Television News Debates
    S Baghel
    2022
  • Overlapped/Non-Overlapped Speech Transition Point Detection Using Bag-of-Audio-Words
    S Baghel, SRM Prasanna, P Guha
    2020 International Conference on Signal Processing and Communications (SPCOM …, 2020