Dr. Anirban Bhowmick is an Assistant Professor (Senior) at the School of Electrical and Electronics Engineering, VIT Bhopal University, and a Senior Member of IEEE. He earned his Ph.D. in Speech Processing from the Birla Institute of Technology, Mesra, and completed a postdoctoral fellowship with the Sound and Audio Group at the Indian Institute of Science, where he contributed to a DRDO-funded project on large-scale audio analytics.
With over 11 years of academic experience, Dr. Bhowmick has authored 30+ research papers in leading international journals and conferences. His research interests span speech and audio signal processing, spoken language identification (particularly for under-resourced and code-switched languages), audio event detection, and acoustic surveillance for UAVs.
He serves as an Academic Editor for PLOS One and is on the Editorial Board of Scientific Reports. He is also an active reviewer for prominent journals, including IEEE/ACM Transactions on Audio Speech and Language P
RESEARCH, TEACHING, or OTHER INTERESTS
Signal Processing, Electrical and Electronic Engineering, Computer Vision and Pattern Recognition, Human-Computer Interaction
Bridging Domain Gaps With ProtoAlign: Teacher–Student Few-Shot Prototype Alignment for Cross-Domain Spoken Language Recognition Omkar Vilas Sawant, Anirban Bhowmick IEEE Access, 2026 Spoken language recognition in low-resource settings is hindered by domain shift and limited labeled data. We propose ProtoAlign, a teacher–student few-shot prototype alignment framework that learns domain-invariant, language-discriminative representations with minimal target supervision. The student uses a compact transformer-style backbone with a Feature Reweighting Layer (FRL). Source-domain class prototypes are maintained as exponential moving averages and serve as stable anchors. A target-to-source Information Noise-Contrastive Estimation (InfoNCE) alignment term pulls few-shot target embeddings toward their language-matched source prototypes, while a lightweight knowledge-distillation loss from a source-only teacher preserves source accuracy. Warm-start schedules for the alignment and distillation weights stabilize optimization, and a pairing sampler ensures each batch contains target samples with same-language source counterparts. We evaluate on five Indian languages across five heterogeneous domains: All India Radio (AIR), Common Voice (CV), Kaggle, Indian Institute of Technology Hyderabad (IIT-H), and Indic TTS. With at most ten labeled target examples per language, ProtoAlign consistently outperforms a strong transformer baseline in cross-domain tests and produces visibly tighter, more domain-invariant clusters in the embedding space. These results indicate that prototype anchoring combined with gentle teacher guidance provides a simple, scalable, and label-efficient path to robust cross-domain spoken language recognition.
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape Palash Jain, Anirban Bhowmick EURASIP Journal on Audio, Speech, and Music Processing, 2025 India’s linguistic diversity encompasses multiple language families, including the Indo-Aryan and Dravidian, which represent distinct phonological and morphological characteristics. This study aims to evaluate and compare the performance of end-to-end automatic speech recognition (ASR) systems for three Indo-Aryan languages—Marathi, Odia, and Gujarati—and three Dravidian languages—Tamil, Telugu, and Malayalam. Using four transformer-based pre-trained models—Wav2Vec2.0-base, XLSR-53, W2V2-BERT, and Whisper small—the analysis explores their adaptability to these languages’ linguistic features, with word error rate (WER) and character error rate (CER) serving as evaluation metrics. Results indicate that W2V2-BERT and XLSR-53 outperform other models, achieving lower WER and CER, especially for Indo-Aryan languages. However, higher error rates for Dravidian languages highlight challenges such as complex phonology and agglutinative morphology. This work provides a comparative insight into the strengths and limitations of pre-trained ASR models across India’s diverse linguistic landscape and underscores the need for language-specific adaptations to improve ASR accuracy for underrepresented languages.
Non-linear filtering for multilingual speech: A physics-inspired transformer framework for joint denoising and recognition Omkar Vilas Sawant, Anirban Bhowmick AIP Advances, 2025 Environmental noise severely degrades speech intelligibility and downstream processing. This paper presents a physics-inspired, transformer-based deep neural network for robust speech denoising. The model leverages complementary perceptually motivated acoustic features—including gammatone frequency cepstral coefficients, power-normalized cepstral coefficients, RelAtive SpecTrAl-Perceptual Linear Prediction (RASTA-PLP), modulation spectra, and cepstral frequency cepstral coefficients—to capture essential speech cues while suppressing noise. Analysis shows that different noise types (e.g., pink, babble, and transient) corrupt distinct spectrotemporal regions. This insight informed the model’s design, particularly its non-linear attention mechanism, which dynamically emphasizes clean speech components and suppresses localized noise distortions. We evaluate denoising effectiveness using multilingual Spoken Language Recognition (SLR) as a proxy for intelligibility. Experiments on a noisy Indian language corpus (−10 to −15 dB signal-to-noise ratio) and the Common Voice dataset demonstrate significant superiority over classical methods (e.g., Wiener filtering) and other deep neural network approaches. The proposed transformer model achieved the highest SLR accuracy, notably 97.18% on the Indian corpus, confirming its ability to preserve spectral and temporal speech integrity. Results consistently highlight the generalizability and robustness of this physics-guided, attention-based non-linear filtering approach across diverse multilingual speech.
Multi-Scale Based Approach for Denoising Real-World Noisy Image Using Curvelet Thresholding: Scope and Beyond Susant Kumar Panigrahi, Santosh Kumar Tripathy, Anirban Bhowmick, Santosh Kumar Satapathy, Paolo Barsocchi, Akash Kumar Bhoi IEEE Access, 2024 Naïve simulated additive white Gaussian noise (AWGN) may not fully characterize the complexity of real-world noisy images. Owing to optimal sparsity in image representation, we propose a curvelet-based model for denoising real-world RGB images. Initially, the image is decomposed into three curvelet scales: the approximation scale (which retains low-frequency information), the coarser scale, and the finest scale (which preserves high-frequency components). Coefficients in the approximation and finest scales are estimated using a non-local means (NLM) filter, while a scale-dependent threshold is adopted for signal estimation in the coarser scale. The reconstructed image in the spatial domain is further processed using a Guided Image Filter (GIF) to suppress the ringing artifacts due to curvelet thresholding. The proposed approach, known as the CTuNLM method, is extended to color image denoising using the uncorrelated YUV color space. Extensive experiments on multi-channel real noisy images are conducted in comparison with eight state-of-the-art methods. With four encouraging qualitative and quantitative measures, including PSNR and SSIM, we found that the CTuNLM method achieves better denoising performance in terms of noise reduction and detail preservation. We further examined the potential of the proposed approach by focusing only on the Finest scale curvelet Coefficients (FC). Features like small details, edges and textures always add up to improve the overall denoising performance, while minimizing spurious details. We studied “The Curious Case of the Finest Scale” and constructed “Deep Curvelet-Net”, an encoder-decoder-based CNN architecture, as a pilot work. The encoder uses multiscale spatial characteristics from noisy FC, while the decoder processes denoised FC under the supervision of the encoder’s multiscale spatial attention map. The “Deep Curvelet-Net” links encoder multiscale feature modeling with decoder spatial attention supervision to learn the most essential features for denoising. The CNN-based architecture only estimates FC, while all other CTuNLM stages are left unchanged to produce the denoised output. Results presented in this article validated the design of the proposed CNN architecture in the curvelet domain and motivated us to search beyond classical thresholding and/or filtering approaches.
Gender Classification Using Pitch and Formants Pawan Kumar, Nitika Jakhanwal, Anirban Bhowmick, Mahesh Chandra Proceedings of the International Conference on Communication, Computing and Security (ICCCS 2011), 2011
RECENT SCHOLAR PUBLICATIONS
When Images Become Sound: Preserving Visual Semantics with I-GLA S Pal, A Bhowmick Signal, Image and Video Processing 20 (2), 67, 2026
Bridging Domain Gaps With ProtoAlign: Teacher–Student Few-Shot Prototype Alignment for Cross-Domain Spoken Language Recognition OV Sawant, A Bhowmick IEEE Access 14, 10635-10653, 2026
Non-linear filtering for multilingual speech: A physics-inspired transformer framework for joint denoising and recognition OV Sawant, A Bhowmick AIP Advances 15 (10), 2025
Analyzing code-switching scenarios in India’s diverse linguistic landscape using end-to-end ASR systems with VITB-HEBiC P Jain, A Bhowmick Computers and Electrical Engineering 122, 109978, 2025 Citations: 2
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape P Jain, A Bhowmick EURASIP Journal on Audio, Speech, and Music Processing 2025 (1), 10, 2025 Citations: 7
VITB-HEBiC: A bilingual corpus for evaluating ASR in diverse Indian code-switching scenarios P Jain, A Bhowmick Applied Acoustics 224, 110119, 2024 Citations: 5
Separation of speech & music using temporal-spectral features and neural classifiers O Sawant, A Bhowmick, G Bhagwat Evolutionary Intelligence 17 (3), 1389-1403, 2024 Citations: 7
Multi-scale based approach for denoising real-world noisy image using curvelet thresholding: scope and beyond SK Panigrahi, SK Tripathy, A Bhowmick, SK Satapathy, P Barsocchi, ... IEEE Access 12, 25090-25105, 2024 Citations: 10
Utilizing Convolutional Neural Networks and Mel Spectrograms for Indian Spoken Language Detection OV Sawant, A Bhowmick International Conference on Signal and Data Processing, 615-628, 2023
A comparative evaluation of 2-D Hilbert transforms and 2-D continuous wavelet transforms for robust phase extraction in complex fringe patterns J Singh, D Haridas, A Bhowmick, RP Sugavaneshwar Optical Review 30 (5), 570-582, 2023 Citations: 5
Swarm-based hybrid optimization algorithms: an exhaustive analysis and its applications to electricity load and price forecasting R Kottath, P Singh, A Bhowmick Soft Computing 27 (19), 14095-14126, 2023 Citations: 18
Energy harvesting techniques and trends in electronic applications P Mehta, A Gaur, C Kumar, A Nella, A Bhowmick, M Rajagopal Energy Harvesting Trends for Low Power Compact Electronic Devices, 205-220, 2023 Citations: 8
Energy Harvesting Trends for Low Power Compact Electronic Devices A Nella, A Bhowmick, C Kumar, M Rajagopal Springer International Publishing, 2023 Citations: 4
In situ decomposition of crop residues using lignocellulolytic microbial consortia: a viable alternative to residue burning S Bhattacharjya, A Sahu, DH Phalke, MC Manna, JK Thakur, A Mandal, ... Environmental Science and Pollution Research 28 (25), 32416-32433, 2021 Citations: 46
A novel offset feed flared monopole quasi‐Yagi high directional UWB antenna A Nella, A Bhowmick, M Rajagopal International Journal of RF and Microwave Computer‐Aided Engineering 31 (6 …, 2021 Citations: 13
Rotating Acoustic Reflector Parameter Trade-Off for Near-Outdoor Audio Event Detection G Bhagwat, S Jayaprakash, A Bhowmick Innovations in Electrical and Electronic Engineering: Proceedings of ICEEE …, 2021
Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding A Bhowmick, A Biswas, N AnveshKumar, R Kottath Applied Acoustics 176, 107864, 2021 Citations: 9
A detailed review on non-orthogonal multiple access-based spatial modulation systems KP Jadhav, A Mahor, A Bhowmick, A N International Journal of Pervasive Computing and Communications 16 (2), 143-164, 2020 Citations: 6
Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition A Bhowmick, A Biswas, M Chandra Pattern Analysis and Applications 23 (2), 527-539, 2020 Citations: 4
Ad-hoc mobile array based audio segmentation using latent variable stochastic model SR Chetupalli, A Bhowmick, TV Sreenivas 2019 27th European Signal Processing Conference (EUSIPCO), 1-5, 2019
MOST CITED SCHOLAR PUBLICATIONS
In situ decomposition of crop residues using lignocellulolytic microbial consortia: a viable alternative to residue burning S Bhattacharjya, A Sahu, DH Phalke, MC Manna, JK Thakur, A Mandal, ... Environmental Science and Pollution Research 28 (25), 32416-32433, 2021 Citations: 46
Feature extraction technique using ERB like wavelet sub-band periodic and aperiodic decomposition for TIMIT phoneme recognition A Biswas, PK Sahu, A Bhowmick, M Chandra International Journal of Speech Technology 17 (4), 389-399, 2014 Citations: 37
Speech enhancement using voiced speech probability based wavelet decomposition A Bhowmick, M Chandra Computers & Electrical Engineering 62, 706-718, 2017 Citations: 33
Auditory ERB like admissible wavelet packet features for TIMIT phoneme recognition PK Sahu, A Biswas, A Bhowmick, M Chandra Engineering Science and Technology, an International Journal 17 (3), 145-151, 2014 Citations: 25
Gender classification using pitch and formants P Kumar, N Jakhanwal, A Bhowmick, M Chandra Proceedings of the 2011 International Conference on Communication, Computing …, 2011 Citations: 25
Hindi vowel classification using GFCC and formant analysis in sensor mismatch condition A Biswas, PK Sahu, A Bhowmick, M Chandra WSEAS Trans Syst 13, 130-143, 2014 Citations: 21
Speech enhancement using MMSE estimation and spectral subtraction methods VK Gupta, A Bhowmick, M Chandra, SN Sharan 2011 International Conference on Devices and Communications (ICDeCom), 1-5, 2011 Citations: 21
Swarm-based hybrid optimization algorithms: an exhaustive analysis and its applications to electricity load and price forecasting R Kottath, P Singh, A Bhowmick Soft Computing 27 (19), 14095-14126, 2023 Citations: 18
Hindi phoneme classification using Wiener filtered wavelet packet decomposed periodic and aperiodic acoustic feature A Biswas, PK Sahu, A Bhowmick, M Chandra Computers & Electrical Engineering 42, 12-22, 2015 Citations: 17
A novel offset feed flared monopole quasi‐Yagi high directional UWB antenna A Nella, A Bhowmick, M Rajagopal International Journal of RF and Microwave Computer‐Aided Engineering 31 (6 …, 2021 Citations: 13
Hindi vowel classification using QCN-MFCC features S Mishra, A Bhowmick, MC Shrotriya Perspectives in Science 8, 28-31, 2016 Citations: 13
Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition A Bhowmick, M Chandra, A Biswas International Journal of Speech Technology 20 (4), 813-827, 2017 Citations: 12
Admissible wavelet packet sub‐band‐based harmonic energy features for Hindi phoneme recognition A Biswas, PK Sahu, A Bhowmick, M Chandra IET Signal Processing 9 (6), 511-519, 2015 Citations: 12
AAM based features for multiple camera visual speech recognition in car environment A Biswas, PK Sahu, A Bhowmick, M Chandra Procedia Computer Science 57, 614-621, 2015 Citations: 11
Multi-scale based approach for denoising real-world noisy image using curvelet thresholding: scope and beyond SK Panigrahi, SK Tripathy, A Bhowmick, SK Satapathy, P Barsocchi, ... IEEE Access 12, 25090-25105, 2024 Citations: 10
VidTIMIT audio visual phoneme recognition using AAM visual features and human auditory motivated acoustic wavelet features A Biswas, PK Sahu, A Bhowmick, M Chandra 2015 IEEE 2nd International Conference on Recent Trends in Information …, 2015 Citations: 10
Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding A Bhowmick, A Biswas, N AnveshKumar, R Kottath Applied Acoustics 176, 107864, 2021 Citations: 9
Audio visual isolated Oriya digit recognition using HMM and DWT A Biswas, PK Sahu, A Bhowmick, M Chandra Conference on Advances in Communication and Control Systems (CAC2S 2013 …, 2013 Citations: 9
Energy harvesting techniques and trends in electronic applications P Mehta, A Gaur, C Kumar, A Nella, A Bhowmick, M Rajagopal Energy Harvesting Trends for Low Power Compact Electronic Devices, 205-220, 2023 Citations: 8
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape P Jain, A Bhowmick EURASIP Journal on Audio, Speech, and Music Processing 2025 (1), 10, 2025 Citations: 7