Dr. Anirban Bhowmick

Scopus Publications

When Images Become Sound: Preserving Visual Semantics with I-GLA
Sreejib Pal, Anirban Bhowmick
Signal, Image and Video Processing, 2026
Bridging Domain Gaps With ProtoAlign: Teacher–Student Few-Shot Prototype Alignment for Cross-Domain Spoken Language Recognition
Omkar Vilas Sawant, Anirban Bhowmick
IEEE Access, 2026
Spoken language recognition in low-resource settings is hindered by domain shift and limited labeled data. We propose ProtoAlign, a teacher–student few-shot prototype alignment framework that learns domain-invariant, language-discriminative representations with minimal target supervision. The student uses a compact transformer-style backbone with a Feature Reweighting Layer (FRL). Source-domain class prototypes are maintained as exponential moving averages and serve as stable anchors. A target-to-source Information Noise-Contrastive Estimation (InfoNCE) alignment term pulls few-shot target embeddings toward their language-matched source prototypes, while a lightweight knowledge-distillation loss from a source-only teacher preserves source accuracy. Warm-start schedules for the alignment and distillation weights stabilize optimization, and a pairing sampler ensures each batch contains target samples with same-language source counterparts. We evaluate on five Indian languages across five heterogeneous domains: All India Radio (AIR), Common Voice (CV), Kaggle, Indian Institute of Technology Hyderabad (IIT-H), and Indic TTS. With at most ten labeled target examples per language, ProtoAlign consistently outperforms a strong transformer baseline in cross-domain tests and produces visibly tighter, more domain-invariant clusters in the embedding space. These results indicate that prototype anchoring combined with gentle teacher guidance provides a simple, scalable, and label-efficient path to robust cross-domain spoken language recognition.
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
Palash Jain, Anirban Bhowmick
EURASIP Journal on Audio, Speech, and Music Processing, 2025
India’s linguistic diversity encompasses multiple language families, including the Indo-Aryan and Dravidian, which represent distinct phonological and morphological characteristics. This study aims to evaluate and compare the performance of end-to-end automatic speech recognition (ASR) systems for three Indo-Aryan languages—Marathi, Odia, and Gujarati—and three Dravidian languages—Tamil, Telugu, and Malayalam. Using four transformer-based pre-trained models—Wav2Vec2.0-base, XLSR-53, W2V2-BERT, and Whisper small—the analysis explores their adaptability to these languages’ linguistic features, with word error rate (WER) and character error rate (CER) serving as evaluation metrics. Results indicate that W2V2-BERT and XLSR-53 outperform other models, achieving lower WER and CER, especially for Indo-Aryan languages. However, higher error rates for Dravidian languages highlight challenges such as complex phonology and agglutinative morphology. This work provides a comparative insight into the strengths and limitations of pre-trained ASR models across India’s diverse linguistic landscape and underscores the need for language-specific adaptations to improve ASR accuracy for underrepresented languages.
Non-linear filtering for multilingual speech: A physics-inspired transformer framework for joint denoising and recognition
Omkar Vilas Sawant, Anirban Bhowmick
AIP Advances, 2025
Environmental noise severely degrades speech intelligibility and downstream processing. This paper presents a physics-inspired, transformer-based deep neural network for robust speech denoising. The model leverages complementary perceptually motivated acoustic features—including gammatone frequency cepstral coefficients, power-normalized cepstral coefficients, RelAtive SpecTrAl-Perceptual Linear Prediction (RASTA-PLP), modulation spectra, and cepstral frequency cepstral coefficients—to capture essential speech cues while suppressing noise. Analysis shows that different noise types (e.g., pink, babble, and transient) corrupt distinct spectrotemporal regions. This insight informed the model’s design, particularly its non-linear attention mechanism, which dynamically emphasizes clean speech components and suppresses localized noise distortions. We evaluate denoising effectiveness using multilingual Spoken Language Recognition (SLR) as a proxy for intelligibility. Experiments on a noisy Indian language corpus (−10 to −15 dB signal-to-noise ratio) and the Common Voice dataset demonstrate significant superiority over classical methods (e.g., Wiener filtering) and other deep neural network approaches. The proposed transformer model achieved the highest SLR accuracy, notably 97.18% on the Indian corpus, confirming its ability to preserve spectral and temporal speech integrity. Results consistently highlight the generalizability and robustness of this physics-guided, attention-based non-linear filtering approach across diverse multilingual speech.
Analyzing code-switching scenarios in India's diverse linguistic landscape using end-to-end ASR systems with VITB-HEBiC
Palash Jain, Anirban Bhowmick
Computers and Electrical Engineering, 2025
Utilizing Convolutional Neural Networks and Mel Spectrograms for Indian Spoken Language Detection
Omkar Vilas Sawant, Anirban Bhowmick
Lecture Notes in Electrical Engineering, 2025
VITB-HEBiC: A bilingual corpus for evaluating ASR in diverse Indian code-switching scenarios
Palash Jain, Anirban Bhowmick
Applied Acoustics, 2024
Separation of speech & music using temporal-spectral features and neural classifiers
Omkar Sawant, Anirban Bhowmick, Ganesh Bhagwat
Evolutionary Intelligence, 2024
Multi-Scale Based Approach for Denoising Real-World Noisy Image Using Curvelet Thresholding: Scope and Beyond
Susant Kumar Panigrahi, Santosh Kumar Tripathy, Anirban Bhowmick, Santosh Kumar Satapathy, Paolo Barsocchi, Akash Kumar Bhoi
IEEE Access, 2024
Naïve simulated additive white Gaussian noise (AWGN) may not fully characterize the complexity of real-world noisy images. Owing to optimal sparsity in image representation, we propose a curvelet-based model for denoising real-world RGB images. Initially, the image is decomposed into three curvelet scales: the approximation scale (which retains low-frequency information), the coarser scale, and the finest scale (which preserves high-frequency components). Coefficients in the approximation and finest scales are estimated using an NLM filter, while a scale-dependent threshold is adopted for signal estimation in the coarser scale. The reconstructed image in the spatial domain is further processed using a Guided Image Filter (GIF) to suppress the ringing artifacts due to curvelet thresholding. The proposed approach, known as the CTuNLM method, is extended to color image denoising using the uncorrelated YUV color space. Extensive experiments on multi-channel real noisy images are conducted in comparison with eight state-of-the-art methods. With four encouraging qualitative and quantitative measures, including PSNR and SSIM, we found that the CTuNLM method achieves better denoising performance in terms of noise reduction and detail preservation. We further examined the potential of the proposed approach by focusing only on the Finest scale curvelet Coefficients (FC). Features like small details, edges and textures always add up to improve the overall denoising performance, while minimizing spurious details. We studied “The Curious Case of the Finest Scale” and constructed “Deep Curvelet-Net”: an encoder-decoder-based CNN architecture, as a pilot work. The encoder uses multiscale spatial characteristics from noisy FC, while the decoder processes denoised FC under the supervision of the encoder’s multiscale spatial attention map. The “Deep Curvelet-Net” links encoder multiscale feature modeling with decoder spatial attention supervision to learn the most essential features for denoising. The CNN-based architecture only estimates FC, while all other CTuNLM stages are left unchanged to produce the denoised output. Results presented in this article validated the design of the proposed CNN architecture in the curvelet domain and motivated us to search beyond classical thresholding and/or filtering approaches.
Swarm-based hybrid optimization algorithms: an exhaustive analysis and its applications to electricity load and price forecasting
Rahul Kottath, Priyanka Singh, Anirban Bhowmick
Soft Computing, 2023
A comparative evaluation of 2-D Hilbert transforms and 2-D continuous wavelet transforms for robust phase extraction in complex fringe patterns
Jyoti Singh, Divya Haridas, Anirban Bhowmick, Ramu Pasupathi Sugavaneshwar
Optical Review, 2023
Editor’s Note
EAI/Springer Innovations in Communication and Computing, 2023
Energy Harvesting Techniques and Trends in Electronic Applications
Pavan Mehta, Anupama Gaur, Chandan Kumar, Anveshkumar Nella, Anirban Bhowmick, Maheswar Rajagopal
EAI/Springer Innovations in Communication and Computing, 2023
A novel offset feed flared monopole quasi-Yagi high directional UWB antenna
Anveshkumar Nella, Anirban Bhowmick, Maheswar Rajagopal
International Journal of RF and Microwave Computer-Aided Engineering, 2021
Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding
Anirban Bhowmick, Astik Biswas, Nella AnveshKumar, Rahul Kottath
Applied Acoustics, 2021
Rotating acoustic reflector parameter trade-off for near-outdoor audio event detection
Ganesh Bhagwat, Sangeeth Jayaprakash, Anirban Bhowmick
Lecture Notes in Electrical Engineering, 2021
Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition
Anirban Bhowmick, Astik Biswas, Mahesh Chandra
Pattern Analysis and Applications, 2020
Ad-hoc mobile array based audio segmentation using latent variable stochastic model
Srikanth Raj Chetupalli, Anirban Bhowmick, Thippur V. Sreenivas
European Signal Processing Conference, 2019
Enhanced directional sensitivity using acoustic dish reflector
Omkar Sawant, Anirban Bhowmick, Thippur V. Sreenivas
SPCOM 2018 12th International Conference on Signal Processing and Communications, 2018
Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition
Anirban Bhowmick, Mahesh Chandra, Astik Biswas
International Journal of Speech Technology, 2017
Speech enhancement using voiced speech probability based wavelet decomposition
Anirban Bhowmick, Mahesh Chandra
Computers and Electrical Engineering, 2017
Speech recognition using ERB-like admissible wavelet packet decomposition based on perceptual sub-band weighting
Astik Biswas, P.K. Sahu, Anirban Bhowmick, Mahesh Chandra
IETE Journal of Research, 2016
VidTIMIT audio visual phoneme recognition using AAM visual features and human auditory motivated acoustic wavelet features
Astik Biswas, P. K. Sahu, Anirban Bhowmick, Mahesh Chandra
2015 IEEE 2nd International Conference on Recent Trends in Information Systems ReTIS 2015 Proceedings, 2015
Admissible wavelet packet sub-band-based harmonic energy features for Hindi phoneme recognition
Astik Biswas, Prasanna Kumar Sahu, Anirban Bhowmick, Mahesh Chandra
IET Signal Processing, 2015
Acoustic feature extraction using ERB like wavelet sub-band perceptual Wiener filtering for noisy speech recognition
Astik Biswas, P.K. Sahu, Anirban Bhowmick, Mahesh Chandra
11th IEEE India Conference Emerging Trends and Innovation in Technology INDICON 2014, 2015
Hindi phoneme classification using Wiener filtered wavelet packet decomposed periodic and aperiodic acoustic feature
Astik Biswas, P.K. Sahu, Anirban Bhowmick, Mahesh Chandra
Computers and Electrical Engineering, 2015
AAM Based Features for Multiple Camera Visual Speech Recognition in Car Environment
Astik Biswas, P.K. Sahu, Anirban Bhowmick, Mahesh Chandra
Procedia Computer Science, 2015
Articulation based admissible wavelet packet feature based on human cochlear frequency response for TIMIT speech recognition
Astik Biswas, P.K. Sahu, Anirban Bhowmick, Mahesh Chandra
Ain Shams Engineering Journal, 2014
Performance evaluation of front end speech enhancement techniques
Anirban Bhowmick, Mahesh Chandra, Astik Biswas, P.K. Sahu
Proceedings of the 2014 International Conference on Advances in Computing Communications and Informatics ICACCI 2014, 2014
Feature extraction technique using ERB like wavelet sub-band periodic and aperiodic decomposition for TIMIT phoneme recognition
Astik Biswas, P. K. Sahu, Anirban Bhowmick, Mahesh Chandra
International Journal of Speech Technology, 2014
Auditory ERB like admissible wavelet packet features for TIMIT phoneme recognition
P.K. Sahu, Astik Biswas, Anirban Bhowmick, Mahesh Chandra
Engineering Science and Technology an International Journal, 2014
Hindi vowel classification using GFCC and formant analysis in sensor mismatch condition
Astik Biswas, P.K. Sahu, Anirban Bhowmick, Mahesh Chandra
WSEAS Transactions on Systems, 2014
Speech enhancement using MMSE estimation and spectral subtraction methods
V. K. Gupta, Anirban Bhowmick, Mahesh Chandra, S. N. Sharan
2011 International Conference on Devices and Communications ICDeCom 2011 Proceedings, 2011
Gender classification using pitch and formants
Pawan Kumar, Nitika Jakhanwal, Anirban Bhowmick, Mahesh Chandra
ACM International Conference Proceeding Series, 2011
Gender Classification Using Pitch and Formants
Pawan Kumar, Nitika Jakhanwal, Anirban Bhowmick, Mahesh Chandra
Proceedings of the International Conference on Communication Computing and Security ICCCS 2011, 2011

RECENT SCHOLAR PUBLICATIONS

When Images Become Sound: Preserving Visual Semantics with I-GLA
S Pal, A Bhowmick
Signal, Image and Video Processing 20 (2), 67, 2026
Bridging Domain Gaps With ProtoAlign: Teacher–Student Few-Shot Prototype Alignment for Cross-Domain Spoken Language Recognition
OV Sawant, A Bhowmick
IEEE Access 14, 10635-10653, 2026
Non-linear filtering for multilingual speech: A physics-inspired transformer framework for joint denoising and recognition
OV Sawant, A Bhowmick
AIP Advances 15 (10), 2025
Analyzing code-switching scenarios in India’s diverse linguistic landscape using end-to-end ASR systems with VITB-HEBiC
P Jain, A Bhowmick
Computers and Electrical Engineering 122, 109978, 2025
Citations: 2
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
P Jain, A Bhowmick
EURASIP Journal on Audio, Speech, and Music Processing 2025 (1), 10, 2025
Citations: 7
VITB-HEBiC: A bilingual corpus for evaluating ASR in diverse Indian code-switching scenarios
P Jain, A Bhowmick
Applied Acoustics 224, 110119, 2024
Citations: 5
Separation of speech & music using temporal-spectral features and neural classifiers
O Sawant, A Bhowmick, G Bhagwat
Evolutionary Intelligence 17 (3), 1389-1403, 2024
Citations: 7
Multi-scale based approach for denoising real-world noisy image using curvelet thresholding: scope and beyond
SK Panigrahi, SK Tripathy, A Bhowmick, SK Satapathy, P Barsocchi, ...
IEEE Access 12, 25090-25105, 2024
Citations: 10
Utilizing Convolutional Neural Networks and Mel Spectrograms for Indian Spoken Language Detection
OV Sawant, A Bhowmick
International Conference on Signal and Data Processing, 615-628, 2023
A comparative evaluation of 2-D Hilbert transforms and 2-D continuous wavelet transforms for robust phase extraction in complex fringe patterns
J Singh, D Haridas, A Bhowmick, RP Sugavaneshwar
Optical Review 30 (5), 570-582, 2023
Citations: 5
Swarm-based hybrid optimization algorithms: an exhaustive analysis and its applications to electricity load and price forecasting
R Kottath, P Singh, A Bhowmick
Soft Computing 27 (19), 14095-14126, 2023
Citations: 18
Energy harvesting techniques and trends in electronic applications
P Mehta, A Gaur, C Kumar, A Nella, A Bhowmick, M Rajagopal
Energy Harvesting Trends for Low Power Compact Electronic Devices, 205-220, 2023
Citations: 8
Energy Harvesting Trends for Low Power Compact Electronic Devices
A Nella, A Bhowmick, C Kumar, M Rajagopal
Springer International Publishing, 2023
Citations: 4
In situ decomposition of crop residues using lignocellulolytic microbial consortia: a viable alternative to residue burning
S Bhattacharjya, A Sahu, DH Phalke, MC Manna, JK Thakur, A Mandal, ...
Environmental Science and Pollution Research 28 (25), 32416-32433, 2021
Citations: 46
A novel offset feed flared monopole quasi‐Yagi high directional UWB antenna
A Nella, A Bhowmick, M Rajagopal
International Journal of RF and Microwave Computer‐Aided Engineering 31 (6 …, 2021
Citations: 13
Rotating Acoustic Reflector Parameter Trade-Off for Near-Outdoor Audio Event Detection
G Bhagwat, S Jayaprakash, A Bhowmick
Innovations in Electrical and Electronic Engineering: Proceedings of ICEEE …, 2021
Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding
A Bhowmick, A Biswas, N AnveshKumar, R Kottath
Applied Acoustics 176, 107864, 2021
Citations: 9
A detailed review on non-orthogonal multiple access-based spatial modulation systems
KP Jadhav, A Mahor, A Bhowmick, A N
International Journal of Pervasive Computing and Communications 16 (2), 143-164, 2020
Citations: 6
Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition
A Bhowmick, A Biswas, M Chandra
Pattern Analysis and Applications 23 (2), 527-539, 2020
Citations: 4
Ad-hoc mobile array based audio segmentation using latent variable stochastic model
SR Chetupalli, A Bhowmick, TV Sreenivas
2019 27th European Signal Processing Conference (EUSIPCO), 1-5, 2019

MOST CITED SCHOLAR PUBLICATIONS

In situ decomposition of crop residues using lignocellulolytic microbial consortia: a viable alternative to residue burning
S Bhattacharjya, A Sahu, DH Phalke, MC Manna, JK Thakur, A Mandal, ...
Environmental Science and Pollution Research 28 (25), 32416-32433, 2021
Citations: 46
Feature extraction technique using ERB like wavelet sub-band periodic and aperiodic decomposition for TIMIT phoneme recognition
A Biswas, PK Sahu, A Bhowmick, M Chandra
International Journal of Speech Technology 17 (4), 389-399, 2014
Citations: 37
Speech enhancement using voiced speech probability based wavelet decomposition
A Bhowmick, M Chandra
Computers & Electrical Engineering 62, 706-718, 2017
Citations: 33
Auditory ERB like admissible wavelet packet features for TIMIT phoneme recognition
PK Sahu, A Biswas, A Bhowmick, M Chandra
Engineering Science and Technology, an International Journal 17 (3), 145-151, 2014
Citations: 25
Gender classification using pitch and formants
P Kumar, N Jakhanwal, A Bhowmick, M Chandra
Proceedings of the 2011 International Conference on Communication, Computing …, 2011
Citations: 25
Hindi vowel classification using GFCC and formant analysis in sensor mismatch condition
A Biswas, PK Sahu, A Bhowmick, M Chandra
WSEAS Trans Syst 13, 130-143, 2014
Citations: 21
Speech enhancement using MMSE estimation and spectral subtraction methods
VK Gupta, A Bhowmick, M Chandra, SN Sharan
2011 International Conference on Devices and Communications (ICDeCom), 1-5, 2011
Citations: 21
Swarm-based hybrid optimization algorithms: an exhaustive analysis and its applications to electricity load and price forecasting
R Kottath, P Singh, A Bhowmick
Soft Computing 27 (19), 14095-14126, 2023
Citations: 18
Hindi phoneme classification using Wiener filtered wavelet packet decomposed periodic and aperiodic acoustic feature
A Biswas, PK Sahu, A Bhowmick, M Chandra
Computers & Electrical Engineering 42, 12-22, 2015
Citations: 17
A novel offset feed flared monopole quasi‐Yagi high directional UWB antenna
A Nella, A Bhowmick, M Rajagopal
International Journal of RF and Microwave Computer‐Aided Engineering 31 (6 …, 2021
Citations: 13
Hindi vowel classification using QCN-MFCC features
S Mishra, A Bhowmick, MC Shrotriya
Perspectives in Science 8, 28-31, 2016
Citations: 13
Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition
A Bhowmick, M Chandra, A Biswas
International Journal of Speech Technology 20 (4), 813-827, 2017
Citations: 12
Admissible wavelet packet sub‐band‐based harmonic energy features for Hindi phoneme recognition
A Biswas, PK Sahu, A Bhowmick, M Chandra
IET Signal Processing 9 (6), 511-519, 2015
Citations: 12
AAM based features for multiple camera visual speech recognition in car environment
A Biswas, PK Sahu, A Bhowmick, M Chandra
Procedia Computer Science 57, 614-621, 2015
Citations: 11
Multi-scale based approach for denoising real-world noisy image using curvelet thresholding: scope and beyond
SK Panigrahi, SK Tripathy, A Bhowmick, SK Satapathy, P Barsocchi, ...
IEEE Access 12, 25090-25105, 2024
Citations: 10
VidTIMIT audio visual phoneme recognition using AAM visual features and human auditory motivated acoustic wavelet features
A Biswas, PK Sahu, A Bhowmick, M Chandra
2015 IEEE 2nd International Conference on Recent Trends in Information …, 2015
Citations: 10
Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding
A Bhowmick, A Biswas, N AnveshKumar, R Kottath
Applied Acoustics 176, 107864, 2021
Citations: 9
Audio visual isolated Oriya digit recognition using HMM and DWT
A Biswas, PK Sahu, A Bhowmick, M Chandra
Conference on Advances in Communication and Control Systems (CAC2S 2013 …, 2013
Citations: 9
Energy harvesting techniques and trends in electronic applications
P Mehta, A Gaur, C Kumar, A Nella, A Bhowmick, M Rajagopal
Energy Harvesting Trends for Low Power Compact Electronic Devices, 205-220, 2023
Citations: 8
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
P Jain, A Bhowmick
EURASIP Journal on Audio, Speech, and Music Processing 2025 (1), 10, 2025
Citations: 7
