WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract) Ahana Chanda, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Accurate identification of mosquito species is crucial for controlling vector-borne diseases, yet visual or acoustic methods alone are often insufficient. We propose a multimodal deep-learning framework that combines high-resolution images with wingbeat audio using a SwinV2 vision transformer and an Audio Spectrogram Transformer, thereby capturing complementary cues. On a six-species dataset, it achieves 97% accuracy, comparable to the best single-modality baseline, and is designed to improve robustness under noise or environmental variation, demonstrating the value of integrating multiple data sources for reliable mosquito surveillance.
Semantic-Guided Sketch-to-RGB Image Generation via Controlled Diffusion for Improved Sketch Recognition (Student Abstract) Ritika Jain, Atul Kumar, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Although deep networks excel on RGB images, their performance degrades sharply under severe domain shifts, such as sketch recognition, where color and texture cues are missing. In this work, we propose a novel pipeline that leverages semantic cues extracted from sketches to guide the synthesis of photorealistic RGB images using diffusion-based generative models. Our framework operates by extracting two crucial cues from the input sketch: semantic captions via the BLIP model and structural outlines via Canny edge detection. These cues are then integrated using ControlNet to guide a Stable Diffusion model, ensuring the synthesized RGB image is both semantically consistent with the content and structurally faithful to the original sketch. We evaluated our synthesized images by benchmarking classification performance. We trained standard architectures (from convolutional to transformer-based) on Tiny-ImageNet subsets and tested them on sketches, their synthesized counterparts, and the original RGB images. Experimental results demonstrate that our approach produces realistic, identity-preserving images, which significantly improve classification accuracy and effectively bridge the semantic gap. While BLIP-based captioning and ControlNet-guided diffusion are established methods, our contribution lies in their integration into a unified, caption-guided pipeline that enhances sketch-to-RGB translation with improved semantic consistency. The proposed method generalizes well across architectures, providing a scalable and cost-efficient solution for sketch-based image synthesis.
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents (Student Abstract) Gargi Surendra Yeole, Poulomi Bhattacharya, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Government verification systems are increasingly relying on internet-based platforms, where users authenticate their identities by uploading images captured with ordinary mobile devices. However, the rapid advancements in generative algorithms have enabled the creation of highly realistic forged ID cards that can easily bypass such verification pipelines. These forgeries are not restricted to a single modality; they may target facial imagery, textual content, or both, posing significant challenges to existing detection approaches. We present a framework that analyzes visual features for ID forgery detection by integrating feature fusion with attention mechanisms, leveraging both convolutional neural network (CNN) architectures, such as ResNet-50 and EfficientNet, and transformer-based models, including ViT-16 and Swin Transformer. This study emphasises the significance of feature fusion and attention-driven representation learning in developing robust and trustworthy ID forgery detection systems for real-world deployment.
Improving CAPTCHA Robustness via Controlled Image Corruptions (Student Abstract) Suchetan G. Uppur, Ashish Kumar, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 The Completely Automated Public Turing test to Tell Computers and Humans Apart (CAPTCHA) is widely deployed on the web as a security mechanism to distinguish humans from automated bots. However, its robustness is being challenged by rapid advances in AI, with models capable of near-human-level character recognition threatening to render CAPTCHA obsolete. This research aims to systematically study the effect of multiple image corruptions, including elastic transformations, blur, noise, and occlusions, on human readability and automated solvers in text-based CAPTCHA recognition. We conduct experiments on multimodal large language models (MLLMs), a traditional deep learning-based optical character recognition (OCR) system, and human subjects. Using an existing CAPTCHA dataset and artificially corrupted versions, we analyze the recognition performance of AI models and humans, identifying vulnerabilities and patterns of robustness. The findings will contribute to a better understanding of CAPTCHA vulnerabilities and explore potential methods to increase the robustness of CAPTCHA in the era of advanced AI models.
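The corruption types named in the abstract above (noise, blur, occlusion) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, parameters, and the toy 8x8 grayscale image are assumptions made for illustration, a box blur stands in for whatever blur the study used, and elastic transformations are omitted for brevity.

```python
import random


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel, clamped to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def box_blur(img, radius=1):
    """Average each pixel with its neighbours in a (2r+1)x(2r+1) window.

    The window is clipped at the image border (assumed behaviour).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


def occlude(img, top, left, height, width, fill=0):
    """Return a copy of img with a rectangle overwritten by a constant value."""
    out = [row[:] for row in img]
    for y in range(top, min(len(img), top + height)):
        for x in range(left, min(len(img[0]), left + width)):
            out[y][x] = fill
    return out


# Toy "CAPTCHA": a bright vertical bar on a dark background (hypothetical data).
image = [[255 if 2 <= x <= 5 else 0 for x in range(8)] for _ in range(8)]
rng = random.Random(0)  # fixed seed so the corruption is reproducible

corrupted = occlude(box_blur(add_gaussian_noise(image, 20.0, rng)), 2, 2, 3, 3)
for row in corrupted:
    print(" ".join(f"{p:3d}" for p in row))
```

Chaining the functions with increasing `sigma`, `radius`, or occlusion size gives a controllable severity scale, which is the kind of knob a study comparing human and automated readability would need.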
Q-MoFusion: A Quantum Classifier for Mosquito Species Classification (Student Abstract) Vishesh Kumar, Ahana Chanda, Poulomi Bhattacharya, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Automated mosquito species identification is critical for combating vector-borne diseases. We introduce Q-MoFusion, a novel hybrid quantum-classical framework that fuses deep features from pre-trained Audio Spectrogram Transformer (AST) and Whisper models using a Variational Quantum Circuit (VQC). Our approach significantly outperforms individual backbones and prior state-of-the-art benchmarks, demonstrating superior accuracy and robustness, particularly on imbalanced classes. Q-MoFusion demonstrates the potential of hybrid quantum computing to enhance bioacoustic surveillance for addressing critical public health challenges.
A Multi-modal Framework to Counter Hate Speeches Kirtilekha Bhesra, Akshay Agarwal Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2025
Unravelling robustness of deep learning based face recognition against adversarial attacks 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018
Face anti-spoofing using Haralick features Akshay Agarwal, Richa Singh, Mayank Vatsa 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems, BTAS 2016, 2016
The unseen adversaries: Robust and generalized defense against adversarial patches V Kumar, A Agarwal arXiv preprint arXiv:2604.26317, 2026 Citations: 3
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents (Student Abstract) GS Yeole, P Bhattacharya, A Agarwal Proceedings of the AAAI Conference on Artificial Intelligence 40 (48), 41400 …, 2026
WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract) A Chanda, A Agarwal Proceedings of the AAAI Conference on Artificial Intelligence 40 (48), 41154 …, 2026
Semantic-Guided Sketch-to-RGB Image Generation via Controlled Diffusion for Improved Sketch Recognition (Student Abstract) R Jain, A Kumar, A Agarwal Proceedings of the AAAI Conference on Artificial Intelligence 40 (48), 41231 …, 2026
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents GS Yeole, P Bhattacharya, A Agarwal 2026
Navigating in the Dark: A Multimodal Framework and Dataset for Nighttime Traffic Sign Recognition A Mishra, A Agarwal, H Lone arXiv preprint arXiv:2511.17183, 2025
Unmasking the Audio Illusion: A Survey on Spoofing and Deepfake Detection S Aarthi, A Agarwal 2025 IEEE International Joint Conference on Biometrics (IJCB), 1-11, 2025
Robustness Benchmarking of Convolutional and Transformer Architectures for Image Classification V Kumar, S Shukla, A Agarwal IEEE Transactions on Big Data, 2025 Citations: 3
Explainable, trustworthy, and responsible AI in image processing A Agarwal Frontiers in Signal Processing 5, 1628390, 2025
Family Resemblance or Fraud? Face Morphing Attacks on Kinship Verification GS Yeole, S Aarthi, S Srivastav, A Agarwal 2025 IEEE 19th International Conference on Automatic Face and Gesture …, 2025
Gesture Recognition for Emergencies: Dataset and Cross-Condition Analysis J Sinha, P Bhattacharya, A Agarwal 2025 IEEE 19th International Conference on Automatic Face and Gesture …, 2025
Your Face, Your Privacy: Combating Unauthorized Usage A Kumar, A Agarwal, N Ratha 2025 IEEE 19th International Conference on Automatic Face and Gesture …, 2025
Detection of identity swapping attacks in low-resolution image settings A Agarwal, N Ratha Journal of Information Security and Applications 89, 103911, 2025 Citations: 3
Identity in the Blood Relation: Unraveling the Complexity of Morph Detection in Kinship Biometrics S Srivastav, P Bhattacharya, A Agarwal, N Ratha 2025
Brain Matters: Enhancing Tumor Classification via CNN and Vision-Language Fusion CK Ganesh, A Agarwal 2025
On Visual Saliency Maps for Identifying Fidelity of Deepfake Detection Datasets A Banerjee, S Das, A Agarwal 2025
On Adversarial Robustness of Face Presentation Attack Detection Algorithms A Agarwal, M Vatsa, R Singh Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2025
Advancing Facial Age Progression for Occluded Faces A Birla, A Agarwal Proceedings of the Computer Vision and Pattern Recognition Conference, 5614-5622, 2025
A unified, resilient, and explainable adversarial patch detector V Kumar, A Agarwal Proceedings of the Computer Vision and Pattern Recognition Conference, 30387 …, 2025 Citations: 7
On which data distribution (synthetic or real) we should rely for soft biometric classification A Kumar, A Agarwal Proceedings of the Winter Conference on Applications of Computer Vision …, 2025 Citations: 2
MOST CITED SCHOLAR PUBLICATIONS
Unravelling robustness of deep learning based face recognition against adversarial attacks G Goswami, N Ratha, A Agarwal, R Singh, M Vatsa Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018 Citations: 235
Face anti-spoofing using Haralick features A Agarwal, R Singh, M Vatsa 2016 IEEE 8th International Conference on Biometrics Theory, Applications …, 2016 Citations: 153
Detecting and mitigating adversarial perturbations for robust face recognition G Goswami, A Agarwal, N Ratha, R Singh, M Vatsa International Journal of Computer Vision, 1-24, 2019 Citations: 143
Face Presentation Attack with Latex Masks in Multispectral Videos A Agarwal, D Yadav, N Kohli, R Singh, M Vatsa, A Noore IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017 Citations: 139
Face anti-spoofing with multifeature videolet aggregation TA Siddiqui, S Bharadwaj, TI Dhamecha, A Agarwal, M Vatsa, R Singh, ... 2016 23rd International Conference on Pattern Recognition (ICPR), 1035-1040, 2016 Citations: 124
On the Robustness of Face Recognition Algorithms Against Attacks and Bias R Singh, A Agarwal, M Singh, S Nagpal, M Vatsa Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020 Citations: 104
Swapped! digital face presentation attack detection via weighted local magnitude pattern A Agarwal, R Singh, M Vatsa, A Noore 2017 IEEE International Joint Conference on Biometrics (IJCB), 659-665, 2017 Citations: 93
Image transformation-based defense against adversarial perturbation on deep learning models A Agarwal, R Singh, M Vatsa, N Ratha IEEE Transactions on Dependable and Secure Computing 18 (5), 2106-2121, 2020 Citations: 90
Fusion of handcrafted and deep learning features for large-scale multiple iris presentation attack detection D Yadav, N Kohli, A Agarwal, M Vatsa, R Singh, A Noore Proceedings of the IEEE conference on computer vision and pattern …, 2018 Citations: 76
Are image-agnostic universal adversarial perturbations for face recognition difficult to detect? A Agarwal, R Singh, M Vatsa, N Ratha 2018 IEEE 9th International Conference on Biometrics Theory, Applications …, 2018 Citations: 75
Smartbox: Benchmarking adversarial detection and mitigation algorithms for face recognition A Goel, A Singh, A Agarwal, M Vatsa, R Singh 2018 IEEE 9th international conference on biometrics theory, applications …, 2018 Citations: 68
DeepRing: Protecting Deep Neural Network with Blockchain A Goel, A Agarwal, M Vatsa, R Singh, N Ratha CVPR Workshop on When Blockchain Meets Computer Vision and Artificial …, 2019 Citations: 67
Evading Face Recognition via Partial Tampering of Faces P Majumdar, A Agarwal, R Singh, M Vatsa CVPR Workshop on The Bright and Dark Sides of Computer Vision: Challenges …, 2019 Citations: 51
Securing CNN Model and Biometric Template using Blockchain A Goel, A Agarwal, M Vatsa, R Singh, N Ratha IEEE International Conference on Biometrics: Theory, Applications and …, 2019 Citations: 49
Motion magnified 3-d residual-in-dense network for deepfake detection A Mehra, A Agarwal, M Vatsa, R Singh IEEE Transactions on Biometrics, Behavior, and Identity Science 5 (1), 39-52, 2022 Citations: 45
Cognitive data augmentation for adversarial defense via pixel masking A Agarwal, M Vatsa, R Singh, N Ratha Pattern Recognition Letters 146, 244-251, 2021 Citations: 43
MD-CSDNetwork: Multi-domain cross stitched network for deepfake detection A Agarwal, A Agarwal, S Sinha, M Vatsa, R Singh 2021 16th IEEE international conference on automatic face and gesture …, 2021 Citations: 42
Noise is Inside Me! Generating Adversarial Perturbations with Noise Derived from Natural Filters A Agarwal, M Vatsa, R Singh, NK Ratha IEEE CVPR Workshop on adversarial machine learning in computer vision (CVPRW), 2020 Citations: 38
DNDNet: Reconfiguring CNN for Adversarial Robustness A Goel, A Agarwal, M Vatsa, R Singh, NK Ratha IEEE CVPR Workshop on fair, data efficient and trusted computer vision (CVPRW), 2020 Citations: 36
Generalized contact lens iris presentation attack detection A Agarwal, A Noore, M Vatsa, R Singh IEEE Transactions on Biometrics, Behavior, and Identity Science 4 (3), 373-385, 2022 Citations: 32