WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract) Ahana Chanda, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Accurate identification of mosquito species is crucial for controlling vector-borne diseases, yet visual or acoustic methods alone are often insufficient. We propose a multimodal deep-learning framework that combines high-resolution images with wingbeat audio using a SwinV2 vision transformer and an Audio Spectrogram Transformer, thereby capturing complementary cues. On a six-species dataset, it achieves 97% accuracy, comparable to the best single-modality baseline, and is designed to improve robustness under noise or environmental variation, demonstrating the value of integrating multiple data sources for reliable mosquito surveillance.
Semantic-Guided Sketch-to-RGB Image Generation via Controlled Diffusion for Improved Sketch Recognition (Student Abstract) Ritika Jain, Atul Kumar, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Although deep networks excel on RGB images, their performance degrades sharply under severe domain shifts such as sketch recognition, where color and texture cues are missing. In this work, we propose a novel pipeline that leverages semantic cues extracted from sketches to guide the synthesis of photorealistic RGB images using diffusion-based generative models. Our framework operates by extracting two crucial cues from the input sketch: semantic captions via the BLIP model and structural outlines via Canny edge detection. These cues are then integrated using ControlNet to guide a Stable Diffusion model, ensuring the synthesized RGB image is both semantically consistent with the content and structurally faithful to the original sketch. We evaluated our synthesized images by benchmarking classification performance. We trained standard architectures (from convolutional to transformer-based) on Tiny-ImageNet subsets and tested them on sketches, their synthesized counterparts, and the original RGB images. Experimental results demonstrate that our approach produces realistic, identity-preserving images, which significantly improve classification accuracy and effectively bridge the semantic gap. While BLIP-based captioning and ControlNet-guided diffusion are established methods, our contribution lies in their integration into a unified, caption-guided pipeline that enhances sketch-to-RGB translation with improved semantic consistency. The proposed method generalizes well across architectures, providing a scalable and cost-efficient solution for sketch-based image synthesis.
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents (Student Abstract) Gargi Surendra Yeole, Poulomi Bhattacharya, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Government verification systems increasingly rely on internet-based platforms, where users authenticate their identities by uploading images captured with ordinary mobile devices. However, rapid advancements in generative algorithms have enabled the creation of highly realistic forged ID cards that can easily bypass such verification pipelines. These forgeries are not restricted to a single modality; they may target facial imagery, textual content, or both, posing significant challenges to existing detection approaches. We present a framework that analyzes visual features for ID forgery detection by integrating feature fusion with attention mechanisms, leveraging both convolutional neural network (CNN) architectures, such as ResNet-50 and EfficientNet, and transformer-based models, including ViT-16 and Swin Transformer. This study emphasises the significance of feature fusion and attention-driven representation learning in developing robust and trustworthy ID forgery detection systems for real-world deployment.
Improving CAPTCHA Robustness via Controlled Image Corruptions (Student Abstract) Suchetan G. Uppur, Ashish Kumar, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 The Completely Automated Public Turing test to Tell Computers and Humans Apart (CAPTCHA) is widely deployed on the web as a security mechanism to distinguish humans from automated bots. However, its robustness is being challenged by rapid advancements in AI, with models capable of near-human-level character recognition threatening to render CAPTCHA obsolete. This research aims to systematically study the effect of multiple image corruptions, including elastic transformations, blur, noise, and occlusions, on human readability and automated solvers in text-based CAPTCHA recognition. We conduct experiments on multimodal large language models (MLLMs), a traditional deep learning-based optical character recognition (OCR) system, and human subjects. Using an existing CAPTCHA dataset and artificially corrupted versions, we analyze the recognition performance of AI models and humans, identifying vulnerabilities and patterns of robustness. The findings will contribute to a better understanding of CAPTCHA vulnerabilities and explore potential methods to increase the robustness of CAPTCHA in the era of advanced AI models.
Q-MoFusion: A Quantum Classifier for Mosquito Species Classification (Student Abstract) Vishesh Kumar, Ahana Chanda, Poulomi Bhattacharya, Akshay Agarwal Proceedings of the AAAI Conference on Artificial Intelligence, 2026 Automated mosquito species identification is critical for combating vector-borne diseases. We introduce Q-MoFusion, a novel hybrid quantum-classical framework that fuses deep features from pre-trained Audio Spectrogram Transformer (AST) and Whisper models using a Variational Quantum Circuit (VQC). Our approach significantly outperforms individual backbones and prior state-of-the-art benchmarks, demonstrating superior accuracy and robustness, particularly on imbalanced classes. Q-MoFusion demonstrates the potential of hybrid quantum computing to enhance bioacoustic surveillance for addressing critical public health challenges.
Editorial: Explainable, trustworthy, and responsible AI in image processing Akshay Agarwal Frontiers in Signal Processing, 2025 The tremendous growth of deep learning models, especially the success of generative AI and foundation models, has led to their deployment in several critical sectors, including biometric recognition, healthcare, language processing, and security. While these models have been hugely successful, they raise serious concerns around ethics, copyright, and privacy. It is therefore critical to address the explainability of their decision process, their trustworthiness in handling adversaries with explainability (Kumar et al. [CVPR'25]), and privacy through responsible AI, especially in biometric recognition (Singh et al. [AAAI'20]). The articles featured in this Research Topic illustrate the rapid expansion and dynamic growth of this field by presenting novel machine intelligence technologies, including deep learning and generative AI. The published works aim to advance biometric recognition, including child face recognition and its application in forensics; healthcare, including assessing the lung field through novel deep learning algorithms for chest X-rays and assessing an individual's creativity based on their drawings; and computer vision through advances in video processing, including video summarization. The articles demonstrate groundbreaking research combining advanced signal processing techniques with machine intelligence to address critical challenges in computer vision, face recognition, and healthcare. From creativity assessment to lung profiling to de-aging to child face recognition, the articles presented in this editorial emphasize innovation and practical usage. Apart from security and bias/privacy concerns (Singh et al. [AAAI'20], Goswami et al.
[IJCV'19]), one of the prominent concerns with current face recognition technology is that it is highly ineffective at processing children's face images and is vulnerable when the age difference between the gallery and probe images is large. Falkenberg et al. [2024] address the limited exploration of child face recognition by presenting a large-scale database of children's faces, using generative adversarial networks (GANs) and face-age progression (FAP) models to synthesize a realistic dataset referred to as "HDA-SynChildFaces". The resulting HDA-SynChildFaces consists of 1,652 subjects and 188,328 images, with each subject present at various ages and with many different intra-subject variations. As asserted, the EER in the younger age groups (1-4 and 4-7) increases drastically compared to the older age groups (13-16 and 20+). While the drastic increase is observed for the deep learning-based face recognition algorithms ArcFace and MagFace, the increase is not as sharp with a commercial off-the-shelf (COTS) system. Furthermore, there is a significant bias across age groups: the models are more effective on male faces than on female faces, except for age group 1-4. It was also observed that Black and Asian subjects generally performed worse than White and Latino-Hispanic subjects. On the other hand, Martis et al. [2024] advance forensic systems by performing de-aging on faces and presenting a sketch generation algorithm to increase the matching accuracy between sketch and RGB images. For de-aging, deepfake technology is used, whereas for realistic sketch generation, the pix-to-pix approach is utilized.
The results presented for the different age groups from age 20 to 70 in intervals of 10 demonstrate that the generated images have higher image quality in terms of FID, SSIM, and PSNR. The above collection of articles significantly helps advance face recognition across varying age groups, whether the aim is to help forensic professionals or to narrow the performance gap as the age gap between gallery and probe images increases. In contrast to the other articles, Yang et al. [2025] present a study to advance healthcare through automated processing of chest X-ray images. It is asserted that the X-ray is the most widely used primary chest imaging technique because it is widely available, low-cost, fast, and easy to acquire. Medical image registration, which aligns the source image (moving image) with the target image (fixed image), is a crucial step and pillar problem in medical image analysis. The work presents a fully automatic three-stage registration pipeline to find the deformation fields of the point-to-point correspondence between the source and target images. Visual differences among the dynamic chest X-ray (CXR), lung field, and registration images of the source and target images help explain the proposed approach's effectiveness and support the trustworthiness of the analysis, especially for medical images. Highlighting the limitation of insufficient dynamic CXR images, the article calls on researchers to collect more dynamic CXR images. Creativity assessment evaluates an individual's creative thinking abilities and capacity to generate novel and valuable ideas. Panfilova et al. [2024] performed a benchmark study to assess the creativity of different individuals based on their drawings. The authors used multiple deep convolutional neural networks, including AlexNet, GoogLeNet, and MobileNet-V2.
Further, to ensure the assessment is trustworthy, the work performed a Grad-CAM analysis of the models, checking the most relevant features in the drawings that influence each model's prediction. Finally, within this collection, Tsigos et al. [2024] presented a video summarization algorithm, motivated by the difficulty of watching and understanding long videos. Traditionally, this laborious and time-consuming task requires a professional video editor to watch the entire content and decide which parts should be included in the summary. The work adapts the LIME method by operating it on sequences of video frames rather than on a single frame/image. To ensure the generated summary is explainable, the authors integrate fragment- and object-level explanation methods into a framework for multi-granular explanation of video summarization. In particular, their framework can provide fragment-level explanations that show the temporal pieces of the video that most influenced the summarizer's decisions. The above collection of articles significantly helps advance computer vision by presenting novel video processing algorithms and image processing approaches to assess creativity, which can later be used to diagnose illnesses. The articles also show future directions for improving the field, such as using vision-language models for textual description of images. Author contributions AA: Writing-original draft, Writing-review and editing.
Advancing Facial Age Progression for Occluded Faces Ankit Birla, Akshay Agarwal IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2025 It is observed that face recognition is highly vulnerable when the age gap between the gallery and the probe images is large. This phenomenon is a universal concern since acquiring gallery images at multiple age intervals might not always be possible. Therefore, accurate age progression is an ideal solution to mitigate this age gap and boost face recognition performance. However, existing age progression algorithms are vulnerable to occlusion. Keeping this in mind, this paper presents a novel approach to facial age progression, particularly addressing the challenge of occluded faces. The objects occluding the face's key points are first detected using Segment Anything and later inpainted using a transformer architecture to improve the age progression. We compare our results against state-of-the-art models across various age clusters (e.g., 0-3, 15-19, and 50-69), demonstrating superior performance in terms of age progression and retaining identity, gender, and age attributes. The proposed work significantly improves facial age progression's robustness and visual quality, enhancing its applicability in security systems, forensic analysis, and other fields requiring precise age prediction.
A Multi-modal Framework to Counter Hate Speeches Kirtilekha Bhesra, Akshay Agarwal Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2025
Unravelling robustness of deep learning based face recognition against adversarial attacks 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018
Face anti-spoofing using Haralick features Akshay Agarwal, Richa Singh, Mayank Vatsa 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems, BTAS 2016, 2016
Face anti-spoofing with multifeature videolet aggregation Talha Ahmad Siddiqui, Samarth Bharadwaj, Tejas I. Dhamecha, Akshay Agarwal, Mayank Vatsa, Richa Singh, Nalini Ratha Proceedings International Conference on Pattern Recognition, 2016
The unseen adversaries: Robust and generalized defense against adversarial patches V Kumar, A Agarwal arXiv preprint arXiv:2604.26317, 2026 Citations: 3
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents (Student Abstract) GS Yeole, P Bhattacharya, A Agarwal Proceedings of the AAAI Conference on Artificial Intelligence 40 (48), 41400 …, 2026
WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract) A Chanda, A Agarwal Proceedings of the AAAI Conference on Artificial Intelligence 40 (48), 41154 …, 2026
Semantic-Guided Sketch-to-RGB Image Generation via Controlled Diffusion for Improved Sketch Recognition (Student Abstract) R Jain, A Kumar, A Agarwal Proceedings of the AAAI Conference on Artificial Intelligence 40 (48), 41231 …, 2026
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents GS Yeole, P Bhattacharya, A Agarwal 2026
Navigating in the Dark: A Multimodal Framework and Dataset for Nighttime Traffic Sign Recognition A Mishra, A Agarwal, H Lone arXiv preprint arXiv:2511.17183, 2025
Unmasking the Audio Illusion: A Survey on Spoofing and Deepfake Detection S Aarthi, A Agarwal 2025 IEEE International Joint Conference on Biometrics (IJCB), 1-11, 2025
Robustness Benchmarking of Convolutional and Transformer Architectures for Image Classification V Kumar, S Shukla, A Agarwal IEEE Transactions on Big Data, 2025 Citations: 3
Explainable, trustworthy, and responsible AI in image processing A Agarwal Frontiers in Signal Processing 5, 1628390, 2025
Family Resemblance or Fraud? Face Morphing Attacks on Kinship Verification GS Yeole, S Aarthi, S Srivastav, A Agarwal 2025 IEEE 19th International Conference on Automatic Face and Gesture …, 2025
Gesture Recognition for Emergencies: Dataset and Cross-Condition Analysis J Sinha, P Bhattacharya, A Agarwal 2025 IEEE 19th International Conference on Automatic Face and Gesture …, 2025
Your Face, Your Privacy: Combating Unauthorized Usage A Kumar, A Agarwal, N Ratha 2025 IEEE 19th International Conference on Automatic Face and Gesture …, 2025
Detection of identity swapping attacks in low-resolution image settings A Agarwal, N Ratha Journal of Information Security and Applications 89, 103911, 2025 Citations: 3
Identity in the Blood Relation: Unraveling the Complexity of Morph Detection in Kinship Biometrics S Srivastav, P Bhattacharya, A Agarwal, N Ratha 2025
Brain Matters: Enhancing Tumor Classification via CNN and Vision-Language Fusion CK Ganesh, A Agarwal 2025
On Visual Saliency Maps for Identifying Fidelity of Deepfake Detection Datasets A Banerjee, S Das, A Agarwal 2025
On Adversarial Robustness of Face Presentation Attack Detection Algorithms A Agarwal, M Vatsa, R Singh Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2025
Advancing Facial Age Progression for Occluded Faces A Birla, A Agarwal Proceedings of the Computer Vision and Pattern Recognition Conference, 5614-5622, 2025
A unified, resilient, and explainable adversarial patch detector V Kumar, A Agarwal Proceedings of the Computer Vision and Pattern Recognition Conference, 30387 …, 2025 Citations: 7
On which data distribution (synthetic or real) we should rely for soft biometric classification A Kumar, A Agarwal Proceedings of the Winter Conference on Applications of Computer Vision …, 2025 Citations: 2
MOST CITED SCHOLAR PUBLICATIONS
Unravelling robustness of deep learning based face recognition against adversarial attacks G Goswami, N Ratha, A Agarwal, R Singh, M Vatsa Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018 Citations: 235
Face anti-spoofing using Haralick features A Agarwal, R Singh, M Vatsa 2016 IEEE 8th International Conference on Biometrics Theory, Applications …, 2016 Citations: 153
Detecting and mitigating adversarial perturbations for robust face recognition G Goswami, A Agarwal, N Ratha, R Singh, M Vatsa International Journal of Computer Vision, 1-24, 2019 Citations: 143
Face Presentation Attack with Latex Masks in Multispectral Videos A Agarwal, D Yadav, N Kohli, R Singh, M Vatsa, A Noore IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017 Citations: 139
Face anti-spoofing with multifeature videolet aggregation TA Siddiqui, S Bharadwaj, TI Dhamecha, A Agarwal, M Vatsa, R Singh, ... 2016 23rd International Conference on Pattern Recognition (ICPR), 1035-1040, 2016 Citations: 124
On the Robustness of Face Recognition Algorithms Against Attacks and Bias R Singh, A Agarwal, M Singh, S Nagpal, M Vatsa Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020 Citations: 104
Swapped! digital face presentation attack detection via weighted local magnitude pattern A Agarwal, R Singh, M Vatsa, A Noore 2017 IEEE International Joint Conference on Biometrics (IJCB), 659-665, 2017 Citations: 93
Image transformation-based defense against adversarial perturbation on deep learning models A Agarwal, R Singh, M Vatsa, N Ratha IEEE Transactions on Dependable and Secure Computing 18 (5), 2106-2121, 2020 Citations: 90
Fusion of handcrafted and deep learning features for large-scale multiple iris presentation attack detection D Yadav, N Kohli, A Agarwal, M Vatsa, R Singh, A Noore Proceedings of the IEEE conference on computer vision and pattern …, 2018 Citations: 76
Are image-agnostic universal adversarial perturbations for face recognition difficult to detect? A Agarwal, R Singh, M Vatsa, N Ratha 2018 IEEE 9th International Conference on Biometrics Theory, Applications …, 2018 Citations: 75
Smartbox: Benchmarking adversarial detection and mitigation algorithms for face recognition A Goel, A Singh, A Agarwal, M Vatsa, R Singh 2018 IEEE 9th international conference on biometrics theory, applications …, 2018 Citations: 68
DeepRing: Protecting Deep Neural Network with Blockchain A Goel, A Agarwal, M Vatsa, R Singh, N Ratha CVPR Workshop on When Blockchain Meets Computer Vision and Artificial …, 2019 Citations: 67
Evading Face Recognition via Partial Tampering of Faces P Majumdar, A Agarwal, R Singh, M Vatsa CVPR Workshop on The Bright and Dark Sides of Computer Vision: Challenges …, 2019 Citations: 51
Securing CNN Model and Biometric Template using Blockchain A Goel, A Agarwal, M Vatsa, R Singh, N Ratha IEEE International Conference on Biometrics: Theory, Applications and …, 2019 Citations: 49
Motion magnified 3-d residual-in-dense network for deepfake detection A Mehra, A Agarwal, M Vatsa, R Singh IEEE Transactions on Biometrics, Behavior, and Identity Science 5 (1), 39-52, 2022 Citations: 45
Cognitive data augmentation for adversarial defense via pixel masking A Agarwal, M Vatsa, R Singh, N Ratha Pattern Recognition Letters 146, 244-251, 2021 Citations: 43
MD-CSDNetwork: Multi-domain cross stitched network for deepfake detection A Agarwal, A Agarwal, S Sinha, M Vatsa, R Singh 2021 16th IEEE international conference on automatic face and gesture …, 2021 Citations: 42
Noise is Inside Me! Generating Adversarial Perturbations with Noise Derived from Natural Filters A Agarwal, M Vatsa, R Singh, NK Ratha IEEE CVPR Workshop on adversarial machine learning in computer vision (CVPRW), 2020 Citations: 38
DNDNet: Reconfiguring CNN for Adversarial Robustness A Goel, A Agarwal, M Vatsa, R Singh, NK Ratha IEEE CVPR Workshop on fair, data efficient and trusted computer vision (CVPRW), 2020 Citations: 36
Generalized contact lens iris presentation attack detection A Agarwal, A Noore, M Vatsa, R Singh IEEE Transactions on Biometrics, Behavior, and Identity Science 4 (3), 373-385, 2022 Citations: 32