Enhancing image captioning with asynchronous dual attention vision transformer Biswajit Patra, Dakshina Ranjan Kisku Intelligent Data Analysis, 2026 This paper proposes a unique approach to enhance image captioning by leveraging an Asynchronous Dual Attention (ADA) mechanism within a Vision Transformer (ViT) based framework. Traditional deep-learning models for image captioning often struggle with multimodal interactions and capturing local-to-global visual contexts, including both prominent and subtle features. To address this, the proposed model integrates global self-attention (ViT-B/16) with a Joint Calibration Module during image encoding to enhance the quality of visual embeddings and combines dynamic step-wise attention (Bahdanau) with a Gated Recurrent Unit (GRU) during decoding. This forms an ADA pipeline that decouples visual and linguistic pathways, allowing adaptive refinement of visual features and more precise alignment with linguistic context. Unlike synchronous attention models, ADA enables dynamic image region selection and improved spatial reasoning through enhanced multimodal interaction, leading to more contextually coherent and informative captions for complex visual scenes. The proposed approach demonstrates consistent improvement over state-of-the-art methods on benchmark datasets, achieving CIDEr scores of 0.946 and 1.364 and SPICE scores of 0.188 and 0.248 for Flickr 30k and MSCOCO datasets, respectively. Additionally, the framework incorporates Google’s text-to-speech synthesis to generate audio captions, enhancing accessibility for visually impaired users.
Non-invasive anaemia detection based on palm pallor video using tree-structured 3D CNN and vision transformer models Abhishek Kesarwani, Sunanda Das, Dakshina Ranjan Kisku, Mamata Dalui Journal of Experimental and Theoretical Artificial Intelligence, 2025 Anaemia is a common disease that affects billions of people worldwide and is caused by low blood haemoglobin levels. According to WHO statistics, anaemia is most prevalent in developing and underdeveloped countries. Conventional invasive methods are prohibitively expensive and difficult to administer globally, necessitating a non-invasive, low-cost, and user-friendly solution. This study aims to develop a non-invasive anaemia detection system by combining cutting-edge computational approaches with the age-old practice of estimating blood haemoglobin levels by observing pallor in the palm. The proposed system works by inducing changes in palm pallor through appropriate application and release of pressure, measuring the rate of colour change, and performing time-domain analysis to correlate it with blood haemoglobin concentration. The video of colour changes in the palm caused by a customised device is captured using a smartphone camera, then processed and analysed using deep learning models based on a tree-structured 3-Dimensional Convolutional Neural Network (3D CNN) and a Vision Transformer (ViT) for accurate estimation of haemoglobin levels. The proposed system achieves a sensitivity, specificity, accuracy and RMSE of 96.87%, 90.90%, 94.44% and 0.495, respectively, when run on a dataset consisting of palm pallor video samples from 531 individuals.
Exploring Bengali Image Descriptions through the combination of diverse CNN Architectures and Transformer Decoders Biswajit Patra, D. Kisku Turkish Journal of Engineering, 2025 Image captioning produces a description for a query image that closely resembles human-generated text, and researchers have recently expressed significant interest in this domain. Research published in this domain is mostly based on the English language, where CNNs and RNNs or their variants are used as encoder and decoder models respectively, with attention-based approaches as enhancement mechanisms. Currently, Bengali stands as the sixth most-spoken native language and the seventh most widely spoken language. Despite the growing interest in language research, the Bengali context has received very little attention in comparison to resource-rich languages like English. The objective of this study is to address the research gap in describing image visuals in Bengali by introducing a novel image captioning approach. Utilizing the strengths of state-of-the-art Convolutional Neural Networks, such as EfficientNetV2s, ConvNeXt-Small, and InceptionResNetV2, in conjunction with an improvised Transformer, the proposed image captioning system achieves desirable computational efficiency while producing accurate and contextually relevant captions. Further, to help the visually impaired Bengali-speaking population understand their surroundings and image visuals efficiently, Bengali text-to-speech synthesis is integrated into the proposed framework. The proposed model is evaluated with Bengali descriptions obtained from the ‘Ban-Cap’ dataset and the corresponding images obtained from the Flickr 8k dataset. Utilizing EfficientNet, the proposed model attains METEOR, CIDEr, and ROUGE scores of 0.34, 0.30, and 0.40, while BLEU scores for unigram, bigram, trigram, and four-gram matching are 0.66, 0.59, 0.44 and 0.26 respectively. The investigation assesses the quality of the model-generated descriptions against human-annotated ground truth descriptions using several evaluation metrics, and the model achieves desirable effectiveness relative to other state-of-the-art models for Bengali description generation.
Biometric-based computer vision for boundless possibilities: Process, techniques, and challenges Intelligent Multimedia Processing and Computer Vision Techniques and Applications, 2023
Preface Maurice Dawson, Dakshina Ranjan Kisku, Phalguni Gupta, Jamuna Kanta Sing, Weifeng Li Developing Next Generation Countermeasures for Homeland Security Threat Prevention, 2016
Preface Advances in Biometrics for Secure Human Authentication and Recognition, 2013
Advances in biometrics for secure human authentication and recognition Advances in Biometrics for Secure Human Authentication and Recognition, 2013
Probabilistic approach to face recognition Dakshina Ranjan Kisku, Phalguni Gupta, Jamuna Kanta Sing, Massimo Tistarelli Journal of the Chinese Institute of Engineers Transactions of the Chinese Institute of Engineers Series A, 2012
Multibiometrics feature level fusion by graph clustering International Journal of Security and Its Applications, 2011
Robust multi-camera view face recognition Dakshina R. Kisku, Hunny Mehrotra, Phalguni Gupta, Jamuna K. Sing International Journal of Computers and Applications, 2011
Palmprint identification using FRIT D. R. Kisku, A. Rattani, P. Gupta, C. J. Hwang, J. K. Sing Proceedings of SPIE the International Society for Optical Engineering, 2011
Graphs in biometrics Dakshina Ranjan Kisku, Phalguni Gupta, Jamuna Kanta Sing Cases on ICT Utilization Practice and Solutions Tools for Managing Day to Day Issues, 2010
Offline signature identification by fusion of multiple classifiers using statistical learning theory International Journal of Security and Its Applications, 2010
Face recognition using SIFT descriptor under multiple paradigms of graph similarity constraints International Journal of Multimedia and Ubiquitous Engineering, 2010
Feature level fusion of face and palmprint biometrics Dakshina Ranjan Kisku, Phalguni Gupta, Jamuna Kanta Sing Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2010
An efficient ear identification system Dakshina Ranjan Kisku, Sandesh Gupta, Phalguni Gupta, Jamuna Kanta Sing 2010 5th International Conference on Future Information Technology Futuretech 2010 Proceedings, 2010
A novel approach to offline signature verification using Gaussian empirical rule 6th European Conference on Information Warfare and Security 2007 Eciw 2007, 2007
Facial template synthesis based on SIFT features Ajita Rattani, D. R. Kisku, Andrea Lagorio, Massimo Tistarelli 2007 IEEE Workshop on Automatic Identification Advanced Technologies Proceedings, 2007
Estimating Haemoglobin Levels Non-Invasively Through Analysis of Palm Pallor Using Deep Learning Networks S Naskar, A Kesarwani, S Das, DR Kisku, M Dalui Journal of The Institution of Engineers (India): Series B, 1-14, 2026
Enhancing image captioning with asynchronous dual attention vision transformer B Patra, DR Kisku Intelligent Data Analysis 30 (2), 388-414, 2026 Citations: 2
Improving E-jet process capabilities with black box machine learning AK Ball, R Das, A Kumar, SS Roy, DR Kisku, NC Murmu The International Journal of Advanced Manufacturing Technology 142 (1), 119-134, 2026
A Self-Attention-Integrated Deep Learning Approach for Non-Invasive Hemoglobin Estimation from Palm Pallor Video S Naskar, DR Kisku, M Dalui 2025 Seventh International Conference on Research in Computational …, 2025
Context-Aware Image Description via Dual-Stream Visual Encoding and Guided Multimodal Learning B Patra, C Ghosh, DR Kisku International Conference on Pattern Recognition and Machine Intelligence …, 2025
Boosting Zero-Shot Learning using A Combination of EfficientNet and Deep Visual-Semantic Embeddings S Changder, DR Kisku 2025 5th IEEE International Conference on Applied Electromagnetics, Signal …, 2025
Non-invasive anaemia detection based on palm pallor video using tree-structured 3D CNN and vision transformer models A Kesarwani, S Das, DR Kisku, M Dalui Journal of Experimental & Theoretical Artificial Intelligence 37 (6), 957-985, 2025 Citations: 8
Attention enabled precise haemoglobin prediction: A GNN based non-invasive approach S Das, A Kesarwani, DR Kisku, M Dalui Biomedical Signal Processing and Control 105, 107604, 2025 Citations: 7
Early Detection of Knee Osteoarthritis: A Simple and Effective Approach Using Local Directional Octa Pattern (LDOP) S Garg, RD Rakshit, DS Dev, DR Kisku International Conference on Smart Computing and Informatics, 153-165, 2025
Analysis, Processing and Encoding of Low-Resolution Face Images Using Convolutional Neural Network and Vision Transformer for Person Recognition SK Gautam, S Garg, RD Rakshit, DR Kisku International Conference on Smart Computing and Informatics, 159-169, 2025
An LDOP approach for face identification under unconstrained scenarios R Datta Rakshit, A Rattani, DR Kisku Journal of Experimental & Theoretical Artificial Intelligence 37 (2), 219-267, 2025 Citations: 5
Exploring Bengali image descriptions through the combination of diverse CNN architectures and transformer decoders B Patra, DR Kisku Turkish Journal of Engineering 9 (1), 64-78, 2025 Citations: 7
DEM-UFR: Deep Ensemble Method for Enhanced Unconstraint Face Recognition System D Kumar, RK Kumar, J Garain, DR Kisku, JK Sing, P Gupta 2024 OITS International Conference on Information Technology (OCIT), 133-137, 2024 Citations: 3
An effective pathway of brain stroke detection from CT scan images using local directional octa pattern AK Padhi, S Garg, RD Rakshit, DR Kisku International Conference on Pattern Recognition, 240-252, 2024 Citations: 1
Improving Bias in Facial Attribute Classification: A Combined Impact of KL Divergence-Induced Loss Function and Dual Attention S Patel, DR Kisku International Conference on Pattern Recognition, 383-397, 2024
Multi-scale Vision Transformer toward improved non-invasive anaemia detection using palm video A Kesarwani, S Das, DR Kisku, M Dalui Multimedia Tools and Applications 83 (38), 85825-85848, 2024 Citations: 2
Non-invasive Haemoglobin estimation from Nail Pallor leveraging pre-trained models and graph neural networks S Das, A Kesarwani, DR Kisku, M Dalui International Conference on Data Science and Applications, 143-155, 2024 Citations: 1
A Multi-modal Approach for Efficient and Contextually Rich Visual Description Generation B Patra, DR Kisku International Conference on Computational Intelligence in Pattern …, 2024 Citations: 1
Dual mode information fusion with pre-trained CNN models and transformer for video-based non-invasive anaemia detection A Kesarwani, S Das, DR Kisku, M Dalui Biomedical Signal Processing and Control 88, 105592, 2024 Citations: 20
Precise and faster image description generation with limited resources using an improved hybrid deep model B Patra, DR Kisku International conference on pattern recognition and machine intelligence …, 2023 Citations: 7
MOST CITED SCHOLAR PUBLICATIONS
Feature level fusion of face and fingerprint biometrics A Rattani, DR Kisku, M Bicego, M Tistarelli 2007 First IEEE International Conference on Biometrics: Theory, Applications …, 2007 Citations: 234
Face identification by SIFT-based complete graph topology DR Kisku, A Rattani, E Grosso, M Tistarelli 2007 IEEE workshop on automatic identification advanced technologies, 63-68, 2007 Citations: 131
Continuous Authentication Using Biometrics: Data, Models, and Metrics I Traore, AAE Ahmed Information Science Reference, 2012 Citations: 117
COVID-19 detection on chest X-ray and CT scan images using multi-image augmented deep learning model K Purohit, A Kesarwani, D Ranjan Kisku, M Dalui Proceedings of the Seventh International Conference on Mathematics and …, 2022 Citations: 96
SIFT-based ear recognition by fusion of detected keypoints from color similarity slice regions DR Kisku, H Mehrotra, P Gupta, JK Sing 2009 International Conference on Advances in Computational Tools for …, 2009 Citations: 64
Modeling of EHD inkjet printing performance using soft computing-based approaches AK Ball, R Das, SS Roy, DR Kisku, NC Murmu Soft Computing 24 (1), 571-589, 2020 Citations: 61
Face recognition by fusion of local and global matching scores using DS theory: An evaluation with uni-classifier and multi-classifier paradigm DR Kisku, M Tistarelli, JK Sing, P Gupta 2009 IEEE Computer Society Conference on Computer Vision and Pattern …, 2009 Citations: 60
Multisensor biometric evidence fusion for person authentication using wavelet decomposition and monotonic-decreasing graph DR Kisku, JK Sing, M Tistarelli, P Gupta 2009 seventh international conference on advances in pattern recognition …, 2009 Citations: 58
Writer independent handwritten signature verification on multi-scripted signatures using hybrid CNN-BiLSTM: A novel approach T Longjam, DR Kisku, P Gupta Expert Systems with Applications 214, 119111, 2023 Citations: 55
Offline signature identification by Fusion of multiple classifiers using statistical learning theory DRK Dakshina Ranjan Kisku, PG Phalguni Gupta, ... International Journal of Security and Its Applications 4 (3), 35-45, 2010 Citations: 51
Optimization of drop ejection frequency in EHD inkjet printing system using an improved Firefly Algorithm AK Ball, SS Roy, DR Kisku, NC Murmu, L dos Santos Coelho Applied Soft Computing 94, 106438, 2020 Citations: 39
Robust feature-level multibiometric classification A Rattani, DR Kisku, M Bicego, M Tistarelli 2006 Biometrics Symposium: Special Session on Research at the Biometric …, 2006 Citations: 36
Face identification using some novel local descriptors under the influence of facial complexities RD Rakshit, SC Nath, DR Kisku Expert Systems with Applications 92, 82-94, 2018 Citations: 34
Detection of rare genetic diseases using facial 2D images with transfer learning A Singh, DR Kisku 2018 8th International Symposium on Embedded Computing and System Design …, 2018 Citations: 31
Biometric sensor image fusion for identity verification: A case study with wavelet-based fusion rules graph matching DR Kisku, A Rattani, P Gupta, JK Sing 2009 IEEE Conference on Technologies for Homeland Security, 433-439, 2009 Citations: 31
Data hiding in images using some efficient steganography techniques C Maiti, D Baksi, I Zamider, P Gorai, DR Kisku International Conference on Signal Processing, Image Processing, and Pattern …, 2011 Citations: 30
Multibiometrics Feature Level Fusion by Graph Clustering DRK Dakshina Ranjan Kisku, PG Phalguni Gupta, ... International Journal of Security and Its Applications 5 (2), 61-74, 2011 Citations: 30
AI-based BMI inference from facial images: An application to weight monitoring H Siddiqui, A Rattani, DR Kisku, T Dean 2020 19th IEEE International Conference on Machine Learning and Applications …, 2020 Citations: 29
Unconstrained and constrained face recognition using dense local descriptor with ensemble framework D Kumar, J Garain, DR Kisku, JK Sing, P Gupta Neurocomputing 408, 273-284, 2020 Citations: 29
Advances in biometrics for secure human authentication and recognition P Gupta, DR Kisku, JK Sing CRC Press, 2014 Citations: 29