Dr. Tapas Kumar Mishra

Scopus Publications

AKS: Optimized Adaptive K-Nearest Neighbor-Based Oversampling for Imbalanced Datasets
Annam Nandini, Tapas Kumar Mishra, Abinash Pujahari, Kshira Sagar Sahoo
IEEE Access, 2026
OdNER: NER resource creation and system development for low-resource Odia language
Tusarkanta Dalai, Anupam Das, Tapas Kumar Mishra, Pankaj Kumar Sa
Natural Language Processing Journal, 2025
Statistical machine translation for Indic languages
Sudhansu Bala Das, Divyajyoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra
Natural Language Processing, 2025
Statistical Machine Translation (SMT) systems use various probabilistic and statistical Natural Language Processing (NLP) methods to automatically translate from one language to another language while retaining the originality of the context. This paper aims to discuss the development of bilingual SMT models for translating English into fifteen low-resource Indic languages (ILs) and vice versa. The process to build the SMT model is described and explained using a workflow diagram. Samanantar and OPUS corpus are utilized for training, and Flores200 corpus is used for fine-tuning and testing purposes. The paper also highlights various preprocessing methods used to deal with corpus noise. The Moses open-source SMT toolkit is being investigated for the system’s development. The impact of distance-based reordering and Morpho-syntactic Descriptor Bidirectional Finite-State Encoder (msd-bidirectional-fe) reordering on ILs is compared in the paper. This paper provides a comparison of SMT models with Neural Machine Translation (NMT) for ILs. All the experiments assess the translation quality using standard metrics such as BiLingual Evaluation Understudy, Rank-based Intuitive Bilingual Evaluation Score, Translation Edit Rate, and Metric for Evaluation of Translation with Explicit Ordering. From the result, it is observed that msd-bidirectional-fe reordering performs better than the distance-based reordering model for ILs. It is also noticed that even though the IL-English and English-IL systems are trained using the same corpus, the former performs better for all the evaluation metrics. The comparison between SMT and NMT shows that across various languages, SMT performs better in some cases, while NMT outperforms in others.
HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
Abir Sen, Shubham Dombe, Tapas Kumar Mishra, Ratnakar Dash
Multimedia Tools and Applications, 2024
Multilingual Neural Machine Translation for Indic to Indic Languages
Sudhansu Bala Das, Divyajyoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra, Asif Ekbal
ACM Transactions on Asian and Low Resource Language Information Processing, 2024
The method of translation from one language to another without human intervention is known as Machine Translation (MT). Multilingual neural machine translation (MNMT) is a technique for MT that builds a single model for multiple languages. It is preferred over other approaches, since it decreases training time and improves translation in low-resource contexts, i.e., for languages that have insufficient corpus. However, good-quality MT models are yet to be built for many scenarios such as for Indic-to-Indic Languages (IL-IL). Hence, this article is an attempt to address and develop the baseline models for low-resource languages i.e., IL-IL (for 11 Indic Languages (ILs)) in a multilingual environment. The models are built on the Samanantar corpus and analyzed on the Flores-200 corpus. All the models are evaluated using standard evaluation metrics i.e., Bilingual Evaluation Understudy (BLEU) score (with the range of 0 to 100). This article examines the effect of the grouping of related languages, namely, East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI) on the MNMT model. From the experiments, the results reveal that related language grouping is beneficial for the WI group only while it is detrimental for the EI group and it shows an inconclusive effect on the DR group. The role of pivot-based MNMT models in enhancing translation quality is also investigated in this article. Owing to the presence of large good-quality corpora from English (EN) to ILs, MNMT IL-IL models using EN as a pivot are built and examined. To achieve this, English-Indic Language (EN-IL) models are developed with and without the usage of related languages. Results show that the use of related language grouping is advantageous specifically for EN to ILs. Thus, related language groups are used for the development of pivot MNMT models. It is also observed that the usage of pivot models greatly improves MNMT baselines. 
Furthermore, the effect of transliteration on ILs is analyzed in this article. To explore transliteration, the best MNMT models from the previous approaches (in most cases, the pivot model using related groups) are identified and built on a corpus transliterated from the corresponding scripts to a modified Indian Language Transliteration (ITRANS) script. The outcome of the experiments indicates that transliteration helps the models built for lexically rich languages, with the largest BLEU gains observed for Malayalam (ML) and Tamil (TA), i.e., 6.74 and 4.72, respectively. The BLEU scores of the transliteration models range from 7.03 to 24.29. The best model obtained is the Punjabi (PA)-Hindi (HI) language pair trained on the PA-WI transliterated corpus.
Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
Tusarkanta Dalai, Tapas Kumar Mishra, Pankaj K. Sa
ACM Transactions on Asian and Low Resource Language Information Processing, 2024
Developing effective natural language processing (NLP) tools for low-resource languages poses significant challenges. This article focuses on part-of-speech (POS) tagging and chunking, which involve the identification and categorization of linguistic units within sentences. POS tagging and chunking have already produced positive results in English and other European languages. However, in Indian languages, particularly in the Odia language, they are not yet well explored because of the lack of supporting tools and resources and the language's complex morphology. This study presents the building of a manually annotated dataset for the Odia phrase chunking task and the development of a deep learning-based model specifically tailored to accommodate the distinctive properties of the language. The Odia chunking corpus was annotated with inside-outside-begin labels, using a purpose-designed Odia chunking tagset. We use the constructed Odia chunking dataset to build an Odia chunker based on deep learning techniques, employing state-of-the-art architectures. Various techniques, such as Recurrent Neural Networks, Convolutional Neural Networks, and transformer-based models, are investigated to determine the most effective approach for Odia POS tagging and chunking. In addition, we conduct experiments with diverse input representations, including Odia word embeddings, character-level representations, and sub-word units, to capture the complex linguistic characteristics of the Odia language. Numerous experiments evaluate the performance of our Odia POS tagger and chunker, employing standard evaluation metrics and making comparisons with existing approaches. The results demonstrate that our transformer-based tagger and chunker achieve superior accuracy and robustness in identifying and categorizing POS tags and chunks within Odia sentences.
It outperforms existing work and exhibits consistent performance across diverse linguistic contexts and sentence structures. The developed Odia POS tagger and chunker have enormous potential for a variety of NLP applications, including information extraction, syntactic parsing, and machine translation, all of which are tailored to the low-resource Odia language. This work contributes to developing NLP tools and technologies for low-resource languages, thereby facilitating enhanced language processing capabilities in various linguistic contexts.
On the Size of an r-wise fractional L-intersecting family
Tapas Kumar Mishra
Journal of Combinatorics, 2024
Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
Abir Sen, Tapas Kumar Mishra, Ratnakar Dash
Neural Processing Letters, 2023
Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches
Tusarkanta Dalai, Tapas Kumar Mishra, Pankaj K. Sa
ACM Transactions on Asian and Low Resource Language Information Processing, 2023
Automatic part-of-speech (POS) tagging is a preprocessing step of many natural language processing tasks, such as named entity recognition, speech processing, information extraction, word sense disambiguation, and machine translation. It has already gained promising results in English and European languages. However, in Indian languages, particularly in the Odia language, it is not yet well explored because of the lack of supporting tools, resources, and morphological richness of the language. Unfortunately, we were unable to locate an open source POS tagger for the Odia language, and only a handful of attempts have been made to develop POS taggers for the Odia language. The main contribution of this research work is to present statistical approaches such as the maximum entropy Markov model and conditional random field (CRF), as well as deep learning based approaches, including the convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) to develop the Odia POS tagger. A publicly accessible corpus annotated with the Bureau of Indian Standards (BIS) tagset is used in our work. However, most of the languages around the globe have used the dataset annotated with the Universal Dependencies (UD) tagset. Hence, to maintain uniformity, the Odia dataset should use the same tagset. Thus, following the BIS and UD guidelines, we constructed a mapping from the BIS tagset to the UD tagset. The maximum entropy Markov model, CRF, Bi-LSTM, and CNN models are trained using the Indian Languages Corpora Initiative corpus with the BIS and UD tagsets. We have experimented with various feature sets as input to the statistical models to prepare a baseline system and observed the impact of constructed feature sets. The deep learning based model includes the Bi-LSTM network, the CNN network, the CRF layer, character sequence information, and a pre-trained word vector. 
Seven different combinations of neural sequence labeling models are implemented, and their performance measures are investigated. It has been observed that the Bi-LSTM model with the character sequence feature and pre-trained word vector achieved a result with 94.58% accuracy.
Improving Multilingual Neural Machine Translation System for Indic Languages
Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, Bidyut Kr. Patra
ACM Transactions on Asian and Low Resource Language Information Processing, 2023
The Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another. Recently, neural machine translation (NMT) has become popular for its performance and cost-effectiveness. However, NMT systems are limited in translating low-resource languages, as a huge quantity of data is required to learn useful mappings across languages. The need for an efficient translation system becomes obvious in a large multilingual environment like India. Indian languages (ILs) are still treated as low-resource languages due to the unavailability of corpora. To address this asymmetry, the multilingual neural machine translation (MNMT) system has emerged as a suitable approach. An MNMT system translates many languages using a single model, which is extremely useful in terms of the training process and for lowering online maintenance costs. It is also helpful for improving low-resource translation. In this article, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, i.e., for English-Indic (one-to-many) and for Indic-English (many-to-one) with a shared encoder-decoder containing 15 language pairs (30 translation directions). Since most IL pairs have only a scanty amount of parallel corpora, insufficient for training any machine translation model, we explore various augmentation strategies to improve overall translation quality through the proposed model. A state-of-the-art transformer architecture is used to realize the proposed model. In addition, the article addresses the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages.
Moreover, the experimental results show the advantage of back-translation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model proves more efficient than the baseline model in terms of the evaluation metric, i.e., the BLEU (BiLingual Evaluation Understudy) score, for a set of ILs.
A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
Abir Sen, Tapas Kumar Mishra, Ratnakar Dash
Multimedia Tools and Applications, 2022
Source code auto-completion using various deep learning models under limited computing resources
Madhab Sharma, Tapas Kumar Mishra, Arun Kumar
Complex and Intelligent Systems, 2022
NIT Rourkela Machine Translation (MT) System Submission to WAT 2022 for MultiIndicMT: An Indic Language Multilingual Shared Task
Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, Bidyut Kr. Patra
Proceedings International Conference on Computational Linguistics Coling, 2022
Modular and fractional L-intersecting families of vector spaces
Rogers Mathew, Tapas Kumar Mishra, Ritabrata Ray, Shashank Srivastava
Electronic Journal of Combinatorics, 2022
A Survey: Security Issues and Challenges in Internet of Things
Balaji Yedle, Gunjan Shrivastava, Arun Kumar, Alekha Kumar Mishra, Tapas Kumar Mishra
Lecture Notes in Networks and Systems, 2021
System of unbiased representatives for a collection of bicolorings
Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra, Sudebkumar Prasant Pal
Discrete Applied Mathematics, 2020
A Combinatorial Proof of Fisher’s Inequality
Rogers Mathew, Tapas Kumar Mishra
Graphs and Combinatorics, 2020
Bisecting and D-secting families for set systems
Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra, Sudebkumar Prasant Pal
Discrete Applied Mathematics, 2020
Boundary Detection in Dynamic Wireless Sensor Networks using Convex Hull Techniques
Tapas Kumar Mishra, Jayadeep Sadhu, Arun Kumar
2020 IEEE Calcutta Conference Calcon 2020 Proceedings, 2020
Analyzing the Linguistic Structure of Questions to Make Unanswered Questions Answered
Shashank Bhatt, Tapas Kumar Mishra
Communications in Computer and Information Science, 2020
Fractional L-intersecting families
Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra
Electronic Journal of Combinatorics, 2019
Induced-bisecting families of bicolorings for hypergraphs
Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra, Sudebkumar Prasant Pal
Discrete Mathematics, 2018
Lower bounds for Ramsey numbers for complete bipartite and 3-uniform tripartite subgraphs
Tapas Kumar Mishra, Sudebkumar Prasant Pal
Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2013
Lower bounds for Ramsey numbers for complete bipartite and 3-uniform tripartite subgraphs
Tapas Kumar Mishra, Sudebkumar Prasant Pal
Journal of Graph Algorithms and Applications, 2013

RECENT SCHOLAR PUBLICATIONS

Erdős Matching (Conjecture) Theorem
TK Mishra
arXiv preprint arXiv:2602.01471 , 2026
2026
The Linear Arboricity Conjecture for Graphs with Large Girth
TK Mishra
arXiv preprint arXiv:2512.11240 , 2025
2025
Development of a Low-Cost Named Entity Recognition System for Odia Language using Deep Active Learning
T Dalai, TK Mishra, PK Sa, P Mohanty, C Swain, AK Nayak
Proceedings of the Workshop on Beyond English: Natural Language Processing … , 2025
2025
A thresholding method for Improving translation Quality for Indic MT task
SB Das, LR Rodrigues, TK Mishra, BK Patra
Proceedings of the First Workshop on Advancing NLP for Low-Resource … , 2025
2025
OdNER: NER resource creation and system development for low-resource Odia language
T Dalai, A Das, TK Mishra, PK Sa
Natural Language Processing Journal 11, 100139 , 2025
2025
Citations: 6
Comparative analysis of subword tokenization approaches for Indian languages
SB Das, S Choudhury, TK Mishra, BK Patra
arXiv preprint arXiv:2505.16868 , 2025
2025
Citations: 5
Statistical machine translation for indic languages
SB Das, D Panda, TK Mishra, BK Patra
Natural Language Processing 31 (2), 328-345 , 2025
2025
Citations: 25
Investigating the Effect of Backtranslation for Indic Languages
SB Das, S Choudhury, TK Mishra, BK Patra
Proceedings of the First Workshop on Natural Language Processing for Indo … , 2025
2025
Citations: 5
HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
A Sen, S Dombe, TK Mishra, R Dash
Multimedia Tools and Applications 83 (30), 73797-73815 , 2024
2024
Citations: 8
Novel Human Machine Interface via Robust Hand Gesture Recognition System using Channel Pruned YOLOv5s Model
A Sen, TK Mishra, R Dash
arXiv preprint arXiv:2407.02585 , 2024
2024
Citations: 3
Multilingual Neural Machine Translation for Indic to Indic Languages
SB Das, D Panda, TK Mishra, BK Patra, A Ekbal
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
2024
Citations: 36
Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
T Dalai, TK Mishra, PK Sa
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
2024
Citations: 15
An approach for mistranslation removal from popular dataset for Indic MT Task
SB Das, LR Rodrigues, TK Mishra, BK Patra
arXiv preprint arXiv:2401.06398 , 2024
2024
Citations: 5
On the size of an r-wise fractional L-intersecting family
TK Mishra
Journal of Combinatorics 15 (1), 77-87 , 2024
2024
Citations: 2
Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
A Sen, TK Mishra, R Dash
Neural Processing Letters 55 (9), 12569-12596 , 2023
2023
Citations: 33
Improving multilingual neural machine translation system for Indic languages
SB Das, A Biradar, TK Mishra, BK Patra
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
2023
Citations: 61
Part-of-speech tagging of Odia language using statistical and deep learning based approaches
T Dalai, TK Mishra, PK Sa
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
2023
Citations: 41
A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
A Sen, TK Mishra, R Dash
Multimedia Tools and Applications 81 (28), 40043-40066 , 2022
2022
Citations: 39

MOST CITED SCHOLAR PUBLICATIONS

Improving multilingual neural machine translation system for Indic languages
SB Das, A Biradar, TK Mishra, BK Patra
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
2023
Citations: 61
Part-of-speech tagging of Odia language using statistical and deep learning based approaches
T Dalai, TK Mishra, PK Sa
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
2023
Citations: 41
A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
A Sen, TK Mishra, R Dash
Multimedia Tools and Applications 81 (28), 40043-40066 , 2022
2022
Citations: 39
Multilingual Neural Machine Translation for Indic to Indic Languages
SB Das, D Panda, TK Mishra, BK Patra, A Ekbal
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
2024
Citations: 36
Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
A Sen, TK Mishra, R Dash
Neural Processing Letters 55 (9), 12569-12596 , 2023
2023
Citations: 33
Statistical machine translation for indic languages
SB Das, D Panda, TK Mishra, BK Patra
Natural Language Processing 31 (2), 328-345 , 2025
2025
Citations: 25
Blockchain: Basics, applications, challenges and opportunities
J Arya, A Kumar, AP Singh, TK Mishra, PHJ Chong
Jan , 2021
2021
Citations: 19
Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
T Dalai, TK Mishra, PK Sa
ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
2024
Citations: 15
Fractional L-intersecting Families
N Balachandran, R Mathew, TK Mishra
The Electronic Journal of Combinatorics 26 (2), 2.40 , 2019
2019
Citations: 13
Source code auto-completion using various deep learning models under limited computing resources
M Sharma, TK Mishra, A Kumar
Complex & Intelligent Systems 8 (5), 4357-4368 , 2022
2022
Citations: 9
HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
A Sen, S Dombe, TK Mishra, R Dash
Multimedia Tools and Applications 83 (30), 73797-73815 , 2024
2024
Citations: 8
NIT Rourkela machine translation (MT) system submission to WAT 2022 for MultiIndicMT: An Indic language multilingual shared task
SB Das, A Biradar, TK Mishra, BK Patra
Proceedings of the 9th Workshop on Asian Translation, 73-77 , 2022
2022
Citations: 7
A Combinatorial Proof of Fisher’s Inequality
R Mathew, TK Mishra
Graphs and Combinatorics 36 (6), 1953-1956 , 2020
2020
Citations: 7
Boundary detection in dynamic wireless sensor networks using convex hull techniques
TK Mishra, J Sadhu, A Kumar
2020 IEEE Calcutta Conference (CALCON), 368-372 , 2020
2020
Citations: 7
OdNER: NER resource creation and system development for low-resource Odia language
T Dalai, A Das, TK Mishra, PK Sa
Natural Language Processing Journal 11, 100139 , 2025
2025
Citations: 6
Modular and Fractional L-Intersecting Families of Vector Spaces
R Mathew, TK Mishra, R Ray, S Srivastava
The Electronic Journal of Combinatorics 29 (1), P1.45 , 2022
2022
Citations: 6
Comparative analysis of subword tokenization approaches for Indian languages
SB Das, S Choudhury, TK Mishra, BK Patra
arXiv preprint arXiv:2505.16868 , 2025
2025
Citations: 5
Investigating the Effect of Backtranslation for Indic Languages
SB Das, S Choudhury, TK Mishra, BK Patra
Proceedings of the First Workshop on Natural Language Processing for Indo … , 2025
2025
Citations: 5
An approach for mistranslation removal from popular dataset for Indic MT Task
SB Das, LR Rodrigues, TK Mishra, BK Patra
arXiv preprint arXiv:2401.06398 , 2024
2024
Citations: 5
Bisecting and D-secting families for set systems
N Balachandran, R Mathew, TK Mishra, SP Pal
Discrete Applied Mathematics 280, 2-13 , 2020
2020
Citations: 5
