Dr. Tapas Kumar Mishra

@nitrkl.ac.in

Assistant Professor, Department of Computer Science & Engineering
NIT ROURKELA

RESEARCH, TEACHING, or OTHER INTERESTS

Theoretical Computer Science, Computational Theory and Mathematics, Artificial Intelligence, Language and Linguistics

24

Scopus Publications

375

Scholar Citations

9

Scholar h-index

9

Scholar i10-index

Scopus Publications

  • AKS: Optimized Adaptive K-Nearest Neighbor-Based Oversampling for Imbalanced Datasets
    Annam Nandini, Tapas Kumar Mishra, Abinash Pujahari, Kshira Sagar Sahoo
    IEEE Access, 2026
  • OdNER: NER resource creation and system development for low-resource Odia language
    Tusarkanta Dalai, Anupam Das, Tapas Kumar Mishra, Pankaj Kumar Sa
    Natural Language Processing Journal, 2025
  • Statistical machine translation for Indic languages
    Sudhansu Bala Das, Divyajyoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra
    Natural Language Processing, 2025
    Statistical Machine Translation (SMT) systems use various probabilistic and statistical Natural Language Processing (NLP) methods to automatically translate from one language to another while retaining the originality of the context. This paper discusses the development of bilingual SMT models for translating English into fifteen low-resource Indic languages (ILs) and vice versa. The process of building the SMT model is described and explained using a workflow diagram. The Samanantar and OPUS corpora are used for training, and the Flores200 corpus is used for fine-tuning and testing. The paper also highlights various preprocessing methods used to deal with corpus noise. The Moses open-source SMT toolkit is investigated for the system’s development. The impact of distance-based reordering and Morpho-syntactic Descriptor Bidirectional Finite-State Encoder (msd-bidirectional-fe) reordering on ILs is compared. The paper also compares SMT models with Neural Machine Translation (NMT) for ILs. All the experiments assess translation quality using standard metrics such as BiLingual Evaluation Understudy, Rank-based Intuitive Bilingual Evaluation Score, Translation Edit Rate, and Metric for Evaluation of Translation with Explicit Ordering. The results show that msd-bidirectional-fe reordering performs better than the distance-based reordering model for ILs. Even though the IL-English and English-IL systems are trained using the same corpus, the former performs better on all the evaluation metrics. The comparison between SMT and NMT shows that, across languages, SMT performs better in some cases, while NMT outperforms it in others.
  • HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
    Abir Sen, Shubham Dombe, Tapas Kumar Mishra, Ratnakar Dash
    Multimedia Tools and Applications, 2024
  • Multilingual Neural Machine Translation for Indic to Indic Languages
    Sudhansu Bala Das, Divyajyoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra, Asif Ekbal
    ACM Transactions on Asian and Low Resource Language Information Processing, 2024
    The method of translation from one language to another without human intervention is known as Machine Translation (MT). Multilingual neural machine translation (MNMT) is a technique for MT that builds a single model for multiple languages. It is preferred over other approaches, since it decreases training time and improves translation in low-resource contexts, i.e., for languages that have insufficient corpus. However, good-quality MT models are yet to be built for many scenarios such as for Indic-to-Indic Languages (IL-IL). Hence, this article is an attempt to address and develop the baseline models for low-resource languages i.e., IL-IL (for 11 Indic Languages (ILs)) in a multilingual environment. The models are built on the Samanantar corpus and analyzed on the Flores-200 corpus. All the models are evaluated using standard evaluation metrics i.e., Bilingual Evaluation Understudy (BLEU) score (with the range of 0 to 100). This article examines the effect of the grouping of related languages, namely, East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI) on the MNMT model. From the experiments, the results reveal that related language grouping is beneficial for the WI group only while it is detrimental for the EI group and it shows an inconclusive effect on the DR group. The role of pivot-based MNMT models in enhancing translation quality is also investigated in this article. Owing to the presence of large good-quality corpora from English (EN) to ILs, MNMT IL-IL models using EN as a pivot are built and examined. To achieve this, English-Indic Language (EN-IL) models are developed with and without the usage of related languages. Results show that the use of related language grouping is advantageous specifically for EN to ILs. Thus, related language groups are used for the development of pivot MNMT models. It is also observed that the usage of pivot models greatly improves MNMT baselines. 
    Furthermore, the effect of transliteration on ILs is also analyzed in this article. To explore transliteration, the best MNMT models from the previous approaches (in most cases, the pivot model using related groups) are determined and built on a corpus transliterated from the corresponding scripts to a modified Indian language Transliteration script (ITRANS). The outcome of the experiments indicates that transliteration helps the models built for lexically rich languages, with the best increment of BLEU scores observed in Malayalam (ML) and Tamil (TA), i.e., 6.74 and 4.72, respectively. The BLEU score using transliteration models ranges from 7.03 to 24.29. The best model obtained is the Punjabi (PA)-Hindi (HI) language pair trained on PA-WI transliterated corpus.
  • Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
    Tusarkanta Dalai, Tapas Kumar Mishra, Pankaj K. Sa
    ACM Transactions on Asian and Low Resource Language Information Processing, 2024
    Developing effective natural language processing (NLP) tools for low-resourced languages poses significant challenges. This article centers its attention on the task of Part-of-speech (POS) tagging and chunking, which pertains to the identification and categorization of linguistic units within sentences. POS tagging and Chunking have already produced positive results in English and other European languages. However, in Indian languages, particularly in Odia language, it is not yet well explored because of the lack of supporting tools, resources, and its complex linguistic morphology. This study presents the building of a manually annotated dataset for Odia phrase chunking task and the development of a deep learning-based model specifically tailored to accommodate the distinctive properties of the language. The process of annotating the Odia chunking corpus involved the utilization of inside-outside-begin labels, which were tagged by using designed Odia chunking tagset. We utilize the constructed Odia chunking dataset to build Odia chunker based on deep learning techniques, employing state-of-the-art architectures. Various techniques, such as Recurrent Neural Networks, Convolutional Neural Networks, and transformer-based models, are investigated to determine the most effective approach for Odia POS tagging and chunking. In addition, we conduct experiments utilizing diverse input representations, including Odia word embeddings, character-level representations, and sub-word units, to effectively capture the complex linguistic characteristics of the Odia language. Numerous experiments are conducted that evaluate the performance of our Odia POS tagger and chunker, employing standard evaluation metrics and making comparisons with existing approaches. The results demonstrate that our transformer-based tagger and chunker achieves superior accuracy and robustness in identifying and categorizing linguistic POS tags and chunks within Odia sentences. 
    It outperforms existing work and exhibits consistent performance across diverse linguistic contexts and sentence structures. The developed Odia POS tagger and chunker have enormous potential for a variety of NLP applications, including information extraction, syntactic parsing, and machine translation, all of which are tailored to the low-resource Odia language. This work contributes to developing NLP tools and technologies for low-resource languages, thereby facilitating enhanced language processing capabilities in various linguistic contexts.
  • On the Size of an r-wise fractional L-intersecting family
    Tapas Kumar Mishra
    Journal of Combinatorics, 2024
  • Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
    Abir Sen, Tapas Kumar Mishra, Ratnakar Dash
    Neural Processing Letters, 2023
  • Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches
    Tusarkanta Dalai, Tapas Kumar Mishra, Pankaj K. Sa
    ACM Transactions on Asian and Low Resource Language Information Processing, 2023
    Automatic part-of-speech (POS) tagging is a preprocessing step of many natural language processing tasks, such as named entity recognition, speech processing, information extraction, word sense disambiguation, and machine translation. It has already gained promising results in English and European languages. However, in Indian languages, particularly in the Odia language, it is not yet well explored because of the lack of supporting tools, resources, and morphological richness of the language. Unfortunately, we were unable to locate an open source POS tagger for the Odia language, and only a handful of attempts have been made to develop POS taggers for the Odia language. The main contribution of this research work is to present statistical approaches such as the maximum entropy Markov model and conditional random field (CRF), as well as deep learning based approaches, including the convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) to develop the Odia POS tagger. A publicly accessible corpus annotated with the Bureau of Indian Standards (BIS) tagset is used in our work. However, most of the languages around the globe have used the dataset annotated with the Universal Dependencies (UD) tagset. Hence, to maintain uniformity, the Odia dataset should use the same tagset. Thus, following the BIS and UD guidelines, we constructed a mapping from the BIS tagset to the UD tagset. The maximum entropy Markov model, CRF, Bi-LSTM, and CNN models are trained using the Indian Languages Corpora Initiative corpus with the BIS and UD tagsets. We have experimented with various feature sets as input to the statistical models to prepare a baseline system and observed the impact of constructed feature sets. The deep learning based model includes the Bi-LSTM network, the CNN network, the CRF layer, character sequence information, and a pre-trained word vector. 
    Seven different combinations of neural sequence labeling models are implemented, and their performance measures are investigated. It has been observed that the Bi-LSTM model with the character sequence feature and pre-trained word vector achieved a result with 94.58% accuracy.
  • Improving Multilingual Neural Machine Translation System for Indic Languages
    Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, Bidyut Kr. Patra
    ACM Transactions on Asian and Low Resource Language Information Processing, 2023
    The Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another. Recently, neural machine translation (NMT) has become popular for its performance and cost-effectiveness. However, NMT systems are restricted in translating low-resource languages, as a huge quantity of data is required to learn useful mappings across languages. The need for an efficient translation system becomes obvious in a large multilingual environment like India. Indian languages (ILs) are still treated as low-resource languages due to the unavailability of corpora. To address this asymmetry, the multilingual neural machine translation (MNMT) system emerges as an ideal approach. MNMT translates many languages using a single model, which is extremely useful in terms of the training process and for lowering online maintenance costs. It is also helpful for improving low-resource translation. In this article, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, i.e., for English-Indic (one-to-many) and for Indic-English (many-to-one) with a shared encoder-decoder containing 15 language pairs (30 translation directions). Since most IL pairs have a scanty amount of parallel corpora, not sufficient for training any machine translation model, we explore various augmentation strategies to improve overall translation quality through the proposed model. A state-of-the-art transformer architecture is used to realize the proposed model. In addition, the article addresses the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. 
    Moreover, the experimental results also show the advantage of back-translation and domain adaptation for ILs to enhance the translation quality of both source and target languages. Using all these key approaches, our proposed model emerges to be more efficient than the baseline model in terms of evaluation metrics, i.e., BLEU (BiLingual Evaluation Understudy) score for a set of ILs.
  • A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
    Abir Sen, Tapas Kumar Mishra, Ratnakar Dash
    Multimedia Tools and Applications, 2022
  • Source code auto-completion using various deep learning models under limited computing resources
    Madhab Sharma, Tapas Kumar Mishra, Arun Kumar
    Complex and Intelligent Systems, 2022
  • NIT Rourkela Machine Translation (MT) System Submission to WAT 2022 for MultiIndicMT: An Indic Language Multilingual Shared Task
    Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, Bidyut Kr. Patra
    Proceedings International Conference on Computational Linguistics Coling, 2022
  • Modular and fractional L-intersecting families of vector spaces
    Rogers Mathew, Tapas Kumar Mishra, Ritabrata Ray, Shashank Srivastava
    Electronic Journal of Combinatorics, 2022
  • A Survey: Security Issues and Challenges in Internet of Things
    Balaji Yedle, Gunjan Shrivastava, Arun Kumar, Alekha Kumar Mishra, Tapas Kumar Mishra
    Lecture Notes in Networks and Systems, 2021
  • System of unbiased representatives for a collection of bicolorings
    Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra, Sudebkumar Prasant Pal
    Discrete Applied Mathematics, 2020
  • A Combinatorial Proof of Fisher’s Inequality
    Rogers Mathew, Tapas Kumar Mishra
    Graphs and Combinatorics, 2020
  • Bisecting and D-secting families for set systems
    Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra, Sudebkumar Prasant Pal
    Discrete Applied Mathematics, 2020
  • Boundary Detection in Dynamic Wireless Sensor Networks using Convex Hull Techniques
    Tapas Kumar Mishra, Jayadeep Sadhu, Arun Kumar
    2020 IEEE Calcutta Conference Calcon 2020 Proceedings, 2020
  • Analyzing the Linguistic Structure of Questions to Make Unanswered Questions Answered
    Shashank Bhatt, Tapas Kumar Mishra
    Communications in Computer and Information Science, 2020
  • Fractional L-intersecting families
    Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra
    Electronic Journal of Combinatorics, 2019
  • Induced-bisecting families of bicolorings for hypergraphs
    Niranjan Balachandran, Rogers Mathew, Tapas Kumar Mishra, Sudebkumar Prasant Pal
    Discrete Mathematics, 2018
  • Lower bounds for Ramsey numbers for complete bipartite and 3-uniform tripartite subgraphs
    Tapas Kumar Mishra, Sudebkumar Prasant Pal
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2013
  • Lower bounds for Ramsey numbers for complete bipartite and 3-uniform tripartite subgraphs
    Tapas Kumar Mishra, Sudebkumar Prasant Pal
    Journal of Graph Algorithms and Applications, 2013

RECENT SCHOLAR PUBLICATIONS

  • Erdős Matching (Conjecture) Theorem
    TK Mishra
    arXiv preprint arXiv:2602.01471 , 2026
    2026
  • The Linear Arboricity Conjecture for Graphs with Large Girth
    TK Mishra
    arXiv preprint arXiv:2512.11240 , 2025
    2025
  • Development of a Low-Cost Named Entity Recognition System for Odia Language using Deep Active Learning
    T Dalai, TK Mishra, PK Sa, P Mohanty, C Swain, AK Nayak
    Proceedings of the Workshop on Beyond English: Natural Language Processing … , 2025
    2025
  • A thresholding method for Improving translation Quality for Indic MT task
    SB Das, LR Rodrigues, TK Mishra, BK Patra
    Proceedings of the First Workshop on Advancing NLP for Low-Resource … , 2025
    2025
  • OdNER: NER resource creation and system development for low-resource Odia language
    T Dalai, A Das, TK Mishra, PK Sa
    Natural Language Processing Journal 11, 100139 , 2025
    2025
    Citations: 6
  • Comparative analysis of subword tokenization approaches for Indian languages
    SB Das, S Choudhury, TK Mishra, BK Patra
    arXiv preprint arXiv:2505.16868 , 2025
    2025
    Citations: 5
  • Comparative analysis of subword tokenization approaches for Indian languages
    S Bala Das, S Choudhury, TK Mishra, BK Patra
    arXiv e-prints, arXiv: 2505.16868 , 2025
    2025
  • Statistical machine translation for Indic languages
    SB Das, D Panda, TK Mishra, BK Patra
    Natural Language Processing 31 (2), 328-345 , 2025
    2025
    Citations: 25
  • Investigating the Effect of Backtranslation for Indic Languages
    SB Das, S Choudhury, TK Mishra, BK Patra
    Proceedings of the First Workshop on Natural Language Processing for Indo … , 2025
    2025
    Citations: 5
  • HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
    A Sen, S Dombe, TK Mishra, R Dash
    Multimedia Tools and Applications 83 (30), 73797-73815 , 2024
    2024
    Citations: 8
  • Novel Human Machine Interface via Robust Hand Gesture Recognition System using Channel Pruned YOLOv5s Model
    A Sen, TK Mishra, R Dash
    arXiv preprint arXiv:2407.02585 , 2024
    2024
    Citations: 3
  • Multilingual Neural Machine Translation for Indic to Indic Languages
    S Bala Das, D Panda, T Kumar Mishra, B Kr. Patra, A Ekbal
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
    2024
    Citations: 36
  • Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
    T Dalai, TK Mishra, PK Sa
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
    2024
    Citations: 15
  • An approach for mistranslation removal from popular dataset for Indic MT Task
    SB Das, LR Rodrigues, TK Mishra, BK Patra
    arXiv preprint arXiv:2401.06398 , 2024
    2024
    Citations: 5
  • An approach for mistranslation removal from popular dataset for Indic MT Task
    S Bala Das, LR Rodrigues, TK Mishra, BK Patra
    arXiv e-prints, arXiv: 2401.06398 , 2024
    2024
  • On the size of an r-wise fractional L-intersecting family
    TK Mishra
    Journal of Combinatorics 15 (1), 77-87 , 2024
    2024
    Citations: 2
  • Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
    A Sen, TK Mishra, R Dash
    Neural Processing Letters 55 (9), 12569-12596 , 2023
    2023
    Citations: 33
  • Improving multilingual neural machine translation system for Indic languages
    SB Das, A Biradar, TK Mishra, BK Patra
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
    2023
    Citations: 61
  • Part-of-speech tagging of Odia language using statistical and deep learning based approaches
    T Dalai, TK Mishra, PK Sa
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
    2023
    Citations: 41
  • A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
    A Sen, TK Mishra, R Dash
    Multimedia Tools and Applications 81 (28), 40043-40066 , 2022
    2022
    Citations: 39

MOST CITED SCHOLAR PUBLICATIONS

  • Improving multilingual neural machine translation system for Indic languages
    SB Das, A Biradar, TK Mishra, BK Patra
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
    2023
    Citations: 61
  • Part-of-speech tagging of Odia language using statistical and deep learning based approaches
    T Dalai, TK Mishra, PK Sa
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2023
    2023
    Citations: 41
  • A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
    A Sen, TK Mishra, R Dash
    Multimedia Tools and Applications 81 (28), 40043-40066 , 2022
    2022
    Citations: 39
  • Multilingual Neural Machine Translation for Indic to Indic Languages
    S Bala Das, D Panda, T Kumar Mishra, B Kr. Patra, A Ekbal
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
    2024
    Citations: 36
  • Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
    A Sen, TK Mishra, R Dash
    Neural Processing Letters 55 (9), 12569-12596 , 2023
    2023
    Citations: 33
  • Statistical machine translation for Indic languages
    SB Das, D Panda, TK Mishra, BK Patra
    Natural Language Processing 31 (2), 328-345 , 2025
    2025
    Citations: 25
  • Blockchain: Basics, applications, challenges and opportunities
    J Arya, A Kumar, AP Singh, TK Mishra, PHJ Chong
    Jan , 2021
    2021
    Citations: 19
  • Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
    T Dalai, TK Mishra, PK Sa
    ACM Transactions on Asian and Low-Resource Language Information Processing … , 2024
    2024
    Citations: 15
  • Fractional L-intersecting Families
    N Balachandran, R Mathew, TK Mishra
    The Electronic Journal of Combinatorics 26 (2), 2.40 , 2019
    2019
    Citations: 13
  • Source code auto-completion using various deep learning models under limited computing resources
    M Sharma, TK Mishra, A Kumar
    Complex & Intelligent Systems 8 (5), 4357-4368 , 2022
    2022
    Citations: 9
  • HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
    A Sen, S Dombe, TK Mishra, R Dash
    Multimedia Tools and Applications 83 (30), 73797-73815 , 2024
    2024
    Citations: 8
  • NIT Rourkela machine translation (MT) system submission to WAT 2022 for MultiIndicMT: An Indic language multilingual shared task
    SB Das, A Biradar, TK Mishra, BK Patra
    Proceedings of the 9th Workshop on Asian Translation, 73-77 , 2022
    2022
    Citations: 7
  • A Combinatorial Proof of Fisher’s Inequality
    R Mathew, TK Mishra
    Graphs and Combinatorics 36 (6), 1953-1956 , 2020
    2020
    Citations: 7
  • Boundary detection in dynamic wireless sensor networks using convex hull techniques
    TK Mishra, J Sadhu, A Kumar
    2020 IEEE Calcutta Conference (CALCON), 368-372 , 2020
    2020
    Citations: 7
  • OdNER: NER resource creation and system development for low-resource Odia language
    T Dalai, A Das, TK Mishra, PK Sa
    Natural Language Processing Journal 11, 100139 , 2025
    2025
    Citations: 6
  • Modular and Fractional L-Intersecting Families of Vector Spaces
    R Mathew, TK Mishra, R Ray, S Srivastava
    The Electronic Journal of Combinatorics 29 (1), P1.45 , 2022
    2022
    Citations: 6
  • Comparative analysis of subword tokenization approaches for Indian languages
    SB Das, S Choudhury, TK Mishra, BK Patra
    arXiv preprint arXiv:2505.16868 , 2025
    2025
    Citations: 5
  • Investigating the Effect of Backtranslation for Indic Languages
    SB Das, S Choudhury, TK Mishra, BK Patra
    Proceedings of the First Workshop on Natural Language Processing for Indo … , 2025
    2025
    Citations: 5
  • An approach for mistranslation removal from popular dataset for Indic MT Task
    SB Das, LR Rodrigues, TK Mishra, BK Patra
    arXiv preprint arXiv:2401.06398 , 2024
    2024
    Citations: 5
  • Bisecting and D-secting families for set systems
    N Balachandran, R Mathew, TK Mishra, SP Pal
    Discrete Applied Mathematics 280, 2-13 , 2020
    2020
    Citations: 5