Sawan Rai

Scopus Publications

Large scale annotated dataset for code-mix abusive short noisy text
Paras Tiwari, Sawan Rai, C. Ravindranath Chowdary
Language Resources and Evaluation, 2025
On the Impact of Chunking Strategies in NLP Pipelines: A Multi-Task Empirical Study
Sawan Rai, Ramesh Chandra Belwal
OCIT 2025: Proceedings of the 23rd OITS International Conference on Information Technology, 2025
Chunking input text is a crucial preprocessing step when using Large Language Models (LLMs) for long or structured documents. However, its impact on downstream task performance remains underexplored. This study presents a comprehensive empirical analysis of four chunking strategies (fixed-size, overlapping, sentence-based, and paragraph-based) across three fundamental NLP tasks: question answering, text classification, and abstractive summarization. Experiments were conducted using lightweight, open-access models such as Flan-T5, GPT-2, DistilBERT, and RoBERTa on benchmark datasets including SQuAD, CoQA, QuAC, IMDB, Amazon Polarity, CNN/DailyMail, and XSum. Performance was measured using task-appropriate metrics (ROUGE, EM, F1, precision, recall) along with latency. Results reveal that chunking strategies significantly affect performance and latency, with no single approach universally optimal. These findings highlight the need for task-specific chunking choices in practical LLM deployments, especially under resource constraints.
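The four strategies named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the whitespace tokenization, and the regex-based sentence and paragraph splitting are all assumptions chosen to keep the example self-contained.

```python
import re
from typing import List


def fixed_size_chunks(tokens: List[str], size: int) -> List[List[str]]:
    """Consecutive, non-overlapping windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def overlapping_chunks(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Sliding windows of `size` tokens sharing `overlap` tokens with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(tokens):
            break
    return chunks


def sentence_chunks(text: str) -> List[str]:
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def paragraph_chunks(text: str) -> List[str]:
    """Split on blank lines (assumption: paragraphs are separated by empty lines)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


if __name__ == "__main__":
    tokens = "a b c d e f g".split()
    print(fixed_size_chunks(tokens, 3))
    print(overlapping_chunks(tokens, 4, 2))
    print(sentence_chunks("First one. Second one! Third?"))
    print(paragraph_chunks("Para one.\n\nPara two."))
```

In practice a model-specific tokenizer and a proper sentence segmenter would likely replace the whitespace and regex splitting used here.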
A Review of Existing Conversational Recommendation Systems
Subiya Zaidi, Sawan Rai, Kapil Juneja
2024 2nd International Conference on Disruptive Technologies (ICDT 2024), 2024
ChatGPT, Alexa, Siri, and Okay Google are an indispensable part of our lives today. These assistants are referred to as Digital Assistants and enable users to communicate their choices through natural language. Digital Assistants ease the customer's task of selecting items in applications such as movies and songs. A system that supports this process of making a choice through natural language conversation is known as a Conversational Recommender System (CoRS). A CoRS is a dialogue-based model that aims to provide the customer with accurate, high-quality recommendations. This interaction-oriented method gives the customer an edge over the traditional way of seeking recommendations. Traditional recommendation systems are static in nature and derive information from the customer's past history. A CoRS mitigates challenges faced by earlier recommendation methods, such as cold start, wherein a new user is often given inaccurate recommendations. Other common issues include data sparsity and a lack of diversity caused by outdated content to choose from. A CoRS is dynamic in nature: it works on delivering high-quality choices by interpreting the customer's demands one dialogue at a time. This comprehensive survey aims to give an overview of ongoing research that uses conversation as a means to achieve better results for recommendation systems.
Accurate module name prediction using similarity based and sequence generation models
Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
Journal of Ambient Intelligence and Humanized Computing, 2023
Extractive text summarization using clustering-based topic modeling
Ramesh Chandra Belwal, Sawan Rai, Atul Gupta
Soft Computing, 2023
Is the Corpus Ready for Machine Translation? A Case Study with Python to Pseudo-Code Corpus
Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
Arabian Journal for Science and Engineering, 2023
Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
Sawan Rai, Ramesh Chandra Belwal, Abhinav Sharma
Lecture Notes in Networks and Systems, 2023
A Mathematical Model for the Effect of Vaccination on COVID-19 Epidemic Spread
Avaneesh Singh, Sawan Rai, Manish Kumar Bajpai
Lecture Notes in Electrical Engineering, 2023
Advanced Hierarchical Topic Labeling for Short Text
Paras Tiwari, Ashutosh Tripathi, Avaneesh Singh, Sawan Rai
IEEE Access, 2023
Hierarchical topic modeling is a probabilistic approach for discovering latent topics distributed hierarchically among documents. The discovered topics are represented by their respective topic terms. Drawing an unambiguous conclusion from the topic term distribution is a challenge for readers. Hierarchical topic labeling eases this challenge by providing an individual, appropriate label for each topic at every level. In this work, we propose a BERT-embedding-inspired methodology for labeling hierarchical topics in short text corpora. Short texts have gained significant popularity on multiple platforms in diverse domains, but the limited information available in a short text makes it difficult to process. We use three diverse short text datasets that include both structured and unstructured instances; this diversity ensures a broad application scope for this work. To assess the relevance of the labels, the proposed methodology has been compared against both automatic and human annotators. Our proposed methodology outperformed the benchmark with average scores of 0.4185, 49.50, and 49.16 for cosine similarity, exact match, and partial match, respectively.
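The three evaluation measures named in the abstract can be sketched as follows. The abstract does not define them, so everything here is an assumption for illustration: cosine similarity is computed over plain numeric vectors (standing in for label embeddings), exact match is a case-insensitive string comparison, and partial match means the two labels share at least one word.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def exact_match(predicted: str, reference: str) -> bool:
    """Assumed definition: labels are identical after trimming and lowercasing."""
    return predicted.strip().lower() == reference.strip().lower()


def partial_match(predicted: str, reference: str) -> bool:
    """Assumed definition: labels share at least one whitespace-separated word."""
    return bool(set(predicted.lower().split()) & set(reference.lower().split()))


if __name__ == "__main__":
    print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))
    print(exact_match("Machine Learning", " machine learning "))
    print(partial_match("deep learning", "machine learning"))
```

The paper's own definitions of exact and partial match may differ from these hypothetical ones.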
Generating class name in sequential manner using convolution attention neural network
Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
Expert Systems with Applications, 2022
A Review on Source Code Documentation
Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
ACM Transactions on Intelligent Systems and Technology, 2022
Effect of Identifier Tokenization on Automatic Source Code Documentation
Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
Arabian Journal for Science and Engineering, 2022
A new graph-based extractive text summarization using keywords or topic modeling
Ramesh Chandra Belwal, Sawan Rai, Atul Gupta
Journal of Ambient Intelligence and Humanized Computing, 2021
Text summarization using topic-based vector space model and semantic measure
Ramesh Chandra Belwal, Sawan Rai, Atul Gupta
Information Processing and Management, 2021
Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
Vikas K. Malviya, Sawan Rai, Atul Gupta
Applied Soft Computing, 2021
Mind Your Tweet: Abusive Tweet Detection
Paras Tiwari, Sawan Rai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021
Development of a plugin based extensible feature extraction framework
Vikas Malviya, Sawan Rai, Atul Gupta
Proceedings of the ACM Symposium on Applied Computing, 2018
Method Level Text Summarization for Java Code Using Nano-Patterns
Sawan Rai, Tejaswini Gaikwad, Sparshi Jain, Atul Gupta
Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), 2017

RECENT SCHOLAR PUBLICATIONS

On the Impact of Chunking Strategies in NLP Pipelines: A Multi-Task Empirical Study
S Rai, RC Belwal
2025 OITS International Conference on Information Technology (OCIT), 1-6, 2025
2025
Creating and Evaluating Code-Mixed Nepali-English and Telugu-English Datasets for Abusive Language Detection Using Traditional and Deep Learning Models
M Pandey, NP Yadav, M Adduru, S Rai
arXiv preprint arXiv:2504.21026, 2025
2025
Citations: 3
Large scale annotated dataset for code-mix abusive short noisy text
P Tiwari, S Rai, CR Chowdary
Language Resources and Evaluation 59 (1), 191-218, 2025
2025
Citations: 7
A review of existing conversational recommendation systems
S Zaidi, S Rai, K Juneja
2024 2nd International Conference on Disruptive Technologies (ICDT), 22-26, 2024
2024
Citations: 5
Accurate module name prediction using similarity based and sequence generation models
S Rai, RC Belwal, A Gupta
Journal of Ambient Intelligence and Humanized Computing 14 (9), 11531-11543, 2023
2023
Citations: 1
A mathematical model for the effect of vaccination on COVID-19 epidemic spread
A Singh, S Rai, MK Bajpai
Machine Vision and Augmented Intelligence: Select Proceedings of MAI 2022 …, 2023
2023
Citations: 1
Advanced hierarchical topic labeling for short text
P Tiwari, A Tripathi, A Singh, S Rai
IEEE Access 11, 35158-35174, 2023
2023
Citations: 8
Extractive text summarization using clustering-based topic modeling
RC Belwal, S Rai, A Gupta
Soft Computing 27 (7), 3965-3982, 2023
2023
Citations: 33
Is the corpus ready for machine translation? A case study with Python to pseudo-code corpus
S Rai, RC Belwal, A Gupta
Arabian Journal for Science and Engineering 48 (2), 1845-1858, 2023
2023
Citations: 9
Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
S Rai, RC Belwal, A Sharma
International Conference on Data Management, Analytics & Innovation, 393-403, 2023
2023
Citations: 1
Generating class name in sequential manner using convolution attention neural network
S Rai, RC Belwal, A Gupta
Expert Systems with Applications 199, 116854, 2022
2022
Citations: 1
A review on source code documentation
S Rai, RC Belwal, A Gupta
ACM Transactions on Intelligent Systems and Technology (TIST) 13 (5), 1-44, 2022
2022
Citations: 53
Effect of identifier tokenization on automatic source code documentation
S Rai, RC Belwal, A Gupta
Arabian Journal for Science and Engineering 47 (2), 2141-2157, 2022
2022
Citations: 5
A new graph-based extractive text summarization using keywords or topic modeling
RC Belwal, S Rai, A Gupta
Journal of Ambient Intelligence and Humanized Computing 12 (10), 8975-8990, 2021
2021
Citations: 86
Mind your tweet: Abusive tweet detection
P Tiwari, S Rai
International Conference on Speech and Computer, 704-715, 2021
2021
Citations: 6
Text summarization using topic-based vector space model and semantic measure
RC Belwal, S Rai, A Gupta
Information Processing & Management 58 (3), 102536, 2021
2021
Citations: 103
Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
VK Malviya, S Rai, A Gupta
Applied Soft Computing 102, 106873, 2021
2021
Citations: 26
Generation of pseudo code from the python source code using rule-based machine translation
S Rai, A Gupta
arXiv preprint arXiv:1906.06117, 2019
2019
Citations: 8
Development of a plugin based extensible feature extraction framework
V Malviya, S Rai, A Gupta
Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1840-1847, 2018
2018
Citations: 4
Method level text summarization for java code using nano-patterns
S Rai, T Gaikwad, S Jain, A Gupta
2017 24th Asia-Pacific Software Engineering Conference (APSEC), 199-208, 2017
2017
Citations: 15

MOST CITED SCHOLAR PUBLICATIONS

Text summarization using topic-based vector space model and semantic measure
RC Belwal, S Rai, A Gupta
Information Processing & Management 58 (3), 102536, 2021
2021
Citations: 103
A new graph-based extractive text summarization using keywords or topic modeling
RC Belwal, S Rai, A Gupta
Journal of Ambient Intelligence and Humanized Computing 12 (10), 8975-8990, 2021
2021
Citations: 86
A review on source code documentation
S Rai, RC Belwal, A Gupta
ACM Transactions on Intelligent Systems and Technology (TIST) 13 (5), 1-44, 2022
2022
Citations: 53
Extractive text summarization using clustering-based topic modeling
RC Belwal, S Rai, A Gupta
Soft Computing 27 (7), 3965-3982, 2023
2023
Citations: 33
Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
VK Malviya, S Rai, A Gupta
Applied Soft Computing 102, 106873, 2021
2021
Citations: 26
Method level text summarization for java code using nano-patterns
S Rai, T Gaikwad, S Jain, A Gupta
2017 24th Asia-Pacific Software Engineering Conference (APSEC), 199-208, 2017
2017
Citations: 15
Is the corpus ready for machine translation? A case study with Python to pseudo-code corpus
S Rai, RC Belwal, A Gupta
Arabian Journal for Science and Engineering 48 (2), 1845-1858, 2023
2023
Citations: 9
Advanced hierarchical topic labeling for short text
P Tiwari, A Tripathi, A Singh, S Rai
IEEE Access 11, 35158-35174, 2023
2023
Citations: 8
Generation of pseudo code from the python source code using rule-based machine translation
S Rai, A Gupta
arXiv preprint arXiv:1906.06117, 2019
2019
Citations: 8
Large scale annotated dataset for code-mix abusive short noisy text
P Tiwari, S Rai, CR Chowdary
Language Resources and Evaluation 59 (1), 191-218, 2025
2025
Citations: 7
Mind your tweet: Abusive tweet detection
P Tiwari, S Rai
International Conference on Speech and Computer, 704-715, 2021
2021
Citations: 6
A review of existing conversational recommendation systems
S Zaidi, S Rai, K Juneja
2024 2nd International Conference on Disruptive Technologies (ICDT), 22-26, 2024
2024
Citations: 5
Effect of identifier tokenization on automatic source code documentation
S Rai, RC Belwal, A Gupta
Arabian Journal for Science and Engineering 47 (2), 2141-2157, 2022
2022
Citations: 5
Development of a plugin based extensible feature extraction framework
V Malviya, S Rai, A Gupta
Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1840-1847, 2018
2018
Citations: 4
Creating and Evaluating Code-Mixed Nepali-English and Telugu-English Datasets for Abusive Language Detection Using Traditional and Deep Learning Models
M Pandey, NP Yadav, M Adduru, S Rai
arXiv preprint arXiv:2504.21026, 2025
2025
Citations: 3
Accurate module name prediction using similarity based and sequence generation models
S Rai, RC Belwal, A Gupta
Journal of Ambient Intelligence and Humanized Computing 14 (9), 11531-11543, 2023
2023
Citations: 1
A mathematical model for the effect of vaccination on COVID-19 epidemic spread
A Singh, S Rai, MK Bajpai
Machine Vision and Augmented Intelligence: Select Proceedings of MAI 2022 …, 2023
2023
Citations: 1
Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
S Rai, RC Belwal, A Sharma
International Conference on Data Management, Analytics & Innovation, 393-403, 2023
2023
Citations: 1
Generating class name in sequential manner using convolution attention neural network
S Rai, RC Belwal, A Gupta
Expert Systems with Applications 199, 116854, 2022
2022
Citations: 1
On the Impact of Chunking Strategies in NLP Pipelines: A Multi-Task Empirical Study
S Rai, RC Belwal
2025 OITS International Conference on Information Technology (OCIT), 1-6, 2025
2025
