Sawan Rai

@srmap.edu.in

Assistant Professor CSE
SRM University-AP

RESEARCH, TEACHING, or OTHER INTERESTS

Software, Artificial Intelligence, Computer Science, Computer Science Applications

18

Scopus Publications

375

Scholar Citations

8

Scholar h-index

6

Scholar i10-index

Scopus Publications

  • Large scale annotated dataset for code-mix abusive short noisy text
    Paras Tiwari, Sawan Rai, C. Ravindranath Chowdary
    Language Resources and Evaluation, 2025
  • On the Impact of Chunking Strategies in NLP Pipelines: A Multi-Task Empirical Study
    Sawan Rai, Ramesh Chandra Belwal
    OCIT 2025 Proceedings, 23rd OITS International Conference on Information Technology, 2025
    Chunking input text is a crucial preprocessing step when using Large Language Models (LLMs) for long or structured documents. However, its impact on downstream task performance remains underexplored. This study presents a comprehensive empirical analysis of four chunking strategies (fixed-size, overlapping, sentence-based, and paragraph-based) across three fundamental NLP tasks: question answering, text classification, and abstractive summarization. Experiments were conducted using lightweight, open-access models such as Flan-T5, GPT-2, DistilBERT, and RoBERTa on benchmark datasets including SQuAD, CoQA, QuAC, IMDB, Amazon Polarity, CNN/DailyMail, and XSum. Performance was measured using task-appropriate metrics (ROUGE, EM, F1, precision, recall) along with latency. Results reveal that chunking strategies significantly affect performance and latency, with no single approach universally optimal. These findings highlight the need for task-specific chunking choices in practical LLM deployments, especially under resource constraints.
  • A Review of Existing Conversational Recommendation Systems
    Subiya Zaidi, Sawan Rai, Kapil Juneja
    2024 2nd International Conference on Disruptive Technologies, ICDT 2024, 2024
    ChatGPT, Alexa, Siri, and Okay Google are an indispensable part of our lives today. These assistants, referred to as Digital Assistants, enable users to communicate their choices through natural language. Digital Assistants ease the customer's task of selecting items in applications such as movies and songs. This process of making a choice through natural language conversation is known as a Conversational Recommender System (CoRS). A CoRS is a dialogue-based model that aims to provide the customer with accurate, high-quality recommendations. The interaction-oriented method gives the customer an edge over the traditional way of seeking recommendations. Traditional recommendation systems are static in nature and derive information from the customer's past history. A CoRS mitigates challenges faced by earlier recommendation methods, such as cold start, wherein a new user is often given inaccurate recommendations. Other common issues are data sparsity and a lack of diversity caused by outdated content to choose from. A CoRS is dynamic in nature: it works on delivering high-quality choices by interpreting the customer's demands one dialogue at a time. This comprehensive survey aims to give an overview of the research in progress that uses conversation as a means to achieve better results for recommendation systems.
  • Accurate module name prediction using similarity based and sequence generation models
    Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
    Journal of Ambient Intelligence and Humanized Computing, 2023
  • Extractive text summarization using clustering-based topic modeling
    Ramesh Chandra Belwal, Sawan Rai, Atul Gupta
    Soft Computing, 2023
  • Is the Corpus Ready for Machine Translation? A Case Study with Python to Pseudo-Code Corpus
    Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
    Arabian Journal for Science and Engineering, 2023
  • Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
    Sawan Rai, Ramesh Chandra Belwal, Abhinav Sharma
    Lecture Notes in Networks and Systems, 2023
  • A Mathematical Model for the Effect of Vaccination on COVID-19 Epidemic Spread
    Avaneesh Singh, Sawan Rai, Manish Kumar Bajpai
    Lecture Notes in Electrical Engineering, 2023
  • Advanced Hierarchical Topic Labeling for Short Text
    Paras Tiwari, Ashutosh Tripathi, Avaneesh Singh, Sawan Rai
    IEEE Access, 2023
    Hierarchical Topic Modeling is the probabilistic approach for discovering latent topics distributed hierarchically among the documents. The distributed topics are represented with the respective topic terms. An unambiguous conclusion from the topic term distribution is a challenge for readers. The hierarchical topic labeling eases the challenge by facilitating an individual, appropriate label for each topic at every level. In this work, we propose a BERT-embedding inspired methodology for labeling hierarchical topics in short text corpora. The short texts have gained significant popularity on multiple platforms in diverse domains. The limited information available in the short text makes it difficult to deal with. In our work, we have used three diverse short text datasets that include both structured and unstructured instances. Such diversity ensures the broad application scope of this work. Considering the relevancy factor of the labels, the proposed methodology has been compared against both automatic and human annotators. Our proposed methodology outperformed the benchmark with an average score of 0.4185, 49.50, and 49.16 for cosine similarity, exact match, and partial match, respectively.
  • Generating class name in sequential manner using convolution attention neural network
    Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
    Expert Systems with Applications, 2022
  • A Review on Source Code Documentation
    Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
    ACM Transactions on Intelligent Systems and Technology, 2022
  • Effect of Identifier Tokenization on Automatic Source Code Documentation
    Sawan Rai, Ramesh Chandra Belwal, Atul Gupta
    Arabian Journal for Science and Engineering, 2022
  • A new graph-based extractive text summarization using keywords or topic modeling
    Ramesh Chandra Belwal, Sawan Rai, Atul Gupta
    Journal of Ambient Intelligence and Humanized Computing, 2021
  • Text summarization using topic-based vector space model and semantic measure
    Ramesh Chandra Belwal, Sawan Rai, Atul Gupta
    Information Processing and Management, 2021
  • Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
    Vikas K. Malviya, Sawan Rai, Atul Gupta
    Applied Soft Computing, 2021
  • Mind Your Tweet: Abusive Tweet Detection
    Paras Tiwari, Sawan Rai
    Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2021
  • Development of a plugin based extensible feature extraction framework
    Vikas Malviya, Sawan Rai, Atul Gupta
    Proceedings of the ACM Symposium on Applied Computing, 2018
  • Method Level Text Summarization for Java Code Using Nano-Patterns
    Sawan Rai, Tejaswini Gaikwad, Sparshi Jain, Atul Gupta
    Proceedings of the Asia-Pacific Software Engineering Conference, APSEC, 2017
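
The abstract of the chunking study listed above names four chunking strategies: fixed-size, overlapping, sentence-based, and paragraph-based. The following is a minimal Python sketch of what each one does. It is not the authors' implementation: the function names, the whitespace tokenization, and the regex-based sentence splitter are assumptions made here for illustration only.

```python
import re


def fixed_size_chunks(tokens, size):
    """Consecutive, non-overlapping chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def overlapping_chunks(tokens, size, overlap):
    """Sliding window: each chunk repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the window already reached the end of the input
    return chunks


def sentence_chunks(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def paragraph_chunks(text):
    """Split on blank lines (hypothetical convention: paragraphs are blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


if __name__ == "__main__":
    text = "Chunking matters. It affects latency!\n\nDoes it affect accuracy?"
    tokens = text.split()  # assumption: whitespace tokens stand in for model tokens
    print(fixed_size_chunks(tokens, 4))
    print(overlapping_chunks(tokens, 4, 2))
    print(sentence_chunks(text))
    print(paragraph_chunks(text))
```

A real pipeline would count tokens with the model's own tokenizer and use a proper sentence segmenter; the sketch only shows how the four strategies differ in where they cut.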

RECENT SCHOLAR PUBLICATIONS

  • On the Impact of Chunking Strategies in NLP Pipelines: A Multi-Task Empirical Study
    S Rai, RC Belwal
    2025 OITS International Conference on Information Technology (OCIT), 1-6, 2025
  • Creating and Evaluating Code-Mixed Nepali-English and Telugu-English Datasets for Abusive Language Detection Using Traditional and Deep Learning Models
    M Pandey, NP Yadav, M Adduru, S Rai
    arXiv preprint arXiv:2504.21026, 2025
    Citations: 3
  • Large scale annotated dataset for code-mix abusive short noisy text
    P Tiwari, S Rai, CR Chowdary
    Language Resources and Evaluation 59 (1), 191-218, 2025
    Citations: 7
  • A review of existing conversational recommendation systems
    S Zaidi, S Rai, K Juneja
    2024 2nd International Conference on Disruptive Technologies (ICDT), 22-26, 2024
    Citations: 5
  • Accurate module name prediction using similarity based and sequence generation models
    S Rai, RC Belwal, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 14 (9), 11531-11543, 2023
    Citations: 1
  • A mathematical model for the effect of vaccination on COVID-19 epidemic spread
    A Singh, S Rai, MK Bajpai
    Machine Vision and Augmented Intelligence: Select Proceedings of MAI 2022 …, 2023
    Citations: 1
  • Advanced hierarchical topic labeling for short text
    P Tiwari, A Tripathi, A Singh, S Rai
    IEEE Access 11, 35158-35174, 2023
    Citations: 8
  • Extractive text summarization using clustering-based topic modeling
    RC Belwal, S Rai, A Gupta
    Soft Computing 27 (7), 3965-3982, 2023
    Citations: 33
  • Is the corpus ready for machine translation? A case study with Python to pseudo-code corpus
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 48 (2), 1845-1858, 2023
    Citations: 9
  • Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
    S Rai, RC Belwal, A Sharma
    International Conference on Data Management, Analytics & Innovation, 393-403, 2023
    Citations: 1
  • Generating class name in sequential manner using convolution attention neural network
    S Rai, RC Belwal, A Gupta
    Expert Systems with Applications 199, 116854, 2022
    Citations: 1
  • A review on source code documentation
    S Rai, RC Belwal, A Gupta
    ACM Transactions on Intelligent Systems and Technology (TIST) 13 (5), 1-44, 2022
    Citations: 53
  • Effect of identifier tokenization on automatic source code documentation
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 47 (2), 2141-2157, 2022
    Citations: 5
  • A new graph-based extractive text summarization using keywords or topic modeling
    RC Belwal, S Rai, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 12 (10), 8975-8990, 2021
    Citations: 86
  • Mind your tweet: Abusive tweet detection
    P Tiwari, S Rai
    International Conference on Speech and Computer, 704-715, 2021
    Citations: 6
  • Text summarization using topic-based vector space model and semantic measure
    RC Belwal, S Rai, A Gupta
    Information Processing & Management 58 (3), 102536, 2021
    Citations: 103
  • Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
    VK Malviya, S Rai, A Gupta
    Applied Soft Computing 102, 106873, 2021
    Citations: 26
  • Generation of pseudo code from the python source code using rule-based machine translation
    S Rai, A Gupta
    arXiv preprint arXiv:1906.06117, 2019
    Citations: 8
  • Development of a plugin based extensible feature extraction framework
    V Malviya, S Rai, A Gupta
    Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1840-1847, 2018
    Citations: 4
  • Method level text summarization for java code using nano-patterns
    S Rai, T Gaikwad, S Jain, A Gupta
    2017 24th Asia-Pacific Software Engineering Conference (APSEC), 199-208, 2017
    Citations: 15

MOST CITED SCHOLAR PUBLICATIONS

  • Text summarization using topic-based vector space model and semantic measure
    RC Belwal, S Rai, A Gupta
    Information Processing & Management 58 (3), 102536, 2021
    Citations: 103
  • A new graph-based extractive text summarization using keywords or topic modeling
    RC Belwal, S Rai, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 12 (10), 8975-8990, 2021
    Citations: 86
  • A review on source code documentation
    S Rai, RC Belwal, A Gupta
    ACM Transactions on Intelligent Systems and Technology (TIST) 13 (5), 1-44, 2022
    Citations: 53
  • Extractive text summarization using clustering-based topic modeling
    RC Belwal, S Rai, A Gupta
    Soft Computing 27 (7), 3965-3982, 2023
    Citations: 33
  • Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
    VK Malviya, S Rai, A Gupta
    Applied Soft Computing 102, 106873, 2021
    Citations: 26
  • Method level text summarization for java code using nano-patterns
    S Rai, T Gaikwad, S Jain, A Gupta
    2017 24th Asia-Pacific Software Engineering Conference (APSEC), 199-208, 2017
    Citations: 15
  • Is the corpus ready for machine translation? A case study with Python to pseudo-code corpus
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 48 (2), 1845-1858, 2023
    Citations: 9
  • Advanced hierarchical topic labeling for short text
    P Tiwari, A Tripathi, A Singh, S Rai
    IEEE Access 11, 35158-35174, 2023
    Citations: 8
  • Generation of pseudo code from the python source code using rule-based machine translation
    S Rai, A Gupta
    arXiv preprint arXiv:1906.06117, 2019
    Citations: 8
  • Large scale annotated dataset for code-mix abusive short noisy text
    P Tiwari, S Rai, CR Chowdary
    Language Resources and Evaluation 59 (1), 191-218, 2025
    Citations: 7
  • Mind your tweet: Abusive tweet detection
    P Tiwari, S Rai
    International Conference on Speech and Computer, 704-715, 2021
    Citations: 6
  • A review of existing conversational recommendation systems
    S Zaidi, S Rai, K Juneja
    2024 2nd International Conference on Disruptive Technologies (ICDT), 22-26, 2024
    Citations: 5
  • Effect of identifier tokenization on automatic source code documentation
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 47 (2), 2141-2157, 2022
    Citations: 5
  • Development of a plugin based extensible feature extraction framework
    V Malviya, S Rai, A Gupta
    Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1840-1847, 2018
    Citations: 4
  • Creating and Evaluating Code-Mixed Nepali-English and Telugu-English Datasets for Abusive Language Detection Using Traditional and Deep Learning Models
    M Pandey, NP Yadav, M Adduru, S Rai
    arXiv preprint arXiv:2504.21026, 2025
    Citations: 3
  • Accurate module name prediction using similarity based and sequence generation models
    S Rai, RC Belwal, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 14 (9), 11531-11543, 2023
    Citations: 1
  • A mathematical model for the effect of vaccination on COVID-19 epidemic spread
    A Singh, S Rai, MK Bajpai
    Machine Vision and Augmented Intelligence: Select Proceedings of MAI 2022 …, 2023
    Citations: 1
  • Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
    S Rai, RC Belwal, A Sharma
    International Conference on Data Management, Analytics & Innovation, 393-403, 2023
    Citations: 1
  • Generating class name in sequential manner using convolution attention neural network
    S Rai, RC Belwal, A Gupta
    Expert Systems with Applications 199, 116854, 2022
    Citations: 1
  • On the Impact of Chunking Strategies in NLP Pipelines: A Multi-Task Empirical Study
    S Rai, RC Belwal
    2025 OITS International Conference on Information Technology (OCIT), 1-6, 2025
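
The Scholar h-index and i10-index reported at the top of this profile can be recomputed from the per-publication citation counts in the list above. The sketch below assumes the counts are exactly those shown here, with the one entry that has no citation line taken as 0. The function names are illustrative and not part of any Scholar API.

```python
def h_index(citations):
    """Largest h such that at least h publications have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Citation counts copied from the list above; the uncited entry is assumed to be 0.
counts = [103, 86, 53, 33, 26, 15, 9, 8, 8, 7, 6, 5, 5, 4, 3, 1, 1, 1, 1, 0]

if __name__ == "__main__":
    print(h_index(counts), i10_index(counts), sum(counts))  # prints: 8 6 375
```

With these counts the ninth-ranked publication has 8 citations, which is below its rank, so the h-index stops at 8. Six publications have 10 or more citations, and the counts sum to 375.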