Sawan Rai

@srmap.edu.in

Assistant Professor CSE
SRM University-AP

https://researchid.co/sawan777

RESEARCH, TEACHING, or OTHER INTERESTS

Software, Artificial Intelligence, Computer Science, Computer Science Applications

Scopus Publications: 16
Scholar Citations: 175
Scholar h-index: 6
Scholar i10-index: 5

SCOPUS PUBLICATIONS

  • Large scale annotated dataset for code-mix abusive short noisy text
    Paras Tiwari, Sawan Rai, and C. Ravindranath Chowdary

    Springer Science and Business Media LLC

  • Accurate module name prediction using similarity based and sequence generation models
    Sawan Rai, Ramesh Chandra Belwal, and Atul Gupta

    Springer Science and Business Media LLC

  • Extractive text summarization using clustering-based topic modeling
    Ramesh Chandra Belwal, Sawan Rai, and Atul Gupta

    Springer Science and Business Media LLC

  • Is the Corpus Ready for Machine Translation? A Case Study with Python to Pseudo-Code Corpus
    Sawan Rai, Ramesh Chandra Belwal, and Atul Gupta

    Springer Science and Business Media LLC


  • A Mathematical Model for the Effect of Vaccination on COVID-19 Epidemic Spread
    Avaneesh Singh, Sawan Rai, and Manish Kumar Bajpai

    Springer Nature Singapore

  • Advanced Hierarchical Topic Labeling for Short Text
    Paras Tiwari, Ashutosh Tripathi, Avaneesh Singh, and Sawan Rai

    Institute of Electrical and Electronics Engineers (IEEE)
    Hierarchical topic modeling is a probabilistic approach for discovering latent topics distributed hierarchically among documents. Each discovered topic is represented by its topic terms. Drawing an unambiguous conclusion from the topic-term distribution is challenging for readers. Hierarchical topic labeling eases this challenge by assigning an individual, appropriate label to each topic at every level. In this work, we propose a BERT-embedding-inspired methodology for labeling hierarchical topics in short-text corpora. Short texts have gained significant popularity on multiple platforms across diverse domains, but the limited information they carry makes them difficult to process. In our work, we have used three diverse short-text datasets that include both structured and unstructured instances; this diversity ensures the broad application scope of this work. To assess the relevance of the labels, the proposed methodology has been compared against both automatic and human annotators. Our proposed methodology outperformed the benchmark with average scores of 0.4185, 49.50, and 49.16 for cosine similarity, exact match, and partial match, respectively.

  • Generating class name in sequential manner using convolution attention neural network
    Sawan Rai, Ramesh Chandra Belwal, and Atul Gupta

    Elsevier BV

  • A Review on Source Code Documentation
    Sawan Rai, Ramesh Chandra Belwal, and Atul Gupta

    Association for Computing Machinery (ACM)
    Context: Coding is an incremental activity in which a developer may need to understand existing code before making suitable changes to it. Code documentation is considered one of the best practices in software development, but it requires significant effort from developers. Recent advances in natural language processing and machine learning have motivated automated approaches to source code documentation at multiple levels. Objective: The review aims to study current code documentation practices and analyze the existing literature to assess how prepared it is to address this problem, and to identify the challenges that lie ahead. Methodology: We provide a detailed account of the literature on automated source code documentation at different levels and critically analyze the effectiveness of the proposed approaches. This also allows us to infer the gaps and challenges in addressing the problem at each level. Findings: (1) The research community has focused on method-level summarization. (2) Deep learning has dominated this research field over the past five years. (3) Researchers regularly propose ever larger corpora for source code documentation. (4) Java and Python are the most widely used programming languages in these corpora. (5) Bilingual Evaluation Understudy (BLEU) is the evaluation metric most favored by researchers.

  • Effect of Identifier Tokenization on Automatic Source Code Documentation
    Sawan Rai, Ramesh Chandra Belwal, and Atul Gupta

    Springer Science and Business Media LLC

  • A new graph-based extractive text summarization using keywords or topic modeling
    Ramesh Chandra Belwal, Sawan Rai, and Atul Gupta

    Springer Science and Business Media LLC

  • Text summarization using topic-based vector space model and semantic measure
    Ramesh Chandra Belwal, Sawan Rai, and Atul Gupta

    Elsevier BV


  • Mind Your Tweet: Abusive Tweet Detection
    Paras Tiwari and Sawan Rai

    Springer International Publishing

  • Development of a plugin based extensible feature extraction framework
    Vikas Malviya, Sawan Rai, and Atul Gupta

    ACM
    An important ingredient in a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from large unstructured and semi-structured data such as programming code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework, which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data comprised web page and script-based data, whereas the unstructured data was taken from email data for spam filtering. The usefulness of the tool was also assessed in terms of ease of programming.

  • Method Level Text Summarization for Java Code Using Nano-Patterns
    Sawan Rai, Tejaswini Gaikwad, Sparshi Jain, and Atul Gupta

    IEEE
    The rapid growth of automated solutions has led to large code bases being developed and consumed quickly. However, maintaining this code and subsequently reusing it poses challenges. One best practice for handling such issues is to provide a suitable text summary of the code so that human developers can comprehend it easily, but this can be a time-consuming and costly affair. A few efforts have been made in this direction, generating the text summary of the code from either the method signature or its body. In this paper, we propose a text summarization approach for Java code that identifies code-level nano-patterns to obtain a text summary. The approach also looks for associations between these nano-patterns in a Java method and then uses template-based text generation to obtain the final text summary of the method. We evaluated the summaries generated by the proposed approach in a controlled experiment against three other existing approaches. Our results suggested that the summaries generated by our approach were better on the completeness and correctness criteria. The feedback obtained during the experimental validation suggested additional inputs to improve the generated text summary on the other two criteria as well.

RECENT SCHOLAR PUBLICATIONS

  • Large scale annotated dataset for code-mix abusive short noisy text
    P Tiwari, S Rai, CR Chowdary
    Language Resources and Evaluation, 1-28 2024

  • Accurate module name prediction using similarity based and sequence generation models
    S Rai, RC Belwal, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 14 (9), 11531-11543 2023

  • A Mathematical Model for the Effect of Vaccination on COVID-19 Epidemic Spread
    A Singh, S Rai, MK Bajpai
    Machine Vision and Augmented Intelligence: Select Proceedings of MAI 2022 2023

  • Advanced hierarchical topic labeling for short text
    P Tiwari, A Tripathi, A Singh, S Rai
    IEEE Access 2023

  • Extractive text summarization using clustering-based topic modeling
    RC Belwal, S Rai, A Gupta
    Soft Computing 27 (7), 3965-3982 2023

  • Is the corpus ready for machine translation? A case study with Python to pseudo-code corpus
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 48 (2), 1845-1858 2023

  • Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
    S Rai, RC Belwal, A Sharma
    International Conference on Data Management, Analytics & Innovation, 393-403 2023

  • Generating class name in sequential manner using convolution attention neural network
    S Rai, RC Belwal, A Gupta
    Expert Systems with Applications 199, 116854 2022

  • A review on source code documentation
    S Rai, RC Belwal, A Gupta
    ACM Transactions on Intelligent Systems and Technology (TIST) 13 (5), 1-44 2022

  • Effect of identifier tokenization on automatic source code documentation
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 47 (2), 2141-2157 2022

  • A new graph-based extractive text summarization using keywords or topic modeling
    RC Belwal, S Rai, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 12 (10), 8975-8990 2021

  • Mind your tweet: Abusive tweet detection
    P Tiwari, S Rai
    International Conference on Speech and Computer, 704-715 2021

  • Text summarization using topic-based vector space model and semantic measure
    RC Belwal, S Rai, A Gupta
    Information Processing & Management 58 (3), 102536 2021

  • Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
    VK Malviya, S Rai, A Gupta
    Applied Soft Computing 102, 106873 2021

  • Generation of pseudo code from the python source code using rule-based machine translation
    S Rai, A Gupta
    arXiv preprint arXiv:1906.06117 2019

  • Development of a plugin based extensible feature extraction framework
    V Malviya, S Rai, A Gupta
    Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1840-1847 2018

  • Method level text summarization for java code using nano-patterns
    S Rai, T Gaikwad, S Jain, A Gupta
    2017 24th Asia-Pacific Software Engineering Conference (APSEC), 199-208 2017

  • Method level text summarization for java code using nano-patterns
    S Rai, T Gaikwad, S Jain, A Gupta
    2017 24th Asia-Pacific Software Engineering Conference (APSEC), IEEE, 199-208 2017

MOST CITED SCHOLAR PUBLICATIONS

  • Text summarization using topic-based vector space model and semantic measure
    RC Belwal, S Rai, A Gupta
    Information Processing & Management 58 (3), 102536 2021
    Citations: 58

  • A new graph-based extractive text summarization using keywords or topic modeling
    RC Belwal, S Rai, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 12 (10), 8975-8990 2021
    Citations: 41

  • Development of web browser prototype with embedded classification capability for mitigating Cross-Site Scripting attacks
    VK Malviya, S Rai, A Gupta
    Applied Soft Computing 102, 106873 2021
    Citations: 16

  • A review on source code documentation
    S Rai, RC Belwal, A Gupta
    ACM Transactions on Intelligent Systems and Technology (TIST) 13 (5), 1-44 2022
    Citations: 13

  • Method level text summarization for java code using nano-patterns
    S Rai, T Gaikwad, S Jain, A Gupta
    2017 24th Asia-Pacific Software Engineering Conference (APSEC), 199-208 2017
    Citations: 11

  • Generation of pseudo code from the python source code using rule-based machine translation
    S Rai, A Gupta
    arXiv preprint arXiv:1906.06117 2019
    Citations: 7

  • Extractive text summarization using clustering-based topic modeling
    RC Belwal, S Rai, A Gupta
    Soft Computing 27 (7), 3965-3982 2023
    Citations: 6

  • Method level text summarization for java code using nano-patterns
    S Rai, T Gaikwad, S Jain, A Gupta
    2017 24th Asia-Pacific Software Engineering Conference (APSEC), IEEE, 199-208 2017
    Citations: 5

  • Is the corpus ready for machine translation? A case study with Python to pseudo-code corpus
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 48 (2), 1845-1858 2023
    Citations: 4

  • Advanced hierarchical topic labeling for short text
    P Tiwari, A Tripathi, A Singh, S Rai
    IEEE Access 2023
    Citations: 3

  • Mind your tweet: Abusive tweet detection
    P Tiwari, S Rai
    International Conference on Speech and Computer, 704-715 2021
    Citations: 3

  • Development of a plugin based extensible feature extraction framework
    V Malviya, S Rai, A Gupta
    Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1840-1847 2018
    Citations: 3

  • Effect of identifier tokenization on automatic source code documentation
    S Rai, RC Belwal, A Gupta
    Arabian Journal for Science and Engineering 47 (2), 2141-2157 2022
    Citations: 2

  • Accurate module name prediction using similarity based and sequence generation models
    S Rai, RC Belwal, A Gupta
    Journal of Ambient Intelligence and Humanized Computing 14 (9), 11531-11543 2023
    Citations: 1

  • Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
    S Rai, RC Belwal, A Sharma
    International Conference on Data Management, Analytics & Innovation, 393-403 2023
    Citations: 1

  • Generating class name in sequential manner using convolution attention neural network
    S Rai, RC Belwal, A Gupta
    Expert Systems with Applications 199, 116854 2022
    Citations: 1