Pooja Rani

@uzh.ch

Postdoctoral Researcher, Department of Informatics (IFI)
University of Zurich

https://researchid.co/poojaruhal

I have been a postdoctoral researcher in the Zurich Empirical Software Engineering Team (Prof. Alberto Bacchelli) since September 2022. Previously, I was a postdoctoral researcher in the Software Engineering Group (Prof. Timo Kehrer). I completed my PhD in the Software Composition Group at the University of Bern under the supervision of Prof. Oscar Nierstrasz and Dr. Sebastiano Panichella. My work mainly focuses on supporting developers in comprehending source code via comments, and on the maintenance and evolution of those comments. My research interests lie in the general areas of Software Engineering (SE), Natural Language Processing (NLP), Program Comprehension, Object-Oriented Programming, and Machine Learning applied to SE. I have co-reviewed papers for various international conferences (e.g., ICSME, SANER, SATTOSE) and journals (e.g., TOSEM, IST).

Scopus Publications: 15
Scholar Citations: 135
Scholar h-index: 6
Scholar i10-index: 6

Scopus Publications

  • Variable Discovery with Large Language Models for Metamorphic Testing of Scientific Software
    Christos Tsigkanos, Pooja Rani, Sebastian Müller, and Timo Kehrer

    Springer Nature Switzerland

  • Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing?
    Christos Tsigkanos, Pooja Rani, Sebastian Müller, and Timo Kehrer

    IEEE
    Metamorphic testing involves reasoning on necessary properties that a program under test should exhibit regarding multiple input and output variables. A general approach consists of extracting metamorphic relations from auxiliary artifacts such as user manuals or documentation, a strategy particularly fitting to testing scientific software. However, such software typically has large input-output spaces, and the fundamental prerequisite – extracting variables of interest – is an arduous and non-scalable process when performed manually. To this end, we devise a workflow around an autoregressive transformer-based Large Language Model (LLM) towards the extraction of variables from user manuals of scientific software. Our end-to-end approach, besides a prompt specification consisting of few-shot examples by a human user, is fully automated, in contrast to current practice requiring human intervention. We showcase our LLM workflow over a real case, and compare variables extracted to ground truth manually labelled by experts. Our preliminary results show that our LLM-based workflow achieves an accuracy of 0.87, while successfully deriving 61.8% of variables as partial matches and 34.7% as exact matches.

  • The NLBSE'23 Tool Competition
    Rafael Kallis, Maliheh Izadi, Luca Pascarella, Oscar Chaparro, and Pooja Rani

    IEEE
    We report on the organization and results of the second edition of the tool competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'23). As in the prior edition, we organized the competition on automated issue report classification, with a larger dataset. This year, we featured an extra competition on automated code comment classification. In this tool competition edition, five teams submitted multiple classification models to automatically classify issue reports and code comments. The submitted models were fine-tuned and evaluated on a benchmark dataset of 1.4 million issue reports or 6.7 thousand code comments, respectively. The goal of the competition was to improve the classification performance of the baseline models that we provided. This paper reports details of the competition, including the rules, the teams and contestant models, and the ranking of models based on their average classification performance across issue report and code comment types.

  • A decade of code comment quality assessment: A systematic literature review
    Pooja Rani, Arianna Blasi, Nataliia Stulova, Sebastiano Panichella, Alessandra Gorla, and Oscar Nierstrasz

    Elsevier BV

  • Can We Automatically Generate Class Comments in Pharo?


  • How to identify class comment types? A multi-language approach for class comment classification
    Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Andrea Di Sorbo, and Oscar Nierstrasz

    Elsevier BV

  • What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk
    Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Mohammad Ghafari, and Oscar Nierstrasz

    Springer Science and Business Media LLC
    Context: Previous studies have characterized code comments in various programming languages, showing how high quality of code comments is crucial to support program comprehension activities, and to improve the effectiveness of maintenance tasks. However, very few studies have focused on understanding developer practices to write comments. None of them has compared such developer practices to the standard comment guidelines to study the extent to which developers follow the guidelines. Objective: Therefore, our goal is to investigate developer commenting practices and compare them to the comment guidelines. Method: This paper reports the first empirical study investigating commenting practices in Pharo Smalltalk. First, we analyze class comment evolution over seven Pharo versions. Then, we quantitatively and qualitatively investigate the information types embedded in class comments. Finally, we study the adherence of developer commenting practices to the official class comment template over Pharo versions. Results: Our results show that there is a rapid increase in class comments in the initial three Pharo versions, while in subsequent versions developers added comments to both new and old classes, thus maintaining a similar code to comment ratio. We furthermore found three times as many information types in class comments as those suggested by the template. However, the information types suggested by the template tend to be present more often than other types of information. Additionally, we find that a substantial proportion of comments follow the writing style of the template in writing these information types, but they are written and formatted in a non-uniform way. Conclusion: The results suggest the need to standardize the commenting guidelines for formatting the text, and to provide headers for the different information types to ensure a consistent style and to identify the information easily. Given the importance of high-quality code comments, we draw numerous implications for developers and researchers to improve the support for comment quality assessment tools.

  • Entropy based enhanced particle swarm optimization on multi-objective software reliability modelling for optimal testing resources allocation
    Pooja Rani and G. S. Mahapatra

    Wiley
    This paper proposes a generalization of the exponential software reliability model to characterize several factors including fault introduction and time-varying fault detection rate. The software life cycle is designed based on module structure, such as testing effort spent during module testing and detected software faults. The resource allocation problem is a critical phase in the testing stage of software reliability modelling. It is required to make decisions for optimal resource allocation among the modules to achieve the desired level of reliability. We formulate a multi-objective software reliability model of testing resources for a new generalized exponential reliability function to characterize dynamic allocation of total expected cost and testing effort. An enhanced particle swarm optimization (EPSO) is proposed to maximize software reliability and minimize allocation cost. We perform experiments with randomly generated testing-resource sets and vary the performance using the entropy function. The multi-objective model is compared with modules according to weighted cost function and testing effort measures in a typical modular testing environment.

  • Speculative Analysis for Quality Assessment of Code Comments
    Pooja Rani

    IEEE
    Previous studies have shown that high-quality code comments assist developers in program comprehension and maintenance tasks. However, the semi-structured nature of comments, unclear conventions for writing good comments, and the lack of quality assessment tools for all aspects of comments make their evaluation and maintenance a non-trivial problem. To achieve high-quality comments, we need a deeper understanding of code comment characteristics and the practices developers follow. In this thesis, we approach the problem of assessing comment quality from three different perspectives: what developers ask about commenting practices, what they write in comments, and how researchers support them in assessing comment quality. Our preliminary findings show that developers embed various kinds of information in class comments across programming languages. Still, they face problems in locating relevant guidelines to write consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality. To help developers and researchers in building comment quality assessment tools, we provide: (i) an empirically validated taxonomy of comment convention-related questions from various community forums, (ii) an empirically validated taxonomy of comment information types from various programming languages, (iii) a language-independent approach to automatically identify the information types, and (iv) a comment quality taxonomy prepared from a systematic literature review.

  • Makar: A Framework for Multi-source Studies based on Unstructured Data
    Mathias Birrer, Pooja Rani, Sebastiano Panichella, and Oscar Nierstrasz

    IEEE
    To perform various development and maintenance tasks, developers frequently seek information on various sources such as mailing lists, Stack Overflow (SO), and Quora. Researchers analyze these sources to understand developer information needs in these tasks. However, extracting and preprocessing unstructured data from various sources, building and maintaining a reusable dataset is often a time-consuming and iterative process. Additionally, the lack of tools for automating this data analysis process complicates the task to reproduce previous results or datasets. To address these concerns we propose Makar, which provides various data extraction and preprocessing methods to support researchers in conducting reproducible multi-source studies. To evaluate Makar, we conduct a case study that analyzes code comment related discussions from SO, Quora, and mailing lists. Our results show that Makar is helpful for preparing reproducible datasets from multiple sources with little effort, and for identifying the relevant data to answer specific research questions in a shorter time compared to state-of-the-art tools, which is of critical importance for studies based on unstructured data. Tool webpage: https://github.com/maethub/makar

  • Do Comments follow Commenting Conventions? A Case Study in Java and Python
    Pooja Rani, Suada Abukar, Nataliia Stulova, Alexandre Bergel, and Oscar Nierstrasz

    IEEE

  • What Do Developers Discuss about Code Comments?
    Pooja Rani, Mathias Birrer, Sebastiano Panichella, Mohammad Ghafari, and Oscar Nierstrasz

    IEEE

  • A neuro-particle swarm optimization logistic model fitting algorithm for software reliability analysis
    Pooja Rani and GS Mahapatra

    SAGE Publications
    This article develops a particle swarm optimization algorithm based on a feed-forward neural network architecture to fit software reliability growth models. We employ adaptive inertia weight within the proposed particle swarm optimization in consideration of learning algorithm. The dynamic adaptive nature of proposed prior best particle swarm optimization prevents the algorithm from becoming trapped in local optima. These neuro-prior best particle swarm optimization algorithms were applied to a popular flexible logistic growth curve as the [Formula: see text] model based on the weights derived by the artificial neural network learning algorithm. We propose the prior best particle swarm optimization algorithm to train the network for application to three different software failure data sets. The new search strategy improves the rate of convergence because it retains information on the prior particle, thereby enabling better predictions. The results are verified through testing approaches of constant, modified, and linear inertia weight. We assess the fitness of each particle according to the normalized root mean squared error which updates the best particle and velocity to accelerate convergence to an optimal solution. Experimental results demonstrate that the proposed [Formula: see text] model based prior best Particle Swarm Optimization based on Neural Network (pPSONN) improves predictive quality over the [Formula: see text], [Formula: see text], and existing model.

  • A single change point hazard rate software reliability model with imperfect debugging
    Pooja Rani and G.S. Mahapatra

    IEEE
    Software reliability models with imperfect debugging can characterize the quality of software fault removal, in addition to the rate of faults discovered. Change points are therefore useful to quantify such changes in the fault discovery rate. To address these issues, this paper develops a hazard rate model based on the widely studied Jelinski-Moranda model, introducing two additional parameters, namely an imperfect debugging parameter and a single change point parameter. We compare the proposed model with simpler models to show these additional parameters are justifiable. Both information theoretic and predictive measures of goodness of fit are applied to demonstrate these additional parameters are appropriate on some data sets.
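
To make the model structure described in "A single change point hazard rate software reliability model with imperfect debugging" more concrete, the following is a minimal Python sketch of a Jelinski-Moranda-style hazard rate extended with an imperfect-debugging probability and a single change point. The parameter names (N, phi1, phi2, p, tau) and the exact form are illustrative assumptions, not the paper's formulation.

    # Toy Jelinski-Moranda-style hazard rate with an imperfect-debugging
    # probability and a single change point. Parameter names are illustrative
    # assumptions, not the paper's exact formulation.

    def hazard_rate(i, N, phi1, phi2, p, tau):
        """Hazard rate before observing the i-th failure (i = 1, 2, ...).

        N    : assumed initial number of faults
        phi1 : per-fault detection rate before the change point
        phi2 : per-fault detection rate after the change point
        p    : probability that a detected fault is actually removed
        tau  : failure index at which the detection rate changes
        """
        phi = phi1 if i <= tau else phi2
        remaining = N - p * (i - 1)        # expected faults still in the software
        return phi * max(remaining, 0.0)

    # Example: hazard for the first ten failure intervals under arbitrary values.
    for i in range(1, 11):
        print(i, round(hazard_rate(i, N=50, phi1=0.05, phi2=0.03, p=0.9, tau=6), 4))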

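Similarly, the particle swarm optimization papers above ("Entropy based enhanced particle swarm optimization..." and "A neuro-particle swarm optimization logistic model fitting algorithm...") build on PSO with an adaptive inertia weight. The sketch below shows only that general idea: a plain PSO with a linearly decreasing inertia weight fitting a three-parameter logistic growth curve to invented cumulative-failure data by minimizing normalized RMSE. It is not the pPSONN or EPSO algorithm from the papers; all data, bounds, and constants are assumptions for illustration.

    # Minimal PSO with a linearly decreasing (adaptive) inertia weight, fitting a
    # logistic growth curve mu(t) = a / (1 + b*exp(-c*t)) to toy failure data.
    import math
    import random

    random.seed(1)

    t = list(range(1, 21))                           # observation times (toy data)
    y = [3, 6, 10, 15, 21, 28, 34, 40, 45, 49,
         52, 55, 57, 58, 59, 60, 60, 61, 61, 62]     # cumulative failures (toy data)

    def logistic(params, ti):
        a, b, c = params
        return a / (1.0 + b * math.exp(-c * ti))

    def nrmse(params):
        mse = sum((logistic(params, ti) - yi) ** 2 for ti, yi in zip(t, y)) / len(y)
        return math.sqrt(mse) / (max(y) - min(y))    # normalized root mean squared error

    def pso(n_particles=30, iters=200, w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
        bounds = [(1.0, 200.0), (0.1, 100.0), (0.01, 2.0)]   # search ranges for a, b, c
        pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
        vel = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
        pbest = [p[:] for p in pos]                  # personal best positions
        pbest_val = [nrmse(p) for p in pos]
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        gbest, gbest_val = pbest[g][:], pbest_val[g] # global best

        for it in range(iters):
            w = w_max - (w_max - w_min) * it / iters # inertia decreases over iterations
            for i in range(n_particles):
                for d in range(3):
                    r1, r2 = random.random(), random.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                    lo, hi = bounds[d]
                    pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
                val = nrmse(pos[i])
                if val < pbest_val[i]:
                    pbest[i], pbest_val[i] = pos[i][:], val
                    if val < gbest_val:
                        gbest, gbest_val = pos[i][:], val
        return gbest, gbest_val

    best, err = pso()
    print("fitted (a, b, c):", [round(x, 3) for x in best], "NRMSE:", round(err, 4))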

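Finally, the metamorphic-testing papers above ("Variable Discovery with Large Language Models for Metamorphic Testing of Scientific Software" and "Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing?") rely on few-shot prompting of an LLM to extract variables from user manuals. The sketch below only illustrates how such a few-shot prompt could be assembled and its answer parsed; the example excerpts, variable names, and the llm callable are hypothetical and not taken from the papers' pipeline.

    # Illustrative few-shot prompt assembly for pulling variable names out of a
    # user-manual excerpt. Example excerpts, variable names, and the `llm`
    # callable are hypothetical; this is not the pipeline used in the papers.

    FEW_SHOT_EXAMPLES = [
        {"manual": "The solver reads the grid spacing DX and the time step DT "
                   "from the configuration file and writes the field TEMP.",
         "variables": ["DX", "DT", "TEMP"]},
        {"manual": "Set RHO0 to the reference density; the program reports the "
                   "mass flux QMASS at every output interval.",
         "variables": ["RHO0", "QMASS"]},
    ]

    def build_prompt(manual_excerpt):
        """Assemble a few-shot prompt asking the model to list variable names."""
        parts = ["Extract the input/output variable names mentioned in each excerpt."]
        for ex in FEW_SHOT_EXAMPLES:
            parts.append("Excerpt: " + ex["manual"] + "\nVariables: " + ", ".join(ex["variables"]))
        parts.append("Excerpt: " + manual_excerpt + "\nVariables:")
        return "\n\n".join(parts)

    def extract_variables(manual_excerpt, llm):
        """`llm` is any callable mapping a prompt string to a completion string."""
        completion = llm(build_prompt(manual_excerpt))
        return [v.strip() for v in completion.split(",") if v.strip()]

    # Usage: extract_variables(excerpt, llm=my_model_call), then compare the
    # returned list against expert-labelled ground truth (exact/partial matches).
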
RECENT SCHOLAR PUBLICATIONS

  • How does Simulation-based Testing for Self-driving Cars match Human Perception?
    C Birchler, TK Mohammed, P Rani, T Nechita, T Kehrer, S Panichella
    arXiv preprint arXiv:2401.14736 2024

  • Energy Patterns for Web: An Exploratory Study
    P Rani, J Zellweger, V Kousadianos, L Cruz, T Kehrer, A Bacchelli
    arXiv preprint arXiv:2401.06482 2024

  • Beyond Code: Is There a Difference between Comments in Visual and Textual Languages?
    A Boll, P Rani, A Schultheiß, T Kehrer
    Available at SSRN 4650661 2024

  • Variable Discovery with Large Language Models for Metamorphic Testing of Scientific Software
    C Tsigkanos, P Rani, S Müller, T Kehrer
    International Conference on Computational Science, 321-335 2023

  • Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing?
    C Tsigkanos, P Rani, S Müller, T Kehrer
    2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2023

  • LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain
    J Niklaus, V Matoshi, P Rani, A Galassi, M Stürmer, I Chalkidis
    arXiv preprint arXiv:2301.13126 2023

  • A decade of code comment quality assessment: A systematic literature review
    P Rani, A Blasi, N Stulova, S Panichella, A Gorla, O Nierstrasz
    Journal of Systems and Software 195, 111515 2023

  • Assessing Comment Quality in Object-Oriented Languages
    P Rani, O Nierstrasz
    Institute of Informatics 2022

  • Can We Automatically Generate Class Comments in Pharo?
    P Rani, A Bergel, L Hess, T Kehrer, O Nierstrasz
    2022

  • What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk
    P Rani, S Panichella, M Leuenberger, M Ghafari, O Nierstrasz
    Empirical Software Engineering 26 (6), 112 2021

  • How to identify class comment types? A multi-language approach for class comment classification
    P Rani, S Panichella, M Leuenberger, A Di Sorbo, O Nierstrasz
    Journal of Systems and Software 181, 111047 2021

  • Do Comments follow Commenting Conventions? A Case Study in Java and Python
    P Rani, S Abukar, N Stulova, A Bergel, O Nierstrasz
    2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM) 2021

  • What do developers discuss about code comments?
    P Rani, M Birrer, S Panichella, M Ghafari, O Nierstrasz
    2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM) 2021

  • Adherence of class comments to style guidelines
    P Rani, S Abukar, O Nierstrasz
    Bachelor's thesis, University of Bern 2021

  • Tool Support for Commenting Conventions
    N Stulova, M Dooley, P Rani, O Nierstrasz
    Bachelor's thesis, University of Bern 2021

  • Generating automatically class comments in Pharo
    P Rani, L Hess, O Nierstrasz
    Bachelor's thesis, University of Bern 2021

  • Speculative Analysis for Quality Assessment of Code Comments
    P Rani
    2021 IEEE/ACM 43rd International Conference on Software Engineering 2021

  • Makar: A Framework for Multi-source Studies based on Unstructured Data
    M Birrer, P Rani, S Panichella, O Nierstrasz
    2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2021

  • Analysis of Developer Information Needs on Collaborative Platforms
    M Birrer, O Nierstrasz, P Rani
    Master's thesis, University of Bern 2020

  • Software Developers’ Information Needs
    J Richner, P Rani, O Nierstrasz
    2019

MOST CITED SCHOLAR PUBLICATIONS

  • How to identify class comment types? A multi-language approach for class comment classification
    P Rani, S Panichella, M Leuenberger, A Di Sorbo, O Nierstrasz
    Journal of Systems and Software 181, 111047 2021
    Citations: 36

  • LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain
    J Niklaus, V Matoshi, P Rani, A Galassi, M Stürmer, I Chalkidis
    arXiv preprint arXiv:2301.13126 2023
    Citations: 23

  • What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk
    P Rani, S Panichella, M Leuenberger, M Ghafari, O Nierstrasz
    Empirical Software Engineering 26 (6), 112 2021
    Citations: 17

  • A decade of code comment quality assessment: A systematic literature review
    P Rani, A Blasi, N Stulova, S Panichella, A Gorla, O Nierstrasz
    Journal of Systems and Software 195, 111515 2023
    Citations: 13

  • What do developers discuss about code comments?
    P Rani, M Birrer, S Panichella, M Ghafari, O Nierstrasz
    2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM) 2021
    Citations: 13

  • Do Comments follow Commenting Conventions? A Case Study in Java and Python
    P Rani, S Abukar, N Stulova, A Bergel, O Nierstrasz
    2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM) 2021
    Citations: 12

  • Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing?
    C Tsigkanos, P Rani, S Müller, T Kehrer
    2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2023
    Citations: 6

  • Variable Discovery with Large Language Models for Metamorphic Testing of Scientific Software
    C Tsigkanos, P Rani, S Müller, T Kehrer
    International Conference on Computational Science, 321-335 2023
    Citations: 5

  • Speculative Analysis for Quality Assessment of Code Comments
    P Rani
    2021 IEEE/ACM 43rd International Conference on Software Engineering 2021
    Citations: 3

  • Generating automatically class comments in Pharo
    P Rani, L Hess, O Nierstrasz
    Bachelor's thesis, University of Bern 2021
    Citations: 2

  • Makar: A Framework for Multi-source Studies based on Unstructured Data
    M Birrer, P Rani, S Panichella, O Nierstrasz
    2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2021
    Citations: 2

  • Analysis of Developer Information Needs on Collaborative Platforms
    M Birrer, O Nierstrasz, P Rani
    Master's thesis, University of Bern 2020
    Citations: 2

  • Software Developers’ Information Needs
    J Richner, P Rani, O Nierstrasz
    2019
    Citations: 1