Dipti Rana

@svnit.ac.in/facup

Assistant Professor
Sardar Vallabhbhai National Institute of Technology

https://researchid.co/dprsvnit

EDUCATION

Ph. D.

RESEARCH INTERESTS

Data Mining, Big Data Analytics, DBMS, Computer Organization

59

Scopus Publications

474

Scholar Citations

11

Scholar h-index

14

Scholar i10-index

Scopus Publications

  • HARUIM: high average recent utility itemset mining
    Mathe John Kenny Kumar and Dipti Rana

    Inderscience Publishers


  • RSPHUIM: Recent Short Period High Utility Itemset Mining
    Mathe John Kenny Kumar and Dipti Rana

    Springer Science and Business Media LLC


  • DSPWE: distributed sentiment polarized word embedding for voluminous textual data
    Jenish Dhanani, Rupa Mehta, and Dipti P. Rana

    Springer Science and Business Media LLC

  • An Improved Fake News Detection Model by Applying a Recursive Feature Elimination Approach for Credibility Assessment and Uncertainty
    I. Y. Agarwal and D. P. Rana

    World Scientific Pub Co Pte Ltd
    The World Health Organization (WHO) has coined the word “Infodemic” to refer to the dissemination of fake news during this pandemic, which is considered to be as harmful as the virus itself. Verifying the information available on the internet is a prerequisite to maintaining a healthy information ecosystem, and this is the driving force behind this work. The primary goal of this study is to address the time-consuming nature of automatic detection of voluminous fake news on certain data, and to consider the uncertainty of data from causal relations using a rich feature set. This research achieves substantial feature reduction, lowering execution time and improving accuracy, by retaining only the significant features using recursive feature elimination (RFE). The features retained by the RFE algorithm are also compared against a standard statistical measure, Pearson’s correlation, to ensure that no information is lost while reducing features. The suggested methodology also defines appropriate assurance levels for the class output and the prediction ambiguity for fake news identification tasks. A comparative analysis against existing feature selection methods is performed. The experimental results show a 6% increase in precision and a 97% reduction in execution time.

  • Analysis of contextual features’ granularity for fake news detection
    Isha Agarwal, Dipti Rana, Kalp Panwala, Raj Shah, and Viren Kathiriya

    Springer Science and Business Media LLC

  • Research Challenges for Legal Document Summarization
    Nikita, Dipti P. Rana, and Rupa G. Mehta

    IEEE
    Legal judgment documents are detailed and contain legal terms and codes. These characteristics make legal documents complex to read and analyze, which makes processing them a challenging task and raises the need for automatic summary generation. Researchers have used several techniques to summarize legal documents, such as traditional methods, legal-specific approaches, and transformer-model-based approaches. This research focuses on the role of summarization in the legal domain and on various methods for summary generation. We summarize Indian judgment documents with various state-of-the-art methods for a comparative study. The analysis opens up various research challenges.


  • An Exploratory Study of Scholarly Platforms and Features to Help Emerging Scholars Gain Visibility in the Scholars’ Community
    Mitali Desai, Rupa Mehta, and Dipti Rana

    EManuscript Technologies
    Among the scholarly features available on digitized scholarly platforms, certain features have high significance in assessing a scholar's influence. If these features are identified, emerging scholars can use them legitimately to increase their influence and gain visibility in the scholars’ community. The purpose of this research is to identify and rank significant features on scholarly platforms. To select a data source, a comparative analysis of well-known scholarly platforms is performed. Based on the analysis, ResearchGate (RG) is selected. For RG, this research proposes a methodology to identify and rank significant scholarly features. The results demonstrate that, for the rendered RG data, the significant features in order of their significance are number of citations, research items, followers, reads, recommendations, followings and projects. The significant features discovered in this research can be employed by various scholarly platforms to identify influential scholars. These scholars can be utilized in applications such as expert finding, influence ranking, recommendation systems and interdisciplinary collaborations. Moreover, the identified significant features will help scholars focus on certain aspects (features) to increase their influence legitimately.

  • Feature optimization in CNN using MROA for disease classification
    Pranita Mahajan and Dipti Rana

    IOS Press
    Electronic Medical Records (EMR) carry important information about a patient’s journey. The past decade shows substantial use of Natural Language Processing (NLP)-based Information Retrieval (IR) techniques to extract insights such as symptoms, diseases, and tests from these unstructured records. The state of the art shows that convolutional neural networks (CNN) make a significant contribution to the disease classification task. A significant improvement in precise knowledge mining is possible with precise feature extraction. Feature selection addresses undesirable, unneeded, or irrelevant features. This article proposes a Modified Rider Optimization Algorithm (MROA) to choose important features by selecting optimal weights from a pool of randomly generated weights, based on high accuracy and less training time in the CNN algorithm. The modified approach is trained on 114 N2C2 patient records to extract symptoms, diseases, and tests, which are then used to perform the disease classification task. The proposed approach is found to be accurate, with 97.77% accuracy in the disease classification and treatment prediction task from EMR.

  • GeoAI-Based Covid-19 Prediction Model
    Jyoti Kumari and Dipti P. Rana

    Springer Nature Singapore

  • HAUOPM: High Average Utility Occupancy Pattern Mining
    Mathe John Kenny Kumar and Dipti Rana

    Springer Science and Business Media LLC

  • FNH—A Data Repository for Studying Fake News in Healthcare Domain
    Isha Agarwal, Dipti Rana, Ch Surya Teja, and Nunna Naga Surya Sai Daivik

    Springer Nature Singapore


  • Contextual analysis of scholarly communications to identify the source of disinformation on digital scholarly platforms
    Mitali Desai, Rupa G. Mehta, and Dipti P. Rana

    Emerald
    Purpose: Scholarly communications, particularly questions and answers (Q&A) present on digital scholarly platforms, provide a new avenue to gain knowledge. However, several studies have raised a concern about the content anomalies in these Q&A and suggested proper validation before utilizing them in scholarly applications such as influence analysis and content-based recommendation systems. The content anomalies are referred to as disinformation in this research. The purpose of this research is, firstly, to assess scholarly communications in order to identify disinformation and, secondly, to help scholarly platforms determine the scholars who probably disseminate such disinformation. These scholars are referred to as the probable sources of disinformation. Design/methodology/approach: To identify disinformation, the proposed model deduces (1) content redundancy and contextual redundancy in questions, (2) contextual nonrelevance in answers with respect to the questions and (3) quality of answers with respect to the expertise of the answering scholars. Then, the model determines the probable sources of disinformation using statistical analysis. Findings: The model is evaluated on ResearchGate (RG) data. Results suggest that the model efficiently identifies disinformation from scholarly communications and accurately detects the probable sources of disinformation. Practical implications: Different platforms with communication portals can use this model as a regulatory mechanism to restrict the propagation of disinformation. Scholarly platforms can use this model to generate an accurate influence assessment mechanism and also relevant recommendations for their scholars. Originality/value: The existing studies mainly deal with validating the answers using statistical measures. The proposed model focuses on questions as well as answers and performs a contextual analysis using an advanced word embedding technique.

  • Spatio-temporal approach for classification of COVID-19 pandemic fake news
    I. Y. Agarwal, D. P. Rana, M. Shaikh, and S. Poudel

    Springer Science and Business Media LLC

  • Effective and scalable legal judgment recommendation using pre-learned word embedding
    Jenish Dhanani, Rupa Mehta, and Dipti Rana

    Springer Science and Business Media LLC
    Legal professionals strongly demand an automatic and convenient legal document recommendation system (LDRS) to identify similar judgments when preparing advantageous and strategic arguments in court. Doc2Vec learns a semantically rich embedding (i.e., vector) space from the textual information of a judgment corpus. During Doc2Vec learning, the use of prior domain-specific knowledge can potentially enhance the embedding representation. This research thus proposes a pre-learned word embedding based LDRS (P-LDRS) that learns the Doc2Vec embedding using legal domain-specific pre-learned word embeddings possessing legal semantic knowledge. However, learning the judgment embedding from the substantial body of existing legal documents turns out to be a scalability issue for Doc2Vec. To address this, the proposed P-LDRS also provides functionality to learn the judgment embedding in a distributed manner over a cluster of computing nodes using frameworks like MapReduce and Spark. The empirical analysis is performed with a non-distributed and a distributed variant of the proposed P-LDRS to validate effectiveness and scalability. Experimental results show that the proposed non-distributed P-LDRS performs significantly better than the traditional Doc2Vec-based LDRS, with an Accuracy of 0.88, F1-Score of 0.82 and MCC Score of 0.73. They also demonstrate that the proposed distributed P-LDRS improves time efficiency and achieves a stable Accuracy of ≈ 0.88, F1-Score of ≈ 0.83 and MCC Score of ≈ 0.72 with an increasing number of nodes.



  • An Extensive Review of Deep Learning Driven Remote Sensing Image Classification Models
    Vaishnavee V. Rathod, Dipti P. Rana, and Rupa G. Mehta

    IEEE
    Remote sensing images (RSI) are significant data for examining and observing structures on the Earth’s surface. RSI classification has gained significant attention in earth observation technologies and is commonly employed in military and civil fields. It is a challenging process because of high-dimensional features and the small amount of labeled data. Advancements in machine learning (ML) and deep learning (DL) models enable effective RSI classification, and much research is under way in the RSI detection and classification area using ML and DL models. In this view, this article focuses on a review of recently developed RSI classification models. A brief introduction to RSI, its types, characteristics and challenging issues is given. Through a meta-analysis, different approaches related to RSI classification models are identified and summarized with key findings. In addition, this survey covers recently developed ML and DL based RSI classification models with their major aim, methodology used, merits, and demerits. Finally, concluding remarks on the present state-of-the-art approaches and possible future scope are discussed.

  • Novel Oversampling Algorithm for Handling Imbalanced Data Classification
    Anjali S. More and Dipti P. Rana

    The Science and Information Organization
    In the current age, researchers' attention is drawn to numerous imbalanced data applications. These application areas include intrusion detection in security, fraud recognition in finance, medical applications dealing with disease diagnosis, pilfering in electricity, and many more. Imbalanced data applications are categorized into two types: binary and multiclass data imbalance. Unequal data distribution diverts classification performance metrics towards the majority instance class and ignores the minority instance class. Data imbalance leads to an increase in the classification error rate. Random Forest Classification (RFC) is a well-suited technique for dealing with imbalanced datasets. This paper proposes a novel oversampling rate calculation algorithm, Improvised Dynamic Binary-Multiclass Imbalanced Oversampling Rate (IDBMORate). Experimental analysis of the proposed IDBMORate approach on the Page-block (binary) dataset shows that the number of positive class instances is increased from 559 to 1118, whereas the negative class remains the same at 4913 instances. For the referred multiclass dataset (Ecoli), IDBMORate produces a consistent result: instances of the minority classes (om, omL, imS, imL) are oversampled while the majority class instances remain unchanged. The IDBMORate algorithm reduces the ignorance of the minority class and oversamples its data without disturbing the size of the majority instance class. Thus, it reduces the overall computation cost and leads towards improved classification performance.

  • AI federated learning based improvised random Forest classifier with error reduction mechanism for skewed data sets
    Anjali More and Dipti Rana

    Emerald
    Purpose: The referred data set provides reliable information about network flows and common attacks that meets real-world criteria. Accordingly, this study focuses on the use of the imbalanced intrusion detection benchmark knowledge discovery in database (KDD) data set. The KDD data set is preferred by many researchers for experimentation and analysis. The proposed algorithm, improvised random forest classification with error tuning factors (IRFCETF), is evaluated on the KDD data set, assessing performance on the complete set of network traffic features. Design/methodology/approach: In the current era of applications, researchers' attention is drawn to a diverse number of existing real-time applications that deal with imbalanced data classification (ImDC). Real-time application areas such as artificial intelligence (AI) and the Industrial Internet of Things (IIoT) that deal with ImDC suffer from diverted classification performance due to skewed data distribution (SkDD). There are numerous application areas that deal with SkDD. Many of the data applications in AI and IIoT face a diverted data classification rate under SkDD. In recent advancements, there is an exponential expansion in the volume of computer network data and related application developments. Intrusion detection is one of the demanding applications of ImDC. The proposed study focusses on the imbalanced intrusion benchmark KDD data set and other benchmark data sets with the proposed IRFCETF approach. IRFCETF demonstrates enriched classification performance on imbalanced data sets over the existing approach. The purpose of this work is to review imbalanced data applications in numerous application areas, including AI and IIoT, and to tune performance with respect to principal component analysis. This study also focusses on the out-of-bag error performance-tuning factor. Findings: Experimental results on the KDD data set show that the proposed algorithm gives enriched performance. For the referred intrusion detection data set, IRFCETF classification accuracy is 99.57% and the error rate is 0.43%. Research limitations/implications: This research work can be extended for further improvements in classification techniques with multiple correspondence analysis (MCA); hierarchical MCA can be a focus, with the use of classification models for a wide range of skewed data sets. Practical implications: The metrics enhancement is measurable and helpful in dealing with imbalanced applications related to intrusion detection systems in current application domains such as security, AI and IIoT digitization. Analytical results show improved metrics for the proposed approach compared with other traditional machine learning algorithms. Thus, the measurable impact of the error-tuning parameter on classification accuracy is justified with the proposed IRFCETF. Social implications: The proposed algorithm is useful in numerous IIoT applications such as health care and machinery automation. Originality/value: This research work addresses the classification metric enhancement approach IRFCETF. The proposed method yields a test set categorization for each case with an error reduction mechanism.

  • Anatomising the impact of ResearchGate followers and followings on influence identification
    Mitali Desai, Rupa G Mehta, and Dipti P Rana

    SAGE Publications
    Influence analysis, derived from Social Network Analysis (SNA), is extremely useful in academic literature analytics. Different Academic Social Network Sites (ASNS) have been widely examined for influence analysis in terms of co-authorship and co-citation networks. The impact of other network-based features, such as followers and followings, provided by ASNS such as ResearchGate (RG) and Academia is yet to be anatomised. As established in social theories, followers and followings have a significant impact on influence propagation. This research aims at examining the same in one of the widely adopted ASNS, RG. A rendering process is developed to render real-time RG information, which is modelled as a graph. Standard centrality measures are implemented to identify influential users from the constructed RG graph. Each centrality measure gives a list of top-k influential RG users. The results are compared with RGScore and Total Research Interest (TRI) to discover the most effective centrality measure. Betweenness and closeness centrality measures outperform the others. A procedure is established to discover influential RG users that are commonly present in all top-k centrality results in order to identify dominant skills, affiliations, departments and locations from the rendered data.


RECENT SCHOLAR PUBLICATIONS

  • Contextual analysis of scholarly communications to identify the source of disinformation on digital scholarly platforms
    M Desai, RG Mehta, DP Rana
    Kybernetes 53 (4), 1434-1449 2024

  • HARUIM: high average recent utility itemset mining
    MJK Kumar, D Rana
    International Journal of Data Mining, Modelling and Management 16 (1), 66-100 2024

  • Analysis of contextual features’ granularity for fake news detection
    I Agarwal, D Rana, K Panwala, R Shah, V Kathiriya
    Multimedia Tools and Applications, 1-17 2023

  • Analyzing the Impact of Social Collaborations on Influence Identification in Scientific Literature Analytic: An Analysis on ResearchGate and Academia
    M Desai, R Mehta, D Rana
    International Journal of Information Science and Management (IJISM) 21 (4 2023

  • Text mining approach for the prediction of disease status from discharge summaries using CCBE and NEROA-CNN
    PY Mahajan, DP Rana
    Expert Systems with Applications 227, 120310 2023

  • An Exploratory Study of Scholarly Platforms and Features to Help Emerging Scholars Gain Visibility in the Scholars’ Community
    M Desai, R Mehta, D Rana
    Journal of Scientometric Research 12 (2), 480-489 2023

  • ScholarRec: A scholars’ recommender system that combines scholastic influence and social collaborations in academic social networks
    M Desai, RG Mehta, DP Rana
    International Journal of Data Science and Analytics 16 (2), 203-216 2023

  • Research Challenges for Legal Document Summarization
    DP Rana, RG Mehta
    2023 IEEE World Conference on Applied Intelligence and Computing (AIC), 307-312 2023

  • DSPWE: distributed sentiment polarized word embedding for voluminous textual data
    J Dhanani, R Mehta, DP Rana
    Journal of Ambient Intelligence and Humanized Computing 14 (7), 9419-9433 2023

  • RSPHUIM: Recent Short Period High Utility Itemset Mining
    MJ Kenny Kumar, D Rana
    SN Computer Science 4 (5), 485 2023

  • GeoAI-Based Covid-19 Prediction Model
    J Kumari, DP Rana
    Advances in Data-driven Computing and Intelligent Systems: Selected Papers 2023

  • HAUOPM: High Average Utility Occupancy Pattern Mining
    MJK Kumar, D Rana
    Arabian Journal for Science and Engineering, 1-20 2023

  • Fine-Tuned Predictive Models for Forecasting Severity Level of COVID-19 Patient Using Epidemiological Data
    SA Tikhe, DP Rana
    Frontiers of ICT in Healthcare: Proceedings of EAIT 2022, 431-442 2023

  • FNH—A Data Repository for Studying Fake News in Healthcare Domain
    I Agarwal, D Rana, CS Teja, NNS Sai Daivik
    Frontiers of ICT in Healthcare: Proceedings of EAIT 2022, 39-51 2023

  • An Improved Fake News Detection Model by Applying a Recursive Feature Elimination Approach for Credibility Assessment and Uncertainty
    IY Agarwal, DP Rana
    Journal of Uncertain Systems 16 (01), 2242008 2023

  • Hierarchical Earthquake Prediction Framework
    D Rana, C Shah, Y Kabra, U Daginawala, P Tibrewal
    Proceedings of the International Conference on Paradigms of Computing 2023

  • Feature optimization in CNN using MROA for disease classification
    P Mahajan, D Rana
    Intelligent Decision Technologies, 1-15 2023

  • A Model to Identify Redundancy and Relevancy in Question-Answer Systems of Digital Scholarly Platforms
    M Desai, RG Mehta, DP Rana
    Procedia Computer Science 218, 2383-2391 2023

  • Spatio-temporal approach for classification of COVID-19 pandemic fake news
    IY Agarwal, DP Rana, M Shaikh, S Poudel
    Social Network Analysis and Mining 12 (1), 68 2022

  • AI federated learning based improvised random Forest classifier with error reduction mechanism for skewed data sets
    A More, D Rana
    International Journal of Pervasive Computing and Communications 2022

MOST CITED SCHOLAR PUBLICATIONS

  • Review of random forest classification techniques to resolve data imbalance
    AS More, DP Rana
    2017 1st International conference on intelligent systems and information 2017
    Citations: 135

  • HTTP botnet detection using frequent patternset mining
    SS Garasia, DP Rana, RG Mehta
    International Journal of Engineering Science & Advanced Technology 2 (3 2012
    Citations: 27

  • Legal Document Recommendation System: A Cluster Based Pairwise Similarity Computation
    J Dhanani, R Mehta, D Rana
    Journal of Intelligent & Fuzzy Systems 41 (6), 5497-5509 2021
    Citations: 20

  • Discretization of temporal data: a survey
    P Chaudhari, DP Rana, RG Mehta, NJ Mistry, MM Raghuwanshi
    arXiv preprint arXiv:1402.4283 2014
    Citations: 20

  • Social Network Influencer Rank Recommender Using Diverse Features from Topical Graph
    D Mittal, M Suthar, Pooja, Patil, PGS Pranaya, D Rana, B Tidke
    International Conference on Computational Intelligence and Data Science 2019
    Citations: 16

  • An Experimental Assessment of Random Forest Classification Performance Improvisation with Sampling and Stage Wise Success Rate Calculation
    A More, D Rana
    International Conference on Computational Intelligence and Data Science 2019
    Citations: 13

  • Detecting e-mail spam using spam word associations
    NS Kumar, DP Rana, RG Mehta
    International Journal of Emerging Technology and Advanced Engineering 2 (4 2012
    Citations: 13

  • A novel fuzzy based classification for data mining using fuzzy discretization
    RG Mehta, DP Rana, MA Zaveri
    2009 WRI World Congress on Computer Science and Information Engineering 3 2009
    Citations: 13

  • Random forest classifier approach for imbalanced big data classification for smart city application domains
    A S More, D P Rana, I Agarwal
    International Journal of Computational Intelligence & IoT 1 (2) 2018
    Citations: 12

  • Effective and scalable legal judgment recommendation using pre-learned word embedding
    J Dhanani, R Mehta, D Rana
    Complex & Intelligent Systems 8 (4), 3199-3213 2022
    Citations: 11

  • Issues and challenges in big graph modelling for smart city: An extensive survey
    M Desai, R Mehta, D Rana
    International Conference on Computational Intelligence and Internet of 2018
    Citations: 11

  • FApriori: A modified Apriori algorithm based on checkpoint
    MR Patel, DP Rana, RG Mehta
    2013 International Conference on Information Systems and Computer Networks 2013
    Citations: 11

  • Data Mining with Meteorological Data
    AR Chaudhari, DP Rana, RG Mehta
    International Journal of Advanced Computer Research (ISSN (print): 2249-7277 2013
    Citations: 10

  • A novel approach for high dimensional data clustering
    B Tidke, RG Mehta, DP Rana
    Lap Lambert Academic Publishing 2012
    Citations: 10

  • Spatio-temporal approach for classification of COVID-19 pandemic fake news
    IY Agarwal, DP Rana, M Shaikh, S Poudel
    Social Network Analysis and Mining 12 (1), 68 2022
    Citations: 9

  • An Empirical Analysis to Identify the Effect of Indexing on Influence Detection using Graph Databases
    M Desai, R Mehta, D Rana
    International Conference on Innovations in Communication Computing And 2019
    Citations: 8

  • A Survey on Techniques for Indexing and Hashing in Big Data
    M Desai, R Mehta, D Rana
    IEEE 4th International Conference on Computing Communication and Automation 2018
    Citations: 8

  • Legal document recommendation system: a dictionary based approach
    J Dhanani, R Mehta, DP Rana
    International Journal of Web Information Systems 17 (3), 187-203 2021
    Citations: 7

  • RGNet: the novel framework to model linked ResearchGate information into network using hierarchical data rendering
    M Desai, RG Mehta, DP Rana
    Advances in Machine Learning and Computational Intelligence: Proceedings of 2020
    Citations: 7

  • Efficient word2vec vectors for sentiment analysis to improve commercial movie success
    Y Parikh, A Palusa, S Kasthuri, R Mehta, D Rana
    Advanced Computational and Communication Paradigms: Proceedings of 2018
    Citations: 7