Dipti Rana

@svnit.ac.in/facup

Assistant Professor
Sardar Vallabhbhai National Institute of Technology

https://researchid.co/dprsvnit

EDUCATION

Ph. D.

RESEARCH INTERESTS

Data Mining, Big Data Analytics, DBMS, Computer Organization

Scopus Publications

474

Scholar Citations

Scopus Publications

HARUIM: high average recent utility itemset mining
Mathe John Kenny Kumar and Dipti Rana
Inderscience Publishers

Text mining approach for the prediction of disease status from discharge summaries using CCBE and NEROA-CNN
Pranita Y. Mahajan and Dipti P. Rana
Elsevier BV

RSPHUIM: Recent Short Period High Utility Itemset Mining
Mathe John Kenny Kumar and Dipti Rana
Springer Science and Business Media LLC

ScholarRec: a scholars’ recommender system that combines scholastic influence and social collaborations in academic social networks
Mitali Desai, Rupa G. Mehta, and Dipti P. Rana
Springer Science and Business Media LLC

DSPWE: distributed sentiment polarized word embedding for voluminous textual data
Jenish Dhanani, Rupa Mehta, and Dipti P. Rana
Springer Science and Business Media LLC

An Improved Fake News Detection Model by Applying a Recursive Feature Elimination Approach for Credibility Assessment and Uncertainty
I. Y. Agarwal and D. P. Rana
World Scientific Pub Co Pte Ltd
The World Health Organization (WHO) coined the word “Infodemic” to refer to the dissemination of fake news during this pandemic, which is considered to be as harmful as the virus itself. Verifying the information available on the internet is a prerequisite to ensuring that the ecosystem is maintained, and this is the driving force behind this work. The primary goal of this study is to address the problem of time-consuming automatic detection of voluminous fake news in certain data, and to consider the uncertainty of data from causal relations using a rich feature set. This research achieves significant feature reduction, with reduced time and improved accuracy, by retaining only significant features using recursive feature elimination (RFE). The features retained by the RFE algorithm are also compared against a standard statistical measure, Pearson’s correlation, to ensure that no information is lost while reducing features. The suggested methodology also defines appropriate class output assurance levels and accurate prediction ambiguity for the fake news identification tasks. A comparative analysis of existing feature selection methods is performed. The experimental results show a 6% increase in precision and a 97% reduction in execution time.

Analysis of contextual features’ granularity for fake news detection
Isha Agarwal, Dipti Rana, Kalp Panwala, Raj Shah, and Viren Kathiriya
Springer Science and Business Media LLC

Research Challenges for Legal Document Summarization
Nikita, Dipti P. Rana, and Rupa G. Mehta
IEEE
Legal judgment documents are detailed and contain legal terms and codes. These characteristics make legal documents complex to read and analyze, which makes processing them a challenging task. This raises a need for automatic summary generation. Researchers have used several techniques to summarize legal documents, such as traditional methods, legal-specific approaches, and transformer-model-based approaches. This research focuses on the role of summarization in the legal domain and on various methods for summary generation. We summarize Indian judgment documents with various state-of-the-art methods for a comparative study. The analysis opened up various research challenges.

Analyzing the Impact of Social Collaborations on Influence Identification in Scientific Literature Analytic: An Analysis on ResearchGate and Academia

An Exploratory Study of Scholarly Platforms and Features to Help Emerging Scholars Gain Visibility in the Scholars’ Community
Mitali Desai, Rupa Mehta, and Dipti Rana
EManuscript Technologies
Among the scholarly features available on digitized scholarly platforms, certain features have high significance in assessing a scholar's influence. If these features are identified, emerging scholars can use them legitimately to increase their influence and gain visibility in the scholars’ community. The purpose of this research is to identify and rank significant features on scholarly platforms. To select a data source, a comparative analysis of well-known scholarly platforms is performed. Based on the analysis, ResearchGate (RG) is selected. For RG, this research proposes a methodology to identify and rank significant scholarly features. The results demonstrate that, for the rendered RG data, the significant features in order of significance are number of citations, research items, followers, reads, recommendations, followings and projects. The significant features discovered in this research can be employed by various scholarly platforms to identify influential scholars. These scholars can be utilized in applications such as expert finding, influence ranking, recommendation systems and interdisciplinary collaborations. Moreover, the identified significant features will help scholars focus on certain aspects (features) to increase their influence legitimately.

Feature optimization in CNN using MROA for disease classification
Pranita Mahajan and Dipti Rana
IOS Press
Electronic Medical Records (EMR) carry important information about a patient’s journey. The past decade shows substantial use of Natural Language Processing (NLP)-based Information Retrieval (IR) techniques to extract insights such as symptoms, diseases, and tests from these unstructured records. The state of the art shows that convolutional neural networks (CNN) make a significant contribution to the disease classification task. A significant improvement in precise knowledge mining is possible with precise feature extraction. Feature selection addresses undesirable, unneeded, or irrelevant features. This article proposes a Modified Rider Optimization Algorithm (MROA) to choose important features by selecting optimal weights from a pool of randomly generated weights, based on high accuracy and less training time in the CNN algorithm. The modified approach is trained on 114 N2C2 patients’ records to extract symptoms, diseases, and tests, which are then used to perform disease classification tasks. The proposed approach is found to be accurate, with 97.77% accuracy in the disease classification and treatment prediction task from EMR.

GeoAI-Based Covid-19 Prediction Model
Jyoti Kumari and Dipti P. Rana
Springer Nature Singapore

HAUOPM: High Average Utility Occupancy Pattern Mining
Mathe John Kenny Kumar and Dipti Rana
Springer Science and Business Media LLC

FNH—A Data Repository for Studying Fake News in Healthcare Domain
Isha Agarwal, Dipti Rana, Ch Surya Teja, and Nunna Naga Surya Sai Daivik
Springer Nature Singapore

Fine-Tuned Predictive Models for Forecasting Severity Level of COVID-19 Patient Using Epidemiological Data
Shweta A. Tikhe and Dipti P. Rana
Springer Nature Singapore

Contextual analysis of scholarly communications to identify the source of disinformation on digital scholarly platforms
Mitali Desai, Rupa G. Mehta, and Dipti P. Rana
Emerald
Purpose: Scholarly communications, particularly questions and answers (Q&A) present on digital scholarly platforms, provide a new avenue to gain knowledge. However, several studies have raised a concern about the content anomalies in these Q&A and suggested a proper validation before utilizing them in scholarly applications such as influence analysis and content-based recommendation systems. The content anomalies are referred to as disinformation in this research. The purpose of this research is, firstly, to assess scholarly communications in order to identify disinformation and, secondly, to help scholarly platforms determine the scholars who probably disseminate such disinformation. These scholars are referred to as the probable sources of disinformation.
Design/methodology/approach: To identify disinformation, the proposed model deduces (1) content redundancy and contextual redundancy in questions, (2) contextual nonrelevance in answers with respect to the questions and (3) quality of answers with respect to the expertise of the answering scholars. Then, the model determines the probable sources of disinformation using statistical analysis.
Findings: The model is evaluated on ResearchGate (RG) data. Results suggest that the model efficiently identifies disinformation from scholarly communications and accurately detects the probable sources of disinformation.
Practical implications: Different platforms with communication portals can use this model as a regulatory mechanism to restrict the propagation of disinformation. Scholarly platforms can use this model to generate an accurate influence assessment mechanism and also relevant recommendations for their scholars.
Originality/value: The existing studies mainly deal with validating the answers using statistical measures. The proposed model focuses on questions as well as answers and performs a contextual analysis using an advanced word embedding technique.

Spatio-temporal approach for classification of COVID-19 pandemic fake news
I. Y. Agarwal, D. P. Rana, M. Shaikh, and S. Poudel
Springer Science and Business Media LLC

Effective and scalable legal judgment recommendation using pre-learned word embedding
Jenish Dhanani, Rupa Mehta, and Dipti Rana
Springer Science and Business Media LLC
Legal professionals strongly demand an automatic and convenient legal document recommendation system (LDRS) to identify similar judgments for preparing advantageous and strategic arguments in court. Doc2Vec learns a semantically rich embedding (i.e., vector) space from the textual information of a judgment corpus. During Doc2Vec learning, the use of prior domain-specific knowledge can potentially enhance the embedding representation. This research thus proposes a pre-learned word embedding based LDRS (P-LDRS) that learns the Doc2Vec embedding using Legal domain-specific pre-learned word embedding possessing Legal semantic knowledge. However, learning the judgment embedding from the substantial body of existing Legal documents turns out to be a scalability issue for Doc2Vec. The proposed P-LDRS also provides additional functionality to learn the judgment embedding in a distributed manner over a cluster of computing nodes, using frameworks like MapReduce and Spark, to address the scalability issue. The empirical analysis is performed with a non-distributed and a distributed variant of the proposed P-LDRS to validate effectiveness and scalability. Experimental results show that the proposed non-distributed P-LDRS performs significantly better than the traditional Doc2Vec based LDRS, with an Accuracy of 0.88, F1-Score of 0.82 and MCC Score of 0.73. They also demonstrate that the proposed distributed P-LDRS improves time efficiency and achieves a stable Accuracy of ≈ 0.88, F1-Score of ≈ 0.83 and MCC Score of ≈ 0.72 with an increasing number of nodes.

A Model to Identify Redundancy and Relevancy in Question-Answer Systems of Digital Scholarly Platforms
Mitali Desai, Rupa G. Mehta, and Dipti P. Rana
Elsevier BV

Dictionary Based Gender Identification and Gender Based Sentiment Analysis with Polarized Word2Vec
Navodita Saini and Dipti P. Rana
Springer Nature Singapore

An Extensive Review of Deep Learning Driven Remote Sensing Image Classification Models
Vaishnavee V. Rathod, Dipti P. Rana, and Rupa G. Mehta
IEEE
Remote sensing images (RSI) are significant data for examining and observing the complete structure of the Earth’s surface. RSI classification has gained significant attention in earth observation technologies and is commonly employed in military and civil fields. It is a challenging process because of high-dimensional features and the small amount of labeled data. Advancements in machine learning (ML) and deep learning (DL) models enable effective RSI classification. Considerable research is under way in RSI detection and classification using ML and DL models. In this view, this article reviews recently developed RSI classification models. A brief introduction to RSI, its types, characteristics and challenging issues is given. Through a meta-analysis, different approaches related to RSI classification models are identified and summarized with key findings. In addition, this survey covers recently developed ML- and DL-based RSI classification models with their major aim, methodology used, merits, and demerits. Finally, concluding remarks on the present state-of-the-art approaches and possible future scope are discussed.

Novel Oversampling Algorithm for Handling Imbalanced Data Classification
Anjali S. More and Dipti P. Rana
The Science and Information Organization
In the current age, the attention of researchers is drawn to numerous imbalanced data applications. These application areas include intrusion detection in security, fraud recognition in finance, medical applications dealing with disease diagnosis, pilfering in electricity, and many more. Imbalanced data applications are categorized into two types: binary and multiclass data imbalance. Unequal data distribution among classes diverts classification performance metrics towards the majority instance class and ignores the minority instance class. Data imbalance leads to an increase in the classification error rate. Random Forest Classification (RFC) is the most suitable technique for dealing with imbalanced datasets. This paper proposes a novel oversampling rate calculation algorithm, Improvised Dynamic Binary-Multiclass Imbalanced Oversampling Rate (IDBMORate). Experimental analysis of the proposed IDBMORate approach on the Page-block (binary) dataset shows that the number of positive class instances is increased from 559 to 1118, whereas the negative class remains the same at 4913. For the referred multiclass dataset (Ecoli), IDBMORate produces a consistent result: the minority class (om, omL, imS, imL) instances are oversampled while the majority class instances remain unchanged. The IDBMORate algorithm reduces the neglect of the minority class and oversamples its data without disturbing the size of the majority instance class. Thus, it reduces the overall computation cost and leads towards improved classification performance.
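
The class counts reported above (559 positive instances raised to 1118, with 4913 negative instances untouched) can be illustrated with a minimal sketch of plain random oversampling. This is not the IDBMORate algorithm: the function name `oversample_minority` and the fixed `rate=2` are assumptions made for illustration, whereas the paper computes its oversampling rate dynamically.

```python
import random
from collections import Counter


def oversample_minority(labels, rate, seed=0):
    """Return sample indices after duplicating minority-class samples.

    `rate` is the target multiple of each minority class's original size.
    It is an assumed fixed input here, not a dynamically computed rate.
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    for cls, count in counts.items():
        if cls == majority:
            continue  # the majority class is left untouched
        members = [i for i, y in enumerate(labels) if y == cls]
        extra = int(count * rate) - count
        # Draw duplicates with replacement from the minority class.
        indices.extend(rng.choices(members, k=extra))
    return indices


# Class sizes taken from the abstract; the labels themselves are synthetic.
labels = ["neg"] * 4913 + ["pos"] * 559
resampled = oversample_minority(labels, rate=2)
new_counts = Counter(labels[i] for i in resampled)
print(new_counts["pos"], new_counts["neg"])
```

Returning indices rather than copied rows keeps the sketch independent of how the feature data is stored.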

AI federated learning based improvised random Forest classifier with error reduction mechanism for skewed data sets
Anjali More and Dipti Rana
Emerald
Purpose: The referred data set produces reliable information about network flows and common attacks that meets real-world criteria. Accordingly, this study focusses on the use of the imbalanced intrusion detection benchmark knowledge discovery in database (KDD) data set. The KDD data set is preferred by many researchers for experimentation and analysis. The proposed algorithm, improvised random forest classification with error tuning factors (IRFCETF), is tested on the KDD data set, and the performance of a complete set of network traffic features is evaluated through IRFCETF.
Design/methodology/approach: In the current era of applications, the attention of researchers is drawn to a diverse number of real-time applications that deal with imbalanced data classification (ImDC). Real-time application areas such as artificial intelligence (AI) and the Industrial Internet of Things (IIoT) that deal with ImDC suffer from diverted classification performance due to skewed data distribution (SkDD). There are numerous application areas that deal with SkDD. Many data applications in AI and IIoT face a diverted data classification rate under SkDD. Recently, there has been an exponential expansion in the volume of computer network data and related application developments. Intrusion detection is one of the demanding applications of ImDC. The proposed study focusses on the imbalanced intrusion benchmark KDD data set and other benchmark data sets with the proposed IRFCETF approach. IRFCETF demonstrates enriched classification performance on imbalanced data sets over the existing approach. The purpose of this work is to review imbalanced data applications in numerous application areas, including AI and IIoT, and to tune the performance with respect to principal component analysis. This study also focusses on the out-of-bag error performance-tuning factor.
Findings: Experimental results on the KDD data set show that the proposed algorithm gives enriched performance. For the referred intrusion detection data set, IRFCETF classification accuracy is 99.57% and the error rate is 0.43%.
Research limitations/implications: This research work can be extended for further improvements in classification techniques with multiple correspondence analysis (MCA); hierarchical MCA can be a focus, with the use of classification models for a wide range of skewed data sets.
Practical implications: The metrics enhancement is measurable and helpful in dealing with imbalanced applications related to intrusion detection systems in current application domains such as security, AI and IIoT digitization. Analytical results show improved metrics for the proposed approach compared with other traditional machine learning algorithms. Thus, the measurable impact of the error-tuning parameter on classification accuracy is justified with the proposed IRFCETF.
Social implications: The proposed algorithm is useful in numerous IIoT applications such as health care and machinery automation.
Originality/value: This research work addresses the classification metric enhancement approach IRFCETF. The proposed method yields a test set categorization for each case with an error reduction mechanism.

Anatomising the impact of ResearchGate followers and followings on influence identification
Mitali Desai, Rupa G Mehta, and Dipti P Rana
SAGE Publications
Influence analysis, derived from Social Network Analysis (SNA), is extremely useful in academic literature analytics. Different Academic Social Network Sites (ASNS) have been widely examined for influence analysis in terms of co-authorship and co-citation networks. The impact of other network-based features, such as followers and followings, provided by ASNS such as ResearchGate (RG) and Academia is yet to be anatomised. As proven in established social theories, followers and followings have a significant impact on influence propagation. This research aims at examining the same in one of the widely adopted ASNS, RG. A rendering process is developed to render real-time RG information, which is modelled as a graph. Standard centrality measures are implemented to identify influential users from the constructed RG graph. Each centrality measure gives a list of top-k influential RG users. The results are compared with RGScore and Total Research Interest (TRI) to discover the most effective centrality measure. Betweenness and closeness centrality measures outperform the others. A procedure is established to discover influential RG users that are common to all top-k centrality results, in order to identify dominant skills, affiliations, departments and locations from the rendered data.
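
The idea of ranking users by several centrality measures and keeping those common to every top-k list can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the toy graph, the helper names, and the restriction to degree and closeness centrality (betweenness is omitted for brevity) are all assumptions.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of BFS shortest-path distances."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_k(scores, k):
    """Set of the k highest-scoring nodes (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:k])


# Hypothetical undirected follower graph as adjacency sets; not real RG data.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "e"},
    "d": {"a"},
    "e": {"c"},
}

k = 2
common = top_k(degree_centrality(graph), k) & top_k(closeness_centrality(graph), k)
print(sorted(common))
```

Intersecting the top-k sets keeps only users that every measure agrees on, which is the property the abstract relies on when deriving dominant skills and affiliations.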

Performance enrichment through parameter tuning of random forest classification for imbalanced data applications
Anjali S. More and Dipti P. Rana
Elsevier BV

RECENT SCHOLAR PUBLICATIONS

Contextual analysis of scholarly communications to identify the source of disinformation on digital scholarly platforms
M Desai, RG Mehta, DP Rana
Kybernetes 53 (4), 1434-1449 2024

HARUIM: high average recent utility itemset mining
MJK Kumar, D Rana
International Journal of Data Mining, Modelling and Management 16 (1), 66-100 2024

Analysis of contextual features’ granularity for fake news detection
I Agarwal, D Rana, K Panwala, R Shah, V Kathiriya
Multimedia Tools and Applications, 1-17 2023

Analyzing the Impact of Social Collaborations on Influence Identification in Scientific Literature Analytic: An Analysis on ResearchGate and Academia
M Desai, R Mehta, D Rana
International Journal of Information Science and Management (IJISM) 21 (4 2023

Text mining approach for the prediction of disease status from discharge summaries using CCBE and NEROA-CNN
PY Mahajan, DP Rana
Expert Systems with Applications 227, 120310 2023

An Exploratory Study of Scholarly Platforms and Features to Help Emerging Scholars Gain Visibility in the Scholars’ Community
M Desai, R Mehta, D Rana
Journal of Scientometric Research 12 (2), 480-489 2023

ScholarRec: A scholars’ recommender system that combines scholastic influence and social collaborations in academic social networks
M Desai, RG Mehta, DP Rana
International Journal of Data Science and Analytics 16 (2), 203-216 2023

Research Challenges for Legal Document Summarization
DP Rana, RG Mehta
2023 IEEE World Conference on Applied Intelligence and Computing (AIC), 307-312 2023

DSPWE: distributed sentiment polarized word embedding for voluminous textual data
J Dhanani, R Mehta, DP Rana
Journal of Ambient Intelligence and Humanized Computing 14 (7), 9419-9433 2023

RSPHUIM: Recent Short Period High Utility Itemset Mining
MJ Kenny Kumar, D Rana
SN Computer Science 4 (5), 485 2023

GeoAI-Based Covid-19 Prediction Model
J Kumari, DP Rana
Advances in Data-driven Computing and Intelligent Systems: Selected Papers 2023

HAUOPM: High Average Utility Occupancy Pattern Mining
MJK Kumar, D Rana
Arabian Journal for Science and Engineering, 1-20 2023

Fine-Tuned Predictive Models for Forecasting Severity Level of COVID-19 Patient Using Epidemiological Data
SA Tikhe, DP Rana
Frontiers of ICT in Healthcare: Proceedings of EAIT 2022, 431-442 2023

FNH—A Data Repository for Studying Fake News in Healthcare Domain
I Agarwal, D Rana, CS Teja, NNS Sai Daivik
Frontiers of ICT in Healthcare: Proceedings of EAIT 2022, 39-51 2023

An Improved Fake News Detection Model by Applying a Recursive Feature Elimination Approach for Credibility Assessment and Uncertainty
IY Agarwal, DP Rana
Journal of Uncertain Systems 16 (01), 2242008 2023

Hierarchical Earthquake Prediction Framework
D Rana, C Shah, Y Kabra, U Daginawala, P Tibrewal
Proceedings of the International Conference on Paradigms of Computing 2023

Feature optimization in CNN using MROA for disease classification
P Mahajan, D Rana
Intelligent Decision Technologies, 1-15 2023

A Model to Identify Redundancy and Relevancy in Question-Answer Systems of Digital Scholarly Platforms
M Desai, RG Mehta, DP Rana
Procedia Computer Science 218, 2383-2391 2023

Spatio-temporal approach for classification of COVID-19 pandemic fake news
IY Agarwal, DP Rana, M Shaikh, S Poudel
Social Network Analysis and Mining 12 (1), 68 2022

AI federated learning based improvised random Forest classifier with error reduction mechanism for skewed data sets
A More, D Rana
International Journal of Pervasive Computing and Communications 2022

MOST CITED SCHOLAR PUBLICATIONS

Review of random forest classification techniques to resolve data imbalance
AS More, DP Rana
2017 1st International conference on intelligent systems and information 2017
Citations: 135

HTTP botnet detection using frequent patternset mining
SS Garasia, DP Rana, RG Mehta
International Journal of Engineering Science & Advanced Technology 2 (3 2012
Citations: 27

Legal Document Recommendation System: A Cluster Based Pairwise Similarity Computation
J Dhanani, R Mehta, D Rana
Journal of Intelligent & Fuzzy Systems 41 (6), 5497-5509 2021
Citations: 20

Discretization of temporal data: a survey
P Chaudhari, DP Rana, RG Mehta, NJ Mistry, MM Raghuwanshi
arXiv preprint arXiv:1402.4283 2014
Citations: 20

Social Network Influencer Rank Recommender Using Diverse Features from Topical Graph
D Mittal, M Suthar, Pooja, Patil, PGS Pranaya, D Rana, B Tidke
International Conference on Computational Intelligence and Data Science 2019
Citations: 16

An Experimental Assessment of Random Forest Classification Performance Improvisation with Sampling and Stage Wise Success Rate Calculation
A More, D Rana
International Conference on Computational Intelligence and Data Science 2019
Citations: 13

Detecting e-mail spam using spam word associations
NS Kumar, DP Rana, RG Mehta
International Journal of Emerging Technology and Advanced Engineering 2 (4 2012
Citations: 13

A novel fuzzy based classification for data mining using fuzzy discretization
RG Mehta, DP Rana, MA Zaveri
2009 WRI World Congress on Computer Science and Information Engineering 3 2009
Citations: 13

Random forest classifier approach for imbalanced big data classification for smart city application domains
A S More, D P Rana, I Agarwal
International Journal of Computational Intelligence & IoT 1 (2) 2018
Citations: 12

Effective and scalable legal judgment recommendation using pre-learned word embedding
J Dhanani, R Mehta, D Rana
Complex & Intelligent Systems 8 (4), 3199-3213 2022
Citations: 11

Issues and challenges in big graph modelling for smart city: An extensive survey
M Desai, R Mehta, D Rana
International Conference on Computational Intelligence and Internet of 2018
Citations: 11

FApriori: A modified Apriori algorithm based on checkpoint
MR Patel, DP Rana, RG Mehta
2013 International Conference on Information Systems and Computer Networks 2013
Citations: 11

Data Mining with Meteorological Data
AR Chaudhari, DP Rana, RG Mehta
International Journal of Advanced Computer Research (ISSN (print): 2249-7277 2013
Citations: 10

A novel approach for high dimensional data clustering
B Tidke, RG Mehta, DP Rana
Lap Lambert Academic Publishing 2012
Citations: 10

Spatio-temporal approach for classification of COVID-19 pandemic fake news
IY Agarwal, DP Rana, M Shaikh, S Poudel
Social Network Analysis and Mining 12 (1), 68 2022
Citations: 9

An Empirical Analysis to Identify the Effect of Indexing on Influence Detection using Graph Databases
M Desai, R Mehta, D Rana
International Conference on Innovations in Communication Computing And 2019
Citations: 8

A Survey on Techniques for Indexing and Hashing in Big Data
M Desai, R Mehta, D Rana
IEEE 4th International Conference on Computing Communication and Automation 2018
Citations: 8

Legal document recommendation system: a dictionary based approach
J Dhanani, R Mehta, DP Rana
International Journal of Web Information Systems 17 (3), 187-203 2021
Citations: 7

RGNet: the novel framework to model linked ResearchGate information into network using hierarchical data rendering
M Desai, RG Mehta, DP Rana
Advances in Machine Learning and Computational Intelligence: Proceedings of 2020
Citations: 7

Efficient word2vec vectors for sentiment analysis to improve commercial movie success
Y Parikh, A Palusa, S Kasthuri, R Mehta, D Rana
Advanced Computational and Communication Paradigms: Proceedings of 2018
Citations: 7