@svnit.ac.in/facup
Assistant Professor
Sardar Vallabhbhai National Institute of Technology
Ph. D.
Data Mining, Big Data Analytics, DBMS, Computer Organization
Mathe John Kenny Kumar and Dipti Rana
Inderscience Publishers
Pranita Y. Mahajan and Dipti P. Rana
Elsevier BV
Mathe John Kenny Kumar and Dipti Rana
Springer Science and Business Media LLC
Mitali Desai, Rupa G. Mehta, and Dipti P. Rana
Springer Science and Business Media LLC
Jenish Dhanani, Rupa Mehta, and Dipti P. Rana
Springer Science and Business Media LLC
I. Y. Agarwal and D. P. Rana
World Scientific Pub Co Pte Ltd
The World Health Organization (WHO) coined the word “Infodemic” to refer to the dissemination of fake news during this pandemic, which is considered to be as harmful as the virus itself. Verifying the information available on the internet is a prerequisite for maintaining a healthy information ecosystem, and this is the driving force behind this work. The primary goal of this study is to address the time-consuming nature of automatic detection of voluminous fake news on certain data, and to account for the uncertainty of data arising from causal relations, using a rich feature set. This research achieves significant feature reduction, with reduced time and improved accuracy, by retaining only the significant features using recursive feature elimination (RFE). The features retained by the RFE algorithm are also compared against a standard statistical measure, Pearson’s correlation, to ensure that no information is lost while reducing features. The suggested methodology also defines appropriate class output assurance levels and accurate prediction ambiguity for the fake news identification task. A comparative analysis of existing feature selection methods is performed. The experimental results show a 6% increase in precision and a 97% reduction in execution time.
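The correlation check described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only computes Pearson's correlation between each feature and the label, ranks the features, and measures how much of a retained feature set falls in the top of that ranking. The feature names, the toy data, and the helper names (`pearson`, `rank_by_correlation`, `overlap`) are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(features, labels):
    """Feature names sorted by absolute correlation with the label, strongest first."""
    scores = {name: abs(pearson(column, labels)) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

def overlap(retained, features, labels):
    """Fraction of the retained features that are also in the top-k by correlation."""
    k = len(retained)
    top_k = set(rank_by_correlation(features, labels)[:k])
    return len(top_k & set(retained)) / k

# Hypothetical toy data: 1 = fake, 0 = genuine.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "exclamation_count": [5, 4, 1, 0, 6, 1],
    "word_count": [10, 12, 11, 10, 12, 11],
    "has_source": [0, 0, 1, 1, 0, 1],
}

ranking = rank_by_correlation(features, labels)
agreement = overlap(["has_source", "exclamation_count"], features, labels)
```

A high overlap suggests that the retained subset agrees with the correlation ranking; a low one would be a prompt to inspect the dropped features more closely.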
Isha Agarwal, Dipti Rana, Kalp Panwala, Raj Shah, and Viren Kathiriya
Springer Science and Business Media LLC
Nikita, Dipti P. Rana, and Rupa G. Mehta
IEEE
Legal judgment documents are detailed and contain legal terms and codes. These characteristics make legal documents complex to read and analyze, which makes processing them a challenging task and raises the need for automatic summary generation. Researchers have used several techniques to summarize legal documents, such as traditional methods, legal-specific approaches, and transformer-based approaches. This research focuses on the role of summarization in the legal domain and on various methods for summary generation. We summarize Indian judgment documents with various state-of-the-art methods for a comparative study. The analysis opens up various research challenges.
Mitali Desai, Rupa Mehta, and Dipti Rana
EManuscript Technologies
Among the scholarly features available on digitized scholarly platforms, certain features have high significance in assessing a scholar's influence. If these features are identified, emerging scholars can use them legitimately to increase their influence and gain visibility in the scholars’ community. The purpose of this research is to identify and rank significant features on scholarly platforms. To select a data source, a comparative analysis of well-known scholarly platforms is performed. Based on the analysis, ResearchGate (RG) is selected. For RG, this research proposes a methodology to identify and rank significant scholarly features. The results demonstrate that, for the rendered RG data, the identified significant features in order of significance are number of citations, research items, followers, reads, recommendations, followings and projects. The significant features discovered in this research can be employed by various scholarly platforms to identify influential scholars. These scholars can be utilized in applications such as expert finding, influence ranking, recommendation systems and interdisciplinary collaborations. Moreover, the identified significant features will help scholars focus on certain aspects (features) to increase their influence legitimately.
Pranita Mahajan and Dipti Rana
IOS Press
Electronic Medical Records (EMR) carry important information about a patient’s journey. The past decade shows substantial use of Natural Language Processing (NLP)-based Information Retrieval (IR) techniques to extract insights such as symptoms, diseases, and tests from these unstructured records. The state of the art shows that convolutional neural networks (CNN) make a significant contribution to the disease classification task. A significant improvement in precise knowledge mining is possible with precise feature extraction. Feature selection addresses undesirable, unneeded, or irrelevant features. This article proposes a Modified Rider Optimization Algorithm (MROA) to choose important features by selecting optimal weights from a pool of randomly generated weights, based on high accuracy and less training time in the CNN algorithm. The modified approach is trained on 114 N2C2 patients’ records to extract symptoms, diseases, and tests, which are then used to perform the disease classification task. The proposed approach is found to be accurate, with 97.77% accuracy in the disease classification and treatment prediction task from EMR.
Jyoti Kumari and Dipti P. Rana
Springer Nature Singapore
Mathe John Kenny Kumar and Dipti Rana
Springer Science and Business Media LLC
Isha Agarwal, Dipti Rana, Ch Surya Teja, and Nunna Naga Surya Sai Daivik
Springer Nature Singapore
Shweta A. Tikhe and Dipti P. Rana
Springer Nature Singapore
Mitali Desai, Rupa G. Mehta, and Dipti P. Rana
Emerald
Purpose: Scholarly communications, particularly questions and answers (Q&A) present on digital scholarly platforms, provide a new avenue to gain knowledge. However, several studies have raised a concern about the content anomalies in these Q&A and suggested proper validation before utilizing them in scholarly applications such as influence analysis and content-based recommendation systems. The content anomalies are referred to as disinformation in this research. The purpose of this research is, firstly, to assess scholarly communications in order to identify disinformation and, secondly, to help scholarly platforms determine the scholars who probably disseminate such disinformation. These scholars are referred to as the probable sources of disinformation.
Design/methodology/approach: To identify disinformation, the proposed model deduces (1) content redundancy and contextual redundancy in questions, (2) contextual nonrelevance in answers with respect to the questions and (3) quality of answers with respect to the expertise of the answering scholars. Then, the model determines the probable sources of disinformation using statistical analysis.
Findings: The model is evaluated on ResearchGate (RG) data. Results suggest that the model efficiently identifies disinformation from scholarly communications and accurately detects the probable sources of disinformation.
Practical implications: Different platforms with communication portals can use this model as a regulatory mechanism to restrict the propagation of disinformation. Scholarly platforms can use this model to generate an accurate influence assessment mechanism and also relevant recommendations for their scholars.
Originality/value: The existing studies mainly deal with validating the answers using statistical measures. The proposed model focuses on questions as well as answers and performs a contextual analysis using an advanced word embedding technique.
I. Y. Agarwal, D. P. Rana, M. Shaikh, and S. Poudel
Springer Science and Business Media LLC
Jenish Dhanani, Rupa Mehta, and Dipti Rana
Springer Science and Business Media LLC
Legal professionals strongly demand an automatic and convenient legal document recommendation system (LDRS) to identify similar judgments for preparing advantageous and strategic arguments in court. Doc2Vec learns a semantically rich embedding (i.e., vector) space from the textual information of a judgment corpus. During Doc2Vec learning, the use of prior domain-specific knowledge can potentially enhance the embedding representation. This research thus proposes a pre-learned word embedding based LDRS (P-LDRS) that learns the Doc2Vec embedding using legal domain-specific pre-learned word embeddings that carry legal semantic knowledge. However, learning the judgment embedding from the substantial body of existing legal documents turns out to be a scalability issue for Doc2Vec. To address it, the proposed P-LDRS also provides additional functionality to learn the judgment embedding in a distributed manner over a cluster of computing nodes using frameworks like MapReduce and Spark. The empirical analysis is performed with a non-distributed and a distributed variant of the proposed P-LDRS to validate effectiveness and scalability. Experimental results show that the proposed non-distributed P-LDRS performs significantly better than the traditional Doc2Vec-based LDRS, with an Accuracy of 0.88, F1-Score of 0.82 and MCC Score of 0.73. They also demonstrate that the proposed distributed P-LDRS improves time efficiency and achieves a stable Accuracy of ≈ 0.88, F1-Score of ≈ 0.83 and MCC Score of ≈ 0.72 with an increasing number of nodes.
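Once judgment embeddings exist, recommending similar judgments reduces to a nearest-neighbour lookup in the embedding space. The sketch below is only illustrative and makes two assumptions that the abstract does not state: that similarity is measured with cosine similarity, and that embeddings are available as plain lists of floats. The three-dimensional vectors, the document identifiers, and the function names are hypothetical stand-ins for learned Doc2Vec vectors.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_u * norm_v)

def recommend(query_id, embeddings, k=2):
    """Return the ids of the k judgments most similar to the query judgment."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), doc_id)
        for doc_id, vector in embeddings.items()
        if doc_id != query_id  # never recommend the query itself
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings standing in for learned judgment vectors.
embeddings = {
    "judgment_a": [1.0, 0.0, 0.0],
    "judgment_b": [0.9, 0.1, 0.0],
    "judgment_c": [0.0, 1.0, 0.0],
    "judgment_d": [0.5, 0.5, 0.0],
}

similar_to_a = recommend("judgment_a", embeddings, k=2)
```

For a large corpus, the exhaustive scan in `recommend` would likely be replaced by an index or a distributed job, which is the scalability concern the abstract raises.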
Mitali Desai, Rupa G. Mehta, and Dipti P. Rana
Elsevier BV
Navodita Saini and Dipti P. Rana
Springer Nature Singapore
Vaishnavee V. Rathod, Dipti P. Rana, and Rupa G. Mehta
IEEE
Remote sensing images (RSI) are significant data for examining and observing the complete structure of the Earth’s surface. RSI classification has gained significant attention in earth observation technologies and is commonly employed in military and civil fields. It is a challenging process because of high-dimensional features and a small amount of labeled data. Advanced machine learning (ML) and deep learning (DL) models are capable of effective RSI classification, and considerable research is under way on RSI detection and classification using ML and DL models. In this view, this article reviews recently developed RSI classification models. A brief introduction to RSI and its types, characteristics and challenging issues is given. Through a meta-analysis, different approaches related to RSI classification models are identified and summarized with key findings. In addition, this survey covers recently developed ML- and DL-based RSI classification models with their major aim, methodology used, merits, and demerits. Finally, concluding remarks on the present state-of-the-art approaches and possible future scope are discussed.
Anjali S. More and Dipti P. Rana
The Science and Information Organization
In the current age, the attention of researchers is drawn to numerous imbalanced data applications. These application areas include intrusion detection in security, fraud recognition in finance, medical applications dealing with disease diagnosis, pilfering in electricity, and many more. Imbalanced data applications are categorized into two types: binary and multiclass data imbalance. Unequal data distribution biases classification performance metrics towards the majority instance class and ignores the minority instance class. Data imbalance leads to an increase in the classification error rate. Random Forest Classification (RFC) is a well-suited technique for dealing with imbalanced datasets. This paper proposes a novel oversampling rate calculation algorithm, the Improvised Dynamic Binary-Multiclass Imbalanced Oversampling Rate (IDBMORate). Experimental analysis of the proposed IDBMORate approach on the Page-block (binary) dataset shows that the number of positive class instances increases from 559 to 1118, whereas the negative instance class remains the same at 4913. For the referred multiclass dataset (Ecoli), IDBMORate produces a consistent result: the minority class (om, omL, imS, imL) instances are oversampled while the majority class instances remain unchanged. The IDBMORate algorithm reduces the neglect of the minority class and oversamples its data without disturbing the size of the majority instance class. Thus, it reduces the overall computation cost and leads to improved classification performance.
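The effect reported for the binary dataset (559 positive instances becoming 1118 while the 4913 negative instances stay fixed) can be reproduced with plain random oversampling at a rate of 2. The sketch below is not IDBMORate: the abstract does not describe how that algorithm computes its rate, so the rate is simply passed in as an assumption, and the function name and integer placeholder instances are hypothetical.

```python
import random
from collections import Counter

def oversample_minority(instances, labels, minority_label, rate, seed=0):
    """Randomly duplicate minority instances until the class is `rate` times larger.

    The majority class is left untouched. `rate` is an input here; how to
    derive it dynamically is exactly what this sketch does not cover.
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    minority = [x for x, y in zip(instances, labels) if y == minority_label]
    target = round(len(minority) * rate)
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    new_instances = list(instances) + extra
    new_labels = list(labels) + [minority_label] * len(extra)
    return new_instances, new_labels

# Class sizes taken from the abstract; the instances themselves are placeholders.
instances = list(range(559 + 4913))
labels = ["positive"] * 559 + ["negative"] * 4913

new_instances, new_labels = oversample_minority(instances, labels, "positive", rate=2.0)
counts = Counter(new_labels)
```

Duplicating existing minority instances is the simplest form of oversampling; synthetic-sample methods are a common alternative when exact duplicates cause overfitting.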
Anjali More and Dipti Rana
Emerald
Purpose: The referred data set provides reliable information about network flows and common attacks that meets real-world criteria. Accordingly, this study aims to focus on the use of the imbalanced intrusion detection benchmark knowledge discovery in database (KDD) data set, which many researchers prefer for experimentation and analysis. The proposed algorithm, improvised random forest classification with error tuning factors (IRFCETF), is applied to the KDD data set and evaluates the performance of a complete set of network traffic features.
Design/methodology/approach: In the current era, the attention of researchers is drawn to a diverse range of applications that deal with imbalanced data classification (ImDC). Real-time application areas such as artificial intelligence (AI) and the Industrial Internet of Things (IIoT) that deal with ImDC suffer from biased classification performance due to skewed data distribution (SkDD), and many other application areas deal with SkDD as well. In recent years, there has been an exponential expansion in the volume of computer network data and related application developments. Intrusion detection is one of the demanding applications of ImDC. The proposed study focusses on the imbalanced intrusion benchmark KDD data set and other benchmark data sets with the proposed IRFCETF approach. IRFCETF demonstrates improved classification performance on imbalanced data sets over the existing approach. The purpose of this work is to review imbalanced data applications in numerous application areas, including AI and IIoT, and to tune performance with respect to principal component analysis. This study also focusses on the out-of-bag error performance-tuning factor.
Findings: Experimental results on the KDD data set show that the proposed algorithm gives improved performance. For the referred intrusion detection data set, IRFCETF classification accuracy is 99.57% and the error rate is 0.43%.
Research limitations/implications: This research work can be extended for further improvements in classification techniques with multiple correspondence analysis (MCA); hierarchical MCA can be a focus, using classification models for a wide range of skewed data sets.
Practical implications: The metrics enhancement is measurable and helpful in dealing with imbalanced applications related to intrusion detection systems in current application domains such as security, AI and IIoT digitization. Analytical results show improved metrics for the proposed approach compared with other traditional machine learning algorithms. Thus, the proposed IRFCETF shows that the error-tuning parameter has a measurable impact on classification accuracy.
Social implications: The proposed algorithm is useful in numerous IIoT applications such as health care and machinery automation.
Originality/value: This research work addresses the classification metric enhancement approach IRFCETF. The proposed method yields a test set categorization for each case with an error reduction mechanism.
Mitali Desai, Rupa G Mehta, and Dipti P Rana
SAGE Publications
Influence analysis, derived from Social Network Analysis (SNA), is extremely useful in academic literature analytics. Different Academic Social Network Sites (ASNS) have been widely examined for influence analysis in terms of co-authorship and co-citation networks. The impact of other network-based features, such as followers and followings, provided by ASNS such as ResearchGate (RG) and Academia is yet to be examined. As established in social theories, followers and followings have a significant impact on influence propagation. This research aims at examining the same in one of the widely adopted ASNS, RG. A rendering process is developed to render real-time RG information, which is modelled as a graph. Standard centrality measures are implemented to identify influential users from the constructed RG graph. Each centrality measure gives a list of top-k influential RG users. The results are compared with RGScore and Total Research Interest (TRI) to discover the most effective centrality measure. Betweenness and closeness centrality measures outperform the others. A procedure is established to discover influential RG users that are commonly present in all top-k centrality results, in order to identify dominant skills, affiliations, departments and locations from the rendered data.
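The idea of ranking users by a centrality measure and keeping the top-k can be sketched with closeness centrality computed by breadth-first search. This is only an illustration: it assumes an unweighted, undirected, connected graph given as an adjacency dictionary, whereas follower and following links are naturally directed. The user ids, the toy graph, and the function names are hypothetical.

```python
from collections import deque

def closeness_centrality(graph, node):
    """(number of reachable nodes) / (sum of shortest-path distances to them)."""
    distances = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search yields shortest paths in unweighted graphs
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    if total == 0:
        return 0.0  # isolated node: nothing is reachable
    return (len(distances) - 1) / total

def top_k(graph, k):
    """The k nodes with the highest closeness, ties broken by node id."""
    scores = {node: closeness_centrality(graph, node) for node in graph}
    return sorted(graph, key=lambda node: (-scores[node], node))[:k]

# Hypothetical toy network: u1 is a hub, u5 hangs off u4.
graph = {
    "u1": ["u2", "u3", "u4"],
    "u2": ["u1"],
    "u3": ["u1"],
    "u4": ["u1", "u5"],
    "u5": ["u4"],
}

most_central = top_k(graph, k=2)
```

Intersecting the top-k lists produced by several centrality measures, as the abstract describes, would then be a set intersection over lists like `most_central`.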
Anjali S. More and Dipti P. Rana
Elsevier BV