@klnce.edu
Associate Professor / Department of Computer Science and Engineering
K.L.N. College of Engineering
Ph.D.,
Information Retrieval, Algorithms, Data Mining and Agriculture
Scopus Publications
R. Lakshmi and S. Baskar
Inderscience Publishers
R. Lakshmi and S. Baskar
Elsevier BV
Weighting and normalization are among the most important factors that can significantly affect text representation. This paper presents two novel term-weighting schemes for representing text documents: (i) a term-weighting scheme for document representation based on Term Frequency - Ranking of Term Frequency (TF-RTF), and (ii) a term-weighting scheme for document representation based on Term Frequency - Ranking of fuzzy logic with semantic relationship of terms (TF-RFST). In TF-RTF, the rank of each term within a document indicates its priority in that document, and these priorities are used for document representation. In TF-RFST, each term is represented by its own frequency together with the frequencies of its semantically related terms. The rank of each term is therefore based on the combined frequencies of the term and its semantically related terms under a specific weighting scheme. With these weighting schemes (TF-RTF and TF-RFST), the proposed methods achieve better clustering performance in terms of accuracy, entropy, recall and F-Measure than previously suggested methods such as word count, Term Frequency-Inverse Document Frequency (TF-IDF), Term Frequency-Inverse Corpus Frequency (TF-ICF), Multi Aspect TF (MATF), BM25 and BM25F. Experiments carried out on the Reuters-8, Reuters-52 and WebKB data sets with the K-means and K-means++ clustering algorithms demonstrate the effectiveness of the proposed term-weighting schemes.
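The abstract does not give the exact TF-RTF or TF-RFST formulas, so the following Python sketch only illustrates the general idea under stated assumptions: terms are dense-ranked by frequency within a document (rank 1 = most frequent), and a term's weight is taken to be its frequency divided by its rank. The `related` mapping is a hypothetical stand-in for whatever semantic resource supplies related terms, and the fuzzy-logic component of TF-RFST is not modeled. All function names are hypothetical.

```python
from collections import Counter


def rank_terms(freqs):
    """Dense-rank terms by descending frequency: the most frequent term gets rank 1."""
    distinct = sorted(set(freqs.values()), reverse=True)
    rank_of_freq = {f: r for r, f in enumerate(distinct, start=1)}
    return {term: rank_of_freq[f] for term, f in freqs.items()}


def tf_rtf(tokens):
    """Hypothetical TF-RTF-style weight (assumption): term frequency divided by its rank."""
    tf = Counter(tokens)
    ranks = rank_terms(tf)
    return {t: tf[t] / ranks[t] for t in tf}


def tf_rfst(tokens, related):
    """Hypothetical TF-RFST-style weight (assumption, no fuzzy logic): rank each term on
    the combined frequency of the term and its semantically related terms."""
    tf = Counter(tokens)
    # Counter returns 0 for related terms that do not occur in the document.
    combined = {t: tf[t] + sum(tf[r] for r in related.get(t, ()) if r != t) for t in tf}
    ranks = rank_terms(combined)
    return {t: combined[t] / ranks[t] for t in tf}


if __name__ == "__main__":
    doc = ["grain", "wheat", "grain", "corn", "price"]
    print(tf_rtf(doc))
    print(tf_rfst(doc, {"wheat": ["grain", "corn"]}))
```

Dividing by rank is only one simple way to let a term's priority within the document scale its weight; the published schemes may combine frequency and rank differently.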
R Lakshmi and S Baskar
SAGE Publications
In this article, a new initial centroid selection method for the K-means document clustering algorithm, namely Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means (DIC-DOC-K-means), is proposed to improve the performance of text document clustering. The first centroid is the document with the minimum standard deviation of its term frequencies. Each subsequent centroid is selected based on its dissimilarity to the previously selected centroids. To evaluate the proposed DIC-DOC-K-means algorithm, it is compared with the K-means, K-means++ and weighted-average-of-terms-based initial centroid selection + K-means (Weight_Avg_Initials + K-means) clustering algorithms. The results show that DIC-DOC-K-means performs significantly better than K-means, K-means++ and Weight_Avg_Initials + K-means on Reuters-21578 and WebKB with respect to purity, entropy and F-measure for most cluster sizes. The cluster sizes used are 8, 16, 24 and 32 for Reuters-8 and 4, 8, 12 and 16 for WebKB. DIC-DOC-K-means gives better performance when the number of clusters equals the number of classes in the data set.
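The seeding procedure described above can be sketched in Python as follows. The first-centroid rule (minimum standard deviation of term frequencies) follows the abstract; the rule for subsequent centroids (choose the unselected document with the largest summed cosine dissimilarity to the centroids already chosen) and the name `select_initial_centroids` are assumptions for illustration, since the abstract does not state the exact dissimilarity measure or how dissimilarities are aggregated.

```python
import math
from statistics import pstdev


def cosine_dissimilarity(a, b):
    """1 - cosine similarity between two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat an empty document as maximally dissimilar
    return 1.0 - dot / (na * nb)


def select_initial_centroids(docs, k):
    """Return indices of k documents to seed K-means (hypothetical helper).

    docs: list of term-frequency vectors (rows of a document-term matrix).
    """
    if not 0 < k <= len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    # First centroid: the document whose term frequencies have the smallest standard deviation.
    first = min(range(len(docs)), key=lambda i: pstdev(docs[i]))
    chosen = [first]
    while len(chosen) < k:
        # Assumed rule: pick the unchosen document with the largest total
        # dissimilarity to all previously selected centroids.
        candidates = [i for i in range(len(docs)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: sum(cosine_dissimilarity(docs[i], docs[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    matrix = [[1, 1, 1, 1], [5, 0, 0, 0], [0, 0, 5, 1], [4, 1, 0, 0]]
    print(select_initial_centroids(matrix, 3))
```

The returned row indices could be used to build an explicit initial-centroid array, for example for scikit-learn's `KMeans(n_clusters=k, init=seeds, n_init=1)`, so that the usual K-means iterations start from these documents instead of random or K-means++ seeds.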
Rayasam Lakshmi Satya, R. Kaviya, and R. Valarmathi
IEEE
This paper presents an effective vehicular on-board accident detection system aimed at helping riders injured in two-wheeler accidents, especially near-fatal ones. When an accident is detected and judged to be fatal, its location is triangulated and emergency services are alerted either by a normal message or by an app notification. The use of cloud technology enables further applications, such as identifying accident hot spots using machine learning.
K. Prabhavathi and R. Lakshmi
Indian Society for Education and Environment
Background/Objectives: Lung cancer is one of the deadliest cancers in the world, and various approaches have been used to diagnose it. This paper surveys the approaches used for lung cancer diagnosis. Methods/Statistical Analysis: The paper classifies the techniques into three groups: 1) data mining approaches, 2) medical approaches and 3) biophotonic imaging approaches, and discusses the pros and cons of each. Findings: This paper surveys the different approaches used for lung cancer diagnosis. Improvements/Applications: The survey supports efficient early detection of lung cancer, which can reduce the death rate and increase the survival rate.
T. Preethi and R. Lakshmi
IEEE
The National Natural Science Foundation of China (NSFC) is the largest government funding agency in China, with the primary aim of funding and managing basic research. The agency is made up of seven scientific departments, four bureaus, one general office and three associated units. The scientific departments are the decision-making units responsible for funding recommendations and the management of funded projects. Selection of research projects is an important and recurring activity in many organizations, such as government research funding agencies. Current methods of grouping proposals are based on manual matching of similar research discipline areas, but this matching is not accurate. Text clustering methods that lack a semantic approach also provide lower accuracy. A novel ontology-based text mining approach to clustering proposals is therefore proposed. Research project selection is an important task for government and private research funding agencies. When a large number of research proposals are received, it is common to group them according to their similarities in research disciplines. The grouped proposals are then assigned to the appropriate experts for peer review. The review results are collected, and the proposals are then ranked based on the aggregation of the experts' review results.
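As a rough illustration of ontology-based grouping, the sketch below assumes a tiny hypothetical ontology that maps research disciplines to concept keywords and assigns each proposal to the discipline whose keywords it mentions most often. The ontology, its contents and the function names are invented for illustration; the paper's actual ontology construction and text mining steps are not described here.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology (assumption): research discipline -> concept keywords.
ONTOLOGY = {
    "information_science": {"retrieval", "clustering", "ontology", "text"},
    "life_science": {"gene", "protein", "cell"},
    "earth_science": {"climate", "soil", "ocean"},
}


def assign_discipline(text, ontology=ONTOLOGY):
    """Return the discipline whose concept keywords occur most often in text, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {d: sum(w in kws for w in words) for d, kws in ontology.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def group_proposals(proposals, ontology=ONTOLOGY):
    """Group proposal ids by their best-matching discipline (None = no match)."""
    groups = defaultdict(list)
    for pid, text in proposals.items():
        groups[assign_discipline(text, ontology)].append(pid)
    return dict(groups)


if __name__ == "__main__":
    print(group_proposals({
        "P1": "Ontology based text clustering for retrieval",
        "P2": "Protein folding in the cell",
    }))
```

A keyword-count score is the simplest possible stand-in; a fuller approach would also exploit relations between ontology concepts before the grouped proposals are sent to reviewers.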