Subaji M

@vit.ac.in

Professor - Institute for International and Industry Programmes
Vellore Institute of Technology

21

Scopus Publications

  • Combining Wi-Fi Fingerprinting and Pedestrian Dead Reckoning to Mitigate External Factors for a Sustainable Indoor Positioning System
    Bhulakshmi Bonthu and Subaji Mohan

    MDPI AG
    Wi-Fi-based indoor positioning systems are becoming increasingly prevalent in digital transitions; therefore, ensuring accurate and robust positioning is essential to supporting the growing demand for smartphones’ location-based services. The indoor positioning system on a smartphone, which is generally based on Wi-Fi received signal strength (RSS) measurements or the fingerprinting comparison technique, uses the K-NN algorithm to estimate the position due to its high accuracy. The fingerprinting algorithm is popular due to its ease of implementation and its ability to produce the desired accuracy. However, in a practical environment, a Wi-Fi signal strength-based positioning system is highly influenced by external factors such as changes in the environment, human interventions, obstacles in the signal path, signal inconsistency, signal loss due to barriers, non-line-of-sight (NLOS) conditions during signal propagation, and high levels of fluctuation in the RSS, all of which affect location accuracy. In this paper, we propose a method that combines pedestrian dead reckoning (PDR) and Wi-Fi fingerprinting to select the k nodes participating in the K-NN algorithm for fingerprinting-based IPSs. The selected k nodes are used in the K-NN algorithm to improve robustness and overall accuracy. The proposed hybrid method can overcome practical environmental issues and reduces the K-NN algorithm’s complexity by narrowing the nearest-neighbour search space for comparison, using the PDR position estimate as the reference position. Our approach provides a sustainable solution for indoor positioning systems, reducing energy consumption and improving the overall environmental impact. The proposed method has potential applications in various domains, such as smart buildings, healthcare, and retail. Under our experimental conditions, the proposed method outperforms the traditional K-NN algorithm, achieving an average position error of less than 1.2 m.
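
    The search-space restriction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy fingerprint database, the radius value, and the helper names are assumptions, and the PDR position estimate is taken as given.

```python
import math

def hybrid_knn(fingerprints, rss_query, pdr_estimate, radius=3.0, k=3):
    """Estimate a position: restrict candidate fingerprints to those near
    the PDR estimate, then run K-NN on RSS-space distance."""
    # Keep only reference points within `radius` metres of the PDR estimate.
    candidates = [(pos, rss) for pos, rss in fingerprints
                  if math.dist(pos, pdr_estimate) <= radius]
    if not candidates:          # fall back to the full database if none match
        candidates = fingerprints
    # Rank the candidates by Euclidean distance between RSS vectors.
    nearest = sorted(candidates, key=lambda c: math.dist(c[1], rss_query))[:k]
    # Average the k nearest reference positions.
    return (sum(p[0] for p, _ in nearest) / len(nearest),
            sum(p[1] for p, _ in nearest) / len(nearest))

# Tiny illustrative database: position (x, y) -> RSS readings from two APs.
db = [((0, 0), (-40, -70)), ((1, 0), (-45, -65)),
      ((5, 5), (-70, -40)), ((6, 5), (-68, -42))]
pos = hybrid_knn(db, rss_query=(-69, -41), pdr_estimate=(5.5, 5.0), k=2)
```

    Only the two fingerprints near the PDR estimate are compared, so the K-NN search touches a fraction of the database.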

  • A Novel Feature Fusion for the Classification of Histopathological Carcinoma Images
    Salini S Nair and M. Subaji

    The Science and Information Organization
    Breast cancer is a significant global health concern, demanding advanced diagnostic approaches. Although traditional imaging and manual examinations are common, the potential of artificial intelligence (AI) and machine learning (ML) in breast cancer detection remains underexplored. This study proposes a hybrid approach combining image processing and ML methods to address breast cancer diagnosis challenges. The method utilizes feature fusion with gray-level co-occurrence matrix (GLCM), local binary patterns (LBP), and histogram features, alongside an ensemble learning technique for improved classification. Results demonstrate the approach's effectiveness in accurately classifying three carcinoma classes (ductal, lobular, and papillary). The Voting Classifier, an ensemble learning model, achieves the highest accuracy, precision, recall, and F1-scores across carcinoma classes. By harnessing feature extraction and ensemble learning, the proposed approach offers advantages such as early detection, improved accuracy, personalized medicine recommendations, and efficient analysis. Integration of AI and ML in breast cancer diagnosis shows promise for enhancing accuracy, effectiveness, and personalized patient care, supporting informed decision-making by healthcare professionals. Future research and technological advancements can refine AI-ML algorithms, contributing to earlier detection, better treatment outcomes, and higher survival rates for breast cancer patients. Validation and scalability studies are needed to confirm the effectiveness of the proposed hybrid approach. In conclusion, leveraging AI and ML techniques has the potential to revolutionize breast cancer diagnosis, leading to more accurate and personalized detection and treatment. Technology-driven advances can significantly impact breast cancer care and management.
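
    The fusion-and-voting idea can be sketched with the standard library only. The real GLCM, LBP, and histogram descriptors would come from an image-processing library, so the feature vectors and classifier outputs below are hypothetical placeholders, not the paper's pipeline.

```python
from collections import Counter

def fuse_features(glcm_feats, lbp_hist, intensity_hist):
    """Feature fusion: concatenate the per-descriptor vectors into one."""
    return list(glcm_feats) + list(lbp_hist) + list(intensity_hist)

def majority_vote(predictions):
    """Hard-voting ensemble: return the class most base classifiers chose."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical descriptor vectors for one image patch.
fused = fuse_features([0.8, 0.1], [3, 5, 2], [10, 4])
# Hypothetical predictions from three base classifiers for that patch.
label = majority_vote(["ductal", "lobular", "ductal"])
```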

  • An improved hidden behavioral pattern mining approach to enhance the performance of recommendation system in a big data environment
    P. Shanmuga Sundari and M. Subaji

    Elsevier BV
    The proposed work aims to solve the data sparsity problem in recommendation systems. It applies a two-level pre-processing technique to reduce the data size at the item level. Additional resources such as item genre, tags, and time are added to learn and analyse the behaviour of user preferences in depth. The advantage of the proposed method is that it recommends items based on the user's interest pattern and avoids recommending outdated items. User information is grouped based on similar item genre and tag features. This effectively handles the overlapping conditions that exist in item genres, as an item may have more than one genre at the initial level. Further, based on time, it analyses the user's non-static interests. Overall, it reduces the dimensionality, which is an initial step in preparing the data for hidden-pattern analysis. To enhance performance, the proposed method utilizes Apache Spark MLlib's FP-Growth and association rule mining in a distributed environment. To reduce the computational cost of constructing the tree in FP-Growth, the candidate data set is stored in matrix form. The experiments were conducted using the MovieLens data set, and the observed results show that the proposed method achieves a 4% increase in accuracy compared to earlier methods.
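
    The frequent-pattern step can be illustrated with a simple standard-library stand-in. This is an Apriori-style pair counter, not the Spark MLlib FP-Growth used in the paper; the transactions and support values are hypothetical.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count co-occurring item pairs; keep those meeting min_support."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

def confidence(pairs, singles, antecedent, consequent):
    """Association-rule confidence: support(A and B) / support(A)."""
    pair = tuple(sorted((antecedent, consequent)))
    return pairs.get(pair, 0) / singles[antecedent]

# Hypothetical watch histories grouped by genre, as in the pre-processing step.
tx = [["action", "sci-fi"], ["action", "sci-fi", "drama"], ["drama"]]
pairs = frequent_pairs(tx, min_support=2)
singles = Counter(g for t in tx for g in set(t))
conf = confidence(pairs, singles, "action", "sci-fi")
```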

  • NROI based feature learning for automated tumor stage classification of pulmonary lung nodules using deep convolutional neural networks
    Supriya Suresh and Subaji Mohan

    Elsevier BV
    Identifying exact pulmonary nodule boundaries in computed tomography (CT) images is a crucial task for computer-aided detection systems (CADx). Segregation of CT images into benign, malignant, and non-cancerous classes is essential for early detection of lung cancers to improve survival rates. In this paper, a methodology for automated tumor stage classification of pulmonary lung nodules is proposed using an end-to-end learning Deep Convolutional Neural Network (DCNN). The images used in the study were acquired from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) public repository, comprising 1018 cases. Lung CT images with candidate nodules are segmented into a 52 × 52 pixel nodule region of interest (NROI) rectangle based on four radiologists’ annotations and markings with ground truth (GT) values. The approach aims at analyzing and extracting self-learned salient features from NROIs consisting of differently structured nodules. The DCNN is trained with NROI samples, which are further classified according to tumor pattern as non-cancerous, benign, or malignant. Data augmentation and dropout are used to avoid overfitting. The algorithm was compared with state-of-the-art methods and traditional hand-crafted features such as the statistical, texture, and morphological behavior of lung CT images. A consistent improvement in the performance of the DCNN was observed using the nodule-grouped dataset, achieving a classification accuracy of 97.8%, specificity of 97.2%, sensitivity of 97.1%, and an area under the receiver operating characteristic curve (AUC) score of 0.9956 with reduced false positives.
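
    The basic DCNN building blocks (convolution, ReLU, max-pooling) can be sketched in pure Python on a toy patch. This is a didactic illustration only, not the paper's network; the 4 × 4 patch and the edge-detecting kernel are made up.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL frameworks)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2(fmap):
    """2x2 max-pooling with stride 2."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A toy 4x4 "NROI" patch with a vertical dark-to-bright edge,
# and a 2x2 kernel that responds to exactly that edge.
patch = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
k = [[-1, 1], [-1, 1]]
feat = maxpool2(relu(conv2d(patch, k)))
```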

  • A comparative study to recognize fake ratings in recommendation system using classification techniques
    P. Shanmuga Sundari and M. Subaji

    IOS Press
    Recommendation systems are affected by attacks when users are given the liberty to rate items based on their impression of a product or service. Malicious users or competitors may try to inject fake ratings to degrade items that are favoured by many users. Attacks on the rating matrix are not executed by just a single profile; a group of user profiles is injected into the rating matrix to degrade performance. It is highly complex to extract fake ratings from a mixture of genuine profiles, as they exhibit the same patterns. Identifying the attacked profiles and the target item of the fake ratings is a challenging task in a big data environment. This paper proposes a unique method to identify attacks on the collaborative filtering method. The process of extracting fake ratings is carried out in two phases. In the initial phase, doubtful user profiles are identified from the rating matrix. In the following phase, the target item is analysed using the push attack count to reduce the false positive rate among the doubtful user profiles. The proposed model is evaluated on detection rate and false positive rate with respect to filler size and attack size. The experiment was conducted with 6%, 8%, and 10% filler sizes and with attack sizes ranging from 0% to 100%. Various classification techniques, such as decision tree, logistic regression, SVM, and random forest, are used to classify the fake ratings. From the results, it is observed that the SVM model works better with random and bandwagon attack models, with on average 4% higher accuracy. Similarly, the decision tree method performs better, by an average of 3%, on the average attack model.
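
    The two-phase idea can be sketched as follows. The deviation statistic, the threshold, the toy rating matrix, and the function names are illustrative assumptions, not the paper's exact detection procedure.

```python
def flag_doubtful_profiles(ratings, threshold):
    """Phase 1: flag profiles whose mean absolute deviation from the
    per-item average rating exceeds `threshold`."""
    per_item = {}
    for profile in ratings.values():
        for item, r in profile.items():
            per_item.setdefault(item, []).append(r)
    mean = {item: sum(v) / len(v) for item, v in per_item.items()}
    return [user for user, profile in ratings.items()
            if sum(abs(r - mean[i]) for i, r in profile.items())
               / len(profile) > threshold]

def push_attack_count(ratings, doubtful, item, push_value=5):
    """Phase 2: count how many doubtful profiles gave `item` the maximum
    (push) rating, to confirm the target item."""
    return sum(1 for u in doubtful if ratings[u].get(item) == push_value)

# Toy rating matrix: genuine users rate target item "t" low; attackers push it.
R = {"u1": {"t": 1, "a": 4}, "u2": {"t": 2, "a": 5}, "u3": {"t": 1, "a": 4},
     "atk1": {"t": 5, "a": 4}, "atk2": {"t": 5, "a": 5}}
doubtful = flag_doubtful_profiles(R, threshold=1.2)
count = push_attack_count(R, doubtful, "t")
```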

  • ROI-based feature learning for efficient true positive prediction using convolutional neural network for lung cancer diagnosis
    Supriya Suresh and Subaji Mohan

    Springer Science and Business Media LLC
    Convolutional neural network (CNN) is one of the deep structured algorithms widely applied to visualize and extract the hidden texture features of image datasets. The study aims to automatically extract self-learned features using an end-to-end learning CNN and compares the results with the performance of conventional state-of-the-art and traditional computer-aided diagnosis systems. The architecture consists of eight layers: one input layer; three convolutional layers and three sub-sampling layers intercepted with batch normalization, ReLU, and max-pooling for salient feature extraction; and one fully connected layer using a softmax function connected to 3 output neurons, classifying an input image into one of three classes: nodules ≥ 3 mm as benign (low malignancy nodules) or malignant (high malignancy nodules), and nodules < 3 mm together with non-nodules ≥ 3 mm combined as non-cancerous. For the input layer, lung nodule CT images are acquired from the Lung Image Database Consortium public repository, which has 1018 cases. Images are pre-processed to uniquely segment the nodule region of interest (NROI) in correspondence with four radiologists’ annotations and markings describing the coordinates and ground-truth values. A two-dimensional set of re-sampled images of size 52 × 52 pixels with random translation, rotation, and scaling corresponding to the NROI is generated as input samples. In addition, generative adversarial networks (GANs) are employed to generate additional images with characteristics similar to pulmonary nodules. The CNN is trained using images generated by the GAN and is fine-tuned with actual input samples to differentiate and classify the lung nodules according to the classification strategy. The pre-training and fine-tuning process on the trained network’s architecture yields aggregate probability scores for nodule detection, reducing false positives.
    A total of 5188 images with an augmented image datastore are used to enhance the performance of the network, generating high sensitivity scores with good true positives. Our proposed CNN achieved a classification accuracy of 93.9%, an average specificity of 93%, and an average sensitivity of 93.4% with reduced false positives, and the area under the receiver operating characteristic curve reached a highest observed value of 0.934 using the GAN-generated images.

  • Integrating Sentiment Analysis on Hybrid Collaborative Filtering Method in a Big Data Environment
    P. Shanmuga Sundari and M. Subaji

    World Scientific Pub Co Pte Lt
    Most traditional recommendation systems are based on user ratings, where users rate a product after using or experiencing it. Accordingly, a user-item transactional database is constructed for recommendation. The rating-based collaborative filtering method is a well-known approach to recommendation, but it suffers from the data sparsity problem, as users are unaware of other similar items. Web cataloguing services such as tags play a significant role in analysing a user's perception of a particular product. Some systems use tags as an additional resource to reduce the data sparsity issue, but they require many specific details about the tags. Existing systems focus on either rating-based or tag-based recommendation to enhance accuracy, so they suffer from data sparsity and efficiency problems that lead to ineffective recommendation accuracy. To address these issues, this paper proposes a hybrid recommendation system, Iter_ALS (Iterative Alternating Least Squares), to enhance recommendation accuracy by integrating ratings and emotion tags. The rating score reveals the overall perception of the item, while emotion tags reflect the user's feelings. In the absence of emotion tags, the score found in the rating is assumed to be a positive or negative emotion tag score. Lexicon-based semantic analysis of emotion tag values is adopted to represent the exclusive value of a tag. The unified value is fed into the Iter_ALS model to reduce the sparsity problem. In addition, this method handles opinion bias between ratings and tags. Experiments were conducted and verified using the benchmark MovieLens dataset. The model was first tested with sparsity levels varied between 0% and 100%, and the results show that the proposed method outperforms baseline methods. Further tests were conducted to verify how it handles user opinion bias before recommending an item. The proposed method is well suited for adoption in many real-world applications.
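
    The rating/emotion-tag fusion rule can be sketched as below. The lexicon, the mapping of ratings to [-1, 1], and the averaging step are illustrative assumptions rather than the paper's exact Iter_ALS formulation.

```python
# Hypothetical lexicon assigning polarity scores to emotion tags.
LEXICON = {"love": 1.0, "great": 0.9, "boring": -0.8}

def unified_score(rating, tags=None, neutral=3.0, scale=2.0):
    """Fuse a 1-5 star rating with lexicon scores of emotion tags.
    Without tags, the rating alone is mapped to [-1, 1] and treated as a
    pseudo emotion score, mirroring the fallback rule described above."""
    rating_score = (rating - neutral) / scale          # 1..5 -> [-1, 1]
    if not tags:
        return rating_score
    tag_score = sum(LEXICON.get(t, 0.0) for t in tags) / len(tags)
    return (rating_score + tag_score) / 2              # simple average fusion
```

    A 5-star rating tagged "love" fuses to a strongly positive score, while an untagged 1-star rating falls back to a purely rating-derived negative score.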

  • An effective algorithm to overcome the practical hindrance for Wi-Fi based indoor positioning system
    Bhulakshmi Bonthu and M Subaji

    Walter de Gruyter GmbH
    Indoor tracking has evolved through various methods, the most popular being signal strength measuring techniques such as triangulation, trilateration, and fingerprinting. Generally, these methods use the internal sensors of the smartphone. All these techniques require an adequate number of access point signals. The estimated positioning accuracy depends on the number of signals received at any point and the precision of their signal (Wi-Fi radio wave) strength. In a practical environment, the received signal strength indicator (RSSI) of an access point is hindered by obstacles or blocks in the direct path or line of sight. Such access points become anomalies in the position calculation. Detecting these anomalous access points and neglecting them during the computation of an indoor position improves the accuracy of the positioning system. The proposed method, Practical Hindrance Avoidance in an Indoor Positioning System (PHA-IPS), eliminates the anomalous nodes while estimating the position and thereby enhances the accuracy.
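
    The anomaly-elimination idea can be sketched with a log-distance path-loss consistency check. The model parameters, the 10 dB threshold, and the AP readings below are illustrative assumptions, not the paper's calibration.

```python
import math

def predicted_rssi(d, rssi_at_1m=-40.0, n=2.0):
    """Log-distance path-loss model: RSSI expected at distance d metres."""
    return rssi_at_1m - 10 * n * math.log10(max(d, 0.1))

def filter_anomalous_aps(measurements, rough_position, threshold=10.0):
    """Drop access points whose measured RSSI differs from the model
    prediction (at a rough position estimate) by more than `threshold` dB;
    the survivors are then used for the actual position computation."""
    kept = []
    for ap_pos, rssi in measurements:
        expected = predicted_rssi(math.dist(ap_pos, rough_position))
        if abs(rssi - expected) <= threshold:
            kept.append((ap_pos, rssi))
    return kept

# Hypothetical readings: the third AP is blocked by an obstacle (NLOS),
# so its RSSI is far weaker than the line-of-sight model predicts.
m = [((0, 0), -60), ((10, 0), -60), ((0, 10), -90)]
kept = filter_anomalous_aps(m, rough_position=(5, 5), threshold=10.0)
```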


  • Aspect level sentiment analysis in deep learning technique using CNN


  • A method for empowering native mobile app data discovery
    Bhulakshmi Bonthu and M Subaji

    American Scientific Publishers

  • Web user interface based on OGC standards for sensor cloud using big data
    Vijayasherly Velayutham, Srimathi Chandrasekaran, and Subaji Mohan

    Inderscience Publishers

  • A survey on effective similarity search models and techniques for big data processing in healthcare system
    P. Shanmuga Sundari, M. Subaji, and J. Karthikeyan

    Diva Enterprises Private Limited
    Traditional DBMS systems handle well-structured data in which no element occurs twice, whereas multiple occurrences are quite natural in big data processing. Moreover, over the last decades the many characteristics coupled with the data (such as volume, variety, and value) have made searching complex for traditional database systems. Storing the data effectively makes it easier to process. The main objective of this paper is to find similarity over large data, which requires effective and efficient processing of raw data within a satisfactory response time.

  • Analysis of sensor middleware using big data as a cloud computing utility
    Srimathi Chandrasekaran, Rajesh Natarajan, and Subaji Mohan

    Inderscience Publishers

  • Big data analytics in healthcare system for diverse perspectives


  • Survey on health cloud characteristics
    Srimathi Chandrasekaran, Subaji Mohan, and Rajesh Natarajan

    Springer Science and Business Media LLC
    Migration of healthcare applications to cloud computing requires certain features to be fulfilled for complete user satisfaction. The users' demand for satisfaction can be met by attaining cloud computing characteristics such as data storage, accessibility, information sharing, interoperability, security, service availability, scalability, resiliency, resource management, and monitoring. This work is a survey of these cloud computing characteristics and their implementation in healthcare applications.

  • A study on Distributed Data Mining frameworks


  • Domain Specific Modeling of business processes and entity mapping using generic modeling environment (GME)
    Subaji Mohan, Eunmi Choi, and Dugki Min

    IEEE
    Designing business processes is very complex, as it involves activities, resources, products, and tools. Adopting the concept of domain-specific modeling helps in creating and using business processes, packaged as services, and enables reusability, loose coupling, higher abstraction, agility, and interoperability. Domain-specific modeling can be used to design business processes, as it increases the business-centric value as well as productivity and quality. In this paper, the GME tool is used to create a DSM of a business process, and we discuss how business processes are captured and defined at the domain level. The design of the business process and entity meta-models, and their interpretation to generate the input model, is shown in detail. The GReAT tool is used to apply configuration and transformation rules on the imported models. Finally, full code generation is demonstrated.

  • A taxonomy and survey on distributed file systems
    Tran Doan Thanh, Subaji Mohan, Eunmi Choi, SangBum Kim, and Pilsung Kim

    IEEE
    Applications that process large volumes of data (such as search engines, grid computing applications, and data mining applications) require a backend infrastructure for storing data. The distributed file system is the central component of this storage infrastructure. Many projects focused on network computing have designed and implemented distributed file systems with a variety of architectures and functionalities. In this paper, we develop a comprehensive taxonomy for describing distributed file system architectures and use it to survey existing distributed file system implementations in very large-scale network computing systems such as Grids and search engines. We use the taxonomy and the survey results to identify architectural approaches that have not been fully explored in distributed file system research.

  • Conceptual modeling of enterprise application system using social networking and web 2.0 "social CRM system"
    Subaji Mohan, Eunmi Choi, and Dugki Min

    IEEE
    Information technology is moving towards the next generation and attaining a new paradigm shift with the development of new technologies, and their impact on application development is inevitable. In particular, the focus on Web services (Web 2.0) is increasing, and Web 2.0 is currently considered a platform rather than a service, due to its capability for data integration and for working jointly with other applications. This has given enterprises a new dimension in their application development. Similarly, social networking is playing a crucial role, providing organizations with the critical data needed to build strong relationships with their customers and partners. In this paper, we propose a conceptual model of a social CRM system, combining the features of Web 2.0 and social networking with the current CRM system. This provides a new direction, and organizations have already started moving towards it.