@snuniv.ac.in
Professor, Dept. of Computer Science & Engineering
Sister Nivedita University
Dr. Soumadip Ghosh is currently a Professor in the Department of Computer Science and Engineering at Sister Nivedita University, West Bengal, India. He has more than eighteen years of teaching and research experience. He previously served as a Professor at the Institute of Engineering and Management, Kolkata, as an Associate Professor at the Academy of Technology (Hooghly, WB, India), and as an Assistant Professor at the College of Engineering and Management (Kolaghat, WB, India).
Dr. Ghosh earned his PhD (Engineering) degree from the University of Kalyani, West Bengal, India, in 2017. He received his M. Tech (CSE) and B. Tech (CST) degrees from the University of Calcutta and the University of Kalyani in 2005 and 2002, respectively. He is a noted author in Java technology and has authored two books in that domain.
His research interests include Data Mining, Machine Learning, Deep Learning, Java and Web technologies, Cloud Computing, and the Internet of Things (IoT).
Subhash Mondal, Ranjan Maity, Yachang Omo, Soumadip Ghosh, and Amitava Nag
Institute of Electrical and Electronics Engineers (IEEE)
Cardiovascular diseases (CVDs) continue to be a prominent cause of global mortality, necessitating the development of effective risk prediction models to combat the rise in heart disease (HD) mortality rates. This work presents a novel dual-stage stacked machine learning (ML) based computational risk prediction model for cardiac disorders. Using a dataset that includes eleven significant characteristics of 1190 patients from five distinct sources, five ML classifiers are employed to create the initial prediction model. To ensure robustness and generalizability, the classifiers are cross-validated ten times. Model performance is optimized with two hyperparameter tuning approaches, RandomizedSearchCV and GridSearchCV, which aim to find the optimal estimator values. The highest-performing models, specifically Random Forest, Extreme Gradient Boost, and Decision Tree, undergo additional refinement using a stacking ensemble technique. The stacking model, which leverages the capabilities of the three models, attains an accuracy of 96%, a recall of 0.98, and a ROC-AUC score of 0.96. Notably, the rate of false-negative results is below 1%, demonstrating a high level of accuracy and a non-overfitted model. To evaluate the model’s stability and repeatability, a comparable dataset consisting of 1000 instances is employed. The model consistently achieves an accuracy of 96.88% under identical experimental settings. This highlights the strength and dependability of the proposed computational model for predicting the risk of cardiac illnesses. The outcomes indicate that this two-step stacking ML method shows potential for prompt and precise diagnosis, thereby aiding the worldwide effort to reduce fatalities caused by heart disease.
Chandra Kishore, Vaishali Ji, Saurav Mallik, Ayan Mukherji, Namrata Tomar, Kumar Pati Soumen, Ai Min Li, Sinthia Roy Banerjee, Soumadip Ghosh and Raza Ali Naqvi
Subhash Mondal, Soumadip Ghosh, and Amitava Nag
Springer Science and Business Media LLC
Subhash Mondal, Souptik Dutta, Soumadip Ghosh, Sarbartha Gupta, Dhrubajit Kakati, and Amitava Nag
IEEE
The thyroid gland plays a significant role in the human body's metabolism, growth, and development. Although thyroid disease is not life-threatening, a person suffering from it faces many complications in daily life. Recent trends show that women suffer from thyroid-related diseases more often than men. Many of the factors that contribute to thyroid disease can be controlled if the disease is diagnosed at an early stage. Machine learning (ML) prediction models help healthcare professionals diagnose thyroid diseases at an initial stage and take measures accordingly. This study initially deployed sixteen ML models, including six boosting algorithms, on a dataset of 9172 instances with related features. Model performance was judged using various standard performance metrics. The boosting algorithms showed exceptional results, and the CatBoost (CB) model produced the best accuracy of 95.75%. Hyperparameter tuning of the boosting models with RandomizedSearchCV increased the accuracy of CB to 96.19%. A stacking ensemble was then built on top of the six tuned boosting models, with the CB classifier as the meta-learner and the other boosting algorithms as base learners for the final prediction. The stacked model reached an accuracy of 95.32%, compared with the default models, and a ROC-AUC of 0.95, and the other results were also promising. The model’s standard deviation was low at 0.57, implying stability and robustness, and the False Negative (FN) rate was 1.8%.
Prithwineel Paul, Soumadip Ghosh, and Arpita Mandal
IEEE
DNA molecules are the building blocks of life. In the last few decades, the fields of biotechnology and molecular biology have made significant progress. Computer scientists have used the storage capacity and massive parallelism of DNA molecules to build computing devices, and have investigated the computing power of these models and their efficiency in solving computationally hard problems. In this paper, we investigate the Szilard / control languages of a new type of computing model that is based on the working of DNA molecules. DNA molecules are double-stranded and contain chains of nucleotides. These nucleotides are categorized into four types based on their chemical bases, i.e., A (adenine), T (thymine), G (guanine), and C (cytosine). Furthermore, these nucleotides follow Watson-Crick complementarity, i.e., A-T and G-C pairing. We derive the Szilard / control languages of Watson-Crick (WK) grammars. We also compare the family of Szilard / control languages of these systems with language families such as REG, CF, and RE.
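The A-T and G-C pairing rule mentioned in the abstract above can be illustrated with a minimal Python sketch. The function names `complement_strand` and `is_watson_crick_pair` are hypothetical and chosen only for this illustration; the sketch shows the complementarity relation itself, not the Watson-Crick grammars studied in the paper.

```python
# Watson-Crick complementarity: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the position-wise complementary strand of a DNA string."""
    return "".join(COMPLEMENT[base] for base in strand)


def is_watson_crick_pair(upper: str, lower: str) -> bool:
    """Check that two strands have equal length and pair up at every position."""
    return len(upper) == len(lower) and all(
        COMPLEMENT.get(u) == l for u, l in zip(upper, lower)
    )


if __name__ == "__main__":
    print(complement_strand("ATGC"))             # TACG
    print(is_watson_crick_pair("ATGC", "TACG"))  # True
```

A double-stranded string is well formed in this sense only when the check returns `True` for its upper and lower strands.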
Saurav Mallik, Anasua Sarkar, Sagnik Nath, Ujjwal Maulik, Supantha Das, Soumen Kumar Pati, Soumadip Ghosh, and Zhongming Zhao
Frontiers Media SA
In the current era, handling biomedical big data is a challenging task. In particular, the integration of multi-modal data, followed by significant feature mining (gene signature detection), is daunting. With this in mind, we proposed a novel framework, namely three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile, and the statistically significant features were extracted. The three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by consecutive analysis with average linkage clustering and dynamic tree cut. The module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia cancer dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (viz., 0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to support the acceptability of our method. Finally, it can be noted that our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
Supantha Das, Soumadip Ghosh, Saurav Mallik, and Guimin Qin
CRC Press
Soumadip Ghosh, Suharta Banerjee, Supantha Das, Arnab Hazra, Saurav Mallik, Zhongming Zhao, and Ayan Mukherji
MDPI AG
Accurate detection of an individual’s coronavirus disease 2019 (COVID-19) status has become critical as the COVID-19 pandemic has led to over 615 million cases and over 6.454 million deaths since its outbreak in 2019. Our proposed research work aims to present a deep convolutional neural network-based framework for the detection of COVID-19 status from chest X-ray and CT scan imaging data acquired from three benchmark imagery datasets. VGG-19, ResNet-50 and Inception-V3 models are employed in this research study to perform image classification. A variety of evaluation metrics including kappa statistic, Root-Mean-Square Error (RMSE), accuracy, True Positive Rate (TPR), False Positive Rate (FPR), Recall, precision, and F-measure are used to ensure adequate performance of the proposed framework. Our findings indicate that the Inception-V3 model has the best performance in terms of COVID-19 status detection.
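The evaluation metrics listed in the abstract above follow standard definitions. The sketch below computes them for binary 0/1 labels; the function name `binary_metrics` and the sample labels are illustrative assumptions and are not taken from the paper. It assumes both classes occur in the true labels and in the predictions, so that no denominator is zero.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common classification metrics for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    tpr = tp / (tp + fn)        # true positive rate, identical to recall
    fpr = fp / (fp + tn)        # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy, "tpr": tpr, "fpr": fpr, "precision": precision,
        "f_measure": f_measure, "rmse": rmse, "kappa": kappa,
    }


if __name__ == "__main__":
    # Illustrative labels only: 3 TP, 1 FN, 3 TN, 1 FP.
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

Note that the true positive rate and recall are the same quantity, which is why the sketch reports it once.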
Subhash Mondal, Ranjan Maity, Yash Raj Singh, Soumadip Ghosh, and Amitava Nag
IEEE
Coronary heart disease (CHD) risk increases daily due to the uncontrolled lifestyle of today's adult age group. Early detection of the disease can prevent untimely death from heart-related complications. Machine learning (ML) techniques are essential for the early diagnosis of CHD and for identifying its many contributing factors. To build the prediction model, we used a dataset of 4240 instances and 15 related features to predict the risk of CHD over the next ten years. Initially, thirteen ML models were deployed with 10-fold cross-validation, yielding a highest test accuracy of 91.28% for the Random Forest (RF) classifier. The models were tuned further, and the boosting algorithms showed accuracies of 91% and above; the Gradient Boost (GB) classifier performed best with an accuracy of 92.11%. A voting ensemble of the best-performing boosting models, namely GB, HGB, XGB, CB, and LGBM, was used for the final prediction. The prediction results showed an accuracy of 92.26%, an F1 score of 91.25%, and a ROC-AUC score of 0.917, and the number of False Negatives (FN) was about 6.25% of the total test dataset.
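The voting step described in the abstract above can be sketched as follows. The abstract does not state whether hard or soft voting was used, so this is a minimal sketch of hard (majority) voting under that assumption; the function name `hard_vote` and the per-model predictions are made up for illustration.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists by majority vote for each sample."""
    combined = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions_per_model):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


if __name__ == "__main__":
    # Hypothetical predictions from five models for four test samples.
    gb = [1, 0, 1, 0]
    hgb = [1, 0, 0, 0]
    xgb = [0, 0, 1, 1]
    cb = [1, 1, 1, 0]
    lgbm = [1, 0, 0, 0]
    print(hard_vote([gb, hgb, xgb, cb, lgbm]))  # [1, 0, 1, 0]
```

With an odd number of models and two classes, as here, a tie cannot occur; other configurations would need an explicit tie-breaking rule.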
Supantha Das, Arnab Hazra, Soumen Kumar Pati, Soumadip Ghosh, Saurav Mallik, Suharta Banerjee, Ayan Mukherji, Aimin Li, and Zhongming Zhao
IEEE
Chronic kidney disease (CKD) is as severe as cancer in today's world. It may even lead to permanent kidney failure. Early detection of the disease is needed for timely treatment. Our work presents a classifier (named ANFIS) based on the neuro-fuzzy approach to detect the presence of chronic kidney disease. We use the blood test results of several patients in our research study. We compare our proposed classifier with conventional classifiers such as Multi-layer Perceptron, Support Vector Machine, Logistic Regression, and Decision Tree. Experimental results indicate that our proposed neuro-fuzzy rule-based classifier performs better than the other classifiers used here. ANFIS achieved 3% to 4% better accuracy than the other classifiers.
Prithwineel Paul and Soumadip Ghosh
IEEE
In this paper, we investigate the family of languages generated by the labels of the homogeneous variant of spiking neural P systems with structural plasticity (HSNPSSP). SNPSSP is an interesting variant of SNPS in which the creation and deletion of synapses between neurons are taken into account. The basic structure and functioning of HSNPSSP are similar to those of SNPSSP, with one exception: in HSNPSSP, all the neurons have similar types of rules as well as the same number of rules. The labels associated with the rules in HSNPSSP work as a new tool for the generation of languages. We show that any regular language is a label language of HSNPSSP with only one rule in each neuron. Furthermore, any recursively enumerable language is a label language of HSNPSSP with 6 rules in each neuron.
Soumadip Ghosh, Shayak Sadhu, Sushanta Biswas, Debasree Sarkar, and Partha Pratim Sarkar
Univ. of Malaya
Soumadip Ghosh, Arnab Hazra, Bikramjit Choudhury, Payel Biswas, and Amitava Nag
Springer International Publishing
S. Ghorai
CRC Press
Soumadip Ghosh, Debasish Biswas, Sushanta Biswas, Debasree Chanda Sarkar, and Partha Pratim Sarkar
Institute of Electrical and Electronics Engineers (IEEE)
In this paper, we propose a neuro-fuzzy (NF) classification technique to determine various soil classes from large imagery soil databases. The technique uses a fuzzification method to compute the feature-wise degree of belonging of the imagery data to the available soil classes. The fuzzification method builds a membership matrix whose element count equals the product of the number of data records and the number of soil classes present. The elements of this matrix are the input to a neural network model. We apply our technique to three UCI databases, namely Statlog Landsat Satellite, Forest Covertype, and Wilt, for soil classification. The paper aims to identify soil classes using the proposed technique and then compare its performance with four well-known classification algorithms, namely radial basis function network, k-nearest neighbor, support vector machine, and adaptive NF inference system. Numerous measures, for example root-mean-square error, kappa statistic, accuracy, false positive rate, true positive rate, precision, recall, F-measure, and area under the curve, are used for the quantitative evaluation of the simulated results. All these evaluation measures support the superiority of the proposed NF method.
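The membership matrix described in the abstract above has one entry per (record, class) pair. The abstract does not specify the membership function, so the sketch below assumes a Gaussian membership per feature, centred on the class mean and averaged over features. That choice, and the function name `membership_matrix`, are assumptions made purely for illustration and should not be read as the paper's method.

```python
import math
from statistics import mean, pstdev


def membership_matrix(records, labels):
    """Return (classes, matrix); matrix[i][j] is the assumed degree to which
    record i belongs to class j, as a value between 0 and 1."""
    classes = sorted(set(labels))
    n_features = len(records[0])

    # Per-class, per-feature centre and spread estimated from labelled data.
    stats = {}
    for c in classes:
        members = [r for r, y in zip(records, labels) if y == c]
        stats[c] = [
            (mean(col), pstdev(col) or 1.0)  # fall back to 1.0 if spread is 0
            for col in zip(*members)
        ]

    matrix = []
    for r in records:
        row = []
        for c in classes:
            # ASSUMPTION: Gaussian membership per feature, averaged over features.
            degrees = [
                math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                for x, (mu, sigma) in zip(r, stats[c])
            ]
            row.append(sum(degrees) / n_features)
        matrix.append(row)
    return classes, matrix


if __name__ == "__main__":
    # Toy data: four records, two features, two classes.
    records = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.2]]
    labels = ["a", "a", "b", "b"]
    classes, matrix = membership_matrix(records, labels)
    print(classes, len(matrix) * len(matrix[0]))  # 4 records x 2 classes = 8 entries
```

Each row of the resulting matrix could then be fed to a neural network as that record's input vector, which matches the role the abstract gives to the matrix elements.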
Soumadip Ghosh, Sushanta Biswas, Debasree Chanda Sarkar, and Partha Pratim Sarkar
Indian Society for Education and Environment
Background/Objectives: Detection and analysis of critical diseases such as breast cancer is a significant domain of data mining analysis and research. In this research study, we propose a neuro-fuzzy classification method for breast cancer detection. Methods/Statistical Analysis: The proposed neuro-fuzzy method uses a fuzzification method to compute the pattern-wise degree of membership of breast cancer data in the existing data classes. The method produces a membership matrix whose element count equals the product of the number of data records and the number of data classes present. These matrix elements are then used as input to a neural network. Findings: We apply our method to three UCI databases, namely WBC, WDBC, and MM. The research work aims to recognize breast cancer using the proposed method and then compare its performance with two well-known classification algorithms, namely Multilayer Perceptron and Support Vector Machine. We use 10-fold cross-validation for the simulations. Different measures, for instance root-mean-square error, kappa statistic, accuracy, false-positive rate, true-positive rate, precision, recall, and F-measure, are used for numerical analysis of the simulated results. All these evaluation measures support the superiority of our proposed method. Application/Improvements: The suggested method has great potential, in terms of classification capability and predictive power, for use in the fields of Medical Science and Bioinformatics.
Soumadip Ghosh, Sushanta Biswas, Debasree Sarkar, and Partha Pratim Sarkar
Elsevier BV
Soumadip Ghosh, Sujoy Mondal, and Bhaskar Ghosh
IEEE
Breast cancer is a severe disease found among women all over the world. It arises from breast tissue cells, usually from the inner lining of the milk ducts or from the lobules that supply the ducts with milk. A recent medical survey reveals that, worldwide, breast cancer accounts for 22.9% of all cancers in women and causes 13.7% of cancer deaths among them. Breast cancer is very harmful and may cause loss of the breasts or even loss of life. Diagnosis of breast cancer is an important area of data mining research. In our work, different classification techniques are applied to the benchmark Breast Cancer Wisconsin dataset from the UCI Machine Learning Repository for the detection of breast cancer. Principal component analysis (PCA) is used to reduce the dimension of the dataset. Our objective is to diagnose and analyze breast cancer with the help of two well-known classifiers, namely MLP using a Backpropagation NN (MLP BPN) and Support Vector Machine (SVM), and thereafter assess their performance in terms of measures such as Accuracy, Precision, Recall, F-Measure, and Kappa statistic.
Soumadip Ghosh, Susanta Biswas, Debasree Sarkar, and P. P. Sarkar
IEEE
Generally, frequent itemsets are extracted from large databases by applying association rule mining (ARM) algorithms such as Apriori, Partition, Pincer-Search, Incremental, and Border. A Genetic Algorithm (GA) can also be applied to discover frequent patterns in databases. The main advantages of using GA to discover frequent patterns or itemsets are that it can perform a global search and that its time complexity is lower than that of Apriori-based algorithms, because it is based on the greedy approach. However, the FP-tree algorithm is considered the best among the ARM algorithms, because its candidate set generation procedure is completely different from that of Apriori-based algorithms. The major aim of this paper is to present a comparative study of ARM-based and GA-based approaches to data mining.
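For readers unfamiliar with the Apriori family mentioned in the abstract above, the following is a minimal textbook-style sketch of level-wise frequent itemset mining. It is not the implementation compared in the paper; the function name `apriori`, the toy transactions, and the use of an absolute support count are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate; keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, support in sorted(apriori(baskets, 2).items(), key=lambda x: len(x[0])):
        print(sorted(itemset), support)
```

The repeated candidate generation and database scans at each level are the cost that alternatives such as FP-tree or GA-based search try to avoid.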
Soumadip Ghosh, Amitava Nag, Debasish Biswas, Jyoti Prakash Singh, Sushanta Biswas, Debasree Sarkar, and Partha Pratim Sarkar
IEEE
Weather data mining is a form of data mining concerned with finding hidden patterns in widely available meteorological data, so that the information retrieved can be transformed into usable knowledge. A variety of data mining tools and techniques are available in the industry, but they have been used in a very limited way for meteorological data. In this paper, a neural network-based algorithm for predicting the atmosphere at a future time and a given location is presented. We use a Back Propagation Neural (BPN) Network for the initial modelling. The results obtained by the BPN model are fed to a Hopfield Network. The performance of our proposed ANN-based method (a combined BPN and Hopfield Network approach) is tested on a 3-year weather data set comprising 15000 records with attributes such as temperature, humidity, and wind speed. The prediction error is found to be very small, and the learning converges very sharply. The main focus of this paper is predictive data mining, by which we can extract interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from a huge amount of meteorological data.
Amitava Nag, Debasish Biswas, Soumadip Ghosh, Sushanta Biswas, Debasree Sarkar, and Partha Pratim Sarkar
Springer Berlin Heidelberg