Sobhan Sarkar

@iimranchi.ac.in

Assistant Professor, Information Systems & Business Analytics
Indian Institute of Management Ranchi



                             

https://researchid.co/sobhan.sarkar

I am currently an Assistant Professor in the Information Systems & Business Analytics area at IIM Ranchi, India. I served as a Post-doctoral Fellow (PDF) in the Management Science Division of the Business School at the University of Edinburgh, UK. I received my Ph.D. from the Department of Industrial & Systems Engineering at IIT Kharagpur (India), and both my ME and BE degrees from the Department of Production Engineering at Jadavpur University (India). My research covers theoretical improvements and applications of data analytics using machine learning (ML), data mining (DM), and operations research (OR) approaches. So far, I have published 19 journal papers, 21 book chapters, and 23 conference papers. I serve as a reviewer for 48 peer-reviewed top-tier journals, including Information Sciences, Automation in Construction, Applied Soft Computing, International Journal of Industrial Ergonomics, Computers & Industrial Engineering, and Safety Science.

EDUCATION

Ph.D. (Jul, 2014 - Aug, 2019) - Department of Industrial & Systems Engineering, IIT Kharagpur, India

M.E. (Aug, 2012 - May, 2014) - Department of Production Engineering, Jadavpur University, India.

B.E. (Aug, 2005 - May, 2009) - Department of Production Engineering, Jadavpur University, India.

RESEARCH INTERESTS

Information Systems, Machine Learning, Operations Research

Scopus Publications: 60

Scholar Citations: 1425

Scholar h-index: 22

Scholar i10-index: 36

Scopus Publications





  • A Two-phase Approach to Determine User-Preference and Feature Importance in Pricing of Cryptocurrencies using Twitter Data
    Saptashwa Maity, Soujatya Khan, and Sobhan Sarkar

    IEEE
    Twitter is one of the best places to learn how people feel about current affairs related to the forecast of cryptocurrencies. No robust models have been available for determining user preference for the major cryptocurrencies in the market on the basis of tweets made by major investors. Moreover, the statistical dependency and feature importance of the various contributing features in determining the prices of these cryptocurrencies have been a major concern too. This study proposes a novel two-phase robust hybrid approach for determining both user preference and the feature importance of various contributing features in the pricing of top cryptocurrencies in the market. It determines user preference on the basis of the subjectivity and polarity scores obtained from sentiment polarity classification of the tweets. It also determines both the statistical dependency and the feature importance of the various contributing features involved in the pricing of the top cryptocurrencies with the help of the SHapley Additive exPlanations (SHAP) score. We have used two types of datasets for our detailed study. Additionally, knowledge graphs have been used to describe the capacity to recognise semantic data. With accuracy, precision, recall, F1 score, and AUC-ROC values of 92%, 88%, 85%, 95%, and 94%, respectively, our proposed approach outperformed conventional machine learning techniques.

  • Quantifying data imbalance using Exponential f-Divergence
    Sobhan Sarkar and Anima Pramanik

    IEEE
    In this study, a new measure of imbalance is introduced to compute the extent of imbalance in multiclass data. In the case of binary datasets, the Imbalance Ratio (IR) can be used to measure the amount of imbalance. However, in the case of multi-class datasets, since it only takes into account the frequency of the most frequent majority class and the least frequent minority class, it fails to encapsulate any properties of the intermediate classes. The Imbalance Degree (ID) was proposed to overcome the issues of IR by considering information from the intermediate classes as well. Nevertheless, it requires choosing a distance metric, which largely influences the results and can lead to unfavorable outcomes. It also assumes that the number of minority classes determines the extent of the imbalance without considering their individual contributions, which is not correct. Thus, ID cannot be chosen as an authentic metric if this assumption is breached. Furthermore, another metric called the Likelihood Ratio Imbalance Degree (LRID) was proposed to make the metric independent of the number of minority classes in the data. However, it considered the imbalance to be directional and assumed both positive and negative values for individual contributions from classes. In this study, we obtain a more authentic procedure to measure the extent of imbalance using statistical divergence from balanced class distributions.

  • Risk Modeling Framework for Strategic and Operational Intervention to Enhance the Effectiveness of a Closed-Loop Supply Chain
    Shisam Bhattacharyya, Sobhan Sarkar, Bishal Dey Sarkar, and Ramkrishna Manatkar

    Institute of Electrical and Electronics Engineers (IEEE)

  • Artificial intelligence-driven supply chain resilience in Vietnamese manufacturing small- and medium-sized enterprises
    Prasanta Kumar Dey, Soumyadeb Chowdhury, Amelie Abadie, Emilia Vann Yaroson, and Sobhan Sarkar

    Informa UK Limited




  • Classification and pattern extraction of incidents: a deep learning-based approach
    Sobhan Sarkar, Sammangi Vinay, Chawki Djeddi, and J. Maiti

    Springer Science and Business Media LLC
    Classifying or predicting occupational incidents using both structured and unstructured (text) data is an unexplored area of research. Unstructured texts, i.e., incident narratives, are often unutilized or underutilized. Besides the explicit information, a large amount of hidden information is present in a dataset, which cannot be explored by traditional machine learning (ML) algorithms. There is a scarcity of studies on the use of deep neural networks (DNNs) in the domain of incident prediction, and on their parameter optimization for achieving better predictive power. To address these issues, key terms are first extracted from the unstructured texts using LDA-based topic modeling. Then, these key terms are combined with the predictor categories to form the feature vector, which is further processed for noise reduction and fed to the adaptive moment estimation (ADAM)-based DNN (i.e., ADNN) for classification, as ADAM is superior to GD, SGD, and RMSProp. To evaluate the effectiveness of our proposed method, a comparative study has been conducted against several state-of-the-art methods on five benchmark datasets. Moreover, a case study of an integrated steel plant in India is presented to validate the proposed model. Experimental results reveal that ADNN outperforms the others in terms of accuracy. Therefore, the present study offers a robust methodological guide that enables us to handle the issues of unstructured data and hidden information for developing a predictive model.



  • Parametric and Non-Parametric Analyses for Pedestrian Crash Severity Prediction in Great Britain
    Maria Rella Riccardi, Filomena Mauriello, Sobhan Sarkar, Francesco Galante, Antonella Scarano, and Alfonso Montella

    MDPI AG
    The study aims to investigate the factors that are associated with fatal and severe vehicle–pedestrian crashes in Great Britain by developing four parametric models and five non-parametric tools to predict the crash severity. Even though the models have already been applied to model the pedestrian injury severity, a comparative analysis to assess the predictive power of such modeling techniques is limited. Hence, this study contributes to the road safety literature by comparing the models by their capabilities of identifying the significant explanatory variables, and by their performances in terms of the F-measure, the G-mean, and the area under curve. The analyses were carried out using data that refer to the vehicle–pedestrian crashes that occurred in the period of 2016–2018. The parametric models confirm their advantages in offering easy-to-interpret outputs and understandable relations between the dependent and independent variables, whereas the non-parametric tools exhibited higher classification accuracies, identified more explanatory variables, and provided insights into the interdependencies among the factors. The study results suggest that the combined use of parametric and non-parametric methods may effectively overcome the limits of each group of methods, with satisfactory prediction accuracies and the interpretation of the factors contributing to fatal and serious crashes. In the conclusion, several engineering, social, and management pedestrian safety countermeasures are recommended.

  • Deep Network-based Slow Feature Analysis for Human Fall Detection
    Anima Pramanik, Kavya Venkatagiri, Sobhan Sarkar, and Sankar K. Pal

    IEEE
    One of the most concerning safety hazards for elderly people is abnormal falls in public places. Vision-based fall detection using ambient cameras is a popular non-intrusive solution. Recent research uses Slow Feature Analysis (SFA), which can learn the slow invariant varying shape features obtained from input signals and is efficient. Another popular recent approach in motion detection is deep learning. However, fall events in actual cases are diverse, which complicates the detection task. Additionally, it is difficult to acquire fall-related data; hence, fall events are simulated to generate a training dataset, resulting in smaller data. Considering these complications, we present a novel method that combines SFA, deep learning models, namely Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM), and a rule base. The CNN is used to extract the object region, thereby reducing the region of interest (RoI). Two shape features, namely the aspect ratio and the area of the RoI, are considered as input to the LSTM for retrieving the temporal information, which is further used for rule generation, thereby increasing the detection accuracy. The efficacy of the proposed method for various features, such as aspect ratio, area, and aspect ratio+area, is demonstrated over the UR Fall data with an accuracy of 95.2%, 93.8%, and 96.36%, respectively.

  • G-AUC: An improved metric for classification model selection
    Shashank Sadafule, Sobhan Sarkar, and Shaomin Wu

    IEEE
    The performance of classification models is often measured using the area under the curve (AUC) metric. The non-parametric estimate of this metric only considers the ranks of the test instances and fails to consider the predicted scores of the model. Consequently, not all the valuable information about the model’s output is utilized. To address this issue, the present paper introduces a new metric, called Gamma AUC (G-AUC), that can take into account both ranks and scores. The parameter G tackles the problem of overfitting scores into the metric. To validate the proposed metric, we tested it on 20 UCI datasets with 10 state-of-the-art models. Of all the values of the parameter G that we tested, four yielded a p-value below 0.05 for the alternative hypothesis that G-AUC on the training sets has a greater correlation with AUC on the test sets than training-set AUC itself does. Furthermore, for all values of G considered, G-AUC outperformed AUC in the majority of cases when selecting the better model.

  • Handling sparsity and seasonality problems simultaneously in session-based recommender systems using graph collaborative filtering
    Subhajit Bag, Anmol Kumar, and Sobhan Sarkar

    IEEE
    Session-based recommender systems have evolved as a new paradigm in recent years, intending to capture short-term yet dynamic user preferences to give more timely and accurate suggestions that are responsive to changes in session contexts. However, sparse user-item interaction data has been one of the most significant issues, as a colossal amount of memory is needed to store such sparse data. Seasonality is another major issue in recommendation systems, as there are many variations in the pattern of customers’ interests at different time intervals. In our study, we resolve the above-mentioned issues by using graph collaborative filtering and creating feature bins. As a case study, we used sequential data from YooChoose customers to validate the efficacy of our proposed methodology. Further, we use five state-of-the-art graph neural network models to get the best recommendation. The performance of those models is evaluated using the NDCG (Normalized Discounted Cumulative Gain) and ROC-AUC (Area under the Receiver Operating Characteristic curve) metrics. In our study, we find that a Residual Gated Convolutional Neural Network with four layers and the Adam optimizer gave the best recommendations.

  • Identifying spammer groups in consumer reviews using meta-data via bipartite graph approach
    Varun Balakrishna, Subhajit Bag, and Sobhan Sarkar

    IEEE
    Nowadays, online product reviews are more common on e-commerce platforms. Before making a purchase, people frequently consult product reviews to assess the quality of the item. However, the review system has been seriously harmed by a huge number of review spammers, who frequently cooperate to promote or denigrate specific products. Earlier research uses machine learning techniques to identify singleton suspicious reviews and reviewers without considering the meta-data. In this study, we utilise the meta-data of consumer reviews to identify review spammer organisations using state-of-the-art community detection techniques. Due to the diversity of behavioural indicators, group spammers are challenging to identify. In this study, we propose that clustering the singleton spammers using the meta-data (location and time) of the reviews is the key to identifying group spammers (and their fraudulent reviews). We propose filling out the review-product matrix using the product and review information and text. We then use this to deduce the hidden reviewer-product connections to address the issue of the absence of explicit behavioural signals for singleton reviewers. Subsequently, we build a bipartite graph using the review-product matrix. Using the meta-data of the reviews, which are frequently overlooked by existing algorithms, experiments on a real-world Yelp dataset demonstrated the effectiveness of our methodology in detecting group spammers.

  • Unsupervised and Categorical Sentiment Segmentation of Customer Product Reviews
    Aditya Kumar Singh, Rahul Golder, and Sobhan Sarkar

    IEEE
    In Consumer Review Analysis (CRA), identifying the context of reviews is of paramount importance. Businesses need to supply their underlying sectors with a structured and classified list of the consumer feedback available on various online platforms. However, reviews and feedback are generally available in a very unorganized manner and need to be tagged and distributed properly to the appropriate sectors. To address the problem, we propose a comprehensive model employing sequential clustering, sentiment prediction, and subsequent ranking of reviews. To validate the proposed model, data from a Samsung smartphone manufacturing firm was used. The robustness and stability of our model have been examined through different performance indices: Silhouette Index (SI), Davies-Bouldin Index (DBI), and Calinski-Harabasz Score (CHS). Our analysis shows a distinct categorization of reviews based on their contexts with minimal noise in the classification measures. Our custom coefficient, the Relevant Voting Score (RVS), has been found to rank the reviews in an accurate priority list, thereby helping the sectors to consider only the most important customer feedback.

  • Crash severity analysis in distracted driving using unlabeled and imbalanced data: A novel approach using Robust Two-Phase Ensemble Predictor
    Subhajit Bag, Saptashwa Maity, and Sobhan Sarkar

    IEEE
    Distracted driving plays a pivotal role in road accidents. Therefore, prediction of the crash severity due to distracted driving is essential. Although several machine learning techniques exist for such prediction, it is difficult to use them in case of the unavailability of class labels and class imbalance issues. Moreover, there is a severe lack of research considering environmental factors and driver’s behaviour to predict the crash severity. To address the issues, in this study, a robust two-phase ensemble prediction model has been developed, considering the geolocation information and driver’s behaviour. An analysis of the unlabeled and high-dimensional data is generally challenging. We perform dimensionality reduction using t-SNE, followed by agglomerative hierarchical clustering to get labelled data. We have used Synthetic Minority Over-sampling Technique (SMOTE) to mitigate the class imbalance issue. Subsequently, we observe that some localities have much more severe crashes, so we develop a feature considering the geolocation information. Then, we create a novel predictor called Robust Two-Phase Ensemble Predictor (R2PEP) to predict the crash severity. The performance of the proposed model has been compared with five state-of-the-art algorithms using a dataset we obtained from the Nevada Department of Transportation. The comparison demonstrates the superiority of our model over the other models, with an accuracy of 99.6%.

  • Implementation of a Priority Queue to Optimize Resources during Manual Verification of Fake News
    Piran Karkaria, Rahul Golder, and Sobhan Sarkar

    IEEE
    Combating fake news on social media is a critical challenge in today's digital age, especially when misinformation is spread regarding vital matters such as the Covid-19 pandemic. Manual verification of all content is infeasible; hence, Artificial Intelligence is used to classify fake news. Our ensemble model uses multiple Natural Language Processing techniques to analyze the truthfulness of the text in tweets. We create custom parameters that analyze the consistency and truthfulness of domains contained in hyperlinked URLs. We then combine these parameters with the results of our deep learning models to achieve classification with greater than 99% accuracy. We have proposed a novel method to calculate a custom coefficient, the Combined Metric of Prediction Uncertainty (CMPU), which is a measure of how uncertain the model is of its classification of a given tweet. Using CMPU, we have proposed the creation of a priority queue following which the tweets classified with the lowest certainty can be manually verified. By manually verifying 3.93% of tweets, we were able to improve the accuracy from 99.02% to 99.77%.

  • Real-Time Detection of Traffic Anomalies Near Roundabouts
    Anima Pramanik, Sobhan Sarkar, Chawki Djeddi, and J. Maiti

    Springer International Publishing

  • A Novel Optimized Method for Feature Selection Using Non-linear Kernel-Free Twin Quadratic Surface Support Vector Machine
    Saptashwa Maity, Arjav Rastogi, Chawki Djeddi, Sobhan Sarkar, and J. Maiti

    Springer International Publishing

  • COVID-19 outbreak: A data-driven optimization model for allocation of patients
    Sobhan Sarkar, Anima Pramanik, J. Maiti, and Genserik Reniers

    Elsevier BV
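
The limitation of the Imbalance Ratio (IR) described in the abstract of "Quantifying data imbalance using Exponential f-Divergence" above can be illustrated with a minimal Python sketch. It assumes the common definition of IR as the size of the largest class divided by the size of the smallest; the function name and the two toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Imbalance Ratio: largest class count divided by smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Two hypothetical three-class datasets that differ only in the middle class "y".
dataset_a = ["x"] * 90 + ["y"] * 90 + ["z"] * 10
dataset_b = ["x"] * 90 + ["y"] * 10 + ["z"] * 10

# IR looks only at the extremes (90 and 10), so both datasets score the same.
print(imbalance_ratio(dataset_a))  # 9.0
print(imbalance_ratio(dataset_b))  # 9.0
```

Both datasets receive the same IR even though the size of their intermediate class differs greatly, which is the gap that multiclass measures such as ID, LRID, and the divergence-based measure in the paper aim to address.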

RECENT SCHOLAR PUBLICATIONS

  • A Z-Number Slacks-Based Measure DEA model-based Framework for Sustainable Supplier Selection with Imprecise Information
    S Sarkar, AR Paramanik, B Mahanty
    Journal of Cleaner Production 436, 140563 2024

  • A Two-phase Approach to Determine User-Preference and Feature Importance in Pricing of Cryptocurrencies using Twitter Data
    S Maity, S Khan, S Sarkar
    2023 IEEE International Conference on E-Business Engineering (ICEBE) 2023

  • Video surveillance-based fall detection system using object-level feature thresholding and numbers
    A Pramanik, S Sarkar, SK Pal
    Knowledge-Based Systems 280, 110992 2023

  • Quantifying data imbalance using Exponential f-Divergence
    S Sarkar, A Pramanik
    2023 Joint International Conference on Digital Arts, Media and Technology 2023

  • SENE: A Novel Manifold Learning Approach for Distracted Driving Analysis with Spatio-Temporal and Driver Praxeological Features
    S Bag, R Golder, S Sarkar, S Maity
    Engineering Applications of Artificial Intelligence 123 (Part C), 106332 2023

  • Deep Network-based Slow Feature Analysis for Human Fall Detection
    A Pramanik, K Venkatagiri, S Sarkar, SK Pal
    2022 International Conference on Computational Modelling, Simulation and 2023

  • Risk modeling framework for strategic and operational intervention to enhance the effectiveness of a closed-loop supply chain
    S Bhattacharyya, S Sarkar, B Sarkar, R Manatkar
    IEEE Transactions on Engineering Management 71, 7015-7028 2023

  • Artificial Intelligence-Driven Supply Chain Resilience in Vietnamese Manufacturing Small-and Medium-Sized Enterprises
    P Dey, S Chowdhury, A Abadie, EV Yaroson, S Sarkar
    International Journal of Production Research 2023

  • G-AUC: An improved metric for classification model selection
    S Sadafule, S Sarkar, S Wu
    2022 26th International Computer Science and Engineering Conference (ICSEC), 1-6 2023

  • Implementation of a Priority Queue to Optimize Resources during Manual Verification of Fake News
    P Karkaria, R Golder, S Sarkar
    2022 International Conference on Data Analytics for Business and Industry 2023

  • Unsupervised and Categorical Sentiment Segmentation of Customer Product Reviews
    AK Singh, R Golder, S Sarkar
    2022 International Conference on Data Analytics for Business and Industry 2023

  • Identifying spammer groups in consumer reviews using meta-data via bipartite graph approach
    V Balakrishna, S Bag, S Sarkar
    2022 International Conference on Data Analytics for Business and Industry 2023

  • Crash severity analysis in distracted driving using unlabeled and imbalanced data: A novel approach using Robust Two-Phase Ensemble Predictor
    S Bag, S Maity, S Sarkar
    2022 International Conference on Data Analytics for Business and Industry 2023

  • Handling sparsity and seasonality problems simultaneously in session-based recommender systems using graph collaborative filtering
    S Bag, A Kumar, S Sarkar
    2022 International Conference on Data Analytics for Business and Industry 2023

  • A Two-stage Improved Base Point Slacks-Based Measure of Super-efficiency for Negative Data Handling
    AR Paramanik, S Sarkar, B Sarkar
    Computers & Operations Research 150, 106057 2023

  • An Integrated Approach using Rough Set Theory, ANFIS, and Z-number in Occupational Risk Prediction
    S Sarkar, A Pramanik, J Maiti
    Engineering Applications of Artificial Intelligence 117 (Part A), 105515 2023

  • Safety Analytics
    J Maiti, S Sarkar, J Haight
    Maynard's Industrial and Systems Engineering Handbook, 405-420 2022

  • News media mining to explore speed-crash-traffic association during COVID-19
    S Das, S Sarkar
    Transportation Research Record 2022

  • Impact of operating speed measures on traffic crashes: Annual and daily level models for rural two-lane and rural multilane roadways
    S Das, P Eun, S Sarkar
    Journal of Transportation Safety & Security 2022

  • OSWMI: An Objective-Subjective Weighted method for Minimizing Inconsistency in multi-criteria decision making
    AR Paramanik, S Sarkar, B Sarkar
    Computers & Industrial Engineering 169, 108138 2022

MOST CITED SCHOLAR PUBLICATIONS

  • Application of Optimized Machine Learning Techniques for Prediction of Occupational Accidents
    S Sarkar, V Sammangi, R Raj, J Maiti, P Mitra
    Computers & Operations Research 106, 210-224 2019
    Citations: 208

  • Predicting and analyzing injury severity: A machine learning-based approach using class-imbalanced proactive and reactive data
    S Sarkar, A Pramanik, J Maiti, G Reniers
    Safety Science 125, 104616 2020
    Citations: 102

  • Machine learning in occupational accident analysis: A review using science mapping approach with citation network analysis
    S Sarkar, J Maiti
    Safety Science 131, 104900 2020
    Citations: 95

  • An integrated fuzzy multiple criteria supplier selection approach and its application in a welding company
    S Sarkar, DK Pratihar, B Sarkar
    Journal of Manufacturing Systems 46, 163-178 2018
    Citations: 78

  • An optimization-based decision tree approach for predicting slip-trip-fall accidents at work
    S Sarkar, R Raj, V Sammangi, J Maiti, DK Pratihar
    Safety Science 118, 57-69 2019
    Citations: 75

  • A real-time video surveillance system for traffic pre-events detection
    A Pramanik, S Sarkar, J Maiti
    Accident Analysis & Prevention 154 (5), 106019 2021
    Citations: 58

  • Prediction of occupational accidents using decision tree approach
    S Sarkar, A Patel, S Madaan, J Maiti
    INDICON 2017, 1-6 2017
    Citations: 56

  • Text mining based safety risk assessment and prediction of occupational accidents in a steel plant
    S Sarkar, S Vinay, J Maiti
    ICCTICT 2017, 439-444 2016
    Citations: 51

  • Predictive model for incident occurrences in steel plant in India
    S Sarkar, V Pateshwari, J Maiti
    ICCCNT 2017, 1-5 2017
    Citations: 41

  • OSWMI: An Objective-Subjective Weighted method for Minimizing Inconsistency in multi-criteria decision making
    AR Paramanik, S Sarkar, B Sarkar
    Computers & Industrial Engineering 169, 108138 2022
    Citations: 37

  • Genetic Algorithm-Based Association Rule Mining Approach Towards Rule Generation of Occupational Accidents
    S Sarkar, A Lohani, J Maiti
    Communications in Computer and Information Science 776, 517-530 2017
    Citations: 35

  • Prediction of Occupational Incidents Using Proactive and Reactive Data: A Data Mining Approach
    S Sarkar, A Verma, J Maiti
    Industrial Safety Management- 21st Century Perspective of Asia, 65-79 2018
    Citations: 31

  • Study of optimized SVM for incident prediction of a steel plant in India
    S Sarkar, S Vinay, V Pateshwari, J Maiti
    INDICON 2017, 1-6 2017
    Citations: 31

  • Artificial Intelligence-Driven Supply Chain Resilience in Vietnamese Manufacturing Small-and Medium-Sized Enterprises
    P Dey, S Chowdhury, A Abadie, EV Yaroson, S Sarkar
    International Journal of Production Research 2023
    Citations: 29

  • Parametric and non-parametric analyses for pedestrian crash severity prediction in Great Britain
    M Rella Riccardi, F Mauriello, S Sarkar, F Galante, A Scarano, A Montella
    Sustainability 14 (6), 3188 2022
    Citations: 27

  • Application of hybrid clustering technique for pattern extraction of accident at work: A case study of a steel industry
    S Sarkar, N Ejaz, J Maiti
    2018 4th International Conference on Recent Advances in Information 2018
    Citations: 26

  • Segmented point process models for work system safety analysis
    S Gautam, J Maiti, A Syamsundar, S Sarkar
    Safety Science 95, 15-27 2017
    Citations: 26

  • Data-driven Mapping Between Proactive and Reactive Measures of Occupational Safety Performance
    A Verma, S Chatterjee, S Sarkar, J Maiti
    Industrial Safety Management- 21st Century Perspective of Asia, 53-63 2018
    Citations: 24

  • Measurement and modeling of job stress of electric overhead traveling crane operators
    OB Krishna, J Maiti, PK Ray, B Samanta, S Mandal, S Sarkar
    Safety and Health at Work 6 (4), 279-288 2015
    Citations: 24

  • Application of rough set theory in accident analysis at work: A case study
    S Sarkar, S Baidya, J Maiti
    ICRCICN 2017, 245-250 2017
    Citations: 23