Piyush Sewal

@mohali.amity.edu

Assistant Professor, Computer Science
Amity University Punjab

EDUCATION

Ph.D., Computer Science

RESEARCH, TEACHING, OR OTHER INTERESTS

Artificial Intelligence, Computer Science, Computer Science Applications, Software

Scopus Publications: 7
Scholar Citations: 72
Scholar h-index: 5
Scholar i10-index: 3

SCOPUS PUBLICATIONS

  • Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling
    Piyush Sewal, Hari Singh
    Cluster Computing, 2024
  • A predictive analysis of the COVID-19 pandemic for traditional and tree-based regression algorithms
    Hari Singh, Piyush Sewal, Dinesh Chander Verma
    Impact of Digital Solutions for Improved Healthcare Delivery, 2024
    Many works in the literature compare regression algorithms on different datasets. This chapter presents a model that uses the best subset selection approach for the predictors and performs an exhaustive empirical comparison of regression algorithms (Linear Regression, Multi-Linear Regression, Polynomial Regression, K-Nearest Neighbors, Lasso, Ridge, Decision Tree, Gradient Boost Tree, and Random Forest Regression) on various predictors from a COVID-19 dataset. The model's training accuracy is evaluated using R², Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Test R² and adjusted R² evaluate the model on cross-validated prediction errors. The predicted values of the dependent variables are checked for similarity and validated using a statistical z-test.
  • Correction to: Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach (Multimedia Tools and Applications, (2023), 83, 15, (44047-44066), 10.1007/s11042-023-17330-5)
    Piyush Sewal, Hari Singh
    Multimedia Tools and Applications, 2024
  • Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach
    Piyush Sewal, Hari Singh
    Multimedia Tools and Applications, 2024
  • Algorithmic Proficiency in Spark Configuration Tuning: An Empirical Study using Execution Time Metrics across Varied Workloads
    Piyush Sewal, Hari Singh
    Procedia Computer Science, 2024
    In the realm of big data, where datasets of immense scale pose processing challenges, distributed processing platforms such as open-source Apache Spark have emerged to address these issues. Spark’s internal configuration parameters affect execution time differently depending on job characteristics, which makes manual optimization daunting. This study focuses on optimizing Spark’s internal configuration for three types of workloads: iterative-intensive, memory-intensive, and CPU-intensive. Grid Search, Random Search, and Evolutionary Optimization yield substantial execution time reductions of 23.24%, 19.71%, and 23.06%, respectively. Notably, Evolutionary Optimization finds optimal configurations approximately 29% faster than Grid Search. Although Random Search and Evolutionary Optimization require similar tuning time, Random Search achieves a smaller execution time reduction for a given Spark workload. This research sheds light on the intricacies of algorithmic configuration tuning and its influence on Spark workload execution times, contributing to the optimization of big data processing platforms.
  • A Machine Learning Approach for Predicting Execution Statistics of Spark Application
    Piyush Sewal, Hari Singh
    PDGC 2022: 2022 7th International Conference on Parallel, Distributed and Grid Computing, 2022
    Apache Spark is one of the most popular and widely used open-source distributed processing frameworks, and its in-memory computational capabilities let it process very large datasets in a time-efficient manner. However, several factors can affect an application's performance, including the nature and size of the input dataset, the computational capability of the system, and the nature and design of the algorithm. Correctly predicting the execution statistics of a Spark application therefore involves several parameters: the execution times of jobs, stages, and tasks; memory requirements and usage at the execution level; and I/O cost in the form of shuffle reads and writes. To address these challenges, this paper presents a simulation- and machine-learning-based prediction model that takes only a few initial samples of execution statistics and predicts the performance and execution statistics of the Spark application with high accuracy. The proposed model is evaluated on the Wordcount application in Spark standalone mode, and the accuracy metrics show that it achieves high accuracy in predicting execution statistics.
  • A Critical Analysis of Apache Hadoop and Spark for Big Data Processing
    Piyush Sewal, Hari Singh
    Proceedings of the IEEE International Conference on Signal Processing, Computing and Control, 2021
    The emergence of big data processing platforms that can work globally in an integrated manner and process huge datasets efficiently has become very significant. This paper presents a critical analysis of two big data processing platforms, Apache Hadoop MapReduce and Apache Spark. Hadoop MapReduce was earlier one of the most popular platforms for batch processing of very large datasets, but as the nature of data has shifted from static to dynamic, Apache Spark has proved better for iterative jobs and live data streams. This paper critically compares and analyzes Hadoop 1.x, 2.x, and 3.x and Spark 1.x, 2.x, and 3.x on well-known key parameters such as components, storage system, resource management, fault tolerance, data processing, scalability, and performance.
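The COVID-19 regression chapter listed above combines best subset selection of predictors with R², adjusted R², RMSE, and MAE. The following is a minimal, standard-library Python sketch of those pieces, assuming ordinary least squares as the fitted model and adjusted R² on the training data as the subset-selection criterion, where adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n samples and k predictors. The function names and the toy data are hypothetical and are not taken from the chapter.

```python
from itertools import combinations
from math import sqrt


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes a is non-singular (perfectly collinear predictors would divide by zero).
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def metrics(y, y_hat, k):
    """R^2, adjusted R^2 for k predictors, RMSE, and MAE."""
    n = len(y)
    mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(a - b) for a, b in zip(y, y_hat)) / n,
    }


def best_subset(X, y, names):
    """Fit every non-empty subset of predictors; keep the one with the highest adjusted R^2."""
    best = None
    for k in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), k):
            rows = [[row[i] for i in subset] for row in X]
            m = metrics(y, predict(fit_ols(rows, y), rows), k)
            if best is None or m["adj_r2"] > best[1]["adj_r2"]:
                best = ([names[i] for i in subset], m)
    return best


# Toy data (hypothetical): y depends only on x1; x2 and x3 are distractors.
X = [[1, 2, 1], [2, 1, 1], [3, 4, 2], [4, 3, 3], [5, 6, 5], [6, 5, 8]]
y = [3, 5, 7, 9, 11, 13]
chosen, scores = best_subset(X, y, ["x1", "x2", "x3"])
print(chosen, round(scores["adj_r2"], 4))
```

Exhaustive enumeration fits 2^p - 1 models for p predictors, which is why best subset selection is usually limited to a modest number of candidate predictors; on the toy data the selected subset should include x1. The chapter's cross-validated test metrics would apply the same `metrics` function to held-out folds rather than to the training data.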
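The Spark configuration-tuning abstract listed above compares Grid Search, Random Search, and Evolutionary Optimization. The Python sketch below illustrates how the three strategies differ in which configurations they evaluate, under loudly stated assumptions: `run_job` is a made-up cost model standing in for actually submitting a Spark workload and timing it, the candidate values in `SPACE` are illustrative (only the property names are real Spark settings), and the evolutionary loop is a minimal mutation-only (mu + lambda) scheme rather than the algorithm used in the study. The reduction is computed as (baseline - tuned) / baseline × 100.

```python
import itertools
import random

# Hypothetical search space: the keys are real Spark configuration properties,
# but the candidate values are illustrative and not taken from the study.
SPACE = {
    "spark.executor.memory": ["2g", "4g", "8g"],
    "spark.executor.cores": [1, 2, 4],
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}


def run_job(cfg):
    """Made-up cost model standing in for 'submit the workload and time it' (not Spark itself)."""
    mem_gb = int(cfg["spark.executor.memory"][:-1])
    cores = cfg["spark.executor.cores"]
    parts = cfg["spark.sql.shuffle.partitions"]
    return 100 / cores + 40 / mem_gb + abs(parts - 200) / 10 + 5


def random_config(rng):
    return {key: rng.choice(values) for key, values in SPACE.items()}


def grid_search():
    """Evaluate every combination in SPACE exhaustively."""
    keys = list(SPACE)
    configs = [dict(zip(keys, vals)) for vals in itertools.product(*SPACE.values())]
    return min(((run_job(c), c) for c in configs), key=lambda pair: pair[0])


def random_search(budget, rng):
    """Evaluate `budget` uniformly sampled configurations and keep the fastest."""
    samples = [random_config(rng) for _ in range(budget)]
    return min(((run_job(c), c) for c in samples), key=lambda pair: pair[0])


def evolutionary_search(generations, pop_size, rng):
    """Minimal (mu + lambda) loop: each parent spawns one child with one mutated parameter."""
    pop = [(run_job(c), c) for c in (random_config(rng) for _ in range(pop_size))]
    for _ in range(generations):
        children = []
        for _, parent in pop:
            child = dict(parent)
            key = rng.choice(list(SPACE))
            child[key] = rng.choice(SPACE[key])
            children.append((run_job(child), child))  # each child is timed once
        pop = sorted(pop + children, key=lambda pair: pair[0])[:pop_size]  # keep the fastest
    return pop[0]


def reduction_pct(baseline, tuned):
    """Execution-time reduction relative to the baseline, in percent."""
    return (baseline - tuned) / baseline * 100


# Hypothetical untuned starting point (not Spark's defaults).
baseline = run_job({"spark.executor.memory": "2g", "spark.executor.cores": 1,
                    "spark.sql.shuffle.partitions": 50})
rng = random.Random(42)
for name, (t, cfg) in [("grid", grid_search()),
                       ("random", random_search(10, rng)),
                       ("evolutionary", evolutionary_search(5, 4, rng))]:
    print(f"{name}: {t:.2f} s ({reduction_pct(baseline, t):.2f}% below baseline)")
```

Grid Search is guaranteed to find the best point in the grid but here evaluates all 36 combinations, while Random Search (10 runs) and the evolutionary loop (4 + 5 × 4 = 24 runs) trade that guarantee for a fixed evaluation budget. That time-versus-quality trade-off is what the study's comparison measures on real workloads.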
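The Spark execution-statistics abstract listed above describes predicting an application's behavior from only a few initial samples. The paper's model combines simulation with machine learning and its details are not reproduced here; as a much simpler stand-in, the Python sketch below assumes that each statistic grows linearly with input size, fits one least-squares line per statistic on a few small runs, and extrapolates to a larger input. The sample values, statistic names, and function names are hypothetical.

```python
# Hypothetical measurements from three small runs of a Wordcount-style job;
# the numbers are illustrative, not results from the paper.
SAMPLES = {
    "input_gb": [1.0, 2.0, 4.0],
    "exec_time_s": [12.0, 20.0, 36.0],
    "shuffle_write_mb": [55.0, 105.0, 205.0],
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_stats(samples, input_gb):
    """Fit one line per statistic against input size and extrapolate to `input_gb`."""
    xs = samples["input_gb"]
    predictions = {}
    for stat, ys in samples.items():
        if stat == "input_gb":
            continue
        intercept, slope = fit_line(xs, ys)
        predictions[stat] = intercept + slope * input_gb
    return predictions


def ape(actual, predicted):
    """Absolute percentage error of a single prediction."""
    return abs(actual - predicted) / actual * 100


print(predict_stats(SAMPLES, 10.0))
```

Once the full-size job has actually run, `ape` compares each measured statistic with its prediction. A linear model is only a first approximation: stage-level effects such as spilling to disk can make real Spark statistics grow non-linearly with input size.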

RECENT SCHOLAR PUBLICATIONS

  • A Predictive Analysis of the COVID-19 Pandemic for Traditional and Tree-Based Regression Algorithms
    H Singh, P Sewal, DC Verma
    Impact of Digital Solutions for Improved Healthcare Delivery, 303-340, 2025
    Citations: 2
  • Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling
    P Sewal, H Singh
    Cluster Computing 27 (8), 10569-10588, 2024
    Citations: 5
  • Utilizing Twitter data and NLP to analyze and predict public sentiment trends in mental health
    T Gupta, A Sharma, Aryan, K Rana, P Sewal
    The International Conference on Recent Trends in Communication & Intelligent …, 2024
    Citations: 1
  • Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach
    P Sewal, H Singh
    Multimedia Tools and Applications 83 (15), 44047-44066, 2024
    Citations: 11
  • Performance comparison of Apache Spark and Hadoop for machine learning based iterative GBTR on HIGGS and COVID-19 datasets
    P Sewal, H Singh
    Scalable Computing: Practice and Experience 25 (3), 1373-1386, 2024
    Citations: 12
  • Improving Execution Workloads in In-Memory Distributed Computing Platform–SPARK
    P Sewal, H Singh
    Jaypee University of Information Technology, Solan, HP, 2024
  • Algorithmic proficiency in Spark configuration tuning: An empirical study using execution time metrics across varied workloads
    P Sewal, H Singh
    Procedia Computer Science 235, 2307-2317, 2024
    Citations: 2
  • A machine learning approach for predicting execution statistics of Spark application
    P Sewal, H Singh
    2022 Seventh International Conference on Parallel, Distributed and Grid …, 2022
    Citations: 6
  • A critical analysis of Apache Hadoop and Spark for big data processing
    P Sewal, H Singh
    2021 6th International Conference on Signal Processing, Computing and …, 2021
    Citations: 33

MOST CITED SCHOLAR PUBLICATIONS

  • A critical analysis of Apache Hadoop and Spark for big data processing
    P Sewal, H Singh
    2021 6th International Conference on Signal Processing, Computing and …, 2021
    Citations: 33
  • Performance comparison of Apache Spark and Hadoop for machine learning based iterative GBTR on HIGGS and COVID-19 datasets
    P Sewal, H Singh
    Scalable Computing: Practice and Experience 25 (3), 1373-1386, 2024
    Citations: 12
  • Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach
    P Sewal, H Singh
    Multimedia Tools and Applications 83 (15), 44047-44066, 2024
    Citations: 11
  • A machine learning approach for predicting execution statistics of Spark application
    P Sewal, H Singh
    2022 Seventh International Conference on Parallel, Distributed and Grid …, 2022
    Citations: 6
  • Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling
    P Sewal, H Singh
    Cluster Computing 27 (8), 10569-10588, 2024
    Citations: 5
  • A Predictive Analysis of the COVID-19 Pandemic for Traditional and Tree-Based Regression Algorithms
    H Singh, P Sewal, DC Verma
    Impact of Digital Solutions for Improved Healthcare Delivery, 303-340, 2025
    Citations: 2
  • Algorithmic proficiency in Spark configuration tuning: An empirical study using execution time metrics across varied workloads
    P Sewal, H Singh
    Procedia Computer Science 235, 2307-2317, 2024
    Citations: 2
  • Utilizing Twitter data and NLP to analyze and predict public sentiment trends in mental health
    T Gupta, A Sharma, Aryan, K Rana, P Sewal
    The International Conference on Recent Trends in Communication & Intelligent …, 2024
    Citations: 1
  • Improving Execution Workloads in In-Memory Distributed Computing Platform–SPARK
    P Sewal, H Singh
    Jaypee University of Information Technology, Solan, HP, 2024