FedEI: Fault Detection and Ensemble Iterative Inference for Low-Cost Air Quality Sensor Data Jintu Borah, Mohd Shahrul Mohd Nadzir, Mylene G. Cayetano, Shubhankar Majumdar IEEE Sensors Journal, 2026 Faulty sensor measurements and system failures introduce challenges to accurate air quality monitoring. Missing data and erroneous observations are major concerns in environmental assessment. This work proposes an imputation technique that leverages a diverse ensemble of machine learning models based on gradient boosting. The methodology adapts dynamically to temporal variations and sensor faults, ensuring accurate imputations across diverse pollutants. The data used in this work come from a low-cost air quality sensor that provides real-time concentrations of six major air pollutants: NO2, O3, CO, SO2, PM2.5, and PM10. Meteorological variables, namely relative humidity and temperature, are also included to further characterize air quality conditions. For assessment, deliberate disruptions are introduced by removing observations from the end of data subsets. The proposed approach adapts dynamically to temporal variations and effectively mitigates the impact of sensor faults on data continuity. Comparing imputed values with actual measurements demonstrates the versatility and reliability of the technique.
This work signifies a crucial step towards enhancing the resilience and reliability of air quality sensors, critical for informed decision-making and sustainable urban development.
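The evaluation protocol described above (hide the tail of a subset, impute it step by step, compare with the held-out measurements) can be sketched as follows. This is a minimal illustration under stated assumptions: the gradient-boosting ensemble from the abstract is replaced by three trivial stand-in predictors (persistence, moving average, linear extrapolation) whose outputs are averaged, and every function name and the toy series are hypothetical.

```python
from statistics import mean


def persistence(history):
    # Stand-in model 1: repeat the last value.
    return history[-1]


def moving_average(history, window=3):
    # Stand-in model 2: mean of the most recent `window` values.
    return mean(history[-window:])


def linear_extrapolation(history):
    # Stand-in model 3: continue the most recent one-step change.
    return history[-1] + (history[-1] - history[-2])


# Hypothetical stand-ins for the gradient-boosting ensemble members.
MODELS = (persistence, moving_average, linear_extrapolation)


def iterative_impute(observed, n_missing):
    """Fill n_missing values after `observed`, one step at a time.

    Each step averages the ensemble's predictions and appends the result
    to the history, so later steps are conditioned on earlier imputations.
    """
    history = list(observed)
    imputed = []
    for _ in range(n_missing):
        value = mean(model(history) for model in MODELS)
        history.append(value)
        imputed.append(value)
    return imputed


def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Evaluation protocol: hide the tail of a toy subset, impute it, compare.
series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
n_hidden = 2
observed, hidden = series[:-n_hidden], series[-n_hidden:]
filled = iterative_impute(observed, n_hidden)
print(filled, mae(hidden, filled))
```

Feeding each imputed value back into the history is what makes the inference iterative: errors can compound over longer gaps, which is why the held-out tail is compared against real measurements.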
Drone-based Air Quality Monitoring: Development and Evaluation of Low-Cost PM2.5 Sensor for Remote Environmental Assessment Mohd Shahrul Mohd Nadzir, Utbah Rabuan, Sawal Hamid Md Ali, Jintu Borah, Shubhankar Majumdar, Muhammad Saufy Rohmad Sensors and Materials, 2025 a calibrated RMSE of 1.98, compliant with US EPA standards. Field tests were carried out at various altitudes and locations to evaluate system accuracy and adaptability. At higher altitudes, AirborneSense showed its best sensor performance, with an R² of 0.78, maintaining consistency with the reference analyzers (R² = 0.73). The system effectively mapped pollution, revealing elevated PM concentrations in construction zones and significantly lower levels in rural areas. The findings underscore the potential of UAV-based systems to enhance spatial and temporal environmental assessment, providing a scalable, cost-effective tool for comprehensive pollution monitoring and management.
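The abstract reports calibration quality as RMSE and agreement with reference analyzers as R². The sketch below shows how both metrics can be computed after a simple linear calibration of raw sensor readings against co-located reference values. The linear (slope and intercept) calibration model, the function names, and the synthetic readings are assumptions for illustration; the abstract does not state which calibration method was used.

```python
from math import sqrt


def fit_linear_calibration(raw, reference):
    """Least-squares fit of reference ~ slope * raw + intercept."""
    n = len(raw)
    mx = sum(raw) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in raw)
    sxy = sum((x - mx) * (y - my) for x, y in zip(raw, reference))
    slope = sxy / sxx
    return slope, my - slope * mx


def apply_calibration(raw, slope, intercept):
    return [slope * x + intercept for x in raw]


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic co-located readings (not data from the study): reference = 0.5 * raw + 2.
raw = [10.0, 20.0, 30.0, 40.0]
reference = [7.0, 12.0, 17.0, 22.0]
slope, intercept = fit_linear_calibration(raw, reference)
calibrated = apply_calibration(raw, slope, intercept)
print(slope, intercept, rmse(reference, calibrated), r_squared(reference, calibrated))
```

With real field data the fit is never exact, so the RMSE is positive and R² falls below 1; those residual values are what figures such as an RMSE of 1.98 summarize.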
A Novel Hybrid Approach For Efficiently Forecasting Air Quality Data Jintu Borah, Tanujit Chakraborty, Md. Shahrul Md. Nadzir, Mylene G. Cayetano, Francesco Benedetto, Shubhankar Majumdar IEEE Sensors Letters, 2025 Accurate and reliable air quality forecasting is essential for protecting public health, sustainable development, pollution control, and enhanced urban planning. This letter proposes a novel architecture, namely wavelet-based CatBoost, to forecast the real-time concentrations of air pollutants by combining the maximal overlap discrete wavelet transform with the CatBoost model. This hybrid approach efficiently decomposes time series of air pollutant concentrations into high-frequency and low-frequency components, thereby separating signal from noise and improving prediction accuracy and robustness. Evaluation on two distinct regional datasets, one from the Central Pollution Control Board sensor network and one from a low-cost air quality sensor system, shows that the proposed methodology outperforms state-of-the-art machine learning and deep learning architectures in real-time forecasting.
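The decomposition step can be illustrated with a one-level maximal overlap discrete wavelet transform using the Haar filter, followed by the corresponding multiresolution analysis that splits a series into a high-frequency detail and a low-frequency smooth that sum back to the original. The filter choice, the single decomposition level, the circular boundary handling, and the function names are assumptions; the abstract specifies none of them, and the CatBoost forecasting stage is omitted.

```python
def haar_modwt_level1(x):
    """One-level Haar MODWT with circular boundary handling.

    Returns wavelet (high-frequency) and scaling (low-frequency) coefficients.
    Index t - 1 = -1 wraps to the last element, giving the circular boundary.
    """
    n = len(x)
    w = [(x[t] - x[t - 1]) / 2 for t in range(n)]
    v = [(x[t] + x[t - 1]) / 2 for t in range(n)]
    return w, v


def haar_mra_level1(x):
    """Split x into detail and smooth components with x[t] == detail[t] + smooth[t]."""
    w, v = haar_modwt_level1(x)
    n = len(x)
    detail = [(w[t] - w[(t + 1) % n]) / 2 for t in range(n)]
    smooth = [(v[t] + v[(t + 1) % n]) / 2 for t in range(n)]
    return detail, smooth


detail, smooth = haar_mra_level1([1.0, 2.0, 3.0, 4.0])
# detail == [-1.0, 0.0, 0.0, 1.0], smooth == [2.0, 2.0, 3.0, 3.0]
```

Unlike the ordinary discrete wavelet transform, the MODWT does not downsample, so each component has one value per time step and stays aligned with the original series. Each component could then be forecast separately or supplied as features, which is the role the abstract assigns to CatBoost; how the component forecasts are combined is not stated here.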
Timezone-Aware Auto-Regressive Long Short-Term Memory Model for Multipollutant Prediction Jintu Borah, Mohd Shahrul Mohd Nadzir, Mylene G. Cayetano, Hemant Ghayvat, Shubhankar Majumdar, Gautam Srivastava IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025 Air pollution poses a significant threat to urban environments, and accurate prediction of multiple air pollutants is crucial for effective mitigation strategies. This study introduces a novel time-aware auto-regressive long short-term memory (TAR LSTM) approach to address this challenge by developing a multivariate prediction model using artificial intelligence (AI) for SMART city applications. Existing models often fail to predict all six major criteria pollutants comprehensively. In response, this work proposes an autoregressive (AR) neural network model based on the long short-term memory (LSTM) architecture, which excels at capturing temporal dependencies in sequential data. The AR component captures the linear dependencies in the time series, while the LSTM captures the nonlinear dependencies and long-term patterns. This enables the model to consider past pollutant concentrations and their relationships, resulting in more accurate and dynamic predictions. Rigorous testing on datasets from low-cost air quality sensors (LAQSs) validates the model’s superior performance. Datasets from diverse locations, including India, Malaysia, and the Philippines, contribute to the robustness of the model, showcasing its efficacy in varied urban environments. This research advances predictive modeling for air quality, addresses the limitations of previous approaches, and provides a promising solution for SMART city implementations. The findings highlight the AR LSTM model’s potential as a valuable tool for precise and comprehensive air pollution forecasting, with implications for informed decision making and better urban environmental management.
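The division of labor described above, with a linear AR term for linear dependence and an LSTM for what remains, can be sketched with an AR(1) model fitted by ordinary least squares whose residuals would become the target of the nonlinear model. The additive combination, the AR order of 1, the function names, and the synthetic series are assumptions, and the LSTM is represented only by a placeholder correction term.

```python
def fit_ar1(series):
    """Ordinary least squares for x[t] = c + phi * x[t-1] + e[t]."""
    lagged, current = series[:-1], series[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    cov = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    var = sum((a - mean_lag) ** 2 for a in lagged)
    phi = cov / var
    c = mean_cur - phi * mean_lag
    return c, phi


def ar1_residuals(series, c, phi):
    """What the linear AR term leaves unexplained (the nonlinear model's target)."""
    return [x - (c + phi * prev) for prev, x in zip(series[:-1], series[1:])]


def hybrid_forecast(last_value, c, phi, nonlinear_correction):
    """One-step forecast = linear AR part + nonlinear residual estimate."""
    return c + phi * last_value + nonlinear_correction


# Synthetic series generated by x[t] = 1 + 0.5 * x[t-1], starting at 0.
series = [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
c, phi = fit_ar1(series)
residuals = ar1_residuals(series, c, phi)
# Placeholder: a trained LSTM would predict the next residual; 0.0 is used here.
next_value = hybrid_forecast(series[-1], c, phi, nonlinear_correction=0.0)
```

Because the toy series is exactly linear, the AR fit recovers c = 1 and phi = 0.5 and the residuals vanish; on real pollutant data the residuals carry the nonlinear structure the LSTM is meant to learn.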
AiCareAir: Hybrid-Ensemble Internet-of-Things Sensing Unit Model for Air Pollutant Control Jintu Borah, Mohd Shahrul Mohd Nadzir, Mylene G. Cayetano, Shubhankar Majumdar, Hemant Ghayvat, Gautam Srivastava IEEE Sensors Journal, 2024 The detrimental effects of air pollution on human health make air quality prediction a task of utmost significance. Artificial Intelligence (AI) and the Internet of Things (IoT) are seen as promising in this domain. The prediction accuracy of state-of-the-art models varies across pollutants and is acceptable only for certain pollutants. This paper uses Machine Learning (ML) and Deep Learning (DL) models to predict the concentrations of six major air pollutants. Data were collected over 8 months, with 1400 daily instances, from sensors deployed in Kuala Lumpur, Malaysia. To obtain a robust system, this paper proposes a hybrid ensemble model that combines ML models, specifically Random Forest, K-Nearest Neighbour (KNN), and Extreme Gradient Boosting (XGBoost), with neural network models, namely Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Convolutional Neural Networks (CNN). The hybrid ensemble learning model uses five different ML models as weak learners. Previous ensemble models use a homogeneous group of weak learners; this work instead uses a heterogeneous group. Prediction accuracy is compared using the R2 score and the mean absolute, mean squared, and root mean squared errors.
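A heterogeneous ensemble and the metrics listed above can be sketched as follows: predictions from different model families are averaged per sample and then scored with R², MAE, MSE, and RMSE. Equal-weight averaging is an assumption (the abstract does not say how the base learners are combined), and the model names and prediction values are hypothetical.

```python
from math import sqrt


def average_ensemble(predictions_by_model):
    """Equal-weight average of per-sample predictions from heterogeneous models."""
    columns = list(predictions_by_model.values())
    return [sum(values) / len(values) for values in zip(*columns)]


def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}


# Hypothetical hold-out predictions from three different model families.
actual = [20.0, 25.0, 30.0, 35.0]
predictions = {
    "random_forest": [21.0, 24.0, 31.0, 34.0],
    "xgboost": [19.0, 26.0, 29.0, 36.0],
    "lstm": [22.0, 25.0, 30.0, 33.0],
}
combined = average_ensemble(predictions)
print(regression_metrics(actual, combined))
```

Averaging helps most when the base learners make different errors, which is the motivation for mixing tree-based, neighbour-based, and recurrent models rather than using many copies of one model type.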
AiCareBreath: IoT-Enabled Location-Invariant Novel Unified Model for Predicting Air Pollutants to Avoid Related Respiratory Disease Jintu Borah, Shashank Kumar, Nikhil Kumar, Mohd Shahrul Mohd Nadzir, Mylene G. Cayetano, Hemant Ghayvat, Shubhankar Majumdar, Neeraj Kumar IEEE Internet of Things Journal, 2024 This article presents a location-invariant air pollution prediction model with good geographic generalizability. The model uses a light GBR within a machine learning framework to capture the spatial patterns of air contaminants. Given the dynamic nature of air pollution, the model also uses a random forest to capture temporal dependencies in the data. To handle location variability, the model uses a transfer learning strategy: it is first trained on a large data set of air quality measurements from various locations, which lets it learn general concentration patterns, and is then refined using data from a particular target site, tailoring it to the characteristics of that area. Experiments on a comprehensive data set of air pollution measurements from various places assess the efficacy of the proposed model. The proposed method forecasts air pollution levels better than standard models, demonstrating its dependability in various geographical settings. An interpretability analysis is also performed to identify the variables affecting air pollution levels. Visualizing the learned representations within the model reveals the geographical patterns associated with high pollutant concentrations, giving important information for environmental planning and mitigation. The results show that the model outperforms state-of-the-art forecasting models based on recurrent neural networks and transformers. The proposed methodology has the potential to improve air quality management and aid decision-making across numerous regions.
This helps safeguard the environment and public health by creating more precise and dependable air pollution forecast systems.
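The pretrain-then-refine strategy described above can be illustrated with a warm-started model: parameters learned on pooled multi-location data become the starting point for a short round of training on target-site data. This is a swapped-in technique for illustration only. The abstract's models are a light GBR and a random forest, which are not fine-tuned by gradient steps in this way; here a one-feature linear regressor trained by batch gradient descent stands in, and the function name and the synthetic source and target data are hypothetical.

```python
def train_linear(data, w=0.0, b=0.0, lr=0.1, epochs=2000):
    """Batch gradient descent on mean squared error for y ~ w * x + b.

    Passing (w, b) from an earlier run continues training from those
    weights, which is the fine-tuning (warm-start) step.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(11)]
# Synthetic data: a pooled "source" set and a smaller, offset "target" site.
source = [(x, 2.0 * x + 1.0) for x in xs]
target = [(x, 2.0 * x + 1.5) for x in xs[::2]]

pretrained = train_linear(source)
fine_tuned = train_linear(target, *pretrained, epochs=300)
print(pretrained, fine_tuned)
```

The fine-tuning run keeps the shared slope learned from the source data and mainly adjusts the intercept to the target site's offset, which mirrors, in miniature, the idea of customizing a general model to local conditions.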
Utilizing a Low-Cost Air Quality Sensor: Assessing Air Pollutant Concentrations and Risks Using Low-Cost Sensors in Selangor, Malaysia Zaki Khaslan, Mohd Shahrul Mohd Nadzir, Hamimatunnisa Johar, Zhang Siqi, Nor Azura Sulong, Faizal Mohamed, Shubhankar Majumdar, Fatin Nur Afiqah Suris, Nor Syamimi Sufiera Limi Hawari, Jintu Borah, Maggie Ooi Chel Gee, Muhammad Ikram A. Wahab, Mohd Aftar Abu Bakar, Noratiqah Mohd Ariff, Ahmad Zia Ul-Saufie Mohamad Japeri, Mohd Fadzil Firdzaus Mohd Nor, Utbah Rabuan, Sawal Hamid Md Ali, Brentha Murugan, Mylene G. Cayetano Water, Air, & Soil Pollution, 2024
Bidirectional LSTM Model for Accurate and Real-Time Landslide Detection: A Case Study in Mawiongrim, Meghalaya, India J. Sharailin Gidon, Jintu Borah, Smrutirekha Sahoo, Shubhankar Majumdar, Masahiro Fujita IEEE Internet of Things Journal, 2024 This article presents a bidirectional long short-term memory (LSTM) model for detecting landslides. Previous applications of machine learning (ML) in this setting have demonstrated its general potential, motivating the search for a suitable algorithm. Landslides are natural disasters that can cause significant destruction and disruption in the affected areas. Early detection is key to minimizing their impact, so accurate and efficient models are important. The study area is located in Mawiongrim, Meghalaya, India, an active landslide zone. The proposed model uses a bidirectional LSTM to capture temporal patterns in input data collected by a long-term real-time monitoring system installed in the area. To evaluate the effectiveness of the predictions, the model is trained on a data set of landslide-related characteristics, such as topography, rainfall, hydrological, and soil properties. The results show that the proposed model detects landslides with higher accuracy and lower error than the other models compared. The model can also serve as a real-time warning system, making it a viable tool for early landslide detection. The study also presents prediction models for matric suction and groundwater level, which are crucial in determining slope stability.
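The bidirectional idea can be sketched with a scalar LSTM cell run once forward and once backward over a sequence, pairing the two hidden states at each time step so that every step sees both past and future context. The weights below are arbitrary and untrained, the cell is one-dimensional, and all names and input values are hypothetical; the abstract does not give layer sizes or input features, and a practical model would use a framework layer such as PyTorch's `torch.nn.LSTM` with `bidirectional=True`.

```python
from math import exp, tanh


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def lstm_step(x, h, c, p):
    """One scalar LSTM step; each gate in p is (input weight, recurrent weight, bias)."""
    i = sigmoid(p["i"][0] * x + p["i"][1] * h + p["i"][2])  # input gate
    f = sigmoid(p["f"][0] * x + p["f"][1] * h + p["f"][2])  # forget gate
    o = sigmoid(p["o"][0] * x + p["o"][1] * h + p["o"][2])  # output gate
    g = tanh(p["g"][0] * x + p["g"][1] * h + p["g"][2])     # candidate cell value
    c = f * c + i * g   # updated cell state
    h = o * tanh(c)     # updated hidden state
    return h, c


def run_lstm(sequence, p):
    """Hidden state after each element, starting from a zero state."""
    h, c, outputs = 0.0, 0.0, []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bidirectional_lstm(sequence, p_forward, p_backward):
    """Pair each step's forward state (past context) with its backward state (future context)."""
    forward = run_lstm(sequence, p_forward)
    backward = run_lstm(sequence[::-1], p_backward)[::-1]
    return list(zip(forward, backward))


# Arbitrary, untrained weights and hypothetical normalized sensor readings.
P_FWD = {"i": (0.5, 0.1, 0.0), "f": (0.4, 0.2, 1.0), "o": (0.6, -0.1, 0.0), "g": (1.0, 0.3, 0.0)}
P_BWD = {"i": (0.3, -0.2, 0.0), "f": (0.5, 0.1, 1.0), "o": (0.7, 0.2, 0.0), "g": (0.8, -0.4, 0.0)}
readings = [0.1, 0.4, 0.9, 0.3]
states = bidirectional_lstm(readings, P_FWD, P_BWD)
```

In a trained model, a classifier or regressor head would read these paired states; note that the backward pass needs the whole input window, so real-time use implies operating on a sliding window of recent readings.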