Intelligent Virtual Sensor Generation Using KL-Divergence-Based Fusion and Deep Generative Learning for Smart Environmental Monitoring
Murad Ali Khan, Qazi Waqas Khan, Muhammad Faizan, Ji-Eun Kim, Il-yeop Ahn, Do-Hyeun Kim

Sensor-based environmental monitoring systems are often affected by missing, noisy, and unreliable measurements caused by sensor faults, sparse deployment, calibration drift, and communication interruptions. To address these challenges, this study proposes an intelligent virtual sensor generation framework that integrates physical-constraint-based preprocessing, statistical virtual sensor modeling, KL-divergence-based fusion, deep generative augmentation, and temporal prediction. The raw weather-station data are first refined using threshold-based filtering, physical validity constraints, and Isolation Forest-based outlier detection. To handle the circular nature of wind direction, the angle is encoded using sine and cosine components during modeling and reconstructed using the atan2 function for evaluation. Multiple statistical methods, including Inverse Distance Weighting, Kernel Density Estimation, Ridge Regression, and Copula-based modeling, are employed to generate complementary virtual sensor data. These outputs are adaptively fused using KL divergence according to their distributional similarity with real sensor data. The fused datasets are further augmented using Variational Autoencoders (VAE) and Conditional Tabular Generative Adversarial Networks (CTGAN), and then evaluated using BiLSTM and BiGRU models with MAE, MSE, and RMSE metrics. The experimental results demonstrate that the proposed framework generates physically valid and distributionally consistent virtual sensor data. Fusion-based methods outperform standalone approaches, while VAE-based augmentation generally provides better statistical fidelity and lower prediction errors than CTGAN. Additional validation using a public NOAA weather-station dataset further supports the transferability of the proposed fusion-based virtual sensing workflow.
Comparisons with TimeGAN and diffusion-based temporal generative baselines, supported by Wilcoxon signed-rank testing, indicate that the proposed framework performs competitively and that the observed differences are statistically significant. A quantitative computational analysis also demonstrates the practical feasibility of the framework in terms of training time, inference time, memory consumption, and scalability. Overall, the proposed framework offers a reliable and scalable solution for virtual sensing in sensor-sparse and fault-prone environmental monitoring systems.
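
As an illustration of two steps summarized above, the following minimal Python sketch shows (i) sine/cosine encoding of wind direction with atan2 reconstruction and (ii) a KL-divergence-weighted fusion of candidate virtual sensor series. It is a sketch under stated assumptions, not the implementation used in this study: the function names, the histogram-based KL estimate, the bin count, and the inverse-KL weighting rule are all hypothetical choices made for illustration, since the abstract does not specify how the KL values are converted into fusion weights.

```python
import numpy as np

def encode_direction(deg):
    """Map wind direction in degrees to (sin, cos) components."""
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    return np.sin(rad), np.cos(rad)

def decode_direction(sin_c, cos_c):
    """Reconstruct the angle in degrees, in [0, 360), using atan2."""
    return np.rad2deg(np.arctan2(sin_c, cos_c)) % 360.0

def kl_divergence(p_samples, q_samples, bins=20, eps=1e-9):
    """Histogram estimate of KL(P || Q) on a shared set of bins.

    Assumption: a simple binned estimate with additive smoothing (eps)
    so that empty bins do not produce a division by zero.
    """
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p_hist, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q_hist, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = (p_hist + eps) / (p_hist + eps).sum()
    q = (q_hist + eps) / (q_hist + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def kl_fusion_weights(real, candidates, eps=1e-9):
    """Assumed rule: weight each candidate by inverse KL, then normalize."""
    kls = np.array([kl_divergence(real, c) for c in candidates])
    inv = 1.0 / (kls + eps)
    return inv / inv.sum()

def fuse_candidates(real, candidates):
    """Weighted average of equally long candidate series."""
    weights = kl_fusion_weights(real, candidates)
    fused = np.tensordot(weights, np.stack(candidates), axes=1)
    return fused, weights
```

Under this assumed rule, a candidate whose distribution lies closer to the real sensor data (lower KL divergence) receives a larger weight in the fused output. For wind direction, the fusion would be applied to the sine and cosine components separately, with the fused angle recovered through `decode_direction`.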