DOI: 10.1002/hsr2.72684 ISSN: 2398-8835

Enhancing COVID‐19 Forecasting Accuracy in Malaysia Using a Hybrid ARIMA‐LSTM Model With Exogenous Variables: A Time‐Series Predictive Study

Al Mahmud, Kamarul Imran Musa, Firdaus Mohamad Hamzah, Zainab Mat Yudin Badrin, Mohamad Arif Awang Nawi

ABSTRACT

Background

Accurate forecasting of COVID‐19 cases is essential for effective public health planning and resource allocation. Traditional statistical and deep‐learning models often fail to jointly capture linear dynamics, nonlinear patterns, and exogenous drivers of disease transmission. This study proposes a hybrid ARIMA‐LSTM forecasting framework incorporating four exogenous variables—daily average temperature, rainfall, vaccination rate, and population density—at both the linear (ARIMAX) and nonlinear (LSTM residual) stages.
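The dual‐stage idea can be written as the usual additive residual‐hybrid decomposition. This is a generic sketch, not the authors' exact specification. The abstract does not report the ARIMAX orders (p, d, q), the LSTM input window k, or whether the exogenous vector enters with lags, so those choices below are assumptions, and the ARIMAX line uses one common form of the model.

```latex
\begin{aligned}
y_t &= L_t + N_t, \qquad
\mathbf{x}_t = (\text{temperature}_t,\ \text{rainfall}_t,\ \text{vaccination}_t,\ \text{density}_t)^\top \\
\phi(B)\,(1-B)^d\, y_t &= \boldsymbol{\beta}^\top \mathbf{x}_t + \theta(B)\,\varepsilon_t
  \quad \text{(ARIMAX stage, yields } \hat{L}_t\text{)} \\
e_t &= y_t - \hat{L}_t \quad \text{(residuals passed to the LSTM)} \\
\hat{N}_t &= f_{\mathrm{LSTM}}\!\left(e_{t-1}, \dots, e_{t-k};\ \mathbf{x}_{t-1}, \dots, \mathbf{x}_{t-k}\right) \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t
\end{aligned}
```

Here B is the backshift operator, φ(B) and θ(B) are the autoregressive and moving‐average polynomials of orders p and q, and f_LSTM is the trained network. Feeding the exogenous vector to both the ARIMAX equation and the LSTM is what distinguishes this setup from a hybrid that uses exogenous inputs in only one stage.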

Methods

Daily confirmed COVID‐19 cases in Malaysia from January 4 to September 18, 2021, were analyzed. A dual‐integration modeling strategy was implemented: an ARIMAX component modeled linear trends and exogenous effects (temperature, rainfall, vaccination rate, and population density), while a Long Short‐Term Memory (LSTM) network, which also received the exogenous variables, captured nonlinear structure in the ARIMAX residuals. Four competing models were evaluated: standalone ARIMA, standalone LSTM, hybrid ARIMA‐LSTM without exogenous variables, and the proposed hybrid ARIMAX‐LSTM with exogenous variables. Performance was assessed using the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R²), and the models were compared statistically with the Diebold–Mariano (DM) test.
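The Diebold–Mariano comparison tests whether two models' forecast losses differ on average. Below is a minimal Python sketch of the standard statistic. It assumes squared‐error loss, an h‐step forecast horizon, and the asymptotic normal reference distribution; the abstract states none of these, nor whether the small‐sample correction of Harvey, Leybourne, and Newbold was applied. The function name `diebold_mariano` is hypothetical.

```python
import math
from typing import Sequence


def diebold_mariano(e1: Sequence[float], e2: Sequence[float],
                    h: int = 1, power: int = 2) -> tuple[float, float]:
    """Return (DM statistic, two-sided p-value) for forecast errors e1 vs e2.

    Loss is |e|**power (power=2 gives squared-error loss).
    A negative statistic means model 1 has the lower average loss.
    """
    if len(e1) != len(e2) or not e1:
        raise ValueError("error series must be non-empty and of equal length")
    n = len(e1)
    # Loss differential d_t = L(e1_t) - L(e2_t)
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    d_bar = sum(d) / n

    def autocov(k: int) -> float:
        # Sample autocovariance of d at lag k
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance using autocovariances up to lag h-1
    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance; DM statistic undefined")
    dm = d_bar / math.sqrt(lrv / n)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|dm|)) = erfc(|dm| / sqrt(2))
    p = math.erfc(abs(dm) / math.sqrt(2))
    return dm, p


# Hypothetical one-step errors: model A (e_a) is consistently closer than model B (e_b)
e_a = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1]
e_b = [3, -4, 5, -3, 4, -5, 3, -4, 5, -3]
stat, pval = diebold_mariano(e_a, e_b)
print(stat, pval)
```

In practice `e1` and `e2` would be the test‐set errors of, say, the hybrid ARIMAX‐LSTM and one baseline, and the test would be repeated for each baseline.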

Results

The proposed hybrid ARIMAX‐LSTM model achieved the highest predictive accuracy (RMSE = 948.62; MAE = 769.49; MAPE = 6.61%; R² = 0.7883), an RMSE approximately 47%–49% lower than that of the baseline models (RMSE = 1801.90–1857.94). It explained 78.83% of the variance, compared with less than 5% for the models excluding exogenous variables. The improvements were statistically significant (DM test, p < 0.001). The hybrid model also remained robust during epidemic transitions, with a 3.02% error during a sharp decline phase compared with 20%–23% for the baseline approaches.
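The headline comparison can be checked with a few lines of arithmetic. The Python sketch below computes the four reported metrics and the relative RMSE reduction. It assumes the standard textbook definitions (MAPE in percent, R² as 1 − SS_res/SS_tot), since the abstract does not spell out the exact formulas; the helper names `forecast_metrics` and `relative_reduction` are hypothetical.

```python
import math
from typing import Sequence


def forecast_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Point-forecast accuracy: RMSE, MAE, MAPE (%), and R^2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("series must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "R2": 1 - ss_res / ss_tot,  # coefficient of determination
    }


def relative_reduction(new: float, baseline: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return 100 * (1 - new / baseline)


# Reported hybrid RMSE vs. the best and worst baseline RMSE
print(round(relative_reduction(948.62, 1801.90), 1))  # about 47.4
print(round(relative_reduction(948.62, 1857.94), 1))  # about 48.9
```

The reduction is about 47% against the strongest baseline and about 49% against the weakest, which is why a range is more accurate than a single figure.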

Conclusions

Integrating exogenous variables within both the linear and nonlinear components substantially enhances COVID‐19 forecasting accuracy. The proposed hybrid ARIMAX‐LSTM framework provides a reliable tool for epidemic prediction and supports evidence‐based public health decision‐making. The approach may be readily extensible to other infectious diseases and time‐series forecasting applications.
