Enhancing COVID‐19 Forecasting Accuracy in Malaysia Using a Hybrid ARIMA‐LSTM Model With Exogenous Variables: A Time‐Series Predictive Study
Al Mahmud, Kamarul Imran Musa, Firdaus Mohamad Hamzah, Zainab Mat Yudin Badrin, Mohamad Arif Awang Nawi
ABSTRACT
Background
Accurate forecasting of COVID‐19 cases is essential for effective public health planning and resource allocation. Traditional statistical and deep‐learning models often fail to jointly capture linear dynamics, nonlinear patterns, and exogenous drivers of disease transmission. This study proposes a hybrid ARIMA‐LSTM forecasting framework incorporating four exogenous variables—daily average temperature, rainfall, vaccination rate, and population density—at both the linear (ARIMAX) and nonlinear (LSTM residual) stages.
Methods
Daily confirmed COVID‐19 cases in Malaysia from January 4 to September 18, 2021, were analyzed. A dual‐integration modeling strategy was implemented: an ARIMAX component modeled linear trends and exogenous effects (temperature, rainfall, vaccination rate, and population density), while a Long Short‐Term Memory (LSTM) network captured nonlinear residual structures. Four competing models were evaluated: standalone ARIMA, standalone LSTM, hybrid ARIMA‐LSTM without exogenous variables, and the proposed hybrid ARIMAX‐LSTM with exogenous variables. Performance was assessed using RMSE, MAE, MAPE, and R², with statistical comparison via the Diebold–Mariano (DM) test.
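As an illustration of the four accuracy metrics named above, the following is a minimal sketch using their standard textbook definitions. The function names and the toy values are assumptions made for this example; they are not the study's code or data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero actual values)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy series, NOT data from the study.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
}
print(scores)
```

Note that R² computed this way can be close to zero or negative when a model predicts no better than the mean of the test series, even if its MAPE looks moderate.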
Results
The proposed hybrid ARIMAX‐LSTM model achieved superior predictive accuracy (RMSE = 948.62; MAE = 769.49; MAPE = 6.61%; R² = 0.7883), representing approximately 49% lower prediction error than baseline models (RMSE = 1801.90–1857.94). The model explained 78.83% of variance compared with < 5% for models excluding exogenous variables. Improvements were statistically significant (p < 0.001). The hybrid model demonstrated robust performance during epidemic transitions, achieving 3.02% error during a sharp decline phase compared with 20%–23% for baseline approaches.
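The relative error reduction can be checked directly from the RMSE values reported above. The sketch below assumes that "prediction error" refers to RMSE and that the reduction is measured relative to each end of the baseline range; the helper name is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# RMSE values as reported in the Results paragraph.
proposed_rmse = 948.62
baseline_low, baseline_high = 1801.90, 1857.94

reduction_vs_low = relative_reduction(baseline_low, proposed_rmse)
reduction_vs_high = relative_reduction(baseline_high, proposed_rmse)

print(f"vs. lowest baseline RMSE:  {reduction_vs_low:.2f}%")
print(f"vs. highest baseline RMSE: {reduction_vs_high:.2f}%")
```

Under these assumptions the reduction falls between roughly 47% and 49%, with the upper end corresponding to the baseline with the largest RMSE.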
Conclusions
Integrating exogenous variables within both linear and nonlinear components substantially enhances COVID‐19 forecasting accuracy. The proposed hybrid ARIMAX‐LSTM framework provides a reliable tool for epidemic prediction and supports evidence‐based public health decision‐making. This approach may be readily extensible to other infectious diseases and time‐series forecasting applications.