DOI: 10.21031/epod.1901793 ISSN: 1309-6575

From Prediction to Explanation: Integrating Ensemble Machine Learning and Structural Equation Modeling in Reading Literacy

Fatma Nur Aydın, Kübra Atalay Kabasakal
This study aimed to (1) examine the predictive performance and variable importance rankings of bagging, random forest, and gradient boosting algorithms in predicting reading comprehension in the PIRLS 2021 cycle, and (2) investigate the causal relationships among the influential predictors identified by these algorithms using Structural Equation Modeling (SEM). Data from Finland, Portugal, and Türkiye, which represent different reading achievement levels, were analyzed. All three algorithms showed similar, moderate predictive performance in every country. Variable importance rankings were consistent across algorithms within each country, and the key predictors were largely similar across countries. In Finland, the most influential predictors were students’ reading confidence, early literacy task performance before primary school, and home learning resources. In Portugal, reading confidence ranked first and home resources second, whereas in Türkiye the order was reversed. Partial dependence analyses indicated that these variables had positive effects in all three countries. Notably, reading confidence had the dominant effect in Finland and Portugal, where reading achievement is higher, while home resources were particularly influential in Türkiye, where reading achievement is comparatively lower. SEM results supported these findings. The models showed acceptable fit for Türkiye and Portugal and good fit for Finland. According to the models, contextual and familial factors were more prominent in Türkiye, individual factors were influential in Portugal, and early literacy tasks had a noteworthy impact in Finland. These findings highlight the value of combining machine learning algorithms with SEM to investigate reading literacy. Integrating predictive modeling with causal analysis can provide a more comprehensive understanding of reading literacy performance and its underlying factors across educational contexts.
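
The predictive step summarized above (fitting bagging, random forest, and gradient boosting models, ranking predictors by importance, and checking the direction of an effect with partial dependence) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Python with scikit-learn, uses permutation importance as a stand-in for whatever importance measure the study applied, and runs on synthetic data rather than PIRLS 2021. The feature names and effect sizes are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT PIRLS 2021); names and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 600
feature_names = ["reading_confidence", "home_resources", "early_literacy", "noise"]
X = rng.normal(size=(n, len(feature_names)))
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three ensemble families named in the abstract, with default-like settings.
models = {
    "bagging": BaggingRegressor(n_estimators=100, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)

    # Held-out R^2 as a simple measure of predictive performance.
    r2 = model.score(X_test, y_test)

    # Permutation importance: drop in score when one feature is shuffled.
    imp = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=0
    )
    order = np.argsort(imp.importances_mean)[::-1]
    ranking = [feature_names[i] for i in order]

    # Partial dependence of the prediction on the first feature;
    # an increasing curve indicates a positive effect.
    pd_result = partial_dependence(model, X_test, features=[0], kind="average")
    curve = pd_result["average"][0]

    results[name] = {"r2": r2, "ranking": ranking, "pd_curve": curve}
    print(f"{name}: R^2 = {r2:.2f}, ranking = {ranking}")
```

Comparing the `ranking` lists across the three models mirrors the consistency check described in the abstract; the SEM step would then be specified separately on the top-ranked predictors.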
