DOI: 10.2478/jses-2026-0005 ISSN: 2285-388X

AI Techniques for Survey Data Quality: Transformers and GANs

Simona Cafieri, Gianmarco Borrata

Abstract

This study develops a comparative framework for missing data imputation in social surveys, integrating classical statistical methods, machine learning algorithms, and deep learning methods, including Transformers and Generative Adversarial Networks (GANs). The analysis specifically addresses the imputation of categorical and ordinal variables and aligns the imputation strategies with classification-based evaluation metrics, thereby overcoming a key limitation of existing approaches. An empirical evaluation on real data from the Italian “Aspects of Daily Life” (AVQ) survey, covering approximately 50,000 individuals and 735 variables, demonstrates that deep learning models, particularly Transformer-based architectures, consistently outperform traditional and machine learning methods in terms of predictive accuracy, generalization, and preservation of distributional properties. The findings show that attention-based models effectively capture complex dependencies in high-dimensional survey data and can improve the quality and reliability of official statistics.
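The alignment between categorical imputation and classification-based metrics mentioned in the abstract can be illustrated with a minimal mask-and-score sketch. This is not the authors' pipeline: the toy column, the mode-imputation baseline, and the helper names `mask_values`, `mode_impute`, and `accuracy` are all assumptions introduced here for illustration. The idea is to hide a fraction of known categorical values, impute them, and score the imputed labels against the hidden truth as in a classification task.

```python
import random
from collections import Counter


def mask_values(column, frac, seed=0):
    """Hide a random fraction of entries; return the masked column and hidden indices."""
    rng = random.Random(seed)
    n_hide = int(round(frac * len(column)))
    hidden = sorted(rng.sample(range(len(column)), n_hide))
    masked = list(column)
    for i in hidden:
        masked[i] = None  # None marks an artificially missing value
    return masked, hidden


def mode_impute(masked):
    """Baseline imputer (an assumption): fill every gap with the most frequent observed category."""
    observed = [v for v in masked if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in masked]


def accuracy(truth, imputed, hidden):
    """Classification accuracy computed only on the entries that were hidden."""
    hits = sum(truth[i] == imputed[i] for i in hidden)
    return hits / len(hidden)


# Hypothetical ordinal survey item with three levels.
column = ["low"] * 7 + ["medium"] * 2 + ["high"] * 1

masked, hidden = mask_values(column, frac=0.3)
imputed = mode_impute(masked)
acc = accuracy(column, imputed, hidden)
print(f"hidden entries: {len(hidden)}, accuracy on hidden entries: {acc:.2f}")
```

Any imputer that returns a category label per missing entry could replace `mode_impute` in this sketch, and other classification metrics such as macro-averaged F1 could replace `accuracy` without changing the masking step.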
