Matrix-based factor analysis on the prediction of insurance claims probability
Minseog Oh, Himchan Jeong, Donggyu Kim, Kwangmin Jung

Abstract
We propose a matrix-based factor analysis model for predicting the probability of insurance claims. The model employs projected principal component analysis (PPCA), which improves the estimation of unobserved latent factors by projecting the data matrix onto a linear space spanned by insured-specific features. This approach addresses the overparameterization problem that arises when the numbers of insured-specific features and insurance coverages are large, enabling more accurate estimation of claim probabilities than conventional methods. Using a large-scale health insurance dataset from a leading life insurer in South Korea, we demonstrate that the proposed model outperforms conventional and machine-learning benchmarks, such as logistic regression and XGBoost, in predicting claim probabilities. We further find that our model reduces computational time by approximately 86% and 98% relative to logistic regression and XGBoost, respectively. The proposed model provides a unified and scalable framework for modeling high-dimensional claim probabilities, offering practical value for underwriting, risk management, and personalized insurance product design.
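The projection step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the matrix layout (insureds in rows, coverages in columns), the linear basis with an intercept, the simulated data, and the function name `projected_pca` are all assumptions made for illustration, and the step that maps the fitted factors to claim probabilities is omitted.

```python
import numpy as np

def projected_pca(Y, X, n_factors):
    """Illustrative projected PCA (hypothetical helper, not from the paper).

    Y: (n, m) data matrix, assumed here to be insureds x coverages.
    X: (n, d) insured-specific features.
    Returns loadings (n, K), factors (m, K), and the rank-K fit (n, m).
    """
    n = Y.shape[0]
    # Assumed basis: an intercept plus the raw features (a linear space).
    Phi = np.column_stack([np.ones(n), X])
    # Project each column of Y onto the column space of Phi via least squares.
    coef, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    Y_proj = Phi @ coef
    # PCA of the projected matrix through its singular value decomposition.
    U, s, Vt = np.linalg.svd(Y_proj, full_matrices=False)
    loadings = U[:, :n_factors] * s[:n_factors]
    factors = Vt[:n_factors].T
    return loadings, factors, loadings @ factors.T

# Simulated data in which the loadings are driven by the features.
rng = np.random.default_rng(0)
n, m, d, K = 500, 20, 5, 2
X = rng.normal(size=(n, d))
G = X @ rng.normal(size=(d, K))      # feature-driven loadings
F = rng.normal(size=(m, K))          # latent factors
Y = G @ F.T + rng.normal(size=(n, m))

loadings, factors, fit = projected_pca(Y, X, K)
```

Because the projection removes the part of the data that is orthogonal to the features before the principal components are extracted, the number of free parameters in the loadings is governed by the number of basis functions rather than by the number of insureds, which is the intuition behind the overparameterization remedy mentioned in the abstract.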