DOI: 10.3390/mca31030111 ISSN: 2297-8747

Evaluating Regularization Estimators Under Severe Multicollinearity: A Simulation and Empirical Study on Housing Prices

Osman Ufuk Ekiz, Meltem Ekiz

Accurate housing price prediction is important for market efficiency and purchasing decisions. However, multicollinearity among independent variables remains a major challenge in linear regression, causing variance inflation and reducing the reliability of the ordinary least squares (OLS) estimator. Although regularization methods such as ridge regression, least absolute shrinkage and selection operator (LASSO), and elastic net (EN) are widely used, evidence regarding their variance behavior under controlled multicollinearity structures remains limited. This study addresses this gap through simulation experiments conducted under controlled correlation structures with sample sizes ranging from 100 to 2000, 5 to 70 independent variables, and correlation coefficients between 0.50 and 0.99. The findings are further validated using the California Housing Dataset, where mean squared prediction error (MSPE) is computed on the full dataset, while root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²) are evaluated on a hold-out test set. Simulation results show that LASSO generally yields the lowest variance estimates under moderate multicollinearity, whereas EN becomes more competitive as multicollinearity and dimensionality increase. In the California Housing application, EN reduces MSPE by approximately 95.5% relative to OLS. These findings provide insight into the behavior of linear regression estimators and offer practical guidance for researchers in selecting appropriate models for housing price modelling.
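The variance inflation described in the abstract can be illustrated with a minimal population-level sketch. It is not the authors' simulation design: it assumes standardized predictors that all share one pairwise correlation `rho` (an equicorrelation structure, chosen here only because it has closed-form eigenvalues), and the function names and the ridge penalty `k` are hypothetical. Variances are expressed in units of sigma^2 / n, and the ridge figure covers variance only, not the bias that shrinkage introduces. LASSO and EN have no comparable closed form and are not covered.

```python
def equicorrelation_eigenvalues(p, rho):
    """Eigenvalues of the p x p matrix with 1 on the diagonal and rho elsewhere."""
    # One eigenvalue 1 + (p - 1) * rho, and p - 1 copies of 1 - rho.
    return [1.0 + (p - 1) * rho] + [1.0 - rho] * (p - 1)


def vif(p, rho):
    """Variance inflation factor shared by every predictor (assumed structure).

    This is the diagonal entry of the inverse equicorrelation matrix.
    """
    return (1.0 + (p - 2) * rho) / ((1.0 - rho) * (1.0 + (p - 1) * rho))


def total_variance_ols(p, rho):
    """Sum of OLS coefficient variances: trace of the inverse correlation matrix."""
    return sum(1.0 / lam for lam in equicorrelation_eigenvalues(p, rho))


def total_variance_ridge(p, rho, k):
    """Sum of ridge coefficient variances for a hypothetical penalty k >= 0.

    Equals the trace of (R + kI)^-1 R (R + kI)^-1; k = 0 recovers OLS.
    """
    return sum(lam / (lam + k) ** 2 for lam in equicorrelation_eigenvalues(p, rho))


if __name__ == "__main__":
    p, k = 5, 0.1  # illustrative values only
    for rho in (0.50, 0.90, 0.99):
        print(
            f"rho={rho:.2f}  VIF={vif(p, rho):.2f}  "
            f"OLS={total_variance_ols(p, rho):.2f}  "
            f"ridge(k={k})={total_variance_ridge(p, rho, k):.2f}"
        )
```

Under these assumptions the OLS total variance grows roughly like 1 / (1 - rho) as rho approaches 1, while any positive `k` keeps the ridge total bounded. That is the variance side of the trade-off the study examines empirically.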
