DOI: 10.1093/biomet/asag044 ISSN: 0006-3444

Identify the source of spikes: factor or mixture?

Zeqin Lin, Yiming Liu, Guangming Pan, Chi Yao, Jia Zhou

Summary

We consider the problem of identifying the pattern of latent variables in high-dimensional linear latent variable models, which can also be interpreted as determining the source of spiked singular values in the data matrix. Specifically, we test whether the latent variables are continuous or categorical, a distinction that is crucial for data interpretation but challenging in the high-dimensional regime. To address this inference problem, we analyze the asymptotic behavior of empirical measures associated with singular vectors corresponding to large spiked singular values. Leveraging these insights, we propose novel test statistics based on the eigenvector quantile differences and establish their theoretical performance under the null hypothesis. Simulation studies and real data analyses demonstrate the effectiveness and practical utility of our method.
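To make the idea of quantile differences of a spiked singular vector concrete, the following is a minimal illustrative sketch in Python. It is not the test statistic proposed in the paper: the rank-one simulation design, the function names (simulate, leading_singular_vector, quantile_gap_ratio), the quantile levels, and the signal strength are all assumptions chosen for illustration. The sketch only shows that the entries of the leading singular vector look roughly Gaussian under a continuous factor and cluster around two values under a balanced two-class mixture, so that a ratio of quantile differences separates the two cases.

```python
import numpy as np


def simulate(n, p, latent, strength, rng):
    """Rank-one signal plus Gaussian noise (hypothetical toy design).

    latent="factor"  -> continuous standard normal latent variable
    latent="mixture" -> balanced two-class label coded as -1 / +1 (n must be even)
    """
    if latent == "factor":
        z = rng.standard_normal(n)
    else:
        z = rng.permutation(np.repeat([-1.0, 1.0], n // 2))
    loading = rng.standard_normal(p)
    loading /= np.linalg.norm(loading)  # unit-norm loading vector
    return strength * np.outer(z, loading) + rng.standard_normal((n, p))


def leading_singular_vector(X):
    """Left singular vector associated with the largest singular value."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0]


def quantile_gap_ratio(u, inner=(0.4, 0.6), outer=(0.25, 0.75)):
    """Inner quantile difference divided by outer quantile difference.

    The ratio is invariant to the sign and scale of u. It is about 0.38
    for Gaussian entries and moves toward 1 when the entries form two
    well-separated clusters of equal size.
    """
    q_lo, q_in_lo, q_in_hi, q_hi = np.quantile(
        u, [outer[0], inner[0], inner[1], outer[1]]
    )
    return (q_in_hi - q_in_lo) / (q_hi - q_lo)


rng = np.random.default_rng(0)
n, p, strength = 400, 200, 6.0  # assumed sizes; the spike is well above the noise

r_factor = quantile_gap_ratio(
    leading_singular_vector(simulate(n, p, "factor", strength, rng))
)
r_mixture = quantile_gap_ratio(
    leading_singular_vector(simulate(n, p, "mixture", strength, rng))
)

print(f"factor  ratio: {r_factor:.3f}")
print(f"mixture ratio: {r_mixture:.3f}")
```

In this toy setting the mixture ratio is markedly larger than the factor ratio. A formal test would additionally need the null distribution of the statistic in the high-dimensional regime, which is what the paper establishes and which this sketch does not attempt to reproduce.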
