Using dimensionality reduction to detect financial crime
Mark Eshwar Lokanan
Purpose
This study examines three dimensionality reduction techniques – principal component analysis (PCA), linear discriminant analysis (LDA) and t-distributed stochastic neighbor embedding (t-SNE) – and their application in detecting financial crime. The objective is to demonstrate how these methods address feature correlations, reduce data complexity and retain critical fraud indicators in high-dimensional data sets.
Design/methodology/approach
This study reviews the principles and challenges of PCA, LDA and t-SNE and applies them to a real-world data set of financial crime cases drawn from the Accounting and Auditing Enforcement Releases of the US Securities and Exchange Commission (SEC). Models are trained and evaluated using stratified cross-validation, and performance is compared across the dimensionality reduction methods using multiple metrics.
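The evaluation design described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: it assumes scikit-learn, uses a synthetic imbalanced data set in place of the enforcement-release data, and picks a logistic regression classifier, the component counts and the metric list purely for illustration. t-SNE is omitted because scikit-learn's `TSNE` has no `transform` method for unseen folds, so it cannot be placed inside a cross-validated pipeline in this way.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data with correlated features and a
# roughly 10% positive ("fraud") class. This is NOT the study's data set.
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=8, n_redundant=10,
    weights=[0.9, 0.1], random_state=0,
)

# Assumption: illustrative component counts. With two classes, LDA can
# yield at most one discriminant component.
reducers = {
    "pca": PCA(n_components=5),
    "lda": LinearDiscriminantAnalysis(n_components=1),
}

# Stratified folds keep the class ratio approximately equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["roc_auc", "f1", "precision", "recall"]

results = {}
for name, reducer in reducers.items():
    # Scaling and reduction are fitted inside each training fold only,
    # so no information from the held-out fold leaks into the model.
    model = make_pipeline(
        StandardScaler(), reducer, LogisticRegression(max_iter=1000)
    )
    scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
    results[name] = {m: scores[f"test_{m}"].mean() for m in metrics}

for name, row in results.items():
    print(name, {m: round(v, 3) for m, v in row.items()})
```

Placing the reducer inside the pipeline, rather than reducing the full data set once up front, matters most for LDA: it uses the class labels, so fitting it on all rows before splitting would let held-out labels influence the features.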
Findings
Results indicate that PCA provides efficient linear reduction while preserving variance, LDA enhances supervised classification by maximizing class separability and t-SNE uncovers local patterns useful for anomaly detection. Together, these methods demonstrate measurable improvements in interpretability, computational efficiency and fraud detection performance.
Originality/value
This paper extends prior work by offering a comparative analysis of PCA, LDA and t-SNE in financial crime detection, bridging theoretical foundations with practical application. It highlights the implications for regulators and auditors, providing a replicable framework for applying dimensionality reduction in fraud analytics.