Customer Segmentation and Creditworthiness Profiling via Multi-view Clustering and Predictive Risk Scoring
Boumedyen Shannaq, Kaneez Fatima Sadriwala, Marwan Alshar’e, Nour Eldin Elshaiekh Osman, Basel Bani-Ismail, Oualid Ali

In modern AI-driven banking, transactional and behavioural data are growing exponentially, and financial institutions increasingly depend on advanced customer risk management and creditworthiness evaluation. This requires not only a sound knowledge architecture but also decision-aware threshold optimisation. This study proposes a unified model that combines customer segmentation with creditworthiness profiling, using multi-view clustering together with predictive risk scoring. The proposed framework uses a gradient boosting decision tree classifier (LightGBM) to predict each customer's credit risk level from consolidated demographic, financial and behavioural variables. LightGBM is widely used in financial risk analytics because it models nonlinear relationships well and copes with imbalanced structured banking data. The empirical analysis is based on a large real-world dataset comprising 1,013,193 transactions, 2,000 customers and 6,146 card payments, with detailed transactional, behavioural, card-level and demographic characteristics. Customer representations are built from three complementary views: (i) demographic and financial health (income, debt, age, credit score); (ii) card and credit utilisation; and (iii) transactional behaviour and expenditure dynamics. Multi-view embedding followed by unsupervised clustering produced two distinct customer groups, corresponding to a high-value cohort and a higher-risk cohort. The baseline model achieves an accuracy of 86.5% and a ROC-AUC of 0.935, which indicates strong discriminative ability.
Decision-threshold optimisation further improves operational performance, yielding a peak F1-score of 0.843 with balanced precision (0.838) and recall (0.848). The analysis also indicates that the model can be tuned to different risk considerations, is scalable, and supports risk-conscious financial decision-making, reconciling advanced machine learning techniques with practical banking and credit-management needs.
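
To illustrate what decision-threshold optimisation can look like in practice, the following is a minimal sketch rather than the authors' implementation. It assumes a classifier has already produced risk scores in [0, 1], sweeps a grid of candidate cut-offs, and keeps the one that maximises F1. The function names (`precision_recall_f1`, `best_f1_threshold`), the 0.01 grid step and the toy labels and scores are all hypothetical and are not taken from the study's data.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = higher risk."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(y_true, scores, candidates=None):
    """Sweep candidate cut-offs and return the one with the highest F1.

    Ties are resolved in favour of the lowest threshold, because a later
    candidate only replaces the current best when its F1 is strictly higher.
    """
    if candidates is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99
        candidates = [i / 100 for i in range(1, 100)]
    best = None
    for thr in candidates:
        y_pred = [1 if s >= thr else 0 for s in scores]
        p, r, f1 = precision_recall_f1(y_true, y_pred)
        if best is None or f1 > best["f1"]:
            best = {"threshold": thr, "precision": p, "recall": r, "f1": f1}
    return best


# Toy example (hypothetical labels and model scores, not the paper's data).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.55, 0.70, 0.20, 0.90]
result = best_f1_threshold(y_true, scores)
print(result)
```

In a real workflow the threshold would be selected on a held-out validation set and then applied unchanged to test data. As a consistency check on the reported figures, F1 = 2PR / (P + R) with P = 0.838 and R = 0.848 gives approximately 0.843, matching the stated F1-score.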