A Hybrid Topological–Metric Clustering Framework Based on Persistent Homology: TCSI, HTCI, and NHTSI
Nurhan Halisdemir, Yunus Güral, Mehmet Gürcan
While classical clustering methods, particularly k-means, produce powerful and practical solutions based on metric distances between data points, they can be limited on complex, nonlinear, and structurally irregular datasets. This study proposes a hybrid topological–metric clustering framework, referred to as Hybrid-NHTSI, that integrates structural information from persistent homology (PH) into the clustering update process. The method is built on the Topological Cluster Separation Index (TCSI), a PH-based measure of topological separation between clusters. In addition to TCSI, the framework uses the Normalized Topological Cluster Separation Index (NTCSI), the Hybrid Topological Clustering Index (HTCI), and the Normalized Hybrid Topological Separation Index (NHTSI) to evaluate clustering performance from both geometric and topological perspectives. The approach increases the topological separation between clusters while a regularization term controls intra-cluster geometric scatter, which favors clusters that are consistent both topologically and geometrically. The method was evaluated on synthetic circles-and-moons benchmark datasets under different noise and overlap levels, and on the real-world UCI Human Activity Recognition sensor dataset. DBSCAN achieved the strongest overall performance on the synthetic benchmark, as expected given the nonconvex, density-separable structure of the data. However, Hybrid-NHTSI produced higher NTCSI, HTCI, and NHTSI values than classical metric/geometric baselines such as k-means, Spectral Clustering, and Agglomerative Clustering, and pairwise statistical comparisons based on NHTSI confirmed that these improvements were significant against several competing methods.
In the real-data experiment, although Spectral Clustering achieved the highest adjusted Rand index (ARI), Hybrid-NHTSI obtained the highest NTCSI, HTCI, and NHTSI values and significantly outperformed all competing methods in terms of NHTSI. These findings indicate that combining metric and topological information, rather than relying on either alone, provides a more structurally informed evaluation and optimization mechanism for complex clustering problems. Accordingly, the proposed method should not be interpreted as a universally superior clustering algorithm across all metrics, but rather as a topology-aware hybrid refinement framework that enriches metric-based clustering with persistent homology.
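The abstract does not give formulas for TCSI, HTCI, or NHTSI, so the following is only a minimal sketch of the general shape of the hybrid objective it describes: reward topological separation between clusters and penalize intra-cluster geometric scatter through a regularization weight. It assumes a Vietoris–Rips filtration with edge length as the filtration value, in which the finite H0 death times equal the edge lengths of a minimum spanning tree. The functions `separation_proxy` and `hybrid_score` and the weight `lam` are hypothetical stand-ins chosen for illustration; they are not the authors' TCSI or NHTSI definitions.

```python
import math


def group(points, labels):
    """Collect points into a dict {label: [points]}."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    return clusters


def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration (edge length = filtration value).

    Every component is born at 0; the finite death times equal the edge
    lengths of a minimum spanning tree, computed here with Prim's algorithm.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from the tree to each vertex
    if n:
        best[0] = 0.0
    deaths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the root starts the tree; each later vertex merges one component
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return deaths


def separation_proxy(points, labels):
    """HYPOTHETICAL stand-in for a topological separation score (not TCSI).

    Ratio of the smallest inter-cluster distance to the largest H0 death time
    inside any cluster. Values > 1 mean every cluster becomes connected in the
    filtration before any edge joins two different clusters.
    """
    clusters = group(points, labels)
    internal = max(
        (max(h0_death_times(m), default=0.0) for m in clusters.values()), default=0.0
    )
    inter = min(
        (
            math.dist(p, q)
            for i, (p, a) in enumerate(zip(points, labels))
            for q, b in zip(points[i + 1:], labels[i + 1:])
            if a != b
        ),
        default=0.0,
    )
    return inter / internal if internal > 0 else math.inf


def intra_cluster_scatter(points, labels):
    """Mean squared distance of each point to its own cluster centroid."""
    total = 0.0
    for members in group(points, labels).values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total / len(points)


def hybrid_score(points, labels, lam=1.0):
    """HYPOTHETICAL hybrid objective: separation proxy minus lam * scatter (higher is better)."""
    return separation_proxy(points, labels) - lam * intra_cluster_scatter(points, labels)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
    good = [0, 0, 0, 1, 1, 1]
    mixed = [0, 1, 0, 1, 0, 1]
    print(sorted(h0_death_times(pts)))  # [1.0, 1.0, 1.0, 1.0, 9.0]
    print(separation_proxy(pts, good))  # 9.0
    print(hybrid_score(pts, good) > hybrid_score(pts, mixed))  # True
```

On this toy data the two-blob labeling scores higher than the interleaved one because all of its internal H0 merges (length 1) occur well before the first inter-cluster edge (length 9). A faithful implementation would instead use a persistent homology library and the paper's own index definitions and normalizations.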