Clustering Performance of a Recombinator Hartigan–Wong Algorithm
Libero Nigro, Franco CicirelliThe work described in this paper continues basic research aimed at improving clustering algorithms such as K-Means and Random Swap through careful seeding and genetic concepts. This paper, in particular, develops a variation in the Hartigan–Wong (HW) algorithm, which, although computationally more expensive, is recognized as a better solution than K-Means. The new algorithm is named Recombinator Hartigan–Wong (Rec-HW). Rec-HW first builds a population of candidate solutions, each tailored to the minimization of the Sum-of-Squared-Errors (SSE) objective function cost. Candidate solutions are then systematically recombined by exploiting the standard behaviour of HW, which performs crossover and mutation operations. Recombinations, as experimentally confirmed, reduce the number of iterations required by basic HW and tend to favour the emergence of a solution close to the optimal one. The paper describes the design of Rec-HW, whose current implementation depends on parallel Java. Good clustering performance is demonstrated by using both benchmark and real-world datasets.