The Assessment of Selected Measures of Clustering Quality

Abstract:

Cluster analysis is a popular multivariate method for finding and comparing homogeneous groups of economic objects such as enterprises, customers, employees, shops, cities, provinces and countries. The aim of the paper is to present the results of simulation studies on three measures for assessing the quality of clustering: Calinski-Harabasz, Krzanowski-Lai and Silhouette. They seem to be the most popular among the many proposals presented in the literature. Several simulation models were used, each with a known number of four groups. We study whether the number of groups suggested by each of the three analysed measures is correct. Special attention is paid to structures with unequal group sizes, with one group at a large distance from the others, and with random disturbances. Overall, the Calinski-Harabasz measure was the best at identifying the correct group structure of the simulated data sets.
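
As an illustration of how such a measure ranks competing partitions, the following minimal sketch computes the Calinski-Harabasz index in its standard form, CH = [B / (k - 1)] / [W / (n - k)], where B is the between-group and W the within-group sum of squares, n the number of objects and k the number of groups. It is not the simulation design used in the paper: the toy data (four tight groups of three points each), the function name `calinski_harabasz` and the two candidate partitions are assumptions made only for this example.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Return the Calinski-Harabasz index (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    dim = len(points[0])

    # Collect the points belonging to each group label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("the index requires 2 <= k < n")

    def centroid(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # B: size-weighted squared distances of group centroids
    within = 0.0   # W: squared distances of points to their own centroid
    for pts in groups.values():
        c = centroid(pts)
        between += len(pts) * sqdist(c, overall)
        within += sum(sqdist(p, c) for p in pts)
    if within == 0.0:
        raise ValueError("within-group dispersion is zero; index undefined")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: four tight groups placed at the corners of a square.
centres = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(0, 0), (1, 0), (0, 1)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

# Candidate partitions: the true four groups, and a coarser two-group split
# that merges the groups sharing the same horizontal position.
labels_four = [g for g in range(4) for _ in offsets]
labels_two = [g % 2 for g in labels_four]

ch_four = calinski_harabasz(points, labels_four)
ch_two = calinski_harabasz(points, labels_two)
print(f"CH, four groups: {ch_four:.3f}")
print(f"CH, two groups:  {ch_two:.3f}")
```

On this toy data the index is larger for the four-group partition than for the merged two-group partition, so the number of groups it suggests matches the true structure. The paper's question is whether this still holds under less favourable conditions such as unequal group sizes or random disturbances.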
