Abstract:
This paper compares three unsupervised learning algorithms and analyzes their applicability to a credit scoring problem. The performance of principal component analysis, k-means, and hierarchical clustering is quantified in terms of data variance, dispersion, percentage of point variability, and the presence of outlier values. Statistical analysis of the processed data indicates that the clustering algorithms are effective: the resulting clusters show a higher degree of homogeneity, a small data variance, and fewer outlier values in the intra-cluster distances.