k-Means Clustering of Stores in Retail Industry Using Predictive Analysis Library on SAP HANA XS Advanced

Abstract:

The retail industry is highly dynamic, with many changes occurring over short periods of time. Large international retailers need to differentiate among their stores in order to identify those that are more and less relevant to the business. Data Mining algorithms for Clustering are useful for this purpose, in particular k-Means, which is widely used in this market. Here it is applied to two variables that are very important for observing the activity of retail stores: revenue and number of transactions. This method yields a good differentiation between stores according to their activity, which matters because more active stores require more attention from managers. The study clustered more than 1000 stores of a large food retail chain and revealed differences among them, with some stores at a very high level of activity and others at a much lower level. The Clustering process using the k-Means algorithm was implemented entirely in SAP HANA XS Advanced using the Predictive Analysis Library (PAL). In a first phase, the Elbow Method and the Silhouette Coefficient were combined to choose the optimal number of clusters. In a second phase, the k-Means capabilities of the Predictive Analysis Library were used to choose the best number of clusters automatically, based on the value of the Silhouette Coefficient. In both phases, two clusters were formed: one containing the less active stores and the other containing the more active stores in terms of revenue and number of transactions. This made it possible to distinguish the stores under analysis, identifying those that are more and less important, those with more and less revenue, and those with more and fewer transactions, and to group them accordingly.
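The two-phase selection of the number of clusters described above can be illustrated with a minimal sketch. This is not the PAL implementation and does not run on SAP HANA: it is plain Python using only the standard library, the store data are synthetic (two invented groups of stores with made-up revenue and transaction figures), and the function names `standardize`, `kmeans` and `silhouette` are hypothetical helpers written for this illustration. Standardizing the two variables before clustering is also an assumption, not something the abstract states.

```python
import math
import random

def standardize(rows):
    """Z-score each column so revenue and transactions share one scale (assumption)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]

def kmeans(points, k, restarts=5, iters=100, seed=0):
    """Lloyd's k-means with a few random restarts; returns (labels, inertia)."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centroids = rng.sample(points, k)  # start from k data points
        labels = None
        for _ in range(iters):
            # Assignment step: each point goes to its nearest centroid.
            new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
            if new_labels == labels:
                break  # assignments stable, so centroids are stable too
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:
                    centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Inertia (within-cluster sum of squares) is what the Elbow Method plots.
        inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best

def silhouette(points, labels):
    """Mean Silhouette Coefficient over all points (singletons score 0)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic stores as (revenue, number of transactions); all figures are invented.
rng = random.Random(42)
stores = ([(rng.gauss(100_000, 15_000), rng.gauss(5_000, 800)) for _ in range(60)] +
          [(rng.gauss(400_000, 30_000), rng.gauss(20_000, 1_500)) for _ in range(20)])
points = standardize(stores)

results = {k: kmeans(points, k) for k in range(2, 7)}
inertias = {k: results[k][1] for k in results}
scores = {k: silhouette(points, results[k][0]) for k in results}
best_k = max(scores, key=scores.get)  # automatic choice by Silhouette Coefficient

for k in results:
    print(f"k={k}  inertia={inertias[k]:.2f}  silhouette={scores[k]:.3f}")
print("chosen k:", best_k)
```

Inspecting how `inertias` flattens as k grows corresponds to the Elbow Method used in the first phase, while taking the k with the highest mean Silhouette Coefficient mirrors the automatic choice made in the second phase.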
