Abstract:
Benchmarks play an important role in research, as they allow proposed approaches to be evaluated and compared. In many fields, such as information processing and retrieval, they rely on datasets composed of training and test sub-datasets. In the recommendation field, benchmarks exist for various types of items; however, no dataset is dedicated to Open Source Software (OSS). The aim of this paper is to create a first benchmark specific to OSS, which may be used to evaluate algorithms that recommend OSS. To reach this aim, we designed the structure of the dataset by studying OSS characteristics, and we mined both SourceForge and GitHub repositories to build the data collection. We then carried out an evaluation step by running a set of well-known recommendation algorithms from the Recommender101 and LibRec frameworks on the OSS dataset. The resulting benchmark may serve as a basis for future work on OSS recommendation, either by extending the dataset or by evaluating and comparing new algorithms.