Data Processing Performance of Apache Spark on Beowulf Clusters. An Overview

Marius-Iulian CLUCI, Marin FOTACHE and Valerică GREAVU-ȘERBAN

Abstract:

Despite the advent of cloud computing and the democratisation of data processing through parallel and distributed computing, Big Data systems may still incur considerable costs that put them out of reach for small and medium-sized companies and for financially stretched organizations (as is the case with many universities). This paper presents preliminary results on data processing tasks (queries) run with the Apache Spark framework deployed on a commodity Beowulf cluster. The association between query duration (the outcome) and several predictors, such as the number of cluster nodes, the cluster manager, the RAM available to the cluster, and the database size, was examined.

34th IBIMA Conference: 13-14 November 2019, Madrid, Spain
