Performance Evaluation of In-Database Statistical Methods Using SAP HANA Predictive Analysis Library

Abstract:

The increasing demand for real-time analytics has driven the development of in-database analytical frameworks that integrate statistical processing directly into database systems. The SAP HANA Predictive Analysis Library (PAL) is a prominent implementation of this paradigm. However, empirical evidence comparing its performance with that of traditional external analytical tools remains limited.
This paper presents an empirical evaluation of selected statistical methods implemented in SAP HANA PAL, including the Chi-Square Goodness-of-Fit test, the Chi-Square Test of Independence, and the Cumulative Distribution Function (CDF). The study compares in-database analytics with Python-based implementations (SciPy/Pandas) across datasets of varying sizes.
The results show that SAP HANA PAL substantially outperforms the external Python-based environment in execution time and scalability, particularly on large datasets, while yielding statistically consistent results. These findings support the effectiveness of in-database analytics for enterprise-scale data processing.
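
For orientation, the external baseline named above can be sketched in a few lines of Python. This is a minimal illustration only, not the benchmark code used in the study: the counts, the contingency table, the choice of the standard normal distribution for the CDF, and the helper name timed are all assumptions introduced here. It shows the three SciPy calls that correspond to the evaluated methods and one simple way to record their wall-clock execution time.

```python
import time

import numpy as np
from scipy import stats


def timed(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Illustrative data only (not taken from the study).
observed = np.array([18, 22, 20, 25, 15])   # observed category counts
expected = np.full(5, 20.0)                 # uniform expected counts, same total
table = np.array([[30, 10],
                  [20, 40]])                # 2x2 contingency table

# Chi-Square Goodness-of-Fit test.
gof, gof_seconds = timed(stats.chisquare, observed, f_exp=expected)

# Chi-Square Test of Independence; correction=False disables the Yates
# continuity correction that SciPy applies to 2x2 tables by default.
(chi2, p_value, dof, expected_table), ind_seconds = timed(
    stats.chi2_contingency, table, correction=False
)

# Cumulative Distribution Function; the standard normal is an assumption.
points = np.array([0.0, 1.0, 1.96])
cdf_values, cdf_seconds = timed(stats.norm.cdf, points)

print(f"goodness-of-fit: statistic={gof.statistic:.4f}, p={gof.pvalue:.4f}, time={gof_seconds:.6f}s")
print(f"independence:    statistic={chi2:.4f}, p={p_value:.4f}, dof={dof}, time={ind_seconds:.6f}s")
print(f"normal CDF:      values={np.round(cdf_values, 4)}, time={cdf_seconds:.6f}s")
```

A fair comparison with the in-database approach would also need to account for the time spent transferring data out of the database, which this sketch does not measure.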