Statistical Analysis in SAP HANA Predictive Analysis Library (PAL)

Abstract:

The increasing demand for real-time analytics and scalable statistical processing has driven the development of in-database analytical frameworks that integrate statistical methods directly into database platforms. The SAP HANA Predictive Analysis Library (PAL) exemplifies this approach by providing a collection of statistical and machine learning algorithms that execute within an in-memory database environment. This article presents a concise overview of selected statistical techniques available in PAL: the Chi-Square Goodness-of-Fit test, the Chi-Square Test of Independence, and the Cumulative Distribution Function (CDF).
The article outlines the theoretical foundations of these methods and demonstrates their practical implementation using in-database analytics. The Chi-Square Goodness-of-Fit test is discussed as a tool for evaluating how closely an observed distribution conforms to an expected one, while the Chi-Square Test of Independence is presented as a method for analyzing relationships between categorical variables. Additionally, the role of the CDF in probabilistic analysis and distribution characterization is examined. Practical examples in SQL and Python (hana-ml) illustrate how statistical analysis can be performed directly within SAP HANA, reducing data movement and improving performance. The article highlights the advantages of embedded analytics for scalable enterprise data analysis.
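
For orientation, the three techniques named above can be sketched in a few lines of plain Python. This is an illustrative sketch only: it uses the standard library, not the PAL procedures or the hana-ml API, and the function names and sample counts are hypothetical. The CDF shown is that of the normal distribution, chosen here as one example.

```python
import math

# Illustrative only: plain Python, not the PAL or hana-ml API.
# All function names and sample data below are hypothetical.

def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_independence(table):
    """Pearson statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, P(X <= x), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical counts: three categories expected to be equally likely.
gof_stat = chi_square_gof([18, 22, 20], [20, 20, 20])

# Hypothetical 2x2 contingency table of two categorical variables.
ind_stat, ind_dof = chi_square_independence([[10, 20], [20, 10]])

print(gof_stat, ind_stat, ind_dof, normal_cdf(0.0))
```

In both tests the statistic is compared against a chi-square distribution with the appropriate degrees of freedom to obtain a p-value; in the in-database setting described here, that computation would run inside SAP HANA rather than on the client.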

47th IBIMA Computer Science Conference: 29-30 June 2026, Madrid, Spain
