SAP HANA is an in-memory data platform that can be deployed on premises or on demand. At its core, it is an in-memory relational database management system.
SAP HANA can make full use of the capabilities of current hardware to increase application performance, reduce cost of ownership, and enable new scenarios and applications that were not previously possible. With SAP HANA, you can build applications that integrate the business control logic and the database layer with unprecedented performance. For a developer, one of the key questions is how to minimize data movement: the more processing you can do directly on the data in memory, next to the CPUs, the better the application will perform. This is the key to development on the SAP HANA data platform.
SAP HANA In-Memory Database
SAP HANA runs on servers with multi-core CPUs, fast communication between processor cores, and terabytes of main memory. With SAP HANA, all data is available in main memory, which avoids the performance penalty of disk I/O. Hard disks or solid-state drives are still required for persistence, so that data survives a power failure or some other catastrophe. This does not significantly slow down performance, however, because the required writes to disk can largely take place asynchronously as a background task.
Columnar Data Storage
A database table is conceptually a two-dimensional data structure organized in rows and columns. Computer memory, in contrast, is organized as a linear structure. A table can be represented in row-order or column-order. A row-oriented organization stores a table as a sequence of records. Conversely, in column storage the entries of a column are stored in contiguous memory locations. SAP HANA supports both, but is particularly optimized for column-order storage.
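To make the two layouts concrete, the following Python sketch linearizes the same small table in row order and in column order, using a plain list as a stand-in for linear memory. The table contents and function names are invented for illustration; SAP HANA's actual in-memory storage format is far more elaborate.

```python
# Hypothetical three-row table: (order_id, region, amount)
rows = [
    (1, "EMEA", 120),
    (2, "APJ", 80),
    (3, "EMEA", 50),
]

def row_order(table):
    """Lay the table out record by record: all fields of row 0, then row 1, ..."""
    return [value for record in table for value in record]

def column_order(table):
    """Lay the table out column by column: all values of column 0, then column 1, ..."""
    return [value for column in zip(*table) for value in column]

print(row_order(rows))     # [1, 'EMEA', 120, 2, 'APJ', 80, 3, 'EMEA', 50]
print(column_order(rows))  # [1, 2, 3, 'EMEA', 'APJ', 'EMEA', 120, 80, 50]
```

In the column-order layout, all `amount` values sit next to each other at the end of the list, which is what makes the single-column operations described below cheap.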
Columnar data storage allows highly efficient compression. If a column is sorted, there are often repeated adjacent values. SAP HANA employs compression methods such as run-length encoding, cluster encoding, and dictionary encoding. With dictionary encoding, each distinct value of a column is stored once in a dictionary, and the column itself is stored as a sequence of bit-coded integers that refer to the dictionary entries. This means that checks for equality, for example during scans or join operations, can be executed on the integers, which is much faster than comparing string values.
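As a rough illustration of two of these techniques, the sketch below dictionary-encodes a sorted string column, computes how many bits each integer code needs, run-length encodes the codes, and then answers an equality predicate by comparing integers only. It is a minimal sketch under simplifying assumptions (Python lists instead of packed bit vectors; cluster encoding is not shown) and does not reproduce SAP HANA's internal formats.

```python
import math

def dictionary_encode(column):
    """Map each distinct value to an integer code; codes follow sorted value order."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    return dictionary, codes

def bits_per_code(dictionary):
    """Minimum number of bits needed to represent every code (at least 1)."""
    return max(1, math.ceil(math.log2(len(dictionary))))

def run_length_encode(codes):
    """Collapse runs of equal adjacent codes into [code, run_length] pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs

region = ["APJ", "APJ", "EMEA", "EMEA", "EMEA", "NA", "NA"]  # already sorted
dictionary, codes = dictionary_encode(region)
print(dictionary)                 # ['APJ', 'EMEA', 'NA']
print(codes)                      # [0, 0, 1, 1, 1, 2, 2]
print(bits_per_code(dictionary))  # 2
print(run_length_encode(codes))   # [[0, 2], [1, 3], [2, 2]]

# Equality predicate region = 'EMEA': one dictionary lookup, then integer comparisons
target = dictionary.index("EMEA")
matches = [row for row, code in enumerate(codes) if code == target]
print(matches)  # [2, 3, 4]
```

With three distinct values, each row needs only 2 bits instead of a full string, and the sorted column collapses into three runs.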
Columnar storage, in many cases, eliminates the need for additional index structures. Storing data in columns is functionally similar to having a built-in index for each column: the column scanning speed of the in-memory column store and the compression mechanisms, especially dictionary compression, allow read operations with very high performance. Eliminating additional indexes reduces complexity and eliminates the effort of defining and maintaining metadata.
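One way to see why a dictionary-encoded column behaves somewhat like an index is sketched below: because the dictionary is sorted, a value (or a value range) can be located by binary search, and the matching rows are then found by scanning compact integer codes. This is an illustrative assumption about how such a lookup could work, not a description of SAP HANA's internal search algorithms; the column data is hypothetical.

```python
import bisect

# Hypothetical dictionary-encoded column: sorted dictionary plus one code per row
dictionary = ["APJ", "EMEA", "LATAM", "NA"]
codes = [1, 3, 0, 1, 2, 1, 3]

def lookup_code(dictionary, value):
    """Binary-search the sorted dictionary; return the value's code or None."""
    pos = bisect.bisect_left(dictionary, value)
    if pos < len(dictionary) and dictionary[pos] == value:
        return pos
    return None

def rows_equal(dictionary, codes, value):
    """Row positions whose value equals `value`, found by scanning integer codes."""
    code = lookup_code(dictionary, value)
    if code is None:
        return []  # value absent from the dictionary: no row can match
    return [row for row, c in enumerate(codes) if c == code]

def rows_in_range(dictionary, codes, low, high):
    """Rows with low <= value < high; the sorted dictionary maps this to a code range."""
    lo_code = bisect.bisect_left(dictionary, low)
    hi_code = bisect.bisect_left(dictionary, high)
    return [row for row, c in enumerate(codes) if lo_code <= c < hi_code]

print(rows_equal(dictionary, codes, "EMEA"))       # [0, 3, 5]
print(rows_equal(dictionary, codes, "MEA"))        # []
print(rows_in_range(dictionary, codes, "B", "M"))  # [0, 3, 4, 5]
```

Note that a value missing from the dictionary is rejected without touching the row data at all, which is one of the index-like properties the paragraph above alludes to.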
SAP HANA was designed to perform its basic calculations, such as analytic joins, scans, and aggregations, in parallel. It often uses hundreds of cores at the same time, fully utilizing the available computing resources of distributed systems.
With columnar data, operations on single columns, such as searching or aggregations, can be implemented as loops over an array stored in contiguous memory locations. Such an operation has high spatial locality and can efficiently be executed in the CPU cache. With row-oriented storage, the same operation would be much slower because data of the same column is distributed across memory and the CPU is slowed down by cache misses.
Compressed data can be loaded into the CPU cache faster. Because the limiting factor is the data transport between memory and CPU cache, the performance gained by moving less data exceeds the additional computing time needed for decompression.
Column-based storage also allows execution of operations in parallel using multiple processor cores. In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. In addition, operations on one column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores.
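The two kinds of parallelism described above can be sketched with Python's `concurrent.futures`: each column is aggregated as an independent task, and a single column is split into contiguous sections whose partial sums are combined. The column names and data are hypothetical. Because of CPython's global interpreter lock, this sketch shows the structure of the work decomposition rather than a real multi-core speedup; a database engine would run the equivalent work on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column store: each column is stored as its own list
columns = {
    "quantity": [3, 1, 4, 1, 5, 9, 2, 6],
    "amount":   [30, 10, 40, 10, 50, 90, 20, 60],
}

def partitions(column, parts):
    """Split one column into at most `parts` contiguous sections of near-equal size."""
    size = -(-len(column) // parts)  # ceiling division
    return [column[i:i + size] for i in range(0, len(column), size)]

def parallel_sum(column, pool, parts=4):
    """Sum each section as a separate task, then combine the partial sums."""
    partial_sums = pool.map(sum, partitions(column, parts))
    return sum(partial_sums)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Inter-column parallelism: each column is aggregated as an independent task
    futures = {name: pool.submit(sum, col) for name, col in columns.items()}
    totals = {name: f.result() for name, f in futures.items()}
    # Intra-column parallelism: one column split into sections summed separately
    total_amount = parallel_sum(columns["amount"], pool)

print(totals)        # {'quantity': 31, 'amount': 310}
print(total_amount)  # 310
```

Summation works well here because partial results can be combined in any order; aggregations without that property need more care when split across sections.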
Traditional business applications often use materialized aggregates to increase performance. These aggregates are computed and stored either after each write operation on the aggregated data, or at scheduled times. Read operations read the materialized aggregates instead of computing them each time they are required.
With a scanning speed of several gigabytes per millisecond, SAP HANA makes it possible to calculate aggregates on large amounts of data on-the-fly with high performance. This eliminates the need for materialized aggregates in many cases, simplifying data models and, correspondingly, the application logic. Furthermore, with on-the-fly aggregation, the aggregate values are always up-to-date, unlike materialized aggregates, which may be updated only at scheduled times.
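The freshness difference can be shown with a toy model: one store keeps a materialized per-region total that a hypothetical scheduled job rebuilds via `refresh()`, while the other stores only base rows and aggregates at read time. All class and method names are invented for illustration.

```python
from collections import defaultdict

def aggregate(rows):
    """Sum `amount` per `region` over the given base rows."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

class ScheduledAggregateSales:
    """Base rows plus a materialized per-region total, rebuilt only by refresh()."""
    def __init__(self):
        self.rows = []
        self.total_by_region = {}  # materialized aggregate; may be stale

    def insert(self, region, amount):
        self.rows.append((region, amount))  # the aggregate is NOT touched here

    def refresh(self):
        """Stands in for a scheduled job that recomputes the stored aggregate."""
        self.total_by_region = aggregate(self.rows)

class OnTheFlySales:
    """Stores only base rows; totals are computed at read time."""
    def __init__(self):
        self.rows = []

    def insert(self, region, amount):
        self.rows.append((region, amount))

    def total_by_region(self):
        return aggregate(self.rows)

sched, live = ScheduledAggregateSales(), OnTheFlySales()
for region, amount in [("EMEA", 120), ("APJ", 80), ("EMEA", 50)]:
    sched.insert(region, amount)
    live.insert(region, amount)
sched.refresh()  # the scheduled refresh runs here

# A write arrives after the last scheduled refresh
sched.insert("APJ", 40)
live.insert("APJ", 40)

print(sched.total_by_region)   # {'EMEA': 170, 'APJ': 80}  (stale)
print(live.total_by_region())  # {'EMEA': 170, 'APJ': 120}
```

The on-the-fly variant trades extra work at read time for a simpler model with no refresh logic; that trade only pays off when scans are as fast as the paragraph above describes.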
SAP HANA Database Architecture
A running SAP HANA system consists of multiple communicating processes (services). The following figure shows the main SAP HANA database services in a classical application context.
Figure: SAP HANA Database High-Level Architecture
Such traditional database applications use well-defined interfaces (for example, ODBC and JDBC) to communicate with the database management system, which functions as a data source, usually over a network connection. Often running in the context of an application server, these traditional applications use Structured Query Language (SQL) to manage and query the data stored in the database.
The main SAP HANA database management component is known as the index server, which contains the actual data stores and the engines for processing the data. The index server processes incoming SQL or MDX (Multidimensional Expressions) statements in the context of authenticated sessions and transactions.
The SAP HANA database has its own scripting language named SQLScript. SQLScript embeds data-intensive application logic into the database. Classical applications tend to offload only very limited functionality into the database using SQL. This results in extensive copying of data from and to the database, and in programs that iterate slowly over large data sets in loops and are hard to optimize and parallelize. SQLScript is based on side-effect-free functions that operate on tables using SQL queries for set processing, and can therefore be parallelized across multiple processors.
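The cost of the classical pattern can be illustrated with Python's built-in `sqlite3` module, used here purely as a stand-in database (it is not SAP HANA, and the example is plain SQL, not SQLScript). The sketch contrasts copying every row to the application and aggregating in a loop with pushing a set-based `GROUP BY` into the database so that only the result travels back. Table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the data layer (NOT SAP HANA)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120), ("APJ", 80), ("EMEA", 50), ("NA", 70)],
)

# Classical style: copy every row out of the database, then loop in the application
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in rows:
    client_totals[region] = client_totals.get(region, 0) + amount

# Set-based style: push the aggregation into the database; only the result is returned
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(len(rows), "rows transferred vs", len(db_totals), "result rows")
assert client_totals == db_totals  # same answer, far less data moved at scale
conn.close()
```

With four rows the difference is trivial, but the number of rows transferred in the first style grows with the table, while in the second it grows only with the number of distinct groups.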
In addition to SQLScript, SAP HANA supports a framework for the installation of specialized and optimized functional libraries, which are tightly integrated with different data engines of the index server. Two of these functional libraries are the SAP HANA Business Function Library (BFL) and the SAP HANA Predictive Analytics Library (PAL). BFL and PAL functions can be called directly from within SQLScript.
SAP HANA also supports the development of programs written in the R language.
SQL and SQLScript are implemented using a common infrastructure of built-in data engine functions that have access to various metadata definitions, such as definitions of relational tables, columns, views, and indexes, and definitions of SQLScript procedures. This metadata is stored in one common catalog.
The database persistence layer is responsible for durability and atomicity of transactions. It ensures that the database can be restored to the most recent committed state after a restart and that transactions are either completely executed or completely undone.
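The "completely executed or completely undone" guarantee is visible to any client of a transactional database. The sketch below again uses `sqlite3` only as a stand-in: a hypothetical transfer updates two rows in one transaction, and when the second update violates a constraint, the first update is rolled back as well. It shows atomicity from the client's point of view only and says nothing about how SAP HANA's persistence layer implements it.

```python
import sqlite3

# sqlite3 stands in for a transactional database here (NOT SAP HANA)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Hypothetical transfer: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; the whole transaction was undone

print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 1, 2, 500))  # False: the credit to account 2 is rolled back too
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(1, 70), (2, 30)]
```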
The index server uses the preprocessor server for analyzing text data and extracting the information on which the text search capabilities are based. The name server owns the information about the topology of the SAP HANA system. In a distributed system, the name server knows where the components are running and which data is located on which server.
SAP HANA Extended Application Services
Traditional database applications use interfaces such as ODBC and JDBC with SQL to manage and query their data. The following figure illustrates such applications using the common Model-View-Controller (MVC) development architecture.
SAP HANA greatly extends the traditional database server role. SAP HANA functions as a comprehensive platform for the development and execution of native data-intensive applications that run efficiently in SAP HANA, taking advantage of its in-memory architecture and parallel execution capabilities.
By restructuring your application in this way, not only do you gain from the increased performance due to the integration with the data source, but you can also effectively eliminate the overhead of the middle tier between the user interface (the view) and the data-intensive control logic, as shown in the following figure.
In support of this data-integrated application paradigm, SAP HANA Extended Application Services provides a comprehensive set of embedded services that give end-to-end support for Web-based applications. These include a lightweight web server, configurable OData support, server-side JavaScript execution and, of course, full access to SQL and SQLScript.
These SAP HANA Extended Application Services are provided by the SAP HANA XS server, which is fully integrated into SAP HANA and allows clients to access the SAP HANA system via HTTP. Controller applications can run completely natively on SAP HANA, without the need for an additional external application server. The following figure shows the SAP HANA XS server as part of the SAP HANA system.
The application services can be used to expose the database data model, with its tables, views and database procedures, to clients. This can be done in a declarative way using OData services or by writing native application-specific code that runs in the SAP HANA context. You can also use SAP HANA XS to build dynamic HTML5 UI applications.
In addition to exposing the data model, SAP HANA XS also hosts system services that are part of the SAP HANA system. The search service is an example of such a system application. No data is stored in the SAP HANA XS server itself. To read tables or views, to modify data, or to execute SQLScript database procedures and calculations, it connects to the index server (or servers, in the case of a distributed system).
SAP HANA-Based Applications
The ability to run application-specific code in SAP HANA raises the question: what kind of logic should run where? Clearly, data-intensive and model-based calculations must be close to the data and, therefore, need to be executed in the index server, for instance, using SQLScript or the code of the specialized functional libraries.
The presentation (view) logic runs on the client – for example, as an HTML5 application in a Web browser or on a mobile device.
Native application-specific code, supported by SAP HANA Extended Application Services, can be used to provide a thin layer between the clients on one side, and the views, tables and procedures in the index server on the other side. Typical applications, for example, contain control flow logic based on request parameters, invoke views and stored procedures in the index server, and transform the results into the response format expected by the client.
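The shape of such a thin controller layer can be sketched in a few lines. The example below is written in Python rather than SAP HANA XS server-side JavaScript, and `handle_request` is a hypothetical function, not an XS API; `sqlite3` again stands in for the index server. It performs the three steps listed above: control flow based on request parameters, a query against the data layer, and transformation of the result into JSON for the client.

```python
import json
import sqlite3

def handle_request(conn, params):
    """Hypothetical controller: validate parameters, query the data layer, format JSON."""
    # Control-flow logic based on request parameters
    region = params.get("region")
    if not region:
        return 400, json.dumps({"error": "missing parameter: region"})
    try:
        limit = int(params.get("limit", "10"))
    except ValueError:
        return 400, json.dumps({"error": "limit must be an integer"})
    # Delegate the data-intensive work to the database
    rows = conn.execute(
        "SELECT id, amount FROM sales WHERE region = ? ORDER BY amount DESC LIMIT ?",
        (region, limit),
    ).fetchall()
    # Transform the result set into the response format the client expects
    body = {"region": region, "orders": [{"id": i, "amount": a} for i, a in rows]}
    return 200, json.dumps(body)

# sqlite3 stands in for the index server here (NOT SAP HANA)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "EMEA", 120), (2, "APJ", 80), (3, "EMEA", 50)],
)

print(handle_request(conn, {"region": "EMEA", "limit": "1"}))
# (200, '{"region": "EMEA", "orders": [{"id": 1, "amount": 120}]}')
print(handle_request(conn, {}))
# (400, '{"error": "missing parameter: region"}')
```

The layer deliberately contains no loops over business data: filtering, sorting, and limiting all happen in the query, in keeping with the guidance that data-intensive work belongs next to the data.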
The communication between the SAP HANA XS server and index server is optimized for high performance. However, performance is not the only reason why the SAP HANA XS server was integrated into SAP HANA. It also leads to simplified administration and a better development experience.
The SAP HANA XS server completes SAP HANA to make it a comprehensive development platform. With the SAP HANA XS server, developers can write SAP HANA-based applications that cover all server-side aspects, such as tables and database views, database procedures, server-side control logic, integration with external systems, and provisioning of HTTP-based services. The integration of the SAP HANA XS server into the SAP HANA system also helps to reduce cost of ownership, as all servers are installed, operated and updated as one system.