HANA smart data integration (SDI) is a technology built natively into your HANA database that handles all styles of data integration. It can do data federation (aka smart data access) and real-time data replication, and it can also apply complex transformations to your data. SDI has been part of every HANA database since HANA SPS09, and is available on the HANA Cloud Platform (HCP) as well. This blog series will focus on using SDI for HCP. In the "Useful links" section below you will find links to other generic SDI resources.
Use cases
- Data Federation: with data federation, you can expose on-premise sources like databases or even Hadoop to your HANA database in the cloud. The data is not physically moved to the cloud, but remains in its original source. Via virtual tables, the data becomes available in HANA queries. Because every query has to reach back to the source, latency can be an issue, so this federation scenario is only useful for infrequent queries and small amounts of data. But it can be a first step before moving to replication.
- Data Replication: physically move data from the original source into the HANA database on HCP, either through a scheduled process or in real time. Real-time replication is possible for selected sources like databases (Oracle, DB2, SQL Server, ...) and Twitter. A change in these sources is replicated in (near) real time to the HANA database in the cloud.
- Data Transformation: apply complex transformations to your data before storing it in HANA, or to data that is already available in HANA and needs further transformation. These transformations include SQL-like operations like join, filter and aggregate, as well as more complex operations like pivoting your data (transforming rows to columns and vice versa) or building in conditional logic (CASE transform). Another common data transformation scenario is history preserving, which builds up a history of changes (e.g. a delete in your source would be transformed into an update that sets a flag to inactive but keeps the record for history).
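To make the history preserving idea from the last bullet a bit more concrete, here is a minimal Python sketch. It is only an illustration of the concept, not how SDI implements it: the `HistoryRow` layout, the `active` flag name and the `apply_change` helper are all assumptions made for this example, as is the choice to keep the old version on every update.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryRow:
    key: int
    value: str
    active: bool = True  # hypothetical flag column, name assumed for this sketch


def apply_change(history: List[HistoryRow], op: str, key: int, value: str = "") -> None:
    """Apply one source change to the target without ever deleting a row."""
    if op not in ("insert", "update", "delete"):
        raise ValueError(f"unknown operation: {op}")
    # Any change retires the currently active version of the record.
    for row in history:
        if row.key == key and row.active:
            row.active = False
    # Inserts and updates add a new active version. A delete adds nothing,
    # so the last version stays in the target, flagged as inactive.
    if op in ("insert", "update"):
        history.append(HistoryRow(key, value))


history: List[HistoryRow] = []
apply_change(history, "insert", 1, "Alice")
apply_change(history, "update", 1, "Alicia")
apply_change(history, "delete", 1)
apply_change(history, "insert", 2, "Bob")

for row in history:
    print(row)
```

After these four changes the target still holds both versions of record 1, flagged as inactive, next to the active record 2. Nothing was physically deleted.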
Architecture
SDI has two main components: the data provisioning server (dpserver) and the data provisioning agent.
The data provisioning server is a native server in your HANA database. All you need to do is activate the dpserver in your HANA configuration.
The data provisioning agent is a small component you need to install on-premise. The adapters are deployed on this agent, and they take care of the communication with the source systems. The agent gets the data from the source, then compresses it and sends it over HTTPS (encrypted) to HCP. In order to establish the communication between agent and server, a third small component is needed: a proxy server packaged as a delivery unit to be imported into your HANA database. The same delivery unit will also provide you with monitoring capabilities.
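The "compress, then send" step can be pictured with a few lines of Python. This is a sketch under stated assumptions: the actual wire format used by the agent is not described here, so JSON serialization, gzip compression and the helper names `pack_batch` and `unpack_batch` are stand-ins chosen for illustration, and the HTTPS transfer itself is left out.

```python
import gzip
import json


def pack_batch(rows):
    """Serialize a batch of rows and compress it before transfer (agent side)."""
    raw = json.dumps(rows).encode("utf-8")
    return gzip.compress(raw)


def unpack_batch(payload):
    """Reverse of pack_batch (receiving side)."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# A batch of repetitive source rows, as is typical for table data.
rows = [{"id": i, "status": "OPEN"} for i in range(1000)]

raw_size = len(json.dumps(rows).encode("utf-8"))
payload = pack_batch(rows)

print("uncompressed bytes:", raw_size)
print("compressed bytes:  ", len(payload))
```

Table data tends to be repetitive, so compressing each batch before it crosses the network reduces the amount of data sent to the cloud.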
Note that bi-directional data transfer is possible, so you can not only load data into HCP, but also write back to external targets. In both cases it is the agent that initiates the communication, so from a network point of view, the agent will always do outbound HTTPS calls. The response of such a call can be the data to be written back to the on-premise system. This means the agent can communicate with HCP without VPN tunnels or a reverse proxy setup.
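The following Python sketch simulates this agent-initiated pattern in memory, without any real network calls. The class and method names (`CloudServer`, `Agent`, `handle_request`, `poll`) are hypothetical and only serve to show the direction of the calls. They are not part of SDI.

```python
from collections import deque


class CloudServer:
    """Stands in for the HANA side. It never opens a connection itself."""

    def __init__(self):
        self.loaded = []                  # rows received from the agent
        self.pending_writeback = deque()  # rows waiting to go on-premise

    def handle_request(self, rows_from_agent):
        # Runs only as the result of an agent-initiated request.
        self.loaded.extend(rows_from_agent)
        response = list(self.pending_writeback)
        self.pending_writeback.clear()
        return response


class Agent:
    """Stands in for the on-premise agent. Every call is outbound."""

    def __init__(self, server):
        self.server = server
        self.local_target = []  # on-premise system that receives write-backs

    def poll(self, new_rows):
        # One outbound call: upload source rows, and receive any
        # write-back rows in the response to that same call.
        writeback = self.server.handle_request(new_rows)
        self.local_target.extend(writeback)
        return len(writeback)


server = CloudServer()
agent = Agent(server)

# The cloud side queues data for the on-premise system but cannot push it.
server.pending_writeback.append({"id": 7, "result": "approved"})

# The agent's next outbound call delivers source rows and picks up the write-back.
received = agent.poll([{"id": 1, "name": "order"}])
print("rows written back on-premise:", received)
```

Because the cloud side only ever answers requests, no inbound port has to be opened in the on-premise firewall.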
The picture below visualizes this architecture.
Supported sources
The SDI agent includes many adapters out-of-the-box. Some of these are real-time enabled, others can only do batch extraction. In addition to the out-of-the-box adapters, there's also a Java-based adapter SDK available, which can be used to create custom adapters.
Below is a list of available adapters, based on HANA SPS11 (latest HANA release available on HCP). This list is not exhaustive and will keep growing with every release, both with built-in adapters and with adapters delivered through the partner ecosystem.
Real-time enabled adapters:
- SAP ECC based on Oracle, DB2, SQL Server or ASE
- SAP (Sybase) ASE
- SAP HANA
- IBM DB2
- Microsoft SQL Server
- Oracle
- Teradata
- Twitter
Batch adapters:
- ABAP adapter
- Microsoft Excel
- File (delimited and fixed width)
- Hadoop Hive
- OData
User Interface
As stated earlier, SDI is fully integrated in HANA, so its user interfaces are the standard HANA design tools. The primary one is the HANA Web IDE, where you can create the SDI objects like remote sources, virtual tables, flowgraphs, replication tasks etc. HANA Studio (with the HCP plugin) can also be used to create SDI objects.
Licensing
The licensing model is always subject to change, so please check with your SAP account team for the latest status. However, in general we can say that SDI licenses are included in the "HCP, integration service" licenses. You will always need HANA on HCP as a prerequisite. The license will enable you to download the data provisioning agent and the delivery unit.
Trial access for SDI is not available today (June 2016), but is being worked on. I'll provide an update once this is available.
Useful links
- Official help portal for Smart Data Integration: http://help.sap.com/hana_options_eim
- Tutorial videos on HANA Academy: SAP HANA Academy - Smart Data Integration/Quality: An Overview Demo [SPS09] - YouTube
- Blog series on SDI: Hana Smart Data Integration - Overview
Where to go next:
- Step-by-step: Setup SDI for your HCP account (non-trial) - coming soon
- Step-by-step: Setup SDI for your HCP trial account - coming soon