
Classic MultiProvider vs. CompositeProvider

This article is a continuation of our previous article on the differences between the classic objects in SAP BW and the new objects introduced with SAP BW on HANA. The last article covered the classic DSO (DataStore Object) and the new ADSO (advanced DSO), which replaces four of the classic objects: InfoCube, DSO, HybridProvider and PSA.

In this article we stay with the same theme but turn to a different topic: the difference between the MultiProvider and the CompositeProvider.

First, I will cover the classic MultiProvider and its functionality. Then I will turn to the CompositeProvider and the differences between it and the MultiProvider.

The Classic MultiProvider

What is a MultiProvider?

A Multi Provider is an InfoProvider that combines data from multiple InfoProviders and makes it available for reporting. It does not contain any data: its data comes entirely from the InfoProviders on which it is based.

The MultiProvider is mostly used to create queries that span multiple InfoProviders. SAP BI only supports queries based on a single InfoProvider, so instead of loading data from one InfoProvider to another and so on, the best approach is to create a MultiProvider on top of the providers the query needs.

Operations used in a MultiProvider

There are two main operations that come into play when using a MultiProvider. They are listed below, accompanied by the necessary explanations:

1. Union

The union is used to combine data from the InfoProviders in a MultiProvider. The system constructs the union set of the data sets involved: all values of these data sets are combined. The MultiProvider can be based on combinations of two types of InfoProviders:

  • InfoProviders that contain data: InfoObject, InfoCube, DataStore Object
  • InfoProviders that do not contain data: InfoSet, Aggregation Levels, VirtualProvider

MultiProvider: the Union operation

2. Scenario

For example, you can combine data from two InfoCubes: one InfoCube holds actual data and the second holds plan data.

MultiProvider: the Scenario operation

Using the MultiProvider, we can compare the data from these two InfoProviders. When you create a MultiProvider you can include multiple InfoProviders, and any type of InfoProvider is accepted; you can then analyze and report on the data through the MultiProvider. When a query is created on a MultiProvider, it is important to know that the query is split internally into subqueries, one for each InfoProvider included in the MultiProvider. These subqueries are usually processed in parallel.
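
To make the union behaviour more concrete, here is a rough, purely illustrative SQL sketch of what a MultiProvider over an actual and a plan cube effectively delivers; the table and field names are invented for this example, and in reality the access is generated by the BW analytic manager rather than written by hand.

```sql
-- Illustrative only: union of an actual and a plan cube.
-- Each branch corresponds to one subquery that can be processed in parallel;
-- the constant column shows which InfoProvider a row came from.
SELECT 'SALES_ACTUAL' AS INFOPROV, MATERIAL, FISCPER, SALES_AMOUNT
  FROM SALES_ACTUAL                -- hypothetical actual-data InfoCube
UNION ALL
SELECT 'SALES_PLAN'   AS INFOPROV, MATERIAL, FISCPER, SALES_AMOUNT
  FROM SALES_PLAN;                 -- hypothetical plan-data InfoCube
```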

Now that we are through with the MultiProvider topic, let’s take a closer look at the other concept of this article: the CompositeProvider.

CompositeProvider

What is a CompositeProvider?

The CompositeProvider is an InfoProvider that combines data from several InfoProviders and makes this data available for reporting and analysis using the SAP HANA database. This new InfoProvider forms the Virtual Data Mart layer in a BW on HANA system.

The CompositeProvider reduces the number of InfoProvider types and harmonizes the modeling of mixed BW on HANA scenarios:

CompositeProvider

The CompositeProvider is used for:

  • Interface for reporting objects
  • Combining providers with analytic indexes
  • As an alternative to InfoSets for joining data
  • As an alternative to MultiProviders for building unions of data

Operations used in a CompositeProvider

Just like the MultiProvider, the CompositeProvider offers two main operations. The operations and their features are listed below:

1. Union of BW InfoProviders and HANA models

If we use the union operation, there is no runtime difference between the CompositeProvider and the MultiProvider; the analytic manager inside BW is well optimized for this kind of operation. When using the union in a CompositeProvider, a few restrictions have to be considered:

  • Open ODS Views that you want to use in CompositeProviders can only contain fields with InfoObject-compatible data type
  • Semantically partitioned objects can only be used for the union. They cannot be used in the join, as all semantically partitioned objects contain unions.

2. Join of BW InfoProviders and HANA models

As mentioned above, a CompositeProvider can be based on InfoCubes, DataStore objects, InfoObjects, or semantically partitioned objects.

In order for you to better understand how join operations work, let’s consider the following Scenario:

When we create a CompositeProvider based on two or more InfoProviders and choose to join two of them, we must select the type of join to use. There are two join types available in a CompositeProvider, the inner join and the left outer join (see the SQL sketch after the list below):

  1. Inner join: returns only the records that have a match in both InfoProviders.
  2. Left outer join: returns all records from the left InfoProvider together with the matching records from the right InfoProvider.
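
As a rough illustration of the difference between the two join types, here is a minimal SQL sketch; the provider and field names (SALES_DSO, MATERIAL_MD, MATERIAL_GROUP) are invented, and in a real CompositeProvider the join is modeled graphically rather than written as SQL.

```sql
-- Inner join: only sales records whose material also exists in the master data
SELECT s.ORDER_ID, s.MATERIAL, m.MATERIAL_GROUP
  FROM SALES_DSO s
  INNER JOIN MATERIAL_MD m
    ON m.MATERIAL = s.MATERIAL;

-- Left outer join: every sales record, with attributes where available
-- (MATERIAL_GROUP stays NULL for materials without master data)
SELECT s.ORDER_ID, s.MATERIAL, m.MATERIAL_GROUP
  FROM SALES_DSO s
  LEFT OUTER JOIN MATERIAL_MD m
    ON m.MATERIAL = s.MATERIAL;
```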

After choosing the join type, we must define a join condition in the Scenario tab of the CompositeProvider. To create a join condition, right-click on a field and select “Create Join Condition Field…”.

Join of BW InfoProviders and Hana model

Now that we know which operations a CompositeProvider supports, let’s see how to create one.

How to create a CompositeProvider

The editor for creating a CompositeProvider is based on Eclipse and is part of the BW Modeling Tools. The following steps will guide you through the process.

Step 1

Open Eclipse and go to an InfoArea. Then right-click and select New → CompositeProvider.

Step 2

Enter a name and an appropriate description for the CompositeProvider.

How to Create CompositeProvider

Step 3

 Select the InfoProviders that you want to join/unite.

CompositeProvider select InfoProviders

Step 4

The CompositeProvider has three tabs (General, Scenario, Output).

  • General tab: here you can maintain the description of the CompositeProvider.
  • Scenario tab: here you can add InfoProviders and HANA views to the CompositeProvider.

3 Tabs CompositeProviders

Also in this tab, you can define the join on the common key fields by right-clicking on the field that you want to join:

CompositeProvider common joins

In the Target section are the fields that you want to see in the output.

  • Output tab: here you find the fields that were selected for the output.

CompositeProvider output

Step 5) Activate the CompositeProvider

Fields in a CompositeProvider

The fields of a CompositeProvider can be associated with an InfoObject or with an Open ODS view. This gives you access not only to the navigation attributes available for selection in the output structure of the CompositeProvider, but also to master data at report runtime. In this way, we can save time by using fields instead of modeling everything with InfoObjects.

When fields are assigned to the CompositeProvider, the associations are automatically set.

Modeling scenario using CompositeProvider:

Now that we have created a CompositeProvider, let us work through a scenario. The scenario is the following:

Track the sales information in a company based on customer and the product sold.

  1. We begin by creating a CompositeProvider to combine data from three DSOs (product master data DSO, customer master data DSO and sales information DSO).
  2. In this case we don’t need to create an additional InfoCube to have all the data available for reporting.
  3. The data from the sales information DSO is brought into the CompositeProvider using the union operation.
  4. We add the customer and product master data to the CompositeProvider using the join operation.
  5. In this way, the CompositeProvider contains all the data from the sales DSO together with the corresponding attributes (product and customer), as sketched below.
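
Conceptually, the CompositeProvider in this scenario behaves like the following SQL; the table and field names are invented for illustration, and if there were several sales DSOs they would first be combined with a UNION ALL before the joins are applied.

```sql
-- Illustrative shape of the scenario: sales facts enriched with
-- customer and product attributes via left outer joins.
SELECT s.CALDAY, s.CUSTOMER, s.PRODUCT, s.SALES_AMOUNT,
       c.CUSTOMER_GROUP,        -- attribute from the customer master data DSO
       p.PRODUCT_CATEGORY       -- attribute from the product master data DSO
  FROM SALES_DSO s
  LEFT OUTER JOIN CUSTOMER_MD c ON c.CUSTOMER = s.CUSTOMER
  LEFT OUTER JOIN PRODUCT_MD  p ON p.PRODUCT  = s.PRODUCT;
```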

Modelling Scenario CompositeProvider

In this scenario, using a CompositeProvider helped us avoid additional data loads, because the data is stored once and reused in different situations.

To sum up, I have to underline that the CompositeProvider gives us the powerful ability to combine data with either a join or a union!

How to convert a MultiProvider to a CompositeProvider

The method below helps us to convert a MultiProvider to a CompositeProvider.

The Procedure is the following:

Step 1

Go to transaction SE38 and execute the program RSO_CONVERT_IPRO_TO_HCPR.

Convert MultiProvider CompositeProvider

Step 2

Select the MultiProvider that you intend to convert into a CompositeProvider. Then enter a name for the newly created CompositeProvider.

Name your CompositeProvider from MultiProvider

Note! The MultiProvider and the CompositeProvider will have the same name, so that no Queries or Workbooks will get affected.

Step 3

This program will only work if we create a backup of the MultiProvider.

Note! If you execute the program without a backup you will receive this error message:

“Conversion is not allowed without backup”

Step 4

Enter a backup InfoProvider name and execute the program in simulation mode to check whether the MultiProvider can be converted to a CompositeProvider.

Backup InfoProvider Name

If you receive the message “CompositeProvider is consistent”, the conversion between the InfoProviders can be carried out.

Step 5) After the test in simulation mode, we execute the program and choose “Transfer InfoProvider and copy queries”.

Step 6) Select the queries that you want to back up.

Backup queries CompositeProvider

Step 7) We can select a prefix for the query names; in this way, each query will have the name of the InfoProvider as a prefix.

Prefix query name CompositeProvider

With this method no Queries or workbooks will be affected!

Conclusion

The new CompositeProvider brings many advantages. Looking back at the information in this article, you may notice that the biggest advantage is the ability to combine classic BW objects, HANA objects and HANA views with either a join or a union operation. These two operations, union and join, are executed in HANA and not on the application server. Another benefit of the CompositeProvider is query execution, which is pushed down to the HANA database. Moreover, the CompositeProvider enables faster and simpler data modeling, because it replaces the MultiProvider and the InfoSet. Another point worth mentioning is the support of input parameters when Open ODS views are used as source objects. Last but not least, the loading time of huge amounts of data through several layers in BW is now reduced, because the CompositeProvider can be modeled using the DSOs in the EDW layer itself.

This being said, I hope this article was useful for you. Now, all you have to do is start implementing the new acquired information and also start using the CompositeProvider. Enjoy!

Source of images: SAP SE

Author
Roxana Hategan Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Organize your work with abapGit

What is abapGit and how does it work? 

Assuming you are already familiar with ABAP programming but have not heard about abapGit, let us first shortly introduce you to the Git world.  

Git is a version control system (VCS), which makes it possible to track changes to files. But you may be wondering: what is a VCS and what is its purpose?

A version control system, also known as revision control, is a system that gathers and stores the history of changes to any collection of files or source code over its lifecycle. Each change is assigned a “revision number” and carries a record of when that specific change was made and by whom.

Versioning has become more and more necessary in the field of computer science. Most importantly, a version control system allows us to go back to or restore an older version in case of an undesirable change to, or deletion of, a software program.

ABAP itself also includes a version control system, but it is closed within its own environment and therefore does not allow collaboration with “outsiders”. This is an issue particularly for open source projects, but also in general for any source code, which makes sharing quite complicated.

If you are interested to learn more about Git, we highly recommend you the book “Pro Git” written by Scott Chacon and Ben Straub. It is available under https://git-scm.com/book/en/v2. 

In the landscape of SAP development, abapGit comes as the Git client written in ABAP. It is an Open Source project initiated by Lars Hvam under an MIT License. You can find his blog posts on https://people.sap.com/lars.hvam. 

More helpful information about abapGit itself and other great ABAP open source projects can be found in the great blog of Graham Robinson, which is also the basis for the article today: https://blogs.sap.com/2017/06/21/abapgit-so-easy/. 

In this article we will focus on installing and using abapGit with GitHub and AWS CodeCommit. The environment used in the following procedures was SAP NetWeaver 7.5. Now let’s take off on our journey.

Four steps away from the abapGit experience 

To be well equipped to explore the abapGit features, here are the steps needed to set up and connect to the development platform and hosting services.

Step 1: abapGit Server installation 

First and foremost, all we need to do is to run the abapGit project on our ABAP development system. 

Considering you already have access to an SAP NetWeaver installation, open the SAP GUI client and create an ABAP Report “ZABAPGIT” via SE38 or SE80 transactions. Then replicate the code from this link: https://raw.githubusercontent.com/abapGit/build/master/zabapgit.abap  

Activate and execute your report. Consequently, the abapGit Homepage will be displayed on screen.  

abap-Git-homepage

Please keep in mind that if you want to update your abapGit project to a newer version, you just need to replace the code in the report with the most recent version and the upgrade is done.

In other words, this is a straightforward way to get the most recent version of abapGit at any time. Easy, right?

Step 2: GitHub Root Certificates download 

In this next step, we will carry on with the SSL setup, which allows us to use the online features against a Git server. Offline projects, in contrast, operate behind firewalls and without SSL. An SSL (Secure Sockets Layer) certificate is a text file with encrypted data that we need to install on our server in order to establish secure and reliable communication with a web browser.

Nowadays, the most popular web-based Git hosting service is GitHub. To enable communication between abapGit and GitHub, it is necessary to install some root certificates on your ABAP system. There are several options for downloading the files depending on the web browser in use; alternatively, we can download them manually. The purpose of these root certificates is to ensure the integrity and privacy of the communication between the server and the web browser.

We have used Google Chrome as the main web browser. Proceed with the following steps to find the necessary certificates:  

  • Go to https://github.com. 
  • Click on the lock icon near the address bar and then select “Certificate”.
  • Go to the “Details” tab and click on “Copy to file…” to export the certificate to a .CER file. 
  • Specify the name of the file, choose the path to which the file will be saved and press “Finish”. 
  • Select the “Certification Path” tab and repeat the two previous steps starting from the parent node of the tree until the root node. 

If you want to manually download the root certificates, navigate to the GitHub platform and find the certificates that it is using, as shown for the above browser. Download these certificates from https://www.digicert.com/digicert-root-certificates.htm by accessing the “Root Certificates” section. 

You can find more information here: http://docs.abapgit.org/. 

We now have the files in the required format and can establish the connection with the SAP system.

Step 3: SSL Setup using SAP Trust Manager 

For the previously downloaded certificates to be acknowledged, we need to install them in the SAP system using Trust Manager application. Therefore, log on to your SAP GUI system and run the STRUST transaction.  

Make a switch to the “Change” mode and from the top-left objects list, open the “SSL System Client SSL Client (Anonymous)” folder. 

SSL-SAP-Trust-Manager

Choose “Import certificate” at the bottom left of the “Certificate” box and enter the file paths of the certificate files exported earlier from the web browser.

abapGit-add-to-certificate

Select “Add to Certificate List” for each certification file separately, one by one. As a result, the “Certificate List” box should look as follows: 

abapGit-certificate-list

Save the changes made. The root certificates are now installed on the SAP development system.  

To check whether the ABAP tools can communicate with the Git server, it is advisable to test that the connection between them works properly. To do so, create an ABAP report “ZABAPGIT_TEST_SSL” via SE38 or SE80 and copy the code from this link: http://docs.abapgit.org/other-test-ssl.html

Activate and execute your program. If the connection has been successfully established, following output is expected:

abapGit-test-ssl

However, if the connection is not working properly, consider the fact that you may have to set two profile parameters in the RZ10 transaction (see also SAP Note 510007 – Setting up SSL on Application Server ABAP, step 7).  

For instance, in our system we had to set the following values: 

ssl/ciphersuites = 135:PFS:HIGH::EC_P256:EC_HIGH

ssl/client_ciphersuites = 150:PFS:HIGH::EC_P256:EC_HIGH

Please keep in mind that you need to restart the application server or, at the very least, restart the icman process in the ICM Monitor (transaction SMICM)!

Dive deeper into the subject by going to https://blogs.sap.com/2008/10/31/calling-webservices-from-abap-via-httpsssl-with-pfx-certificates/. 

Step 4: Git Repositories connection 

Keep your installed abapGit code up to date with the most recent developments and extend the supported object types by downloading the required plugins. 

Particularly, when abapGit is run for the first time, the abapGit tutorial page will be displayed, as shown in Step 1. On the bottom side of the page, you can find the “abapGit related repositories” section. 

abapGit-repositories

Click on “install abapGit repo” to start the process. Press “Continue” to download the current version of abapGit into the package $ABAPGIT, which in this case is a local package. Choose “Overwrite” to update the “ZABAPGIT” program created previously and activate all the abapGit artifacts.

In a similar manner, choose “install abapGit repo”, proceed through the steps, and you will then be ready to experience the abapGit features.

abapGit-Funktionen

An important thing to point out here is that abapGit can connect not only to GitHub, a web-based hosting service for version control, but also to Amazon Web Services.

We at Inspiricon work a lot with Amazon Web Services, so for us it is best to use the AWS CodeCommit service as the Git server for our repositories. If you want this as well, the first thing to do after the installation is to connect to Amazon Web Services so you can leverage abapGit’s online features.

Amazon Web Services, shortened as “AWS”, offers reliable, scalable and secure cloud computing services, database storage and other functionalities to help in the business processes. For connecting to the AWS Server, you need to follow several steps.  

The first steps are similar to those presented for connecting to the GitHub server, including the download and import of the root certificates, as shown in Step 2 and Step 3. The next thing is to create an IAM user in AWS. It is important to create “HTTPS Git credentials for AWS CodeCommit” in the IAM user profile.

As privileges you should assign at a minimum the following permissions: 

  • AWSCodeCommitPowerUser AWS Managed Policy 
  • IAMReadOnlyAccess AWS Managed Policy 
  • IAMSelfManageServiceSpecificCredentials AWS Managed Policy 

For more information, you can also go to https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-gc.html. 

After the initial configuration is made, you can create your own repositories in the AWS CodeCommit Dashboard.  

Online and Offline? 

First and most importantly, abapGit is started in the SAP NetWeaver system by running the “ZABAPGIT” report. abapGit comes with both online and offline features. The offline feature refers to replicating a local ABAP package from your system into the abapGit workspace, while the online feature allows us to work with remote repositories. So what is a repository?

A repository, frequently abbreviated as “repo”, is a location where all the files for a single project are stored. Each project has its own repository and it can be accessed by using a unique HTTPS-Url. We need to connect this repository to an existing ABAP package from our system or to create a new one. 

Conclusion 

An increasing number of companies and developers around the world have decided to switch to Git as their main version control software. In the SAP landscape, abapGit helps developers in the form of a Git client written in ABAP that benefits fully from Git’s functionality.

abapGit is free, fast and provides a consistent, scalable model. Another big plus is that abapGit makes it straightforward to push changed objects to the remote Git repository, while at the same time providing an environment that facilitates parallel development. It is well suited for code sharing and for tracking changes during development.

Open source projects and code sharing beyond company boundaries are a good way to participate in the open source community, to learn, to improve your own skills and to find new solutions for real-world problems. All of this is supported by Git and abapGit!

Do you feel enthusiastic to find out more about the subject? We would be very happy to keep you informed with the updates and to present you many more other interesting topics and projects we are also involved in. Stay in touch! 

Author
Andra Atanasoaie Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

The 5 W’s of the Persistence Layer in SAP HANA

Imagine this scenario: you sit down with a fresh cup of coffee in front of your nice shiny new HANA database, which has been delivering results at record-breaking speed, much to the satisfaction and dismay of all your business users…  

Then suddenly, between sips, the unspeakable happens! Disaster strikes! Lights out! Power off! A crash! HANA’s in-memory data fades out in the middle of processing… How could this have happened?!  

More importantly, how to recover all that data and restore it to its last committed state?! 

To be prepared for this eventuality, and to prevent in-memory data loss and avoid major efforts in data recovery/restoration afterwards, SAP HANA employs a so-called “Persistence Layer”.  

Wait a minute, you may be saying to yourself, what’s all this about “persistence”? Isn’t SAP HANA supposed to be so much faster exactly because it is NOT persistent, and instead relies on in-memory technology?  

Well, in fact SAP HANA does also depend on data persistence. Here’s a brief overview of the built-in SAP HANA Persistence Layer. 

The 5 W’s: Who-What-Where-When-Why? 

Who? 

SAP HANA developers and consultants are the target audience for this basic overview of the Persistence Layer in the SAP HANA database architecture. 

What? 

The fact is, the Persistence Layer is built into the HANA database. It’s important not to confuse this “Persistence Layer” with the option you have as a developer to create “Persistent HANA Views” – but that’s a topic for another day and another blog!  

The storage of data on disk is called “persistence”. This is in contrast to the data temporarily residing in RAM, the so-called “in-memory” data, which can be accessed with dramatically faster response times – which is the basis of SAP HANA’s in-memory technology!  

The Persistence Layer is used to physically store data in the form of so-called “savepoints” at regular intervals of every X seconds (default configuration = 300 seconds, which is 5 minutes). These savepoints contain the last transactions written to the persistent storage on data volumes in the HANA database. In addition, SAP HANA stores logs of ongoing data transactions. In the event of in-memory data loss, the latest savepoint is used together with the logged transactions that occurred since that savepoint to “reconstruct” or restore the database to the last committed state of the data.  

Where? 

The Persistence Layer is located inside the Index Server in the SAP HANA database. It’s best viewed in the context of the In-Memory Computing Engine (IMCE): 

Inspiricon_In-Memory-Computing-Engine

When? 

The Persistence Layer is configured by the SAP Basis Team during the original installation and setup of the SAP HANA database. The necessary parameters are stored in the global.ini file. These can be accessed directly or via the SAP HANA Cockpit under the Persistence section. It is possible to adjust these parameters at any time to optimize data recovery or system performance. For example, the frequency of the savepoint is determined by the configuration parameter savepoint_interval_s, which can be easily changed from the default value of 300 seconds (5 minutes). Since the savepoint involves writing data to disk, there is a performance cost/benefit to consider. 
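
For illustration, the savepoint interval can also be inspected and changed with plain SQL, assuming you have the necessary system privileges; this is only a sketch of the standard HANA configuration statements, not a recommendation for a particular value.

```sql
-- Check the currently configured savepoint interval (global.ini, [persistence] section)
SELECT * FROM M_INIFILE_CONTENTS
 WHERE FILE_NAME = 'global.ini'
   AND SECTION   = 'persistence'
   AND KEY       = 'savepoint_interval_s';

-- Change it system-wide, e.g. to 600 seconds (the default is 300)
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('persistence', 'savepoint_interval_s') = '600' WITH RECONFIGURE;
```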

Why? 

Why do we need a Persistence Layer in SAP HANA? For exactly the reasons implied in our original scenario above: To secure – or “persist” – your in-memory data in the event of a power outage or other system crash scenario. 

In addition, the SAP HANA Persistence Layer helps to fulfill the “A-C-I-D” properties for a database: Atomicity – Consistency – Isolation – Durability. The Persistence Layer ensures that the database transactions remain ACID by making sure that each transaction is either completely executed or completely rolled back (atomicity and consistency), and restoring the data to the most recent committed state after a restart (durability).  

Oh yes, there’s also the “H” question… 

How? 

This is the critical question: How does it work? 

The deeper technical details are documented in many Internet sources, but the basic steps can be outlined as follows: 

Inspiricon_SAP-HANA-Persistence-Layer

  • The SAP in-memory database holds the bulk of its data in memory for maximum performance, and writing data to disk is minimized. 
  • SAP HANA still depends on persistent storage to provide a fallback in case of power failure or other unplanned system shutdowns. 
  • The HANA system uses a combination of “write-ahead” or redo transaction logs and the last savepoint to recreate the last committed state of the HANA database after such a sudden shutdown.   
  • When a database transaction occurs “in memory”, the changed memory pages are captured by the redo logs before those changes are actually applied to the physical database volume at the next write to disk. These “write-ahead” logs can be used for both redo and undo operations, thus ensuring a complete rollforward or rollback of individual transactions. 
  • The in-memory data and the transactional log information are automatically saved to disk at regular savepoint intervals. 
  • The in-memory pages containing data are written to the data volume to persist them.  
  • The redo log entries are written to the transactional log volumes for all changes to persistent data that occurred in memory since the last savepoint.
  • The data belonging to a savepoint is the last consistent state of the data stored on disk, and it remains there until the next savepoint operation has completed.
  • In the event of a database restart, the data from the last completed savepoint is read from the data volumes, and the redo log entries from the log volumes are then applied or replayed, so that the last logged changes are “rolled forward” and the data is reconstructed to its last consistent state.

To Sum It All Up: 

The Persistence Layer in the SAP HANA database can save you the agony of in-memory data loss after an unexpected power interruption or other type of system crash, and it plays a major role in ensuring the successful long-term fail-safe operation of your SAP HANA database!  

For any questions or help needed in implementing your SAP BW on HANA or BW/4HANA system in your organization, please feel free to contact us at Inspiricon AG!  

SAVE THE DATE: You can find out more about this and other related SAP HANA topics in our upcoming webinar on “Migration to BW/4HANA”. Please secure your space under this link today! 

Source of images: sapstudent.com, tutorialspoint.com

Author
Andrea Taylor Senior Consultant SAP BI
Phone: +49 (0) 7031 714 660 0
Email: info@inspiricon.de

Machine Learning with SAP Hana and SAP Fiori

What is Machine Learning and why is it important?

Well, first of all, it is quite the hype these days. So it is important to at least know what the rest of the world is talking about.

And the reason it is hyped is that machine learning is bringing huge advances in various fields. It gives computers the ability not only to perform certain tasks, but also to first learn the rules for performing a given task (to learn from experience, from historical data).

Take the healthcare field, for example: machine learning algorithms are successfully used to spot signs of severe illnesses (breast cancer, for example) as early as possible and reduce the risk for patients.

Financial institutions also use machine learning algorithms for fraud detection and to combat money laundering. These algorithms are able to analyse millions of transactions and point out those that indicate suspicious patterns.

In the online security field, machine learning algorithms are used to track suspicious behaviour and detect privacy intrusions.

We should also not forget that we all use machine learning in our daily lives. Whether it is Siri we summon on our Apple device or Alexa on our smart speaker, whether we use social networks on the internet or Google Maps in our car, the core of these systems is powered by machine learning algorithms.

And in the daily operating business of companies, machine learning algorithms automate basic tasks that would otherwise be done manually, such as analyzing invoices and orders to detect duplicates.

In the field of BI, one of the reasons machine learning is important is that it is part of the techniques used in predictive analytics. This gives employees the ability to predict certain future results. Salespeople, for example, can forecast their sales volume, and managers can evaluate multiple predictions of how certain decisions might impact future results and base their decisions on them.

How does it work?

Let’s take a look at this quiz:

  • 2 → 4
  • 3 → 9
  • 5 → 25
  • 6 → ?

Now, why were you able to figure out that 36 is the right answer? Because you recognized a pattern. And that is exactly what machine learning algorithms do. They are trained on sets of sample data where they learn to recognize patterns and match these patterns to the correct responses (they learn from experience). After training, we can query the algorithm by providing it with a new set of data, and what we get back is (hopefully) an accurate response.

Machine learning algorithms are designed to work on problems much more complex than the quiz above, with a large number of input dimensions. This enables them to perform complex tasks like image or speech recognition, or forecasting potential sales results based on complex historical market data.

Machine Learning with SAP Hana and Fiori

Since machine learning is such a hot topic, it generates a lot of curiosity and a desire to experiment, and that was also the case for us. We at Inspiricon became curious how this new field could add value to the areas we are already working in, which include BI, SAP Fiori and SAP HANA.

Well, it turns out SAP HANA already has pretty robust support for machine learning. SAP provides the SAP HANA Predictive Analysis Library (PAL), which offers machine learning algorithms and even the ability to build neural networks. Combining its power with SAP Fiori, it is possible to build some interesting applications in the field of predictive analytics. For example, we were able to build a small Fiori application to predict the daily and monthly sales figures for individual stores within a supermarket chain. The following illustration shows a rough overview of the architecture of this application:

Architecture_Application

The Fiori application we developed is targeted at managers, who can explore the forecast up to the end of the year from within the application. We are also experimenting further with this scenario and investigating how to extend it with other features, such as the integration of what-if scenarios, so that one can examine how certain management or marketing decisions (like promotions) influence the predicted sales:

Fiori_Predictive_Demo

Conclusion

Machine learning can already be tackled with a simple Hana backend!

While there are powerful tools out there, like TensorFlow for neural networks or SAP Predictive Analytics, it is important to know that these are not strictly necessary to approach the topic. As explained above, SAP HANA already provides the means to build such solutions, and with SAP Fiori it is possible to build a UI application tailored to the specific scenario being implemented. The preliminary data analysis can be performed with powerful data analytics tools that are freely available for Python (pandas) or R. So, with no additional cost in licensing or infrastructure, this can be a very attractive approach, especially for smaller problems that do not require intensive data processing.

Which approach is ultimately chosen, however, depends on the specific use case and should be properly evaluated by the development team. The maintenance of the solution and the license costs will also be important factors for the owner of the solution and must be taken into account when making a decision.

Image sources: Inspiricon

Author
Gerald Iakobinyi-Pich Solution Architect
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Classic DataStore Object vs. Advanced DataStore Object

There have been many architecture-level changes in SAP BW/4HANA. One of these changes concerns data modeling.

In this article we will walk through the various features and capabilities of ADSOs, as well as explore how these capabilities help to optimize various tasks in your SAP BW environment.

First, we will talk about the classic DSO and its features. After that, I will present the differences between the classic DSO and the newly introduced ADSO.

DSO (Data Store Object)

What is DSO?

A DSO is a two-dimensional storage unit that mainly stores transaction data or master data at the lowest level of granularity. The data is stored at a detailed level.

Types of DSO

When creating a DSO, you must choose the type:

dso

When we create a DSO, the system sets the option ‘SID Generation upon Activation’ by default. This option can be found in the edit-mode settings of the DSO. If it is checked, the system checks the SID values for all characteristics in the DSO during activation; if a SID value for a characteristic does not exist yet, the system generates it. Generating the SIDs during activation improves query runtime performance, because the system does not have to generate SIDs at query runtime. SID values are always stored in the SID table of an InfoObject, and the attributes and texts of a master data InfoObject are accessed via this SID. The SID table is connected to the associated master data tables via the characteristic key.

The following Table shows you the properties of the different DSO types and architecture:

table-DSO-types

ADSO (Advanced Data Store Object)

The advanced DSO replaces all of these classic objects (InfoCube, DSO, HybridProvider and PSA) with a single object.

BW4HANA-Modeling-Objects

Before we create an ADSO we must know that it includes 3 main tables:

  1. Inbound Table
    • Corresponds to the activation queue table of the classic DSO
    • Corresponds to the uncompressed fact table of a non-SAP-HANA-optimized InfoCube
    • All records are stored with a technical key
  2. Table of Active Data
    • Same as in the classic DSO: contains the current values after activation. The key of the table is the DSO key (more about keys later)
    • Corresponds to the compressed fact table of a non-SAP-HANA-optimized InfoCube
  3. Change Log
    • Same as in the classic DSO
    • Stores the difference between the inbound table and the active table
    • Needed for delta generation

Important steps in creating an ADSO

We create an ADSO in the BW Modeling Tools (BWMT) in Eclipse, like all new objects (in BW 7.5 you still have the option to create the classic objects in SAP GUI; in BW/4HANA you can only create the new objects in the BWMT).

In the General tab you can configure activation settings and other properties. First, the user must enter a description. After that, we can choose a model template: the ADSO can behave like any one of the objects from classic BW:

template-adso

  • Acquisition Layer

In this layer you can create objects that cover the “write-optimized” use cases of the classic DSO. It is divided into three sub-layers:

  1. Data Acquisition Layer
    • Corresponds to a persistent staging area (PSA) and acts as an incoming storage area in BW for data from source systems
    • No use of Active Table, so activation is not needed
    • Requests will be loaded into and extracted from the inbound table
    • All the records in the Inbound Table contain a Request Transaction Number (TSN), Data packet, and Data record number
    • The inbound table (formerly called the New Data / Activation Queue Table) is accessed when executing a BEx query and for extraction
    • Data doesn’t get aggregated
  2. Corporate memory with compression feature
    • Requests will still be loaded into the inbound table
    • Old requests that are no longer needed on detailed level can be compressed into the active data table.
    • To save memory space, the CM – compression ADSO doesn’t use a Change Log table, only an Inbound Table and an Active Data Table.
    • As soon as a load request is activated, the system loads the data into the Active Table and deletes it from the Inbound Table
    • If there are two records with the same key, BW/4HANA overwrites all the characteristics of the record with the characteristics of the most recently loaded record.
  3. Corporate memory with reporting option
    • A difference between this template and the “Corporate memory with compression feature” template is that the system does not erase the data from the Inbound Table. Instead, the data also remains in the Inbound Table so that none of the technical information is lost.
    • The CM reporting template has no Change Log though
    • Another difference is that the data is not extracted from the Active Table but from the Inbound Table
    • Because the data remains in the Inbound Table after activation, these ADSOs are a good solution when you want to store data but save space by not using a Change Log
  • Propagation Layer
    • Provides a basis for further distribution and reuse of data
    • Corresponds to a standard DataStore object (classic)
    • Requests will be loaded into the inbound table
    • For reporting the user must activate the loaded requests
    • The data is then transferred into the active data table and the delta is stored in the change log
    • The change log is also used to roll back already activated requests
  • Reporting Layer
    • Used to perform queries for analysis
    • Corresponds to a standard InfoCube
    • The inbound table acts as “F”-table and the active data table as “E”-table
    • It does not have a Change Log. Without a change log table, the delta process cannot be carried out.
    • After activation, the Inbound Table is empty
    • The user reports on a union of the inbound table and the active data table
  • Planning Layer

It is split into two further layers:

  1. Planning on Direct Update
    • Data is automatically loaded into the Active table, so no need for activation
    • It has no Change Log or Inbound Table
    • You can fill the Active table with an API
    • You can also load data into this type of ADSO using a DTP
    • It only has an overwrite option; there is no summation of key figures as in the Planning on Cube-like ADSO
  2. Planning on Cube-like
    • Has an Inbound Table and an Active Table
    • All characteristic fields are marked as key fields in the Active Table, which is a necessary requirement to make it suitable for planning.
    • It only has a summation option

Process of SID generation highly optimized for HANA

To optimize performance, in BW/4HANA it is possible to set this flag not only at the InfoProvider level, but individually per characteristic of the DSO. The data integrity check is then only executed for the selected characteristics.

InfoObjects/Fields

As a new feature, you can use fields with simple data types instead of InfoObject. To do so, go to the Details tab and click the Add Field button. Under Identify, you can specify in the “With” dropdown menu whether you want to use an InfoObject or a Field for the definition.

InfoObject

In BW, the user can choose between modeling with InfoObjects and modeling with fields. Modeling with InfoObjects requires extra effort, but also brings a lot of advantages. Before you choose one of these options, you should consider the advantages and disadvantages of both modeling approaches.

In the following, I will present some of the advantages and disadvantages of choosing to model with fields:

Advantages when using fields:

  • If the query contains fields, it can be processed key-based in SAP HANA
  • Using fields can enhance the flexibility and range of the data warehouse, when the data volume is small.

Disadvantages when using fields

  • The services for InfoObjects (attributes and hierarchies for example) are not available for fields.
  • Validity characteristics for DataStore objects (advanced) with non-cumulative key figures must be InfoObjects.
  • InfoObject attributes must be InfoObjects
  • A field-based key figure cannot be an exception aggregation
  • Planning queries on DataStore objects (advanced) are only supported with fields as read-only
  • If fields are used in the query, the InfoProviders can only be read sequentially
  • In the query on a CompositeProvider, not all data types for fields are supported (ex. maximum length for fields is 20 characters)

Defining Keys for an ADSO

Also in this tab, we select which fields make up the key of our ADSO. To define a key, click on the Manage Keys button.

fields-adso

Key Fields

There are two types of keys: the primary key and the foreign key.

Advantages of using Key fields:

  • Key fields uniquely identify a record in a table.
  • Key fields cannot be NULL.
  • They are used to link two tables.
  • The main purpose of a foreign key is data validation.
  • Read master data: using the input field value as a key, you can read the value of a characteristic attribute belonging to a specified characteristic.
  • Read from advanced DataStore: using the input field value(s) as a (compounded) key, you can read the data fields of a specified advanced DataStore Object (DSO).
  • One thing to keep in mind: if two records arrive with the same key, BW/4HANA overwrites all the characteristics of the existing record with the characteristics of the most recently loaded record (see the sketch after the next list).

Disadvantages of not using key fields:

  • Records are not uniquely identified, so duplicate records are allowed
  • Performance is affected
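
To make the “same key overwrites” behaviour tangible, here is a small standalone SQL sketch with a hypothetical table; it only mimics in plain HANA SQL what a keyed ADSO does during activation and is not the actual activation logic.

```sql
-- Hypothetical active-data table keyed by store and calendar day
CREATE COLUMN TABLE SALES_ACTIVE (
  STORE  NVARCHAR(10),
  CALDAY NVARCHAR(8),
  AMOUNT DECIMAL(17,2),
  PRIMARY KEY (STORE, CALDAY)
);

INSERT INTO SALES_ACTIVE VALUES ('S01', '20240101', 100.00);

-- A second record with the same key replaces the non-key fields:
-- the most recently loaded record "wins", just like in a keyed ADSO
UPSERT SALES_ACTIVE VALUES ('S01', '20240101', 250.00) WITH PRIMARY KEY;

SELECT * FROM SALES_ACTIVE;  -- one row, AMOUNT = 250.00
```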

Benefits of using an ADSO instead of a classic DSO:

  • Simplification of object types
    • It can behave like four objects from classic BW
  • Flexibility in data modeling
    • Modeling your ADSO using the Reporting Layer settings
  • The performance of data loads and activation is optimized for HANA, as the ADSO is a HANA-native object.

Source of images: SAP SE, Inspiricon AG

Author
Roxana Hategan Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Take advantage of ABAP Development Tools in Eclipse

When did it all start?

ABAP-1992

As time passed by, solutions have constantly improved and so did ABAP development tools.

Independent from the classical SAP GUI-based development environment, there is also the possibility to develop ABAP applications in Eclipse.

ABAP-2009

Since then, a great development experience has been provided on board an Integrated Development Environment (IDE).

ABAB-switch-to-Eclipse

In Eclipse you can use several tools in one program. Add-ons like SAP HANA Database Studio, ABAP Development Tools, SAPUI5, BW Modeling Tools and many more have been incorporated, to the delight of every SAP consultant. In this article, we will focus on the installation and use of the ABAP Development Tools (ADT) within Eclipse. Before starting with the introduction, let me give you the background of our environments.

Prerequisites

  • Eclipse IDE – get a suitable installation to host the desired tools

Note that support is no longer maintained for Mars version. It is recommended to use Oxygen or Neon.

  • JRE (Java Runtime Environment) 1.6 or higher
  • SAP GUI 7.2 or above
  • Operating system (OS): Windows 7, Mac, Linux

Useful links:

In the following procedures, we used the Eclipse Neon.3 release (4.6.3) and SAP GUI 7.4 on Windows 7 Professional, 64-bit.

Setting up the ABAP Development Tools (ADT)

Assuming you already have SAP GUI and Eclipse installed on your computer, let us explore the ABAP world. To install the plugin needed for ABAP development, go to the Help menu in the Eclipse start-up window and choose Install New Software.

Install-new-software-eclipse-5

We now need to provide the location of the ABAP Development Tools (ADT). Press Add and enter the path to the required packages.

select-site-location-eclipse-6

browsed-location-eclipse

Since Eclipse is a platform designed as an integrated home for many add-ons, we will take advantage of this and download all the components needed to enjoy ABAP on HANA development.

filter-text-eclipse

All you have to do is to accept the license agreement and press Finish to start the installation. If you already have some of the components installed, an update will be performed instead.

installing-software-eclipse

After the packages are downloaded and installed, Eclipse must be restarted.

restart-eclipse

Eclipse Welcome page should appear after the Restart process.

Do not worry if the content is distinct. It usually varies from one product to the other and it only presents an overview of the product and its features.

overview-eclipse

Since the installation is complete, all we have to do is to open the ABAP perspective. From the Window menu, go to Perspective  → Open Perspective → Other.

open-abap-option

Select the ABAP option and then close the Welcome page.

select-abap-option-eclipse

So what does the ABAP perspective look like in Eclipse?

The ABAP tools are conveniently arranged on screen to ease your development. Moreover, you are free to adapt the views to your personal needs.

views-adaptation-abap

Before starting with the real coding, we have to make sure that services are activated in the ABAP backend system.

Logon to SAP GUI and go to transaction SICF. Type DEFAULT_HOST into Virtual Host field and press Execute.

activated-services-abap

There are two things that must be checked here:

1. Check for Services

Expand the default_host node and navigate to sap → bc → abap. Right-click the items docu and toolsdocu and choose Activate Service from the context menu.

check-services-abap

2. Check for Links

The second thing here is to verify whether the paths to some particular ABAP links are available for development.

Expand the default_host node and navigate to sap → public → bc → abap. For the items docu and toolsdocu, right-click and choose Activate Link from the context menu.

check-links-abap

You are ready to connect to the SAP system.

Now let’s get started with the ABAP development in Eclipse

As a first taste of developing ABAP in Eclipse, I recommend starting with a simple program. To accomplish this, we will create a “Hello World” project.

In order to create a new Project, navigate to File → New → ABAP Project.

new-project-abap

ABAP Backend must be configured for development purposes. Here we need to define the connection system details.

There are two ways to connect to a system:

1.Creating a new connection

On the dialog window that appears on the screen, click on new system connection.

create-new-connection-abap

Your System ID, Application Server name and Instance Number should be completed here.

server-name-abap

2. Choosing an existing System Connection

If you already have connected to a system before, you just have to select it from the list of available connections.

existing-system-connection-eclipse

Type in the credentials you use in SAP GUI and press Next to log on and retrieve compatibility information from the backend system.

retrieve-compatibility-eclipse

If the connection was successfully established, type a meaningful name for your project and choose Finish.

project-naming-eclipse

Your newly created project should now appear in the Project Explorer view.

appearance-new-project-eclipse

To get started with coding, we need a program. Right-click on a package under the project name node and go to New → ABAP Program.

break-coding-abap

Choose a technical name for your program and a suitable description and go to the next step.

technical-name-abap-project

In this step, you must choose the transport request on which you want to save the program; otherwise, you have to create a new request.

In our case, for the $TMP package, there is no need for selecting a transport request.

$TMP-package

The ABAP Editor will open. Introduce the below ABAP code or type your own message to be displayed on the screen.

type-message-display

Save and activate your program. Run it as an ABAP Application.

run-abap-application

We have now successfully created the ABAP „Hello World” project. Enjoy the magic!

Conclusion

Fortunately, IT professionals are cooperating to optimize the ABAP stack for HANA and to jointly develop and exchange ABAP code in a modern IDE, taking into consideration that SAP HANA is a technology game changer.

In the near future, I will present a comparison of some of the distinct functionality implemented in Eclipse and in SAP NetWeaver and how it appears in each environment.

Stay tuned for more insights and developments in the ABAP world. I would be happy to read your questions and comments.

Source of images: SAP SE, Inspiricon AG

Author
Andra Atanasoaie Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Comparison between the modelling in SAP BW and SAP BW/4HANA application

“We study the past to understand the present.” – William Lund

More and more customers approach us to learn more about BW 7.5 and BW/4HANA. All the more reason for us to start a new blog series to take a closer look at this subject. Let us start by examining the history of SAP BW and then move on to outlining the subject areas we will be covering over the coming weeks.

For this article, the past refers to the SAP BW modelling and the present to SAP BW/4HANA modelling.

Most organizations and individual users are still not sure what the differences are between SAP BW (classic modelling) and SAP BW/4HANA. The purpose of this article is to put things into perspective and to provide you with a clear answer on this topic.

SAP BW History – a short overview

Inspiricon_SAP-BW-History-Overview

What are the differences between SAP BW and SAP BW/4 HANA?

One of SAP’s main goals is to simplify the system. Consequently, it bundles together objects and processes and reduces the number of steps involved.

1.  Modelling Objects

A quick comparison between the modelling objects accessible in the classic SAP BW application and those in SAP BW/4HANA may help illustrate the level of modelling simplification accomplished.

sap_classic-sap-bw_sap-bw4hana_objects

In the upcoming articles in our series we will introduce you to the new Providers, starting with ADSOs.

2. Data Flows

The central entry point for modelling in SAP BW/4HANA is the data flow. It defines which objects and processes are needed to transfer data from a source to SAP BW/4HANA and to cleanse, consolidate and integrate the data so that it can be made available for analysis and reporting. SAP BW/4HANA uses a new integrated layer architecture (Layered Scalable Architecture, LSA++).

Classic SAP BW uses LSA, the predecessor of LSA++. This architecture is more restrictive and less flexible with the data.

Inspiricon_Comparison_Classic-BW_BW4HANA

One of the major benefits of using LSA++ is the reduction in the number of persistence layers. This has two effects:

For one, it improves data processing performance: You spend far less time saving and activating!

Second, this reduces the data volume. Given that storage place was not considered a critical factor, it used to be that redundancies were deliberately used in the BW system to improve read performance. But with the advent of HANA, things changed profoundly. Main memory is expensive, both in terms of hardware when compared to hard disk storage, and in licensing terms, as HANA is licensed as main memory. Another benefit is that the reduction in “physical” layers allows for far more flexibility in system design.

3. Source Systems

SAP is also pursuing its simplification approach when it comes to the source systems.

SAP BW/4HANA offers flexible ways of integrating data from various sources. The data can be extracted from the source, transformed and loaded into the SAP BW system, or it can be accessed directly in the source for reporting and analysis purposes without storing it physically in the Enterprise Data Warehouse.

sap-bw4hana-simplified-source-systems

a) SAP HANA Source System

  • this connectivity can be used for all other databases (e.g. Teradata, Sybase IQ, Sybase ASE).

b) SAP Operation Data Provisioning (ODP)

  • acts as the hub for all data flowing into BW from external sources
  • used exclusively with SAP Landscape Transformation (SLT), SAP ERP Extractor (SAP Business Suite), HANA Views and SAP BW.
  • The PSA no longer exists with the new ODP concept which provides a much faster extraction mechanism.

With those two connectivity types, data can be made available in batch mode, using real-time replication or direct access.

HANA views are automatically generated within the SAP HANA database after you activate the objects (e.g. ADSO, CompositeProvider).

4. Performance

As pointed out in connection with LSA++, data processing is much faster with HANA. While data flows were all about streamlining the architecture, there are also a number of tangible benefits in terms of technical performance:

In addition to what classic SAP BW offers, SAP BW/4HANA provides in-memory data warehousing:

  • No Aggregates or Roll-up Processes
  • No Performance specific Objects
  • Fewer Indexes
  • Faster Loading and Processing

SAP is going in the same direction with the ability to move transformations directly to the database, the so-called push-down.

This performance that SAP BW/4HANA offers is ensured by an algorithm push-down.

Figure: The algorithm push-down in SAP BW/4HANA

This is one of the subjects that we will be discussing in one of our next articles.

Source of images: SAP SE, Inspiricon AG

Author
Roxana Hategan Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

How To Use Predictive Analytics In Native Hana

In this article we will be focusing on HANA’s native predictive analytics capabilities, explaining step by step how to use PAL (Predictive Analysis Library), which is part of the AFL (Application Function Library), to create a multiple linear regression model and how to use that model for predictions.

Do you want to know more about Predictive Analytics? Read our two other articles on this topic:

  1. Welcome to the World of Predictive Analytics
  2. How to Create Your Own Predictive Model

WHAT IS PAL?

We have previously discussed Predictive Analytics using SAP BusinessObjects Predictive Analytics. Another way to create trained models and to generate predictions based on those models natively in HANA is to use PAL (Predictive Analysis Library). This method is more complex and requires the user to have technical knowledge.

While the application offered by SAP is more user-friendly, implementing HANA PAL solutions offers a greater possibility of integrating the results into other applications, since the entire process runs natively in HANA.

The easy-to-use Predictive Analytics tool comes with high costs, mainly caused by the license fee – in contrast to HANA PAL, where the user only needs to know how to use the PAL functions.

HOW DOES PAL WORK?

HANA offers a great number of functions that can be implemented for different scenarios (clustering, classification, regression, time series forecasting, etc.).

In this article we will be focusing on one function from PAL: Multiple Linear Regression.

We will be using the same data (Rossmann dataset) that we utilized in the second article of our Predictive Analytics series (How to create your own predictive model).

The dataset is offered by the Rossmann store chain on Kaggle, a website where people and organizations can upload real datasets for competitions or for the purpose of helping people develop data science skills.

For a detailed view of the code you will need to build a multiple linear regression, we provide 2 documents that will help you work through it:

PREREQUISITES AND DATA HANDLING

Let us assume that you have already installed SAP HANA Studio on your computer. An SAP HANA server connection is also required to be able to use the HANA database and engine. It is recommended to use the latest version of the HANA database (2.0); both algorithms used in our example are supported by HANA 1.0 SPS 09 and HANA 2.0 SPS 01.

In our scenario we have imported the data found in the train.csv and store.csv files into our HANA database and exposed it through a Calculation View.

The train.csv file contains data regarding the sales of the different stores and the dates when the sales were made, as well as information regarding promotions, customer numbers, the day of the week, etc.

The store.csv file contains master data for each store, such as the distance to the nearest competitor, the store type, and the store assortment.

In figure 1.1 we can see the two data tables:

Fig. 1.1

In our HANA Calculation View we join the two files in order to create a complete dataset, so that the algorithm can build an accurate model for a precise prediction (as seen in Fig. 1.2).

Fig 1.2

PAL ALGORITHM IMPLEMENTATION

In our scenario we will implement the prediction using multiple linear regression.

1. WHAT IS MULTIPLE LINEAR REGRESSION (MLR)?

Linear regression is an approach to modeling the linear relationship between a variable y, usually referred to as the dependent variable, and one or more variables x1, …, xn, usually referred to as independent variables, which together form the predictor vector X. In linear regression, data is modeled using linear functions, and the unknown model parameters are estimated from the data. Such models are called linear models.
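In its general form, such a model can be written as y = β0 + β1·x1 + β2·x2 + … + βn·xn + ε, where the β coefficients are estimated from the training data and ε is the error term.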

2. HOW TO CREATE AN MLR MODEL IN 7 STEPS

Step 1: Definition of parameters

In order to create a multiple linear regression model, the algorithm needs 2 input tables (Data and Control) and 4 output tables, and we need to create type definitions for all of them (a sketch of these definitions follows the list below).

The tables are:

a. Data – this contains the columns used for training the model. The data must contain a primary key column (“ID”) that must be the first column defined. The second column must be the target/dependent column (the “Y variable” that we will be predicting). The rest of the columns are populated with the independent variables (X1, X2, …) that help the algorithm build an accurate model.

b. Significance – this contains the coefficients of the model and their values (the higher the value the more effect it has on the model)

c. Fitted – The fitted values (“Sales” in our case) of the target

d. Results – the results of the model (model accuracy and confidence)

e. Exported model – the model that will be used for future predictions.

f. Control – the parameter table that specifies how the algorithm should work (here we can enable and calibrate the elastic net penalties, specify the number of threads, choose the algorithm used to solve the least squares problem, etc.)
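A minimal sketch of these type definitions in a SQL console (all names are our own illustrative choices, and we reduce the predictors to two columns; the exact structures required by the PAL function LRREGRESSION are listed in the PAL reference):

  -- Input: training data (ID, target, predictors)
  CREATE TYPE PAL_MLR_DATA_T    AS TABLE ("ID" INTEGER, "SALES" DOUBLE, "CUSTOMERS" DOUBLE, "PROMO" DOUBLE);
  -- Input: control parameters (name/value pairs)
  CREATE TYPE PAL_MLR_CONTROL_T AS TABLE ("NAME" VARCHAR(100), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
  -- Output: coefficients, fitted values, model statistics and the exported model
  CREATE TYPE PAL_MLR_COEFF_T   AS TABLE ("VNAME" VARCHAR(50), "COEFFICIENT" DOUBLE);
  CREATE TYPE PAL_MLR_FITTED_T  AS TABLE ("ID" INTEGER, "FITTED" DOUBLE);
  CREATE TYPE PAL_MLR_STATS_T   AS TABLE ("NAME" VARCHAR(50), "VALUE" VARCHAR(100));
  CREATE TYPE PAL_MLR_MODEL_T   AS TABLE ("ID" INTEGER, "MODEL" VARCHAR(5000));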

Step 2: Creating the Parameter Table

After creating the definitions for the model, we have to create the table that lists all the parameters, and we need to specify whether each table is an input or an output parameter (the Data and Control tables are input parameters, the rest are output parameters).
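Continuing our sketch, the parameter (signature) table references the type definitions from Step 1 and marks each of them as an input or output parameter (the schema name MY_SCHEMA is a placeholder; the exact parameter order expected by LRREGRESSION is listed in the PAL reference):

  CREATE COLUMN TABLE PAL_MLR_SIGNATURE ("POSITION" INTEGER, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));
  INSERT INTO PAL_MLR_SIGNATURE VALUES (1, 'MY_SCHEMA', 'PAL_MLR_DATA_T',    'IN');
  INSERT INTO PAL_MLR_SIGNATURE VALUES (2, 'MY_SCHEMA', 'PAL_MLR_CONTROL_T', 'IN');
  INSERT INTO PAL_MLR_SIGNATURE VALUES (3, 'MY_SCHEMA', 'PAL_MLR_COEFF_T',   'OUT');
  INSERT INTO PAL_MLR_SIGNATURE VALUES (4, 'MY_SCHEMA', 'PAL_MLR_FITTED_T',  'OUT');
  INSERT INTO PAL_MLR_SIGNATURE VALUES (5, 'MY_SCHEMA', 'PAL_MLR_STATS_T',   'OUT');
  INSERT INTO PAL_MLR_SIGNATURE VALUES (6, 'MY_SCHEMA', 'PAL_MLR_MODEL_T',   'OUT');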

Step 3: Wrapper Creation

The wrapper will create a procedure that will use the specified function and the earlier created parameter table. We will use this procedure later on to create the regression model based on our training data.
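On our installation the wrapper procedure was generated with the AFL wrapper generator; a hedged sketch (the generator procedure and the exact call can differ between HANA revisions, so please check the PAL reference for your release):

  -- Generates a callable procedure PAL_MLR_PROC around the PAL function LRREGRESSION
  CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'LRREGRESSION', 'MY_SCHEMA', 'PAL_MLR_PROC', PAL_MLR_SIGNATURE);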

Step 4: Population of Data Table

First we will create a new table that has the same columns and column types as the data definition created in Step 1, and fill it with the training data from our Calculation View.
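A sketch of this step, assuming the joined Rossmann data is exposed through a Calculation View (the view name "_SYS_BIC"."rossmann/CV_SALES" and the reduced column set are purely illustrative):

  CREATE COLUMN TABLE PAL_MLR_DATA ("ID" INTEGER, "SALES" DOUBLE, "CUSTOMERS" DOUBLE, "PROMO" DOUBLE);
  INSERT INTO PAL_MLR_DATA ("ID", "SALES", "CUSTOMERS", "PROMO")
    SELECT "ID", "SALES", "CUSTOMERS", "PROMO"
    FROM "_SYS_BIC"."rossmann/CV_SALES";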

Step 5: Population of Control Table

In the control table the user specifies how the algorithm will work. It is similar to creating settings for your program. The following control settings can be specified:

a. THREAD_NUMBER -> specifies the number of threads (only relevant when algorithm 1, 5 or 6 is used)

b. ALG -> Specifies algorithms for solving the least square problem:

  1. QR decomposition (numerically stable, but fails when A is rank-deficient)
  2. SVD (numerically stable and can handle rank deficiency but computationally expensive)
  3. Cyclical coordinate descent method to solve elastic net regularized multiple linear regression
  4. Cholesky decomposition (fast but numerically unstable)
  5. Alternating direction method of multipliers to solve elastic net regularized multiple linear regression. This method is faster than the cyclical coordinate descent method in many cases and is the recommended option.

c. VARIABLE_SELECTION -> “0” to include all variables, “1” for forward selection, “2” for backward selection

d. ALPHA_TO_ENTER -> P-value for forward selection

e. ALPHA_TO_REMOVE -> P-value for backward selection

f. ENET_LAMBDA -> Penalized weight

g. ENET_ALPHA -> elastic net mixing parameter

In our example we will use ALG 6 with elastic net penalties enabled (a method used for the optimization of the regression model) and we will have a thread number of 5.
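For these settings, the control table could be filled as follows (the table follows the usual PAL parameter structure of name, integer, double and string columns; the ENET values below are purely illustrative):

  CREATE COLUMN TABLE PAL_MLR_CONTROL ("NAME" VARCHAR(100), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
  INSERT INTO PAL_MLR_CONTROL VALUES ('THREAD_NUMBER', 5, NULL, NULL);    -- 5 threads
  INSERT INTO PAL_MLR_CONTROL VALUES ('ALG', 6, NULL, NULL);              -- solver with elastic net support
  INSERT INTO PAL_MLR_CONTROL VALUES ('ENET_ALPHA', NULL, 0.5, NULL);     -- elastic net mixing parameter (illustrative value)
  INSERT INTO PAL_MLR_CONTROL VALUES ('ENET_LAMBDA', NULL, 0.01, NULL);   -- penalized weight (illustrative value)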

Step 6: Build output tables

Now all that remains to be done is to create the output tables based on the definitions.
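Continuing the sketch, the output tables simply mirror the output type definitions from Step 1:

  CREATE COLUMN TABLE PAL_MLR_COEFF  ("VNAME" VARCHAR(50), "COEFFICIENT" DOUBLE);   -- significance/coefficients
  CREATE COLUMN TABLE PAL_MLR_FITTED ("ID" INTEGER, "FITTED" DOUBLE);               -- fitted sales values
  CREATE COLUMN TABLE PAL_MLR_STATS  ("NAME" VARCHAR(50), "VALUE" VARCHAR(100));    -- model statistics
  CREATE COLUMN TABLE PAL_MLR_MODEL  ("ID" INTEGER, "MODEL" VARCHAR(5000));         -- exported model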

Step 7: Create the model

Finally, we call the procedure that we created in the wrapper step, view the results, and create a table where we can compare the fitted values with the real values.
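A sketch of the final call and of a simple comparison query, based on the illustrative names used above:

  CALL MY_SCHEMA.PAL_MLR_PROC(PAL_MLR_DATA, PAL_MLR_CONTROL, PAL_MLR_COEFF, PAL_MLR_FITTED, PAL_MLR_STATS, PAL_MLR_MODEL) WITH OVERVIEW;

  -- Compare the fitted values with the real sales values
  SELECT d."ID", d."SALES" AS REAL_VALUE, f."FITTED" AS FITTED_VALUE
  FROM PAL_MLR_DATA d
  JOIN PAL_MLR_FITTED f ON d."ID" = f."ID";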

Results

The first result that will pop up is the coefficient table, which tells us the impact of each coefficient on the model; a value of 0 means the corresponding variable has almost no effect on the created model.

Fig. 1.3

The next output table will show us the fitted values of the training model data:

Fig 1.4

The last table will give us the statistics of the created model

  • R² – the explanatory power of the model; in our example, the model reaches an R² of about 89.996%.
  • F – The F value is the ratio of the mean regression sum of squares divided by the mean error sum of squares.
  • AIC – The Akaike Information Criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.
  • BIC – Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is based, in part, on the likelihood function and it is closely related to the AIC.
  • MSE – the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors or deviations—that is, the difference between the estimator and what is estimated.
  • RMSE – Root Mean Squared Error
  • MAE – The mean absolute error (MAE)
  • MAPE – the mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of prediction accuracy of a forecasting method in statistics, for example in trend estimation.

Fig 1.5

The comparison between the fitted and real values can be displayed in a more user-friendly way by using a chart where you can compare the real value and the predicted/fitted value (the sales values in our case) over time, as shown below:

Fig 1.6

Fig. 1.7

3. HOW TO CREATE PREDICTIONS BASED ON THE MLR MODEL’S RESULTS

(1) Definition of parameters

Similarly to the model creation, we are required to create the parameter definitions, the difference being that the data table type must not contain the target variable.

(2) Forecasting Procedure Wrapper

Our generated procedure will be fed the parameter table specifications and structures and will be using the ‘FORECASTWITHLR’ function.

(3) Input and output table creation and data generation and regression specification

Similarly to the model creation we will specify how the linear regression forecast function will handle the input data. The input data must not contain the target column.

(4) Coefficient table

The coefficient table must be filled with the data from the coefficient table produced in the model’s results.

(5) Forecast Procedure Calling

All that remains to be done is to call the previously created procedure.
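Putting steps (1) to (5) together, a minimal sketch of the forecast part could look like this (names are again illustrative, the prediction data must not contain the target column, and the exact parameter order of FORECASTWITHLR should be checked against the PAL reference for your release):

  -- Types and signature for the forecast function
  CREATE TYPE PAL_FC_DATA_T   AS TABLE ("ID" INTEGER, "CUSTOMERS" DOUBLE, "PROMO" DOUBLE);
  CREATE TYPE PAL_FC_RESULT_T AS TABLE ("ID" INTEGER, "PREDICTED" DOUBLE);
  CREATE COLUMN TABLE PAL_FC_SIGNATURE ("POSITION" INTEGER, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));
  INSERT INTO PAL_FC_SIGNATURE VALUES (1, 'MY_SCHEMA', 'PAL_FC_DATA_T',     'IN');
  INSERT INTO PAL_FC_SIGNATURE VALUES (2, 'MY_SCHEMA', 'PAL_MLR_COEFF_T',   'IN');
  INSERT INTO PAL_FC_SIGNATURE VALUES (3, 'MY_SCHEMA', 'PAL_MLR_CONTROL_T', 'IN');
  INSERT INTO PAL_FC_SIGNATURE VALUES (4, 'MY_SCHEMA', 'PAL_FC_RESULT_T',   'OUT');
  CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'FORECASTWITHLR', 'MY_SCHEMA', 'PAL_MLR_FORECAST_PROC', PAL_FC_SIGNATURE);

  -- Input and output tables; the coefficient table is copied from the model's coefficient output
  CREATE COLUMN TABLE PAL_FC_DATA   ("ID" INTEGER, "CUSTOMERS" DOUBLE, "PROMO" DOUBLE);  -- fill with the rows to be predicted
  CREATE COLUMN TABLE PAL_FC_COEFF  ("VNAME" VARCHAR(50), "COEFFICIENT" DOUBLE);
  CREATE COLUMN TABLE PAL_FC_RESULT ("ID" INTEGER, "PREDICTED" DOUBLE);
  INSERT INTO PAL_FC_COEFF SELECT * FROM PAL_MLR_COEFF;

  -- Call the generated forecast procedure and view the predictions
  CALL MY_SCHEMA.PAL_MLR_FORECAST_PROC(PAL_FC_DATA, PAL_FC_COEFF, PAL_MLR_CONTROL, PAL_FC_RESULT) WITH OVERVIEW;
  SELECT * FROM PAL_FC_RESULT;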

(6) View Results

View the predictions made by the forecast algorithm.

 

Sources:

https://help.sap.com/viewer/2cfbc5cf2bc14f028cfbe2a2bba60a50/2.0.00/en-US/eedc9094daf04419bc25f6ed097ac03b.html

https://help.sap.com/doc/86fb8d26952748debc8d08db756e6c1f/2.0.00/en-us/sap_hana_predictive_analysis_library_pal_en.pdf

Author
Gellert Peter-Siskovits Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Who needs SAP Vora?!

What is SAP Vora good for anyway?

SAP Vora allows you to analyze structured and semi-structured data within an existing Hadoop cluster in a modern interface and to combine these data types with one another as well as with data from SAP HANA.

From a technical standpoint, this is an extension of the Apache Spark execution framework, which has long been established in the world of Hadoop.

This way, SAP Vora gives you a distributed in-memory query engine for both structured and semi-structured data in a Hadoop cluster.

How can I use SAP Vora to my advantage?

There are several options for you to benefit from SAP Vora; it goes without saying that SAP would like to see you use it in the cloud – thus keeping in line with its own cloud strategy:

On-premise:

  • By downloading the Developer Edition from the SAP Development Center (http://developers.sap.com), which is available free of charge for SAP partners
  • By downloading the Production Edition from the SAP Support Portal (https://support.sap.com)

Cloud-based:

  • By using the Developer Edition, which is free of charge for SAP partners, through Amazon Web Services (AWS) or SAP Cloud Appliance Library (SAP CAL)
  • By using the paid Production Edition through AWS
  • By using it as a service through SAP Big Data Services
  • By using a bring your own license (BYOL) model (SAP Cloud Platform, AWS)

The SAP Vora Developer Edition in AWS provides complete functionality and with just a few clicks, the environment can be custom-configured according to pre-established parameters.

The underlying Hadoop cluster is a Hortonworks distribution (HDP) with the corresponding tools/software solutions such as Spark, Ambari, Zeppelin, etc. and has a maximum of 4 nodes.

The variant offered by SAP through the SAP Cloud Appliance Library (CAL) is delivered as a pre-configured appliance with functionality that is very similar to AWS. It is best suited for anyone already using SAP CAL.

The Production Edition differs only in terms of upward scalability of the cluster and, of course, in terms of cost.

How does SAP Vora work?

Once you have made your decision regarding a deployment model (on-premises or cloud) you then go on to – depending on your choice – installation and configuration.

The installation process involves three steps:

  1. Determining the number of nodes required for Vora in the Hadoop cluster depending on your
    • availability requirements
    • sizing requirements (CPU, disk vs. RAM, control nodes vs. compute nodes, different sizing for each specific Spark engine, etc.)
    • expected data growth
  2. Deploying SAP Vora Manager Services on the required cluster nodes
  3. Configuring and starting the SAP Vora Services on the cluster nodes using the SAP Vora Manager UI

Once you have successfully completed the installation and configuration in a Hadoop cluster (the HDP, Cloudera and MapR distributions are supported), you can start using SAP Vora. In addition to the above-mentioned SAP Vora Manager for the more administrative side of things, end users are provided with a central GUI by means of a set of tools known as the SAP Vora Tools.

The following tools are available in the GUI:

  • Data Browser: view the contents of tables and views
  • SQL Editor: create and execute SQL statements
  • Modeler: create and modify tables and views
  • User Management: manage user accounts and access to the SAP Vora Tools

The end users can leverage the SAP Vora Tools to analyze data that differs in structure and data type found in the Hadoop cluster. In the next section, we will take a closer look at the analytics options.

What can I analyze with SAP Vora?

Vora enables you to interpret JSON documents, conduct time series and graph analytics, and use SQL to also analyze data that is conventionally structured in a relational way.

In doing so, Vora uses a specific Spark engine with optimized processing for each of the different types of analytics.

The “Doc Store” – NoSQL analytics of semi-structured JSON documents

Starting with version 1.3, SAP introduced the “Doc Store”. With it, you can store JSON documents in schema-free tables, which in turn allows you to scale out and to flexibly handle document fields (delete, add).

Once you have created a document store (= collection) based on JSON documents existing in the cluster in Vora, it serves as the basis for the creation of a view that can also be expanded with the familiar JSON expressions. This view is then stored in Vora’s own Doc Store and can be processed both in the table and the JSON format.

Time series analytics – leveraging efficient compression and data distribution

The Spark engine available for time series analytics exhibits its full strength when the underlying data is spread across as many cluster nodes as possible and can be efficiently compressed.

Based on the time series data stored in the cluster, a “time series table” is created within Vora, for which a unique column with time ranges (= range type) must exist. Along with various other options, you can also enter equidistance properties and additional compression parameters.

In order to be able to analyze time series data, you also need to create a view that can be enhanced with specific table functions (e.g. cross/auto correlation).

With this, you can then conduct the corresponding analyses such as regression, binning, sampling, similarity, etc.

Real-time graph analytics – analyzing very large graphs

Vora comes with its own in-memory graph database that was specifically developed for the real-time analysis of large graphs. Accordingly, the modelling of the graphs in the graphical metadata viewer is supported by a path expression editor.

With an in-memory engine available, it is capable of highly complex graph-related queries and you can count on the visualization of the graphs to be state of the art.

The graph analytics engine is particularly suited for supply chain management applications or to visualize elaborate organizational and project hierarchies or business networks.

Relational engine – using SQL to analyze relations

Last but not least, Vora also lets you use SQL to represent and query structured data in the cluster in the form of relational, column-based tables. This approach also uses in-memory data compression.

For relational data that does not need to be kept in memory, Vora also comes with a disk engine. It stores the data in a file on the local node on which the engine runs. As with the dynamic tiering option in HANA, you can also easily join the column-based relational disk tables with the in-memory tables.

Also worth mentioning

  • Once you have completed the registration in the registry, Vora also allows you to use SAP HANA tables along with any views and tables created in Vora. From Vora, you can also write data to SAP HANA.
  • The creation of both level and parent-child hierarchies and the use of joint fact tables is supported.
  • You can use currency translation (standard or ERP) in tables and views.
  • There are specific partitioning functions and types for each engine, that is, for the specific data structures created in Vora that allow you to optimally distribute or partition them in the cluster (hash, block, range).

What data sources and formats are currently supported?

With the SAP Vora Tools, you can process the following files in Vora:

  • .CSV
  • .ORC
  • .PARQUET
  • .JSON
  • .JSG

In addition to the standard data type HDFS and the ORC and PARQUET types (option (format “orc” / format “parquet”)), it is also possible to load the following additional types in the “CREATE TABLE” statement in Vora:

  • Amazon S3 (option (storagebackend “s3”))
  • Swift Object (option (storagebackend “swift”))

Conclusion and outlook

It is hardly surprising that SAP Vora’s main strength lies in the combination with SAP HANA, as this enables you to analyze relational data from HANA along with semi-structured data from your Hadoop cluster. What’s more, Vora gives you an array of analysis options (graphs, documents, etc.) combined into a single tool that would otherwise require you to rely on multiple tools (or different databases) from different Hadoop distributors or third-party vendors.

SAP is planning to support the transaction concept (ACID) in Vora to improve on its consistent data storage capabilities. For 2018, initial support for insert/update/delete statements is already in the works. SAP furthermore plans to add support for SQL Pass-through from SAP HANA to SAP Vora.

All friends of SAP BW will also be glad to hear that SAP plans to support DSOs beyond 2018.

If you’re an SAP partner, you can easily get started with the free Developer Edition to familiarize yourself with the subject – it’s the perfect place to learn more about its configuration and use cases.

Or you can just ask us – we’ll be happy to help!

Author
Andreas Keller Associate Partner
Phone: +49 (0) 7031 714 660 0
Email: info@inspiricon.de

How to Get More Insights With Text Analysis with SAP HANA

We all know that knowledge is power. Every day we accumulate information, share it, or post it on social networks. Everything around us is pure information. What can we do with all of it? Simply put, we can store it as data.

Data can be classified in two categories: unstructured and structured data.


Figure 1. Difference between unstructured and structured Data

What is unstructured Data?

The phrase “unstructured data” usually refers to information that does not reside in a traditional row-column database. For example Facebook, Twitter or Emails.

The benefit of structured data is that it can be identified and processed by a machine. Once stored, the data is much easier to search, combine and filter for one’s own purposes.

What is structured Data?

Data that resides in a fixed field within a record or file is called structured data. For example a database.

So with Text Analysis we take unstructured data, transform it to structured data and analyze it.

What is Text Analysis?

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for exploratory data analysis, research, or investigation.

Text Analysis powered by SAP HANA applies full linguistic and statistical techniques to extract and classify unstructured text into entities and domains.

With the SAP HANA platform, you can gain real insights from your unstructured textual data. The platform provides search, text analysis, and text mining functionality for unstructured text sources.

There are several techniques used in Text Analysis in order to find a specific string or to perform linguistic searches.

  • Full Text Indexing:
    A full-text index is a special type of token-based functional index that is built and maintained by the database’s full-text engine.
  • Full Text Search:
    Full text search is designed to perform linguistic (language-based) searches against text and documents stored in your database.
  • Fuzzy Search:
    Fuzzy search is the technique of finding strings that match a pattern approximately. It is a type of search that will find matches even when users misspell words or enter only partial words for the search (see the short example after this list).
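As a quick illustration (assuming a column-store table REVIEWS with a text column REVIEW_TEXT; the names are our own):

  -- Finds rows containing "product" even though the search term is misspelled;
  -- 0.8 is the minimum similarity score (1.0 would require an exact match)
  SELECT * FROM REVIEWS WHERE CONTAINS(REVIEW_TEXT, 'prodct', FUZZY(0.8));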

In this article we will talk about Full Text Indexing.

Full Text Indexing: Table $TA

Creating a full-text index with parameter TEXT ANALYSIS ON triggers the creation of a table named $TA_<indexname> containing linguistic or semantic analysis results.

The set of columns in the table $TA is always the same regardless of the text analysis configuration used with the full-text index:


Figure 2. Example for a TA Table

  • Key columns from source table (ID):
    The first columns in table $TA are a direct copy of the key columns from the source table.
  • TA_RULE:
    The rule that created the output. Generally, this is either LXP for linguistic analysis or Entity Extraction for entity and fact analysis.
  • TA_COUNTER:
    A unique sequential ID for each token extracted from the document.
  • TA_TOKEN:
    The term, entity, or fact extracted from the document.
  • TA_LANGUAGE:
    The language of the document.
  • TA_TYPE:
    The type of the token. In linguistic analysis, this is the part of speech. In semantic analysis, it is the entity type or fact. (‘noun’, ‘StrongPositiveSentiment’, ‘Person’)
  • TA_NORMALIZED:
    The normalized version of the token. Inflection is maintained, but capitalization and diacritics are removed. This column is null for entity extraction.
  • TA_STEM:
    The stemmed version of the token. This field is fully uninflected and normalized. If the stem is identical to the token, this column is null. It is also null for entity extraction.
  • TA_PARAGRAPH:
    The paragraph in the document that contains the token.
  • TA_SENTENCE:
    The sentence in the document that contains the token.
  • TA_CREATED_AT:
    Creation time of the record.
  • TA_OFFSET:
    Character offset from the beginning of the document.
  • TA_PARENT:
    The TA_COUNTER value of the parent of this token.

Built-in Configurations

SAP HANA has seven built-in configurations that determine the behavior and the output of the text analysis:

  • LINGANALYSIS_BASIC:
    The most basic linguistic analysis: it tokenizes the text, but normalization and stemming are not applied, so the TA_NORMALIZED and TA_STEM columns remain empty.
  • LINGANALYSIS_STEMS:
    Normalizes and stems the tokens so the TA_NORMALIZED and TA_STEM fields will be populated.
  • LINGANALYSIS_FULL:
    Uses full linguistic analysis, so all the columns in the $TA table will be populated.
  • EXTRACTION_CORE:
    It extracts entities from the text. For example: people, places, URLs.
  • EXTRACTION_CORE_VOICEOFCUSTOMER:
    It extracts entities and facts to identify positive and negative emotions associated with the tokens.
  • EXTRACTION_CORE_ENTERPRISE:
    It extracts enterprise-related data, for example: mergers, acquisitions, organizational changes, and product releases.
  • EXTRACTION_CORE_PUBLIC_SECTOR:
    It extracts security-related data about public persons, events and organizations.

Creating a table and index

We will make a practical example for the EXTRACTION_CORE_VOICEOFCUSTOMER configuration, which identifies sentiments (positive or negative feelings):

We have to create a table, insert values and create an index.

Open a SQL Console and write the following command:


Figure 3. Create Table


Figure 4. Create Index and insert values
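The statements behind Figures 3 and 4 could look roughly like this (table, column and index names as well as the inserted sentences are our own illustrative choices):

  -- Create a column table with a primary key (required for a full-text index with text analysis)
  CREATE COLUMN TABLE FEEDBACK ("ID" INTEGER PRIMARY KEY, "COMMENT_TEXT" NVARCHAR(500));

  INSERT INTO FEEDBACK VALUES (1, 'Anna likes the new coffee machine.');
  INSERT INTO FEEDBACK VALUES (2, 'Tom enjoys working with SAP HANA.');

  -- Full-text index with the Voice-of-Customer configuration; this triggers the creation of table $TA_FEEDBACK_IDX
  CREATE FULLTEXT INDEX FEEDBACK_IDX ON FEEDBACK ("COMMENT_TEXT")
    CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
    TEXT ANALYSIS ON;

  -- Inspect the text analysis results
  SELECT * FROM "$TA_FEEDBACK_IDX";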

The created table will look like this:


Figure 5. Table

Finally, we get the text analysis results. As you can see, “likes” and “enjoys” appear as a “weak positive sentiment”.


Figure 6. Text Analysis

This article was inspired from this blog: https://blogs.sap.com/2017/05/21/sap-hana-ta-text-analysis/

Author
Adelina Ramona Popa and
Lucian Tomeac
Associates SAP BI
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de