
Keep your code clean with ABAP Unit Test Framework in Eclipse – PRACTICAL EXAMPLE

Today’s article is the second in a two-part series which focuses on leveraging the ABAP Unit framework in Eclipse. In this second part we demonstrate the exact procedure with a practical example. The first, theoretical part offers you the necessary know-how to create unit test cases for your code.

Practical example

As an example, we will take a global calculator class that implements the four basic mathematical operations: addition, subtraction, multiplication and division. This will be the codebase to which we apply the unit testing technique inside the Eclipse platform.

Eclipse does not have a wizard for generating automated unit tests, but it does provide some templates you can make use of. Since you may be new to this subject, it is far better to start by writing tests by hand. So do not wait for the right moment any longer: open your Eclipse IDE, switch to the ABAP perspective and create an ABAP class with the following source code:


Picture 1: Generating an ABAP class


Picture 2: Generating an ABAP class (extension)
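
The exact source code is shown in the screenshots; as an orientation, a minimal sketch of such a calculator class could look like this (class and method names are assumptions and may differ from the screenshots):

CLASS zcl_calculator DEFINITION PUBLIC FINAL CREATE PUBLIC.
  PUBLIC SECTION.
    METHODS:
      add      IMPORTING iv_a TYPE i iv_b TYPE i RETURNING VALUE(rv_result) TYPE i,
      subtract IMPORTING iv_a TYPE i iv_b TYPE i RETURNING VALUE(rv_result) TYPE i,
      multiply IMPORTING iv_a TYPE i iv_b TYPE i RETURNING VALUE(rv_result) TYPE i,
      " division by zero raises the runtime exception CX_SY_ZERODIVIDE
      divide   IMPORTING iv_a TYPE i iv_b TYPE i RETURNING VALUE(rv_result) TYPE i
               RAISING   cx_sy_zerodivide.
ENDCLASS.

CLASS zcl_calculator IMPLEMENTATION.
  METHOD add.
    rv_result = iv_a + iv_b.
  ENDMETHOD.
  METHOD subtract.
    rv_result = iv_a - iv_b.
  ENDMETHOD.
  METHOD multiply.
    rv_result = iv_a * iv_b.
  ENDMETHOD.
  METHOD divide.
    rv_result = iv_a / iv_b.
  ENDMETHOD.
ENDCLASS.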

Intuitively, the smallest component of code that can be tested here is a method, and this will be our ”unit”. For instance, we will take the ”divide” method as the method under test and cover different cases and exceptions for it.

As presented in the theoretical part, the first step in defining the test cases is to create a test class. For that purpose, choose the ”Test Classes (non existent)” tab from the bottom of the editor view.


Picture 3: Setting up Test Classes

In the new window that appears on your screen, press the ”Create Test Classes…” button and an editor page will open in change mode. You are now ready to start coding your test class.


Picture 4: Programming a Test Class

When thinking about which cases to cover, you should consider different categories of input values that need to be tested, such as positive cases, negative cases, boundary values or combinations of input values. Each case will be treated in a separate test method. The best practice is to start with the simplest scenarios and then continue adding more complex sets of values to be tested.
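
For instance, a first, simple test case for the ”divide” method might be sketched as follows (the test class and method names are assumptions and may differ from the code in the screenshots):

CLASS ltc_calculator DEFINITION FOR TESTING
      RISK LEVEL HARMLESS
      DURATION SHORT.
  PRIVATE SECTION.
    DATA mo_cut TYPE REF TO zcl_calculator.   " the code under test
    METHODS divide_two_positives FOR TESTING.
ENDCLASS.

CLASS ltc_calculator IMPLEMENTATION.
  METHOD divide_two_positives.
    " given: an instance of the calculator
    mo_cut = NEW zcl_calculator( ).
    " when: dividing two positive numbers
    DATA(lv_result) = mo_cut->divide( iv_a = 10 iv_b = 2 ).
    " then: the expected quotient is returned
    cl_abap_unit_assert=>assert_equals( act = lv_result
                                        exp = 5
                                        msg = '10 / 2 should be 5' ).
  ENDMETHOD.
ENDCLASS.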

For executing the test methods we can use the shortcut keys CTRL + SHIFT + F10 or we can press the drop-down button ”Run As” and choose the ”ABAP Unit Test” option from the list. The status and the duration of each test case will be shown in the ”ABAP Unit” view.

In case of any failures, the test method that fails will have a different icon so it can be easily spotted, and, more importantly, the error messages will be added to the ”Failure trace” section. If we double-click the path under the ”Stack” node, it will point us directly to the block of code that actually went wrong. Unit tests enable safe changes, because failures like this are detected immediately.


Picture 5: Error message

After the necessary adjustments to the code, we just have to rerun the tests and check that their statuses have changed and that no failures remain.


Picture 6: Status check of already run tests

Let us add a few more test cases and also handle an exception of the ”divide” function, the well-known division by zero. Subsequently, our test class should look as follows:


Picture 7: Exception of the ”divide” function


Picture 8: Exception of the ”divide” function (extension)


Picture 9: Division by zero
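
The division-by-zero case could be covered by a test method like the following sketch, assuming the ”divide” method raises CX_SY_ZERODIVIDE (the exact exception class depends on the implementation shown in the screenshots) and that the method is declared with FOR TESTING in the test class definition:

METHOD divide_by_zero_raises_error.
  " given: an instance of the calculator
  mo_cut = NEW zcl_calculator( ).
  TRY.
      " when: dividing by zero
      mo_cut->divide( iv_a = 10 iv_b = 0 ).
      " then: if no exception was raised, the test must fail
      cl_abap_unit_assert=>fail( msg = 'Division by zero did not raise an exception' ).
    CATCH cx_sy_zerodivide.
      " expected behaviour - the test passes
  ENDTRY.
ENDMETHOD.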

Please notice that tests are executed in an undefined sequence and not in the order in which they have been defined in the test class:


Picture 10: Tests executed in an undefined sequence

We invite you to create even more test cases, both for the purpose of learning and for a better understanding of the ABAP Unit framework in Eclipse. However, we have not come to an end yet. Our test code is functional, but is it clean and optimized?

In other words, once all tests are green, we are ready to apply some refactoring to the code. Then again, what is refactoring? Refactoring is the process of changing and improving the internal structure of existing code by reducing its complexity without altering its external behaviour. In this way, the next tests will be quicker to implement thanks to code reusability and legibility.

Furthermore, we will leverage the special methods provided by the ABAP Unit framework and use the ”SETUP” method to implement the ”given” part defined in our test methods. This method is executed before each test case, and its purpose here is to create the reference to the code under test. Since writing tests should be efficient, we should also use ”helper methods” to keep the test code readable, clean and small.

All things considered, after refactoring, the test class will have the following design:


Picture 11: Test class design


Picture 12: Test class design (extension)
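
Reduced to its structure, the refactored test class could look roughly like this sketch, with the ”given” part moved into SETUP and the ”when/then” part wrapped in a reusable helper method (the helper and test method names are assumptions):

CLASS ltc_calculator DEFINITION FOR TESTING
      RISK LEVEL HARMLESS
      DURATION SHORT.
  PRIVATE SECTION.
    DATA mo_cut TYPE REF TO zcl_calculator.
    METHODS:
      setup,                                 " 'given': runs before every test method
      " helper method reused by all division test cases
      assert_division IMPORTING iv_a TYPE i iv_b TYPE i iv_expected TYPE i,
      divide_two_positives  FOR TESTING,
      divide_negative_value FOR TESTING.
ENDCLASS.

CLASS ltc_calculator IMPLEMENTATION.
  METHOD setup.
    " a fresh instance of the code under test for each test
    mo_cut = NEW zcl_calculator( ).
  ENDMETHOD.
  METHOD assert_division.
    " 'when' and 'then' bundled in one reusable place
    cl_abap_unit_assert=>assert_equals(
      act = mo_cut->divide( iv_a = iv_a iv_b = iv_b )
      exp = iv_expected ).
  ENDMETHOD.
  METHOD divide_two_positives.
    assert_division( iv_a = 10 iv_b = 2 iv_expected = 5 ).
  ENDMETHOD.
  METHOD divide_negative_value.
    assert_division( iv_a = -9 iv_b = 3 iv_expected = -3 ).
  ENDMETHOD.
ENDCLASS.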

Our test class now has a better code structure, efficiency, clarity and quality. Finally, we can get another sip of coffee.

Do you think it is unrealistic to have your production code and tests ready at the same time? Not anymore.

Test Driven Development – TDD

When talking about unit testing, it is essential to keep the Test-Driven Development (TDD) technique in mind. Also known as the ”Test First” procedure, it implies starting with small requirements and creating the unit tests first. To be more specific, we should outline the behaviour of the application and what it should do from a user perspective, and write down the small features in a text document or even as comments in the test classes, to be used later in the development. These requirements will subsequently be converted into specific test cases.

Using the TDD technique, you will always have an answer to the question ”What code should I write next?”. Each new feature is added by first creating a test for that single feature, not for multiple ones at a time. Obviously, running this test will show a failure, due to the absence or deficiency of the code under test. At this point, we should write just enough production code to pass the failing unit test. If the test does not pass, the production code must be adjusted until it does.

Since this technique is a development cycle, we should repeat the above steps for all the required functionalities in order to achieve high code coverage. The next test will drive the next step; in other words, test development drives feature development. At the end of the cycle, when all tests are green, the production code and the test code are ready for refactoring. This phase is fundamental to increase readability, reduce duplicates and keep the code clean. For more information about unit testing and Test-Driven Development, we recommend enrolling in the openSAP course ”Writing Testable Code for ABAP”, which is also a major source for today’s article.

Conclusion

Unit testing is usually automated. As a matter of fact, Eclipse provides a unit test framework which is embedded in the ABAP Development Tools plug-in and facilitates the process of defining and executing automated unit tests. Unit tests are simple, stable and provide high code coverage when set up properly. Tests can be built during the development process and they can even be transported together with the source code. Unit tests are self-contained and their execution takes place in an undefined order. Keep in mind when writing tests that they should not go beyond the edges of their class and interact with other components. They should only test the behaviour of that class or of the small component of code under test.

As you may already know, many typical software projects consist of multiple modules, often coded by different developers. The individual modules are first tested in isolation, and this is where ABAP Unit comes to help as the first level of testing our code. After all these units are developed and tested, we can cluster the ”unit tested” components and test their behaviour using the integration testing technique. But that is another story..

Looking forward to more knowledge each day? We would be very happy to see you on our Inspiricon blog again. Until then, feel free to explore the ABAP Unit framework in Eclipse. And do not forget.. ”Truth can only be found in one place: the code”. Happy coding! 😊

 

Author
Andra Atanasoaie Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Keep your code clean with ABAP Unit Test framework in Eclipse – THEORETICAL INTRODUCTION

Have you ever thought about how important testing is in our lives? For instance, imagine we build a car and fit together all the pieces without even testing a single part of it. Then, when the construction is finished, we enroll in a car race, expecting it to function as designed. Do you think the engine will even start? I think we have reason to doubt it.

Therefore, what can be more satisfying than testing our products and fixing the errors to ensure the quality delivered to our customers? In the field of software development this activity is known as error analysis.

Today’s article is the first in a two-part series which focuses on leveraging the ABAP Unit framework in Eclipse. This first, theoretical part offers you the necessary know-how for building unit test cases for your source code. In the second part we will demonstrate the exact procedure based on a practical example.

But first…

Why test an application?

A project lifecycle involves many procedures and stages that have to be followed in a methodical way. As software developers, we should be aware that a product always needs to be tested during the development and integration phases of an application. Our goal is for it to last for a long period of time and to perform efficiently.

Testing is essential and offers several advantages:

  • fewer defects, leading to improved overall capability and accuracy
  • lower costs for future developments
  • improved quality and assured application performance
  • increased customer satisfaction and confidence
  • a stronger position against the competition in your business field

Computer systems are often complex and hard to understand. As a result, various methodologies have been implemented to evaluate the behaviour of a product. When it comes to testing our software code, there are several testing practices used along the lifecycle of an application, as presented below:

Figure 1: Testing practices used along the lifecycle of an application

1. Unit Testing

It is the first level of software testing where individual units of code are tested to see if they work as expected. Consequently, each piece of code is subjected to a series of test cases. It can be performed manually, but usually tests are automated.

2. Integration/Component Testing

It is a technique focused on testing the integration of different modules that were previously unit tested, by grouping them in multiple ways and checking their behaviour and data communication. Tests are defined in an integration test plan and applied to these aggregated components.

3. System/UI Testing

It is a testing technique that focuses on the UI (User Interface) design structure. It can be executed manually or by using automated tools. It is a practice used to confirm the properties and acknowledge the state of the UI elements. This can be effectively achieved by creating a diversity and combination of test cases.

4. Exploratory/Acceptance Testing

It is an unscripted Quality Assurance (QA) technique used to investigate and discover what works and what does not work in an application. Testers have the possibility to explore, learn and check the application in real time. Test cases are not defined beforehand; usually testers decide on the fly what to test next and how much time to spend on a particular functionality.

Except for the unit testing technique, the inside details of an application are seen as a ”black box” by the tester. Nonetheless, a basic understanding of the system design is a prerequisite.

Automated Testing versus Manual Testing

First and foremost, what is automated testing and how can we define manual testing?

Automated testing is a process of testing the software code by running a set of scripted tests. These tests are executed using an automated tool that reports the actual values and compares the results with the expected outcome. The aim of automated testing is to minimize the effort and time consumed as much as possible. This technique is appropriate and convenient for large projects, particularly if they are too complex to rely only on manual testing.

Manual testing, on the other hand, is a technique of finding errors and defects in a software application by having a human perform a suite of test cases. It does not require knowledge of an intermediary automated tool, but it may be very time-consuming, because, to ensure the completeness of testing, testers need to create a suitable schema that leads to a detailed set of important test cases to be covered.

Ideally, you would manually test the code of a new application in the first place, or after any major change to it, and then create new automated tests for a better coverage and understanding of the implemented functionalities.

Having said that, chances are that we will need to combine automated testing with manual testing in order to provide a complete and fully covered test plan. The reality is, there is no ”better” or ”worse” procedure; they are just different.

What is Unit Testing and when should we use it?

Let us start by answering the second question. The suitable response would be: every time we want to test a small unit and its behaviour. You may be wondering: what is a unit, then?

A ”unit” is defined as the smallest testable component of an application, such as a class or a method, that can be isolated from the rest of the code and controlled to investigate if it fits for use.

As opposed to the other testing techniques, a solid understanding of the system architecture is needed when using ABAP unit testing. Therefore, unit testing is also referred to as ”white box” or ”gray box” testing and, consequently, unit tests should be created and defined by a developer.

Most programming languages have designated unit testing frameworks and so does ABAP. Here, all the background processes required for testing are built into an IDE (Integrated Development Environment), a fact that enables us to execute the unit tests with every compile of the test program.

The most suitable tool in SAP for unit testing is the ABAP Unit framework in the Eclipse IDE. This framework is part of the ABAP Development Tools (ADT), therefore the only thing you need to do is install the add-on into your Eclipse platform. If you do not have it installed yet, you can find more information along with a step-by-step tutorial in this blog article.

Unit tests do not interact with the environment or with external systems of the codebase. They always run in a simulated and isolated environment, in most cases as a separate program in which unit tests are defined as test methods in local test classes. Thus, when the application changes in the real, productive environment, the unit tests will not be informed automatically, due to their isolation from the system.

In order to make your tests effective, you need to structure the code in a way that makes it easy to unit test. Nevertheless, it requires a bit of practice to get good at it and to understand what to test in isolation.

Unit testing should be applied where tests have a clear meaning. The Business Layer code is commonly suitable for running tests and, hence, it represents the main area where unit testing is focused.

Our goal is to write efficient automated tests alongside with testable code. For instance, when running unit tests we have the flexibility to change the source code and to adjust it to fix the errors and to make it testable. Above all, the most obvious benefit is knowing down the road that when a change is made no other individual units of code are affected.


Figure 2: Writing efficient automated tests

Advantages and benefits

Naturally, as the first level of performing test cases in an application, unit testing provides some notable advantages. When set up properly, unit tests are:

  • Fast – typically, thousands of tests can be run per second
  • Simple – unit tests focus on small parts of the application
  • Timely – tests are written alongside the production code

But also, Unit Tests are very effective because they provide:

  • Major code coverage – in many cases up to 80-90% code coverage
  • Easy error analysis – unit tests point exactly to the place where the code goes wrong
  • Stability – unit tests are repeatable and behave consistently, which makes them stable.

Creating a Test Class

In the SAP field, unit tests have been introduced as part of the Object-Oriented design. On that consideration, we can intuitively notice that unit tests work perfectly with classes. In a scenario where we are not able to test all of the report flow, we should at least try to unit test small parts of it.

Moreover, we should only test the public interface of an application and not the private parts. Normally, we cannot directly test processing blocks of a report, such as INITIALIZATION or START-OF-SELECTION, but we can implement the same logic by converting these blocks into object methods and test those afterwards. It is worth noticing here that modularization is very important for better programming.
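
For example, the logic that would otherwise live in START-OF-SELECTION can be moved into a method of a local class, which can then be covered by unit tests. A minimal sketch (report, class and method names are assumptions):

REPORT z_sales_demo.

CLASS lcl_report DEFINITION.
  PUBLIC SECTION.
    TYPES ty_amounts TYPE STANDARD TABLE OF i WITH EMPTY KEY.
    " the former START-OF-SELECTION logic, now testable as a method
    METHODS get_total
      IMPORTING it_amounts      TYPE ty_amounts
      RETURNING VALUE(rv_total) TYPE i.
ENDCLASS.

CLASS lcl_report IMPLEMENTATION.
  METHOD get_total.
    LOOP AT it_amounts INTO DATA(lv_amount).
      rv_total = rv_total + lv_amount.
    ENDLOOP.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  " the event block only delegates to the unit-testable method
  DATA(lv_total) = NEW lcl_report( )->get_total( VALUE #( ( 10 ) ( 20 ) ( 30 ) ) ).
  WRITE: / lv_total.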

The idea is that we define and implement test classes in the same way we define a regular ABAP Objects class. However, what makes a difference is the  ”FOR TESTING” statement we have to add to the test class definition. Keep in mind that this addition basically separates the application into two different parts: the test code and the production code.

Even though there is no official standard naming convention yet, SAP suggests adding a prefix to test class names, as in ”LTC_<class_name>”, in order to emphasize that they are local test classes and to distinguish their focus. Moreover, we also need to specify two class attributes for a test class, RISK LEVEL and DURATION, which are used by the ABAP Unit runner to interpret the properties of that class.

The RISK LEVEL attribute refers to the side effects that could impact the system:

  • HARMLESS – no existing process will be affected by the unit test
  • DANGEROUS – the unit test could make changes to a database or persistent data
  • CRITICAL – both customization and persistent data could be modified

Keep in mind that every unit test should be defined with the aim of being ”harmless”, hence not modifying the environment in any way.

The DURATION attribute refers to the expected execution time of the unit tests and it can be:

  • SHORT – less than a minute
  • MEDIUM – less than five minutes
  • LONG – longer than five minutes, up to roughly one hour

These intervals can also be customized. Yet, what we need to take into consideration is that if the execution time exceeds the specified parameter, the ABAP Unit runner will report an error.

In the end, to give a short illustration, a test class definition should have the following format:

CLASS ltc_my_test_class DEFINITION FOR TESTING
      RISK LEVEL Critical|Dangerous|Harmless
      DURATION Short|Medium|Long.

  […]

ENDCLASS.

It is worth mentioning that the part of the application code that is unit tested is usually called the ”CUT” (Code Under Test).

If you are eager to find out more about test classes, we suggest reading the SAP Press e-book ”ABAP Unit: Writing and Executing Unit Tests”, written by James Wood and Joseph Rupert.

That being said, we have now arrived at the point of…

Creating a Test Method

A test method can be implemented exclusively in a test class. As a matter of fact, during a test run, test methods are called as individual tests, which means we will have one instance of the test class per test method.

Notably, the method that is going to be tested – the actual unit – is usually referred to as the ”method under test” and is part of the production code.

Test methods have no parameters but, similar to test classes, they have a ”FOR TESTING” addition at the end of the DEFINITION part. Another key point to remember is that a test method is declared in the PRIVATE SECTION of the class definition to underline that it can only be used in the context of its test class and not in other derived classes, as follows:

CLASS ltc_my_test_class DEFINITION FOR TESTING
      […].

  PRIVATE SECTION.
    METHODS:
      test_method1 FOR TESTING,
      test_method2 FOR TESTING.

ENDCLASS.

The only situation where the test methods can be declared in the PROTECTED SECTION is when these methods are part of a superclass and are inherited by another test class.

The IMPLEMENTATION part of a test method should always tell us a story about what is going to be tested, and this should also be reflected in the test method name. It usually follows the ”given – when – then” pattern that corresponds to the initialization, execution and verification phases of the test case. We should think of a test method as a logical story in the following way:

given -> a particular environment

This is an initialization part where our global class is instantiated.

when -> we execute this CUT (code under test)

Here we normally call the method that is going to be tested together with the specified input values.

then -> this outcome is expected

A test method has one or a few inputs and usually a single output.

Significantly, in the ”then” part of the test method, the outcome is validated through a comparison between the actual value that the method under test returns and the expected value. If the results do not match, the ABAP Unit framework will report an error. This comparison is made using one of the utility methods provided by the ABAP class ”CL_ABAP_UNIT_ASSERT”, called ”ASSERT_EQUALS”.
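
Putting the three parts together, a test method following this pattern could look like the sketch below (the calculator class and its ”add” method are assumed examples, taken from the practical part of this series):

METHOD add_two_numbers.
  " given: a particular environment - an instance of the class under test
  DATA(lo_cut) = NEW zcl_calculator( ).

  " when: we execute the code under test with specific input values
  DATA(lv_result) = lo_cut->add( iv_a = 2 iv_b = 3 ).

  " then: the actual outcome is compared with the expected value
  cl_abap_unit_assert=>assert_equals( act = lv_result
                                      exp = 5
                                      msg = '2 + 3 should be 5' ).
ENDMETHOD.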

The most often used utility methods alongside with their SAP documentation are:

– ASSERT_EQUALS – ensures equality of two data objects

– ASSERT_BOUND / ASSERT_NOT_BOUND – ensures the validity / invalidity of the reference of a reference variable

– ASSERT_INITIAL / ASSERT_NOT_INITIAL – ensures that data object value is initial / is not initial

– ASSERT_TRUE / ASSERT_FALSE – ensures that a boolean equals ABAP_TRUE / ABAP_FALSE

– ASSERT_SUBRC – ensures a specific value of the return code

– FAIL – report unconditional assertion

– ABORT – abort test execution due to missing context.

They are also known as ”assertion methods” and their ultimate purpose is to pass messages to the ABAP Unit runner.

Naturally, it is not mandatory for a test method to follow the above pattern explicitly – it is just a style of writing tests that makes them easier to understand. What is important is that the logic of these three parts is included.

It is also important to know that there are several special private methods used in test classes which are provided by the ABAP Unit framework. They implement a unique test behaviour and include the test objects and the connections needed for a proper execution. These are:

  • SETUP( ) – instance method that is executed before each individual test or before each execution of a test method

The ”given” part of the tests can be implemented in the SETUP method, only if the test methods have the same code under test.

  • TEARDOWN( ) – instance method that is executed after each individual test or after each execution of a test method
  • CLASS_SETUP( ) – class method which is executed once before all tests of the class.
  • CLASS_TEARDOWN( ) – class method which is executed once after all tests of the class.

They are also called ”fixture methods” and have predefined names so that ABAP Unit can recognize them at runtime. These methods are optional and should be used only if we need them.
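
Declared in a test class, the fixture methods simply use these predefined names, for example (a structural sketch only; the test method name is an assumption):

CLASS ltc_my_test_class DEFINITION FOR TESTING
      RISK LEVEL HARMLESS
      DURATION SHORT.
  PRIVATE SECTION.
    CLASS-METHODS:
      class_setup,      " runs once before all tests of the class
      class_teardown.   " runs once after all tests of the class
    METHODS:
      setup,            " runs before each individual test method
      teardown,         " runs after each individual test method
      test_method1 FOR TESTING.
ENDCLASS.

CLASS ltc_my_test_class IMPLEMENTATION.
  METHOD class_setup.    ENDMETHOD.
  METHOD class_teardown. ENDMETHOD.
  METHOD setup.          ENDMETHOD.
  METHOD teardown.       ENDMETHOD.
  METHOD test_method1.   ENDMETHOD.
ENDCLASS.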

Several software companies have already adopted the unit testing technique as part of their development process. Although it may be difficult to get used to it or to figure out what cases to cover, in the long term unit tests lead to less maintenance effort and lower costs. Unit tests are typically created by the developer – the reason we call it ”white box” testing – and that makes sense, because the person who wrote the production code is the most qualified to know what can be easily accessed and tested, and how. Additionally, having a lot of tests for the small components of the application will demand much less debugging overall, and that also saves time. And time is very important for a developer, isn’t it?

Now that you have completed the theoretical part, you are ready to practice. Do not miss the second part of this series, which will show you how to build unit test cases for the source code based on a practical example.

Sources of the images: https://open.sap.com

Author
Andra Atanasoaie Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Sales Management with the help of value driver trees

The challenge

Achieving a transparent sales management across different business units continues to be a major challenge for many companies. This blog article offers a possible approach and details the selected procedure for it.

The company presented in this example operates internationally in the field of complex lighting solution projects, its national subsidiaries have great independence and the individual branches and offices are mostly self-sufficient.

Therefore, the idea was to develop a system that, on one hand, would provide more transparency in sales activities, success rates, success drivers and processes. On the other hand, the company operates in a highly competitive market. The business mainly consists of project work in which many participants are involved: architects, lighting planners, property developers, building owners, craftsmen and official institutions. So, in many cases you don’t know “the end customer”. For example, it has not yet been possible to measure customer profitability.

Due to the complexity of the lighting solution projects, it is crucial for the company to push only those projects that will actually be implemented. When the hit rate is high, the sales department works effectively.

Long project durations make traceability even more difficult. Long after the order has been booked, employees of the company and their partners – often from different national subsidiaries or offices – are still busy implementing projects. The invoicing then takes place with a corresponding delay. How should the performance of the sales department be controlled, considering the many different participants and the different times at which the services necessary for success were provided? Everyone works on a project at a certain percentage and in a different phase.

In this case, performance in sales does not only mean measuring profitability based on incoming orders. The subsequent processes and work – i.e. the traces that an employee leaves in the system – should also be taken into account when determining performance. This is because, often, credit notes are granted, or complaints processed during implementation, and even payment defaults may occur.

The solution

In the early phase of the project, a sophisticated technical concept emerged, in which the nature of the presentations, analyses and reports was described in detail. There was a very clear idea of what the system should do later. Particularly important was the ability to capture performance at a glance. How good is Munich compared to Berlin, how successful are teams, what explains the differences in EBIT, why do some products achieve higher margins than others? What was the overhead cost and what is it that participants can learn to be even better on the next project?

In order to answer these questions, it had to be possible to consolidate key figures and compare performance directly at various levels: national company, region, branch, from the team to the individual employee.

The foundation for the technical part is the concept of Value Driver Trees. In a nutshell: with the help of a hierarchical structure, it is possible to analyse very precisely which levers need to be moved in order to be even more successful. A value driver tree can be used to identify dependencies on which adequate measures can be based, for example to better exploit customer potential.

Here are two examples:

Example 1: From Quotation to Order


Each element of the driver tree always contains 4 values: minimum, maximum, average and the value of the selected organizational unit in relation to the unit to which it is compared. The green bar emphasizes that the selected organizational unit has a better performance than the average of the comparison units, the red bar indicates that the organizational unit is worse than the average.

The selection is made for: a specific period, the unit to be considered and the unit to be compared.

From the individual sales employee to the sales office, region, national subsidiary and the entire company, all values can be compared with each other.

Example 2: EBIT Analysis


A special feature of the EBIT analysis is the consideration of gross and net sales. Gross sales are the sales invoiced in SD, while net sales also include the actual incoming payments, bad debt losses, discounts granted subsequently, etc. in the calculation.

The evaluations were presented in the enterprise portal using a Java application developed especially for this purpose, which displays the data of the underlying OLAP query results of the BW server. The data was transferred from the BW server to the enterprise portal using XML, which was generated by a specially developed function module on the BW server and then transferred to the portal.

The procedure

As part of the project procedure, an in-depth analysis of the technical requirements was initially carried out together with the department involved in the process.

The characteristics and key figures in the technical requirements were broken down into a set of more than 100 basic key figures and characteristic structures, which was verified together with the IT staff of the company and compared with existing data structures. Among other things, these basic key figures and characteristic structures formed the basis for the elements of the driver tree structures.

This was followed by the parallel development of the info providers and transformations in the BW backend system as well as the frontend JAVA application for the enterprise portal. The required key figures were largely pre-calculated to minimize OLAP runtimes. At the same time, the corresponding frontend application was designed.

After extensive tests and data verification, the application was handed over to the department and IT. As part of the Post Go Live Support, last errors were eliminated, and various optimizations were carried out.

If you have any questions regarding this topic, please do not hesitate to contact us. We at Inspiricon are looking forward to hearing from you!

Sources of the images: SAP SE

Author
Oskar Glaser Lead Consultant BI Reporting
Phone: +49 (0) 7031 714 660 0
Email: info@inspiricon.de

Advanced Analytics: the new hype in Manufacturing

Manufacturing executives, by taking advantage of advanced analytics, can reduce process flaws, thus saving time and money.

In this article we will go through the main aspects of an Advanced Analytics project. The article addresses executive managers, CDOs, data scientists, BI consultants, developers and anyone interested in data science, analytics and innovation.

Let’s start by defining the Advanced Analytics concept:

As defined by Gartner, ‘Advanced Analytics’ is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI), to discover deeper insights, make predictions, or generate recommendations. Advanced analytic techniques include those such as data/text mining, machine learning, pattern matching, forecasting, visualization, semantic analysis, sentiment analysis, network and cluster analysis, multivariate statistics, graph analysis, simulation, complex event processing, neural networks.

OK, time to draw the connection between general theory and applicability

A deep dive into the historical process data is the right place to start an advanced analytics project. Here, patterns and relationships among process parameters should be identified. This can serve as a platform for optimizing the factors that prove to have the greatest effect on the problematic KPI. Data-wise, global manufacturers are in a very good position at the moment: they have huge amounts of real-time data and the capability to conduct such data science projects.

Starting an Advanced Analytics project can be overwhelming

Most companies encounter unique problems on this topic; however, one of the recurring situations we noticed is that companies with long production cycles (months, maybe even years) in some cases have too little data to be statistically meaningful. One recommended approach in this situation is to consider it from a long-term perspective: executive managers should push to invest incrementally in systems and practices to collect more data on a particularly complex process and then apply data analytics to that process. We observed first-hand that focusing on a particular process can be directly rewarding while also serving as the very first building block of a new, enhanced data strategy.

Let’s try to move away from theory towards practice and focus on a concrete scenario

Let’s take for example a real project we recently worked on. The objective was to discover actionable intelligence related to a specific error encountered in the production line of a major electronic components manufacturer.

As you might have been expecting, this type of project needs to be approached in a very agile manner. A hypothesis that maybe initially was part of the project core can be disproved in a matter of hours. At any moment you can be at square one again.

This aspect has repercussions on several elements like the project team, methodologies or technologies. We recommend you consider the following aspects:

  • The team should be as light and as agile as possible.
  • Ideally the technologies should also be as agile friendly as possible.

Please keep in mind that other factors like the specific scenario, budget, team skills, available infrastructure etc. could limit your options when you decide on the right team or technologies.

In our case, we had knowledge of both SAP and Python-based technologies, which is ideal. From an infrastructure point of view, for this specific project, we could also opt for either one. In the end, the choice was based on the solution’s agility and on the community support. Towards the end of this article I will present the technologies we used.

Methodology

If you want to use a standard process model to define your sprints there are two main options you can go with:

  • You can define your sprints based on CRISP-DM (Cross-Industry Standard Process for Data Mining)

Figure 1: Cross-Industry Standard Process for Data Mining (CRISP-DM)

  • A second standard process model that you can use is the ASUM – DM (Analytics Solutions Unified Method).

Figure 2: Analytic Solution Unified Method (ASUM)

There is no right or wrong option here; this list is not exhaustive, and a custom solution based on a standard methodology can in many cases lead to better results.

Techniques

The main techniques we used for the project are summed up in the following overview defined by McKinsey:

Figure 3: McKinsey’s techniques

Custom techniques

On top of the basic techniques you might have to go the extra mile. An example would be a Simulation vs Correlation Analysis. In our case a Correlation Analysis was looking very promising, but we were missing the data to properly isolate the correlation.

In this case we managed to figure out the function that would output the respective trend line and map it to an existing hypothesis. The hypothesis-based simulation reproduced the trend lines, meaning that the hypothesis was validated.

Let’s take a look at some of the results we achieved

Some of the actionable intelligence we got resulted from putting together the client’s expertise and our data science knowledge. The deliverable emphasized the following features:

  • It isolates the erroneous behavior to only three products (technique used – Data Visualizations)
  • The client managed to optimize the machines’ workload based on error rate performance indicators (technique(s) used: Data Visualizations / Significance Testing)
  • We identified trends in the relation between the packaging parameter value and the error rate (technique used: Correlation Analysis)
  • By doing simulations we validated a hypothesis pointing to the process stage and wafer where the error takes place (Technique used: Simulation vs Correlation Analysis)

Conclusions after delivery of our solution

The project made it possible for the client’s manufacturing professionals to engage in more fact-based discussions, comparing the real impact of different parameters before taking actions with the scope of improving productivity.

Most importantly, it enabled them to dynamically enhance the manufacturing process by setting up experiments for production optimization.

In the end, our data science goals are those of bringing structure to big data, searching compelling patterns, and finally bringing changes that suit the respective business needs (Data → Knowledge → Actionable Intelligence).

As promised we will finally have a look at our technology setup

Core technologies:

  • Python
  • Jupyter Notebook – web based Python environment
  • Anaconda – package and environment manager and Python distribution
  • Pandas – Python library for data manipulation, slicing & dicing
  • NumPy – support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
  • Bokeh – visualization library; its interactivity was a big plus, and the zoom functionality was especially useful. Several other libraries are available that are simpler to use and might fit your needs, but even though Bokeh is a bit more complex, the features it offers are great and very customizable. We highly recommend it.
  • SciPy – a free and open-source library used for scientific computing and technical computing.
  • Scikit-learn – a free software machine learning library. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Secondary technologies:

  • SAP HANA
  • SAP Lumira Discovery
  • Orange – open-source data visualization, machine learning and data mining toolkit

Finally, for those of you who are more tech-savvy, feel free to download the following file, in which we have put together several code snippets and notes to give you a glimpse of what this type of development entails.

Would you like to find out more about our methods and projects or is there anything else we can do for you? Be sure to come back to our blog for regular updates. Also, don’t hesitate to contact us directly, we’ll be happy to hear from you!

 

Sources of the images: McKinsey & Company, IBM, Bosch Software Innovation

Resources: McKinsey, Gartner, Wikipedia

Author
Ovidiu Costea Consultant SAP BI
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

New Line of Business: Supply Chain Management

After four years of dynamic growth, Inspiricon has decided to establish a new line of business: “Supply Chain Management”. This restructuring will help Inspiricon further raise its profile, combine its process skills in the field of supply chain management with its extensive SAP Business Intelligence knowledge, and better align its expertise with market demands.

In its relatively short company history, Inspiricon has already built an impressive track record: from the introduction of a global price tracking tool for the strategic purchasing division of Giesecke & Devrient GmbH and the provision of targeted and systematic project management and coaching services to ensure the timely introduction of a production control center at Kulmbach Töpfer GmbH, to the optimization of the global logistics processes for a world leader in the additive manufacturing sector – over the past few years the list of success stories has only grown longer.

The main focus of the line of business:

Supply chain analytics and performance measurement:

Supply chain analytics creates transparency and visibility along the supply chain, unlocking hidden potential in logistics and supply chain management. Typical objectives are the improvement of delivery performance, responsiveness and adaptability of supply chains, lower order lead times, reduced inventories, increased capacity utilization and/or improved planning accuracy. To achieve this, performance metrics are selected and visualized.

Performance metrics help to resolve conflicting objectives and identify deviations, opportunities and risks; they provide insights into inefficiencies and allow companies to measure the performance of their processes, organizational units or employees. They provide a foundation for the implementation of required optimization measures.

To maintain the validity of performance metrics, their number should always be limited to a reasonable amount. Additionally, companies need to be able to identify interrelationships and assess root causes. In a well-tuned system of performance metrics, these relationships can be broken down all the way to the level of basic operational data. The initial question in the definition of such a system is always: Which metrics are worth analyzing at all?


Supply chain analytics applies analytical software and methods to supply chain processes and data – from planning, procurement of raw materials, manufacturing and distribution to final transport to the customer. In addition to big data, real-time data analytics, and self-service BI, mobility and cloud are becoming increasingly interesting in this field. When aligned to business requirements on the one hand and IT-strategic prerequisites and considerations on the other, these disruptive technologies already today are transforming the way companies plan and steer their supply chains in a meaningful way.

Our services in the new line of business: From the development of a performance measurement strategy and the definition of relevant performance metrics to concrete implementation of SAP technology – with the new line of business, Inspiricon leaves no relevant aspect of a supply chain analytics project uncovered. Once we have developed a customer-specific strategy and captured every single requirement, the next step is to discuss implementation scenarios, design customized solutions, and implement them with a constant eye on meeting the stated objectives.

By giving actionable recommendations on the selection of suitable tools and providing helpful best-practice guidance on how to apply them to the implementation of visualizations as well as to the analysis and assessment of data, we help our customers ensure that essential information and insights get to the right people in the right way and at the right time. This enables our customers to make faster decisions that support their continuous business success.

Do you require assistance in selection of strategic performance metrics or would you like to know more about the performance metrics in SAP’s Business Intelligence software? We will be happy to bring you up to speed in a one-day workshop.

Supply chain digitalization and Industry 4.0:

The increasing pace of digitalization, which is further accelerated by the internet of things, is creating new challenges in the business intelligence space. Industry 4.0 goes beyond opening up new perspectives and opportunities in shaping the supply chain, creating never-before-seen processes and business models. How can analytics technologies support this process and how are future requirements on modern BI systems going to change? What new possibilities and insights will come from applying them to the changing process landscape?

In their joint study titled “Procurement 4.0”, the Fraunhofer Institute for Material Flow and Logistics and the German Association for Supply Chain Management, Procurement and Logistics have come to the conclusion that, given the increasing levels of digitalization, complete autonomy in operational procurement processes could already be achieved today. Essential process activities would be implemented by means of modern technologies, which would not only result in a shift of focus towards strategic procurement and along with it in increased expectations and growing complexity, but also in a fundamental change in how the procurement role is perceived. It would increasingly move towards interface management or data analytics.

At the same time, the number of modern technologies employed in modern production lines is steadily increasing. “Smart factory” is a term used by researchers to describe the vision of manufacturing environments with production lines that, for the most part, are capable of organizing themselves without human interaction. Cyber-physical systems are to communicate with one another using the internet of things while being controlled using artificial intelligence (AI) methods such as machine learning.

Several logistics-related surveys have found that the majority of forwarding agents – companies that contract logistics service providers (LSPs) to move cargo – and almost all logistics companies already leverage various technologies to track locations and movements. On top of that, there are several other basic technologies in the fields of sensor technology, Auto-ID (RFID), mobile communications, or planning and simulation, which are very popular and widespread. And given the fact that these technologies don’t seem to be fully rolled out yet on a broader scale or used universally, one thing quickly becomes apparent: there is even more untapped potential when it comes to the technologies that are built on top of them, for example in the area of big data. There is no shortage of companies that fall into the trap of collecting data without having a clear idea of how they can actually leverage it to their advantage.

The major objective of the Supply Chain Management line of business is therefore to bring the two worlds together – the process perspective of the supply chain and the latest analytics technology – and to combine them into new and innovative use cases. We focus on today’s trends and challenges in supply chain digitalization and keep a close eye on the latest technologies surrounding big data, advanced analytics, machine learning and predictive analytics. Current approaches are already bringing actionable improvements in the context of supply chain risk management and tracking and tracing.


Companies from a wide range of industries, from manufacturing and trade to consumer goods, pharmaceuticals and many more, find that the services in our new line of business are exactly what they are looking for. Would you like to find out more about our methods and projects or is there anything else we can do for you? Be sure to come back to our blog for regular updates. Also, don’t hesitate to contact us directly, we’ll be happy to hear from you!

Sources:  Supply-Chain-Controlling mit SAP BW (SAP Press)

Author
Daniel Schauwecker Lead Consultant Visualisation
Phone: +49 (0) 7031 714 660 0
Email: info@inspiricon.de

 


Machine Learning with SAP Hana and SAP Fiori

What is Machine Learning and why is it important?

Well, first of all, it is quite a hype nowadays. So it is important to at least know what the rest of the world is talking about.

And the reason why it is a hype is that machine learning is bringing huge advances in various fields. It not only gives computers the possibility to perform certain tasks, it also enables a computer to first learn the rules of performing a given task (to learn from experience, from historical data).

Let us take the healthcare field, for example: machine learning algorithms are successfully used to spot signs of various severe illnesses (breast cancer, for example) as early as possible and reduce the risk for the patients.

Financial institutions also use machine learning algorithms for fraud detection and to combat money laundering. These algorithms are able to analyse millions of transactions and point out those that indicate suspicious patterns.

In the online security field, machine learning algorithms are used to track suspicious behaviour and detect privacy intrusions.

And we should not forget that we all use machine learning in our daily lives. Whether it is Siri we summon on our Apple device or Alexa on our smart speaker, whether we use social networks on the internet or Google Maps in our car, the core of these systems is powered by machine learning algorithms.

And in the daily operating business of companies, machine learning algorithms are automating basic tasks that would otherwise be done manually, like analysing invoices to detect duplicates, orders, etc. …

In the field of BI, one of the reasons why machine learning is important is that it is part of the techniques used in predictive analytics. This gives employees the possibility to predict certain results in the future. Sales people, for example, can make predictions of their sales volume, and managers can evaluate multiple predictions of how certain decisions might impact future results and make their decision based on these.

How does it work?

Let’s take a look at this quiz:

  • 2 → 4
  • 3 → 9
  • 5 → 25
  • 6 → ?

Now, why have you been able to figure out that 36 is the right answer? Because you have recognized a pattern. And that is exactly what machine learning algorithms are doing. They are trained on sets of sample data where they learn to recognize patterns and match these patterns to the correct responses (they learn from experience). After the training we can query the algorithm for a response by providing it with a new set of data, and what we get is (hopefully) an accurate response.

Machine learning algorithms are designed to work on problems much more complex than the quiz presented above, with a great number of input dimensions. This enables them to perform complex tasks like image or speech recognition, or forecasting potential sales results based on complex historical market data.

Machine Learning with SAP Hana and Fiori

Since machine learning is such a hot topic, it generates a lot of curiosity and desire to experiment, and that was also the case for us. We at Inspiricon became curious how this new field could bring added value to the areas we are already working in, which include BI, SAP Fiori and SAP HANA.

Well, it turns out SAP HANA already has pretty robust support for machine learning. SAP provides the SAP HANA Predictive Analysis Library (PAL), which offers the possibility to use machine learning algorithms and even build neural networks. Combining its power with SAP Fiori, it is possible to build some interesting applications in the field of predictive analytics. For example, we were able to build a small Fiori application to predict the daily and monthly sales figures for individual stores within a supermarket chain. The following illustration shows a rough overview of an architecture for this application:

Figure: Application architecture

The Fiori application we have developed would be targeted at managers, who would be able to explore the forecast until the end of the year from within the Fiori application. Moreover, we are experimenting further with this scenario and investigating how to extend it with other features like the integration of what-if scenarios, so that one can investigate how certain management or marketing decisions (like promotions) can influence the predicted sales:

Figure: Fiori predictive demo

Conclusion

Machine learning can already be tackled with a simple Hana backend!

While there are powerful big tools out there like TensorFlow for neural networks or SAP Predictive Analytics, it is important to know that these are not necessarily mandatory in order to approach the topic. As explained above, SAP HANA already provides the means to build such approaches, and with SAP Fiori it is possible to build a UI application tailored to the specific scenario that is implemented. The preliminary data analysis can be performed with powerful data analytics tools that are available for free for Python (Pandas) or R. So, with no additional cost in licensing or infrastructure, this can be a very attractive approach, especially for smaller problems that do not require intensive data processing.

Which approach is finally chosen, however, depends on the specific use case and should be properly evaluated by the development team. The maintenance of the solution and the license cost will also be an important factor for the owner of the solution and must be taken into account when making a decision.

Image sources: Inspiricon

Author
Gerald Iakobinyi-Pich Solution Architect
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

How To Use Predictive Analytics In Native Hana

In this article we will be focusing on HANA’s native predictive analytics capabilities, explaining step by step how to use PAL (Predictive Analysis Library), which is part of the AFL (Application Function Library), to create a multiple linear regression model and how to use that model for predictions.

Do you want to learn more about Predictive Analytics? Read our two other articles on this topic:

  1. Welcome to the World of Predictive Analytics
  2. How to Create Your Own Predictive Model

WHAT IS PAL?

We have previously discussed predictive analytics using SAP BusinessObjects Predictive Analytics. Another way to create trained models, and predictions based on those models, natively in HANA is by utilizing PAL (Predictive Analysis Library). This method is more complex and requires the user to have technical knowledge.

While the application offered by SAP is more user-friendly, implementing HANA PAL solutions offers a greater possibility of including the data in other applications by creating a process native to HANA.

The easy-to-use Predictive Analytics tool brings high costs, mainly caused by the license fee – in contrast to HANA PAL, where the user only needs to know how to use the PAL functions.

HOW DOES PAL WORK?

There is a great number of functions offered by HANA that can be implemented for different scenarios (clustering of data, classification, regression, time series forecasting, etc.).

In this article we will be focusing on one function from PAL: Multiple Linear Regression.

We will be using the same data (Rossmann dataset) that we utilized in the second article of our Predictive Analytics series (How to create your own predictive model).

The dataset is offered by the Rossmann store chain on Kaggle, a website where people and organizations can upload real datasets for competitions or for the purpose of helping people develop data science skills.

For a detailed view of the code you will need to build a multiple linear regression, we provide two documents that will help you work through it:

PREREQUISITES AND DATA HANDLING

Let us assume that you have already installed SAP HANA Studio on your computer. A SAP HANA server connection is also required in order to use the HANA database and engine. It is recommended to have the latest version of the HANA database (2.0). In our example, both algorithms are supported by HANA 1.0 SPS 09 and HANA 2.0 SPS 01.

In our scenario we have imported the data from train.csv and store.csv into our HANA database and exposed it as a Calculation View.

The train.csv file contains data on the sales of the different stores and the dates when the sales were made, as well as information on promotions, customer numbers, day of the week etc.

The store.csv file contains master data for each store, such as the distance to the nearest competitor, the store type and the store's assortment.

In figure 1.1 we can see the two data tables:

Fig. 1.1

In our HANA Calculation View we join the two files in order to create a complete dataset, so that our algorithm can build an accurate model for a precise prediction (as seen in Fig. 1.2).

Fig 1.2

PAL ALGORITHM IMPLEMENTATION

In our scenario we will implement the prediction using multiple linear regression.

1. WHAT IS MULTIPLE LINEAR REGRESSION (MLR)?

Linear regression is an approach to modeling the linear relationship between a variable y, usually referred to as the dependent variable, and one or more variables, usually referred to as independent variables, denoted as the predictor vector x. In linear regression, data is modeled using linear functions, and the unknown model parameters are estimated from the data. Such models are called linear models.
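Written out, the model we will fit has the familiar textbook form below (standard notation, not taken from the PAL documentation): y is the dependent variable (Sales in our case), x1 … xk are the independent variables, the β coefficients are estimated by PAL and ε is the error term.

    y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon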

 2. HOW TO CREATE AN MLR MODEL IN 7 STEPS?

Step 1: Definition of parameters

In order to create a multiple linear regression model, the algorithm needs two input tables and four output tables, and we have to create the definitions of these tables (a rough SQLScript sketch of these definitions follows after the list below).

Input Tables:

a. Data – this contains the columns used for training the model. The data must contain a primary key column (“ID”) that must be the first column defined. The second column must be the target/dependent column (the “Y variable” that we will be predicting). The rest of the columns hold the independent variables (X1, X2, …) that help the algorithm build an accurate model.

b. Significance – this contains the coefficients of the model and their values (the higher the value the more effect it has on the model)

c. Fitted – the fitted values of the target variable (“Sales” in our case)

d. Results – the results of the model (model accuracy and confidence)

e. Exported model – the model that will be used for future predictions.

f. Control – the parameters that specify how the algorithm should work (here we can enable and calibrate the elastic net penalties, specify the number of threads, choose the algorithm used to solve the least squares problem etc.)
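To make step 1 concrete, here is a rough SQLScript sketch of such table type definitions. The type and column names are placeholders chosen for our Rossmann scenario (only a few independent variables are shown) and are not the exact code of our application; the general shapes follow the PAL reference linked in the sources.

    -- input: training data (ID first, then the target, then the independent variables)
    CREATE TYPE PAL_MLR_DATA_T AS TABLE (
        "ID"        INTEGER,   -- primary key column, must come first
        "SALES"     DOUBLE,    -- target / dependent variable (Y)
        "CUSTOMERS" DOUBLE,    -- independent variable X1
        "PROMO"     DOUBLE     -- independent variable X2
    );

    -- output: significance / coefficients
    CREATE TYPE PAL_MLR_COEFF_T AS TABLE (
        "VARIABLE_NAME"     VARCHAR(100),
        "COEFFICIENT_VALUE" DOUBLE
    );

    -- output: fitted values of the target
    CREATE TYPE PAL_MLR_FITTED_T AS TABLE (
        "ID"     INTEGER,
        "FITTED" DOUBLE
    );

    -- output: model statistics (accuracy, confidence)
    CREATE TYPE PAL_MLR_STATS_T AS TABLE (
        "NAME"  VARCHAR(100),
        "VALUE" VARCHAR(100)
    );

    -- output: exported model for later predictions
    CREATE TYPE PAL_MLR_MODEL_T AS TABLE (
        "ROW_INDEX"     INTEGER,
        "MODEL_CONTENT" VARCHAR(5000)
    );

    -- input: control parameters
    CREATE TYPE PAL_CONTROL_T AS TABLE (
        "NAME"       VARCHAR(100),
        "INTARGS"    INTEGER,
        "DOUBLEARGS" DOUBLE,
        "STRINGARGS" VARCHAR(100)
    );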

Step 2: Creating the Parameter Table

After creating the definitions for the model, we have to create the table that lists all these parameters and specifies whether each one is an input or an output parameter (the Data and Control tables will be input parameters and the rest will be output parameters).
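A minimal sketch of such a signature table, assuming the table types from step 1 were created in a schema called MY_SCHEMA (all names are placeholders):

    CREATE COLUMN TABLE PAL_MLR_SIGNATURE_TBL (
        "POSITION"       INTEGER,
        "SCHEMA_NAME"    NVARCHAR(256),
        "TYPE_NAME"      NVARCHAR(256),
        "PARAMETER_TYPE" VARCHAR(7)   -- 'IN' or 'OUT'
    );

    INSERT INTO PAL_MLR_SIGNATURE_TBL VALUES (1, 'MY_SCHEMA', 'PAL_MLR_DATA_T',   'IN');
    INSERT INTO PAL_MLR_SIGNATURE_TBL VALUES (2, 'MY_SCHEMA', 'PAL_CONTROL_T',    'IN');
    INSERT INTO PAL_MLR_SIGNATURE_TBL VALUES (3, 'MY_SCHEMA', 'PAL_MLR_COEFF_T',  'OUT');
    INSERT INTO PAL_MLR_SIGNATURE_TBL VALUES (4, 'MY_SCHEMA', 'PAL_MLR_FITTED_T', 'OUT');
    INSERT INTO PAL_MLR_SIGNATURE_TBL VALUES (5, 'MY_SCHEMA', 'PAL_MLR_STATS_T',  'OUT');
    INSERT INTO PAL_MLR_SIGNATURE_TBL VALUES (6, 'MY_SCHEMA', 'PAL_MLR_MODEL_T',  'OUT');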

Step 3: Wrapper Creation

The wrapper will generate a procedure based on the specified PAL function and the previously created parameter table. We will use this procedure later on to create the regression model from our training data.
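One common way to do this is the AFLLANG wrapper generator that ships with HANA. The sketch below assumes the signature table from step 2 and uses LRREGRESSION, the PAL function name for multiple linear regression in this generation of PAL; the schema and procedure names are placeholders, and the exact call should be checked against the PAL reference linked in the sources.

    -- generates the procedure MY_SCHEMA.PAL_MLR_TRAIN_PROC for the PAL function LRREGRESSION
    CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE(
        'AFLPAL',               -- AFL area for PAL
        'LRREGRESSION',         -- PAL function: multiple linear regression
        'MY_SCHEMA',            -- schema in which the procedure is created
        'PAL_MLR_TRAIN_PROC',   -- name of the generated procedure
        PAL_MLR_SIGNATURE_TBL   -- signature table from step 2
    );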

Step 4: Population of Data Table

First we create a new table that has the same columns and column types as the Data table definition from step 1 and populate it with the training data.
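A sketch of how this could look if the training data is read from a calculation view (the view path and column names are placeholders for our Rossmann scenario):

    CREATE COLUMN TABLE PAL_MLR_DATA_TBL (
        "ID"        INTEGER,
        "SALES"     DOUBLE,
        "CUSTOMERS" DOUBLE,
        "PROMO"     DOUBLE
    );

    INSERT INTO PAL_MLR_DATA_TBL
        SELECT "ID", "SALES", "CUSTOMERS", "PROMO"
        FROM "_SYS_BIC"."rossmann/CV_TRAINING_DATA";   -- placeholder calculation view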

Step 5: Population of Control Table

In the control table the user specifies how the algorithm will work. It is similar to creating settings for your program. The following control settings can be specified:

a. THREAD_NUMBER -> specifies the number of threads (only relevant when algorithm 1, 5 or 6 is used)

b. ALG -> specifies the algorithm for solving the least squares problem (the numbers are the values passed to the ALG parameter):

  1. QR decomposition (numerically stable, but fails when the design matrix is rank-deficient)
  2. SVD (numerically stable and can handle rank deficiency, but computationally expensive)
  4. Cholesky decomposition (fast but numerically unstable)
  5. Cyclical coordinate descent method to solve elastic net regularized multiple linear regression
  6. Alternating direction method of multipliers (ADMM) to solve elastic net regularized multiple linear regression. This method is faster than cyclical coordinate descent in many cases and is recommended.

c. VARIABLE_SELECTION -> “0” to include all variables, “1” for forward selection, “2” for backward selection

d. ALPHA_TO_ENTER -> P-value for forward selection

e. ALPHA_TO_REMOVE -> P-value for backward selection

f. ENET_LAMBDA -> penalized weight

g. ENET_ALPHA -> elastic net mixing parameter

In our example we will use ALG 6 with elastic net penalties enabled (a method used to regularize and optimize the regression model) and a thread number of 5.
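A sketch of the corresponding control table for these settings (the ENET_LAMBDA and ENET_ALPHA values are example values, not tuned parameters):

    CREATE COLUMN TABLE PAL_MLR_CONTROL_TBL (
        "NAME"       VARCHAR(100),
        "INTARGS"    INTEGER,
        "DOUBLEARGS" DOUBLE,
        "STRINGARGS" VARCHAR(100)
    );

    INSERT INTO PAL_MLR_CONTROL_TBL VALUES ('THREAD_NUMBER', 5,    NULL, NULL);
    INSERT INTO PAL_MLR_CONTROL_TBL VALUES ('ALG',           6,    NULL, NULL);  -- ADMM with elastic net
    INSERT INTO PAL_MLR_CONTROL_TBL VALUES ('ENET_LAMBDA',   NULL, 0.1,  NULL);  -- penalized weight (example value)
    INSERT INTO PAL_MLR_CONTROL_TBL VALUES ('ENET_ALPHA',    NULL, 0.5,  NULL);  -- mixing parameter (example value)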

Step 6: Build output tables

Now all that remains is to create the output tables based on the definitions from step 1.
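A sketch, simply mirroring the output definitions from step 1 with the placeholder names used above:

    CREATE COLUMN TABLE PAL_MLR_COEFF_TBL  ("VARIABLE_NAME" VARCHAR(100), "COEFFICIENT_VALUE" DOUBLE);
    CREATE COLUMN TABLE PAL_MLR_FITTED_TBL ("ID" INTEGER, "FITTED" DOUBLE);
    CREATE COLUMN TABLE PAL_MLR_STATS_TBL  ("NAME" VARCHAR(100), "VALUE" VARCHAR(100));
    CREATE COLUMN TABLE PAL_MLR_MODEL_TBL  ("ROW_INDEX" INTEGER, "MODEL_CONTENT" VARCHAR(5000));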

Step 7: Create the model

Finally, we call the procedure generated by the wrapper, view the results and create a table in which we can compare the fitted values with the real values.
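A sketch of the call and of a simple comparison query (the parameter order must match the signature table from step 2; all names are the placeholders used above):

    CALL "MY_SCHEMA"."PAL_MLR_TRAIN_PROC"(
        PAL_MLR_DATA_TBL,
        PAL_MLR_CONTROL_TBL,
        PAL_MLR_COEFF_TBL,
        PAL_MLR_FITTED_TBL,
        PAL_MLR_STATS_TBL,
        PAL_MLR_MODEL_TBL
    ) WITH OVERVIEW;

    -- compare the fitted values with the real sales values
    SELECT d."ID", d."SALES" AS "REAL_VALUE", f."FITTED" AS "FITTED_VALUE"
    FROM PAL_MLR_DATA_TBL AS d
    JOIN PAL_MLR_FITTED_TBL AS f ON d."ID" = f."ID";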

Results

The first result is the coefficient table, which shows the impact of each coefficient on the model; a value of (or close to) 0 means the corresponding variable has almost no effect on the created model.

Fig. 1.3

The next output table will show us the fitted values of the training model data:

Fig 1.4

The last table gives us the statistics of the created model:

  • R2 – the explanatory power of the model; in our example, the model has an R2 of 89.996% (i.e. it explains roughly 90% of the variance in the target).
  • F – The F value is the ratio of the mean regression sum of squares divided by the mean error sum of squares.
  • AIC – The Akaike Information Criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.
  • BIC – Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is based, in part, on the likelihood function and it is closely related to the AIC.
  • MSE – the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors or deviations—that is, the difference between the estimator and what is estimated.
  • RMSE – Root Mean Squared Error, the square root of the MSE.
  • MAE – the mean absolute error, i.e. the average of the absolute differences between the fitted and the real values.
  • MAPE – the mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of prediction accuracy of a forecasting method in statistics, for example in trend estimation.

Fig 1.5

The comparison between the fitted and the real values can be displayed in a more user-friendly way by using a chart that plots both (the sales values in our case) over time, as shown below:

Fig 1.6

Fig. 1.7

3. HOW TO CREATE PREDICTIONS BASED ON THE MLR MODEL’S RESULTS

(1) Definition of parameters

As with the model creation, we have to create the parameter definitions, the difference being that the data table type must not contain the target variable.

(2) Forecasting Procedure Wrapper

The generated procedure is again based on the parameter table specifications and structures, this time using the ‘FORECASTWITHLR’ function.
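A sketch of steps (1) and (2), again with placeholder names: the data type contains only the ID and the independent variables, the result type holds the predicted values, and the exact parameter order should be verified against the PAL reference linked in the sources.

    CREATE TYPE PAL_FORECAST_DATA_T   AS TABLE ("ID" INTEGER, "CUSTOMERS" DOUBLE, "PROMO" DOUBLE);
    CREATE TYPE PAL_FORECAST_RESULT_T AS TABLE ("ID" INTEGER, "VALUE" DOUBLE);

    CREATE COLUMN TABLE PAL_FORECAST_SIGNATURE_TBL (
        "POSITION" INTEGER, "SCHEMA_NAME" NVARCHAR(256),
        "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7)
    );

    INSERT INTO PAL_FORECAST_SIGNATURE_TBL VALUES (1, 'MY_SCHEMA', 'PAL_FORECAST_DATA_T',   'IN');
    INSERT INTO PAL_FORECAST_SIGNATURE_TBL VALUES (2, 'MY_SCHEMA', 'PAL_MLR_COEFF_T',       'IN');
    INSERT INTO PAL_FORECAST_SIGNATURE_TBL VALUES (3, 'MY_SCHEMA', 'PAL_CONTROL_T',         'IN');
    INSERT INTO PAL_FORECAST_SIGNATURE_TBL VALUES (4, 'MY_SCHEMA', 'PAL_FORECAST_RESULT_T', 'OUT');

    CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE(
        'AFLPAL', 'FORECASTWITHLR', 'MY_SCHEMA', 'PAL_MLR_FORECAST_PROC', PAL_FORECAST_SIGNATURE_TBL
    );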

(3) Input and output table creation and data generation and regression specification

As with the model creation, we specify how the linear regression forecast function should handle the input data. The input data must not contain the target column.

(4) Coefficient table

The coefficient table must be filled with the data from the coefficient table produced as part of the model’s results.

(5) Forecast Procedure Calling

All that remains to be done is to call the previously created procedure.

(6) View Results

View the predictions made by the forecast algorithm.
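Putting steps (3) to (6) together, a minimal sketch could look like this (again using the placeholder names introduced above):

    -- (3) input and output tables; the input contains no target column
    CREATE COLUMN TABLE PAL_FORECAST_DATA_TBL    ("ID" INTEGER, "CUSTOMERS" DOUBLE, "PROMO" DOUBLE);
    CREATE COLUMN TABLE PAL_FORECAST_CONTROL_TBL ("NAME" VARCHAR(100), "INTARGS" INTEGER,
                                                  "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
    CREATE COLUMN TABLE PAL_FORECAST_COEFF_TBL   ("VARIABLE_NAME" VARCHAR(100), "COEFFICIENT_VALUE" DOUBLE);
    CREATE COLUMN TABLE PAL_FORECAST_RESULT_TBL  ("ID" INTEGER, "VALUE" DOUBLE);

    INSERT INTO PAL_FORECAST_DATA_TBL
        SELECT "ID", "CUSTOMERS", "PROMO"
        FROM "_SYS_BIC"."rossmann/CV_FORECAST_INPUT";   -- placeholder view with the periods to predict

    INSERT INTO PAL_FORECAST_CONTROL_TBL VALUES ('THREAD_NUMBER', 5, NULL, NULL);

    -- (4) reuse the coefficients produced by the training run
    INSERT INTO PAL_FORECAST_COEFF_TBL SELECT * FROM PAL_MLR_COEFF_TBL;

    -- (5) call the generated forecast procedure
    CALL "MY_SCHEMA"."PAL_MLR_FORECAST_PROC"(
        PAL_FORECAST_DATA_TBL, PAL_FORECAST_COEFF_TBL,
        PAL_FORECAST_CONTROL_TBL, PAL_FORECAST_RESULT_TBL
    ) WITH OVERVIEW;

    -- (6) view the predictions
    SELECT * FROM PAL_FORECAST_RESULT_TBL ORDER BY "ID";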

 

Sources:

https://help.sap.com/viewer/2cfbc5cf2bc14f028cfbe2a2bba60a50/2.0.00/en-US/eedc9094daf04419bc25f6ed097ac03b.html

https://help.sap.com/doc/86fb8d26952748debc8d08db756e6c1f/2.0.00/en-us/sap_hana_predictive_analysis_library_pal_en.pdf

Author
Gellert Peter-Siskovits Associate
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de

Inspiricon keeps growing!

From 1st February 2018, you can also find us in our new office in Freiburg, Schwarzwaldstraße 78b.

Close proximity to the market and to our clients is of great importance to us – which is why we are establishing the new office in Freiburg.

Our consultants work not only on classical SAP BI topics, but above all on Big Data, Machine Learning and new SAP technologies.

We are looking forward to exciting new projects!

Inspiricon Office Freiburg

Author
Linda Schumacher Marketing
Phone: +49 (0) 7031 714 660 0
Email: info@inspiricon.de
inspiricon-text-analysis-sap-hana_jack-moreh

How to Get More Insights With Text Analysis with SAP HANA

We all know that knowledge is power. Every day we accumulate information, share it or post it on social networks. Everything around us is pure information. What can we do with all of it? Quite simply, we can store it as data.

Data can be classified into two categories: unstructured and structured data.

Figure1 Unstructured Structured Data

Figure 1. Difference between unstructured and structured Data

What is unstructured Data?

The phrase “unstructured data” usually refers to information that does not reside in a traditional row-column database, for example Facebook posts, tweets or emails.

The benefit of structured data is that it can be identified and processed by machines. Once stored, the data is much easier to search, combine and filter for one’s own purposes.

What is structured Data?

Data that resides in a fixed field within a record or file is called structured data. For example a database.

So, with Text Analysis, we take unstructured data, transform it into structured data and analyze it.

What is Text Analysis?

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for exploratory data analysis, research, or investigation.

Text Analysis powered by SAP HANA applies full linguistic and statistical techniques to extract and classify unstructured text into entities and domains.

With SAP HANA platform, you can gain real insights from your unstructured textual data. The platform provides search, text analysis, and text mining functionality for unstructured text sources.

There are several techniques used in Text Analysis in order to find a specific string or to perform linguistic searches.

  • Full Text Indexing:
    A full-text index is a special type of token-based index that is built and maintained by the database’s full-text engine.
  • Full Text Search:
    Full text search is designed to perform linguistic (language-based) searches against text and documents stored in your database.
  • Fuzzy Search:
    Fuzzy search is the technique of finding strings that match a pattern approximately. It is a type of search that finds matches even when users misspell words or enter only partial words (see the short sketch after this list).
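As a small illustration of the fuzzy search mentioned above, the following query returns matches even for a misspelled search term; the table and column names are hypothetical and a full-text index on the column is assumed.

    -- finds rows whose TEXT column approximately matches the misspelled term 'readng'
    SELECT "ID", "TEXT", SCORE() AS "SIMILARITY"
    FROM "MY_SCHEMA"."DOCUMENTS"
    WHERE CONTAINS("TEXT", 'readng', FUZZY(0.8))
    ORDER BY "SIMILARITY" DESC;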

In this article we will talk about Full Text Indexing.

Full Text Indexing: Table $TA

Creating a full-text index with parameter TEXT ANALYSIS ON triggers the creation of a table named $TA_<indexname> containing linguistic or semantic analysis results.
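In its simplest form, the statement looks like this (index, schema, table and column names are placeholders; the configuration is one of the built-in configurations described further below):

    CREATE FULLTEXT INDEX "MY_TA_INDEX" ON "MY_SCHEMA"."MY_TABLE"("TEXT_COLUMN")
        TEXT ANALYSIS ON
        CONFIGURATION 'LINGANALYSIS_FULL';

    -- the analysis results are written to the automatically generated table $TA_MY_TA_INDEX
    SELECT * FROM "MY_SCHEMA"."$TA_MY_TA_INDEX";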

The set of columns in the table $TA is always the same regardless of the text analysis configuration used with the full-text index:

inspiricon-figure2-ta-table

Figure 2. Example for a TA Table

  • Key columns from source table (ID):
    The first columns in table $TA are a direct copy of the key columns from the source table.
  • TA_RULE:
    The rule that created the output. Generally, this is either LXP for linguistic analysis or Entity Extraction for entity and fact analysis.
  • TA_COUNTER:
    A unique sequential ID for each token extracted from the document.
  • TA_TOKEN:
    The term, entity, or fact extracted from the document.
  • TA_LANGUAGE:
    The language of the document.
  • TA_TYPE:
    The type of the token. In linguistic analysis, this is the part of speech. In semantic analysis, it is the entity type or fact. (‘noun’, ‘StrongPositiveSentiment’, ‘Person’)
  • TA_NORMALIZED:
    The normalized version of the token. Inflection is maintained, but capitalization and diacritics are removed. This column is null for entity extraction.
  • TA_STEM:
    The stemmed version of the token. This field is fully uninflected and normalized. If the stem is identical to the token, this column is null. It is also null for entity extraction.
  • TA_PARAGRAPH:
    The paragraph in the document that contains the token.
  • TA_SENTENCE:
    The sentence in the document that contains the token.
  • TA_CREATED_AT:
    Creation time of the record.
  • TA_OFFSET:
    Character offset from the beginning of the document.
  • TA_PARENT:
    The TA_COUNTER value of the parent of this token.

Built-in Configurations

SAP HANA has seven built-in configurations that determine the behavior and output of the text analysis:

  • LINGANALYSIS_BASIC:
    The most basic linguistic analysis: it tokenizes the text, but normalization and stemming are not applied, so the TA_NORMALIZED and TA_STEM columns remain empty.
  • LINGANALYSIS_STEMS:
    Normalizes and stems the tokens so the TA_NORMALIZED and TA_STEM fields will be populated.
  • LINGANALYSIS_FULL:
    Uses full linguistic analysis, so all the columns in the $TA_Table will be populated.
  • EXTRACTION_CORE:
    It extracts entities from the text. For example: people, places, URLs.
  • EXTRACTION_CORE_VOICEOFCUSTOMER:
    It extracts entities and facts to identify positive and negative emotions associated with the tokens.
  • EXTRACTION_CORE_ENTERPRISE:
    It extracts enterprise-related data, for example mergers, acquisitions, organizational changes and product releases.
  • EXTRACTION_CORE_PUBLIC_SECTOR:
    It extracts security-related data about public persons, events and organizations.

Creating a table and index

We will walk through a practical example for the EXTRACTION_CORE_VOICEOFCUSTOMER configuration, which identifies sentiments (positive or negative feelings):

We have to create a table, insert values and create an index.

Open a SQL console and write the following commands:

inspiricon-figure3-create-table

Figure 3. Create Table

 inspiricon-figure4-create-Index-insert-values

Figure 4. Create Index and insert values
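If you prefer plain text to the screenshots, a minimal sketch of the same steps could look like the following; the table name, index name and sample sentences are made up for illustration, and the source table needs a primary key so that the $TA table can reference it.

    CREATE COLUMN TABLE "TA_DEMO" (
        "ID"   INTEGER PRIMARY KEY,
        "TEXT" NVARCHAR(500)
    );

    INSERT INTO "TA_DEMO" VALUES (1, 'Anna likes reading books.');
    INSERT INTO "TA_DEMO" VALUES (2, 'Tom enjoys long walks in the park.');
    INSERT INTO "TA_DEMO" VALUES (3, 'Maria hates waiting in line.');

    CREATE FULLTEXT INDEX "TA_DEMO_IDX" ON "TA_DEMO"("TEXT")
        TEXT ANALYSIS ON
        CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER';

    -- tokens and their sentiment / entity types extracted from the TEXT column
    SELECT "TA_TOKEN", "TA_TYPE" FROM "$TA_TA_DEMO_IDX";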

The created table will look like this:

inspiricon-figure5-created-table

Figure 5. Table

Finally, we get the text analysis results. As you can see, “likes” and “enjoys” appear as a “weak positive sentiment”.

inspiricon-figure6-text-analysis

Figure 6. Text Analysis

This article was inspired by this blog post: https://blogs.sap.com/2017/05/21/sap-hana-ta-text-analysis/

Author
Adelina Ramona Popa and
Lucian Tomeac
Associates SAP BI
Phone: +49 (0) 7031 714 660 0
Email: cluj@inspiricon.de