By taking advantage of advanced analytics, manufacturing executives can reduce process flaws and thereby save time and money.
In this article we go through the main aspects of an advanced analytics project. It is aimed at executive managers, CDOs, data scientists, BI consultants, developers, and anyone interested in data science, analytics, and innovation.
Let’s start by defining the Advanced Analytics concept:
As defined by Gartner, ‘Advanced Analytics’ is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI), to discover deeper insights, make predictions, or generate recommendations. Advanced analytic techniques include those such as data/text mining, machine learning, pattern matching, forecasting, visualization, semantic analysis, sentiment analysis, network and cluster analysis, multivariate statistics, graph analysis, simulation, complex event processing, neural networks.
OK, time to connect the general theory with its practical application
A deep dive into historical process data is the right place to start an advanced analytics project. Here, patterns and relationships among process parameters should be identified. These findings can serve as a platform for optimizing the factors that prove to have the greatest effect on the problematic KPI. Data-wise, global manufacturers are currently in a very good position: they have huge amounts of real-time data and the capability to conduct such data science projects.
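As a minimal sketch of this first step, the Python snippet below ranks process parameters by the strength of their linear correlation with a KPI. The column names (temperature, pressure, line_speed, error_rate) and the synthetic data are assumptions made purely for illustration; they do not come from a real project, and a Pearson correlation only captures linear relationships.

```python
import numpy as np
import pandas as pd


def rank_parameters_by_correlation(history, kpi):
    """Rank numeric process parameters by |Pearson correlation| with the KPI."""
    numeric = history.select_dtypes(include="number")
    correlations = numeric.corr()[kpi].drop(kpi)
    order = correlations.abs().sort_values(ascending=False).index
    return correlations.loc[order]


# Synthetic stand-in for historical process data (hypothetical columns).
rng = np.random.default_rng(0)
n = 500
temperature = rng.normal(200.0, 5.0, n)
history = pd.DataFrame({
    "temperature": temperature,
    "pressure": rng.normal(1.0, 0.1, n),
    "line_speed": rng.normal(50.0, 2.0, n),
    # By construction, only temperature drives the KPI in this toy data.
    "error_rate": 0.02 * (temperature - 200.0) + rng.normal(0.0, 0.05, n),
})

ranking = rank_parameters_by_correlation(history, kpi="error_rate")
print(ranking)
```

A ranking like this is only a starting point: it suggests which parameters deserve a closer look, not which ones cause the problem.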
Starting an Advanced Analytics project can be overwhelming
Most companies encounter problems unique to their situation. However, one recurring pattern we have noticed is that companies with long production cycles (months, maybe even years) sometimes have too little data for an analysis to be statistically meaningful. One recommended approach is to take a long-term perspective: executive managers should push to invest incrementally in systems and practices that collect more data on a particular complex process, and then apply data analytics to that process. We have observed first-hand that focusing on a particular process can be directly rewarding while also serving as the very first building block of a new, enhanced data strategy.
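To illustrate why a small sample may not be statistically meaningful, the sketch below computes a 95% Wilson score interval for an observed error rate. The counts are hypothetical, and the Wilson interval is just one common choice for a proportion, not necessarily the method used in any particular project.

```python
import math


def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 is roughly 95%)."""
    p = errors / total
    denom = 1.0 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Same observed error rate (10%), very different amounts of data (hypothetical).
small_low, small_high = wilson_interval(errors=2, total=20)
large_low, large_high = wilson_interval(errors=200, total=2000)

print(f"n=20:   {small_low:.3f} to {small_high:.3f}")
print(f"n=2000: {large_low:.3f} to {large_high:.3f}")
```

With only a handful of production runs the interval is wide, which is one way to quantify the benefit of collecting more data incrementally.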
Let’s try to move away from theory towards practice and focus on a concrete scenario
Take, for example, a real project we recently worked on. The objective was to discover actionable intelligence related to a specific error encountered in the production line of a major electronic components manufacturer.
As you might expect, this type of project needs to be approached in a very agile manner. A hypothesis that was initially at the core of the project can be disproved in a matter of hours. At any moment you can be back at square one.
This has repercussions for several elements, such as the project team, methodologies, and technologies. We recommend that you consider the following aspects:
- The team should be as light and as agile as possible.
- Ideally, the technologies should also be as agile-friendly as possible.
Please keep in mind that other factors like the specific scenario, budget, team skills, available infrastructure etc. could limit your options when you decide on the right team or technologies.
In our case, we had expertise in both SAP and Python-based technologies, which is ideal. From an infrastructure point of view, we could also have opted for either one for this specific project. In the end, the choice was based on the solution’s agility and on community support. Towards the end of this article we present the technologies used.
If you want to use a standard process model to define your sprints, there are two main options you can go with:
- You can define your sprints based on CRISP-DM (Cross-Industry Standard Process for Data Mining).
Figure 1: Cross-Industry Standard Process for Data Mining (CRISP-DM)
- A second standard process model that you can use is ASUM-DM (Analytics Solutions Unified Method).
Figure 2: Analytics Solutions Unified Method (ASUM-DM)
There is no right or wrong option here. This list is not exhaustive, and in many cases a custom solution based on a standard methodology can lead to better results.
The main techniques we used for the project are summed up in the following overview defined by McKinsey:
Figure 3: McKinsey’s techniques
On top of the basic techniques, you might have to go the extra mile. An example would be a simulation vs. correlation analysis. In our case a correlation analysis looked very promising, but we were missing the data needed to properly isolate the correlation.
Instead, we worked out the function that reproduces the respective trend line and mapped it to an existing hypothesis. The hypothesis-based simulation matched the trend lines, which means that the hypothesis was validated.
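The general idea of comparing a hypothesis-based simulation with an observed trend line can be sketched as follows. This is a toy illustration under stated assumptions, not the project's actual model: the data are synthetic, the linear hypothesis function `simulate_hypothesis`, its parameters `defect_gain` and `base_rate`, and the tolerance are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observations: a process parameter vs. the observed error rate.
param = np.linspace(1.0, 10.0, 40)
observed_error_rate = 0.5 + 0.3 * param + rng.normal(0.0, 0.2, param.size)


def fit_trend(x, y):
    """Least-squares straight line through the data; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def simulate_hypothesis(x, defect_gain, base_rate):
    """Hypothetical process model: error rate grows linearly with the parameter."""
    return base_rate + defect_gain * x


# Trend line fitted to the observed data.
observed_slope, observed_intercept = fit_trend(param, observed_error_rate)
observed_trend = observed_intercept + observed_slope * param

# Curve predicted by the hypothesis (parameter values are assumed).
simulated = simulate_hypothesis(param, defect_gain=0.3, base_rate=0.5)

# Root-mean-square distance between the simulated curve and the observed trend.
rmse = float(np.sqrt(np.mean((simulated - observed_trend) ** 2)))
hypothesis_consistent = rmse < 0.2  # tolerance chosen arbitrarily for this sketch

print(f"observed slope: {observed_slope:.3f}, RMSE vs. simulation: {rmse:.3f}")
```

Because the toy data are generated from the same linear model as the hypothesis, agreement is expected here; with real data, a small distance between the two curves is evidence for the hypothesis rather than proof.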
Let’s take a look at some of the results we achieved
Some of the actionable intelligence we obtained resulted from combining the client’s expertise with our data science knowledge. The deliverable highlighted the following results:
- The erroneous behavior was isolated to only three products (technique used: data visualization)
- The client managed to optimize the machines’ workload based on error rate performance indicators (techniques used: data visualization, significance testing)
- We identified trends in the relationship between the packaging parameter value and the error rate (technique used: correlation analysis)
- Through simulations, we validated a hypothesis pointing to the process stage and wafer where the error occurs (technique used: simulation vs. correlation analysis)
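As an example of what significance testing on error rates can look like, the sketch below applies a chi-square test of independence to error counts per machine using SciPy. The machine names and counts are invented for illustration, and the chi-square test is one reasonable choice among several; the article does not state which test was used in the project.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts per machine: [units with error, units without error].
counts = {
    "machine_A": [30, 970],
    "machine_B": [28, 972],
    "machine_C": [75, 925],
}

# Rows are machines, columns are error / no error.
table = np.array(list(counts.values()))
chi2, p_value, dof, expected = chi2_contingency(table)

error_rates = {name: err / (err + ok) for name, (err, ok) in counts.items()}
worst_machine = max(error_rates, key=error_rates.get)

# A small p-value suggests the error rate is not the same across machines.
rates_differ = p_value < 0.05

print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.2e}, worst: {worst_machine}")
```

A test like this only indicates that the machines differ; deciding how to rebalance the workload still requires the client's process expertise.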
Conclusions after delivery of our solution
The project made it possible for the client’s manufacturing professionals to engage in more fact-based discussions and to compare the real impact of different parameters before taking action aimed at improving productivity.
Most importantly, it enabled them to dynamically enhance the manufacturing process by setting up experiments for production optimization.
In the end, our data science goals are to bring structure to big data, search for compelling patterns, and finally bring about changes that suit the respective business needs (Data → Knowledge → Actionable Intelligence).
As promised, here is a look at our technology setup
- Jupyter Notebook – web-based Python environment
- Anaconda – package and environment manager and Python distribution
- Pandas – Python library for data manipulation, slicing & dicing
- NumPy – support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
- Bokeh – visualization library. Its interactivity was a big plus, and the zoom functionality was especially useful. Several other libraries are simpler to use and might fit your needs, but even though Bokeh is a bit complex, the features it offers are great and very customizable. We highly recommend it.
- SciPy – a free and open-source library used for scientific and technical computing.
- Scikit-learn – a free machine learning library. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
- SAP HANA
- SAP Lumira Discovery
- Orange – open-source data visualization, machine learning and data mining toolkit
Finally, for the more tech-savvy among you: feel free to download the following file, in which we have put together several code snippets and notes to give you a glimpse of what this type of development entails.
Would you like to find out more about our methods and projects, or is there anything else we can do for you? Be sure to come back to our blog for regular updates. Also, don’t hesitate to contact us directly; we’ll be happy to hear from you!
Sources of the images: McKinsey & Company, IBM, Bosch Software Innovation
Resources: McKinsey, Gartner, Wikipedia