The Truth About High-Performance Analytics
Stories about Big Data Analytics often include platitudes from customers saying they now can do things they "never dreamed of." The problem with those statements is that most companies want to buy technology to fix an existing problem -- not to take care of things they never "dreamed of." If they happen to solve a problem they didn't know they had, well, that's the cherry on the sundae.
Technologists eager to share the transformative possibilities of new data management technologies have a messaging problem. There is no question that working in-database or in-memory shaves hours, if not days, off analysis. But, so what? For manufacturers, the "so what" is easier to answer, because this is one industry that is wasting data for lack of the means to analyze it quickly and effectively.
Modern manufacturing equipment is studded with sensors. Every part of the supply chain coughs out data, every step in production has data input (much of it automatically gathered), and every repair or return of a defective item comes with a gold mine of information. Manufacturers certainly don't need to go looking for data. In its 2011 Big Data report, the McKinsey Global Institute noted that manufacturing "stores more data than any other sector -- close to 2 exabytes of new data stored in 2010." Importantly, the institute noted that manufacturers need productivity growth and "will need to leverage large datasets to drive efficiency across the extended enterprise and to design and market higher-quality products."
So the data is there, but it is rarely collected, absorbed, or properly analyzed. Let's look at a couple of examples of high-performance analytics solving real-world problems -- not those in someone's dream.
Before we go further, though, let's talk about the difference between "big data" and "high-performance analytics" (HPA). For many, big data analytics (BDA) is simply a relational database (RDB) or an online analytical processing (OLAP) cube in memory that does simple descriptive statistics. Yes, it is faster, but without advanced analytics like data mining, forecasting and optimization, organizations just get a poor answer faster. HPA means embedding robust analytics in the database, running them in memory alongside the data, or spreading the analytics task over multiple processors in a grid environment. This means the best answers arrive faster, and that is what drives real value. The examples below illustrate best practices in HPA.
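To make the distinction concrete, here is a minimal sketch of the grid idea in Python -- the data, the eight-way split, and the toy scoring function are all hypothetical, not SAS's implementation. The point is simply that the analytic work is divided across processors, each scoring its own shard of the data:

```python
from multiprocessing import Pool
import numpy as np

def score_partition(partition):
    """Toy analytic step: z-scores within one shard (shard-local stats for simplicity)."""
    return np.abs(partition - partition.mean()) / partition.std()

if __name__ == "__main__":
    readings = np.random.normal(70.0, 1.5, size=1_000_000)  # made-up sensor data
    shards = np.array_split(readings, 8)                    # one shard per processor
    with Pool(processes=8) as pool:
        # The analytic work travels to the workers; each scores its own shard.
        scores = pool.map(score_partition, shards)
    print(f"scored {sum(len(s) for s in scores):,} readings in parallel")
```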
Sensing an Opportunity
Temperature sensors are readily available for factory robots and other high-tech manufacturing equipment. Sensor technology is so advanced, and so cheap, that it can report temperature readings as often as every two milliseconds. But the data warehouse receiving the information can't handle that kind of volume, so readings are collected at much larger intervals -- every five minutes, for example. A reading every five minutes is helpful: when the temperature starts to spike erratically, it can indicate a pending equipment failure. But imagine if you could analyze a reading every five seconds. That level of data collection could refine the analytical models and return a more precise estimate of when equipment will fail. This is not the "dream" of HPA, it's the reality. HPA can take huge data loads, process them quickly and return information that matters to the bottom line.
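As a rough illustration of what that buys you, here is a hedged sketch in Python of spike detection on five-second readings. The simulated data, the five-minute baseline window, and the three-sigma threshold are all assumptions for illustration; a real failure model would be far richer:

```python
import numpy as np
import pandas as pd

# Simulate one hour of five-second temperature readings (720 points).
idx = pd.date_range("2013-01-01", periods=720, freq="5s")
temps = pd.Series(np.random.normal(70.0, 0.5, size=720), index=idx)
temps.iloc[600:610] += 8.0  # inject a spike, as a failing component might

# Flag readings that stray far from a five-minute rolling baseline.
baseline = temps.rolling("5min").mean()
spread = temps.rolling("5min").std()
spikes = temps[(temps - baseline).abs() > 3 * spread]
print(f"{len(spikes)} anomalous readings flagged")
```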
An End to Sampling
Data sampling is a good technique for modeling the characteristics of a large amount of data. It was developed because analyzing the whole data set was simply not feasible -- it took too long or could not be processed at all. But samples take time to develop and verify, adding effort to the analytics lifecycle. Statisticians and programmers must go to great lengths to make sure their samples are truly representative, and they must extract, transform and load data in ways that are time-consuming even with the best technology. In the interest of getting more accurate answers more quickly, eliminating sampling and analyzing the whole data set with HPA simplifies the process and delivers the most accurate results.
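The contrast is easy to see in miniature. In the sketch below (synthetic data and a stand-in scikit-learn model, purely for illustration), the traditional route requires building and validating a sample before modeling, while the HPA route simply fits on every row:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000_000, 10))                    # the whole data set
y = (X[:, 0] + 0.1 * rng.normal(size=len(X)) > 0).astype(int)

# Traditional route: design, draw, and validate a representative sample first.
sample = rng.choice(len(X), size=50_000, replace=False)
model_on_sample = LogisticRegression().fit(X[sample], y[sample])

# HPA route: fit directly on every row -- no sample to build, verify, or defend.
model_on_everything = LogisticRegression().fit(X, y)
```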
Brand Damage Averted
Some manufacturers try to avoid sampling -- but the result, without HPA, is long batch runs that hurt the organization in other ways. One manufacturer currently experimenting with HPA illustrates the point well. Before testing HPA, the manufacturer ran all of its warranty information in a 40-hour batch over the weekend. The same run now takes 20 minutes. What happens when you don't need to wait over a weekend for results? For one, you have more opportunity to iterate the models and obtain the best results. More importantly, you can run "what if" scenarios that uncover hidden causes of warranty issues and quickly convey them to the manufacturing floor, as sketched below. Running a what-if analysis doesn't work when it takes 40 hours to crunch the data. What if Toyota had been able to do this as its brake situation evolved a few years back? Could it have found the problem, and stopped the damage to its brand, much sooner? I think so.
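What does a what-if pass look like when the data fits in a 20-minute run? Something like the following sketch, where every file and column name (warranty_claims.parquet, plant, shift, units_built) is hypothetical -- the point is that slicing claims by a candidate factor becomes a query you can re-run freely rather than a weekend batch:

```python
import pandas as pd

claims = pd.read_parquet("warranty_claims.parquet")  # hypothetical extract

# One what-if pass: does the claim rate differ by plant and production shift?
rates = (claims.groupby(["plant", "shift"])
               .agg(claims=("claim_id", "count"),
                    units=("units_built", "first"))
               .assign(claim_rate=lambda d: d.claims / d.units)
               .sort_values("claim_rate", ascending=False))
print(rates.head(10))
```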
Bill of Material Analysis: It's Possible Now
Bill of material data is currently too large to link with sales records and warranty claims. Proper part and supplier analysis is not possible because the part number listed on a warranty claim is the replacement part, not the original part listed on the bill of material. Analysts need to select data for analysis more accurately, calculate part failure rates, and understand supplier performance in order to identify emerging issues, define root causes, and reduce warranty costs more effectively. That just isn't possible without HPA -- but with it, the linkage looks like the sketch below.
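Here the replacement part on each claim is mapped back to the original BOM part, and failure rates are computed by part and supplier. All file, table and column names (part_xref.parquet, replacement_part_no, units_shipped, and so on) are hypothetical stand-ins; at production scale, this join is exactly the step that needs HPA:

```python
import pandas as pd

# Hypothetical extracts; every table and column name here is illustrative.
claims = pd.read_parquet("claims.parquet")    # claim_id, replacement_part_no, ...
xref = pd.read_parquet("part_xref.parquet")   # replacement_part_no -> original_part_no
bom = pd.read_parquet("bom.parquet")          # original_part_no, supplier, units_shipped

# Map each claim's replacement part back to the original BOM part and supplier.
linked = (claims.merge(xref, on="replacement_part_no")
                .merge(bom, on="original_part_no"))

# Failure rate by original part and supplier.
failure_rates = (linked.groupby(["original_part_no", "supplier"])
                       .agg(failures=("claim_id", "count"),
                            units=("units_shipped", "first"))
                       .assign(failure_rate=lambda d: d.failures / d.units)
                       .sort_values("failure_rate", ascending=False))
print(failure_rates.head())
```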
HPA has great potential to be a game changer for manufacturers that embrace it.
Mike Newkirk is the Director of Manufacturing and Supply Chain Solutions at SAS.