What Kind of OLAP Do We Really Need?
OLAP in the narrow sense
OLAP is an integral part of BI applications. The name is an acronym for online analytical processing: users (frontline employees, to be precise) perform various kinds of data analysis online by themselves.
In practice, however, the concept of OLAP tends to be used in a very narrow sense and has almost become a synonym for multidimensional analysis. Working on a prebuilt data cube, multidimensional analysis summarizes data by specified dimensions and levels and presents the aggregate values as a table or a chart. It uses drilldown, roll-up (aggregation), rotation (pivoting) and slicing to change the dimensions, the levels and the range being summarized. The idea behind it is this: coarse, high-level aggregates are too broad to give real insight into an issue, so the data needs to be sliced into smaller parts and drilled down to more detailed levels before the analysis becomes valuable.
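The operations named above can be illustrated with a minimal sketch in plain Python. The table, its dimensions (region, city, year), the measure (amount) and the helper names `aggregate` and `slice_rows` are all made up for illustration; a real OLAP product would precompute these aggregates in a cube rather than scan rows each time.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, city, year), one measure (amount).
SALES = [
    {"region": "East", "city": "Boston", "year": 2023, "amount": 120},
    {"region": "East", "city": "Boston", "year": 2024, "amount": 150},
    {"region": "East", "city": "New York", "year": 2024, "amount": 300},
    {"region": "West", "city": "Seattle", "year": 2023, "amount": 80},
    {"region": "West", "city": "Seattle", "year": 2024, "amount": 95},
]

def aggregate(rows, dims):
    """Sum the measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["amount"]
    return dict(totals)

def slice_rows(rows, **fixed):
    """Slicing: keep only the rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

# Coarse summary by one dimension.
by_region = aggregate(SALES, ["region"])
# Drilldown: add a finer level under region.
by_region_city = aggregate(SALES, ["region", "city"])
# Slice to one region, then summarize by another dimension.
east_by_year = aggregate(slice_rows(SALES, region="East"), ["year"])
# Rotation: the same kind of summary with the dimensions in a different order.
by_year_region = aggregate(SALES, ["year", "region"])
```

Every result here is still just a sum over some combination of predefined dimensions, which is the fixed repertoire that multidimensional analysis offers.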
OLAP in the broad sense
Is online analytical processing only about multidimensional analysis?
There are some data analysis scenarios where a person who has a lot of experience in a field makes some predictions about their businesses. For example:
- An equity analyst predicts that stocks meeting certain conditions are most likely to rise;
- A sales manager knows which types of sales representatives are better at dealing with difficult customers;
- A tutor knows what the results of students with both very strong and very weak subjects tend to look like;
These guesses provide a basis for predictions. After a business system has been running for some time, it accumulates a huge amount of data that can be used to verify them. Verified guesses become principles that guide future decisions. If a guess is proved wrong, a new guess is made.
It is this guess verification that OLAP should focus on. The guess-and-verify work aims to find, in historical data, the principles or facts that support a conclusion. An OLAP tool helps to verify guesses by manipulating data.
Of course, the guesses are made by people with experience in a field, not by the software. The analysis has to be online because, most of the time, a guess is made on the spot from some intermediate result. It is impossible and unnecessary to design a complete end-to-end path in advance, which means modeling beforehand is not feasible. And because the guesses are made ad hoc, IT staff are usually not available at the moment a guess needs verifying.
To address this technically, frontline workers must be able to query and compute data in a flexible and interactive way. For the scenarios mentioned above, the computations might be as follows:
- For a stock that has risen for 3 consecutive days within a month, find the probability that it keeps rising on the 4th day;
- Find the customers whose last orders were half a year ago but who placed an order after their sales representatives were changed;
- Get the English score rankings of the students whose Chinese and Math scores are both in the top 10;
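To make the first computation concrete, here is a minimal sketch in plain Python. It assumes one possible reading of the task: given a month of daily closing prices for one stock, count how often a run of exactly three consecutive rising days is followed by a fourth rising day. The function name `rise_after_streak` and the price series are hypothetical.

```python
def rise_after_streak(closes, streak=3):
    """Return (continued, occurrences) for runs of `streak` consecutive rises.

    occurrences: days that complete a run of exactly `streak` rises
                 and still have a following day to look at.
    continued:   how many of those were followed by another rise.
    """
    # ups[i] is True when the close rose from day i to day i + 1.
    ups = [later > earlier for earlier, later in zip(closes, closes[1:])]
    occurrences = continued = 0
    run = 0
    for i, up in enumerate(ups):
        run = run + 1 if up else 0
        if run == streak and i + 1 < len(ups):
            occurrences += 1
            if ups[i + 1]:
                continued += 1
    return continued, occurrences

# Made-up closing prices for illustration only.
CLOSES = [10, 11, 12, 13, 14, 13, 14, 15, 16, 15, 16]
continued, occurrences = rise_after_streak(CLOSES)
probability = continued / occurrences if occurrences else None
```

The point is not the few lines of code but the shape of the task: it depends on the order of the rows and on a running state, neither of which is a dimension that can be drilled or sliced.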
Limitations of multidimensional analysis
Obviously, these computations can all be carried out on historical data. But does multidimensional analysis help?
I’m afraid not!
Multidimensional analysis has two drawbacks. First, the data cube has to be created in advance, so users cannot reshape it on the spot and each new analysis requires a new cube. Second, the analytic operations available on a data cube are limited to drilldown, aggregation, slicing and rotation, which makes it hard to cope with complex multi-step computations. The agile BI products that have become popular in recent years do perform multidimensional analysis more smoothly and with a far more attractive interface than the early OLAP products did, but their essential functionality is unchanged and these limitations remain.
Multidimensional analysis still has its value, for example in locating the exact source of a high cost. But it cannot derive from the data the kind of principle that is crucial for predicting and guiding future moves. In this sense, online analytical processing should be more than multidimensional analysis.
What kind of OLAP do we need?
What functionality should OLAP software have in order to verify a speculation?
As mentioned previously, verifying a speculation is a process of data query and computation. It is vital that the query and computation can be defined by frontline workers without the help of IT specialists. In the current application context, an OLAP platform needs to have the following two functionalities:
1. Associated query
The first step in an analysis is acquiring the data. Many organizations have their own data warehouses that non-IT employees can access and query. The problem is that most OLAP software does not give frontline employees a convenient way to perform associated queries, that is, queries that relate several tables. Instead, IT specialists first need to create a model that resolves the associations (much like creating a data cube for multidimensional analysis). A single model usually cannot cover all real-life demands, so IT still has to be called in. At that point online analytical processing is no longer online.
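One way to picture a convenient associated query is to let the user refer to fields of related tables through a dotted path, so that no join has to be written or modeled per question. The sketch below is only an illustration under that assumption; the tables, the `LINKS` metadata and the helpers `resolve` and `query` are hypothetical and do not describe any particular product.

```python
# Hypothetical tables, keyed by primary key where they are lookup targets.
REPS = {
    1: {"name": "Alice", "team": "North"},
    2: {"name": "Bob", "team": "South"},
}
CUSTOMERS = {
    "C1": {"name": "Acme", "rep": 1},
    "C2": {"name": "Globex", "rep": 2},
}
ORDERS = [
    {"id": 101, "customer": "C1", "amount": 500},
    {"id": 102, "customer": "C2", "amount": 200},
    {"id": 103, "customer": "C1", "amount": 50},
]

# Assumed metadata: which table each foreign-key field points to.
LINKS = {"customer": CUSTOMERS, "rep": REPS}

def resolve(row, path):
    """Follow a dotted path such as 'customer.rep.name' across foreign keys."""
    value = row
    for field in path.split("."):
        value = value[field]
        if field in LINKS:
            # The field holds a key; replace it with the record it refers to.
            value = LINKS[field][value]
    return value

def query(rows, columns, where=lambda row: True):
    """Select the given dotted-path columns from the rows that pass the filter."""
    return [tuple(resolve(r, c) for c in columns) for r in rows if where(r)]

# Orders handled by the North team, with customer and representative names.
result = query(
    ORDERS,
    ["id", "customer.name", "customer.rep.name"],
    where=lambda r: resolve(r, "customer.rep.team") == "North",
)
```

The associations are declared once as metadata, and each new question only needs a different list of paths and a different filter.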
2. Interactive computation
Once the data is collected, computation begins. The distinguishing characteristic of speculation-verifying computation is that there is no ready-made program: each move is decided from the result of the previous one. The process is highly interactive, much like working with a calculator, except that what is processed is batches of structured data rather than single numbers. The OLAP tool thus becomes a data calculator. Excel is interactive to some degree, which has made it the most popular desktop BI tool. But Excel does not give sufficient support for multi-level data and regular operations, so it cannot handle the speculation-verifying computations in the scenarios above.
In later articles, we’ll analyze the currently popular computing techniques to identify their problems in handling these two kinds of computation, and suggest solutions.
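The step-by-step character of such a computation can be sketched with the student example. The code below is a minimal illustration in plain Python with made-up scores; it uses the top 3 instead of the top 10 to keep the data small, ignores ties, and the helper `top_names` is hypothetical. Each step produces an intermediate result that the analyst could inspect before deciding on the next one.

```python
# Made-up score table for illustration only.
SCORES = [
    {"name": "A", "chinese": 95, "math": 90, "english": 70},
    {"name": "B", "chinese": 90, "math": 95, "english": 85},
    {"name": "C", "chinese": 85, "math": 60, "english": 95},
    {"name": "D", "chinese": 60, "math": 85, "english": 90},
    {"name": "E", "chinese": 88, "math": 88, "english": 60},
    {"name": "F", "chinese": 70, "math": 70, "english": 80},
]
TOP_N = 3  # the scenario in the text uses 10

def top_names(rows, subject, n):
    """Names of the n students with the highest score in one subject."""
    ranked = sorted(rows, key=lambda r: r[subject], reverse=True)
    return {r["name"] for r in ranked[:n]}

# Step 1: who is in the top N for Chinese?
top_chinese = top_names(SCORES, "chinese", TOP_N)
# Step 2: who is in the top N for Math?
top_math = top_names(SCORES, "math", TOP_N)
# Step 3: keep the students who appear in both intermediate results.
strong_in_both = top_chinese & top_math
# Step 4: rank everyone by English, then look up the students from step 3.
by_english = sorted(SCORES, key=lambda r: r["english"], reverse=True)
english_rank = {r["name"]: pos for pos, r in enumerate(by_english, start=1)}
answer = {name: english_rank[name] for name in sorted(strong_in_both)}
```

In an interactive tool, each of these four assignments would be a separate move whose result is visible before the next is chosen, which is the calculator-like behavior described above.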