OLAP
OLAP is an acronym for online analytical processing. It is an approach to answering complex database queries quickly. It is used in business reporting for sales, marketing and management, in data mining, and in similar areas. It has been suggested that a more descriptive term for the concept is Fast Analysis of Shared Multidimensional Information, or FASMI.
The reason for using OLAP to answer queries is speed. A properly normalized relational database stores entities in discrete tables. This structure suits operational databases, but it is relatively slow for complex queries that span many tables. A dimensional database is a better model for querying, although a worse one for operational use.
Functionality
OLAP takes a snapshot of a relational database and restructures it into dimensional data. Queries can then be run against the restructured data. It has been claimed that, for complex queries, OLAP can produce an answer in around 0.1% of the time taken by the same query on relational data.
An OLAP structure created from the operational data is called an OLAP cube. The cube is created from a star schema of tables. At the centre is the fact table, which lists the core facts that make up the query. Numerous dimension tables are linked to the fact table. These tables indicate how the aggregations of relational data can be analysed. The number of possible aggregations is determined by every possible manner in which the original data can be hierarchically linked.
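The star schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (fact_sales, dim_customer, dim_product), the columns and the sample rows are all hypothetical, and a real OLAP engine would store and pre-aggregate the data quite differently.

```python
import sqlite3

# Hypothetical star schema: one fact table linked to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [
    (1, "Lyon", "France"), (2, "Paris", "France"), (3, "Berlin", "Germany")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (10, "Tea", "Drinks"), (11, "Coffee", "Drinks"), (12, "Bread", "Bakery")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, 10, 5.0), (1, 12, 2.0), (2, 11, 4.0), (3, 10, 6.0), (3, 12, 3.0)])

# Only these hierarchy levels may be interpolated into the SQL text.
CUSTOMER_LEVELS = {"city", "country"}
PRODUCT_LEVELS = {"name", "category"}

def sales_by(customer_level, product_level):
    """Total sales aggregated at one level of each dimension."""
    if customer_level not in CUSTOMER_LEVELS or product_level not in PRODUCT_LEVELS:
        raise ValueError("unknown hierarchy level")
    query = f"""
        SELECT c.{customer_level}, p.{product_level}, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        GROUP BY c.{customer_level}, p.{product_level}
        ORDER BY 1, 2
    """
    return conn.execute(query).fetchall()

by_country_and_category = sales_by("country", "category")
```

Each pairing of a customer level with a product level is one possible aggregation of the fact table; the dimension tables supply the grouping columns.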
For example, a set of customers can be grouped by city, by district or by country; with 50 cities, 8 districts and two countries there are three hierarchical levels with 60 members in total. These customers can be considered in relation to products; if there are 250 products in 20 categories, three families and three departments, then there are four levels with 276 product members. With just these two dimensions there are 60 × 276 = 16,560 possible aggregations. As the amount of data considered increases, the number of aggregations can quickly reach tens of millions or more.
The calculated aggregations and the base data together make up an OLAP cube, which can potentially contain the answer to every query that can be answered from the data. Because of the potential number of aggregations, often only a predetermined number are fully calculated in advance, while the remainder are calculated on demand.
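The arithmetic in this example can be checked with a few lines of Python. The customer and product figures are those given above; the names of the levels, the function names and the third (time) dimension are assumptions added purely to show how quickly the count grows.

```python
from math import prod

# Members per hierarchical level, taken from the example in the text.
customer_levels = {"city": 50, "district": 8, "country": 2}
product_levels = {"product": 250, "category": 20, "family": 3, "department": 3}

def member_count(levels):
    """Total members of one dimension: the sum over its levels."""
    return sum(levels.values())

def aggregation_count(*dimensions):
    """Possible aggregations: the product of the member counts."""
    return prod(member_count(d) for d in dimensions)

customer_members = member_count(customer_levels)            # 60
product_members = member_count(product_levels)              # 276
two_dimensions = aggregation_count(customer_levels, product_levels)  # 16560

# Hypothetical third dimension, not part of the original example.
time_levels = {"day": 365, "month": 12, "quarter": 4, "year": 1}
three_dimensions = aggregation_count(customer_levels, product_levels, time_levels)
```

With the assumed time dimension of 382 members, the count rises from 16,560 to 6,325,920, which illustrates why the total can reach tens of millions once further dimensions are added.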
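The trade-off between aggregations calculated in advance and those calculated on demand can be sketched as follows. The PartialCube class, its method names and the sample rows are hypothetical; this is a minimal illustration of the idea, not the mechanism of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical base data: (city, country, category, amount).
ROWS = [
    ("Lyon", "France", "Drinks", 5.0),
    ("Lyon", "France", "Bakery", 2.0),
    ("Paris", "France", "Drinks", 4.0),
    ("Berlin", "Germany", "Drinks", 6.0),
    ("Berlin", "Germany", "Bakery", 3.0),
]
COLUMNS = {"city": 0, "country": 1, "category": 2}
AMOUNT = 3

class PartialCube:
    """Holds a chosen set of precomputed aggregations; computes the rest lazily."""

    def __init__(self, rows, precompute):
        self.rows = rows
        self.computed_on_demand = []  # levels that were not precomputed
        self.cache = {tuple(levels): self._aggregate(tuple(levels))
                      for levels in precompute}

    def _aggregate(self, levels):
        # Sum the amount for every combination of values at the given levels.
        totals = defaultdict(float)
        for row in self.rows:
            key = tuple(row[COLUMNS[level]] for level in levels)
            totals[key] += row[AMOUNT]
        return dict(totals)

    def query(self, levels):
        levels = tuple(levels)
        if levels not in self.cache:
            # Not calculated in advance: solve it now and keep the result.
            self.computed_on_demand.append(levels)
            self.cache[levels] = self._aggregate(levels)
        return self.cache[levels]

cube = PartialCube(ROWS, precompute=[("country",)])
by_country = cube.query(("country",))                        # served from the cache
by_country_category = cube.query(("country", "category"))    # computed on demand
```

After these two queries, cube.computed_on_demand contains only the ("country", "category") combination, because the per-country totals were calculated when the cube was built.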
Types of OLAP
Beyond the basic concept there are three types of OLAP: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP) and Hybrid OLAP (HOLAP). MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. It uses a summary database, has a specific dimensional database engine and creates the required schema as a dimensional set of both base data and aggregations. ROLAP works directly with relational databases: the base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregation information. HOLAP uses relational tables to hold the base data and multidimensional tables to hold the speculative aggregations.
Each type has certain benefits, although providers disagree about the specifics. MOLAP performs better on smaller sets of data: it is faster at calculating the aggregations and returning answers, but it creates enormous amounts of data. ROLAP is considered more scalable and uses the least space, but it is the slowest at both pre-processing and query performance. HOLAP falls between the two in all areas, yet it can pre-process quickly and scale well. The difficulty in implementing OLAP lies in forming the queries, choosing the base data and developing the schema; as a result, most modern OLAP products come with huge libraries of pre-configured queries. Another problem lies in the base data, which must be complete and consistent.
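The ROLAP approach of holding aggregations in ordinary relational tables can be illustrated with sqlite3. The sales and agg_sales_by_country tables and their contents are hypothetical, and real ROLAP products manage such summary tables automatically; this sketch shows only the storage idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base data held in a plain relational table.
CREATE TABLE sales (city TEXT, country TEXT, amount REAL);
INSERT INTO sales VALUES
    ('Lyon', 'France', 5.0),
    ('Paris', 'France', 4.0),
    ('Berlin', 'Germany', 6.0);

-- ROLAP-style: the aggregation is stored as another relational table.
CREATE TABLE agg_sales_by_country AS
    SELECT country, SUM(amount) AS total
    FROM sales
    GROUP BY country;
""")

# Later queries read the small summary table instead of rescanning the base data.
totals = conn.execute(
    "SELECT country, total FROM agg_sales_by_country ORDER BY country"
).fetchall()
```

A MOLAP engine would instead hold both the base data and such totals in its own dimensional store, and a HOLAP engine would keep the sales table relational while moving the totals into a multidimensional structure.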
Products
The first product to perform OLAP queries was MDS's Express, which was released in 1970 (and acquired by Oracle in 1995). The term itself, however, was not coined until 1993, by Ted Codd, who has been described as "the father of the relational database". Codd's paper was financed by Arbor, the company that had released its own OLAP product, Essbase (now owned by Hyperion), a year earlier. As a result, Codd's "twelve rules of online analytical processing" were explicit in their reference to Essbase.
Other well known OLAP products include Microsoft Analysis Services (previously called OLAP Services, part of SQL Server), IBM's DB2 OLAP Server (an OEM version of Essbase), SAP BW, Mondrian and products from SAS Institute, Brio, Business Objects, Cognos, MicroStrategy, Sagent and others.
Business performance management software plays a major role in the OLAP space.
External links
- Dimensional Modeling and OLAP Tutorial (http://freedatawarehouse.com/tutorials/dmtutorial/Dimensional%20Modeling%20Tutorial.aspx)
- Data Warehousing and OLAP: A Research-Oriented Bibliography (http://www.ondelette.com/OLAP/dwbib.html)
- OLAP Report: In depth overview of all commercial OLAP products (http://www.olapreport.com)
- Microsoft OLAP information (http://www.mosha.com/msolap)
- Mondrian Java R-OLAP server (http://mondrian.sourceforge.net/)
- The Eclipse project's Business Intelligence and Reporting Tools Project (http://eclipse.org/birt/) (BIRT)
- A Dimensional Modelling Manifesto (http://www.dbmsmag.com/9708d15.html)