OLAP & DW

OnLine Analytic Processing and data warehousing are a natural fit. Once one acquires the breadth and depth of data to model much of the business process, one needs to select some form of Business Intelligence to leverage it. A data warehouse is in itself a considerable acheivement. However, the scale of the data can grow beyond the human resources available to review and use it effectively (NCR, 2000, p11). OLAP is a fundamental part of the solution.

In 1993 Codd et al defined OLAP as processing which implements twelve concepts. These were: “multidimensional conceptual view, transparency, accesibility, consistent reporting performance, client/server architecture, generic dimensionality, dynamic sparse matrix handling, multi-user support, unrestricted cross-dimensional operations, intuitive data manipulation, flexible reporting” and “unlimited dimensions and aggregation levels” (Codd et al, 1993). How many of these should be implemented immediately and how deeply to build them into the design might be a matter for discussion. But there is no question that they will all be beneficial to the usability of the data warehouse, and the more thoroughly they can be implemented over time the greater the benefit to both IT personnel and the user community they will represent.

Coronel et al (2011, p542) define dimensions as “qualifying characteristics that provide additional perspectives to a given fact.” In the case of a data warehouse built on a relational model, then, they may often be thought of as the columns of subsidiary tables. The dimensional model involves visualizing the database as one or more table(s) of facts linked to as many tables as required to provide all the additional attributes that have been collected. Taking an employee fact, for example, we will have a dimension for time, potentially including such events as hire-date, times-off, promotions and departure. OLAP involves such conceptual views by Codd's definition. It also requires that operations be available across dimensions. For example, taking our employee fact we may also have a dimension for performance reviews. Creating reports correlating this with promotions must be possible for OLAP. In addition, there can be no limit on this dimensionality. If we need to view a job history before a promotion discretely from that after, for example, OLAP should support that. Closely related is aggregation: if an analyst wants to consolidate that dimension across employees OLAP should support it. And lastly, if we introduce such a capability for any dimension this definition says we must strive to maintain it for all of them.

In order for our user community to make effective use of our data warehouse we must make it transparent. They shouldn't need to concern themselves with the details of how the data is stored, processed or retrieved. And we need to make it accessible. Getting data from the warehouse should be straightforward, and it should be presented in a “single, coherent and consistent” view (Codd et al, 1993). Also, no matter how much this store grows, we need to maintain performance. If our users can't get data in a timely manner we might as well not have this system. Part of this performance will come from how our tools handle sparse matrices. Our physical access methods should match the structure of the query. Codd suggests “direct calculation”, “B-trees and derivatives”, “hashing” and “the ability to combine these techniques where advantageous” at minimum (Ibid.). Our OLAP should also make data manipulation intuitive and reporting flexible.

This is not such a tall order as it might seem. Leveraging our current RDBMS and client/server architecture, it might even be seen as simply best practices. OLAP is a natural way to get information to our users. The current relationality of our design lends itself easily to dimensionality. It is critical that we exploit client/server completely and correctly, however. Client computers are where our users are going to frame their queries and where the data must be presented. So they must have the proper tools and interfaces. Conversely, servers are where the actual data is stored. It is therefore imperative that they be adequately powerful, and their software sufficiently sophisticated to support most or all of these features. In addition, we must ensure that the network fabric is high-speed and reliable enough to support the resulting traffic.

The current environment shouldn't be noticably impacted by OLAP initially. There's already an RDBMS data warehouse and a high-speed network. On the server side we need to ensure that our core tables are in adequate normal form and that there are no current or anticipated bottlenecks in RAM, disk, processing or network. We'll also want to think about dimensionality in the database design going forward, but if the core tables are well structured it should be safe enough to refer to them as fact tables and surround them with a star or snowflake schema of dimensions. The bulk of the immediate action items is likely to be the client softwares. OLAP is analytic processing after all. We need to implement the tools that support this functionality for our users and ensure that our infrastructure remains capable of supporting them. This effort will include an evaluation of extract-transform-load software to properly match it with our communities' usage. For maximum beneficial effect it will certainly profit us to bear Codd's twelve precepts in mind. Therefore, among our next steps will be identifying where we already comply, where we need prompt improvement, and what features may be of less immediate importance to our environment.

Coronel, C., Morris, S. & Rob, P. (2011) Database Systems, 9^th ed. Congage Learning: Course Technology

Codd, E.F., Codd, S.B. & Salley, C.T. (1993) Providing OLAP to User-Analysts: An IT Mandate [Online]. Available from: http://www.minet.uni-jena.de/dbis/lehre/ss2005/sem_dwh/lit/Cod93.pdf (Accessed: 26 August, 2011)

NCR (2000) Data Warehousing Report [Online]. Available from: https://elearning.uol.ohecampus.com/bbcswebdav/xid-201612_4 (Accessed: 24 August, 2011)