Our website is made possible by displaying online advertisements to our visitors.
Please consider supporting us by disabling your ad blocker.

Warehousing Data: The Data Warehouse, Data Mining, and OLAP

Warehousing data is based on the premise that the quality of a manager's decisions is based, at least in part,on the quality of his information. The goal of storing data in a centralized system is thus to have the means to provide them with the right building blocks for sound information and knowledge. Data warehouses contain information ranging from measurements of performance to competitive intelligence (Tanler 1997).

Data mining tools and techniques can be used to search stored data for patterns that might lead to new insights. Furthermore, the data warehouse is usually the driver of data-driven decision support systems (DSS), discussed in the following subsection.

Thierauf (1999) describes the process of warehousing data, extraction, and distribution. First data extraction of operational production data takes place, and this data is passed on to the warehouse database. A server hosts the data warehouse and the DSS. This server then passes on the extracted data to the warehouse database, which is employed by users to extract data through some form of software.

Theirauf's model for data warehousing is as follows:

Theirauf's model for data warehousing

Warehousing Data: Design and Implementation

Tanler (1997) identifies three stages in the design and implementation of the data warehouse. The first stage is largely concerned with identifying the critical success factors of the enterprise, so as to determine the focus of the systems applied to the warehouse. The next step is to identify the information needs of the decision makers. This involves the specification of current information lacks and the stages of the decision-making process (i.e. the time taken to analyze data and arrive at a decision). Finally, warehousing data should be implemented in a way that ensures that users understand the benefit early on. The size of the database and the complexity of the analytical requirements must be determined. Deployment issues, such as how users will receive the information, how routine decisions must be automated, and how users with varying technical skills can access the data, must be addressed.

According to Frank (2002), the success of the implementation of the data warehouse depends on:

If properly designed and implemented, the goal of warehousing data is to drastically reduce the time required in the decision making process. To do so, it employs three tools, namely Online Analytical Processing System (OLAP), data mining, and data visualization (Parankusham & Madupu 2006).


OLAP

OLAP allows three functions to be carried out.

OLAP is basically responsible for telling the user what happened to the organization (Theirauf 1999). It thus enhances understanding reactively, using summarization of data and information.


What is Data Mining?

This is another process used to try to create useable knowledge or information from data warehousing. Data mining, unlike statistical analysis, does not start with a preconceived hypothesis about the data, and the technique is more suited for heterogeneous databases and date sets (Bali et al 2009). Karahoca and Ponce (2009) describe data mining as "an important tool for the mission critical applications to minimize, filter, extract or transform large databases or datasets into summarized information and exploring hidden patterns in knowledge discovery (KD)." The knowledge discovery aspect is emphasized by Bali et al (2009), since the management of this new knowledge falls within the KM discipline.

It is beyond the scope of this site to offer an in-depth look at the data mining process. Instead, I will present a very brief overview, and point readers that are interested in the technical aspects towards free sources of information.

Very briefly, data mining employs a wide range of tools and systems, including symbolic methods and statistical analysis. According to Botha et al (2008), symbolic methods look for pattern primitives by using pattern description languages so as to find structure. Statistical methods on the other hand measure and plot important characteristics, which are then divided into classes and clusters.

Data mining is a very complex process with different process models. One is the CRoss-Industry Standard Process for Data Mining (or Crisp-DM). The process involves six steps (Maraban et al, in Karahoca & Ponce 2009):

Business understanding -> data understanding -> data preparation -> modeling -> evaluation -> deployment

For more on data mining see the book "Data Mining and Knowledge Discovery in Real Life Applications", edited by Ponce & Karahoca (2009), available for free from intechopen.com where numerous other potentially relevant resources can also be downloaded.


Data Visualization

This process involves representing data and information graphically so as to better communicate its content to the user. It is a way to make data patterns more visible, more accessible, easier to compare, and easier to communicate. Data visualization includes graphical interfaces, tables, graphs, images, 3D presentations, animation, and so on (Turban & Aaronson in Parankusham & Madupu 2006).

DSS are other tools used in conjunction with warehousing data. These are discussed in the following subsection.


Alan Frost M.Sc., 2010 - Updated 2015
Like and Share!
Site News:
The site has received a much needed face lift after all these years.

A few general content updates have been implemented.

A short introductory video has been added to some pages.

More videos and content to come soon.
© 2010 - 2017. All rights reserved. Site last updated on 20 February 2017.
See our Privacy Policy.