What is Data Warehousing?
Why are contemporary organisations so interested in data warehousing?
In every industry, from retail to services, from manufacturing to government, and from finance companies to banks, data warehousing is changing how business operations are conducted and how decisions are made.
An organisation extracts current and historical data from its various databases and stores it in a data warehouse. A data warehouse is a centralised hub for this data. The data is subject-oriented, integrated, time-variant and non-volatile, so that business experts and end users alike can use it.
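Two of these properties, time-variant and non-volatile, can be illustrated with a minimal sketch. The class names `CustomerSnapshot` and `CustomerHistory` and the sample data are hypothetical, chosen only for illustration; a real warehouse would enforce these properties in its storage layer rather than in application code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a loaded row is never modified (non-volatile)
class CustomerSnapshot:
    customer_id: int
    city: str
    loaded_on: date  # every row carries a date (time-variant)


class CustomerHistory:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows = []

    def load(self, row: CustomerSnapshot) -> None:
        self._rows.append(row)

    def as_of(self, customer_id: int, day: date) -> Optional[CustomerSnapshot]:
        """Return the latest snapshot loaded on or before `day`."""
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.loaded_on <= day
        ]
        return max(candidates, key=lambda r: r.loaded_on, default=None)


history = CustomerHistory()
history.load(CustomerSnapshot(1, "Pune", date(2023, 1, 1)))
history.load(CustomerSnapshot(1, "Delhi", date(2024, 1, 1)))  # a move adds a new row

old = history.as_of(1, date(2023, 6, 30))  # the customer's record in mid-2023
new = history.as_of(1, date(2024, 6, 30))  # the customer's record in mid-2024
```

Because a change of city adds a row instead of overwriting one, the store can still answer questions about any earlier date.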
A data warehouse combines data from different enterprise sources. It can therefore be defined simply as a collection of data that supports the organisation's information systems. A data warehouse stores a very large amount of data so that information useful for decision-making can be retrieved from it.
Query and reporting tools retrieve data from the data warehouse and give users flexible access to the data they need. A data warehouse is organised for reading: retrieval is fast, but data insertion is comparatively slow.
A data warehouse is a key data storage mechanism and plays a crucial role in an organization’s information system. Although both data warehouses and databases store data, a data warehouse is more efficient than an operational database for reporting and analysis.
A data warehouse can also hold far more data than a typical operational database, which makes it more effective at providing the organization with the information it requires.
Need for Data Warehousing
Any organization that deals with a huge amount of data should consider data warehousing. Implementing it requires some additional hardware and software tools. These tools might seem expensive at first, but they deliver more value than they cost.
The following factors show why organizations need data warehousing:
- Tables in operational databases are hard to use for data access and analysis because they are designed mainly to make data entry and validation fast.
- Data warehousing is the best way to integrate valuable data from different sources into the database of a particular application.
- Developing and storing metadata becomes easier with a data warehouse. Without one, it is a tedious process because there is no definite place to store the metadata.
- Users frequently need certain data fields on their screens, such as rolled-up general ledger balances. These fields are provided by the data warehouse, not by the source databases, and business experts come to rely on them.
- Reporting and analysis functions in databases often give poor performance. Therefore, data warehousing should be used for reporting and analysis.
- BI users perform various calculations on data and might misuse or corrupt transaction data if they work on it directly. Giving them a separate data warehouse protects the transaction data.
These factors show that organizations need data warehousing. Since data warehousing has become more economical in the past few years, organizations can take full advantage of it to manage their data efficiently.
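The rolled-up general ledger balances mentioned above can be sketched with Python's built-in `sqlite3` module. The table names `gl_entry` and `gl_balance`, the account codes, and the amounts are all hypothetical; the point is only that the roll-up is computed once in the warehouse, so reports read a small summary table and never touch the transaction system.

```python
import sqlite3

# Hypothetical transaction rows copied from an operational system:
# (account, posting_month, amount)
ledger_entries = [
    ("4000-Sales", "2024-01", 1200.0),
    ("4000-Sales", "2024-01", 800.0),
    ("4000-Sales", "2024-02", 500.0),
    ("5000-Rent", "2024-01", -300.0),
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE gl_entry (account TEXT, posting_month TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO gl_entry VALUES (?, ?, ?)", ledger_entries)

# Precompute the rolled-up balance once so that reports read a small table
warehouse.execute(
    """
    CREATE TABLE gl_balance AS
    SELECT account, posting_month, SUM(amount) AS balance
    FROM gl_entry
    GROUP BY account, posting_month
    """
)

balances = warehouse.execute(
    "SELECT account, posting_month, balance FROM gl_balance "
    "ORDER BY account, posting_month"
).fetchall()

for account, month, balance in balances:
    print(account, month, balance)
```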
Goals of Data Warehouse
A data warehouse serves many purposes in an organization. It has a number of goals that help an organization manage the business effectively.
These goals are discussed as follows:
- Data integration: The data warehouse integrates data retrieved from different subject areas across time in such a way that users of the warehouse can easily obtain facts about the organization’s business.
- Data standardization and normalization: Standardizing and normalizing data, so that the same fact is recorded the same way whichever source it came from, is essential to making a data warehouse valuable.
- Accessible Information: The data warehouse must provide contents that are understandable and clear to the business user. The contents of the data warehouse need to be as meaningful as possible.
The tools that provide access to the data warehouse must be easy to use and should return query results with minimal delay.
- Consistent Information: A data warehouse should present the organization’s information consistently.
This means that the data should be complete and give the full picture after processing. Consistency also means that common definitions of the data warehouse contents are available to users.
- Adaptive and resilient to change: Change is an inevitable part of any organization.
User requirements, business circumstances, data, and technology all change over time. The data warehouse must be designed to manage and control this unavoidable change.
- Secure support for information: The data warehouse must provide security to the organization’s information. It also must effectively manage access to the organization’s confidential information.
- Foundation for improved decision-making: The ultimate goal of data warehousing is to provide efficient support for decision-making. The data warehouse must be given appropriate data as input in order to support effective decisions.
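The integration, standardization, and consistency goals above can be sketched as follows. The two source layouts, the field names, and the `COUNTRY_CODES` mapping are assumptions made for this example, not part of any particular product; the sketch covers standardization of values and formats only, not database normalization.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with different conventions
crm_rows = [
    {"cust": " Alice ", "country": "UK", "joined": "03/01/2024"},  # DD/MM/YYYY
]
billing_rows = [
    {"customer_name": "BOB", "country_code": "gb", "signup": "2024-02-10"},
]

# One shared definition: map each source spelling to a single country code
COUNTRY_CODES = {"UK": "GB", "GB": "GB"}


def standardize(name, country, raw_date, date_format):
    """Return one record in the warehouse's common layout."""
    return {
        "customer_name": name.strip().title(),
        "country": COUNTRY_CODES[country.strip().upper()],
        "joined_on": datetime.strptime(raw_date, date_format).date().isoformat(),
    }


# Integration: both sources end up in one list with one set of definitions
integrated = [
    standardize(r["cust"], r["country"], r["joined"], "%d/%m/%Y")
    for r in crm_rows
] + [
    standardize(r["customer_name"], r["country_code"], r["signup"], "%Y-%m-%d")
    for r in billing_rows
]

for record in integrated:
    print(record)
```

After this step, a user querying by country or join date does not need to know which source system a record came from.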
Constituents of Data Warehouse
A data warehouse is made up of several constituents, or components, that must work together for it to be effective.
The components of a data warehouse are discussed as follows:
- Source Systems and Databases: Source systems supply the raw transaction and production data, which is extracted and made suitable for data warehousing. The sources can be quite diverse:
  - Production databases such as Oracle, Sybase, and other SQL databases
  - Excel sheets
  - Databases of small applications, such as those built in MS Access
  - ASCII/data flat files
- Data Staging Area: The data staging area is where data is cleansed and groomed after it is extracted from the source systems. Data staging comprises most of the crucial activities of a data warehouse.
These activities, extraction and transformation, are also typically the biggest analytical and technical tasks of a project.
- ETL-Data Extraction: Data extraction pulls data from numerous data sources. Most of these sources are production systems used for transaction-level work.
- ETL-Data Transformation: If data extraction is mining iron ore, data transformation is making steel products from it. Transformation converts the transaction-level raw data, without losing details, into a form that can be loaded into the data presentation area.
The data presentation area is considered to be a set of integrated data marts. A data mart is a subset of the data warehouse and represents select data regarding a specific business function (Inmon, 1999).
- ETL-Presentation Area: This area is the repository into which the data is finally loaded after extraction and transformation. It becomes the final source of information for several purposes, such as queries and advanced data modeling.
- Metadata: Metadata is data about data; here, it describes the data warehouse. It is used to build, maintain, manage, and use the data warehouse.
It includes the business and technical designs, rules, locations, and so on for all the data, from extraction through to final use.
- End User Tools and Applications: Once the data is prepared, it can be used as input to many applications, with tools that make this possible. These include reporting, publishing, analysis, modeling, and mining tools.
- Data-Warehouse Administration and Tools: A data warehouse is a big platform with a large number of users, data sources, and data targets. Like a production system, it has to be administered for performance, timeliness, and availability. Administration also covers activity logging, data security, backup, and archiving.
- Data Marts: A data mart is a data repository that holds data for only one subject area, such as finance, marketing, or sales.
- OLAP Servers & Data Marts: While the data warehouse can be accessed by any end-user tool or application, it also provides information to the OLAP layer. For example, the Human Resources department might want its own data mart on separate servers because its information is confidential.
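The way these components fit together can be sketched with Python's standard library. Everything specific here is an assumption for illustration: the CSV content stands in for a flat-file source, `fact_orders` stands in for the presentation area, `finance_mart` is a data mart modelled as a view, and the cleansing rule (set aside rows with no amount) is hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real extract would read from a production system
SOURCE_CSV = """order_id,department,amount
1,finance,100.50
2,marketing,
3,Finance,49.50
"""


def extract(text):
    """Extraction: read raw rows from the source exactly as they are."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Staging: cleanse rows and convert types; set aside rows that fail."""
    cleaned, rejected = [], []
    for row in rows:
        if not row["amount"]:
            rejected.append(row)  # kept for review, not silently discarded
            continue
        cleaned.append(
            (int(row["order_id"]), row["department"].strip().lower(), float(row["amount"]))
        )
    return cleaned, rejected


def load(conn, rows):
    """Load: write the cleaned rows into the presentation area."""
    conn.execute(
        "CREATE TABLE fact_orders (order_id INTEGER, department TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    # A data mart for one subject area, modelled here as a view
    conn.execute(
        "CREATE VIEW finance_mart AS "
        "SELECT * FROM fact_orders WHERE department = 'finance'"
    )


conn = sqlite3.connect(":memory:")
raw = extract(SOURCE_CSV)
staged, rejected = transform(raw)
load(conn, staged)

# Metadata: data about the load itself
metadata = {
    "source": "SOURCE_CSV",
    "rows_extracted": len(raw),
    "rows_loaded": len(staged),
    "rows_rejected": len(rejected),
}

# An end-user query against the data mart
finance_total = conn.execute("SELECT SUM(amount) FROM finance_mart").fetchone()[0]
print(metadata, finance_total)
```

A view keeps the mart in the same database as the warehouse. A department with confidential data, as in the Human Resources example, would instead copy its subset to a separate server.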