Post

What is a Data Warehouse?

A data warehouse, distilled to its simplest form, is a database that contains different tables of data that do not change over time. Query and dashboarding tools can be placed on top of the data warehouse to help the business answer questions (business intelligence).

Data is extracted from your operational databases/systems, structured or transformed into a standardized format, and then loaded into the appropriate table(s) in your data warehouse. This process is called ETL (Extract, Transform, and Load). The data is not normalized like the databases that drive operational systems.

Data from your operational systems, used to run your business, is loaded on a schedule to provide the business with as near real-time insights as possible. Depending on your business, you may only require daily insights, or you may require hourly insights. Sometimes, depending on the systems you’re using, it’s only possible to load the data into the data warehouse late at night or early in the morning. It’s a discussion item that needs to be factored into the overall equation when designing the data warehouse.

I don’t want to oversimplify data warehouses, because, depending on the size of the business, they can be massive undertakings and take a very long time to design. The way I describe what a data warehouse is, is meant to show that they aren’t really as unobtainable as some might think.

Some good reading material: The Data Warehouse Toolkit (Kimball & Ross)

This post is licensed under CC BY 4.0 by the author.