A data warehouse or data mart is simply a relational database where data is structured for MD (multidimensional) analysis. The difference between a data warehouse and a data mart is that the former are larger and more centralized, whereas the latter are smaller and more localized.

Reports, English Query and apps can also utilize data warehouses and marts directly but often the data warehouse or mart is a stepping stone to the creation of pre-aggregating cubes.

There are two kinds of tables in data warehouses and marts:

  • Fact tables two kinds of columns:
    • Dimension Foreign Keys that point to the Primary Keys of Dimension tables. EG:
      • SaleDate_FK
      • StoreID_FK
      • ProductID_FK
    • Fact Columns, aka measure columns, that store the actual fact for each measure. The data type for fact columns should be either date-time or numeric. EG:
      • UnitsSold int
      • Revenu money
  • Dimension tables are look up tables that store Primary Keys and columns for what will be the members of its dimension. EG tables with columns:
    • tblTime: SaleDate
    • tblStore: StoreID, State, City, ZipCode, and Name.
    • tblProduct: ProductID, Manufacturer, Line, and Item.

A Fact table and its Dimension tables are basically used to create the dimension hierarchies for a cube. Here are the basic structures for dimension tables good for cubes:

  • Star. A dimension (look up) table joined to one or more fact tables. This is the most common schema. EG:
    • tblProduct: ProductID, Manufacturer, Line, and Item.
  • Snowflake. A major dimension table may have its own secondary dimension (look up) tables. Only the major dimension tables are joined to one or more fact tables. EG:
    • tblProductSnowflake: ProductID, ManufacturerID_FK, Line, and Item.
    • tblManufacturer: ManufacturerID and Name.
  • Parent-Child. A dimension may have two columns (member and parent, both of the same data type) that are used to define a hierarchy tree with parent-child relationships. Note that if a member has a parent column that is its own member column, 0, null, or altogether missing in the member column, then that member is a top level member. EG:
    • tblFamily: PersonID, Name, and ParentID. GrandPa, Pa, Joey, and Susie would be in a Parent-Child dimension table like this:
      PersonID Name ParentID
      1 GrandPa 1
      2 Pa 1
      3 Joey 2
      4 Susie 2
  • Virtual. Essentially a dimension with members calculated at run time. A virtual dimensions is a logical dimension based on the contents of a physical dimension. These contents can be either existing member properties in the physical dimension or columns in the tables of the physical dimension.

Here are some additional notes:

  • The database that holds the tables for the data warehouse/mart may also house any tables used in data staging.
  • Different fact tables should share the same dimension tables for consistency.
  • Check the referential integrity of the data.
  • Each dimension table should be indexed on its primary key.
  • Each fact table should be indexed on a composite of all its foreign keys.

GeorgeHernandez.comSome rights reserved