The data required for MD (multidimensional) structures is acquired through the ETL process:

  • Extract . OLAP systems extract data on a preiodic basis from one or more production systems (usually OLTP relational databases, but can also be non-relational OLE DB readable data sources) at off-peak times.
  • Transform, aka data preparation or data staging. From a design point of view, the hard part is developing the scripts which transform (aka prepare or cleanse) the extracted data for the data warehouses or marts. This is because the warehouse or mart determines what sort of cube you can make.
    • Data must be merged from disparate systems, eg one source may store the same information with a different structure.
    • Data must be scrubbed for inconsistencies in the data, eg spelling errors or variations. It is a good idea to use surrogate keys : keys maintained at the warehouse that are independent of keys from the data sources.
    • Data must be pre-aggregated for faster analysis. This requires judicious design. The data should normally be as granular as possible since the cube will have much pre-aggregation.
  • Load. The transformed data is loaded into a relational database that is either a data warehouse or a data mart . OLAP data can further move from distributed data marts to a central data warehouse, or from a central warehouse to distributed data marts.

Parts of the ETL Process is initially done with hand made SQL scripts, DTS (Data Transformation Services), and the bcp utility. After that, the process can be automated and scheduled with SQL Server Agent and Replication.



GeorgeHernandez.comSome rights reserved