SSMS 2- Loading Data to DW (Dimension Table)

This is an exercise to practise step to follow to load data to ETL.

1. We first identify the type of loading to be done. There are two main type of loading techniques.
      - Flush and Fill technique/Full Load/Destructive Load. Here the data in the data warehouse tables are deleted and then filled up. This technique is recommended to be used
              1. When we move data to staging tables.
              2. Load Target Table for the first time

We can use DELETE operation as well as TRUNCATE. TRUNCATE is usually faster because, it instead of deleting records row by row, deallocate the entire data and reseed the identity keys.
However, there is a catch with TRUNCATE. You will have to drop the all the foreign key references before you TRUNCATE the tables.

    - Incremental Loading (Delta Load). This is a less time consuming operation. Here, only the records which are update/added/delete are synced. This is a lot complex than Full load. However, if scripted well can save a lot of time. This technique is recommended to be used
             1. Subsequent changes
             2. When we need to track changes to data which cannot be tracked in OLTP (This is the case most of the times)

Usually, the updated/new data is identified using LastModifiedDate in Source Table, however, there are other ways too. T-SQL provide lot of useful utilities that can be used. Such as EXCEPT, INTERSECT, MERGE
This methods has 3 types depending on the type and level of history to be kept
- Slow Changing Dimensions Type1
- Slow Changing Dimensions Type2
- Slow Changing Dimensions Type3

It can be more ( based on the definition)
Read this article at KimballGroup.com http://www.kimballgroup.com/2013/02/design-tip-152-slowly-changing-dimension-types-0-4-5-6-7/

If you need more reading. I would recommend http://forum.cloveretl.com/viewtopic.php?t=7153. This has some read..

For our sample project. we will use Full Load. I can post a sample for incremental load later on when I have time.
Below are the tables in my database.

There are only 2 main dimension tables
  - DimCustomers
  - DimProducts

and 1 Fact table
  - FactSales

DimDates and ETLNullStatuses are non essentials dimensions for ETL process. However, they are essential for reporting. They add more contexts to the NULL values and Dates. It helps differentiate between the same NULL for product not delivered and NULL for Wrong Product.

Similarly, DimDates, is the lookup table to provide context for type of date. Find, Quarter, Fiscal Year, Weekday and all other date related stuff. This again helps a lot in terms of reporting.


There is a common mistake that happened with me is that after running the script, I did not see the data update in table because, I was using wrong database.

Below code is to make sure that we are using the right database for our further operations.



--****************** [ DWAdventureWorksLT2012Lab1 ETL Code ] *********************--
-- This file will flush and fill the sales data mart in the DWAdventureWorksLT2012Lab1 database
--***********************************************************************************************--
Use DWAdventureWorksLT2012Lab01;
go



Since we are going to use Full load technique and that too using TRUNCATE, we would need to drop the reference first. This is how we drop the constraints


--********************************************************************--
-- We should drop the foreign key constraints before truncating the table. 
-- We need not drop the key constraints before delete operations
-- Drop Foreign Key Constraints
--********************************************************************--

ALTER TABLE dbo.FactSales DROP CONSTRAINT
 fkFactSalesToDimProducts;

ALTER TABLE dbo.FactSales DROP CONSTRAINT 
 fkFactSalesToDimCustomers;

ALTER TABLE dbo.FactSales DROP CONSTRAINT
 fkFactSalesOrderDateToDimDates;

ALTER TABLE dbo.FactSales DROP CONSTRAINT
 fkFactSalesShipDateDimDates;   

Now, TRUNCATE the table.


--********************************************************************--
-- Clear Table Data
--********************************************************************--

TRUNCATE TABLE dbo.FactSales;
TRUNCATE TABLE dbo.DimCustomers;
TRUNCATE TABLE dbo.DimProducts; 


Now, we will start with loading the dimension table first. I believe it is good idea to do that first and then add data to fact tables is common practise and my personal preference. However, you can do that simultaneously too.

Below, I am loading the dimension tables. This is the simplest way to show Extraction, Transformation and Loading.

--********************************************************************--
-- Fill Dimension Tables
--********************************************************************--

-- DimCustomers
-- <Add ETL Code Here>
insert into DimCustomers   --- Loading
 select [CustomerID] -- Extraction
   ,[CompanyName]
   ,[ContactFullName] = Customer.FirstName + ' ' + Customer.LastName --- Transformation
   from AdventureWorksLT2012.SalesLT.Customer
go

-- DimProducts
-- <Add ETL Code Here>
insert into DimProducts
 select [ProductID]
      ,[ProductName]  = Product.Name
      ,[ProductColor] = IIF(Product.Color is not null, Product.Color, '')
      ,[ProductListPrice] = Product.ListPrice
      ,[ProductSize] = IIF(Product.Size is not null, Product.Size, '0')
      ,[ProductWeight] = Product.Weight
      ,[ProductCategoryID] = Product.ProductCategoryID
      ,[ProductCategoryName] = ProductCategory.Name
   from AdventureWorksLT2012.SalesLT.Product
   JOIN AdventureWorksLT2012.SalesLT.ProductCategory
 ON Product.ProductCategoryID = ProductCategory.ProductCategoryID


go


If you are wondering about the GO command then read this https://docs.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-go

"GO signals the end of a batch of Transact-SQL statements to the SQL Server utilities."


No comments:

Post a Comment