Use Lookups the right Way

SSIS Lookup tasks are all too easy to abuse as they are easy to setup and not really think about much. Most commonly developers just drag the task in setup a reference table, map a column or two, add a column to be returned and click OK. Now while doing that will work no doubt it can cause a performance nightmare. So let’s setup this scenario then break down why you have to think about taking the easy way out.

Scenario:

Let’s use our favorite friend AdventureWorksDW and say we want to load the FactInternetSales table, which has 8 dimensions keys and those dimensions would be.

  1. DimProduct
  2. DimDate (OrderDate)
  3. DimDate (DueDate)
  4. DimDate (ShipDate)
  5. DimCustomer
  6. DimPromotion
  7. DimCurrency
  8. DimSalesTerritory

So if we created a SSIS package to load the using the easy way described earlier we would have a Source and 8 Lookups just to handle getting the lookup keys portion of this package. If all I did was create my lookups as table references I would have a data pull that looked something like this.

TableName Number of Rows Number of times Queried Row Size (KB) Total Data Size (KB)
DimProduct 606 1 7.79 4720.74
DimDate 2191 3 .14 920.22
DimCustomer 18484 1 1.84 34010.56
DimCurrency 105 1 .11 11.55
DimPromotion 16 1 2.12 33.92
DimSalesTerritory 11 1 .30 3.3

However, we don’t need all the fields from these lookups for we only need the fields to join on and the field to set the Dimension Key for. Each of the table queries should look more like this.

  1. DimProduct (ProductKey, ProductAlternateKey)
  2. DimDate (DateKey, FullDateAlternateKey)
  3. DimCustomer (CustomerKey, CustomerAlternateKey)
  4. DimPromotion (PromotionKey, PromotionAlternateKey)
  5. DimCurrency (CurrencyKey, CurrencyAlternateKey)
  6. DimSalesTerritory (SalesTerritoryKey, SalesTerritoryAlternateKey)

Now the Data footprint will be the below.

TableName Number of Rows Number of times Queried Row Size (KB) Total Data Size (KB) Reduction in Size %
DimProduct 606 1 .05 30.3 99.36%
DimDate 2191 3 .007 15.337 98.33%
DimCustomer 18484 1 .03 554.52 98.37%
DimCurrency 105 1 .0098 .049 99.56%
DimPromotion 16 1 .0078 .1248 99.63%
DimSalesTerritory 11 1 .0078 .0858 97.4%

As you can see the Reduction in size is significant and it is effected in three areas: The Database Engine, Network and SSIS Engine all have to deal with less data now. So please never choose a table as your source but rather take the time to write the query and save yourself from performance headaches down the road and you might make your DBA happy too.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s