This is the first in a series of dating tips, so if you need some, you’ve come to the right place.

 

Well, OK, tips about how to use Dates in a Data Warehouse, Power Pivot, and Analysis Services, both Tabular and Multidimensional.

So here in part 1 we are going to start with the basics and cover the date table. The date table is where all the magic happens; without it, our dates are pretty meaningless.

Why is the date table so important?

  1. Without a date table, almost all Time Intelligence functions will not work:
    1. Year to Date
    2. Rolling Quarters
    3. Parallel Period
    4. Etc.
  2. You can’t build Date hierarchies such as Year, Quarter, Month, Date
  3. You will not be able to support Fiscal Calendars

Here is a script that will generate a Date Table for you. I am populating the table using a recursive CTE, which does have a limit of loading 32,767 dates.

There are also three variables that you need to set values for:

@StartDate – default is 1/1/1999

@EndDate – default is 1/1/2050

@FYStartDate – default is 4/1/1999; its year must fall within the same fiscal year as @StartDate
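A minimal sketch of such a script (the column list and the fiscal year formula below are illustrative, not necessarily what the original script produces) looks like this:

-- Minimal sketch: generate one row per day between @StartDate and @EndDate
DECLARE @StartDate   date = '19990101';
DECLARE @EndDate     date = '20500101';
DECLARE @FYStartDate date = '19990401'; -- must fall within the same fiscal year as @StartDate

;WITH Dates AS
(
    SELECT @StartDate AS TheDate
    UNION ALL
    SELECT DATEADD(DAY, 1, TheDate)
    FROM Dates
    WHERE TheDate < @EndDate
)
SELECT
    CONVERT(int, CONVERT(char(8), TheDate, 112)) AS DateKey,        -- e.g. 20120101
    TheDate                                      AS FullDate,
    YEAR(TheDate)                                AS CalendarYear,
    DATEPART(QUARTER, TheDate)                   AS CalendarQuarter,
    MONTH(TheDate)                               AS CalendarMonth,
    DATENAME(MONTH, TheDate)                     AS MonthName,
    -- Simple fiscal year: label each date with the calendar year its fiscal year starts in
    YEAR(DATEADD(MONTH, 1 - MONTH(@FYStartDate), TheDate)) AS FiscalYear
INTO dbo.DimDate
FROM Dates
OPTION (MAXRECURSION 32767); -- 32,767 is the maximum allowed; the default is only 100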

With the introduction of the SQL 2012 SSIS Catalog, the question has come up of what permissions are available and how projects deployed to the Catalog can be secured. Here is a brief rundown of the available permissions as well as how to set them up.

First, for a user to even see anything in the SSIS Catalog, they will need permission to the SSISDB database. If you want to give someone full access to both the database and the SSIS Catalog, you can add them to the database role ssis_admin. However, to walk through how all the permissions work, I am going to start by giving my test user just public access.
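For reference, adding someone to that role is just ordinary database role membership; a minimal T-SQL sketch (DOMAIN\TestUser is a hypothetical login) would be:

USE SSISDB;
GO
-- Hypothetical Windows login: create the database user if it does not exist yet,
-- then add it to ssis_admin for full access to the database and the catalog.
CREATE USER [DOMAIN\TestUser] FOR LOGIN [DOMAIN\TestUser];
ALTER ROLE ssis_admin ADD MEMBER [DOMAIN\TestUser];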

For each Folder in the SSIS Catalog you can secure as follows:

  1. Read
  2. Modify
  3. Manage Permissions
  4. Create Objects
  5. Modify Objects
  6. Execute Objects
  7. Read Objects
  8. Manage Object Permissions

The first three are related to the root folder of the project. Granting Read, Modify, or Manage Permissions only affects the root folder and, in my opinion, is honestly pointless. If you give a user Read permission on just the folder, all they can see is the Projects and Environments folders. Modify, from everything I can tell, does nothing, because the only action you can perform at the folder level is renaming the folder, and if you try to rename it you will get an error saying you have to be a member of the ssis_admin role in SSISDB, and that role gives access to everything. As for Manage Permissions, if you have that access you can simply give yourself access to everything else.

Read Access at the Folder Level

For Each Project the permissions can be defined as:

  1. Read
  2. Modify
  3. Execute
  4. Manage Permissions

To access a folder’s permissions, right click on the folder, select Properties, and then select Permissions. For a project, navigate to the project, right click on it, select Properties, and then select Permissions.

Folder Level Access | Project Level Access

To try to break down the remaining possibilities, I created the chart below.

Desired Access | SSISDB permissions | SSIS Catalog Folder permissions | SSIS Individual Project permissions
See the project exists | Public | Read | None
Manage all permissions | Public | Read, Manage Permissions | N/A
Execute all SSIS packages in any project | Public | Read, Execute Objects | N/A
Modify all SSIS packages in any project | Public | Read, Modify Objects | N/A
See all packages in any project | Public | Read, Read Objects | N/A
Manage permissions to all packages in any project | Public | Read, Manage Object Permissions | N/A
Execute SSIS packages in a specific project | Public | Read | Execute
Modify SSIS packages in a specific project | Public | Read | Modify
See all SSIS packages in a specific project | Public | Read | Read
Manage permissions to SSIS packages in a specific project | Public | Read | Manage Permissions
Change folder names and manage all objects | ssis_admin | N/A | N/A

SSIS Lookup transformations are all too easy to abuse because they are so easy to set up without much thought. Most commonly, developers just drag the transformation in, pick a reference table, map a column or two, add a column to be returned, and click OK. While that will no doubt work, it can cause a performance nightmare. So let’s set up a scenario and then break down why you have to think twice before taking the easy way out.

Scenario:

Let’s use our favorite friend AdventureWorksDW and say we want to load the FactInternetSales table, which has 8 dimension keys. Those dimensions are:

  1. DimProduct
  2. DimDate (OrderDate)
  3. DimDate (DueDate)
  4. DimDate (ShipDate)
  5. DimCustomer
  6. DimPromotion
  7. DimCurrency
  8. DimSalesTerritory

So if we created an SSIS package to load the table using the easy way described earlier, we would have a source and 8 Lookups just to handle the lookup-key portion of this package. If all I did was create my lookups as table references, I would have a data pull that looked something like this.

TableName | Number of Rows | Number of Times Queried | Row Size (KB) | Total Data Size (KB)
DimProduct | 606 | 1 | 7.79 | 4720.74
DimDate | 2191 | 3 | 0.14 | 920.22
DimCustomer | 18484 | 1 | 1.84 | 34010.56
DimCurrency | 105 | 1 | 0.11 | 11.55
DimPromotion | 16 | 1 | 2.12 | 33.92
DimSalesTerritory | 11 | 1 | 0.30 | 3.3
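The Total Data Size column is simply rows × number of times queried × row size; DimCustomer alone, for example, works out to 18,484 rows × 1 pass × 1.84 KB ≈ 34,010 KB pulled from the database on every package execution.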

However, we don’t need all the fields from these lookups; we only need the fields to join on and the field that supplies the dimension key. Each of the lookup queries should look more like this (an example query follows the list):

  1. DimProduct (ProductKey, ProductAlternateKey)
  2. DimDate (DateKey, FullDateAlternateKey)
  3. DimCustomer (CustomerKey, CustomerAlternateKey)
  4. DimPromotion (PromotionKey, PromotionAlternateKey)
  5. DimCurrency (CurrencyKey, CurrencyAlternateKey)
  6. DimSalesTerritory (SalesTerritoryKey, SalesTerritoryAlternateKey)
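For instance, the DimProduct lookup would use a query like the following rather than referencing the whole table:

-- Only the surrogate key and the business key are returned
SELECT ProductKey, ProductAlternateKey
FROM dbo.DimProduct;

The same two-column pattern applies to each of the other lookups.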

Now the data footprint will be as shown below.

TableName | Number of Rows | Number of Times Queried | Row Size (KB) | Total Data Size (KB) | Reduction in Size %
DimProduct | 606 | 1 | 0.05 | 30.3 | 99.36%
DimDate | 2191 | 3 | 0.007 | 15.337 | 98.33%
DimCustomer | 18484 | 1 | 0.03 | 554.52 | 98.37%
DimCurrency | 105 | 1 | 0.0098 | 0.049 | 99.56%
DimPromotion | 16 | 1 | 0.0078 | 0.1248 | 99.63%
DimSalesTerritory | 11 | 1 | 0.0078 | 0.0858 | 97.4%

As you can see, the reduction in size is significant, and the benefit is felt in three areas: the database engine, the network, and the SSIS engine all have to deal with less data now. So please never choose a table as your lookup source; take the time to write the query, save yourself performance headaches down the road, and you might make your DBA happy too.


One of the common questions that I see is, “How do I make a connection dynamic?” There are two scenarios where I typically see the need for dynamic connections: the first is handling different environments, and the second is programmatically changing the connection to load multiple files.

The first one is rather simple, and the best way to implement different connection properties for each environment is to use configuration files (I’m not covering the SQL 2012 options in this post). Typically what I recommend is:

  1. Create Variables to store ServerName, DatabaseName, Username and Password
  2. Enable Configuration files for your package
  3. Map the Variables to the Configuration Files

First, to set up this example, I created a package called ConnDemo that has one connection in the connection manager called DemoSQLConnection.

To create variables, first make sure you can see the Variables pane. You can do that by right clicking on the background of the Control Flow tab and selecting Variables.


Alternatively you can go to the menu bar and select View > Other Windows > Variables.


Now that you can see the Variables window, let’s create the variables for our connection.

  1. DemoSQLServer – Used to store the server name to connect to
  2. DemoSQLDatabase – Used to store the database name to connect to
  3. DemoSQLUserName – Used to store the User Name for the connection
  4. DemoSQLPassword – Used to store the Password for the connection

Make sure you always create your variables in the correct scope and with the proper data type; for this example the scope should be the package level and the type should be String. I have also put in some default values for the variables.


If you are using Windows authentication for your connection you can skip 3 and 4. (Remember, the Windows user running the package must have access to the database.)

Now that I have created the variables, I can enable package configurations and map the variables. To do this you can right click on the background and select Package Configurations, or go to the SSIS menu and select Package Configurations.



Once selected, a window should open; check the box for Enable package configurations and click Add.


For this example I am going to add an XML configuration file, name it democonfig, and place it on the C: drive.


Click Next and then add your variables to the configuration file. You might wonder why I use variables rather than just putting the connection string in the config; there are two reasons. First, sometimes you create different connection types to the same server, such as an OLE DB connection and a SQL Server Destination, and it’s easier to map both connection strings to the same variables and define the values once than it is to modify both connection strings. Second, if the password is in a variable you can encrypt that variable (I won’t be discussing how to do that in this post).


Click next and then Name the Configuration and click Finish.


One more step is to map the variables to the connection string. I prefer to do this with a variable: create a variable called DemoSQLConnectionString, and in the Properties window change EvaluateAsExpression to True and click the icon in the Expression property.


Set the Expression to:

"Data Source=" + @[User::DemoSQLServer] + ";User ID=" + @[User::DemoSQLUserName] + ";Provider=SQLNCLI10.1;Initial Catalog=" + @[User::DemoSQLDatabase] + ";Password=" + @[User::DemoSQLPassword]

And then press Evaluate Expression to test it.
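With hypothetical values in the variables (say localhost, DemoUser, DemoDB, and P@ssw0rd1), the evaluated expression would come out looking something like this:

Data Source=localhost;User ID=DemoUser;Provider=SQLNCLI10.1;Initial Catalog=DemoDB;Password=P@ssw0rd1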


One last step now: set the connection string of the connection to the variable. To do this, select the connection in the connection manager, edit the expression for the ConnectionString property, and set it to the @[User::DemoSQLConnectionString] variable.


Now you can test this by creating a Data Flow that uses this connection. If you want to make sure it’s all working correctly, go change the password or server in the configuration file, and the connection should fail.

As for the second way of dynamically setting connections, at run time, that will be part two, as this post has already gotten a little on the long side.

PART 2

SSIS Dynamic Connections Part 1

If you didn’t see part one I suggest you click the link above as I will be continuing from there.

For Part 2 we are going to show how to create a connection and then change the properties of the connection to process multiple files. So to set this up I created a Table called FilesToProcess in a database called ConnDemo.

CREATE TABLE FilesToProcess
(
    FileName nvarchar(50),
    FileLocation nvarchar(150)
)

With the following values.
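For example, it could be populated like this (the file names and paths here are purely hypothetical placeholders):

-- Hypothetical sample rows; substitute your own file names and locations
INSERT INTO FilesToProcess (FileName, FileLocation)
VALUES (N'File1.txt', N'C:\ImportFiles\'),
       (N'File2.txt', N'C:\ImportFiles\'),
       (N'File3.txt', N'C:\ImportFiles\');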

I also created a table called StgImportFiles to import the files into.

CREATE TABLE StgImportFiles
(
    Col1 NVARCHAR(50),
    Col2 NVARCHAR(50),
    Col3 NVARCHAR(50)
)

What we are going to do is query this table to get all the files to process and then load each file into the same staging table. Remember, for this to work all the files have to have the same format.

I am going to expand on the package from Part 1, change the database name in the configuration file to ConnDemo, and then do the following:

  1. Add variables to store the file location and name, as well as the result set of the query
  2. Add an Execute SQL Task to get the values from the FilesToProcess table
  3. Add a Foreach Loop Container to process the files
  4. Add a Data Flow Task to import the data

The variables I am going to add are:

  1. FileLocation – stores the location of the file
  2. FileName – stores the name of the file
  3. FileListResults – note that this must have a data type of Object, which is required to hold the result set

Now I add an Execute SQL Task, name it "Get Files To Process", set the connection to DemoSQLConnection, set the ResultSet property to Full result set, and add the following SQL statement.

select FileLocation, FileName
from FilesToProcess

Then add a result set and map it to the FileListResults variable. Your Execute SQL Task should look like the screenshot below; you must set the Result Name to 0, as that is the position of the result set.

Now we add the Foreach Loop Container, set the collection enumerator to "Foreach ADO Enumerator", and map it to the FileListResults variable. Then add the variable mappings for FileLocation and FileName; it should look like the screenshots below.

Next we add a Data Flow inside the Foreach Loop and create a File Connection Manager called ImportFiles. We will point this initially to one of the files we want to import so that we can properly set the format of the files.

Now that we have the source file, we can set the ConnectionString expression of the file connection to @[User::FileLocation] + @[User::FileName]. This is what changes the path of the file as each row of the query result goes through the Foreach Loop.
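Continuing the hypothetical values from earlier, for the first row the expression would evaluate to C:\ImportFiles\File1.txt. Note that this simple concatenation only works if FileLocation already ends with a trailing backslash (otherwise you would append one in the expression).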

The last step is to add to the Data Flow a source pointing to the ImportFiles connection manager and a destination pointing to the DemoSQLConnection with the StgImportFiles table selected. Your package should now look like the below.

If you run the package, the file contents should now be in the StgImportFiles table.

This process can be very useful if you receive multiple files in the same format that need to be imported and you don’t want to write a separate package for each file. I have used this approach where we needed to import files for over 200 customers; the layout was a standard one that we had created, and I only had to build one package to import all 200 files.

Recently I was talking with a customer and we were discussing what an ETL framework is and what a good one should have. I had to pause for a second and really think about this. Over time I have either created or been involved with creating frameworks for several projects, and as with any code base you keep tweaking them, but here is my initial list of what a good framework should have.

  1. Flexible Execution Order
  2. Ability to Restart Either from the beginning or last failure point
  3. Logging of the following
    1. Row Counts
    2. Variables
    3. Errors
    4. Duration
  4. Easy to implement
  5. Easy to maintain
  6. Ability to send alerts

So what does each of these things actually mean? Let me give a brief explanation of each one. A flexible execution order means that the order in which packages are executed should not be hardcoded or rigid; it should be changeable with nothing more than a configuration setting.

The ability to restart the ETL process is quite important. For example, if an error occurs while loading, say, a fact table, would you want to have to rerun the whole ETL? You shouldn’t have to go through the dimension table loads again when there was no issue loading them. Another example: what if you were loading 3 fact tables simultaneously and only one of them got an error? There would be no need to reload the other two fact tables, only the one that failed.

Logging, to me, is self-explanatory, as you always want insight into what is going on in your ETL process. Row counts help when there is a performance issue, because you can see whether data volume increased, and a sudden drop in row counts can indicate an issue with the source data. Knowing variable values helps when you’re trying to debug an issue and replicate what was happening in production at that time. Errors, well, you always need the details of your errors to debug properly. Finally, duration helps you track performance over time and create the performance reports that management always likes to see.
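Just as a rough illustration of the logging piece (not the framework design itself, which comes later), a single execution log table could capture all four items:

-- Rough sketch only: one row per package execution
CREATE TABLE dbo.ETLExecutionLog
(
    LogID          int IDENTITY(1,1) PRIMARY KEY,
    PackageName    nvarchar(260) NOT NULL,
    StartTime      datetime2     NOT NULL,
    EndTime        datetime2     NULL,     -- duration = EndTime minus StartTime
    RowsRead       int           NULL,     -- row counts in
    RowsWritten    int           NULL,     -- row counts out
    VariableValues nvarchar(max) NULL,     -- variable name/value pairs captured at run time
    ErrorMessage   nvarchar(max) NULL      -- error details if the execution failed
);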

Easy to implement and maintain are vital to ensuring that the framework will have a long shelf life and that shortcuts won’t be taken to avoid using it. We all know developers are always looking for an excuse to cut corners and do things their own way.

The ability to send alerts is important because we all know how much we love getting woken up at 2 AM when the ETL has encountered an error. More importantly, if there is an issue, the last thing you want is to be the last to know and walk into work in the morning with someone asking you, “Why isn’t the data loaded?”

So the next question is how do you design a framework to handle all of this? Well, that will be my next post! I can’t give away everything at once; I have to make you want to come back.