MS Ignite Day #2 Recap

Day 2 was my opportunity to focus and dive into a few technologies that were really piquing my interest, namely Azure Synapse Analytics and SQL Server 2019. Ever since I heard the Synapse announcement I wanted to seek out every session on it and learn everything I possibly could. Today certainly went a long way toward helping me understand this new technology, and it also made me realize there is so much more to learn, but there are still 3 more days left.

Everything I keep learning about Azure Synapse is certainly impressive, and I will stand by the statement that Snowflake, Google BigQuery and Amazon Redshift had better be paying attention, because the complete integrated story, performance benchmarks and cost model have closed some big gaps that previously existed with Azure Data Lake and Azure SQL DW. One of the sessions I had a chance to attend was “Democratizing the Data Lake with On-Demand Capabilities in Azure Synapse Analytics” with Josep Aguilar Saborit and Charles Feddersen, and I came away with two very important takeaways. First was the ease of querying and importing data: in a nutshell, get the data into Azure Data Lake Storage and it’s available. It also seems that Microsoft and most of the industry are standardizing on Parquet as the file format of choice. If you are not familiar with Parquet, it is a columnar format that almost any file can easily be converted into, and Azure Data Factory can really help here. Parquet files are first-class citizens within Azure Synapse, and querying or importing this data is almost effortless.
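
To give a sense of what “get it into the data lake and it’s available” looks like, here is a minimal sketch of querying Parquet files with the on-demand (serverless) SQL endpoint. The storage account and path are made up for illustration; only the OPENROWSET pattern is the point.

SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/sales/orders/*.parquet',  -- hypothetical ADLS Gen2 path
        FORMAT = 'PARQUET'
    ) AS orders;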

Now let’s talk a little bit about the architecture behind Synapse. While Azure Synapse can be thought of as Azure SQL DW 2.0, you need to take everything you may already know about Azure SQL DW architecture and throw it out the window. The most important thing to remember is that the architecture behind this new service has been reinvented from the ground up with a cloud-first design. If you remember, Azure SQL DW was not a cloud-first design; it was essentially SQL Server PDW (its on-premises predecessor) ported into the cloud with very little architectural change.

Now let me caveat the next few statements by saying I still have a lot to learn about the Azure Synapse architecture, but here are a few items I have picked up on that have really impressed me. Both the failure detection and the hotspot detection baked into the solution are quite impressive. First, understand that queries are natively treated as distributed and align with both hash and user partitions across the execution nodes; see the picture below (sorry, these are photos taken during the presentation).

With failure detection handled at the node level, any failure at a node can be automatically recovered and its work redistributed to the other active nodes. That’s right: unless for some reason you lose all nodes, your queries will not fail. If you are familiar with the more legacy architectures, you know this was not always the case.

Now for hotspot detection, imagine a scenario where a query task was partitioned in a way that left one node doing a skewed amount of the work. Synapse will recognize this skewed load and automatically redistribute it to other nodes that are currently underutilized. And I am not done with the special sauce: if multiple nodes become overloaded and redistribution can’t help, additional nodes can simply be added to keep performance up.

Now let’s move on to SQL Server 2019 and the new features in Azure SQL, which I am going to lump together. This is not a comprehensive list of all the new features, just some things I wanted to call out. Let’s start with Azure SQL Serverless: if you are looking for a way to use Azure SQL without incurring a crazy cost, take a look at Azure SQL Serverless for scenarios such as Dev/Test environments (I’m sure there are other use cases, but this is what I am focusing on for now). Typically, Dev/Test environments don’t have a requirement to be up 100% of the time, so you don’t want the cost of running a database 24/7. This is where serverless comes into play, as you are only charged for the time you are actually running queries. I will go out on a limb and say this could easily be a 40-50% cost savings for existing Dev/Test environments that are running on Azure SQL Database today and incurring 24/7 costs.
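
As a rough sketch, moving an existing Dev/Test database to the serverless compute tier can be done in T-SQL along these lines. The database name and the GP_S_Gen5_1 service objective are placeholders I picked for illustration, so check the current serverless service objective names before running anything like this.

-- Sketch: move a Dev/Test database to the serverless compute tier
-- (database name and service objective are illustrative placeholders)
ALTER DATABASE [DevTestDb]
    MODIFY (EDITION = 'GeneralPurpose', SERVICE_OBJECTIVE = 'GP_S_Gen5_1');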

One more feature set I want to quickly hit on is Intelligent Query Processing. There are quite a number of areas of improvement here, so let’s just hit on a quick few. The first is memory grant feedback: if the memory grant on the initial execution was too small and caused a spill to disk, subsequent executions will have their grants adjusted, which reduces spills and improves query execution time. Table variable deferred compilation now compiles a table variable much the way a temp table has always been handled, so its cardinality is no longer assumed to be a row count of 1. While I am not a huge table variable fan, if you do have a workload that is very table variable heavy, simply upgrading to SQL Server 2019 should result in performance gains.
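
A quick way to see table variable deferred compilation in action is a sketch like the following. It assumes the AdventureWorks sample database; the key point is that the behavior only lights up under compatibility level 150.

-- Table variable deferred compilation requires compatibility level 150 (SQL Server 2019)
ALTER DATABASE AdventureWorks SET COMPATIBILITY_LEVEL = 150;
GO

DECLARE @Orders TABLE (SalesOrderID int PRIMARY KEY);

INSERT INTO @Orders
SELECT SalesOrderID FROM Sales.SalesOrderHeader;

-- Under 150 the optimizer waits for the actual row count instead of assuming 1 row,
-- which usually produces a much better join strategy for this query.
SELECT d.*
FROM @Orders o
JOIN Sales.SalesOrderDetail d ON d.SalesOrderID = o.SalesOrderID;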

One last note is on Azure Data Studio, as I had a chance to see Alan Yu’s session on Azure Data Studio focused on notebooks. If you are not using Azure Data Studio and notebooks, hurry up and start learning; you will thank me later. Plus, Azure Data Studio now has PowerShell notebooks and is becoming a one-stop shop for almost any database developer. Check out the Azure Data Studio November release.

SQL Saturday NYC Update #4

SQL Saturday NYC is going full steam ahead and we have finalized our 2 precons for Friday, May 29th. Before I announce what they are, I first want to say thank you to everyone who submitted; it certainly made the choice very difficult.


Precon #1: Bradley Ball will be doing SQL Server Performance Tuning. Check out the full event description here, and hurry, we only have 40 available spots.


Precon #2: Allan Hirt will be doing The Good, Bad and Ugly of AlwaysOn Availability Groups; you can check out the full description here. We only have 30 seats for this class and a great early bird offer of $200 off the regular price, bringing it to $299, that ends May 1st, so don’t delay.


On another note, we have gained two new sponsors, New Horizons and Profisee, so check out their sites and see what they have to offer.

SQL Saturday NYC Update #3

SQL Saturday NYC planning is going well and we are now getting ready to finalize our precons; 2 of them will run on Friday, May 29th, and we have had many submissions so far. We are taking submissions until 1/15/2015, so there is just under a week left to email them to sqlsaturday380@sqlsaturday.com. Our attendance numbers have been doing very well: we are now up to 103 people registered and 118 sessions submitted. Session selection will surely be tough with so many good sessions submitted. We will have a great lineup of speakers and I encourage everyone to register ASAP.

Thank you to all the sponsors so far, we now have 5. Anyone thinking of sponsoring should hurry, as we only have 2 gold sponsor spots left, which guarantee you a slot for the lunch sessions.


Now on to planning swag and speaker shirts.

Table Variables, CTEs and Temp Tables: Which one should I use?

There are several really good blogs that already exist explaining the differences between table variables, Common Table Expressions (CTEs) and temp tables, and here are a few.

However, once you know the differences, how do you decide which one to use? I know there is the always dreaded “it depends,” an answer that I truly hate, so instead I decided to put together some best practices to serve as a guideline on when to use what.

CTE

In general, what I like to say is: only use a CTE if you need recursion, and yes, I do realize there is almost no difference between a CTE and a derived table. I say only use one if you need recursion because there are cases where referencing a CTE in multiple joins can create a really nasty execution plan.
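
For reference, here is the classic recursion case a derived table can’t handle, walking a parent/child hierarchy. The dbo.Employee table and its columns are made up for illustration.

-- Sketch: recursive CTE walking a hypothetical dbo.Employee hierarchy
WITH OrgChart AS
(
    SELECT EmployeeID, ManagerID, 0 AS Depth
    FROM dbo.Employee
    WHERE ManagerID IS NULL                              -- anchor member: top of the hierarchy

    UNION ALL

    SELECT e.EmployeeID, e.ManagerID, oc.Depth + 1
    FROM dbo.Employee e
    JOIN OrgChart oc ON e.ManagerID = oc.EmployeeID      -- recursive member
)
SELECT EmployeeID, ManagerID, Depth
FROM OrgChart
OPTION (MAXRECURSION 100);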

Table Variable

Use a Table Variable in the following scenarios:

  1. Small tables
  2. Data is only inserted once
  3. Data has no need to be updated
  4. Known number of rows that doesn’t change

One of the reasons for the above recommendations is that a table variable is assumed to have a low number of rows, usually 1, when execution plans are generated, which can cause cardinality issues. If you read Ben Snaidero’s article above you will also see the performance hit for updates against table variables.
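
You can see the cardinality assumption for yourself with a quick sketch like the one below; the dbo.SalesOrders table is hypothetical, and any table with a few thousand rows will do.

-- Sketch: comparing optimizer estimates for a table variable vs. a temp table
DECLARE @tv TABLE (OrderID int PRIMARY KEY);
INSERT INTO @tv SELECT OrderID FROM dbo.SalesOrders;

CREATE TABLE #tt (OrderID int PRIMARY KEY);
INSERT INTO #tt SELECT OrderID FROM dbo.SalesOrders;

-- With the actual execution plan turned on, the scan of @tv shows an estimate of 1 row,
-- while the scan of #tt has statistics and shows a realistic estimate.
SELECT * FROM @tv;
SELECT * FROM #tt;

DROP TABLE #tt;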

Temporary Table

Use a Temporary Table in the following scenarios:

  1. Large tables
  2. Need for additional DDL, such as an index
  3. Need to access data in child stored procedures
  4. Need statistics
  5. Data needs to be inserted multiple times or updated
  6. Unknown Data size

Temporary tables, as you can see, have a much broader use and are more of a catch-all for everything that doesn’t fall into the CTE or table variable bucket. One thing to remember is that if you have a data set that needs to be generated once and reused across a process spanning multiple stored procedures, a temp table created in a parent procedure can be used in the child stored procedures in the execution tree. This is a great way to generate temporary data only once and access it in multiple stored procedures; a sketch of the pattern is below.
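
Here is a minimal sketch of that parent/child pattern; the procedure and table names are made up, and the source table assumes AdventureWorks.

-- Sketch: a temp table created in a parent procedure is visible to its child procedures
CREATE PROCEDURE dbo.Child_SummarizeOrders
AS
BEGIN
    -- #WorkOrders was created by the caller and is still in scope here
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM #WorkOrders
    GROUP BY CustomerID;
END
GO

CREATE PROCEDURE dbo.Parent_ProcessOrders
AS
BEGIN
    CREATE TABLE #WorkOrders (OrderID int PRIMARY KEY, CustomerID int NOT NULL);

    INSERT INTO #WorkOrders (OrderID, CustomerID)
    SELECT SalesOrderID, CustomerID
    FROM Sales.SalesOrderHeader;              -- generate the data once

    EXEC dbo.Child_SummarizeOrders;           -- the child reads the same #WorkOrders
END
GO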

Conclusion

Remember that these are just best practices, in other words a starting point, and one that can generally be applied regardless of the skill set of the SQL developer. As with anything, there are always times when the data will call for an exception to the rule, and that is when you go outside the best practices.

SQL Saturday NYC Update #2

It has been just over 2 weeks since SQL Saturday NYC went live and planning is in full force. An update on the numbers: 65 people are now registered to attend, so if you haven’t already, make sure you get your spot. We just got our newest sponsor, and our first at the Silver level, so a big thank you to Pragmatic Works. New sessions also keep coming in almost every day and we now have 89 sessions submitted. One big update is on the precons: we have decided to do 2 of them on Friday, May 29th, 2015, so be sure to send your precon requests to sqlsaturday380@sqlsaturday.com before 12/31/2014 so we can start confirming.

A tip to anyone else planning a SQL Saturday: OneDrive is a great way to share documents with your committee.

How Functions function

If you are going to use functions, you should first understand how they work, as they are often overused in database applications. Personally, I will start out by saying that I only use functions when I absolutely have no other choice. This is something I have seen many different opinions on, so let’s just start by taking a look at the different types of functions and how they work.

In SQL Server there are several different types of functions, so let’s take a look at what they are:

  • Table-Valued Functions (these are functions that return a table or dataset)
    • Inline (Contains a single T-SQL Statement)
    • Multi-Statement (Contains multiple T-SQL Statements and other functions such as Cursors and While loops)
  • Scalar-Valued Functions (Returns a single value)


There are many different reasons developers use functions, such as repeatable code, generic datasets, common logic, etc. These are all good reasons when functions are used properly. To understand what “properly” means, let’s start by taking a look at how functions work.

Table Valued Functions

There is a significant difference between inline and multi-statement functions when it comes to performance. To help show some of these differences I created a few functions, as well as a view and a stored procedure, which you can download here and which run against the AdventureWorks database. Let’s first take a look at what happens when we call the two functions SALES.GetSalesOrderDetail() and SALES.GetSalesOrderDetail_Multi(). If you SET STATISTICS IO ON and SET STATISTICS TIME ON, you will notice that the multi-statement version takes almost double the CPU time (hint: also run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE to ensure data isn’t cached between tests). Most of this overhead is in the insert into the function’s return table, which is unavoidable overhead for multi-statement functions. A sketch of what these two functions might look like is below.
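
The download isn’t reproduced here, so as an assumption this is roughly the shape of the two flavors: the inline version is a single SELECT, while the multi-statement version has to declare and populate a return table.

-- Sketch (assumed shapes, not the exact downloaded scripts)

-- Inline table-valued function: a single SELECT, no return table to populate
CREATE FUNCTION SALES.GetSalesOrderDetail ()
RETURNS TABLE
AS
RETURN
(
    SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, LineTotal
    FROM Sales.SalesOrderDetail
);
GO

-- Multi-statement table-valued function: declares a return table and inserts into it
CREATE FUNCTION SALES.GetSalesOrderDetail_Multi ()
RETURNS @Result TABLE
(
    SalesOrderID       int,
    SalesOrderDetailID int,
    ProductID          int,
    OrderQty           smallint,
    LineTotal          numeric(38, 6)
)
AS
BEGIN
    INSERT INTO @Result (SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, LineTotal)
    SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, LineTotal
    FROM Sales.SalesOrderDetail;

    RETURN;
END
GO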

Now, before I show comparisons to views and stored procedures, let’s take a look at one other significant difference between these two functions, which shows up when we use them in a join. For this test I am going to execute the following two commands:

SELECT sod.*
FROM SALES.GetSalesOrderDetail() sod
join sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID

SELECT sod.*
FROM SALES.GetSalesOrderDetail_Multi() sod
join sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID

If you take a look at the execution plans, you will notice a significant difference with the inline function: its SQL can be expanded into the outer query and joined using a Merge Join operator, and the estimated row count is fairly accurate. With the multi-statement function, however, the SQL cannot be expanded; the function is applied via Nested Loops and Sequence operators, and the row count estimate is inaccurate, fixed at 100. If you also look at the CPU time for the multi-statement function used in a join, you will see about 3x the cost.

Now, just for comparison, run a similar query, this time using the view.

SELECT sod.*
FROM SALES.SalesOrderDetail_View sod
join sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID

You will notice the same execution plan as the inline function, but on average you should see about half the CPU cost with the view compared to the function.
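
For completeness, the view referenced above comes from the same download; presumably it is nothing more than the same SELECT, something like:

-- Sketch (assumed shape of the downloaded view)
CREATE VIEW SALES.SalesOrderDetail_View
AS
SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, LineTotal
FROM Sales.SalesOrderDetail;
GO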

Scalar-Valued Functions

Scalar-valued functions operate very similarly to multi-statement table-valued functions in that they require significant CPU cycles, and the typical migration path away from scalar functions is to use inline T-SQL. There are multiple ways to do this, but let’s take a look at the following example; you can download the script to create the function here.

This is a simple function that just concatenates some fields to format an address. If we call the function versus doing the same logic inline, you will see almost 10x the CPU cost for the function. See the examples below.
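
The create script sits behind the download link, but judging from the usage below, a scalar function like Person.FormatAddress presumably looks something like this (the parameter types and the internal lookup are my assumptions):

-- Sketch (assumed shape, not the exact downloaded script)
CREATE FUNCTION Person.FormatAddress
(
    @AddressLine1    nvarchar(60),
    @AddressLine2    nvarchar(60),
    @City            nvarchar(30),
    @StateProvinceID int,
    @PostalCode      nvarchar(15)
)
RETURNS nvarchar(200)
AS
BEGIN
    DECLARE @StateProvinceCode nchar(3);

    -- A per-row lookup like this is a big part of why the scalar function is so expensive
    SELECT @StateProvinceCode = StateProvinceCode
    FROM Person.StateProvince
    WHERE StateProvinceID = @StateProvinceID;

    RETURN COALESCE(@AddressLine1, '') + ' '
         + COALESCE(@AddressLine2, '') + ' '
         + COALESCE(@City, '') + ' , '
         + COALESCE(@StateProvinceCode, '') + ' '
         + COALESCE(@PostalCode, '');
END
GO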

SELECT
    Person.FormatAddress(AddressLine1,AddressLine2,City,StateProvinceID,PostalCode) AS FormatAddress
FROM
    Person.Address;

GO

SELECT
    COALESCE(AddressLine1, '') + ' '
    + COALESCE(AddressLine2, '') + ' '
    + COALESCE(City, '') + ' , '
    + COALESCE(st.StateProvinceCode, '') + ' '
    + COALESCE(PostalCode, '')
FROM
    Person.Address Addr
JOIN
    Person.StateProvince St
ON
    Addr.StateProvinceID = st.StateProvinceID;


Wrap-up

So what does all this mean? In general, if you want to ensure your database isn’t burning unnecessary CPU cycles, avoid using functions. In almost every performance tuning engagement I have worked where there was a CPU utilization issue, we were able to find offending functions as the cause. Functions are, for lack of a better term, CPU hogs, and the more operations there are within the function, the worse it gets. So if you already have functions, try to migrate them to inline T-SQL. In the case of table-valued functions, see if you can accomplish the same thing via a stored procedure, or if you have to use a function, at least make sure it is an inline table-valued function.

SQL Saturday NYC Update 1

It has been one week since SQL Saturday NYC was announced and the planning committee and I are very pleased. We have 50 people registered to attend as of noon today, and we have our first Gold sponsor, Rehfeld, and our first Bronze level sponsor, Ntirety. The speaker community has also come out with great support and submitted 71 sessions, and I encourage everyone to keep them coming. If anyone is interested in doing a precon, please let me (AnotherSQLGeek@gmail.com) or Robert Pearl (PearlKnows@yahoo.com) know, as we are also evaluating doing two possible precons.

One lesson learned right away was to set up a couple of folders and rules in my email, as I have been flooded with hundreds of emails; if I haven’t responded to you yet I certainly will, as I have been a little busy this past week. On top of SQL Saturday being announced, my wife and I welcomed our newest addition, James A Phillips Jr., on Wednesday, and everyone is very healthy.


SQL Saturday NYC

I am excited to say that PASS SQL Saturday NYC is back on the schedule and will be held May 30, 2015 at the Microsoft Times Square office. It is something we have been talking about for some time now at the NJSSUG. It took some time to get the right people together, and I have to thank Robert Pearl, Simon Facer, Erik Darling, Karla Landrum, Alec Lazarescu and Michael Wells.

If you haven’t attended a SQL Saturday before, come on out and join us for a free day of training, networking and the all-important #SQLFamily. I encourage everyone to register to attend, and if you want to speak, here is the Call For Speakers link. Stay tuned for more announcements on speakers and possible precons. If anyone has any topics they would like to see at a precon, please contact me at AnotherSQLGeek@gmail.com or Robert Pearl at PearlKnows@yahoo.com.

Dating Tips Part 1

This is the first in a series of dating tips, so if you need some, you’ve come to the right place.


Well, OK, tips about how to use dates in a Data Warehouse, Power Pivot, and Analysis Services, both Tabular and Multidimensional.

So here in Part 1 we are going to start with the basics and cover the date table. The date table is where all the magic happens, and without it our dates are pretty meaningless.

Why is the table so important?

  1. Without a date table almost all Time Intelligence functions will not work
    1. Year to Date
    2. Rolling Quarters
    3. Parallel Period
    4. Etc.
  2. Can’t build Date hierarchies such as (Year, Quarter, Month, Date)
  3. Will not be able to support Fiscal Calendars

Here is a script that will generate a date table for you. I am populating the table using a recursive CTE, which does have a limit of 32,767 dates (the MAXRECURSION cap); a sketch of the approach is shown below the variable list.

There are also 3 variables whose values you need to set:

@StartDate default is 1/1/1999

@EndDate default is 1/1/2050

@FYStartDate default is 4/1/1999; its year must fall in the same fiscal year as @StartDate.
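
Since the script itself isn’t embedded in this post, here is a minimal sketch of the recursive CTE approach using those three variables. The column list is trimmed way down, the target table name dbo.DateTable is a placeholder, and labeling the fiscal year by its starting calendar year is an assumed convention.

-- Sketch: build a basic date table with a recursive CTE
DECLARE @StartDate   date = '1999-01-01';
DECLARE @EndDate     date = '2050-01-01';
DECLARE @FYStartDate date = '1999-04-01';   -- must fall in the same fiscal year as @StartDate

DECLARE @FYMonthOffset int =
    DATEDIFF(MONTH, DATEFROMPARTS(YEAR(@FYStartDate), 1, 1), @FYStartDate);

WITH Dates AS
(
    SELECT @StartDate AS DateValue
    UNION ALL
    SELECT DATEADD(DAY, 1, DateValue)
    FROM Dates
    WHERE DateValue < @EndDate
)
SELECT
    DateValue                                        AS [Date],
    YEAR(DateValue)                                  AS CalendarYear,
    DATEPART(QUARTER, DateValue)                     AS CalendarQuarter,
    MONTH(DateValue)                                 AS CalendarMonth,
    YEAR(DATEADD(MONTH, -@FYMonthOffset, DateValue)) AS FiscalYear   -- labeled by starting year
INTO dbo.DateTable                                   -- placeholder target table
FROM Dates
OPTION (MAXRECURSION 32767);                         -- the 32,767-date ceiling mentioned above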