Database DevOps – Where do I start?

By now you have most likely heard the term DevOps, and you are most likely working for a company that has implemented DevOps in some manner, whether for infrastructure, applications or databases, to name a few. However, Database DevOps is one of the least implemented processes in the entire DevOps stack as of the time of this post.

So why is database DevOps lagging so far behind?

Well, for the longest time the answer was simple: compared to AppDev and Infrastructure, the tooling around Database DevOps was almost nonexistent or seriously lacking in functionality and performance. That is no longer the case, though, as there are quite a number of tools on the market now. But before I begin shamelessly plugging tools, let's get back to the title of this post.

Where do I start?

This seems like a rather easy question on the surface, but is it really? Of course it's not; otherwise, why would I be writing this? So let's start with what I call the 5 Keys to DevOps success.

  1. Buy-in from Upper Management
  2. Collaboration with internal teams
  3. Proven Tooling
  4. Training, Training and more Training
  5. Thick skin (You are going to face resistance)

Let’s touch on each of these briefly.

Buy-In from Upper Management

You will have obstacles, and things won't go 100% smoothly, trust me. This is where you need buy-in from all the way up, so that when you do hit that bump in the road management isn't running for the hills saying "I told you so." I personally imagined some of my developers singing their own version of Hamilton's "Washington on Your Side," but as "It must be nice to have the CTO on your side."

Collaboration with internal teams

In my experience many developers still have not worked within a Database DevOps framework, so collaborating with internal teams to understand their needs and create a collaborative environment will be crucial to getting total buy-in from everyone. One of the most important teams to collaborate with is your DBA team, since they hold the keys to the castle.

Proven Tooling

Make sure that whichever tool you choose has a proven track record of success and good vendor support behind it. When the proverbial hits the fan, you want someone to call who can help work through any issues. If you do choose to build it out on your own, just be prepared and have a plan for providing support.

Training, Training and more Training

Provide your team with training, and not just generic training: training that is specific to the tooling and processes you are putting in place. I learned the hard way that generic training on source control or build tools just will not cut it. Design training sessions that will answer the specific questions people have about the processes you are implementing.

Thick Skin

This might be the most important point of all. You will need thick skin, because you will get people fighting you, trying to go around the process, and in some cases straight up sabotaging the project. Use the QTIP (Quit Taking It Personally) approach: let it roll off your back and keep fighting the good fight.

What comes next?

Now we move on to some of the slightly more technical things you want to make sure you handle. While culture and process are a large part of the first steps and of overall success, there are some items you want to take care of from the technical side too. Most of my experience is in launching DevOps for Microsoft SQL Server, but the items I list here are applicable to many different platforms. So here we go!

  1. Removing invalid objects

In many cases where there are no DevOps or standardized procedures in place, invalid objects start getting left behind. If you are not familiar with this concept, it's when functions, stored procedures, views, etc. become invalid over time because they are not being actively used while the underlying tables change. This leaves you in a state where you have invalid objects in your database, and these will cause you a huge headache when implementing Database DevOps, so get rid of them before you start.
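One way to hunt these down (a sketch, assuming SQL Server 2008 or later where the `sys.sql_expression_dependencies` catalog view is available) is to look for modules that reference objects that no longer resolve:

```sql
-- Sketch: list modules whose referenced objects can no longer be resolved.
-- referenced_id IS NULL means SQL Server could not bind the reference;
-- is_ambiguous = 0 filters out references it merely could not disambiguate.
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS SchemaName,
       OBJECT_NAME(d.referencing_id)        AS ObjectName,
       d.referenced_entity_name             AS MissingReference
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_id IS NULL
  AND d.is_ambiguous = 0;
```

Treat the output as a list of candidates to review rather than a definitive drop list, since unresolved references can also come from cross-database references or deferred name resolution.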

  2. Choose your branching strategy

    A branching strategy is important, and there is no single right one that fits all situations. First you need to understand the business process behind your promotion process. For example, if your database is single tenant, supports a single application and has iterative releases, then a single-branch git-flow model will most likely work for you. If you have a multi-tenant database with non-iterative releases and multiple applications, then a multiple fixed-branch approach that allows you to cherry-pick your releases may be appropriate.
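As a tiny illustration of the cherry-pick style of promotion (a sketch with made-up branch, file and commit names, assuming plain git), the fixed release branch takes only the commit it needs:

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.email demo@example.com && git config user.name demo

echo "baseline" > schema.sql && git add . && git commit -qm "baseline"
git branch release/tenant-a                     # fixed branch for one tenant

echo "wanted migration" >> schema.sql && git commit -qam "migration 1"
sha=$(git rev-parse HEAD)                       # the change tenant-a needs
echo "unwanted migration" >> schema.sql && git commit -qam "migration 2"

git checkout -q release/tenant-a
git cherry-pick "$sha"                          # pick migration 1, skip migration 2
grep -q "wanted migration" schema.sql && ! grep -q "unwanted" schema.sql && echo "cherry-pick ok"
```

The release branch ends up with the baseline plus only the cherry-picked migration, which is exactly the behavior a multi-tenant, non-iterative release process needs.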

  3. Create a baseline

    I see this step skipped in many implementations, and it almost always leads to a lot of sleepless nights. Creating a baseline of your database and making sure that baseline is valid is extremely important. The easiest way I have found to do this is to reverse engineer your database, put it into source control, then deploy it to a blank database. This will tell you quickly whether your baseline is valid.
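As one way to do this (a sketch using Microsoft's sqlpackage CLI; server and database names are placeholders, and schema-compare tools from other vendors work just as well):

```shell
# Extract the schema of the existing database into a dacpac...
sqlpackage /Action:Extract /SourceServerName:prod-sql01 \
    /SourceDatabaseName:Sales /TargetFile:Sales-baseline.dacpac

# ...then publish it to an empty database; if the publish fails,
# the baseline is not valid and needs fixing before you go further.
sqlpackage /Action:Publish /SourceFile:Sales-baseline.dacpac \
    /TargetServerName:build-sql01 /TargetDatabaseName:Sales_BaselineCheck
```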

  4. Choose the right IDE/Toolset

    Choose an IDE that your team is familiar with, and then find the DevOps framework that works best with that IDE. I know this may not be the popular opinion on the subject, but in my experience not making the teams switch IDEs is critical for adoption and success. For example, if you use SQL Server and most of your team is very familiar with SSMS as your IDE, then take a look at Redgate's automated deployment tools. I am only recommending this as a toolset that I have implemented with great success.

That wraps up this post. If you want to learn more, feel free to contact me or check out this video where I talk about Database DevOps.

MS Ignite Day #2 Recap

Day 2 was my opportunity to be focused and dive into a few technologies that were really piquing my interest, namely Azure Synapse Analytics and SQL Server 2019. Since I heard the Synapse announcement I have wanted to seek out every session on it and learn everything I possibly could, and today certainly went a long way toward helping me understand this new technology. I also realized there is so much more to learn, but there are still 3 more days left.

Everything I keep learning about Azure Synapse says it is certainly impressive, and I will stand by the statement that Snowflake, Google BigQuery and Amazon Redshift had better be paying attention, because the complete integrated story, performance benchmarks and cost model have closed some big gaps that previously existed with Azure Data Lake and Azure SQL DW. One of the sessions I had a chance to attend was "Democratizing the Data Lake with On-Demand Capabilities in Azure Synapse Analytics" with Josep Aguilar Saborit and Charles Feddersen, and I came away with two very important takeaways. First was the ease of querying and importing data; in a nutshell, get the data into Azure Data Lake Storage and it's available. With that said, it seems that Microsoft and most of the industry are standardizing on parquet as the file format of choice. If you are not familiar with it, parquet is a format you can get almost any file into, and Azure Data Factory can really help here. Parquet files are first-class citizens within Azure Synapse, and querying or importing this data is almost effortless.
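To give a feel for that ease, here is a sketch of the on-demand OPENROWSET pattern for reading parquet straight out of the lake (the storage account and path are placeholders):

```sql
-- Query parquet files sitting in the data lake directly, no load step required.
SELECT TOP 10 *
FROM OPENROWSET(
         BULK 'https://mylake.dfs.core.windows.net/raw/sales/*.parquet',
         FORMAT = 'PARQUET'
     ) AS sales_rows;
```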

Now let's talk a little bit about the architecture behind Synapse. While Azure Synapse can be thought of as Azure SQL DW 2.0, you need to take everything you may already know about Azure SQL DW architecture and throw it out the window. The most important thing to remember is that the architecture behind this new service has been reinvented from the ground up with a cloud-first design. If you remember, Azure SQL DW was not a cloud-first design; it was essentially SQL Server PDW ported into the cloud with very little architectural change.

Now let me caveat the next few statements: I still have a lot to learn about the Azure Synapse architecture, but here are a few items that I have picked up on and been very impressed by. Both the failure detection and the hotspot detection baked into the solution are quite impressive. First, understand that queries are natively treated as distributed and aligned with both hash and user partitions across the execution nodes; see the picture below (sorry, these are pictures taken during presentations).

With failure detection handled at the node level, any failure of a node can be automatically recovered and its work redistributed to the other active nodes. That's right: unless for some reason you lose all nodes, your queries will not fail. If you are familiar with more legacy architectures, you know that this was not always the case.

Now for hotspot detection: imagine a scenario where a query task was partitioned in a way that left one node doing a skewed amount of the work. Synapse will recognize this skewed load and automatically redistribute it to other nodes that are currently underutilized. And I am not done with the special sauce: if multiple nodes become overloaded and redistribution can't be done, additional nodes can simply be added to keep performance up.

Now let's move on to SQL Server 2019 and new features in Azure SQL, which I am just going to lump together. This is not a comprehensive list of all the new features, just some things I wanted to call out. Let's start with Azure SQL Serverless. If you are looking for a way to use Azure SQL without incurring a crazy cost, take a look at Azure SQL Serverless for scenarios such as Dev/Test environments (I'm sure there are other use cases, but this is what I am focusing on for now). Typically, Dev/Test environments don't need to be up 100% of the time, so you don't want the cost of running a database 24/7. This is where serverless comes into play, as you are only charged for the time you are actually running queries. I will go out on a limb and say this could easily be a 40-50% cost saving for existing Dev/Test environments running on current Azure SQL Database and incurring 24/7 costs.
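To make the limb I am going out on concrete, here is a back-of-the-envelope sketch (the hourly rates are illustrative assumptions, not Azure list prices; serverless compute is assumed to bill at a premium per active hour):

```python
# Rough comparison of an always-on database vs. serverless billed per active hour.
HOURS_PER_MONTH = 730
ACTIVE_HOURS = 12 * 22            # dev/test box busy ~12 h on ~22 weekdays
PROVISIONED_RATE = 0.50           # assumed $/hour for always-on compute
SERVERLESS_RATE = 0.75            # assumed $/hour, higher per-hour serverless rate

always_on = HOURS_PER_MONTH * PROVISIONED_RATE      # billed every hour
serverless = ACTIVE_HOURS * SERVERLESS_RATE         # billed only while active
savings = 1 - serverless / always_on
print(f"always-on ${always_on:.0f}/mo, serverless ${serverless:.0f}/mo, savings {savings:.0%}")
```

With those assumed rates the savings land in the 40-50% range; your own active-hours profile is what really decides it.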

One more feature set I want to quickly hit on is Intelligent Query Processing. There are quite a number of areas of improvement under this umbrella, so let's just hit on a quick few. The first is memory grant feedback: if the memory grant on the initial execution was too small and caused a spill to disk, subsequent executions will be adjusted to properly size the grant, reducing spills and improving query execution time. Table variable deferred compilation now treats a table variable much the way a temp table has always been treated, so query cardinality is no longer assumed to be a row count of 1. While I am not a huge table variable fan, if you do have a workload that is very table-variable heavy, then simply upgrading to SQL Server 2019 should result in performance gains.
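A sketch of what deferred compilation changes (assumes database compatibility level 150; the load query is just an easy way to get a non-trivial row count):

```sql
-- Under SQL Server 2019's deferred compilation, the final SELECT is compiled
-- only after @Orders has been loaded, so its cardinality estimate reflects
-- the actual row count instead of the old fixed guess of 1.
DECLARE @Orders TABLE (OrderID int NOT NULL PRIMARY KEY);

INSERT INTO @Orders (OrderID)
SELECT object_id FROM sys.objects;      -- load a non-trivial number of rows

SELECT o.OrderID
FROM @Orders AS o;                      -- estimate now tracks the real load above
```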

One last note is on Azure Data Studio, as I had a chance to see Alan Yu's session on Azure Data Studio focused on notebooks. If you are not using Azure Data Studio and notebooks, hurry up and start learning; you will thank me later. Plus, Azure Data Studio now has PowerShell notebooks and is becoming a one-stop shop for almost any database developer. Azure Data Studio November release

SQL Saturday NYC Update #4

SQL Saturday NYC is going full steam ahead, and we have finalized our 2 precons for Friday, May 29th.  Before I announce what they are, I want to first say thank you to everyone who submitted; it certainly made the choice very difficult.


Precon #1:  Bradley Ball will be doing SQL Server Performance Tuning; check out the full event description here, and hurry, as we only have 40 available spots.


Precon #2: Allan Hirt will be doing The Good, Bad and Ugly of AlwaysOn Availability Groups; you can check out the full description here.  We only have 30 seats for this class, and we have a great early-bird offer of $200 off the regular price, at $299, that ends May 1st, so don't delay.


On another note, we have gained a few new sponsors in New Horizons and Profisee, so check out their sites and see what they have to offer.

SQL Saturday NYC Update #3

SQL Saturday NYC planning is going well, and we are now getting ready to finalize our precons, 2 of which are on Friday, May 29th; we have had many submissions so far.  We are taking submissions until 1/15/2015, so there is just under a week left to email them in.  Our attendance numbers have been doing very well: we are now up to 103 people registered and 118 sessions submitted.  The session selection will surely be a tough one with so many good sessions submitted.  We will have a great lineup of speakers, and I encourage everyone to register ASAP.

Thank you to all sponsors so far, as we now have 5. Anyone thinking of sponsoring should hurry, as we only have 2 gold sponsor spots left, which guarantee you a slot for the lunch sessions.


Now on to start planning Swag and Speaker Shirts.

Table Variables, CTE and Temp Tables. Which one should I use?

There are several really good blogs that already exist explaining the differences between Table Variables, Common Table Expressions (CTEs) and Temp Tables, and here are a few.

However, once you know the differences, how do you decide which one to use?  Now, I know there is the always dreaded "it depends," an answer that I truly hate, so instead I decided to put together some best practices to serve as a guideline on when to use what.


In general, I like to say only use a CTE if you need recursion, and yes, I do realize that there is almost no difference between a CTE and a derived table.  I say only use one if you need recursion because there are cases where joining a CTE multiple times can create a really nasty execution plan.
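For reference, the recursion case where a CTE really is the natural tool looks like this (a sketch against a made-up employee/manager table):

```sql
-- Walk a hierarchy: the anchor member picks the roots, and the recursive
-- member joins each level back to the CTE until no more rows qualify.
WITH Org AS (
    SELECT EmployeeID, ManagerID, 0 AS Depth
    FROM dbo.Employee
    WHERE ManagerID IS NULL                       -- anchor: top of the tree
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, o.Depth + 1
    FROM dbo.Employee AS e
    JOIN Org AS o ON e.ManagerID = o.EmployeeID   -- recursive step
)
SELECT EmployeeID, Depth
FROM Org;
```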

Table Variable

Use a Table Variable in the following scenarios:

  1. Small tables
  2. Data is only inserted once
  3. Data has no need to be updated
  4. Known number of rows that doesn’t change

One of the reasons for the above recommendations is that a table variable is assumed to have a low number of rows, usually 1, when execution plans are generated, which can cause a cardinality issue.  If you read Ben Snaidero's article above, you will also see the performance hits for updates to temporary tables.

Temporary Table

Use a Temporary Table in the following scenarios:

  1. Large tables
  2. Need for additional DDL, such as an index
  3. Need to access data in child stored procedures
  4. Need statistics
  5. Data needs to be inserted multiple times or updated
  6. Unknown Data size

Temporary tables, as you can see, have a much broader use and are more of a catch-all for everything that doesn't fall into the CTE or table variable bucket.  One thing to remember is that if you have a data set that needs to be generated once and reused across a process of multiple stored procedures, a temp table can be accessed in child stored procedures in the execution tree.  This is a great way to generate temporary data only once and access it in multiple stored procedures.
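A sketch of that pattern (procedure names are made up): a #temp table created in a parent procedure is visible to any procedure the parent calls.

```sql
CREATE PROCEDURE dbo.Child_ReadStaged
AS
    -- #Staged is not created here; it resolves to the caller's temp table.
    SELECT COUNT(*) AS StagedRows FROM #Staged;
GO

CREATE PROCEDURE dbo.Parent_StageAndProcess
AS
    CREATE TABLE #Staged (Id int NOT NULL);
    INSERT INTO #Staged (Id) VALUES (1), (2), (3);
    EXEC dbo.Child_ReadStaged;   -- the child sees the three staged rows
GO
```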


Remember that these are just best practices, or in other words a starting point, and one that can be generally applied regardless of the skill set of the SQL developer.   As with anything, there are always times when the situation calls for an exception to the rule, and when it does, that is when you go outside the best practices.

SQL Saturday NYC Update #2

Just over 2 weeks have passed since SQL Saturday NYC went live, and planning is in full force.  An update on numbers: 65 people are now registered to attend, and if you haven't already, make sure you get your spot.  We just got our newest sponsor and first Silver level sponsor, and a big thank you to Pragmatic Works.  New sessions also keep coming in almost every day, and we now have 89 sessions submitted.   One big update is on the precons: we have decided to do 2 of them on Friday, May 29th, 2015, so be sure to send your requests for a PreCon before 12/31/2014 so we can start confirming.

A tip to anyone else planning a SQL Saturday: OneDrive is a great way to share documents with your committee.

How Functions function

If you are going to use functions, first you should understand how they work, as they are often overused in database applications.  Personally, I will start out by saying that I only use functions when I absolutely have no other choice.  This is something I have seen many different opinions on, so let's just start by taking a look at the different types of functions and how they work.

In SQL Server there are several different types of functions, so let's take a look at what they are:

  • Table Value Functions (These are functions that return a table or dataset)
    • Inline (Contains a single T-SQL Statement)
    • Multi-Statement (Contains multiple T-SQL Statements and other functions such as Cursors and While loops)
  • Scalar-Valued Functions (Returns a single value)


There are many different reasons that developers use functions, such as repeatable code, generic datasets, common logic, etc.  These are all good reasons to use functions when they are used properly.  To understand "properly," let's start by taking a look at how functions work.

Table Valued Functions

There is a significant difference between inline and multi-statement functions when it comes to performance.  To help show some of these differences, I created a few functions, as well as a view and a stored procedure, which you can download here; they go against the AdventureWorks database.  Let's first take a look at what happens when we call the two functions SALES.GetSalesOrderDetail() and SALES.GetSalesOrderDetail_Multi().   If you SET STATISTICS IO ON and SET STATISTICS TIME ON, you will notice that the multi-statement version takes almost double the CPU time (hint: also run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE to ensure data isn't cached between tests).  Most of this overhead is in the insert into the temporary table, which is a necessary overhead of multi-statement functions.

Now, before I show comparisons to views and stored procedures, let's take a look at one other significant difference between these two functions: what happens when we use them in a join.  For this test I am going to execute the following two queries.

SELECT sod.*
FROM SALES.GetSalesOrderDetail() sod
join sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID

SELECT sod.*
FROM SALES.GetSalesOrderDetail_Multi() sod
join sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID

If you take a look at the execution plans, you will notice a significant difference with the inline function, as the SQL can be interpreted and joined using a Merge Join operator, and the estimated row count is fairly accurate.  With the multi-statement function, however, you will see that the SQL cannot be interpreted; the function is applied via nested loop and sequence operators, and the row count is not accurate, being fixed at 100.  If you also look at the CPU time of the multi-statement function used with a join, you will see about 3x the cost.

Now just for a comparison run the similar query this time using the view.

SELECT sod.*
FROM SALES.SalesOrderDetail_View sod
join sales.SalesOrderHeader soh
on sod.SalesOrderID = soh.SalesOrderID

You will notice the same execution plan as with the inline function, but on average you should see about half the CPU cost with the view compared to the function.

Scalar-Valued Functions

Scalar-valued functions behave much like multi-statement table-valued functions in that they require significant CPU cycles, and the typical migration path away from scalar functions is to use inline T-SQL.  There are multiple ways to do this, but let's take a look at the following example; you can download the script to create the function here.

This is a simple function that just concatenates some fields to format an address.  If we call the function versus doing the same query inline, you will see almost 10x the CPU cost for the function.  See the examples below.

    SELECT Person.FormatAddress(AddressLine1, AddressLine2, City, StateProvinceID, PostalCode) AS FormatAddress
    FROM Person.Address;


    SELECT COALESCE(AddressLine1, '') + ' '
         + COALESCE(AddressLine2, '') + ' '
         + COALESCE(City, '') + ' , '
         + COALESCE(st.StateProvinceCode, '') + ' '
         + COALESCE(PostalCode, '') AS FormatAddress
    FROM Person.Address AS Addr
    JOIN Person.StateProvince AS st
        ON Addr.StateProvinceID = st.StateProvinceID;



So what does all this mean?  In general, if you want to ensure that your database isn't using unnecessary CPU cycles, avoid using functions.  In almost all the performance tuning engagements I have worked where there was a CPU utilization issue, we were able to find offending functions as the cause.  Functions are, for lack of a better term, CPU hogs, and the more operations there are within a function, the worse it gets.  So if you already have functions, try to migrate them to inline T-SQL.  In the case of table-valued functions, see if you can accomplish the same thing via a stored procedure, or if you have to use a function, at least make sure it is an inline table-valued function.

SQL Saturday NYC Update 1

It has been one week now since SQL Saturday NYC was announced, and the planning committee and I are very pleased.  We have 50 people registered to attend as of noon today, and we have our first Gold sponsor, Rehfeld, and our first Bronze sponsor, Ntirety.  The speaker community has also come out with great support and submitted 71 sessions, and I encourage you to keep them coming.  If anyone is interested in doing a PreCon, please let myself or Robert Pearl know, as we are also evaluating doing two possible precons.

One lesson learned right away was to set up a couple of folders and rules in my email, as I have been flooded with hundreds of emails; if I haven't responded to you yet, I certainly will, as I have been a little busy this past week.   On top of SQL Saturday being announced, my wife and I welcomed our newest addition, James A Phillips Jr., on Wednesday, and everyone is very healthy.


SQL Saturday NYC

I am excited to say that PASS SQL Saturday NYC is back on the schedule and will be May 30, 2015 at the Microsoft Times Square office.  It is something we have been talking about for some time now at the NJSSUG.  It took some time to get the right people together, and I have to thank Robert Pearl, Simon Facer, Erik Darling, Karla Landrum, Alec Lazarescu and Michael Wells.

If you haven't attended a SQL Saturday before, then come on out and join us for a free day of training, networking and the all-important #SQLFamily.  I encourage everyone to register to attend, and if you want to speak, here is the Call For Speakers link.  Stay tuned for more announcements on speakers and possible pre-cons.   If anyone has any topics that they would like to see at a pre-con please contact myself at or Robert Pearl at