Flyway and tSQLt – migrating to warmer test climates

“If you truly have faith in your convictions, then your convictions should be able to stand criticism and testing.”
DaShanne Stokes

Welcome fellow TestDriven-Development enthusiasts… is what I would say if i actually ever did TDD and didn’t just, you know… write regular unit tests after the fact instead.

I’m going to be honest, I love the idea of TDD but have I ever actually been able to do it? No. Have competent developers been able to do it successfully? Yes, of course. Don’t know anything about TDD? You’re in luck! Click here for an introduction (don’t worry though, THIS post is not going to be about TDD anyway, so you can also keep reading).

But one thing we can all agree on is that testing is pretty important. Testing has evolved over the years though and there are a million-and-one ways to test your code, but one of the most difficult and frustrating things to test, from experience, is database code.

gilmore girls shot of cynicism GIF

Some people argue that the days of testing, indeed, the days of stored procedures themselves are gone and that everything we do in databases should be tested using a combination of different logic and scripting languages like Python or PowerShell… but we’re not quite there yet, are we?

Fortunately though we’re not alone in this endeavor, we have access to one of the best ways to test T-SQL code: tsqlt. You can read more about tsql at the site here but in short – we have WAYS to test your SQL Server* code. The only problem is, when you’re using a migrations approach… how?

*There are also many ways to unit test code from other RDBMS’ of course, like utPLSQL for Oracle Database or pgTAP for PostgreSQL – would this method work for those? Maybe! Try adapting the method below and let me know how you get on!

I’ve already talked about how implementing tests is easier for state based database source control in a previous post because we can easily filter tests out when deploying to later stage environments, however with migrations this can be a real pain because you have to effectively work on tests like you would any normal database changes, and maybe even check them in at the same time – so ultimately, they should be managed in the same way as database schema migrations… but we can’t filter them out of migrations or easily pick and choose what migrations get run against test and Prod, without a whole lot of manual intervention.

Basically. It’s a mess.

mess fail GIF

But during my last post about Flyway I was inspired. This simple and easy to use technology just seems to make things really easy and seemingly has an option for EVERYTHING, so the question I started asking myself was: “How hard would it be to adapt this pipeline to add unit tests?” and actually although there were complications, it was still easier than I thought it would be! Here’s how you can get up and running with the tSQLt framework and Flyway migrations.

1 – Download the scripts to create the tSQLt framework and tests from the site

Ok this was the easiest step of them all, largely because in the zip file you download from the tsqlt website all you have is a set of scripts, first needed to enable CLR and the second to install the tsqlt framework:

As part of my previous pipeline I’m actually using Azure SQL Database as my development environment, where RECONFIGURE is not a supported keyword and where we don’t need to run the CLR script anyway, so all I needed was the tSQLt.class.sql file.

The good thing about this is that we can copy it across into a migration and have this as our base test class migration, and then any tests we write on top of it will just extend it – so as long as we remember to update it _fairly_ frequently with any new tsqlt update, we should be fine! (Flyway won’t throw an error because these are non persistent build objects, so no awkward checksum violations to worry about!)

2 – Adapt the folder structure in the repository for tests

I added 2 new folders to my _Migrations top level folder, a Schema_Migrations folder and a Test_Migrations folder. When you pass Flyway a location for migrations, it will recursively scan folders in that location looking for migrations to run in order. I copied the migrations I had previously into the Schema Migrations folder and then my new tSQLt creating migration into the Test Migrations folder. This allows them to be easily coupled by developers, whether you’re writing unit tests or practicing TDD:

You’ll have noticed I called my base testing migration V900__ – this is because I do still want complete separation and if we have a V5 migration in schema migrations and a V5 testing migration, we’re going to have some problems.

3 – Add a callback to handle removal of the objects

As I was putting this together, I noticed that I could use flyway migrate to run the tSQLt framework against my Dev database, but every time I tried to then flyway clean that database I got a very nasty error stating that the tSQLt assembly could not be removed because of dependent objects.

Flyway does not handle complex dependencies very well unfortunately, that’s where you’d use an industry leading comparison tool like SQL Compare so, with some advise from teh wonderful Flyway team, I set to work on a callback. A callback is how you can hook into Flyway’s own processes, telling it to do something before, during or after certain commands. In my case we were going to remove all of the tSQLt objects prior to running Flyway clean to remove the rest of the schema. To make it future proof (in case objects are added or removed from the tSQLt framework), I wrote a couple of cursors to go through the different objects that were dependent on the assembly and remove them, rather than generating a script I know to have all of the tSQLt objects in right now. You can find the code for the callback in my GitHub here, you are welcome to it!

Animated GIF

All you have to do is name it beforeClean.sql and ensure it is in the directory with your other sql migrations so that it will pick this up and run it – I put it in my Test_Migrations folder, because I only want it to run this callback when cleaning the build DB, as this is the only place we’re utilizing automated unit tests… for now!

4 – Update the Azure DevOps pipeline

I’ve got my callback, I’ve got my tSQLt migration and the folder structure is all correct and is pushed to Azure DevOps but naturally it is breaking the build *sad* but fortunately all we now have to do is update the YAML pipeline file:

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS)/Test_Migrations:/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway clean -enterprise
  displayName: 'Clean build schema'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS)/Schema_Migrations:/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway migrate -enterprise
  displayName: 'Run flyway for schema'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS)/Test_migrations:/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway migrate -enterprise
  displayName: 'Run flyway for tSQLt'

You will notice a couple of important things that I have highlighted above:

  1. I’m cleaning the build schema using the Test_Migrations repository – this is because that is where my callback is and I need that to run before the clean otherwise it will fail due to the tSQLt assembly issue (line 17)
  2. I am running the migrate for the tests and the schema separately in the file, instead of just calling flyway to recursively run everything in the _Migrations folder. This is because I want them to be 2 separate steps, in case I need to modify or remove either one of them, or insert other steps in between and so that I can see the testing output in a separate stage of the CI pipeline (lines 23 and 29).

Caveat: As a result of (Option 2) running the 2 processes separately, it means running Flyway twice but specifying the Schema_Build and Test_Build folders in the YAML as being mapped to Flyway’s sql directory (lines 16 and 22 in the file above) but the problem this causes is that the second time Flyway runs, when it recursively scans the Test_Migrations folder it will not find the migrations that are present in the Flyway_Schema_History table, resulting in an error as Flyway is unable to find and resolve the migrations locally.

The way to fix this though is pretty simple – you find the line in the Flyway Config file that says “IgnoreMissingMigrations” which will allow it to easily continue. We wouldn’t have to worry about this setting though, if we were just recursively looking to migrate the Schema and Test migrations in the same step (but I’m a control freak tee-hee).

Now, once committed this all runs really successfully. Velvety smooth one might even say… but we’re not actually testing anything yet.

5 – Add some tests!

I’ve added a single tSQLt test to my repository (also available at the same GitHub link), it was originally created by George Mastros and is part of the SQLCop analysis tests – checking if I have any user procedures named “SP_”, as we know that is bad practice – and I have wrapped it up in a new tSQLt test class ready to run.

You’ll notice I also have a V999.9__ migration in the folder too, the purpose of this was to ‘top and tail’ the migrations; first have a script to set up tSQLt that could be easily maintained in isolation and then end with a script that lets me do just 1 thing: execute all of the tests. You can do this by simply executing:

EXEC tSQLt.RunAll

and we should be able to capture this output in the relevant stage of the pipeline.

Some of you may be asking why I chose to have the run unit tests as part of the setting up of the testing objects – this was because I had 2 options:

  1. I’m already executing scripts against the DB with Flyway, I may as well just carry on!
  2. The only other way I could think to do it was via a PowerShell script or run SQL job in Azure DevOps but the 2 plugins I tried fell over because I was using a Ubuntu machine for the build.

So naturally being the simple person I am, I opted for 1! But you could easily go for the second if you prefer!

6 – Test, Test, Test

Once you’ve handled the setup, got the callback in place (and also followed the steps from the last blog post to get this set up in the first place!) you should be able to commit it all these changes and have a build that runs, installs tSQLt and then runs your tests:

I realize there are a lot of “Warnings” in there, but that is just Azure DevOps capturing the output, the real part of this we’re interested in is lines 31-40 and if we clean up the warnings a little you’ll get:

+----------------------+
|Test Execution Summary|
+----------------------+
|No|Test Case Name|Dur(ms)|Result |
+--+---------------------------------------+-------+-------+ 
|1 |[somenewclass].[testProceduresNamedSP_]|144|Success|
------------------------------------------------------------
Test Case Summary: 
1 test case(s) executed, 1 succeeded, 0 failed, 0 errored. 
------------------------------------------------------------------

But if I introduce a migration to Flyway with a new Repeatable Migration that creates a stored procedure named SP_SomeNewProc…

+----------------------+
|Test Execution Summary|
+----------------------+
|No|Test Case Name|Dur(ms)|Result |
+--+---------------------------------------+-------+-------+ 
|1 |[somenewclass].[testProceduresNamedSP_]|184|Failure|
------------------------------------------------------------
Test Case Summary: 
1 test case(s) executed, 0 succeeded, 1 failed, 0 errored. 
------------------------------------------------------------------

It even tells us the name of the offending sproc:

All I have to do now is make the corresponding change to remove SP_ in dev against a bug fix branch, push it, create a PR, approve and merge it in and then boom, the build is right as rain again:

Thus bringing us back into line with standard acceptable practice, preventing us from delivering poor coding standards later in the pipeline and ensuring that we test our code before deploying.

Conclusion

Just because you adopt a more agile, migrations based method of database development and deployment, doesn’t mean that you have to give up on automated testing during Continuous Integration, and you can easily apply these same principles to any pipeline. With just a couple of tweaks you can easily have a fully automated Flyway pipeline (even xRDBMS) and incorporate Unit Tests too!

I will be running for the Ridge, will you?

“Until one has loved an animal, a part of one’s soul remains unawakened.”
Anatole France

TL;DR: If you love to walk, jog or run – sign up to a fantastic cause here to support a shelter in need!

The Full Story: This is going to be a reasonably short post, because I’m hoping if you’ve made it this far, you’ll read to the end.

This August, I will be “running for the Ridge“. This is an event taking place (virtually of course) where one can run, jog or walk any distance you like, as long as it is above 5km (3.11 miles) for Adults or 2km (1.25 miles) for ages 12 and below, wherever you like, anytime throughout the months of August and September.

For the last 2 months, I have aspired to be a healthier version of myself – I have managed to adopt healthier eating habits; removing a lot of the processed sugars and oils I was eating, not drinking alcohol during the week, always exercising at least once a day etc. and as a result I have managed to lose just over 12kg (26.5 lbs) and I feel so much better for it.

Running has been an instrumental part of the improvement in my quality of life and now I’m happy to be supporting something close to my heart with it.

So why this? What makes “running for the Ridge” special?

Well, because it’s in aid of Jacob’s Ridge – an animal shelter in Spain that, thanks to the impact of one nasty global pandemic, is sorely in need of donations in order to stay open and safeguard the well-being of the animals that it looks after, a cause deserving of support all in itself.

So that’s why this is special – because you can directly do something to:

  • Positively impact the lives of rescued animals
  • Positively impact your own health and mental well-being (in the form of exercise)
  • Do something nice with friends or family (at a responsible social distance where necessary)

To me, this just sounds like the easiest decision ever; run/jog/walk a distance of my own choosing, on my own comfortable running route and animals benefit from that? Amazing.

You will even get an awesome sustainable wooden medal for participating! And the cost? 12 GBP per person. That’s it. You can even get sponsored by friends, family and/or colleagues too if you want to contribute more!

So that’s it. That’s all I wanted to say. I’ll be running for the Ridge this August and you know – if you have a spare couple hours or you enjoy walking/jogging/running alone or in a group, think about putting it to good use and let’s get out, have some fun and do some good!

You can sign up here: https://www.jacobsridge.com/product-page/run-for-the-ridge – just sign up on the site and they’ll send you all the instructions direct to your email.

If you do sign up, tweet me @PlantBasedSQL with your entry number and I’ll give it a big ol’ like! I’m number 21 🙂

xRDBMS Database Continuous Integration with Flyway, Azure DevOps and Docker… the simple way.

“Some people try to make everything complicated, be the person who tries to make everything simple.”
Dave Waters

Simplicity is in my blood. That’s not to say I am ‘simple’ in the sense I cannot grasp more than the most basic concepts, but more that I am likely to grasp more complex problems and solutions when they are phrased in simple ways.

This stems from my love of teaching others (on the rare occasion it falls to me to do so), where I find the moment that everything just ‘clicks’ and the realization comes over them to be possibly one of the most satisfying moments one can enjoy in life.

shocked star trek GIF

Now recently I’ve been enjoying getting my head around Flyway – an open source JDBC based migrations tool that brings the power of schema versioning and deployments together with the agility that developers need to focus on innovation in Development. There’s something about Flyway that just… ‘clicks’.

It doesn’t really matter what relational database you’re using; MySQL, IBM DB2, even SAP HANA! You can achieve at least the core tenants of database DevOps with this neat and simple little command line tool – there’s not even an installer, you just have to unzip!

Now I’ve had a lot of fun working with Flyway so far and, thanks to a few people (Kendra, Julia – i’m looking at you both!) I have been able to wrap my head around it to, I would say, a fair standard. Caveat on that – being a pure SQL person please don’t ask me about Java based migrations, I’m not quite there yet!! But there is one thing that I kept asking myself:

“When I’m talking to colleagues and customers about Database DevOps, I’m always talking about the benefits of continuous integration; building the database from scratch to ensure that everything builds and validates…” etc. etc. so why haven’t I really come across this with Flyway yet?

think tom hanks GIF by The Late Show With Stephen Colbert

Probably for a few reasons. You can include Flyway as a plugin in your Maven and Gradle configurations, so people writing java projects already get that benefit. It can easily form part Flyway itself by virtue is simply small incremental scripts and developers can go backwards and forwards however and as many times as they like with the Flyway Migrate, Undo and Clean commands, so is there really a need for a build? And most importantly, Flyway’s API just allows you to build it in. So naturally you’re building WITH the application.

But naturally when you’re putting your code with other people’s code, things have to be tested and verified, and I like to do this in isolation too – especially for databases that are decoupled from the application, or if you have a number of micro-service style databases you’d want to test all in parallel etc. it’s a great way to shift left. So I started asking myself if there was some way I could implement a CI build using Flyway in Azure DevOps, like I would any of the other database tooling I use on a regular basis? Below you’ll find the product of my tinkering, and a whole heap of help from Julia and Kendra, without whom I would still be figuring out what Baseline does!

Option 1) The simplest option – cmdline

Flyway can be called via the command line and it doesn’t get more simple than that.

You can pass any number of arguments and switches to Flyways command line, including specifying what config files it’s going to be using – which means that all you have to do, is unzip the Flyway components on a dedicated build server (VM or on-prem) and then, after refreshing the migrations available, invoke the command line using Azure DevOps pipelines (or another CI tool) to run Flyway with the commands against a database on the build server (or somewhere accessible to the build server) and Bingo!

No Idea Build GIF by Rooster Teeth

And that’s all there is to it! You get to verify that all of the migrations up to the very latest in your VCS will run, and even if you don’t have the VERY base version as a baseline migration, you can still start with a copy of the database – you could even use a Clone for that!

But yes, this does require somewhere for Flyway to exist prior to us running with our migrations… wouldn’t it be even easier if we could do it without even having to unzip Flyway first?

Option 2) Also simple, but very cool! Flyway with Docker

Did you know that Flyway has it’s own docker image? No? Well it does!* Not only that but we can map our own version controlled Migration scripts and Config files to the container so that, if it can point at a database, you sure as heck know it’s going to migrate to it!

*Not sure what the heck all of this Docker/Container stuff is? You’re not alone! Check out this great video on all things containers from The Simple Engineer!

This was the method I tried, and it all started with putting a migration into Version Control. Much like I did for my post on using SQL Change Automation with Azure SQL DB – I set up a repo in Azure DevOps, cloned it down to my local machine and I added a folder for the migrations:

Into this I proceeded to add my base script for creating the DMDatabase (the database I use for EVERYTHING, for which you can find the scripts here):

Once I had included my migration I did the standard

Git add .
Git commit -m "Here is some code"
Git push

and I had a basis from which to work.

Next step then was making sure I had a database to work with. Now the beauty of Flyway means that it can easily support 20+ RDBMS’ so I was like a child at a candy store! I didn’t know what to pick!

For pure ease and again, simplicity, I went for good ol’ SQL Server – or to be precise, I created an Azure SQL Database (at the basic tier too so it’s only costing £3 per month!):

Now here’s where it gets customizable. You don’t NEED to actually even pass in a whole config file to this process. Because the Flyway container is going to spin up everything that would come with an install of Flyway, you can pass it switches to override the default behavior specified in the config file. You can adapt this either by hard-coding strings or by using Environment Variables alongside the native switches – this means you could pass in everything you might need securely through Azure Pipeline’s own methods.

I, on the other hand, was incredibly lazy and decided to use the same config file I use for my Dev environment, but I swapped out the JDBC connection to instead be my Build database:

I think saved this new conf file in my local repo under a folder named Build Configuration – in case I want to add any logic later on to include in the build (like the tSQLt framework and tests! Hint Hint!)

This means that I would only need to specify 2 things as variables, the location of my SQL migrations, and the config file. So the next challenge was getting the docker container up and running, which fortunately it’s very easy to do in Azure Pipelines, here was the entirety of the YAML to run Flyway in a container (and do nothing with it yet):

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run flyway/flyway -v
  displayName: 'Run Flyway'

So, on any changes to the main branch we’ll be spinning up a Linux VM, grabbing Docker and firing up the Flyway container. That’s it. Simple.

So now I just have to pass in my config file, which is already in my ‘build config’ folder, and my migrations which are in my VCS root. To do this it was a case of mapping where Azure DevOps stores the files from Git during the build to the containers own mount location in which it expects to find the relevant conf and sql files. Fortunately Flyway and Docker have some pretty snazzy and super clear documentation on this – so it was a case of using:

-v [my sql files in vcs]:/flyway/sql

as part of the run – though I had to ensure I also cleaned the build environment first, otherwise it would just be like deploying to a regular database, and we want to make sure we can build from the ground up every single time! This lead to me having the following environment variables:

As, rather helpfully, all of our files from Git are copied to the working directory during the build and we can use the environment variable $(Build.Repository.LocalPath) to grab them! This lead to me updating my YAML to actually do some Flyway running when we spin up the container!

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS):/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway clean -enterprise
  displayName: 'Clean build schema'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS):/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway migrate -enterprise
  displayName: 'Run flyway for schema'

Effectively, this will spin up the VM in ADO, download and install Docker, fire up the Flyway container and then 1) clean the target schema (my Azure SQL DB in this case) and 2) then migrate all of the migrations scripts in the repo up to the latest version – and this all seemed to work great!*

*Note: I have an enterprise Flyway licenses which enables loads of great features and support, different version comparisons can be found described here.

So now, whenever I add Flyway SQL migrations to my repo as part of a branch, I can create a PR, merge them back into Trunk and trigger an automatic build against my Flyway build DB in Azure SQL:

Conclusion

Getting up and running with Flyway is so very very easy, anyone can do it – it’s part of the beauty of the technology, but it turns out getting the build up and running too, when you’re not just embedding it directly within your application, is just as straightforward and it was a great learning curve for me!

The best part about this though – is that everything above can be achieved using pretty much any relational database management system you would like, either via the command line and a dedicated build server, or via the Docker container at build time. So get building!

ready lets go GIF

Classification by design: Including data cataloging in a DevOps process

“The most powerful tool we have as developers is automation.”
Scott Hanselman

It is no secret that I love to talk about data protection, specifically from the perspective of structured data. When we talk about database development practices, we often find ourselves talking about 3 things most often:

  • Source Control
  • Continuous Integration and Continuous Delivery/Deployment (CI/CD)
  • Estate Monitoring

Some people refer to this as “DataOps“, others refer to it as “DevDataOps” but in reality, it’s all DevOps guys. This may be an unpopular opinion (and if it clashes with yours please forgive me, it’s just my opinion) but just because a certain niche area hasn’t been specifically called out within a subset of DevOps doesn’t mean you have to invent your own term for it!

Now this leads me on to DevSecOps, or as I like to call it… More secure DevOps.

rimshot GIF

No but seriously this is a slightly different case – DevSecOps is like DevOps but fortified with security from the ground up. There’s a fantastic article and diagram of this on Plutora from Mark Robinson of how this looks (below) and if you haven’t read his article I would definitely go and give it a read!

DevSecOps Diagram

Good DevOps practice is a combination of different things working together, bringing the right mentality, the principles, processes and amazing tools at our disposal like automation but this all includes security from the ground up too. DevOps is about putting those principles and practices in place to strengthen the pipeline, so why don’t we treat security in the same way?

Take, for example, 3 pieces of legislation that have been very much in the spotlight:

The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed.
GDPR (Europe) Art. 25 “Data protection by design and by default”

Processing agents shall adopt security, technical and administrative
measures
able to protect personal data from unauthorized accesses and accidental or unlawful situations of destruction, loss, alteration, communication or any type of improper or unlawful processing.

– LGPD (Brazil) Chapter VII, Art. 46 “Security and Secrecy of Data”

“A Controller or Processor is required to implement appropriate technical and organisational measures to demonstrate that Processing is performed in accordance with this Law…”
DIFC LAW NO. 5 OF 2020 (Dubai) Part 2D, Art. 14 (2) “Accountability and notification”

There’s a common running theme here and although lots of global legislation will either allude to, or directly tell you ways you can be compliant and what some of these “organizational” and “technical” measures are, it’s still pretty blurry.

How do we know what we can do? How do we know what “default” and “design” mean in this context? Well, we build it into the DevOps process.

Now I could sit here forever and talk about why transforming your database development, deployment and provisioning processes allows us to be more secure, but that’s a lot of material and it might have to come in chunks! So what we’re going to focus on today is as the title suggests: Data Classification and Cataloging.

Why is Cataloging important?

Cataloging structured data is incredibly important because it can be one of the first steps we take to securing sensitive Personally Identifiable Information (PII) or Protected Health Information (PHI) wherever it exists across our database environments. It allows us to make strengthened, contextual decisions about the data we hold including how we treat it in pre-Production, how long we retain it for and which systems and processes consume it.

But the most important part of this is simply: it tells us where the risk is.

Read through any of the most recent data protection laws and you will notice that a few things come up quite a lot including “Data Protection Impact Assessment“, or DPIA. Effectively if you can assess the risk of processing activities you can more readily answer the data protection questions and challenges you may face.

Knowing where your data resides can be the first step to helping you assess this risk, and to more readily answer your own data questions. If you want to read more about Cataloging specifically and why it is useful, you can read more about it on my previous blog here.

Where does Cataloging fit into DevOps?

This one is simple to answer. Once you have fully classified your entire estate, you’re not done. No, if you’re a development house or indeed even a single developer – if you are making any schema changes to the tables holding that sensitive data, you’re never done.

The reason for this is that Cataloging is an evergreen activity – if you update tables by removing columns, adding columns, splitting tables, adding tables… anything! Well then you need to be ready to make sure that you are:

a) Prepared and equipped with knowledge of the tables you’re working on and if this is a high risk activity.

b) Updating classification information to reflect the new “truth”, i.e. if you’re adding a column that will collect people’s Twitter handles, then that column should be classified as sensitive, and this should be reflected the moment it is deployed to Production.

So it is important to have the correct people working on this, with the right knowledge, preparation and processes and using the correct tools ensuring that those updates are persisted properly and securely through your deployment pipelines.

Huh… people, processes and tools… That sounds familiar!

The Process: SQL Data Catalog, SQL Change Automation and Azure DevOps

For this little experiment of mine I used Redgate’s SQL Change Automation (Migrations First approach in SQL Server Management Studio) and SQL Data Catalog to both develop & deploy and classify/categorize respectively, and for simple version control and orchestration of this pipeline I opted for Azure DevOps (with SQL Change Automation CI/CD plugins):

NOTE: Heads up, all of the code I used for this can be found right here in my GitHub, feel free to have at it!

Step 1) Setup SQL Change Automation project with an Azure DevOps Git Repo and then create the YAML file to build it, and create a deployment process to Acceptance and the Production.

Ok – that’s a bit of a mouthful and a massive ask. There’s a lot of things there you have to be familiar with, but we don’t have time to go into right here. BUT fortunately if you checkout Redgate University right here, my blog post on using Change Automation with Azure DB and Redgate’s Product Learning section, you’ll be a Database DevOps ‘Whizz-Kid’ in no time!

Fast forward a little and I had my example databases, VCS and pipeline all up and running:

Step 2)The “Theory”: This is where things get interesting. So we have an example pipeline set up and we are able to completely deploy all the way through to “Production” so let’s talk theory.

In SQL Data Catalog I have covered both my Production and Acceptance Databases:

Now, in development we don’t make changes directly to Production, so why should Classification be any different? Now how you adapt the above code is up to you, feel free to split it, move it around, incorporate it into Pull Requests if you want to… But I’m going with a bit more of a simple situation.

Situation: Developer makes a change in Development, which gets committed, reviewed and merged o the main branch, resulting in a build and a deployment, in this case to Acceptance and then it is later deployed to Production.

Now, by Acceptance we should only have the “good work”, i.e. all of our testing is shifted left within DevOps so Acceptance is basically the last stop before Production. Therefore we should classify the work we have done on Acceptance, crucially, before it gets to Production and starts gathering sensitive data, and then copy this classification up on deployment.

Ideal: We should have no columns on Production that have not been classified.

Step 3) – In Practice: Fortunately it’s very easy to automate a lot of these steps with SQL Data Catalog utilizing it’s PowerShell cmdlets and REST API. The cmdlets are fully documented and very easy to use (docs here). This allows us to easily scan, classify and copy classifications up to other databases, but we’ll also need to do some checks and report if there are discrepancies, as part of the deployment pipeline that can be investigated.

  • Are there any columns on Acceptance that aren’t classified but have been deployed to Production? (failure to comply with process)
  • Are there any columns on Production that have not been classified? (classification drift)
  • Are there any unclassified columns on Acceptance that have not yet been deployed to Production (for pipeline hygiene purposes)

The other part of this ‘fun’ is reporting what has been changed in the same process. Now fortunately SQL Change Automation spits out a Changes.json file with its Release Artifacts and we can steal that away and find out how many tables have been created or changed in this release and report that back so we can correlate what has been done and what is missing:

So actually getting this up and running is just going to require 3 things:

  1. The PowerShell script from GitHub (or your own personalized variant) as a step in your production deployment
  2. Data Catalog available and pointed at Acceptance and Production (or your versions of these environments)
  3. Variables set in Azure DevOps to fill the gaps (e.g. Where is Data Catalog? Whats my PowerShell Auth token? What are my Acceptance and PROD DBs called? etc.)

3 is the last step there so you’ll need something like this to run the script:

  • DatabaseDeploymentJSON – where the JSON file will be with the latest changes in the Prod release
  • DataCatalogAuthToken – Your PowerShell Auth token from Settings in Data Catalog
  • DataCatalogUrl – The full URL to your Data Catalog installation, missing the “\” at the end (ending :15156)
  • ExportPath (Optional) – I specified the path for my Database Deployment Resources to save typing it out in the Redgate plugins
  • ProdDB / StageDB – As you would expect, the Production and Acceptance/Staging DBs you’re deploying to/from
  • ProdInstance / StageInstance – As above, except the instance the Database are located on

In the variables above the Instance and DB names are purely used within Data Catalog, so there’s no need to worry about anything happening to the actual databases themselves!

Once you’ve run through the deployment pipeline a couple of times and the changes.json file is being produced, you can go ahead and copy the script into an inline PowerShell script step in your release and you should find it will fire to life! I simulated an example by modifying my Contacts table and my Articles table, adding 1 column each and deploying both to Acceptance. I then classified just 1 of these in Acceptance in Data Catalog:

and then approved the deployment to Production and tada!

Ok you probably can’t make all that out, but it effectively says:

(Information) Table dbo.Articles was modified in this deployment.
(Information) Table dbo.Contacts was modified in this deployment.

That much we knew!

1 column(s) with classifications were discovered on VoiceOfTheDBA Acceptance that are not classified in VoiceOfTheDBA Production:
dbo.Articles.TestingPineapple

Excellent, we classified that one so it gets copied up and we can verify that in data catalog against Production:

and finally, we get a warning about Production now containing unclassified columns:

(Alert) The following columns have been discovered on VoiceOfTheDBA Production that require classification:

dbo.Contacts.TestingPineapple

You should classify these columns in VoiceOfTheDBA Acceptance prior to the next deployment.

Just as we expected. Success!

Happy Tom And Jerry GIF

Conclusion

Classification and categorization belongs as part of DevOps, if you expect the context for your business decisions around data to remain evergreen and informed then it cannot sit on the shoulders of one or two people to support it, and it cannot live in a manually updated Excel sheet or document.

By including it within the DevOps process, not only do you add an additional layer of security but you also make it an automated, team activity that can be audited, checked and easily kept up to date.

Is this DevSecOps? Well… not really no. Is this a more secure approach to Database DevOps? Absolutely! Happy DevOpsing!

My wife and I started a food blog!

“Veganism is not a “sacrifice.” It is a joy.”
Gary L Francione

This is just a short one in the grand schema of things, you probably all know that I enjoy blogging but a lot of my posts end up quite long and rambling. This one however will be quite concise.

My wife and I started a food blog.

There. I said it.

She had been saying to me for some time that she wanted a place to take all the vegan recipes we make and to put them up on the interwebs for people to find and make and get as much joy out of as we do, so we finally did it. It stalled a few times and then eventually, like 2 weeks ago, we finally got the first couple of posts up.

If you’re interested in the sort of plant-based cooking we do at home, you’re looking to go plant based or just looking to help the animals out, you can find our blog right here:

https://TheSnugVegans.com/

We look forward to dining with you!

The Snug Vegans

(SQL) Change ALL the Azure SQL Database Automation!

“But I can hardly sit still. I keep fidgeting, crossing one leg and then the other. I feel like I could throw off sparks, or break a window–maybe rearrange all the furniture.”
Raymond Carver

I understand that starting off a blog about Azure SQL Database with the above quote is a little weird, but honestly I’m _really_ excited about what I’m about to tell you.

***Note before starting: This blog post assumes you’re familiar with the concepts of Database Source Control, CI and CD, Azure SQL Database and pipelines within Azure DevOps, otherwise here be dragons.***

I am a huge fan of SQL Change Automation – mostly because of the migrations functionality. In my mind it represents an ideal workflow for making complex SQL Server database changes. If you’re not sure about the different models (State, Migrations, Hybrid), take a look at my blog post from last week here! But until this time it has had one thing that I could not easily do with it… Platform as a Service, Azure SQL DB.

Now don’t get me wrong, SQL Change Automation could easily deploy to Azure SQL Database but I had a problem. The words:

Chris how do we benefit from the migrations approach and put the shadow database and build db in Azure SQL too? We don’t have any local instances or VMs we can use for this and Dev, Test and Prod are all in PaaS!”

elicited this response:

cry crying GIF

But. No. Longer.

Now for those of you who don’t know, the _SHADOW_ database that SQL Change Automation creates is effectively a schema and static data only copy of your database, and it is dropped and built each time you verify, to ensure that all of the migrations run successfully and you can effectively check your work and shift the build left (!!), before you even check into source control.

This shadow database and the build database shared one thing in common and that was that you couldn’t build them in Azure SQL DB, which left 2 choices:

  • Use an instance of SQL Server. Developer for the shadow locally maybe; a VM in Azure or on-prem hosted instance for building
  • [For build specifically] Use localDB. Not advisable if your database contains any objects not supported by localDB because (juuuust in case you didn’t know) it is SQL Server Express.

But on May 12th 2020 (and I only found out about this like 2 weeks ago) the SQL Change Automation team at Redgate released version 4.2.20133 of the plugin for SSMS which included a few super cool things like additional Azure SQL support and the Custom Provisioning Scripts feature.*

excited excitement GIF

Now this is great because not only can we now easily create SQL Clones to be used as the development source (and I’ll blog about THAT a little later) but of course you can use it to use an Azure SQL DB for the shadow AND to use a persistent Azure SQL DB for the CI build as well!

Now unfortunately Kendra kinda beat me to the punch here and she produced a fabulous 3 part video series you can watch on using SQL Change Automation solely with Azure SQL DB, and you can view those here if you don’t want to see me try it out:

Getting set up

The first thing I did was make sure that I had all of the necessary environments to try this out – I created 3 Azure SQL Databases to mimic Development, Build and Production environments on 2 separate Prod and Non Prod Azure SQL Servers. I ran the DMDatabase prep scripts (you can find these here) to setup Dev_Chris and Production, but left BuildDB empty.

Next It was time to create my project, so I hopped over into Azure DevOps and created a new project, initialized it with a README and then Cloned it down onto my local environment:

Everything was ready to go so it was time to create my project!

Setting up SQL Change Automation in SSMS

*cough* or if you’re me, update it first because you’re on a REALLY old version *cough*

Then I hit “Create a New Project” and it allowed me to just specify the connection string to the Dev Azure SQL DB and the project location was the checked-out local repo:

Didn’t change any of the options because I’m a rebel and I didn’t feel like filtering anything out! But of course now comes the fun bit… the baseline. I chose my production Azure SQL DB as it’s my only upstream DB at this point, and it’s time to hit “Create Project”.

…and Huzzah! It’s worked and we’re all good!

excited andrew garfield GIF by The Academy Awards

Now… that’s actually not the best bit! The reason why Andrew there is clapping so hard? Well that little piece of magic has happened in the background! A Shadow database has actually been created for me against my azure server automatically! This is done by using the connection string that is used for dev!

Now… one thing to check, and I didn’t think to do this, but you can specify the connection string in the SQL Change Automation user file but I just left mine for a bit not realizing it created an Azure SQL DB for the Shadow that was CONSIDERABLY higher tier than my dev environment (bye bye Azure credit!), but fortunately I was able to scale it down quickly to basic and that has stuck, but be warned!

So I did what all ‘good devs’ would do now… I committed and pushed my initial commit directly to my main branch! (Don’t tell my boss!)

and safely sat my Database in Azure DevOps:

Setting up the build and deployment stages

This bit was actually just as easy. I used to hate YAML but thanks to a certain (wonderful) Alex Yates I jumped in anyway and it turned out to be just fine!

I created a new basic YAML file within Azure DevOps (and used the assistant to just auto populate the Redgate defaults, if you don’t know YAML or what it can do already, there’s a really good MS article here) and committed it to the main branch again (whoopsie) and the only component was the SQL Change Automation plugin I pulled in from the Azure DevOps marketplace, and I configured the build to target my “nonprod” server and the Build DB I had created previously.

On saving and running the pipeline succeeded!

All that was left to do was to create a Release Pipeline. So naturally, I jumped straight in and created a new pipeline, and I started with an empty job and called it Production*note* make sure you also choose your Build artifact before configuring your release stage too by clicking the Add an Artifact option!:

I added the SQL Change Automation: Release step to the agent job (note because this is all hosted, I’m using an Azure DevOps hosted agent to do this step):

Now you’ll need to add 2 stages (both the SQL Change Automation: Release plugin) at this point, a “Create Release” and a “Deploy from Database Release Artifact” because one will look at the target and figure everything out for you, and you’ll be able to review exactly what will be deployed, and the other will actually _do_ the deployment:

From here you just have to specify the options available, like in this wonderful walk-through here from the fabulous Chris Kerswell of DBAle fame! For me, this was simply targeting my Production Azure SQL Database.

You’ll definitely want to use the project variables to pick up the right package, and also leave the export path blank in both steps for now:

You can Clone the step by right clicking instead if you want to which will preserve all the connections you’ve already provided! Then once it’s all pointed at the right place, save and queue the release!

And of course, we were successful:

and then finally with a couple of triggers set to automatically build and deploy I made a change to my Contacts table in my Azure SQL Dev DB and a few minutes later, thanks to Azure DevOps and Redgate SQL Change Automation the very same change appeared in Production, with no reliance on anything other than Azure SQL DB and SQL Change Automation:

Before the DevOps process on Dev, ready for a migration to be generated
After: Automatic post-build deployment of the new column to the Production Azure SQL Database

Conclusion

If you have all of your databases in Azure SQL Database**, fear not because SQL Change Automation to the rescue! You can very easily set up and configure a pipeline in Azure DevOps or indeed any pipeline of your choice, but it’s never been easier to persist development changes all the way through to Production in a low risk, incremental, “DevOps” way!

—NOTES—

*An important word from the release notes: Note that it is still generally recommended to locate the shadow database locally where possible as that will usually result in a faster database connection. The default CreateDatabase.sql and DropDatabase.sql scripts can be altered to improve performance or implement custom provisioning logic.

**If you have all of your Databases in Azure and you need them masked for Dev/Test too, check out this previous blog post in which I outlined how to do that using Azure DevOps too!

Which database source control model works best for me?

“Destiny is not a matter of chance; it is a matter of choice. It is not a thing to be waited for, it is a thing to be achieved.”
William Jennings Bryan

For many people, figuring out how to get their development database into source control is the first step to a robust, repeatable, automated (and exciting) database DevOps pipeline. This, coupled with exactly which technology (Azure DevOps, Github, GitLab, BitBucket… the list goes on) you’ll be using for Source Control (and later CI/CD) can make it quite overwhelming.

overwhelmed choices GIF

Now fortunately I’ve worked with a number of teams on setting up source control methodologies and some work better than others depending on how you want to work. Remember:

“DevOps is the union of people, process, and products to enable continuous delivery of value to our end users.”Donovan Brown

And the key there is equal parts, whilst technology has a part to play, it comes down to the teams; nurturing and feeding a positive mindset of collaboration and communication within the team and then defining which methodologies and processes work best for you.

Once you’ve got that down, pick the source control methodology that works best for you, and luckily there are 4 choices:

  • State-First
  • Migrations-First
  • Hybrid- or Optimized-Model
  • Other

Ok… maybe I lied about the 4 choices because other can encapsulate many many different options in itself. But, what I’m going to talk about below are the 3 primary options I see development teams adopt and how they fit into your teams culture.

If you’re already tired of reading then you’re in luck! I also talked about this same topic at Redgate Streamed on 28th May 2020 so if you follow that link you can “register” to watch on demand, I will tell you in your own ears! (As opposed to reading below) – I won’t tell you to just watch my session because you should DEFINITELY check out the sessions also given by Kendra, Grant, Ben and Frank which were… well:*

Lets Go Yes GIF by Music Choice
*(SOOOOOO Good!)

State-First Approach

The state-first approach is, as it would suggest: the state of each object within the database is captured by whatever tool you use, i.e. the script needed to CREATE that object, and it is written out into its own flat file (most often a .sql file) in version control. The actual structure of these files and folders can vary by technology but largely it will follow a logical structure and the bottom line will be a create script per object.

When an update is made to that object, a newer version of that same create script is generated and it is added as a newer version of that script in version control and that is the latest version of the database which we can then deploy. When using state-first we have no alters, only creates, so it will be necessary to do a comparison at a later stage to work out the difference, and by extension the update/alter script that will be needed to propagate changes to later stage environments.

The benefits of the State-First method include (but are not limited to):

  • A simple approach to get started with standardizing development practices: It’s aligned with the practices we already have in place on the application development side, where source control has been standard practice for years.
  • Easy on-boarding for teams in the ‘Shared development model’: When every developer is forced to share a single development environment it can be quite hard to ensure that developers are keeping work separate; most tools that enable you to work in this model allow you to ‘lock’ objects at the database level as you work on them, or who exactly made each change that might be committed.
  • Easier to roll-back to previous state: Rollbacks are a pain with databases, but there are times where they are necessary. Maintaining a full history of the state at any given time makes it easier for us to compare and rollback environments to a state that we know worked well.

The drawbacks though of the State-First method include (but are not limited to):

  • Not as easy to achieve small, incremental deployments: Because we’re reliant on the state of the database at each stage there is still a certain element of overhead that is attached to each deployment.
  • Upgrade script determined at a later stage: Lots of people like to know EXACTLY what changes will be deployed and HOW against target environments, but because of the above reason, we’re reliant on approving changes early on, but only truly seeing how it will be deployed later in the pipeline, which doesn’t give us the same reliability or peace of mind.
  • Not as easy to refactor complex table changes: The State-First method is “How did it look to begin with and how did it look at the end?” so it doesn’t take into account the nuanced steps that may have been involved, which can be problematic when you’re adding a NOT NULL column to a table that has existing data, so these sorts of complex changes might require additional pre- and post-deployment scripts.

Migrations-First Approach

The Migrations-First approach differs significantly from the State-First approach because, as it would suggest, it relies on migrations to identify the version of the schema across environments and they usually rely on guids, numbering conventions, checksums and others to keep track of the schema, normally within a log table of their own on the affected schema itself. The migrations often come in the form of .sql files that have been written or generated and there are lots of different types, but they can be boiled down often to the idea of Repeatable, Versioned and Undo Migrations (see here on the Flyway site for a more in depth summary of these types)

The migrations then, actually contain the changes as you would like them to go out; many believe that (after testing) the script they have written is as it should be deployed, and that is exactly what is then being run against each stage. Now naturally, you need to build ON something, if you have an existing database, so many technologies will offer some sort of baselining option, to understand what already exists and what the incremental migration scripts are deploying to.

The benefits of the Migrations-First method include (but are not limited to):

  • Enables small, frequent, incremental migrations and predictable deployments: Everything is just that tiny piece of work you did, specifically. That means that only what you need to go out will go out; only what was approved at Pull Request time. This gives us high confidence that we’re sending the right changes to Prod.
  • Ideal for environments with high up-time requirements: There’s no heavy state to check, we’re just migrating these tiny changes, which means there’s far less chance of causing huge overhead on Production at deployment time.
  • Ability to use your own custom standards and code for table changes in deployments: No script generation or the ability to edit generated scripts is one of the greatest capabilities of this model. For complex changes, the steps to achieve this we KNOW that work are included, and not only that, the scripts are commented and formatted and easy to understand with our company standard, making it easy to keep track of what has been deployed.

The drawbacks though of the Migrations-First method include (but are not limited to):

  • Not as easy to pick and choose changes to be deployed: If a developer has captured multiple changes within the same script, but we only want to deploy a subset of those changes, or we don’t want to deploy to a subset of those objects right now, then it’s really hard (almost impossible) to try and unpick these changes, this also makes testing certain changes in isolation tough!
  • Higher learning curve for teams: This method is neither as easy to adopt nor as intuitive as the State-First approach, which means developers need to get used to writing their migration scripts, ensuring they’re properly formatted, commented, tested, numbered and where necessary, the undo script for those changes. This results in a much higher ask for the team; the cost for gaining the predictability of deployments.
  • Harder to roll-back changes: _On those very same undo scripts then_ they have to be absolutely perfect. It’s still much harder to undo, especially if we’re trying to undo migration 5.0.1 when we’re already on 6.1.2, everything has to cascade neatly if you’re carrying out multiple undo’s and having a water tight undo strategy is hard to nail down.

The Hybrid / Optimized Model

This particular model is a rare one to find because it is not offered widely, but where it is achievable it can offer the benefits of both the State- and Migrations-First models.

As the name would suggest, it is a combination of the state and migrations approaches into a single Hybrid model; developers store the state of their database in source control, allowing them to easily rework their changes and commit multiple times to their working branch as they develop the “end goal”, and then from this same location once those changes are confirmed, pushed and ready to go, the relevant migrations are generated from the latest state.

Now this model can be adapted into lots of different workflows: developers can all generate their own migrations from their state and check them in together when they’re happy. This records a granular history of each change that was made and how it applies to each object, and is easy to work with, and then the migration contains just what needs to go out from all of that work. Another option would be having developers make the changes and check these into a DB State folder in source control, and then having more experienced developers or DBAs etc. generate the respective migrations from the state, knowing that they have a greater confidence in the SQL specific changes that are captured in the script. This is nice because it gives cause for another pair of eyes, which again gives greater confidence in what ultimately gets deployed.

The benefits of the Hybrid / Optimized method include (but are not limited to):

  • Full granular history around object changes on a state level, but with customization, flexibility and reliability of migration scripts: Know exactly what has changed, when and by whom, but don’t worry that you don’t know exactly what change will be deployed.
  • Separation of duties for Developers and Senior Team Leads / DBAs (who generates what / who has what specialty) and a lower learning curve for developers: Easy for developers to make changes quickly and easily without having to worry about the “nitty-gritty” and exactly what SQL will be needed. Gives DBAs and senior developers peace of mind that changes are ultimately adopted and improved by people who _know_ the database.
  • Easily extends existing state-first model where migrations are needed: State-First is a great choice 70% of the time but there ARE times where data migrations or complex changes are needed. This method includes these changes where needed, instead of relying on pre- and post-migration scripts, which run globally every single time.
  • Easier to pick and choose changes to go out: Because we can choose which changes to which objects are going out in the migration scripts, it’s easier for us to grab only the ones we want to push out each time, like an additional “cherry pick” layer within the development process.

The drawbacks though of the Hybrid / Optimized method include (but are not limited to):

  • Additional step added to the process can make it feel like red-tape / added work: In some cases teams may wish to make changes and get them out _fast_ as part of continuous deployment, and could be doing so hundreds of times per day. This model can get in the way of that because it adds an additional layer of dependency.
  • Could add some time to the overall development process for new changes: This is almost exactly the same as the above reason. More steps to include, more people to include, slightly less automation than we would like _perhaps_ so naturally time to deploy increases slightly (but arguably is offset by greater confidence in the change? I’ll let you decide!)
  • Duplicated schema model in Source Control repository: Some tools keep a copy of the schema in source control as reference for the migrations, others don’t. In either case, you’re maintaining two versions of a repository, which many say should be the single source of truth, if these two are even slightly out of sync, who are we going to believe? This model calls for discipline, as sloppiness can destroy all of the proposed benefits.

aaaaaaaaaaaaaaaaaand… breathe!

Slow Down Reaction GIF by True and the Rainbow Kingdom

Conclusion

There are lots of different models you can adopt for the source controlling of your database and changes, in this post I’ve outlined 3 (well… 2 and a half really) but whatever you’re looking to adopt, hopefully this will give you greater confidence in adopting the right one.

Have a wonderful week!

Azure DevOps Masking a.k.a “point, no click”

“[My] kids haven’t responded to my GDPR requests so I don’t think I’m legally allowed to tell them when dinner’s on the table.”
@mrdaveturner

Ah masking. You would have thought I’d be sick of it by now, no? No, fortunately now, more so than ever, I find myself answering question after question and tackling use-case after use-case. So when I was asked this week:

“Chris, is there a way for us to call Data Masker for SQL Server directly from Azure DevOps?”

I thought to myself, well that sounds easy enough… and it was! I know what you’re thinking, c’mon Chris, surely there is more to it? But no, it’s actually pretty straight forward!

I pointed them at the PowerShell module and cmdlets for SQL Provision and the Azure DevOps plugin to automate all of their Provisioning and Masking process, thinking all the while “pffft, they could have made this harder!” and then…

“No sorry Chris, is there a way for us to call JUST Data Masker for SQL Server directly from Azure DevOps?”

Ah! Now that’s an interesting one!

#1 Figure out where you want Data Masking to run in your process

This empty Azure deployment stage looks good enough for now! If you wanted to chain other processes either side of it, that’s cool too! Maybe you have your own provisioning process in place and you want to point Data Masker at it to sanitize it? Makes sense to me! For now I’m going to stick with a single agent job for simplicity.

#2 Figure out what is actually going to run Data Masker

Data Masker is a client install and as such will need to be installed on a *gasp* actual machine!

No but seriously, any server you have lying around, physical or VM will do the trick as long as it meets these requirements. Now this Server/VM will need to have an Azure DevOps agent on it already, which of course is the ideal candidate for being the “thing” that calls Data Masker – this could be the Staging/Non-Functional/Pre-Prod environment also of course, so you could copy down PROD and then immediately invoke masking.

#3 Call the command line from Azure DevOps

In your pipeline steps you can specify the calling of an executable on the machine where the agent resides. Fortunately Data Masker has a wonderful command line available that you can call, you can read all about it here: https://documentation.red-gate.com/dms/data-masker-help/general-topics/about-command-line-automation

The PARFILE you could of course dynamically replace with variables so that it only calls the relevant parameter file for that particular database as well, a nice benefit!

My PARFILE just simply looked like this:

It was calling a local Data Masker set “AzureFun” – now the thing to bear in mind is that Data Masker will run with the Windows authentication credentials that are being run as by the Azure DevOps agent, unless you specify otherwise. In this case because the Azure DevOps agent has the correct permissions to update the databases on this instance anyway I’m fine to use Windows Authentication:

Conclusion

It’s very easy to simply call the command line of Data Masker for SQL Server directly from Azure DevOps, does this same approach work from other CI/CD tools? If they can call executables on the target server then absolutely! So it’s very easily included in the process – you just have to think about where Data Masker is installed and what credentials you’re using for it!

Bonus Point – what about if it’s all Azure SQL Database?

You had to do it didn’t you, you had to say it!

“But Chris, now we know we can call this all from Azure DevOps, what if we wanted to mask and copy Azure SQL Databases into Dev/Test etc.?”

Well actually the good thing is, it’s also pretty similar! When you’re connecting Data Masker to an Azure SQL DB you only need to specify this in the connections in the controller. Again, authentication will likely have to be SQL Auth at this point, and you need to be in Cloud mode, and I’d recommend setting the connection timeout to 10s rather than the standard 5s, but it can still be called as normal from the PARFILE:

So the Data Masker element is reasonably straight forward – that’s the good news. But the thing you REALLY need to stop and think about is:

Where are our Dev and Test copies going to BE?

Option #1: If they’re going to be on VMs or local dev and test servers / developer machines then you could follow a similar approach to one I laid out in this blog post for Redgate in which you create a BACPAC file and split it out on premise before importing it and then provisioning from there. And you could use this code in my Github to achieve something very similar. Caveat: I am no PowerShell guru, who do you think I am? Rob Sewell? Chrissy LeMaire? No. Sadly not. So you can build your own logic around my code though, have at it, I don’t mind! ^_^

Option #2: Keeping everything in Azure. You can copy databases around in Azure and it seems to work pretty well! So I wrote this PowerShell (also in my GitHub for y’all) to effectively copy a PROD DB into the same resource group, mask it and then copy it across to a Dev/Test resource group, dropping the temp copy so as not to incur lots of extra Azure costs (this is just one of the methods I’ve seen people use, again it’s up to you!) – again, see the caveat in option #1 above for my statement on PowerShell! The good thing is, you can use the ‘&’ simply from PowerShell to call Data Masker’s command line.

Either of these options can be run from Azure DevOps also as part of your provisioning or working processes, but instead of including a call to the command line, you can run a fun PowerShell script instead:

Second Conclusion *sigh*

There are lots of ways to get what you need into Dev and Test, but these copies should be masked if they contain personal, identifying information. There are some methods above but there are plenty of others out there on the internet and if you’re not sure about getting started with data masking; try my post here – happy masking!

Reusable Schema Deployments with SQL Source Control

“That’s who you really like. The people you can think out loud in front of.”
John Green

Can we all take a moment to appreciate how awesome Kendra Little is? No really, go on over to her Twitter or something and remind her. Because not only is she a genius but she brings out the best in a lot of folk, and I don’t think she gets enough credit – so my quote above today is for her!

This is one of those times though, where we stumbled on an idea, and together we fleshed it out and thanks to her ingenuity and straight up desire to help we ended up with a full on video about it! Thank you Kendra! So if you don’t want to sit here and read about Reusable Schema Deployments, take a look at the video below instead where we cover everything from the key differences between State and Migrations, what a filter file is and how to use YAML in Azure DevOps!

For those of you who prefer a nice read, grab a coffee or tea and a biscuit (cookie) and read on!

The Problem

We find ourselves in the unenviable situation where we have a Production database that is delivered to customers to support different applications that we provide to them. When a new customer comes on board and chooses any number of our products, we then deliver them along with a copy of the database containing a set of objects that are specific to their setup.

Example. We produce 3 applications for our customers; Beep, Derp and Doink. In the database that we supply with these applications, we have corresponding schemas ‘Beep’, ‘Derp’ and ‘Doink’, as well as ‘dbo’ which holds a number of objects common across all instances of the database.

The question then is: “How do we deploy only the combinations of these schemas to the relevant customers AND make sure there is as little down time as possible?”

I mean, besides magic, of course!

shia labeouf magic GIF

Solution 1: Less Work, More Schema

There’s a reason why, when you buy a new ERP or CRM system for your company, many times you will receive ALL of the schema! Just bought in a new system to help manage your General Ledger, Accounts Payable and Accounts Receivable, and those are the only modules you’re going to use?

Whooooo-boy-howdy you better believe you’re going to get schema objects for Asset Management, Billing and Risk Management too!

The reason for this is that it is much easier to deliver. It is a single package that needs to be deployed everywhere and if the customer already has the relevant objects in the database then it is MUCH easier to just turn on corresponding application functionality that starts to populate and use those objects!

The problem is, if EVERY customer get’s EVERY object change across EVERY schema… well then it’d be a lot of changes and potentially some quite big changes that could impact our customers, perhaps unnecessarily. This could easily be an argument to be made for the migrations approach to source controlling and deploying changes, but that’s one for another day!

Solution 2: You get a filter, and you get a filter, EVERYBODY gets filters!

oprah lol GIF by Amy Poehler's Smart Girls

In the state based way of deploying we can actually use filter files (.scpf) which allow us to filter at the point of creating a database release artifact. This is a game changer because that means we can have the convenience of developing against a single source database in Dev, source controlling ALL object changes together and it’s only once we actually get to the point of deploying to the customer do we include the filter file in the Create Release Artifact step to include ONLY the necessary schema objects that are relevant to them.

Now this is also a great way of doing it because it means that everything in source control is still our single source of truth and we’re able to validate that everything builds together and we can run unit tests broadly against all objects etc. however it does also mean that we have to either maintain separate filter files for every customer, or for every combination of our Schemas that a customer could receive and update them as and when people add or remove certain applications. It also doesn’t give us any validation that THIS particular release artifact that has been created for that customer actually works independently from the rest of the schema objects and therefore we’re deploying something that hasn’t actually been tested in isolation first!

Finally, the secondary problem with this approach is that it is SLOOOOOOW. Like super slow. This is because the heaviest part of the state based database deployment process is the creation of the release artifact determining what changes should be included in the release that is going out of the door and this is being carried out and putting overhead independently on every. single. customer. Not fun.

Solution 3: Reduce, Reuse, Recycle.

If we take a step back from this problem and look at exactly what we’re trying to do and what we want to do. We want to deliver ONLY the necessary changes to a particular schema in the database that supports that specific application.

But this means that there is a commonality across customers – if for example we assume that we have 30 customers that have a variation of the Beep schema (I’m going to ignore dbo for now because everybody has that), they may also have Derp, or Doink or no other schemas, but the point is all 30 of those customers will require the exact same updates to their Beep schema full stop.

This means, if we can generate a single artifact once, that can be used for all 30 customers receiving this schema, or indeed ANY schema, then we can:

a) Reduce the amount of comparisons taking place to create release artifacts
b) Reuse and Recycle the same release artifacts for multiple customers
c) TEST these artifacts before they actually go out the door!

This is effectively achieved by adding an additional layer on top of the development and deployment process:

An additional step is introduced to produce a single reusable artifact prior to the final Prod deployment, Pre-Prod all receives the same package which contains every object regardless of schema, however when a Production release needs to go out the release artifact is built against the Beep database (in this case, which only has the Beep and dbo schemas) so the pain of the creation of the artifact actually sits outside of the customer environment AND is created only once, allowing us to now distribute that change to any customer who many require it to upgrade their Beep schema.

The same is done for each schema in turn which means we then deploy the fast reusable artifacts, and the only process change required is the step immediately before deploying to Production, almost like the independent databases are an exact mirror.

Don’t get me wrong there are challenges with this model as well.

What if we want to deploy a completely new database with a set number of schemas? Well. You may have to do an ad-hoc deployment or add an additional process to the pile which does that create for you!

What if we create versions 10.4.1, 10.4.2 but these only make it up to one of these Pre-Prod mirrors and then we want to push 10.4.3 to Customer Production environments? We will no longer receive the artifacts from .1 and .2, only .3! Here you will have to create the specific filtered artifacts ONLY prior to deploying to Production so that EVERY change is captured. In the video above I used a Golden DB which had every deployment on it and used this to test my schema specific deployments prior to deployment but it depends on what setup you want to adopt.

Conclusion

Filters are incredibly powerful and if you have subtle differences in hosted environments across your customers or even across your own DBs they can be an ingenious way of being able to keep all core objects AND key variations within Dev, version control and Pre-Production but then making sure that the target ONLY receives what it needs.

But be aware, subtle variations can snowball and you do have to be careful how you handle them as it is not very easy to scale to multiple customers. Fortunately in this scenario 1 schema was mapped to 1 application being delivered which makes it easy to determine who gets what, but the more differences you have, the harder they will be to continuously integrate and deliver.

And FINALLY (that’s right I’m going to stop talking), if you want to read more about this model from Kendra’s perspective, she also wrote about it! You can read her account here: https://littlekendra.com/2020/04/08/make-database-code-reusable-in-sql-source-control-with-deployment-filters/

Have a great week, stay safe and stay well!