“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” – Sir Arthur Conan Doyle, Sherlock Holmes
Can you tell I’m loving Flyway at the moment? Well I am. It’s JUST SO GOOD! Honestly there are so many things you can do with it! Don’t know what I’m talking about? Check out my posts on xRDBMS DevOps with Flyway and tSQLt unit tests with Flyway and you’ll see what I mean!
As a result of the above posts, though, I was asked a question that I had to think about for a little bit before I had a good answer: how can we seed some testing data INTO the build database so that we can run some meaningful tests against it?
This makes perfect sense to me, but there are also a few different ways to do this – so let’s go fly(way)!
1 – Test Data Migration Scripts
In my previous posts on Flyway (above) I talked about having an entirely separate build folder present within the repository, and a folder of test migrations alongside our schema migrations. I called these the Build_Config folder (containing the build configuration file) and the Test_Migrations folder (unsurprisingly containing testing migrations) in the _Migrations location:
I was using the same build config for 2 purposes; 1) to build the schema migrations from the base version, by passing it the Schema_Migrations location dynamically and 2) then building the tSQLt framework and testing objects by passing it the Test_Migrations location dynamically.
This actually worked surprisingly well, but even beyond this – the same method can be repurposed, or added to, by augmenting your testing scripts and adding a data insertion task (as an additional script or group of scripts). In my folder, I can simply add a migration like this:
Because of course I like dogs.
and once pushed to the repository and the build has run we should be able to verify our testing data is present:
A bonus win for this step, of course, is that where Devs have their own Flyway config files locally for their development databases, they could also override this behavior and point the testing and/or data scripts at their own database so they have some seed data to work with too!
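As a rough sketch of the kind of test data migration described in this section, the shell function below writes a seed script into a migrations folder. The table `dbo.Dogs`, its columns, the file name and the `V900.1` version are all hypothetical. The version is chosen only so that it would sort after a V900 tSQLt framework migration and before a V999.9 test run, which is an assumption about your numbering.

```shell
# Sketch only: dbo.Dogs, its columns and the V900.1 version are hypothetical.
write_seed_migration() {
  target_dir="$1"
  mkdir -p "$target_dir" || return 1
  # Quoted heredoc delimiter so the SQL is written exactly as typed.
  cat > "$target_dir/V900.1__Seed_Test_Data.sql" <<'SQL'
-- Hypothetical seed rows for the build database; swap in your own tables.
INSERT INTO dbo.Dogs (DogName, Breed)
VALUES (N'Rex', N'Labrador'),
       (N'Bella', N'Beagle');
SQL
}
```

Calling `write_seed_migration _Migrations/Test_Migrations` would drop the script next to the other testing migrations, so the next build picks it up like any other versioned migration.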
2 – Add a data generation step to the pipeline
There are SO MANY technologies out on the inter-webs for generating data. SO MANY. Many of them also have a command line or PowerShell module that we can use to easily invoke them against a target, especially if that target is going to be persistent like my Flyway Azure SQL Build DBs!
There are numerous ways to invoke tools and applications, and fortunately good CI/CD tools like Azure DevOps offer multiple ways to, for instance, run PowerShell or CLI steps from within the pipeline. So we could easily invoke SQL Data Generator on a VM or physical machine we have an Azure DevOps agent on, but this thinking also opens up the possibility of using something like Chocolatey to dynamically install the software on the Azure DevOps hosted pool VM during the build (for the Redgate tools at the moment I suppose you’d need a Windows VM).
I will be writing a future blog post about this step because it sounds _very_ interesting, but I’m not sure yet what can be done specifically using Chocolatey or if I’ll have to look elsewhere, although I have read this post in the past (thanks Paul!) detailing limitations and a great workaround using Azure DevOps, so it’s likely that’ll be my first port of call!
Just to give you an idea of the end result with SQL Data Generator specifically though:
3 – Use existing data, don’t generate
Ok this one is going to be controversial already, I can tell! Let’s all stay calm!
The best data to be tested is our data. What we have in Production is what will have these changes deployed to it… eventually! So shouldn’t we just test against that? Well. Maybe, maybe not depending on what is in there.
There are a few methods to achieve this – my personal favorite would be to use a SQL Clone, spin that up on a build VM rather than using an Azure SQL DB, and we can have all the data in an instant. Of course if we hold any sensitive PII/PHI then we should ensure that is protected first!
Of course there are lots of other options, like restoring a backup or spinning up a container etc. and these can all just be a stage in the YAML file before invoking Flyway but the point is, if we use an existing copy of our Prod database from some source or another, it will have 2 things we really care about:
Data. Ready to go, ready to test, ready to give us the best possible insight into our changes.
The flyway_schema_history table. Instead of running EVERY migration we’ve ever written, which could take a while for a large team, we run only the latest migrations to check that they would deploy happily to the Production target.
To get this stage to work though, you would need to do a couple of things differently:
The build DB would have to be created from the clone/backup/other every time instead of simply cleaning the schema down.
You would need to remove the Flyway Clean step from the pipeline in my previous post, because it would otherwise drop all the tables (and then we wouldn’t have any data!)
By extension, this also makes the callback to remove the tSQLt objects void, so you can remove that too.
There are a lot of different ways to generate data: you can generate completely synthetic data, you can mask data or you can use Prod data. It’s up to you! Ultimately it will just form another part of your pipeline, so be careful of ordering! You don’t want to try generating data into a table that hasn’t been built yet.
Respect your YAML file and you’ll get schema, data and unit tests and this will lead to one thing. Greater insight, earlier.
“If you truly have faith in your convictions, then your convictions should be able to stand criticism and testing.” – DaShanne Stokes
Welcome fellow Test-Driven-Development enthusiasts… is what I would say if I actually ever did TDD and didn’t just, you know… write regular unit tests after the fact instead.
I’m going to be honest, I love the idea of TDD but have I ever actually been able to do it? No. Have competent developers been able to do it successfully? Yes, of course. Don’t know anything about TDD? You’re in luck! Click here for an introduction (don’t worry though, THIS post is not going to be about TDD anyway, so you can also keep reading).
But one thing we can all agree on is that testing is pretty important. Testing has evolved over the years though and there are a million-and-one ways to test your code, but one of the most difficult and frustrating things to test, from experience, is database code.
Some people argue that the days of testing, indeed, the days of stored procedures themselves are gone and that everything we do in databases should be tested using a combination of different logic and scripting languages like Python or PowerShell… but we’re not quite there yet, are we?
Fortunately though we’re not alone in this endeavor, we have access to one of the best ways to test T-SQL code: tSQLt. You can read more about tSQLt at the site here but in short – we have WAYS to test your SQL Server* code. The only problem is, when you’re using a migrations approach… how?
*There are also many ways to unit test code from other RDBMS’ of course, like utPLSQL for Oracle Database or pgTAP for PostgreSQL – would this method work for those? Maybe! Try adapting the method below and let me know how you get on!
I’ve already talked about how implementing tests is easier for state based database source control in a previous post, because we can easily filter tests out when deploying to later stage environments. With migrations, however, this can be a real pain because you have to effectively work on tests like you would any normal database changes, and maybe even check them in at the same time. So ultimately, they should be managed in the same way as database schema migrations… but we can’t filter them out of migrations or easily pick and choose what migrations get run against test and Prod, without a whole lot of manual intervention.
Basically. It’s a mess.
But during my last post about Flyway I was inspired. This simple and easy to use technology just seems to make things really easy and seemingly has an option for EVERYTHING, so the question I started asking myself was: “How hard would it be to adapt this pipeline to add unit tests?” and actually although there were complications, it was still easier than I thought it would be! Here’s how you can get up and running with the tSQLt framework and Flyway migrations.
1 – Download the scripts to create the tSQLt framework and tests from the site
Ok this was the easiest step of them all, largely because in the zip file you download from the tSQLt website all you have is a set of scripts: the first is needed to enable CLR and the second installs the tSQLt framework:
As part of my previous pipeline I’m actually using Azure SQL Database as my development environment, where RECONFIGURE is not a supported keyword and where we don’t need to run the CLR script anyway, so all I needed was the tSQLt.class.sql file.
The good thing about this is that we can copy it across into a migration and have this as our base test class migration, and then any tests we write on top of it will just extend it – so as long as we remember to update it _fairly_ frequently with any new tSQLt update, we should be fine! (Flyway won’t throw an error because these are non-persistent build objects, so no awkward checksum violations to worry about!)
2 – Adapt the folder structure in the repository for tests
I added 2 new folders to my _Migrations top level folder, a Schema_Migrations folder and a Test_Migrations folder. When you pass Flyway a location for migrations, it will recursively scan folders in that location looking for migrations to run in order. I copied the migrations I had previously into the Schema_Migrations folder and then my new tSQLt creating migration into the Test_Migrations folder. This allows them to be easily coupled by developers, whether you’re writing unit tests or practicing TDD:
You’ll have noticed I called my base testing migration V900__ – this is because I do still want complete separation and if we have a V5 migration in schema migrations and a V5 testing migration, we’re going to have some problems.
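To guard against the clash described above, a small check could run before the build. This is only a sketch: it assumes the usual `V<version>__<description>.sql` naming and compares version strings as text, so it would not notice that Flyway may treat versions such as `5` and `5.0` as the same thing. The function names are made up for illustration.

```shell
# List the version part of every versioned migration under a folder.
list_versions() {
  find "$1" -type f -name 'V*__*.sql' |
    sed -e 's|.*/V||' -e 's|__.*||' |
    sort -u
}

# Print any version used in both folders; return 1 if there is a clash.
# Textual comparison only (an assumption): "5" and "5.0" are not matched up.
find_version_clashes() {
  clashes=$( { list_versions "$1"; list_versions "$2"; } | sort | uniq -d )
  if [ -n "$clashes" ]; then
    echo "$clashes"
    return 1
  fi
  return 0
}
```

Something like `find_version_clashes _Migrations/Schema_Migrations _Migrations/Test_Migrations` could then fail the pipeline early, before Flyway ever sees two migrations with the same version.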
3 – Add a callback to handle removal of the objects
As I was putting this together, I noticed that I could use flyway migrate to run the tSQLt framework against my Dev database, but every time I tried to then flyway clean that database I got a very nasty error stating that the tSQLt assembly could not be removed because of dependent objects.
Flyway does not handle complex dependencies very well unfortunately (that’s where you’d use an industry leading comparison tool like SQL Compare) so, with some advice from the wonderful Flyway team, I set to work on a callback. A callback is how you can hook into Flyway’s own processes, telling it to do something before, during or after certain commands. In my case we were going to remove all of the tSQLt objects prior to running Flyway clean to remove the rest of the schema. To make it future proof (in case objects are added or removed from the tSQLt framework), I wrote a couple of cursors to go through the different objects that were dependent on the assembly and remove them, rather than generating a script I know to have all of the tSQLt objects in right now. You can find the code for the callback in my GitHub here, you are welcome to it!
All you have to do is name it beforeClean.sql and ensure it is in the directory with your other sql migrations so that it will pick this up and run it – I put it in my Test_Migrations folder, because I only want it to run this callback when cleaning the build DB, as this is the only place we’re utilizing automated unit tests… for now!
4 – Update the Azure DevOps pipeline
I’ve got my callback, I’ve got my tSQLt migration and the folder structure is all correct and is pushed to Azure DevOps but naturally it is breaking the build *sad* but fortunately all we now have to do is update the YAML pipeline file:
You will notice a couple of important things that I have highlighted above:
I’m cleaning the build schema using the Test_Migrations repository – this is because that is where my callback is and I need that to run before the clean otherwise it will fail due to the tSQLt assembly issue (line 17)
I am running the migrate for the tests and the schema separately in the file, instead of just calling flyway to recursively run everything in the _Migrations folder. This is because I want them to be 2 separate steps, in case I need to modify or remove either one of them, or insert other steps in between and so that I can see the testing output in a separate stage of the CI pipeline (lines 23 and 29).
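The ordering in the two points above can be sketched as plain shell, outside of any YAML. This is not the pipeline file from the post: the `flyway/flyway` image name, the position of the `Build_Config` folder at the repository root and the function names are assumptions, and the container is assumed to read its config from `/flyway/conf` and its scripts from `/flyway/sql`.

```shell
# Sketch only: flyway/flyway is the assumed image; folder names follow the post.
flyway_in_docker() {
  sql_dir="$1"
  shift
  docker run --rm \
    -v "$sql_dir:/flyway/sql" \
    -v "$REPO_ROOT/Build_Config:/flyway/conf" \
    flyway/flyway "$@"
}

run_build() {
  REPO_ROOT="$1"
  migrations="$REPO_ROOT/_Migrations"
  # Clean with Test_Migrations mounted so the beforeClean.sql callback is found.
  flyway_in_docker "$migrations/Test_Migrations" clean || return 1
  # Build the schema first, then install tSQLt and the tests as a separate step.
  flyway_in_docker "$migrations/Schema_Migrations" migrate || return 1
  flyway_in_docker "$migrations/Test_Migrations" migrate
}
```

Keeping the three calls separate mirrors the reasoning above: each one can be moved, removed or given its own stage without touching the others.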
Caveat: running the 2 processes separately (Option 2) means running Flyway twice, each time mapping a different folder (the Schema_Build and Test_Build folders in the YAML) to Flyway’s sql directory (lines 16 and 22 in the file above). The problem this causes is that the second time Flyway runs, when it recursively scans the Test_Migrations folder, it will not find the schema migrations that are already recorded in the flyway_schema_history table, resulting in an error because Flyway is unable to find and resolve those migrations locally.
The way to fix this though is pretty simple – you find the “IgnoreMissingMigrations” setting in the Flyway config file and set it to true, which will allow Flyway to continue past migrations it can no longer find locally. We wouldn’t have to worry about this setting though, if we were just recursively looking to migrate the Schema and Test migrations in the same step (but I’m a control freak tee-hee).
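As a small sketch of that config change, the function below adds the setting to a conf file if it is not already there. The key `flyway.ignoreMissingMigrations` is the one used by the Flyway versions this post was written against. As far as I know, newer releases replaced it with `ignoreMigrationPatterns`, so check the documentation for the version you run. The function name is made up.

```shell
# Sketch only: append the setting once; applies to Flyway versions that
# still support flyway.ignoreMissingMigrations (an assumption about yours).
enable_ignore_missing() {
  conf_file="$1"
  if ! grep -q '^flyway\.ignoreMissingMigrations=' "$conf_file" 2>/dev/null; then
    echo 'flyway.ignoreMissingMigrations=true' >> "$conf_file"
  fi
}
```

Running it twice against the same file leaves a single line, so it is safe to call from a build script.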
Now, once committed this all runs really successfully. Velvety smooth one might even say… but we’re not actually testing anything yet.
5 – Add some tests!
I’ve added a single tSQLt test to my repository (also available at the same GitHub link). It was originally created by George Mastros and is part of the SQLCop analysis tests – checking if I have any user procedures whose names start with “SP_”, as we know that is bad practice – and I have wrapped it up in a new tSQLt test class ready to run.
You’ll notice I also have a V999.9__ migration in the folder too, the purpose of this was to ‘top and tail’ the migrations; first have a script to set up tSQLt that could be easily maintained in isolation and then end with a script that lets me do just 1 thing: execute all of the tests. You can do this by simply executing:
and we should be able to capture this output in the relevant stage of the pipeline.
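For illustration, the ‘tail’ migration could be as small as a single call to `tSQLt.RunAll`, the tSQLt procedure that runs every test class in the database. The sketch below writes such a file. The file name is hypothetical and only the V999.9 version comes from the post.

```shell
# Sketch only: the file name is hypothetical; V999.9 is the version from the post.
write_run_tests_migration() {
  target_dir="$1"
  mkdir -p "$target_dir" || return 1
  cat > "$target_dir/V999.9__Run_All_Tests.sql" <<'SQL'
-- Run every tSQLt test class that has been installed in this database.
EXEC tSQLt.RunAll;
SQL
}
```

Because it has the highest version in the folder, Flyway would run it last, after the framework and all of the test classes have been created.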
Some of you may be asking why I chose to have the run unit tests as part of the setting up of the testing objects – this was because I had 2 options:
1) I’m already executing scripts against the DB with Flyway, I may as well just carry on!
2) The only other way I could think to do it was via a PowerShell script or run SQL job in Azure DevOps, but the 2 plugins I tried fell over because I was using an Ubuntu machine for the build.
So naturally being the simple person I am, I opted for 1! But you could easily go for the second if you prefer!
6 – Test, Test, Test
Once you’ve handled the setup, got the callback in place (and also followed the steps from the last blog post to get this set up in the first place!) you should be able to commit all these changes and have a build that runs, installs tSQLt and then runs your tests:
I realize there are a lot of “Warnings” in there, but that is just Azure DevOps capturing the output, the real part of this we’re interested in is lines 31-40 and if we clean up the warnings a little you’ll get:
|Test Execution Summary|
|No|Test Case Name|Dur(ms)|Result |
Test Case Summary:
1 test case(s) executed, 1 succeeded, 0 failed, 0 errored.
But if I introduce a migration to Flyway with a new Repeatable Migration that creates a stored procedure named SP_SomeNewProc…
|Test Execution Summary|
|No|Test Case Name|Dur(ms)|Result |
Test Case Summary:
1 test case(s) executed, 0 succeeded, 1 failed, 0 errored.
It even tells us the name of the offending sproc:
All I have to do now is make the corresponding change to remove SP_ in dev against a bug fix branch, push it, create a PR, approve and merge it in and then boom, the build is right as rain again:
Thus bringing us back into line with standard acceptable practice, preventing us from delivering poor coding standards later in the pipeline and ensuring that we test our code before deploying.
Just because you adopt a more agile, migrations based method of database development and deployment, doesn’t mean that you have to give up on automated testing during Continuous Integration, and you can easily apply these same principles to any pipeline. With just a couple of tweaks you can easily have a fully automated Flyway pipeline (even xRDBMS) and incorporate Unit Tests too!
“Some people try to make everything complicated, be the person who tries to make everything simple.” – Dave Waters
Simplicity is in my blood. That’s not to say I am ‘simple’ in the sense I cannot grasp more than the most basic concepts, but more that I am likely to grasp more complex problems and solutions when they are phrased in simple ways.
This stems from my love of teaching others (on the rare occasion it falls to me to do so), where I find the moment that everything just ‘clicks’ and the realization comes over them to be possibly one of the most satisfying moments one can enjoy in life.
Now recently I’ve been enjoying getting my head around Flyway – an open source JDBC based migrations tool that brings the power of schema versioning and deployments together with the agility that developers need to focus on innovation in Development. There’s something about Flyway that just… ‘clicks’.
It doesn’t really matter what relational database you’re using; MySQL, IBM DB2, even SAP HANA! You can achieve at least the core tenets of database DevOps with this neat and simple little command line tool – there’s not even an installer, you just have to unzip!
Now I’ve had a lot of fun working with Flyway so far and, thanks to a few people (Kendra, Julia – I’m looking at you both!) I have been able to wrap my head around it to, I would say, a fair standard. Caveat on that – being a pure SQL person please don’t ask me about Java based migrations, I’m not quite there yet!! But there is one thing that I kept asking myself:
“When I’m talking to colleagues and customers about Database DevOps, I’m always talking about the benefits of continuous integration; building the database from scratch to ensure that everything builds and validates…” etc. etc. so why haven’t I really come across this with Flyway yet?
Probably for a few reasons. You can include Flyway as a plugin in your Maven and Gradle configurations, so people writing Java projects already get that benefit as part of their build. Flyway itself is, by its nature, simply small incremental scripts, and developers can go backwards and forwards however and as many times as they like with the Flyway Migrate, Undo and Clean commands, so is there really a need for a build? And most importantly, Flyway’s API just allows you to build it in. So naturally you’re building WITH the application.
But naturally when you’re putting your code with other people’s code, things have to be tested and verified, and I like to do this in isolation too – especially for databases that are decoupled from the application, or if you have a number of micro-service style databases you’d want to test all in parallel etc. it’s a great way to shift left. So I started asking myself if there was some way I could implement a CI build using Flyway in Azure DevOps, like I would any of the other database tooling I use on a regular basis? Below you’ll find the product of my tinkering, and a whole heap of help from Julia and Kendra, without whom I would still be figuring out what Baseline does!
Option 1) The simplest option – cmdline
Flyway can be called via the command line and it doesn’t get more simple than that.
You can pass any number of arguments and switches to Flyway’s command line, including specifying what config files it’s going to be using. That means all you have to do is unzip the Flyway components on a dedicated build server (VM or on-prem) and then, after refreshing the migrations available, invoke the command line using Azure DevOps pipelines (or another CI tool) to run Flyway with the commands against a database on the build server (or somewhere accessible to the build server) and Bingo!
And that’s all there is to it! You get to verify that all of the migrations up to the very latest in your VCS will run, and even if you don’t have the VERY base version as a baseline migration, you can still start with a copy of the database – you could even use a Clone for that!
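As a minimal sketch of Option 1, the wrapper below calls an unzipped Flyway that is already on the PATH of the build server. The `-configFiles` and `-locations` switches are standard Flyway command line options. The function name and the paths in the usage note are placeholders.

```shell
# Sketch only: run Flyway from an unzipped install that is already on the PATH.
flyway_build() {
  conf_file="$1"
  migrations_dir="$2"
  shift 2
  # Remaining arguments are the Flyway commands to run, e.g. "migrate" or "info".
  flyway -configFiles="$conf_file" -locations="filesystem:$migrations_dir" "$@"
}
```

A CI step could then call something like `flyway_build /build/flyway.conf /build/repo/migrations migrate` once the latest migrations have been pulled down from version control.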
But yes, this does require somewhere for Flyway to exist prior to us running with our migrations… wouldn’t it be even easier if we could do it without even having to unzip Flyway first?
Option 2) Also simple, but very cool! Flyway with Docker
Did you know that Flyway has its own Docker image? No? Well it does!* Not only that but we can map our own version controlled Migration scripts and Config files to the container so that, if it can point at a database, you sure as heck know it’s going to migrate to it!
This was the method I tried, and it all started with putting a migration into Version Control. Much like I did for my post on using SQL Change Automation with Azure SQL DB – I set up a repo in Azure DevOps, cloned it down to my local machine and I added a folder for the migrations:
Into this I proceeded to add my base script for creating the DMDatabase (the database I use for EVERYTHING, for which you can find the scripts here):
Once I had included my migration I did the standard
git add .
git commit -m "Here is some code"
and I had a basis from which to work.
Next step then was making sure I had a database to work with. Now the beauty of Flyway means that it can easily support 20+ RDBMS’ so I was like a child at a candy store! I didn’t know what to pick!
For pure ease and again, simplicity, I went for good ol’ SQL Server – or to be precise, I created an Azure SQL Database (at the basic tier too so it’s only costing £3 per month!):
Now here’s where it gets customizable. You don’t NEED to actually even pass in a whole config file to this process. Because the Flyway container is going to spin up everything that would come with an install of Flyway, you can pass it switches to override the default behavior specified in the config file. You can adapt this either by hard-coding strings or by using Environment Variables alongside the native switches – this means you could pass in everything you might need securely through Azure Pipeline’s own methods.
I, on the other hand, was incredibly lazy and decided to use the same config file I use for my Dev environment, but I swapped out the JDBC connection to instead be my Build database:
I then saved this new conf file in my local repo under a folder named Build Configuration – in case I want to add any logic later on to include in the build (like the tSQLt framework and tests! Hint Hint!)
This means that I would only need to specify 2 things as variables, the location of my SQL migrations, and the config file. So the next challenge was getting the Docker container up and running, which fortunately is very easy to do in Azure Pipelines. Here was the entirety of the YAML to run Flyway in a container (and do nothing with it yet):
So, on any changes to the main branch we’ll be spinning up a Linux VM, grabbing Docker and firing up the Flyway container. That’s it. Simple.
So now I just have to pass in my config file, which is already in my ‘build config’ folder, and my migrations which are in my VCS root. To do this it was a case of mapping where Azure DevOps stores the files from Git during the build to the container’s own mount location in which it expects to find the relevant conf and sql files. Fortunately Flyway and Docker have some pretty snazzy and super clear documentation on this – so it was a case of using:
-v [my sql files in vcs]:/flyway/sql
as part of the run – though I had to ensure I also cleaned the build environment first, otherwise it would just be like deploying to a regular database, and we want to make sure we can build from the ground up every single time! This led to me having the following environment variables:
Rather helpfully, all of our files from Git are copied to the working directory during the build and we can use the environment variable $(Build.Repository.LocalPath) to grab them! This led to me updating my YAML to actually do some Flyway running when we spin up the container!
Effectively, this will spin up the VM in ADO, download and install Docker, fire up the Flyway container and then 1) clean the target schema (my Azure SQL DB in this case) and 2) then migrate all of the migrations scripts in the repo up to the latest version – and this all seemed to work great!*
*Note: I have an enterprise Flyway license which enables loads of great features and support, different version comparisons can be found described here.
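To make the volume mapping a little more concrete, here is a sketch of the clean-then-migrate run as a shell function rather than the YAML from the post. It assumes the `flyway/flyway` image, and it reads `BUILD_REPOSITORY_LOCALPATH`, which is how Azure Pipelines exposes `$(Build.Repository.LocalPath)` to scripts as an environment variable. The function name and the folder names you pass in are placeholders, and recent Flyway versions may need clean to be explicitly enabled in the config.

```shell
# Sketch only: flyway/flyway is the assumed image; folder arguments are placeholders.
flyway_docker_build() {
  # Falls back to the current directory when run outside Azure Pipelines.
  repo_root="${BUILD_REPOSITORY_LOCALPATH:-$PWD}"
  sql_dir="$repo_root/$1"
  conf_dir="$repo_root/$2"
  # Clean the build database, then migrate it up to the latest version.
  docker run --rm \
    -v "$sql_dir:/flyway/sql" \
    -v "$conf_dir:/flyway/conf" \
    flyway/flyway clean migrate
}
```

Passing the two folders as arguments keeps the function independent of how the repository happens to be laid out.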
So now, whenever I add Flyway SQL migrations to my repo as part of a branch, I can create a PR, merge them back into Trunk and trigger an automatic build against my Flyway build DB in Azure SQL:
Getting up and running with Flyway is so very very easy, anyone can do it – it’s part of the beauty of the technology, but it turns out getting the build up and running too, when you’re not just embedding it directly within your application, is just as straightforward and it was a great learning curve for me!
The best part about this though – is that everything above can be achieved using pretty much any relational database management system you would like, either via the command line and a dedicated build server, or via the Docker container at build time. So get building!