Moving from Redgate SQL Source Control pipelines to Flyway Desktop with Redgate Deploy

“Like all magnificent things, it’s very simple.”
Natalie Babbitt

There has been a lot of change over the years in the Redgate solutions – I hasten to add this is a good thing. Back in my day it was SQL Source Control to store your database in Version Control; at the time it was probably a 50/50 split between people who used Git and people who used other systems like SVN, TFVC (TFS/VSTS) and Vault or Mercurial etc. and you could then use DLM Automation to build and deploy this state-based database project to Test, Prod and so on.

SQL Source Control and DLM Automation (later SQL Change Automation) have formed the basis for many a pipeline for many many years, and they have been reliable, in some cases life changing for those who have used them… but the times, they are-a changing!

These technologies are still a great option and are still present in Redgate Deploy for those whom they work for, however with the rise of still further distributed computing topologies, and the dominance of cloud-hosted architecture and PaaS databases in todays world – something new is needed.

Enter Flyway Desktop.

As you’ve seen in some of my previous posts, Flyway Desktop is really really easy to get up and running with, not only that but it combines the State and Migrations models together creating one repo with ALL the benefits, and none of the deciding which model is best for you. It was architected from the ground up to be 3 things:

  • Ingeniously simple: to set up, to use, to everything.
  • Cloud ready: designed for use with IaaS and PaaS database options
  • A combination of the best of the best: all of the benefits of previous Redgate solutions, few to none of the drawbacks

...but what if you’re already using Redgate?

Yes Flyway Desktop and Redgate Deploy in general are super easy to get up and running with for new databases, even difficult, monolithic databases (thank you Clone as shadow!), but what about projects you already have under source control? Like I mentioned, SQL Source Control has been around for years and is beloved by many, and SQL Change Automation is still in use by thousands too. We want to maintain the history of our changes for reference, and we don’t want to simply disregard the whole pipeline. So the big question is how do we upgrade our state-based pipeline? Let’s find out together!

Note: This post is for people who want to or are interested in moving to a newer solution (and to give them an idea of what to expect) and in no way reflects any level of urgency you should be feeling – I’m certainly not pushing you to move any of your pipelines now, especially if you’re happy with what you have!

Setup

For starters I set up an end to end SQL Source Control and SQL Change Automation pipeline in Azure DevOps – my understanding of the approach I’m going to take is that this should work wherever your pipeline is (TeamCity & Octopus Deploy, Bamboo, whatever) so don’t feel that this post is not for you just because I used Azure DevOps.

I set up a copy of the DMDatabase on my local SQL Developer Instance, and then created an Azure DevOps repo and cloned it down to my machine:

I linked my database to the repo, created a filter to filter out users and committed it to my repo – then I set up the YAML for the build, and the Release steps for SQL Change Automation:

My SQL Source Control Project in Azure DevOps (Git)
The YAML to build my SQL Source Control Project
Release Steps in Azure DevOps
Deployment Steps

Everything seems to be deploying ok, I’ve even set up an Azure SQL Database as the target for my database changes. Now we have this SQL Source Control -> SQL Change Automation pipeline running, lets investigate replacing it.

SQL Source Control

The first thing I did was to open Flyway Desktop and create a new project – I pointed the project at my Dev DB and at the same local repo that I host my SQL Source Control files in:

and without committing the state to my schema-model folder, only linking to the Dev database, we end up with our repo looking like this:

I’m going to delete the Redgate.ssc file, because we’re no longer in SQL Source Control and I’m going to move every other file to the schema-model folder that is now under my project name (DMDatabase) – full on Copy Paste style:

…and then hit refresh in the Schema Model tab of Flyway Desktop:

and… nothing should happen. Absolutely nothing, because the state of your project, the Schema-Model folder should now exactly match the state of your development database (assuming you had everything committed to SQL Source Control!) – so now we come across to the version control tab aaaand…

WAIT!

If we commit now it will break our CI build, because when we trigger with a new push, my YAML will be expecting $(Pipeline.Workspace)/s/Database as the input, but now we have a slightly altered project we want to build a slightly different path. I’m going to temporarily disable my CI trigger in the YAML pipeline:

and now I’m going to Pull (to get the YAML file in my local repo) and then commit and push my changes:

Now I’m going to change my build YAML file to $(Pipeline.Workspace)/s/Database/DMDatabase/schema-model then save and re-enable Continuous Integration:

et voila!

SQL Change Automation sees it as a regular state based repo and builds and deploys it with no issues whatsoever:

and just like that! SQL Source Control is replaced – our teams can now pull down the latest copy of the Repo with the Flyway Desktop project in and open it. All they will need to do is re-specify their Dev Database Connection. If you are only using SQL Source Control or you’re using SQL Source Control with the SQL Compare GUI for more manual deployments currently then you’re done! When you want to extend your pipeline, you can read below.

SQL Change Automation

This is the step where we have to fundamentally change the way the pipeline works. It’s easy to switch across from a SQL Source Control to Flyway Desktop, which means we get immediate upgrades in speed, reliability and stability in our development process, especially where we’re working with Cloud-hosted databases.

With Redgate Deploy though, we’re fundamentally leveraging the Flyway command line capability for smooth, incremental deployments, and this is always a migrations only deployment – to move across to using Flyway then we’re going to need to make a few alterations to how the pipeline works.

First-things-first: We need some migrations, more specifically: THE migration. When you create a Flyway Desktop project usually you create a Baseline script. This script is the state of your Production environment(s), or a copy of them, and is used to basically be the starting point for your incremental migration scripts in the pipeline. The Baseline, once generated, is run against an empty database referred to in Flyway Desktop as the Shadow Database, although this can of course be a Clone too. Not every developer necessarily needs this – only the ones who will be generating the deployable artifacts, the migrations themselves, and putting them into source control, but they are definitely needed for deployments.

Note: I have some clients I’m working with who want every developer to affect schema changes and then immediately generate the migration for this and share with the team, but equally I have others who want 10 or so developers to share the responsibility of schema changes, and then once they’ve reviewed at the end of a sprint, they generate the Migration for the changes, source control it and approve it.

So in Flyway Desktop we set up our erasable database, our Shadow DB:

I use an empty database I stood up quickly in the Azure Portal:

and on the Generate Migrations tab I’m now prompted to create a baseline script:

I’m going to create the Baseline from my “Prod” environment that I’ve been using for my SQL Source Control deployments and hit baseline:

When you save and finish this will now run the baseline against the Shadow DB to recreate everything – and this is going to give you a chance to detect any changes you still have outstanding in the schema model – Flyway Desktop will compare the environments and detect any outstanding Dev changes, allowing you to also produce a migration for them.

Note: If your plan is to use this process to capture any outstanding code in a V002 “Delta” script to bring all environments back into line, you absolutely can but I would advise you to make the script idempotent – if you add all the necessary IF EXISTS statements for the deployment, you should be ok and it will only create or alter the objects that have to be, in order to sync all the environments up.

First Pull any pending changes from your repo then commit and push this into your Git remote:

and it should look a little like this:

Now for second-things-second, the build. This is actually going to be a very simple step, perhaps the easiest to change. We’re already using YAML, and as you know from previous posts it’s really very easy to leverage the Flyway command line as part of your YAML pipeline, so I’m going to simply swap out the SQL Change Automation build YAML with an updated version of the Flyway YAML from that post:

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'
 
steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'
 
- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway clean -url=$(JDBC) -user=$(userName) -password=$(password)
  displayName: 'Clean build schema'
 
- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway migrate -url=$(JDBC) -user=$(userName) -password=$(password)
  displayName: 'Run flyway for build'

My password and username I shall hold back for the JDBC connection variable needs to be encapsulated in quotes, to prevent it being escaped or running partially because of the semi-colon:

jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build”

and the locations variable was my newly created migrations folder:

$(Pipeline.Workspace)/s/Database/DMDatabase/migrations

Fortunately these few changes mean that I now have a green build where I’m cleaning my Build DB and then building all of my files from there:

Deploying to Production is the only thing left. There’s a decision to be made here – because we’re just invoking the Flyway Docker Container, and we already have the YAML pipeline set up for the build we can:

  • As part of the build, zip up the migrations from the repo and publish them as an artifact, which we can then hand off to the Release portion of Azure DevOps, or indeed any other solution such as Octopus Deploy and run Flyway command line from there
  • OR we can simply expand out the YAML file – discard the “Release” pipeline and go FULL pipeline as code (which is also easier to audit changes on).

Given that we’re modernizing our deployment pipeline and introducing lean deployments of these incremental migration scripts, I’m opting for the latter, so I disable and archive my Release pipeline specifically and simply expand my YAML file with an additional step and an additional variable for the ProdJDBC instead of the Build DB:

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway migrate -url=$(ProdJDBC) -user=$(userName) -password=$(password) -baselineOnMigrate=true -baselineVersion=001.20211210091210
  displayName: 'Deploy to Prod'

and of course in that YAML not forgetting the all important –baselineOnMigrate and –baselineVersion switches (which I’ve always been forgetting) – these are important because we’ll be marking the baseline script as deployed against our target and not actually running the baseline script – we don’t want to try to recreate all of the objects that already exist there.

This is the result:

Successful deployment to Prod, successful move to Flyway Desktop

Pre- and Post- Deployment Scripts

You might leverage pre- and post-deployment scripts in your SQL Source Control pipeline, something that has to happen each time before or after a deployment – if you want to maintain these in your new repo moving forwards you’ll need to make use of the Flyway callback functionality; take your pre-deployment scripts and turn them into a beforeMigrate callback and turn your post-deployment into an afterMigrate callback. These can sit in your migrations folder but:

  1. You may not need these now – because you have access to the migrations first deployment model, most changes can now be tailor-made to your deployment needs, such as injecting DML. statements in with your DDL scripts
  2. They will also run every time against your Shadow DB when you generate a new migration – just something to be aware of.

Final Word

It was much much easier than I thought it would be to move across, but I by no means believe that this will be as easy for everyone who needs or wants to move in the medium-long term. I am always an advocate of testing things out prior to setting them up in earnest, and would encourage you to try this workflow out for yourself first, perhaps in tandem with your SQL Source Control pipeline against a dummy Prod DB temporarily to see how comfortable your team is with the process, and to give yourself the time to ask the questions you might have.

3 simple pipelines for database development with Redgate Deploy – Part 3: CircleCI

“There is no place to reach.. only places to rest to carry on.”
Jaya Bhateja

SPOILER ALERT – This is part 3 of a 3 part series on enabling database deployments using Redgate Deploy, so if you have not read at least the Setup and Principles section of my previous post (Part 1 which you can find here, and if you’re interested Part 2 here for GitHub Actions) then I would strongly advise you do so! Thanks!

In my setup post we managed to get 3 Flyway Desktop repositories set up: 1 for each CICD system we’ll be using, and a number of Azure SQL Databases to use as “Dev“, “Build“, “PROD” etc. – I have never used CircleCI before so this will be a new experience as I try to figure it out at the same time as set up a database deployment pipeline… but just to recap the principles of what we’re trying to achieve:

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases, however when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th Database – the “Build” database which will act as our CI validation step and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Builddatabase – this step will remove every object on the database leaving it empty
  • Invoking a Flyway Migrate against the “Builddatabase – this step will build the database from scratch to validate our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

CircleCI

Ok I made my way into CircleCI and it was really easy to get up and running with (the free tier that is) and OHMYGOSH will you look at this sleek beauty:

So far so good – CircleCI seems to be even easier to understand so far than GitLab (and CONSIDERABLY easier than GitHub Actions) – I’m sure there are a lot of major differences (and GitLab was really easy to use) but I’m hoping for a similar experience here by the looks of it!

I create a new project pipeline where it asks me to select a repo for this “project”:

So I hit “Set UP Project” and then “build my own yml script” – now you would think this might just give me a blank script but no, just like GitLab they give us the option of a starter pipe:

I’m going to go ahead and choose the “Hello World” pipeline because normally that’s the easiest to cannibalize!

Much like GitLab it has an indicator to let us know whether our YAML is valid or not (I’m looking at YOU Azure DevOps!!!!) which is a massive help, and in general it’s just pretty easy to see what each step is doing. I built out an example YAML file using similar commands to my GitLab pipeline like so:

version: 2.1
parameters:
  ciJDBC:
    type: string
    default: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
  prodJDBC:
    type: string
    default: jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_CircleCI
  userName:
    type: string
    default: username
  password:
    type: string
    default: password
  migrationPath:
    type: string
    default: .\

jobs:
  clean:
      docker:
        - image: flyway/flyway:latest-alpine
      steps:
        - checkout
        - run:
            name: "Clean Build Database"
            command: "flyway clean -url=${ciJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

  build:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Migrate to Build Database"
          command: "flyway migrate -url=${ciJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

  deploy:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Deploy to Prod"
          command: "flyway migrate -url=${prodJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build
      - deploy

and also used the variables reference from the CircleCI documentation which was pretty helpful. But it resulted in this:

Turns out I made a few boo-boos along the way. So variables I was passing in like this: ${Variable} but Circle only really seemed to like it when I used << pipeline.parameters.variable >> because I had defined it at the beginning of the YAML file under parameters.

I also had the jobs running in parallel because I hadn’t defined in my workflow which steps were dependent on which – a lesson I SHOULD really have remembered from GitHub… but oh well. I corrected that:

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build:
          requires: 
            - clean
      - deploy:
          requires: 
            - build

Interestingly everything was still failing and although everything was being passed through correctly, the only thing that was ACTUALLY making it to the Flyway Docker container was the first part of the JDBC connection:

Guess what? I had my quote marks in the wrong place.

destroy GIF

It’s ok though because 2 other things failed:

  1. The Prod deploy failed because it found a non-empty schema, a problem I seem to fall over EVERY SINGLE TIME, but which is easily remedied by providing the 2 switches to the Prod deployment: -baselineOnMigrate=true and -baselineVersion=[YourBaselineScriptVersion]
  2. The filepath specified wasn’t a valid path:

Yes, before anyone tells me I know my top level repo folder is still called “GitLab-Flyway“, I figured that out in the last post and I’m still face-palming. So I’m going to quickly alter the Prod Flyway migrate command and then play around with the filesystem locations first to see if I can find a value it likes…

Oh. It just needed a “.”… neat. Well here is the finished YAML that seems to work a treat:

version: 2.1
parameters:
  ciJDBC:
    type: string
    default: "jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build"
  prodJDBC:
    type: string
    default: "jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_CircleCI"
  userName:
    type: string
    default: "username"
  password:
    type: string
    default: "password"
  migrationPath:
    type: string
    default: "./GitLab-Flyway/migrations"

jobs:
  clean:
      docker:
        - image: flyway/flyway:latest-alpine
      steps:
        - checkout
        - run:
            name: "Clean Build Database"
            command: flyway clean -url="<< pipeline.parameters.ciJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>

  build:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Migrate to Build Database"
          command: flyway migrate -url="<< pipeline.parameters.ciJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>

  deploy:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Deploy to Prod"
          command: flyway migrate -url="<< pipeline.parameters.prodJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >> -baselineOnMigrate=true -baselineVersion=001.20211130101136

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build:
          requires: 
            - clean
      - deploy:
          requires: 
            - build

and we have ourselves one nice, lean CircleCI build and deployment pipeline:

Baseline Script successfully marked as Deployed, and 2nd migration successfully deployed as shown by Flyway_Schema_History table on DMDatabase_Prod_CircleCI

Conclusion

Was the purpose of these three blog posts for me to build 3 perfect pipelines, with impeccable secrets handling, automated testing, code analysis and all the best practices that mean they can all be rolled out into Production deployment pipelines tomorrow with no editing?

No. No way. Far from it.

But the purpose was to prove something else – that it can be done. This is the bare bones approach to enabling your database pipelines with Redgate Deploy and the Flyway Docker container in 3 different CICD systems; GitLab, GitHub and CircleCI and what we hoped to observe was that they can all in fact be used, with Redgate Deploy, to deploy schema changes to any of the supported RDBMS’.

That is indeed what we did. Happy Migrating!

Thank you to everyone who has stuck it out through all 3 parts, trust me, I did an awful lot of learning here myself and made COUNTLESS YAML mistakes – although I don’t class myself a Level 20 Warlock-slash-CICD-Pipeline-Guru it has been thoroughly interesting and I hope you managed to use the basis for these posts as success for your own pipelines! If you do – let me know, I love to hear from anyone who reads my posts!

3 simple pipelines for database development with Redgate Deploy – Part 2: GitHub Actions

“Finding new, relevant quotes for every blog post is hard”
Chris Unwin

SPOILER ALERT – This is part 2 of a 3 part series on enabling database deployments using Redgate Deploy, so if you have not read at least the Setup section of my previous post (Part 1 which you can find here) then I would strongly advise you do so! Thanks!

Welcome to another edition of “Chris stumbles around in the dark until eventually it starts working, but don’t question it because it might stop.” – today we’re taking on GitHub actions and unlike GitLab, I’ve never used Actions before. In my previous post we managed to get 3 repositories set up: 1 for each CICD system we’ll be using, and a number of Azure SQL Databases to use as “Dev“, “Build“, “PROD” etc. just to recap the principles of what we’re trying to achieve though:

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases, however when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th Database – the “Build” database which will act as our CI validation step and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Builddatabase – this step will remove every object on the database leaving it empty
  • Invoking a Flyway Migrate against the “Builddatabase – this step will build the database from scratch to validate our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

GitHub Actions

Like I said, this is the first time I’ve tried to use Actions all by myself and I’m not 100% sure where to start – I’m familiar with the GitHub interface OBVIOUSLY but at first glance I wouldn’t say it’s immediately obvious… well not to a n00b like me anyway! There’s one little “Actions button” in the center and if you miss it you’ll spend far too long not seeing the wood for the trees!

My GitHub repo with just the Flyway Desktop files in

Well anyway after some investigation it turns out that GitHub Actions is called that for a reason, because you build Actions into your YAML pipeline. Duh. Not sure how I missed that… but it kind of makes perfect sense and on top of that its quite similar in a way to Azure DevOps or Jenkins plugins really.

Unfortunately though there was slim pickins’ on the GitHub “Marketplace” for Flyway Actions, the closest one, created by joshuaavalon only migrates and was specifically put together for PostgreSQL deployments:

joshuaavalon’s Action usage example

So I did what any self-respecting person would do… I took the difficult route, forked Joshua’s repo and made the action generic, so that you could pass in whatever command you wanted to, plus changed it to use the latest Alpine version of the Flyway docker container. Plus I bumped the version number tag and fixed the docs that was using an old version of the Action, and by extension, Flyway container.

Forked repo in GitHub for the Flyway Action

2 things on this:

  1. I would assume that there will be an official Flyway Action at some point in the future, and there’s an argument to be made that using different Flyway commands should be different Actions in themselves — so as a result, I’m not going to publish this Action to the marketplace nor am I going to raise a PR for Joshua’s repo. But it was good to test out the theory.
  2. Thanks to Matt and Andrew here at RG for helping me out with a rather annoying problem involving invoking the correct version – this was solved by them and saved me banging my head against a brick wall for MANY MANY hours.

Anyway once I had the Action itself working (and it turns out it just needs to be sat in a public repo for you to invoke it, which is pretty neat – you don’t HAVE to publish if you don’t want/need to) it was time to move on to properly testing my YAML to build and deploy the contents of GitHub_Flyway. This was what I started off with:

name: CI

on:

  push:
    branches: [ trunk ]
  pull_request:
    branches: [ trunk ]

  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          command: clean

…and everything seemed wonderful:

Successful “build” e.g. Flyway Clean in GitHub Actions
Console output of successful Flyway Clean

But then I tried to extend my YAML to also include the Build and Deploy components for the pipeline…

Build failure in GitHub Actions

I have 2 problems:

  1. It turns out it’s not looking at the correct filesystem location (e.g. “.\SQL doesn’t exist”), which is obvious, because I’m a numpty and forgot to change this to instead point at my “migrations” location instead. So I’ll fix that now.
  2. I forgot to include the Baseline on Migrate and Baseline Migration Version arguments like I did with GitLab. Flyway Desktop generates the baseline, mine being V001_20211130101136__baseline.sql so I need it to baseline (but not run) V001_20211130101136 when Flyway goes to deploy. Otherwise Flyway will, like I experienced, fail the deployment because there are existing objects in the PROD database… obviously.

The solution to #1 is SUPER easy fortunately but the solution to #2 has me thinking. If i add -baselineOnMigrate=true to the action, then it will baseline my Build database too, meaning it won’t run the baseline and my build will fail; so I will need to pass an additional argument to my YAML somehow… *sigh* back to the Action.

I also by luck happened to notice that all my actions steps were running at the same time – but thanks to this blog post from Bryan Hogan I was able to find the neat little “uses” command, so that fixed that issue….

So the final version of the Action was:

name: Flyway Pipeline Action
description: Migrate database in Github Action using latest Flyway Docker Container
branding:
  icon: database
inputs:
  url:
    description: The jdbc url to use to connect to the database
    required: true
  user:
    description: The user to use to connect to the database
    required: false
  password:
    description: The password to use to connect to the database
    required: false
  locations:
    description: Comma-separated list of locations to scan recursively for migrations
    required: true
    default: filesystem:./sql
  command:
    description: Which of the Flyway commands you would like Flyway to run
    required: true
    default: migrate
  disableclean:
    description: Would you like clean disabled (true or false)
    required: false
    default: false
  baselineonmigrate:
    description: Would you like to baseline on migrate (true or false)
    required: false
    default: false
  baselineversion:
    description: What is your baseline version
    required: false
    default: 0.0
runs:
  using: docker
  image: docker://flyway/flyway:latest-alpine
  env:
    FLYWAY_URL: ${{ inputs.url }}
    FLYWAY_USER: ${{ inputs.user }}
    FLYWAY_PASSWORD: ${{ inputs.password }}
    FLYWAY_LOCATIONS: ${{ inputs.locations }}
    FLYWAY_CLEAN_DISABLED: ${{ inputs.disableclean }}
    FLYWAY_BASELINE_ON_MIGRATE: ${{ inputs.baselineonmigrate }}
    FLYWAY_BASELINE_VERSION: ${{ inputs.baselineversion }}
  args:
    - ${{ inputs.command }} 

I just re-invented the wheel (which I know you shouldn’t do) but it was super easy – I added the environment variables for Flyway to provide the -baselineOnMigrate and -baselineVersion switches much like I had to in the last post.

I never thought I’d be so happy to see those little green dots:

This resulted from the following pipeline YAML:

name: CI

on:

  push:
    branches: [ trunk ]
  pull_request:
    branches: [ trunk ]

  workflow_dispatch:

jobs:
  clean-build-schema:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          command: clean
          
  build:
    needs: clean-build-schema
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          locations: /GitLab-Flyway/migrations
          command: migrate
      
  deploy-to-prod:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@v4
        with:
          url: jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_GitHub
          user: username
          password: password
          locations: /GitLab-Flyway/migrations
          command: migrate
          disableclean: false
          baselineonmigrate: true
          baselineversion: 001.20211130101136

Can we take a minute to appreciate the fact that I have only just now realized that I accidentally named the top level folder GitLab-Flyway across all 3 of my repos? Well, this pipeline has nothing to do with GitLab, but it works and that’s what I care about!:

DMDatabase_Prod_GitHub successfully migrated to V002

Conclusion

Getting my head around how actions work as part of the pipeline was definitely one of the hardest aspects to this challenge – however, given that yet again we can simply include the Docker container for Flyway as part of the pipeline, we can very quickly and easily get up and running with Flyway for our Redgate Deploy pipeline, you just need the right YAML for the job.

3 simple pipelines for database development with Redgate Deploy – Part 1: Setup & GitLab

Society must adapt to diamonds, diamonds don’t adapt to society.
Abhijit Naskar

The world is changed… I feel it in the water… I feel it in the earth… smell it in the air. On a totally unrelated note did you know December 2021 marks the 20th Anniversary of the Lord of the Rings films? Just in case you were looking for your reminder to go and watch those masterpieces again, this is that sign!

Seriously though – gone are the days when I would demonstrate database pipelines on 1 or two different technologies. Over the last 6 years I have walked people through database deployments using an array of CICD options; Jenkins, TeamCity, Octopus Deploy, Bamboo… and most recently I’ve spent most of my time on Azure DevOps. At times it can even feel like Azure DevOps is the only solution you’ll need, but increasingly it’s becoming obvious that isn’t the case and there are new, shiny providers who offer some amazing experiences and awesome functionality.

Now seems like the best time to explore 3 of the ones I’m coming across more and more – CircleCI, GitLab and GitHub Actions.

The interesting part of this is that I genuinely believe that this will be incredibly easy. Maybe I’m naïve but from the looks of all 3 they seem straight-forward, understandable… and of course I’ll be using Flyway in my pipeline, which is the easiest, cross-platform friendly solution to use for this.

Note: I will assume you have some familiarity with Flyway in this post, if you don’t read more about the capabilities of Redgate Deploy here.

The Setup

For this “challenge” (if I can call it that) I’m going to be using Flyway Desktop installed on my Windows laptop, GitHub as my Version Control system and 5 Azure SQL Databases: 2 for “Dev” & “Dev_Shadow” (from which I will generate 3 independent repos) and 3 environments for PROD_GitLab, PROD_CircleCI and PROD_GitHub respectively. The structure of the database will be the DMDatabase, unsurprisingly the database I use for pretty much everything I do on this blog.

Note: Everything I’m doing today uses SQL Server (well… Azure SQL Database) however everything here is cross OS – you can set up similar pipelines for everything from Oracle to PostgreSQL to CockroachDB if you would like!

5 Databases ready to go – as shown in the Azure Portal

Fortunately CTRL+C, CTRL+V exists, so I’ll only have to setup once and then I’ll just copy the files across into the other two repos; I set up a new private Repo in my GitHub specifically for GitLab but you could easily repeat these steps below separately for GitHub Actions or CircleCI:

GitLab repo in GitHub

I clone this down onto my Windows machine using Git Bash and then linked to and created my Flyway Desktop project (don’t know how? Try this!):

Link the development database and the shadow, generate the Schema Model and the Baseline Migration from DMDatabase_PROD_GitLab (I just grab the relevant JDBC connectors from the Azure Portalthis makes it much easier!), don’t forget to specify the list of Schemas, I did and it ain’t pretty (but is an easy enough fix)

Then I commit and push the schema model files and the baseline migration up into GitHub:

For good measure I also changed the DM_CUSTOMER table on the Dev environment and generated a new schema-model and migration change so I know what is going to be deployed to my “Prod” environments as part of this test:

Then after committing and pushing to my repo, I copied all of the files over to my GitHub and CircleCI repositories too:

A quick check of my other repos and everything seems good to go!

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases, however when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th Database – the “Build” database which will act as our CI validation step and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Builddatabase – this step will remove every object on the database leaving it empty
  • Invoking a Flyway Migrate against the “Builddatabase – this step will build the database from scratch to validate our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

GitLab

After following the Setup instructions above, in GitLab I need to create a New Project and I want it to Build/Deploy from my GitHub repo, so I pick “Run CI/CD for external repository

Fortunately it’s very easy to connect directly from GitHub, but you will have to generate a Personal Authentication Token which you can do by going to https://github.com/settings/tokens and then authorizing the main repo you want to build from – for me this is GitLab_Flyway:

Painless! From here I select CI/CD template and because I’m starting from scratch I’m going to use the starter 3 stage template:

It has a rather neat layout and is pretty darn easy to get up and running with:

I may have tried several combinations to get the Flyway Docker container up and running but essentially the code I ended up running for my pipeline was:

stages:          # List of stages for jobs, and their order of execution
  - build
  - deploy

variables:
    userName: "MyUserName"
    password: "MyPassword"
    prodJDBC: "jdbc:[TheJDBCConnectionToTheProdDBYoureUsing]"
    ciJDBC: "jdbc:[TheJDBCConnectionToTheBuildDBYoureUsing]"
    migrationPath: $CI_PROJECT_DIR

build-job:       # This job runs in the build stage, which runs first.
  image:
    name: flyway/flyway:latest-alpine
    entrypoint: [""]
  stage: build
  script:
    - flyway clean -url="$ciJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath"
    - flyway migrate -url="$ciJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath"

deploy-job:      # This job runs in the deploy stage.
  image:
    name: flyway/flyway:latest-alpine
    entrypoint: [""]
  stage: deploy  # It only runs when *both* jobs in the test stage complete successfully.
  script:
    - flyway migrate -url="$prodJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath" -baselineOnMigrate=true -baselineVersion="MyBaselineVersion"

It was actually quite easy to spot where I had failed in previous runs and GitLab breaks things up quite nicely for us to see:

Some interesting things I noted using this setup:

  • Because we are deploying up stream to an environment that hasn’t been deployed to with Flyway before you have to pass in the -baselineOnMigrate switch, what was interesting though is that I also needed to specify the -baselineVersion, otherwise it tried to baseline V1, which of course did not exist as my baseline was named something completely different (V001_DateTimeStamp_blahlablah)
  • The entrypoint is specified as it is because it drops you right at the Flyway executable so you can issue the Flyway commands – without this it doesn’t work
  • You can ABSOLUTELY pass your variables in (like *cough* password and username) in a much more secure way through GitHub using variables, but this was a great start for me
  • To pass in the files using a hosted repo, I had to use the environment variable $CI_PROJECT_DIR and that’s where the repo is checked out to, where your migrations are

But it ultimately ended up in what I was expecting – the database was migrated using the Flyway command steps:

Conclusion

Is it possible to setup a nice easy pipeline from Dev -> Prod with Redgate Deploy and GitLab? Yes, absolutely it is, and you can build out the pipeline in whatever fashion you want. Thankfully, the Docker container makes things much much easier!

Now, let’s see how we get on with GitHub Actions!

PASS Data Community Summit 2021

“Education is the kindling of a flame, not the filling of a vessel.”
Socrates

I will be speaking at PASS Data Community Summit 2021

I have spoken at previous PASS Summits; both through the virtue of working for Redgate, and off my own back through dedication and passion to the subject matter I speak about: Data Privacy and Protection.

In 2018 I stood on stage with Microsoft to speak about the nature of Static Data Masking, how it differs from Dynamic Masking and what challenges need to be considered for a successful static masking rollout.

In 2019 I stood on stage alone to talk about creating a strategy for masking non-Production environments, including a walkthrough of the dbatools.io masking functionality utilized alongside Azure SQL Database classifications. PASS Summit 2019 was also when Kendra Little encouraged me to set up this blog, for which I’m forever grateful.

In 2020… well. You know what happened.

In 2021 Summit sees a new lease of life. Data Community Summit will be entirely online (no surprises there) but one big surprise you might not know is that it is completely free to attend. Never before will there have been SUCH a swathe of incredible speakers, with such a huge variety of topics and learning pathways for free and available on demand afterwards.

The dates for your diary? November 8-12, 2021

As it happens, I will also be speaking about setting up an end to end deployment pipeline using the Flyway Community Edition, Azure SQL Database and Azure DevOps it would be great to see you but with so much on offer I could absolutely understand if you watched on catch up!

You can see all the speakers here, but here’s a short list of some oft he sessions I will definitely be tuning in to!

  • Erin Stellato – Demystifying Statistics in SQL Server
  • Grant Fritchey – Identify Poorly Performing Queries – Three Tools You Already Own
  • Tracy Boggiano – Azure SQL Fundamentals
  • Angela Tidwell – Azure Devops Dashboards EZ as pie-charts!
  • Indira Bandari – Getting started with Python for Data professionals
  • Jess Pomfret – Azure SQL & Our Toolbox To Manage It
  • Taiob Ali – Think like the Cardinality Estimator
  • Neil Hambly – Azure Notebooks – Data Science fundamentals
  • and many more!

So please go check it out & register, support the community and do a bunch of learning in the process – it will be amazing to see you there and hopefully I’ll even get to see some of you in person in the not-so-far future!

Configuring Dynamic Data Masking in Azure SQL Database from SQL Data Catalog using PowerShell

Horror is the removal of masks.”
Robert Bloch

I spend a great deal of my time talking about Data Masking, don’t believe me? Checkout here, here, here and even here. I talk about it a LOT, but I’m always talking about Static Data Masking (SDM), which is the process of masking the data at the file level itself so it is irreversibly altered – this is fabulous for non-Production environments like Dev and Test, especially when you pair it with a good cloning technology.

But what about Staging / Production environments?

I often work with teams to implement SDM and one of the fastest routes to successfully generating your masking sets for cloned environments is, of course, SQL Data Catalog (or cataloging solution of your choice) – you’ve already put the effort in to classify your columns and figure out where sensitive information exists within your databases and instances… so doesn’t it make sense that we can just use THAT as a source of truth and generate masking from there?

Note: I actually produced an end-to-end video showing the process of Cataloging, Masking and Cloning in under an hour here: https://www.red-gate.com/hub/university/courses/sql-data-catalog/end-to-end-data-protection-with-sql-data-catalog-and-sql-provision – if you watch/try it let me know how you get on!

Funnily enough though we wouldn’t want to use SDM for Production (and potentially Staging) environments though – as it irreversibly changes the data, it’s just going to completely mess up all of our Prod data. To tackle this then, many people I work with turn to Dynamic Data Masking.

Dynamic Data Masking (DDM)

DDM is a method of masking the data based on your access rights to the data. As far as customers see they have access to their data through our site or application no issues, but if anyone else needs to query that data, or different people need to see different results when querying environments, DDM has been their way to go.

Whilst a lot of people like to pick up on some of the well known downsides of DDM, it’s not like you’re entrusting the entire security of an environment to it alone – there are a ton of measures we can put in place and DDM is just one; like an ex-colleague of mine (someone very wise whom I admired greatly and am still sad to this day I no longer get to work with them) used to say: “It’s about building a defensible position. The more you do the easier it is to prove you’re doing something and the more likely you are to BE protected.

So when a customer asked this week if it was possible to configure Dynamic Data Masking from SQL Data Catalog (because they’d seen the “Treatment Intent” category and the tag that clearly states “Dynamic Data Masking”), just like we’re able to configure Static Data Masking, well now that was a challenge I couldn’t turn down!

The SQL Data Catalog Taxonomy Page – Treatment Intent Category showing Dynamic Data Masking

DDM in Azure SQL DB

Configuring Dynamic Masking in Azure SQL DB is fairly straight forward through the Azure portal, you can go to your Azure SQL DB, click Dynamic Data Masking and it gives you the option to simply pick and save columns to apply Data Masking to, and to whom these rules apply / don’t apply:

DDM in the Azure Portal for the DMDatabase_Dev, with masks configured on customer_firstname and customer_email

However when we potentially have a lot of columns or DBs to configure masks for this is really going to get very old very fast. As with all things, I turned to PowerShell for the answer and fortunately I found it: https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview?#set-up-dynamic-data-masking-for-your-database-using-powershell-cmdlets – basically I can get existing DDM configurations and set new configurations for columns directly from my Azure SQL DB using PowerShell.

Now I’m not expert on DDM, and Redgate Data Masker for SQL Server is not a DDM solution (so I’ve only ever needed to know SDM really) I don’t pretend to be, but it seemed that I had everything I needed to tie Catalog into DDM.

PowerShell time!

I’ve written so much PowerShell to get classifications out of Data Catalog at this point it’s become second nature, but if you’re using the SDC PowerShell module and you need a reference you can view it here: https://documentation.red-gate.com/sql-data-catalog/automation-with-powershell but the standard “stuff” goes:

  • Pull down the PoSh module
  • Connect to catalog where it’s installed using an Auth token
  • Grab out the classifications with Get-ClassificationColumn
  • Shrink this down to just the columns we care about based on the tags

But the Az PowerShell cmdlets were honestly just as easy to use! I was surprised how easy it was to get up and running:

  • Connect to my Azure subscription
  • Get the current list of columns already with DDM masks
  • Remove these from the Catalog list
  • Update the remaining columns to use the default Mask

This was the full code I ended up using:

# This script is intended to be used with Azure SQL Database and Redgate SQL Data Catalog, however you are welcome to adapt and edit as required
# It will pull columns out of azure that are already being masked, and a list of columns that need to be masked with DDM
# It will then rationalise these, and configure Default DDM masks for any columns not already being masked on that Azure SQL DB

#Variables for Azure SQL DB & Catalog
$ResourceGroup = "DMDb"
$ServerName = "dmnonproduction" # Your instance minus .database.windows.net
$instance = "dmnonproduction.database.windows.net" # The instance or logical SQL Server as displayed in SQL Data Catalog
$DatabaseName = "DMDatabase_Dev"
$CatalogServer="http://pse-lt-chrisu:15156" # Your SQL Data Catalog location, leave off the trailing "/"
$authToken="REDACTED" # Your SQL Data Catalog Auth Token
$AzureSub = "Redacted" # Your Sub ID

# Get the SQL Data Catalog PowerShell Module & Connect
Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken 

#Connect to your Azure Subscription
Connect-AzAccount -Subscription $AzureSub

#Get current active DDM Masks from Azure
$DdmMasks = Get-AzSqlDatabaseDataMaskingRule `
    -ResourceGroupName $ResourceGroup `
    -ServerName $ServerName `
    -DatabaseName $DatabaseName
$ListOfDDMColumns = $DdmMasks | ForEach-Object {$_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName}

#Get columns from Catalog currently marked with "Dynamic Data Masking" as a treatment intent
$CatalogColumns = Get-ClassificationColumn `
    -InstanceName $instance `
    -DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq "Dynamic data masking"} 

#Filter down to a list of columns that need to be masked, that currently aren't configured with DDM
$ColumnsToDDM = $CatalogColumns | Where-Object {($_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName) -notin $ListOfDDMColumns }


#Set default DDM Masks for identified columns
$ColumnsToDDM | ForEach-Object { `
    New-AzSqlDatabaseDataMaskingRule -ResourceGroupName $ResourceGroup `
                                     -ServerName $ServerName `
                                     -DatabaseName $DatabaseName  `
                                     -SchemaName $_.schemaName `
                                     -TableName $_.tableName `
                                     -ColumnName $_.columnName `
                                     -MaskingFunction "Default"

}

But I have also uploaded it to my GitHub here in case anyone would like to take and adapt as they see fit: https://github.com/ChrisUnwin/PowerShell/blob/master/Demos/Redgate%20Demos/DDMFromCatalog.ps1

And this was the result – here were the two columns I had already being masked:

Customer Firstname and Customer Email with DDM Masks Configured

These were the columns I had marked as Dynamic Data Masking in Data Catalog:

Customer firstname, lastname, street addres and email all marked for DDM in Catalog

and after running the PowerShell it deduced that the delta was street_address and lastname and created the default DDM mask for them in Azure:

All columns now being masked dynamically

Considerations

1 – I have used the default mask in this process, however if you wanted to configure the mask (as per the link to the docs above) to be specific numbers or format you could absolutely do this, simply by modifying the PowerShell to look at the Data Type and then just passing into a different New-AzSqlDatabaseDataMaskingRule for each of those types.

2 – This only applies to Azure SQL DB and does not take into account the considerations when using DDM on say, a 2017 SQL Server Instance running on a VM – however you could use the same approach to pass the columns into some dynamic T-SQL which would in turn run the correct command to add DDM to that/those column(s)

3 – I would still use Static Data Masking (SDM) for non-Production environments, because if anyone bypasses the DDM they will have access to the full data, which we don’t really NEED in less secure non-Prod environments anyway, so Static might well be the way to go!

“But I don’t wanna INSTALL it!”: Data Masker on the fly in Azure DevOps (with an Azure SQL DB)

“There is always a way to go if you look for it.”
Ernest A. Fitzgerald

As many of you know, I really enjoy talking about Data Masking. I fundamentally believe it is an absolutely ESSENTIAL part of Test Data Management and specifically the provisioning of Pre-Production environments. If you hold sensitive PII/PHI/PCI in your Production environments, you have no excuses for porting any of that back into Dev and Test.

I also stand firmly behind the belief (and it is just that, my belief) that masking is much safer than anything you can achieve with encryption or limiting access alone. Static masking, when done correctly, means that even if all other security measures are bypassed, or we accidentally expose data somehow, it doesn’t matter because the PII/PHI has been wiped, but it is still fundamentally useful for development and testing environments.

I’m also a huge proponent of using classification and data masking as part of a DevOps process, and often you’ll find me using Azure DevOps to actually kick off my masking process – but one of the most frequent questions I get is “do I have to install Data Masker somewhere?” and the answer I always have to give is… yes.

When we use SQL Provision to spin up our environments in non-Prod, generally we know they have to be in IaaS VMs or On-Prem, due to the nature of the technology (keep your eyes open on THIS because big changes coming soon *squeals in excitement*) but sometimes we’re working entirely with PaaS DBs and VMs don’t even come into our vocabulary – but we still need them to be masked for non-Prod use.

I have some data in an Azure SQL DB – a copy of DMDatabase (get it from my GitHub):

DM_CUSTOMER table in the DMDatabase

and I would like to get this masked before I create copies of it. Now I have written scripts in the past to make sure that Azure SQL DBs can be masked and then copied into non-Prod environments and you can get those scripts here, and you could easily wrap something like the below INTO a script like that – but much of the work I’ve done on this always involves having Data Masker installed somewhere and invoked on that machine – a VM, my laptop, whatever.

So, how do we avoid having to install Data Masker each time?

Data Masker for SQL Server does not (at time of writing) have publicly available a docker container or method for installing using Choco or something like that – once we do, trust me, I’ll be blogging about it A LOT as I will be very excited. But there is an install file available: https://download.red-gate.com/installers/DataMaskerforSQLServer/ and this might be enough.

The Process

Initially I created a Masking Set locally for the DMDatabase copy in Azure, it was nice and simple just masking a few of the fields on the CUSTOMER table:

It relies on SQL auth for the connection and I’m remembering the credentials, though these could be subbed in later on using the PARFILE (documentation on that here).

Next I put this masking set into a newly created Azure DevOps Git repo, which I cloned down onto my local machine – and then committed and pushed my changes up into Azure DevOps:

Now that this was all in ADO, it was time to set up a pipeline for it – so let’s jump into some YAML! Now, Data Masker currently needs to run on a Windows machine so we’ll set the pool to Windows-Latest:

trigger:
- main

pool:
  vmImage: windows-latest

The next step is to grab the installer – which I know I can easily do with PowerShell. I’m sure you could be more clever about this, but with limited time I chose the most recent version and hard coded that in to a PowerShell Invoke-WebRequest cmdlet:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

This PowerShell task is going to grab the most recent exe for Data Masker and pull it down into C:\download on the hosted VM we’re using for the pipeline and it’s also going to create a directory for the Data Masker logs as well (if you wanted to extend the YAML at the end to wrap these logs up and publish them as a result of the pipeline masking, then go for it and tweet me to let me know!)

Next we have to extract and install Data Masker from that download, which is fairly easy to do with a cmd call:

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

Note I’m using the guidance from this page, making sure to accept the EULA and I’m passing in my Redgate License as a variable, that I have specified for the pipeline and kept secret. This will put Data Masker in the default location in C:\Program Files\… and means we will then be able to call it. I do however need to make sure that this now works. So I saved my pipeline and ran it to see what happened:

Fabulous, that all works nicely. Now to pass the DMSMaskSet file to Data Masker and get it to run – ah but I forgot, I’m going to need a PARFILE as per the cmdline documentation that specifies where the files etc. are for the run. So I create my PARFILE.txt as such:

MASKINGSET=C:\download\DMDB_MaskingTime.DMSMaskSet
LOGFILEDIR=C:\download\DMlogs
DATASETSDIR=C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataSets 
REPORTSDIR=C:\download\DMlogs
INTERIM_REPORTS=false

As you can see, very simple. But to make sure these files are “simply” in the right folder, and because I don’t have time to explore how I could pass an environment variable into the .txt file itself, I’m going to add a quick file copy task to make sure that my masking set and PARFILE both make it into that location:

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

Of course now that we have Data Masker installed on the pipeline VM, the masking set AND the PARFILE… let’s get masking!

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

I agree the command is now less than impressive given all the prep work, but when you run all of this in it’s entirety…

Big success! But let’s check the data of course and make sure it is as we expect:

It all works, Lynne has been masked to Muna (and the rest has been masked too!), and I didn’t need to have Data Masker installed on a VM in my environment, with an Azure DevOps self hosted agent to run it – I could just do it programmatically. #Winning.

I’ll put the full YAML below however some caveats:

  • You cannot run this pipeline a LOT in a short space of time, I found that out. The RedGate downloads page is not particularly designed for this process, so it should probably be run sparingly otherwise the pipeline times out because the server, by way of balancing, will prevent you from pulling the file too often.
  • You will need to update the PowerShell step reasonably often if you want the DMS most recent installer, or make it more futureproof to grab the latest version – I just didn’t investigate that.
  • Data Masker is dependent on the power of the machine it is running on in MANY ways and Azure DevOps pipeline VMs are not particularly the most powerful beasts in the world – so if you have a lot of masking that needs doing, I would be weary of this method and might stick to an Application Server VM you’ve got hanging around, just in case.
  • This is likely to change a lot in the future so may not be relevant when you’re reading it after a few months – so before implementing anything like this, if it’s been a few months, contact me or Redgate and just confirm that there isn’t some better way of doing this, if I haven’t already blogged it!

Thanks for stopping by and have a great week!

Full YAML:

trigger:
- main

pool:
  vmImage: windows-latest

steps:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

Automated Dev Database Branch-Switching with AzureSQL, PowerShell and GitHooks

“Keep it simple, stupid!”
– My year 12 & 13 English Lit. Teacher

Recently I’ve been fascinated with something really cool. A couple of my colleagues at Redgate wrote a GitHook which allows you to easily switch branches using Redgate SQL Clone. You can see the hook here with full instructions – and I thought it was pretty neat.

But it got me thinking – I’ve posted a lot about when people are using just Azure SQL DBs (PaaS), about Masking and DB Change Automation, but when you’re using AzureSQL for Dev and Test DBs as well as Prod, you still don’t have the same agility one would expect from a local copy, like a clone.

But, the above GitHook leverages PowerShell (among some other fancy wizardry) so, what if we could do this exact same thing, using the PowerShell Az module to dynamically create and switch Azure SQL DBs in our own private resource groups every time we checkout a branch?

my hero academia wtf GIF by Funimation

I don’t imagine it would be fast because I’m restricted to using very low tier SQL DBs by my tiny allowance of (as Kendra Little calls them) “Azure Bucks”, but it should absolutely be possible!

So I decided to write a PowerShell script to do just that.

The first question I had to come up with an answer to was, how do I replace the Clone “Image” in this process, because I need something that is effectively a copy of our Production (or as near as possible) environment so we have something to base EVERY copy from – so I created the idea of a Golden Copy within the script; effectively this golden copy could be created by copying masking and copying back down from Production using something like Redgate Data Masker and my scripts here in GitHub but as a stop-gap, if it doesn’t find one in the Resource Group and Server you select, it will simply create one from your Dev DB. Best answer I could think of, you’re welcome to improve it!

All that remained was effectively to go through and just mimic the functionality of the Clone script but using Az: so if you are switching to a new branch where you don’t already have an existing Dev DB, then you get a new copy of Golden. If you’re switching to a branch you’ve checked out before, it renames the DBs to swap you back to the correct branch.

Here is an empty AzureSQL DB called DMDatabase_Dev:

When I now issue the git checkout “feature/newfeature” command it asks me to sign in to my Azure account:

and then gets to work:

And… that was it really.

I now have an Azure SQL DB called DMDatabase_Dev_master as I switched from the master branch, and I have a branch new DMDatabase_Dev DB that I can use for my featurebranch. You’ll notice I didn’t include -b in my git command, let’s assume a colleague is already working on this branch. I can now just update my copy (of my golden copy) with their work:

And we’re good to go!

But now if I switch back to my main branch, the object is gone and I can carry on with work on this branch:

It was really straightforward I can’t believe I haven’t seen this in use in more places, but hey guess what? The PowerShell is yours right here if you want it:

https://github.com/ChrisUnwin/PowerShell/blob/master/Demos/Redgate%20Demos/GitHookAzureSQL.ps1

The pre-requisites for it are:

  • You should have a Dev DB and you should update the values at the top of the script with the Dev DB name, server and resource group it is in
  • The script make reference to and creates a Golden copy DB so that you have something you should always be creating from, for consistency – so when you get started, create your own “Golden copy” back from Test/UAT or something if you can – maybe using the script mentioned above – it should be the name of your Dev DB appended with “_Golden”
  • If you want to change how it is authenticated so you don’t have to enter your credentials each time, then go for it – this was just the simplest method for me (and it’s currently 11:05pm so I’m going to bed!)

Feel free to improve it, I’m sure there are plenty improvements that can be made, but it’s a starter for 10 for anyone out there just getting started with development in Azure SQL. Plus it’s kinda neat!

3 methods for seeding test data during CI builds with Flyway

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
Sir Arthur Conan Doyle, Sherlock Holmes

EDIT: Modified 23/12/2020 to include updates to Flyway in v7, method 4 below

Can you tell I’m loving Flyway at the moment? Well I am. It’s JUST SO GOOD! Honestly there are so many things you can do with it! Don’t know what I’m talking about? Check out my posts on xRDBMS DevOps with Flyway and tSQLt unit tests with Flyway and you’ll see what I mean!

As a result of the above posts though I was asked a question that I had to think about for a little bit before having the best possible answer, how can we seed some testing data INTO the build database so that we can run some meaningful tests against it?

This makes perfect sense to me, but there’s also a few different ways to do this – so let’s go fly(way)!

flying i believe i can fly GIF

1 – Test Data Migration Scripts

In my previous posts on Flyway (above) I talked about having an entirely separate build folder present within the repository, and a folder of test migrations alongside our schema migrations – I called these the Build_Config folder, (containing the build configuration file) and the Test_Migrations folder (unsurprisingly containing testing migrations) in the _Migrations location:

I was using the same build config for 2 purposes; 1) to build the schema migrations from the base version, by passing it the Schema_Migrations location dynamically and 2) then building the tSQLt framework and testing objects by passing it the Test_Migrations location dynamically.

This actually worked surprisingly well, but even beyond this – the same method can be repurposed, or added to, by augmenting your testing scripts and adding a data insertion task (as an additional script or group of scripts). In my folder, I can simply add a migration like this:

Because of course I like dogs.

lana del rey yes GIF

and once pushed to the repository and the build has run we should be able to verify our testing data is present:

A bonus win for this step of course, is that where Devs have their own Flyway config files locally for their development databases they could also overwrite this behavior and point the testing and/or data scripts at their own database so they have some seed data to work with too!

2 – Add a data generation step to the pipeline

There are SO MANY technologies out on the inter-webs for generating data. SO MANY. Many of them also have a command line or PowerShell module that we can use to easily invoke them against a target, especially if that target is going to be persistent like my Flyway Azure SQL Build DBs!

Because I have access to it and because I’m using essentially SQL Server DBs, I could easily use Redgate SQL Data Generator – but to get the data you need you could use anything from DBATools Data Generation (also SQL Server) to FillDB for MySQL (which looks awesome and you could easily use this for Step 1 above too!)

There are numerous ways to invoke tools and applications and fortunately good CI/CD tools like Azure DevOps offer multiple ways to, for instance, run PowerShell or CLI steps from within the pipeline – so we could easily invoke SQL Data Generator on a VM or physical machine we have an Azure DevOps agent on – but this thinking also opens up the possibility of using something like Chocolatey to dynamically install the software on the Azure DevOps hosted pool VM during build (for the Redgate tools at the moment I suppose you’d need a Windows VM).

sassy pants chocolate GIF

I will be writing a future blog post about this step because it sounds _very_ interesting, but I’m not sure yet what can be done specifically using Chocolatey or if I’ll have to look elsewhere, although I have read this post in the past (thanks Paul!) detailing limitations and a great workaround using Azure DevOps, so it’s likely that’ll be my first port of call!

Just to give you an idea of end result with SQL Data Generator specifically though:

3 – Use existing data, don’t generate

Ok this one is going to be controversial already, I can tell! Let’s all stay calm!

happy chill GIF

The best data to be tested is our data. What we have in Production is what will have these changes deployed to it… eventually! So shouldn’t we just test against that? Well. Maybe, maybe not depending on what is in there.

There’s a few methods to achieve this – my personal favorite would be to use a SQL Clone, spin that up on a build VM rather than using an Azure SQL DB, and we can have all the data in an instant. Of course if we hold any sensitive PII/PHI then we should ensure that is protected first!

Of course there are lots of other options, like restoring a backup or spinning up a container etc. and these can all just be a stage in the YAML file before invoking Flyway but the point is, if we use an existing copy of our Prod database from some source or another, it will have 2 things we really care about:

  1. Data. Ready to go, ready to test, ready to give us the best possible insight into our changes.
  2. The flyway_schema_history table. Instead of running EVERY migration we’ve ever written, which could take a while for a large team, we run only the latest migrations to check that they would deploy happily to the Production target.

To get this stage to work though, you would need to do a couple of things differently:

  1. The build DB would have to be created from the clone/backup/other every time instead of simply cleaning the schema down.
  2. You would need to remove the Flyway Clean step from the pipeline in my previous post, because it would otherwise drop all the tables (and then we wouldn’t have any data!)
  3. By extension, this also makes the callback to remove the tSQLt objects void, so you can remove that too.

4 (Bonus Method) – Script Migrations

In Flyway v7 the team added the ability to also run script Migrations and Callbacks which mean it is possible to invoke .ps1, .bat, .cmd, .sh, .bash, and .py files as part of the version control > build and migration process.

This means that you can include a script to invoke any loading or processing of data you may prefer – you could invoke a data generation utility, data masking and of course anything else that can be invoked with these file formats. A good example of this might be calling Data Generator as above, or you could use DBATools, DTM Data Generator or even a more platform agnostic approach by using a Hazy generator to produce and then load an incredibly realistic data set.

Conclusion

There are a lot of different ways to generate data, you can generate completely synthetic data, you can mask data or use Prod data, it’s up to you! Ultimately it will just for another part of your pipeline – just be careful of ordering! You don’t want to try generating data into a table that hasn’t been built yet.

Respect your YAML file and you’ll get schema, data and unit tests and this will lead to one thing. Greater insight, earlier.

thumbs up GIF

xRDBMS Database Continuous Integration with Flyway, Azure DevOps and Docker… the simple way.

“Some people try to make everything complicated, be the person who tries to make everything simple.”
Dave Waters

Simplicity is in my blood. That’s not to say I am ‘simple’ in the sense I cannot grasp more than the most basic concepts, but more that I am likely to grasp more complex problems and solutions when they are phrased in simple ways.

This stems from my love of teaching others (on the rare occasion it falls to me to do so), where I find the moment that everything just ‘clicks’ and the realization comes over them to be possibly one of the most satisfying moments one can enjoy in life.

shocked star trek GIF

Now recently I’ve been enjoying getting my head around Flyway – an open source JDBC based migrations tool that brings the power of schema versioning and deployments together with the agility that developers need to focus on innovation in Development. There’s something about Flyway that just… ‘clicks’.

It doesn’t really matter what relational database you’re using; MySQL, IBM DB2, even SAP HANA! You can achieve at least the core tenants of database DevOps with this neat and simple little command line tool – there’s not even an installer, you just have to unzip!

Now I’ve had a lot of fun working with Flyway so far and, thanks to a few people (Kendra, Julia – i’m looking at you both!) I have been able to wrap my head around it to, I would say, a fair standard. Caveat on that – being a pure SQL person please don’t ask me about Java based migrations, I’m not quite there yet!! But there is one thing that I kept asking myself:

“When I’m talking to colleagues and customers about Database DevOps, I’m always talking about the benefits of continuous integration; building the database from scratch to ensure that everything builds and validates…” etc. etc. so why haven’t I really come across this with Flyway yet?

think tom hanks GIF by The Late Show With Stephen Colbert

Probably for a few reasons. You can include Flyway as a plugin in your Maven and Gradle configurations, so people writing java projects already get that benefit. It can easily form part Flyway itself by virtue is simply small incremental scripts and developers can go backwards and forwards however and as many times as they like with the Flyway Migrate, Undo and Clean commands, so is there really a need for a build? And most importantly, Flyway’s API just allows you to build it in. So naturally you’re building WITH the application.

But naturally when you’re putting your code with other people’s code, things have to be tested and verified, and I like to do this in isolation too – especially for databases that are decoupled from the application, or if you have a number of micro-service style databases you’d want to test all in parallel etc. it’s a great way to shift left. So I started asking myself if there was some way I could implement a CI build using Flyway in Azure DevOps, like I would any of the other database tooling I use on a regular basis? Below you’ll find the product of my tinkering, and a whole heap of help from Julia and Kendra, without whom I would still be figuring out what Baseline does!

Option 1) The simplest option – cmdline

Flyway can be called via the command line and it doesn’t get more simple than that.

You can pass any number of arguments and switches to Flyways command line, including specifying what config files it’s going to be using – which means that all you have to do, is unzip the Flyway components on a dedicated build server (VM or on-prem) and then, after refreshing the migrations available, invoke the command line using Azure DevOps pipelines (or another CI tool) to run Flyway with the commands against a database on the build server (or somewhere accessible to the build server) and Bingo!

No Idea Build GIF by Rooster Teeth

And that’s all there is to it! You get to verify that all of the migrations up to the very latest in your VCS will run, and even if you don’t have the VERY base version as a baseline migration, you can still start with a copy of the database – you could even use a Clone for that!

But yes, this does require somewhere for Flyway to exist prior to us running with our migrations… wouldn’t it be even easier if we could do it without even having to unzip Flyway first?

Option 2) Also simple, but very cool! Flyway with Docker

Did you know that Flyway has it’s own docker image? No? Well it does!* Not only that but we can map our own version controlled Migration scripts and Config files to the container so that, if it can point at a database, you sure as heck know it’s going to migrate to it!

*Not sure what the heck all of this Docker/Container stuff is? You’re not alone! Check out this great video on all things containers from The Simple Engineer!

This was the method I tried, and it all started with putting a migration into Version Control. Much like I did for my post on using SQL Change Automation with Azure SQL DB – I set up a repo in Azure DevOps, cloned it down to my local machine and I added a folder for the migrations:

Into this I proceeded to add my base script for creating the DMDatabase (the database I use for EVERYTHING, for which you can find the scripts here):

Once I had included my migration I did the standard

Git add .
Git commit -m "Here is some code"
Git push

and I had a basis from which to work.

Next step then was making sure I had a database to work with. Now the beauty of Flyway means that it can easily support 20+ RDBMS’ so I was like a child at a candy store! I didn’t know what to pick!

For pure ease and again, simplicity, I went for good ol’ SQL Server – or to be precise, I created an Azure SQL Database (at the basic tier too so it’s only costing £3 per month!):

Now here’s where it gets customizable. You don’t NEED to actually even pass in a whole config file to this process. Because the Flyway container is going to spin up everything that would come with an install of Flyway, you can pass it switches to override the default behavior specified in the config file. You can adapt this either by hard-coding strings or by using Environment Variables alongside the native switches – this means you could pass in everything you might need securely through Azure Pipeline’s own methods.

I, on the other hand, was incredibly lazy and decided to use the same config file I use for my Dev environment, but I swapped out the JDBC connection to instead be my Build database:

I think saved this new conf file in my local repo under a folder named Build Configuration – in case I want to add any logic later on to include in the build (like the tSQLt framework and tests! Hint Hint!)

This means that I would only need to specify 2 things as variables, the location of my SQL migrations, and the config file. So the next challenge was getting the docker container up and running, which fortunately it’s very easy to do in Azure Pipelines, here was the entirety of the YAML to run Flyway in a container (and do nothing with it yet):

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run flyway/flyway -v
  displayName: 'Run Flyway'

So, on any changes to the main branch we’ll be spinning up a Linux VM, grabbing Docker and firing up the Flyway container. That’s it. Simple.

So now I just have to pass in my config file, which is already in my ‘build config’ folder, and my migrations which are in my VCS root. To do this it was a case of mapping where Azure DevOps stores the files from Git during the build to the containers own mount location in which it expects to find the relevant conf and sql files. Fortunately Flyway and Docker have some pretty snazzy and super clear documentation on this – so it was a case of using:

-v [my sql files in vcs]:/flyway/sql

as part of the run – though I had to ensure I also cleaned the build environment first, otherwise it would just be like deploying to a regular database, and we want to make sure we can build from the ground up every single time! This lead to me having the following environment variables:

As, rather helpfully, all of our files from Git are copied to the working directory during the build and we can use the environment variable $(Build.Repository.LocalPath) to grab them! This lead to me updating my YAML to actually do some Flyway running when we spin up the container!

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS):/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway clean -enterprise
  displayName: 'Clean build schema'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(FLYWAY_LOCATIONS):/flyway/sql -v $(FLYWAY_CONFIG_FILES):/flyway/conf flyway/flyway migrate -enterprise
  displayName: 'Run flyway for schema'

Effectively, this will spin up the VM in ADO, download and install Docker, fire up the Flyway container and then 1) clean the target schema (my Azure SQL DB in this case) and 2) then migrate all of the migrations scripts in the repo up to the latest version – and this all seemed to work great!*

*Note: I have an enterprise Flyway licenses which enables loads of great features and support, different version comparisons can be found described here.

So now, whenever I add Flyway SQL migrations to my repo as part of a branch, I can create a PR, merge them back into Trunk and trigger an automatic build against my Flyway build DB in Azure SQL:

Conclusion

Getting up and running with Flyway is so very very easy, anyone can do it – it’s part of the beauty of the technology, but it turns out getting the build up and running too, when you’re not just embedding it directly within your application, is just as straightforward and it was a great learning curve for me!

The best part about this though – is that everything above can be achieved using pretty much any relational database management system you would like, either via the command line and a dedicated build server, or via the Docker container at build time. So get building!

ready lets go GIF