Moving from Redgate SQL Source Control pipelines to Flyway Desktop with Redgate Deploy

“Like all magnificent things, it’s very simple.”
Natalie Babbitt

There has been a lot of change over the years in the Redgate solutions – I hasten to add this is a good thing. Back in my day it was SQL Source Control to store your database in version control; at the time it was probably a 50/50 split between people who used Git and people who used other systems like SVN, TFVC (TFS/VSTS), Vault or Mercurial, and you could then use DLM Automation to build and deploy this state-based database project to Test, Prod and so on.

SQL Source Control and DLM Automation (later SQL Change Automation) have formed the basis for many a pipeline for many, many years, and they have been reliable, in some cases life-changing, for those who have used them… but the times, they are a-changin’!

These technologies are still a great option and are still present in Redgate Deploy for those they work well for; however, with the rise of ever more distributed computing topologies and the dominance of cloud-hosted architecture and PaaS databases in today’s world, something new is needed.

Enter Flyway Desktop.

As you’ve seen in some of my previous posts, Flyway Desktop is really, really easy to get up and running with; not only that, but it combines the State and Migrations models together, creating one repo with ALL the benefits and none of the agonizing over which model is best for you. It was architected from the ground up to be 3 things:

  • Ingeniously simple: to set up, to use, to everything.
  • Cloud ready: designed for use with IaaS and PaaS database options
  • A combination of the best of the best: all of the benefits of previous Redgate solutions, few to none of the drawbacks

...but what if you’re already using Redgate?

Yes Flyway Desktop and Redgate Deploy in general are super easy to get up and running with for new databases, even difficult, monolithic databases (thank you Clone as shadow!), but what about projects you already have under source control? Like I mentioned, SQL Source Control has been around for years and is beloved by many, and SQL Change Automation is still in use by thousands too. We want to maintain the history of our changes for reference, and we don’t want to simply disregard the whole pipeline. So the big question is how do we upgrade our state-based pipeline? Let’s find out together!

Note: This post is for people who want to or are interested in moving to a newer solution (and to give them an idea of what to expect) and in no way reflects any level of urgency you should be feeling – I’m certainly not pushing you to move any of your pipelines now, especially if you’re happy with what you have!

Setup

For starters I set up an end-to-end SQL Source Control and SQL Change Automation pipeline in Azure DevOps – my understanding of the approach I’m going to take is that it should work wherever your pipeline is (TeamCity & Octopus Deploy, Bamboo, whatever), so don’t feel that this post is not for you just because I used Azure DevOps.

I set up a copy of the DMDatabase on my local SQL Developer Instance, and then created an Azure DevOps repo and cloned it down to my machine:

I linked my database to the repo, created a filter to filter out users and committed it to my repo – then I set up the YAML for the build, and the Release steps for SQL Change Automation:

My SQL Source Control Project in Azure DevOps (Git)
The YAML to build my SQL Source Control Project
Release Steps in Azure DevOps
Deployment Steps

Everything seems to be deploying OK; I’ve even set up an Azure SQL Database as the target for my database changes. Now that we have this SQL Source Control -> SQL Change Automation pipeline running, let’s investigate replacing it.

SQL Source Control

The first thing I did was to open Flyway Desktop and create a new project – I pointed the project at my Dev DB and at the same local repo that I host my SQL Source Control files in:

and without committing the state to my schema-model folder, only linking to the Dev database, we end up with our repo looking like this:

I’m going to delete the Redgate.ssc file, because we’re no longer in SQL Source Control, and I’m going to move every other file to the schema-model folder that is now under my project name (DMDatabase) – full-on copy-paste style:

…and then hit refresh in the Schema Model tab of Flyway Desktop:

and… nothing should happen. Absolutely nothing, because the state of your project (the schema-model folder) should now exactly match the state of your development database (assuming you had everything committed to SQL Source Control!) – so now we come across to the version control tab aaaand…

WAIT!

If we commit now it will break our CI build, because when we trigger with a new push, my YAML will be expecting $(Pipeline.Workspace)/s/Database as the input, but now that we have a slightly altered project we want to build from a slightly different path. I’m going to temporarily disable my CI trigger in the YAML pipeline:

and now I’m going to Pull (to get the YAML file in my local repo) and then commit and push my changes:

Now I’m going to change my build YAML file to $(Pipeline.Workspace)/s/Database/DMDatabase/schema-model then save and re-enable Continuous Integration:
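
For illustration, the relevant fragments of my updated build YAML looked something like this – a sketch only, and the variable name below is just a placeholder for whatever your SQL Change Automation build step actually takes as its project path:

# CI trigger re-enabled once the restructured repo has been pushed
trigger:
- main

variables:
  # was: $(Pipeline.Workspace)/s/Database
  # now points at the schema model folder created by Flyway Desktop
  sqlProjectPath: '$(Pipeline.Workspace)/s/Database/DMDatabase/schema-model'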

et voila!

SQL Change Automation sees it as a regular state based repo and builds and deploys it with no issues whatsoever:

and just like that, SQL Source Control is replaced – our teams can now pull down the latest copy of the repo with the Flyway Desktop project in and open it. All they will need to do is re-specify their Dev database connection. If you are only using SQL Source Control, or you’re using SQL Source Control with the SQL Compare GUI for more manual deployments, then you’re done! When you want to extend your pipeline, you can read on below.

SQL Change Automation

This is the step where we have to fundamentally change the way the pipeline works. It’s easy to switch across from SQL Source Control to Flyway Desktop, which means we get immediate upgrades in speed, reliability and stability in our development process, especially where we’re working with cloud-hosted databases.

With Redgate Deploy though, we’re fundamentally leveraging the Flyway command line capability for smooth, incremental deployments, and this is always a migrations-only deployment – so to move across to using Flyway we’re going to need to make a few alterations to how the pipeline works.

First things first: we need some migrations, more specifically THE migration. When you create a Flyway Desktop project you usually create a Baseline script. This script captures the state of your Production environment(s), or a copy of them, and acts as the starting point for the incremental migration scripts in your pipeline. The Baseline, once generated, is run against an empty database referred to in Flyway Desktop as the Shadow Database, although this can of course be a Clone too. Not every developer necessarily needs a Shadow – only the ones who will be generating the deployable artifacts (the migrations themselves) and putting them into source control – but the migrations are definitely needed for deployments.

Note: I have some clients I’m working with who want every developer to affect schema changes and then immediately generate the migration for this and share with the team, but equally I have others who want 10 or so developers to share the responsibility of schema changes, and then once they’ve reviewed at the end of a sprint, they generate the Migration for the changes, source control it and approve it.

So in Flyway Desktop we set up our erasable database, our Shadow DB:

I use an empty database I stood up quickly in the Azure Portal:

and on the Generate Migrations tab I’m now prompted to create a baseline script:

I’m going to create the Baseline from my “Prod” environment that I’ve been using for my SQL Source Control deployments and hit baseline:

When you save and finish, this will run the baseline against the Shadow DB to recreate everything – and this gives you a chance to detect any changes you still have outstanding in the schema model. Flyway Desktop will compare the environments and detect any outstanding Dev changes, allowing you to also produce a migration for them.

Note: If your plan is to use this process to capture any outstanding code in a V002 “Delta” script to bring all environments back into line, you absolutely can, but I would advise you to make the script idempotent – add the necessary existence checks (for example wrapping CREATEs in IF NOT EXISTS guards, or using CREATE OR ALTER for programmable objects) and it will only create or alter the objects that need to change in order to sync all the environments up.

First, pull any pending changes from your repo, then commit and push this into your Git remote:

and it should look a little like this:

Now for second things second: the build. This is actually going to be a very simple step, perhaps the easiest to change. We’re already using YAML, and as you know from previous posts it’s really very easy to leverage the Flyway command line as part of your YAML pipeline, so I’m going to simply swap out the SQL Change Automation build YAML for an updated version of the Flyway YAML from that post:

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'
 
steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'
 
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway clean -url=$(JDBC) -user=$(userName) -password=$(password)
  displayName: 'Clean build schema'
 
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway migrate -url=$(JDBC) -user=$(userName) -password=$(password)
  displayName: 'Run flyway for build'

My password and username I shall hold back, and the JDBC connection variable needs to be encapsulated in quotes, to prevent it being escaped or only partially executed because of the semi-colon:

"jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build"

and the locations variable was my newly created migrations folder:

$(Pipeline.Workspace)/s/Database/DMDatabase/migrations
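
Pulled together, and assuming you defined them in the YAML rather than in the pipeline UI, the variables would look roughly like this (a sketch only – userName and password should really stay as secret pipeline variables):

variables:
  # the quotes are part of the value so the semi-colon doesn't truncate the docker command
  JDBC: '"jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build"'
  locations: '$(Pipeline.Workspace)/s/Database/DMDatabase/migrations'
  # userName and password held back as secret pipeline variables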

Fortunately these few changes mean that I now have a green build, where I’m cleaning my Build DB and then building the database from all of my migration files:

Deploying to Production is the only thing left. There’s a decision to be made here – because we’re just invoking the Flyway Docker container, and we already have the YAML pipeline set up for the build, we can:

  • As part of the build, zip up the migrations from the repo and publish them as an artifact, which we can then hand off to the Release portion of Azure DevOps (or indeed any other solution, such as Octopus Deploy) and run the Flyway command line from there – a minimal sketch of this option follows just after this list
  • OR we can simply expand out the YAML file – discard the “Release” pipeline and go FULL pipeline-as-code (which is also easier to audit changes on).
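
For reference, a minimal sketch of that first option using standard Azure DevOps tasks might look something like this (the paths are assumptions based on my repo layout):

- task: ArchiveFiles@2
  inputs:
    rootFolderOrFile: '$(Pipeline.Workspace)/s/Database/DMDatabase/migrations'
    includeRootFolder: false
    archiveType: 'zip'
    archiveFile: '$(Build.ArtifactStagingDirectory)/migrations.zip'
  displayName: 'Zip migrations'

- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/migrations.zip'
    artifact: 'migrations'
  displayName: 'Publish migrations artifact'

The Release pipeline (or Octopus) would then just extract the artifact and run the same Flyway Docker command against it.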

Given that we’re modernizing our deployment pipeline and introducing lean deployments of these incremental migration scripts, I’m opting for the latter – so I disable and archive my Release pipeline and simply expand my YAML file with an additional step, plus an additional ProdJDBC variable in place of the Build DB one:

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway migrate -url=$(ProdJDBC) -user=$(userName) -password=$(password) -baselineOnMigrate=true -baselineVersion=001.20211210091210
  displayName: 'Deploy to Prod'

and of course in that YAML I mustn’t forget the all-important -baselineOnMigrate and -baselineVersion switches (which I have a habit of forgetting) – these are important because we’ll be marking the baseline script as deployed against our target rather than actually running it – we don’t want to try to recreate all of the objects that already exist there.

This is the result:

Successful deployment to Prod, successful move to Flyway Desktop

Pre- and Post- Deployment Scripts

You might leverage pre- and post-deployment scripts in your SQL Source Control pipeline – something that has to happen each time before or after a deployment. If you want to maintain these in your new repo moving forwards you’ll need to make use of the Flyway callback functionality: take your pre-deployment scripts and turn them into a beforeMigrate callback, and turn your post-deployment scripts into an afterMigrate callback (these are simply SQL files named after the event, e.g. beforeMigrate.sql). They can sit in your migrations folder, but:

  1. You may not need these now – because you have access to the migrations-first deployment model, most changes can now be tailor-made to your deployment needs, such as injecting DML statements in with your DDL scripts
  2. They will also run every time against your Shadow DB when you generate a new migration – just something to be aware of.

Final Word

It was much, much easier than I thought it would be to move across, but I by no means believe that this will be as easy for everyone who needs or wants to move in the medium to long term. I am always an advocate of testing things out prior to setting them up in earnest, and would encourage you to try this workflow out for yourself first, perhaps in tandem with your SQL Source Control pipeline against a dummy Prod DB temporarily, to see how comfortable your team is with the process, and to give yourself the time to ask the questions you might have.

3 simple pipelines for database development with Redgate Deploy – Part 3: CircleCI

“There is no place to reach.. only places to rest to carry on.”
Jaya Bhateja

SPOILER ALERT – This is part 3 of a 3 part series on enabling database deployments using Redgate Deploy, so if you have not read at least the Setup and Principles section of my previous post (Part 1 which you can find here, and if you’re interested Part 2 here for GitHub Actions) then I would strongly advise you do so! Thanks!

In my setup post we managed to get 3 Flyway Desktop repositories set up: 1 for each CICD system we’ll be using, and a number of Azure SQL Databases to use as “Dev“, “Build“, “PROD” etc. I have never used CircleCI before, so this will be a new experience as I try to figure it out at the same time as setting up a database deployment pipeline… but just to recap the principles of what we’re trying to achieve:

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases. However, when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th database – the “Build” database – which will act as our CI validation step, and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Build” database – this step will remove every object on the database, leaving it empty
  • Invoking a Flyway Migrate against the “Build” database – this step will build the database from scratch to validate that our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

CircleCI

Ok I made my way into CircleCI and it was really easy to get up and running with (the free tier that is) and OHMYGOSH will you look at this sleek beauty:

So far so good – CircleCI seems to be even easier to understand than GitLab (and CONSIDERABLY easier than GitHub Actions). I’m sure there are a lot of major differences (and GitLab was really easy to use) but I’m hoping for a similar experience here by the looks of it!

I create a new project pipeline where it asks me to select a repo for this “project”:

So I hit “Set Up Project” and then “build my own yml script” – now you would think this might just give me a blank script, but no: just like GitLab they give us the option of a starter pipeline:

I’m going to go ahead and choose the “Hello World” pipeline because normally that’s the easiest to cannibalize!

Much like GitLab it has an indicator to let us know whether our YAML is valid or not (I’m looking at YOU Azure DevOps!!!!) which is a massive help, and in general it’s just pretty easy to see what each step is doing. I built out an example YAML file using similar commands to my GitLab pipeline like so:

version: 2.1
parameters:
  ciJDBC:
    type: string
    default: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
  prodJDBC:
    type: string
    default: jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_CircleCI
  userName:
    type: string
    default: username
  password:
    type: string
    default: password
  migrationPath:
    type: string
    default: .\

jobs:
  clean:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Clean Build Database"
          command: "flyway clean -url=${ciJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

  build:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Migrate to Build Database"
          command: "flyway migrate -url=${ciJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

  deploy:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Deploy to Prod"
          command: "flyway migrate -url=${prodJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build
      - deploy

and also used the variables reference from the CircleCI documentation which was pretty helpful. But it resulted in this:

Turns out I made a few boo-boos along the way. I was passing variables in like this: ${Variable}, but CircleCI only really seemed to like it when I used << pipeline.parameters.variable >>, because I had defined them at the beginning of the YAML file under parameters.
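
As a small sketch of the fix for one of the steps (the others follow the same pattern):

- run:
    name: "Clean Build Database"
    # parameters declared under "parameters:" are referenced with CircleCI's own syntax
    command: flyway clean -url=<< pipeline.parameters.ciJDBC >> -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>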

I also had the jobs running in parallel because I hadn’t defined in my workflow which jobs were dependent on which – a lesson I SHOULD really have remembered from GitHub… but oh well. I corrected that:

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build:
          requires: 
            - clean
      - deploy:
          requires: 
            - build

Interestingly everything was still failing, and although the values were being passed through correctly, the only thing that was ACTUALLY making it to the Flyway Docker container was the first part of the JDBC connection string:

Guess what? I had my quote marks in the wrong place.


It’s ok though because 2 other things failed:

  1. The Prod deploy failed because it found a non-empty schema, a problem I seem to fall over EVERY SINGLE TIME, but which is easily remedied by providing the 2 switches to the Prod deployment: -baselineOnMigrate=true and -baselineVersion=[YourBaselineScriptVersion]
  2. The filepath specified wasn’t a valid path:

Yes, before anyone tells me I know my top level repo folder is still called “GitLab-Flyway“, I figured that out in the last post and I’m still face-palming. So I’m going to quickly alter the Prod Flyway migrate command and then play around with the filesystem locations first to see if I can find a value it likes…

Oh. It just needed a “.”… neat. Well here is the finished YAML that seems to work a treat:

version: 2.1
parameters:
  ciJDBC:
    type: string
    default: "jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build"
  prodJDBC:
    type: string
    default: "jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_CircleCI"
  userName:
    type: string
    default: "username"
  password:
    type: string
    default: "password"
  migrationPath:
    type: string
    default: "./GitLab-Flyway/migrations"

jobs:
  clean:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Clean Build Database"
          command: flyway clean -url="<< pipeline.parameters.ciJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>

  build:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Migrate to Build Database"
          command: flyway migrate -url="<< pipeline.parameters.ciJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>

  deploy:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Deploy to Prod"
          command: flyway migrate -url="<< pipeline.parameters.prodJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >> -baselineOnMigrate=true -baselineVersion=001.20211130101136

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build:
          requires: 
            - clean
      - deploy:
          requires: 
            - build

and we have ourselves one nice, lean CircleCI build and deployment pipeline:

Baseline Script successfully marked as Deployed, and 2nd migration successfully deployed as shown by Flyway_Schema_History table on DMDatabase_Prod_CircleCI

Conclusion

Was the purpose of these three blog posts for me to build 3 perfect pipelines, with impeccable secrets handling, automated testing, code analysis and all the best practices that mean they can all be rolled out into Production deployment pipelines tomorrow with no editing?

No. No way. Far from it.

But the purpose was to prove something else – that it can be done. This is the bare-bones approach to enabling your database pipelines with Redgate Deploy and the Flyway Docker container in 3 different CICD systems – GitLab, GitHub and CircleCI – and what we hoped to observe was that they can all in fact be used, with Redgate Deploy, to deploy schema changes to any of the supported RDBMSs.

That is indeed what we did. Happy Migrating!

Thank you to everyone who has stuck it out through all 3 parts – trust me, I did an awful lot of learning here myself and made COUNTLESS YAML mistakes. Although I don’t class myself as a Level 20 Warlock-slash-CICD-Pipeline-Guru, it has been thoroughly interesting and I hope you manage to use these posts as the basis for success with your own pipelines! If you do – let me know, I love to hear from anyone who reads my posts!

3 simple pipelines for database development with Redgate Deploy – Part 2: GitHub Actions

“Finding new, relevant quotes for every blog post is hard”
Chris Unwin

SPOILER ALERT – This is part 2 of a 3 part series on enabling database deployments using Redgate Deploy, so if you have not read at least the Setup section of my previous post (Part 1 which you can find here) then I would strongly advise you do so! Thanks!

Welcome to another edition of “Chris stumbles around in the dark until eventually it starts working, but don’t question it because it might stop.” Today we’re taking on GitHub Actions and, unlike GitLab, I’ve never used Actions before. In my previous post we managed to get 3 repositories set up: 1 for each CICD system we’ll be using, and a number of Azure SQL Databases to use as “Dev“, “Build“, “PROD” etc. Just to recap the principles of what we’re trying to achieve though:

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases. However, when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th database – the “Build” database – which will act as our CI validation step, and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Build” database – this step will remove every object on the database, leaving it empty
  • Invoking a Flyway Migrate against the “Build” database – this step will build the database from scratch to validate that our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

GitHub Actions

Like I said, this is the first time I’ve tried to use Actions all by myself and I’m not 100% sure where to start – I’m familiar with the GitHub interface OBVIOUSLY but at first glance I wouldn’t say it’s immediately obvious… well not to a n00b like me anyway! There’s one little “Actions button” in the center and if you miss it you’ll spend far too long not seeing the wood for the trees!

My GitHub repo with just the Flyway Desktop files in

Well anyway, after some investigation it turns out that GitHub Actions is called that for a reason: because you build Actions into your YAML pipeline. Duh. Not sure how I missed that… but it kind of makes perfect sense, and on top of that it’s quite similar in a way to Azure DevOps or Jenkins plugins really.

Unfortunately though there were slim pickins’ on the GitHub “Marketplace” for Flyway Actions; the closest one, created by joshuaavalon, only migrates and was specifically put together for PostgreSQL deployments:

joshuaavalon’s Action usage example

So I did what any self-respecting person would do… I took the difficult route, forked Joshua’s repo and made the action generic, so that you could pass in whatever command you wanted to, plus changed it to use the latest Alpine version of the Flyway Docker container. I also bumped the version number tag and fixed the docs, which were using an old version of the Action and, by extension, of the Flyway container.

Forked repo in GitHub for the Flyway Action

2 things on this:

  1. I would assume that there will be an official Flyway Action at some point in the future, and there’s an argument to be made that using different Flyway commands should be different Actions in themselves — so as a result, I’m not going to publish this Action to the marketplace nor am I going to raise a PR for Joshua’s repo. But it was good to test out the theory.
  2. Thanks to Matt and Andrew here at RG for helping me out with a rather annoying problem involving invoking the correct version – this was solved by them and saved me banging my head against a brick wall for MANY MANY hours.

Anyway once I had the Action itself working (and it turns out it just needs to be sat in a public repo for you to invoke it, which is pretty neat – you don’t HAVE to publish if you don’t want/need to) it was time to move on to properly testing my YAML to build and deploy the contents of GitHub_Flyway. This was what I started off with:

name: CI

on:

  push:
    branches: [ trunk ]
  pull_request:
    branches: [ trunk ]

  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          command: clean

…and everything seemed wonderful:

Successful “build” e.g. Flyway Clean in GitHub Actions
Console output of successful Flyway Clean

But then I tried to extend my YAML to also include the Build and Deploy components for the pipeline…

Build failure in GitHub Actions

I have 2 problems:

  1. It turns out it’s not looking at the correct filesystem location (e.g. “.\SQL doesn’t exist”), which is obvious, because I’m a numpty and forgot to change this to point at my “migrations” location instead. So I’ll fix that now.
  2. I forgot to include the Baseline on Migrate and Baseline Migration Version arguments like I did with GitLab. Flyway Desktop generates the baseline, mine being V001_20211130101136__baseline.sql so I need it to baseline (but not run) V001_20211130101136 when Flyway goes to deploy. Otherwise Flyway will, like I experienced, fail the deployment because there are existing objects in the PROD database… obviously.

The solution to #1 is SUPER easy fortunately, but the solution to #2 has me thinking. If I add -baselineOnMigrate=true to the action, then it will baseline my Build database too, meaning it won’t run the baseline and my build will fail; so I will need to pass an additional argument to my YAML somehow… *sigh* back to the Action.

I also by luck happened to notice that all my jobs were running at the same time – but thanks to this blog post from Bryan Hogan I was able to find the neat little “needs” keyword, so that fixed that issue…
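
As a minimal sketch of what that looks like (the full workflow is further down – the real steps invoke the Flyway action rather than just a checkout):

jobs:
  clean-build-schema:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
  build:
    needs: clean-build-schema   # wait for the clean job instead of running in parallel
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2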

So the final version of the Action was:

name: Flyway Pipeline Action
description: Migrate database in Github Action using latest Flyway Docker Container
branding:
  icon: database
inputs:
  url:
    description: The jdbc url to use to connect to the database
    required: true
  user:
    description: The user to use to connect to the database
    required: false
  password:
    description: The password to use to connect to the database
    required: false
  locations:
    description: Comma-separated list of locations to scan recursively for migrations
    required: true
    default: filesystem:./sql
  command:
    description: Which of the Flyway commands you would like Flyway to run
    required: true
    default: migrate
  disableclean:
    description: Would you like clean disabled (true or false)
    required: false
    default: false
  baselineonmigrate:
    description: Would you like to baseline on migrate (true or false)
    required: false
    default: false
  baselineversion:
    description: What is your baseline version
    required: false
    default: 0.0
runs:
  using: docker
  image: docker://flyway/flyway:latest-alpine
  env:
    FLYWAY_URL: ${{ inputs.url }}
    FLYWAY_USER: ${{ inputs.user }}
    FLYWAY_PASSWORD: ${{ inputs.password }}
    FLYWAY_LOCATIONS: ${{ inputs.locations }}
    FLYWAY_CLEAN_DISABLED: ${{ inputs.disableclean }}
    FLYWAY_BASELINE_ON_MIGRATE: ${{ inputs.baselineonmigrate }}
    FLYWAY_BASELINE_VERSION: ${{ inputs.baselineversion }}
  args:
    - ${{ inputs.command }} 

I just re-invented the wheel (which I know you shouldn’t do) but it was super easy – I added the environment variables for Flyway to provide the -baselineOnMigrate and -baselineVersion switches much like I had to in the last post.

I never thought I’d be so happy to see those little green dots:

This resulted from the following pipeline YAML:

name: CI

on:

  push:
    branches: [ trunk ]
  pull_request:
    branches: [ trunk ]

  workflow_dispatch:

jobs:
  clean-build-schema:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          command: clean
          
  build:
    needs: clean-build-schema
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          locations: /GitLab-Flyway/migrations
          command: migrate
      
  deploy-to-prod:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@v4
        with:
          url: jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_GitHub
          user: username
          password: password
          locations: /GitLab-Flyway/migrations
          command: migrate
          disableclean: false
          baselineonmigrate: true
          baselineversion: 001.20211130101136

Can we take a minute to appreciate the fact that I have only just now realized that I accidentally named the top level folder GitLab-Flyway across all 3 of my repos? Well, this pipeline has nothing to do with GitLab, but it works and that’s what I care about!:

DMDatabase_Prod_GitHub successfully migrated to V002

Conclusion

Getting my head around how Actions work as part of the pipeline was definitely one of the hardest aspects of this challenge – however, given that yet again we can simply include the Docker container for Flyway as part of the pipeline, we can very quickly and easily get up and running with Flyway for our Redgate Deploy pipeline; you just need the right YAML for the job.

3 simple pipelines for database development with Redgate Deploy – Part 1: Setup & GitLab

“Society must adapt to diamonds, diamonds don’t adapt to society.”
Abhijit Naskar

The world is changed… I feel it in the water… I feel it in the earth… smell it in the air. On a totally unrelated note did you know December 2021 marks the 20th Anniversary of the Lord of the Rings films? Just in case you were looking for your reminder to go and watch those masterpieces again, this is that sign!

Seriously though – gone are the days when I would demonstrate database pipelines on one or two different technologies. Over the last 6 years I have walked people through database deployments using an array of CICD options: Jenkins, TeamCity, Octopus Deploy, Bamboo… and most recently I’ve spent most of my time on Azure DevOps. At times it can even feel like Azure DevOps is the only solution you’ll need, but increasingly it’s becoming obvious that isn’t the case, and there are new, shiny providers who offer some amazing experiences and awesome functionality.

Now seems like the best time to explore 3 of the ones I’m coming across more and more – CircleCI, GitLab and GitHub Actions.

The interesting part of this is that I genuinely believe that this will be incredibly easy. Maybe I’m naïve but from the looks of all 3 they seem straight-forward, understandable… and of course I’ll be using Flyway in my pipeline, which is the easiest, cross-platform friendly solution to use for this.

Note: I will assume you have some familiarity with Flyway in this post; if you don’t, read more about the capabilities of Redgate Deploy here.

The Setup

For this “challenge” (if I can call it that) I’m going to be using Flyway Desktop installed on my Windows laptop, GitHub as my Version Control system and 5 Azure SQL Databases: 2 for “Dev” & “Dev_Shadow” (from which I will generate 3 independent repos) and 3 environments for PROD_GitLab, PROD_CircleCI and PROD_GitHub respectively. The structure of the database will be the DMDatabase, unsurprisingly the database I use for pretty much everything I do on this blog.

Note: Everything I’m doing today uses SQL Server (well… Azure SQL Database) however everything here is cross OS – you can set up similar pipelines for everything from Oracle to PostgreSQL to CockroachDB if you would like!

5 Databases ready to go – as shown in the Azure Portal

Fortunately CTRL+C, CTRL+V exists, so I’ll only have to set up once and then I’ll just copy the files across into the other two repos; I set up a new private repo in my GitHub specifically for GitLab, but you could easily repeat the steps below separately for GitHub Actions or CircleCI:

GitLab repo in GitHub

I cloned this down onto my Windows machine using Git Bash and then created my Flyway Desktop project, linking it to the repo (don’t know how? Try this!):

Link the development database and the shadow, generate the Schema Model and the Baseline Migration from DMDatabase_PROD_GitLab (I just grab the relevant JDBC connectors from the Azure Portal – this makes it much easier!), and don’t forget to specify the list of Schemas – I forgot, and it ain’t pretty (but it is an easy enough fix)

Then I commit and push the schema model files and the baseline migration up into GitHub:

For good measure I also changed the DM_CUSTOMER table on the Dev environment and generated a new schema-model and migration change so I know what is going to be deployed to my “Prod” environments as part of this test:

Then after committing and pushing to my repo, I copied all of the files over to my GitHub Actions and CircleCI repositories too:

A quick check of my other repos and everything seems good to go!

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases. However, when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th database – the “Build” database – which will act as our CI validation step, and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Build” database – this step will remove every object on the database, leaving it empty
  • Invoking a Flyway Migrate against the “Build” database – this step will build the database from scratch to validate that our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

GitLab

After following the Setup instructions above, in GitLab I need to create a New Project and I want it to Build/Deploy from my GitHub repo, so I pick “Run CI/CD for external repository”:

Fortunately it’s very easy to connect directly from GitHub, but you will have to generate a Personal Access Token, which you can do by going to https://github.com/settings/tokens, and then authorize the main repo you want to build from – for me this is GitLab_Flyway:

Painless! From here I select a CI/CD template and, because I’m starting from scratch, I’m going to use the starter 3-stage template:

It has a rather neat layout and is pretty darn easy to get up and running with:

I may have tried several combinations to get the Flyway Docker container up and running but essentially the code I ended up running for my pipeline was:

stages:          # List of stages for jobs, and their order of execution
  - build
  - deploy

variables:
    userName: "MyUserName"
    password: "MyPassword"
    prodJDBC: "jdbc:[TheJDBCConnectionToTheProdDBYoureUsing]"
    ciJDBC: "jdbc:[TheJDBCConnectionToTheBuildDBYoureUsing]"
    migrationPath: $CI_PROJECT_DIR

build-job:       # This job runs in the build stage, which runs first.
  image:
    name: flyway/flyway:latest-alpine
    entrypoint: [""]
  stage: build
  script:
    - flyway clean -url="$ciJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath"
    - flyway migrate -url="$ciJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath"

deploy-job:      # This job runs in the deploy stage.
  image:
    name: flyway/flyway:latest-alpine
    entrypoint: [""]
  stage: deploy  # It only runs once the build stage completes successfully.
  script:
    - flyway migrate -url="$prodJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath" -baselineOnMigrate=true -baselineVersion="MyBaselineVersion"

It was actually quite easy to spot where I had failed in previous runs and GitLab breaks things up quite nicely for us to see:

Some interesting things I noted using this setup:

  • Because we are deploying upstream to an environment that hasn’t been deployed to with Flyway before, you have to pass in the -baselineOnMigrate switch. What was interesting though is that I also needed to specify the -baselineVersion, otherwise it tried to baseline V1, which of course did not exist as my baseline was named something completely different (V001_DateTimeStamp_blahlablah)
  • The entrypoint is specified as it is because it drops you right at the Flyway executable so you can issue the Flyway commands – without this it doesn’t work
  • You can ABSOLUTELY pass your variables in (like *cough* password and username) in a much more secure way through GitLab using CI/CD variables, but this was a great start for me – a quick sketch of that idea follows just after this list
  • To pass in the files using a hosted repo, I had to use the environment variable $CI_PROJECT_DIR and that’s where the repo is checked out to, where your migrations are
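
As a quick sketch of that idea – with userName, password and the JDBC URLs moved into GitLab’s CI/CD variables (Settings > CI/CD > Variables, ideally masked), the variables block shrinks to just the non-secret value and the script section stays exactly the same:

variables:
    # userName, password, prodJDBC and ciJDBC now come from GitLab CI/CD variables
    migrationPath: $CI_PROJECT_DIR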

But it ultimately ended up in what I was expecting – the database was migrated using the Flyway command steps:

Conclusion

Is it possible to set up a nice easy pipeline from Dev -> Prod with Redgate Deploy and GitLab? Yes, absolutely it is, and you can build out the pipeline in whatever fashion you want. Thankfully, the Docker container makes things much, much easier!

Now, let’s see how we get on with GitHub Actions!

SQL Data Catalog, Data Masker and your DevOps pipeline: How do I make sure everything is being masked?

“However fast regulation moves, technology moves faster. Especially as far as data is concerned.”
Elizabeth Denham

You’re probably sick of me constantly talking about how the cataloging of columns should be part of the DevOps upstream deployment process. I’ve blogged about it. I’ve even produced a video demonstrating this in action. But one question that this always throws up is:

If we include cataloging in the upstream process, how do we make sure our masking sets are also staying up to date?

The benefit of including the classifications in the upstream pipeline is that nothing ever gets to Production that hasn’t been classified – we constantly have a perfectly up-to-date idea of the nature of our structured data estate: how data is distributed, what risk is associated with which systems, and so on. But one of the biggest wins is that we always know which fields need to be masked when we’re pulling copies back into non-Production.

Add a table? Add columns? We know about them, they’re classified, they’re deployed… so now they need to be masked on our next refresh. But how? Well it all depends on which approach we want to take:

  • Automated
  • Manual

Doesn’t it always boil down to those options? What I mean is that either we have an appetite to regenerate our masking set afresh every single time based on our classifications (Automated), or we want to ensure that we configure each rule ourselves (Manual). They both have benefits and drawbacks.

Automated

How: We can generate a masking set using the SQL Data Catalog PowerShell each and every time as part of our pipeline. Add a column, tag it in the pipeline and then simply wait for the necessary rulesets to be generated in the pipeline (perhaps as part of your build) for you to run when you bring a copy back. Check out my walkthrough for how to set this up.

Benefits: The process is automatic. It’s headless and you don’t need to think about it at all. As long as the classifications are provided (and if you follow the steps from the blog post and video you should be providing them) then you’re always generating rules for every classified column.

Drawbacks: This process can be fragile. If we don’t classify correctly we can end up masking in the wrong way or trying to mask the wrong field (e.g. a Primary Key, Constraint, Identity etc.) which can cause masking failures and then you have to spend time fixing the pipeline/masking set. This also means that the nature of the masking is dependent on your classifications, and the values you will get masked into the columns will be less realistic as a result (i.e. you can’t generate Row-Internal Sync Rules using the integration).

Drawback Mitigation: To avoid the process breaking, be sure to really focus on how you set up your API settings / how you pick which categories and tags are used to generate the masking rules (like I discussed here). This will at least help you make sure you map common data sets into columns (and don’t hit columns that have constraints or keys).

Only columns marked with Static Masking as the Treatment Intent will get a rule created for them
An Information Type is given to every column we intend on masking – these are then mapped to templates in masker to ensure more realistic data

Manual

How: Either rely on developers to check in masking set changes alongside their code changes, or build a manual intervention step into your upstream process to ensure that someone opens and updates the schema and rules within the masking set, and then puts this back into the pipeline.

Benefits: This results in more accurate, more likely to run sets that generate significantly more reliable and realistic data as an end result. Masked DB copies can be more easily used for anything beyond simple development changes, including analytics etc. You can include any specialist rules you need and apply your own understanding and knowledge of the database.

Drawbacks: This is obviously a manual process – less than ideal. Anything that involves a human can invariably go wrong because we are humans and we make mistakes (which we learn from of course). This also takes significantly more time as part of the process.

Drawback Mitigation: This is a harder one to mitigate as we’re reliant on manual intervention; however, the best way to check this could be with another team member reviewing (maybe as part of a pull request), or you could include an automated PowerShell script to effectively rationalize the columns to be masked from Catalog vs the columns currently in the Data Masker masking set – this would help you understand if all the necessary columns have been updated or if any were missed. I have included an example of this PowerShell below.

### CHANGE THIS ###

$MaskingSet = "yourmaskingset.DMSMaskSet" # Your masking set including the DMSMaskSet file extension
$instance = "yourinstance" # The Instance as it is shown in Data Catalog that hosts the database
$DatabaseName = "yourdatabase" # The DB you want classification info for
$CatalogServer="http://yourmachine:15156" # The location of your catalog server, ending on :15156
$authToken="redacted" # Your Data Catalog Auth token from the Settings page
$tagName = "Static Masking" # The tag you're using to identify which columns need to be masked

### DONT CHANGE THIS ###

Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken 
$ColumnsMarkedForMasking = Get-ClassificationColumn `
    -InstanceName $instance `
    -DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq $tagName} 
$MaskingSetXML = [xml](Get-Content -Path $MaskingSet)
$subrules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleSubstitution')
$internalrules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleRowInternal')
$shufflerules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleShuffle')
$searchreplacerules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleSearchReplace')
$TablesAndColumns = @()

$subrules | ForEach-Object {`
    $CurrentTable = $_.TargetTableName.value
    $_.DMSPickedColumnAndDataSetCollection.DMSPickedColumnAndDataSet.N2KSQLServerEntity_PickedColumn.ColumnName.value | ForEach-Object {$TablesAndColumns+= $CurrentTable + "." + $_ }
}

$internalrules | ForEach-Object {`
    $TablesAndColumns+= $_.TargetTableName.value + "." + $_.TargetColumnName.value
}

$shufflerules | ForEach-Object {`
    $CurrentTable = $_.TargetTableName.value
    $_.DMSPickedColumnCollection.DMSPickedColumn.N2KSQLServerEntity_PickedColumn.ColumnName.value | ForEach-Object {$TablesAndColumns+= $CurrentTable + "." + $_ }
}

$searchreplacerules | ForEach-Object {`
    $TablesAndColumns+= $_.TargetTableName.value + "." + $_.TargetColumnName.value
}

$result = $TablesAndColumns | Sort -Unique
$ColumnsNeedingRules = $ColumnsMarkedForMasking | Where-Object {($_.tableName + "." + $_.columnName) -notin $result}

"`nThere are " + $ColumnsMarkedForMasking.count + " columns that require masking for database " + $DatabaseName + " in SQL Data Catalog."
"You are masking " + $result.count + " distinct columns in masking set: " + $MaskingSet
"`nThe columns that do not currently have a mask configured are:`n"

$ColumnsNeedingRules | ForEach-Object {$_.tableName + "." + $_.columnName + "     (" + $_.dataType + ")"}

$next = Read-Host -Prompt "`nWould you like to see the columns currently in your masking set? (Y/N)"
if ($next -in ("Y", "y")) {$result}

This code can also be found on my GitHub here.

The output of running the script – 13 columns in the masking set, 14 columns outstanding to be masked

You can technically even use this same check approach for the automated masking set generation option, to ensure that everything has been tagged correctly.

PASS Data Community Summit 2021

“Education is the kindling of a flame, not the filling of a vessel.”
Socrates

I will be speaking at PASS Data Community Summit 2021

I have spoken at previous PASS Summits; both through the virtue of working for Redgate, and off my own back through dedication and passion to the subject matter I speak about: Data Privacy and Protection.

In 2018 I stood on stage with Microsoft to speak about the nature of Static Data Masking, how it differs from Dynamic Masking and what challenges need to be considered for a successful static masking rollout.

In 2019 I stood on stage alone to talk about creating a strategy for masking non-Production environments, including a walkthrough of the dbatools.io masking functionality utilized alongside Azure SQL Database classifications. PASS Summit 2019 was also when Kendra Little encouraged me to set up this blog, for which I’m forever grateful.

In 2020… well. You know what happened.

In 2021 Summit sees a new lease of life. Data Community Summit will be entirely online (no surprises there) but one big surprise you might not know is that it is completely free to attend. Never before has there been SUCH a swathe of incredible speakers, with such a huge variety of topics and learning pathways, for free and available on demand afterwards.

The dates for your diary? November 8-12, 2021

As it happens, I will also be speaking about setting up an end-to-end deployment pipeline using the Flyway Community Edition, Azure SQL Database and Azure DevOps – it would be great to see you, but with so much on offer I could absolutely understand if you watched on catch-up!

You can see all the speakers here, but here’s a short list of some of the sessions I will definitely be tuning in to!

  • Erin Stellato – Demystifying Statistics in SQL Server
  • Grant Fritchey – Identify Poorly Performing Queries – Three Tools You Already Own
  • Tracy Boggiano – Azure SQL Fundamentals
  • Angela Tidwell – Azure Devops Dashboards EZ as pie-charts!
  • Indira Bandari – Getting started with Python for Data professionals
  • Jess Pomfret – Azure SQL & Our Toolbox To Manage It
  • Taiob Ali – Think like the Cardinality Estimator
  • Neil Hambly – Azure Notebooks – Data Science fundamentals
  • and many more!

So please go check it out & register, support the community and do a bunch of learning in the process – it will be amazing to see you there and hopefully I’ll even get to see some of you in person in the not-so-far future!

“But I don’t wanna INSTALL it!”: Data Masker on the fly in Azure DevOps (with an Azure SQL DB)

“There is always a way to go if you look for it.”
Ernest A. Fitzgerald

As many of you know, I really enjoy talking about Data Masking. I fundamentally believe it is an absolutely ESSENTIAL part of Test Data Management and specifically the provisioning of Pre-Production environments. If you hold sensitive PII/PHI/PCI in your Production environments, you have no excuses for porting any of that back into Dev and Test.

I also stand firmly behind the belief (and it is just that, my belief) that masking is much safer than anything you can achieve with encryption or limiting access alone. Static masking, when done correctly, means that even if all other security measures are bypassed, or we accidentally expose data somehow, it doesn’t matter because the PII/PHI has been wiped – yet the data is still fundamentally useful for development and testing environments.

I’m also a huge proponent of using classification and data masking as part of a DevOps process, and often you’ll find me using Azure DevOps to actually kick off my masking process – but one of the most frequent questions I get is “do I have to install Data Masker somewhere?” and the answer I always have to give is… yes.

When we use SQL Provision to spin up our environments in non-Prod, generally we know they have to be in IaaS VMs or on-prem, due to the nature of the technology (keep your eyes open on THIS because big changes are coming soon *squeals in excitement*), but sometimes we’re working entirely with PaaS DBs and VMs don’t even come into our vocabulary – yet we still need the databases to be masked for non-Prod use.

I have some data in an Azure SQL DB – a copy of DMDatabase (get it from my GitHub):

DM_CUSTOMER table in the DMDatabase

and I would like to get this masked before I create copies of it. Now I have written scripts in the past to make sure that Azure SQL DBs can be masked and then copied into non-Prod environments and you can get those scripts here, and you could easily wrap something like the below INTO a script like that – but much of the work I’ve done on this always involves having Data Masker installed somewhere and invoked on that machine – a VM, my laptop, whatever.

So, how do we avoid having to install Data Masker each time?

Data Masker for SQL Server does not (at the time of writing) have a publicly available Docker container or a method for installing via Choco or something like that – once it does, trust me, I’ll be blogging about it A LOT as I will be very excited. But there is an install file available: https://download.red-gate.com/installers/DataMaskerforSQLServer/ and this might be enough.

The Process

Initially I created a Masking Set locally for the DMDatabase copy in Azure, it was nice and simple just masking a few of the fields on the CUSTOMER table:

It relies on SQL auth for the connection and I’m remembering the credentials, though these could be subbed in later on using the PARFILE (documentation on that here).

Next I created a new Azure DevOps Git repo and cloned it down onto my local machine, put the masking set into it, and then committed and pushed my changes up into Azure DevOps:

Now that this was all in ADO, it was time to set up a pipeline for it – so let’s jump into some YAML! Now, Data Masker currently needs to run on a Windows machine so we’ll set the pool to Windows-Latest:

trigger:
- main

pool:
  vmImage: windows-latest

The next step is to grab the installer – which I know I can easily do with PowerShell. I’m sure you could be more clever about this, but with limited time I chose the most recent version and hard-coded that into a PowerShell Invoke-WebRequest cmdlet:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

This PowerShell task is going to grab the most recent exe for Data Masker and pull it down into C:\download on the hosted VM we’re using for the pipeline, and it’s also going to create a directory for the Data Masker logs as well (if you wanted to extend the YAML at the end to wrap these logs up and publish them as a result of the pipeline masking, then go for it – there’s a quick sketch of that idea just below – and tweet me to let me know!)
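
If you did want to publish those logs, a minimal sketch of an extra step at the very end of the pipeline might look something like this (a standard Azure DevOps task, pointed at the log folder created above):

- task: PublishPipelineArtifact@1
  inputs:
    targetPath: 'C:\download\DMlogs'
    artifact: 'data-masker-logs'
  displayName: 'Publish Data Masker logs'
  condition: always()   # publish the logs even if the masking step fails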

Next we have to extract and install Data Masker from that download, which is fairly easy to do with a cmd call:

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

Note I’m using the guidance from this page, making sure to accept the EULA, and I’m passing in my Redgate license as a variable that I have specified for the pipeline and kept secret. This will put Data Masker in the default location in C:\Program Files\… and means we will then be able to call it. I do however need to make sure that this now works, so I saved my pipeline and ran it to see what happened:

Fabulous, that all works nicely. Now to pass the DMSMaskSet file to Data Masker and get it to run – ah, but I forgot, I'm going to need a PARFILE, as per the cmdline documentation, that specifies where the files etc. are for the run. So I created my PARFILE.txt like so:

MASKINGSET=C:\download\DMDB_MaskingTime.DMSMaskSet
LOGFILEDIR=C:\download\DMlogs
DATASETSDIR=C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataSets 
REPORTSDIR=C:\download\DMlogs
INTERIM_REPORTS=false

As you can see, very simple. But to make sure these files are "simply" in the right folder – and because I didn't have time to explore passing an environment variable into the .txt file itself – I added a quick file copy task so that my masking set and PARFILE both end up in that location:

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

Of course now that we have Data Masker installed on the pipeline VM, the masking set AND the PARFILE… let’s get masking!

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

I agree the command is now less than impressive given all the prep work, but when you run all of this in its entirety…

Big success! But let’s check the data of course and make sure it is as we expect:

It all works: Lynne has been masked to Muna (and the rest has been masked too!), and I didn't need to have Data Masker installed on a VM in my environment with an Azure DevOps self-hosted agent to run it – I could just do it programmatically. #Winning.

I’ll put the full YAML below however some caveats:

  • You cannot run this pipeline a LOT in a short space of time – I found that out. The Redgate downloads page is not really designed for this kind of use, so run the pipeline sparingly; otherwise it will time out, because the server (by way of load balancing) will stop you from pulling the file too often.
  • You will need to update the PowerShell step reasonably often if you want the most recent Data Masker installer, or make it more future-proof so it grabs the latest version automatically – I just didn't investigate that, but see the sketch after this list for one possible approach.
  • Data Masker is dependent on the power of the machine it is running on in MANY ways, and Azure DevOps pipeline VMs are not exactly the most powerful beasts in the world – so if you have a lot of masking that needs doing, I would be wary of this method and might stick to an Application Server VM you've got hanging around, just in case.
  • This is likely to change a lot in the future, so it may no longer be relevant by the time you read it – before implementing anything like this, contact me or Redgate and confirm there isn't a better way of doing it, if I haven't already blogged it!
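On that second caveat, here's a rough, untested sketch of how the download step could grab the latest installer automatically. It assumes (and this is purely an assumption – check the page yourself) that the download URL above is a plain directory listing whose links are dated folders, as the hard-coded 2021-03-15 path suggests:

# Untested sketch: assumes the download page is a simple directory listing
# with dated folder links (e.g. 2021-03-15/), which Redgate may change at any time.
$base    = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/'
$listing = Invoke-WebRequest -Uri $base -UseBasicParsing

# Pick the most recent dated folder by sorting the link names descending
$latest = $listing.Links.href |
    Where-Object { $_ -match '^\d{4}-\d{2}-\d{2}/?$' } |
    Sort-Object -Descending |
    Select-Object -First 1

$source      = $base + $latest.TrimEnd('/') + '/DataMaskerforSQLServer.exe'
$destination = 'C:\download\DataMaskerforSQLServer.exe'

New-Item -ItemType Directory -Path C:\download -Force | Out-Null
Invoke-WebRequest -Uri $source -OutFile $destination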

Thanks for stopping by and have a great week!

Full YAML:

trigger:
- main

pool:
  vmImage: windows-latest

steps:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

Moving teams and users between multiple SQL Clone Servers

“Portability should be the default.”
Larry Wall

Greetings one-and-all! It’s been a long time hasn’t it?

Well – I’ll be honest. I’ve not been great… I’ve been struggling. With the pressures of everything going on; from National Lockdown 3: The Lockdown Strikes Back, to working hard, and trying to lose weight it’s all been a bit too much – but that’s not really any excuse for neglecting y’all is it?

As we go further into 2021 I can’t promise that you’ll see LOADS more here, but I’m determined to bring you something, and so long as it helps 1 person out there, I’ll consider it a success.

Well, I was asked a question yesterday that I wasn't really sure of the answer to, but it's definitely a nice-to-know, so I thought I would share it with all of you.

“If I have multiple SQL Clone Servers, or even just a new SQL Clone Server, how do I move the teams and permissions from one to the other without manually setting everything up again?”

Quite the head-scratcher indeed!


Well, fortunately we know where to start – in SQL Clone, every action and every change is recorded in a back-end SQL Server DB called the SQL Clone Config DB (you may remember this from such classics as Using things weirdly part 1!), and this holds all of the information we might need to move things over.

Let’s start by taking a look at the schema in the DB:

There is already fairly comprehensive documentation on moving the whole SQL Clone Config DB and even the SQL Clone Server to another instance here, but we ONLY care about users and permissions for this particular run, so we’ll focus on that.
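If you want to spot the relevant tables quickly for yourself, a simple query against the Config DB does the trick (the LIKE patterns match the table names as I found them – your version may differ slightly):

-- List the user/team related tables in the SQL Clone Config DB
SELECT name
FROM sys.tables
WHERE name LIKE '%Principal%'
   OR name LIKE '%Team%'
ORDER BY name;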

The schema is reasonably straightforward and, thankfully, well named, which means we can isolate the tables we'll need for the move. Anything that mentions Teams or Principals (users) is generally going to be in our remit. I took a reasonably fresh install of SQL Clone and added a few users and teams (I've been watching a lot of Sword Art Online lately, so you might sense a trend with these teams…):

Now when we investigate some of the tables in the config DB we’ll notice that everything is really nicely normalized and mapped together:

But what does this look like in a fresh install with NO teams defined?

Wait… no teams defined or activated, but there's… a team. The default team.


It turns out that once you hit the "Enable Teams" option in the SQL Clone dashboard, that entry magically goes away and things look more like what we're expecting:

Now we know the lay of the land – how do we move our users? Well to answer that I’m going to work on 3 assumptions here:

  • I’m going to stick to just these 3 tables (Principles, Teams and the mappings). There’s a wider question around Images and Clones etc. that we’ve added for those teams and any overlap that may occur, but we’re working on the belief that the SQL Clone Server we’re moving to is segregated and independent (so will not have access to the same images), so it only needs the users and teams.
  • The target SQL Clone Config DB we’re moving these to is brand new, and does not already have any users/permissions or teams set up except clicking the “enable teams” button in the dash.
  • The same user has set both SQL Clone Servers up, and is therefore position 1 in the dbo.SqlClonePrinciples table, however you could easily overwrite this one value manually if needed or use a local copy with the same #1 user to generate the script initially E.g. Chris.Unwin is ID1 in both the Source and Target Config DBs below, and then just add in the missing user manually to the target config DB.

1. SQL Data Compare – my recommended option

SQL Data Compare is our friend here. It knows all about constraints and identities and anything else we may come across. This is a fairly small-scale need, but if you open it up and simply compare those 3 tables we've been looking at, it should pick up the users and permissions to insert, and quickly generate a script we can run against the target Config DB:

I’m going to generate the SQL Script and then run it against my target – notice how Data Compare takes care of any issues we might face with constraints and identity inserts to make sure everything 100% ties up? Good isn’t it!?

This was my script:

SET NUMERIC_ROUNDABORT OFF
GO
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS, NOCOUNT ON
GO
SET DATEFORMAT YMD
GO
SET XACT_ABORT ON
GO
SET TRANSACTION ISOLATION LEVEL Serializable
GO
BEGIN TRANSACTION

PRINT(N'Drop constraints from [dbo].[TeamSqlClonePrincipalMappings]')
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] NOCHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_SqlClonePrincipals_SqlClonePrincipalId]
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] NOCHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_Teams_TeamId]

PRINT(N'Drop constraint FK_TeamImageMappings_Teams_TeamId from [dbo].[TeamImageMappings]')
ALTER TABLE [dbo].[TeamImageMappings] NOCHECK CONSTRAINT [FK_TeamImageMappings_Teams_TeamId]

PRINT(N'Drop constraint FK_TeamInstanceMappings_Teams_TeamId from [dbo].[TeamInstanceMappings]')
ALTER TABLE [dbo].[TeamInstanceMappings] NOCHECK CONSTRAINT [FK_TeamInstanceMappings_Teams_TeamId]

PRINT(N'Add rows to [dbo].[SqlClonePrincipals]')
SET IDENTITY_INSERT [dbo].[SqlClonePrincipals] ON
INSERT INTO [dbo].[SqlClonePrincipals] ([Id], [FriendlyName], [Role], [Sid], [IsGroup]) VALUES (2, N'RED-GATE\Chris.Kerswell', 2, 0x010500000000000515000000fd43461e19525f12828ba62895680000, 0)
INSERT INTO [dbo].[SqlClonePrincipals] ([Id], [FriendlyName], [Role], [Sid], [IsGroup]) VALUES (3, N'RED-GATE\David.Ong', 3, 0x010500000000000515000000fd43461e19525f12828ba628cb7b0000, 0)
SET IDENTITY_INSERT [dbo].[SqlClonePrincipals] OFF
PRINT(N'Operation applied to 2 rows out of 2')

PRINT(N'Add rows to [dbo].[Teams]')
SET IDENTITY_INSERT [dbo].[Teams] ON
INSERT INTO [dbo].[Teams] ([Id], [Name], [Color], [IncludeAllPrincipals], [IncludeAllImages], [IncludeAllInstances]) VALUES (2, N'TeamKirito', N'#373737', 0, 0, 0)
INSERT INTO [dbo].[Teams] ([Id], [Name], [Color], [IncludeAllPrincipals], [IncludeAllImages], [IncludeAllInstances]) VALUES (3, N'TeamAsuna', N'#fc9003', 0, 1, 1)
SET IDENTITY_INSERT [dbo].[Teams] OFF
PRINT(N'Operation applied to 2 rows out of 2')

PRINT(N'Add rows to [dbo].[TeamSqlClonePrincipalMappings]')
INSERT INTO [dbo].[TeamSqlClonePrincipalMappings] ([TeamId], [SqlClonePrincipalId]) VALUES (2, 3)
INSERT INTO [dbo].[TeamSqlClonePrincipalMappings] ([TeamId], [SqlClonePrincipalId]) VALUES (3, 2)
PRINT(N'Operation applied to 2 rows out of 2')

PRINT(N'Add constraints to [dbo].[TeamSqlClonePrincipalMappings]')
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_SqlClonePrincipals_SqlClonePrincipalId]
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_Teams_TeamId]
ALTER TABLE [dbo].[TeamImageMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamImageMappings_Teams_TeamId]
ALTER TABLE [dbo].[TeamInstanceMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamInstanceMappings_Teams_TeamId]
COMMIT TRANSACTION
GO

and success! Everything has been replicated as we expect:

I also tried this on a VM that does not have the same AD access as I do locally: the users were imported successfully, but because they can't be resolved they don't show in the Clone dashboard, so this doesn't cause any dead or invalid relationship issues.

2. & 3. Script from one DB to the other or SSIS load

Another simple-ish option would be to write INSERT INTO … SELECT FROM statements, or create a very simple SSIS package if you can connect to both – you would have to take care of the constraints and identity inserts yourself, but it would be equally effective:
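For illustration, a minimal T-SQL sketch of that approach is below. It assumes the source Config DB is reachable from the target instance under the hypothetical name SQLClone_Config_Source (restore a copy alongside the target, or use a linked server and adjust the names), and it leans on the same assumptions as above – you'd still need to handle any rows already present in the target yourself:

-- Rough sketch only: [SQLClone_Config_Source] is a hypothetical name for the source Config DB
SET IDENTITY_INSERT dbo.SqlClonePrincipals ON;
INSERT INTO dbo.SqlClonePrincipals (Id, FriendlyName, [Role], [Sid], IsGroup)
SELECT Id, FriendlyName, [Role], [Sid], IsGroup
FROM SQLClone_Config_Source.dbo.SqlClonePrincipals
WHERE Id > 1; -- assumption 3: user #1 already exists in the target
SET IDENTITY_INSERT dbo.SqlClonePrincipals OFF;

SET IDENTITY_INSERT dbo.Teams ON;
INSERT INTO dbo.Teams (Id, [Name], Color, IncludeAllPrincipals, IncludeAllImages, IncludeAllInstances)
SELECT Id, [Name], Color, IncludeAllPrincipals, IncludeAllImages, IncludeAllInstances
FROM SQLClone_Config_Source.dbo.Teams
WHERE Id NOT IN (SELECT Id FROM dbo.Teams); -- skip anything already present in the target
SET IDENTITY_INSERT dbo.Teams OFF;

INSERT INTO dbo.TeamSqlClonePrincipalMappings (TeamId, SqlClonePrincipalId)
SELECT TeamId, SqlClonePrincipalId
FROM SQLClone_Config_Source.dbo.TeamSqlClonePrincipalMappings;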

Important Note(s)

You will have to manually tie up any differences if you have already started using the new install with Teams and Permissions, as THAT will get MUCH more complicated – especially if you have already assigned images, instances and users to teams – so this process is best done at the very start of life for the new SQL Clone Server.

Also, you will have to restart the SQL Clone Server service after loading the users and teams in, as SQL Clone won't recognize they're there straight away.
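If you want to script that restart too, something along these lines should do it – note the display name is matched with a wildcard because the exact service name is an assumption on my part and may differ between versions, so check yours in services.msc first:

# Restart the SQL Clone Server service so it picks up the imported users and teams
Get-Service -DisplayName '*SQL Clone*' | Restart-Service -Force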

Conclusion

This is going to be a rare occasion for most – the SQL Clone Server is neither designed nor intended to be duplicated many times. Although there is a relatively easy method for MOVING SQL Clone Servers, moving just a part of the config, like Users & Teams, is definitely possible but not 100% fool-proof – so on said rare occasion, just be very, very careful!

3 RDBMS’, 3 models, 3 end-to-end deployment pipelines with Azure DevOps and Redgate Deploy

“Choice is the most powerful tool we have. Everything boils down to choice. Every choice we make shuts an infinite number of doors and opens an infinite number of doors.”
– Lori Deschene (https://tinybuddha.com/)

PLEASE NOTE (edit 18/12/2021): Two of the three (and soon the third) models below have been superseded by Flyway Desktop – please see my posts on Flyway Desktop setup (first here) and shadow database setup here to set up any workflows after today's date; the below should only be used for legacy purposes or with Flyway Teams for PostgreSQL.

Picking a Set-Up

One of the hardest parts of my job is that at a moment's notice we could be asked to walk through better database change management processes. That's not the challenge; the problem is that it could be with any kind of tech stack. I might need a Git repo of some shape or form (Azure DevOps, plain ol' Git, Bitbucket etc.), then a CI server of some kind (Azure DevOps, GitLab, TeamCity, Bamboo etc.) and finally something to handle releases (Azure DevOps, Octopus Deploy, Bamboo etc.) – this is fairly easy to reproduce in multiple combinations with automation, Terraform etc., but when you're actually helping someone set it up, you've got to know where all the bits go.

The Redgate tools work with all of these options and combinations so making sure we’re setting everything up right usually means questions about the Repo/CI/CD tooling people choose.

The commonality above, and the one I run into the most for all 3 stages, is Azure DevOps. It's straightforward to understand, all in the same place and just plain fun to use (AND it supports emojis ^_^).

Finally, we have to pick a Relational Database Management System (RDBMS) to use – Redgate Deploy is one of the newest offerings from Redgate and it comprises capabilities for "Database DevOps" across MS SQL Server, Oracle Database and 18 (well, actually 19 now thanks to Flyway v7!) other RDBMSs! So instead of choosing just one, I'm going to pick the two key ones there, plus one of the others: MSSQL, Oracle DB and PostgreSQL.

One final question I had to ask of myself was what models I wanted to use. There are a couple of choices available within the Redgate solution, specifically for MSSQL and Oracle at the moment, so I decided that I would do State based deployments for Oracle and Hybrid deployments for MSSQL, given that PostgreSQL will have to be migrations anyway. Fear not though, the setup is not hugely dissimilar when it comes to the actual pipelines!

Setting up Azure DevOps Repos

This stage was relatively easy – I simply created 3 new projects in my DefaultCollection where I’m going to put the repos for each of the DBs.

and then I created 3 readme files, and cloned all 3 git repos down onto my machine as local repos:

and we’re ready to go!

A quick note: I’m using a mixture of Azure DevOps hosted (for PostgreSQL) and Azure DevOps Server locally installed on my Virtual Machine (for MSSQL/Oracle) with a local agent present to run everything below – you can adopt this methodology or you can use the hosted version, but for the Oracle solution below at least you will need a local agent available (unless you use the DockerHub Image for Schema/Data Compare).

Microsoft SQL Server

The first thing I need to do for all of these is to pick the databases I’ll be working on – for me I’m rather lucky as our demonstration environment has a rather nifty set of databases for me to choose from!

I’m going with SQL Source Control (the MSSQL State component in Redgate Deploy) and SQL Change Automation (the MSSQL Migrations component) both plugged into Management Studio (SSMS) with a set of databases called the ScaryDBA_Dev/Test/Prod environments (which I used SQL Clone to create the copies of), in homage to the wonderful Grant Fritchey.

So the first thing we need to do is get Dev under source control – we’ve refreshed back from Prod so there shouldn’t be any differences and we’re using the Hybrid model, so we’ll need to create the State first. I do this by going to SQL Source Control in SSMS, and linking my DB to Git, creating a State Folder in the top level of my local repo as I do so:

Then once linked I go ahead and source control the initial schema (not sure how? Watch the Redgate University videos here):

Next I setup my Migrations project using SQL Change Automation, creating the Migrations folder in the same top level of my local repo, but now instead of pointing to the database, I’m pointing to my SQL Source Control generated State folder:

Now at this point we get the options to choose filters and comparison options – if you're not sure, I would recommend speaking to someone at Redgate or looking up the documentation – I often see people wanting to filter out Security/Users/Roles at this stage, so it might be worth a look! I just carried on, as I only have a few objects anyway!

Connect to the target and create a baseline script (i.e. what does Prod look like now?) again, because I have a minimal setup I’ll go straight from my “Prod” database:

Commit and push and we’re on our way – everything is in version control:

Now I may have cheated by doing MSSQL first, because actually building and deploying the project is pretty straightforward – much like I have done in previous posts here and here, I just used the SQL Change Automation plugins from the Azure DevOps marketplace to first build:

and then deploy the project:

and it all succeeded… the 2nd time around when I remembered to specify which DB I was deploying to!

Oracle Database

The first thing I need to do for all of these is to pick the schemas I’ll be working on… wait, Deja Vu! – well once again I have a little set of schemas present on the demonstration machine that will serve me just fine!

Because we’re working in the State setup, out of Redgate Deploy I’m going to use Source Control for Oracle which allows me to specify the remote repo, the folder to create and even the fact I’m using Azure DevOps Git:

(Step 1 was simply providing the connection details to my Oracle Database, hence why I was on step 2!) – I select the Schema I’ll be putting in Source Control and even get a nifty run down of the structure:

Hit next and give a name to the Project (unsurprisingly I went with HR) and then check in all of your initial objects:

Now, a couple of things you may have noticed if you're following along, which I should clarify (and which I forgot when setting up this blog post):

  1. You don’t need to specify the local repo you cloned down because Source Control for Oracle handles this itself in the back end, if you want it to be part of a local repo with other code in it, use the Working Folder instead
  2. If you are using Git and NOT the working folder, committing will also Push your objects to the remote – you’ve been warned!

As above, I now head over to Pipelines and hit Create New Pipeline! I check out my repo with the schema objects in it, and add a job to my agent. But what am I going to pick? Well, unlike SQL Change Automation, there isn't a plugin available on the Azure DevOps Marketplace, so we'll need some good old-fashioned command line calls!

First, let’s clean out the CI Schema, I’m going to use the script to remove all objects from the Redgate documentation site and make a call to run the script using sqlplus (I’m storing the file locally but you could even include it in your repo under a build folder maybe?)

echo on
Call exit | sqlplus hr/[passwordredacted]@//localhost:1521/CI @C:\DemoFiles\DropAllObjects.sql
echo off

Next we’ll add a call to the cmdline of Schema Compare for Oracle to build the database from our repo, using the files that were checked out by the agent (an Azure DevOps pre-defined environment variable) – again we’re using a similar script from the Redgate DevOps for Oracle site but because we’re deploying ALL objects from version control, we don’t really want a report per say, this is just to test the schema can be built from the ground up:

"C:\Program Files\Red Gate\Schema Compare for Oracle 5\sco.exe" /deploy /source $(Build.SourcesDirectory)\Schema{HR} /target SYSTEM/[passwordredacted]@localhost:1521/CI{HR} AS SYSDBA /indirect 

echo Build database from state:%ERRORLEVEL%
 
rem IF ERRORLEVEL is 0 then there are no changes.
IF %ERRORLEVEL% EQU 0 (
    echo ========================================================================================================
    echo == Warning - No schema changes detected. ==
    echo ========================================================================================================
)
 
rem IF ERRORLEVEL is 61 there are differences, which we expect.
IF %ERRORLEVEL% EQU 61 (
    echo ========================================================================================================
    echo == Objects were found and built. ==
    echo ========================================================================================================
    rem Reset the ERRORLEVEL to 0 so the build doesn't fail 
    SET ERRORLEVEL=0
)

and assuming this all works, we’ll package up the files into a zip and publish them as an artifact so we can consume them at the release stage!
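If you're doing this in the classic editor it's just a couple of standard tasks; a YAML sketch of roughly the same thing might look like the below (paths and artifact names are placeholders, not what I actually used):

# Hypothetical YAML equivalent of the "zip it up and publish it" steps
- task: ArchiveFiles@2
  inputs:
    rootFolderOrFile: '$(Build.SourcesDirectory)'
    includeRootFolder: false
    archiveType: 'zip'
    archiveFile: '$(Build.ArtifactStagingDirectory)/OracleSchemaState.zip'
  displayName: 'Package schema state'

- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'OracleSchemaState'
  displayName: 'Publish build artifact'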

and guess what? It all just worked *cough* on build #23 when I got the syntax right finally…

Of course we can add additional stages to the build as well, such as a check for Invalid Objects and some Unit Testing, but I’ll keep this pretty lean for now!

Now, just like we did for MSSQL we’re going to set up a new deployment pipeline, grab the artifact we’re publishing from the build, enable a CD trigger and we’re going to deploy to, in this case, Acceptance.

Let’s first create a job on the agent to unpack the zip file and see how far we get – I’m just going to dump them in a DeploymentState folder in the working directory:

and… awww thanks Azure DevOps, I needed to hear that!

and now we add yet another command line task, but this one is just going to do a comparison; it's not actually going to deploy anything, because we're going to add a manual intervention step to approve the deployment first! I had a little help again from the Redgate docs for this one, because I keep having to catch cmdline error codes – if I was wise like Alex Yates I probably would have just handled this with PowerShell (a rough sketch of what that could look like is after the script below)…

echo off
rem  We generate the deployment preview script artifact here
"C:\Program Files\Red Gate\Schema Compare for Oracle 5\sco.exe" /abortonwarnings:high /b:hdre /i:sdwgvac /source $(System.DefaultWorkingDirectory)\DeploymentState\Schema{HR} /target SYSTEM/Redgate1@localhost:1521/Acceptance{HR} AS SYSDBA /indirect /report:$(System.DefaultWorkingDirectory)\DeploymentState\changes_report.html /scriptfile:$(System.DefaultWorkingDirectory)\DeploymentState\deployment_script.sql > $(System.DefaultWorkingDirectory)\DeploymentState\Warnings.txt

echo Warnings exit code:%ERRORLEVEL%
rem In the unlikely event that the exit code is 63, this means that a deployment warning has exceeded the allowable threshold (eg, data loss may have been detected)
rem If this occurs it is recommended to review the script, customize it, and perform a manual deployment
 
IF %ERRORLEVEL% EQU 0 (
    echo ========================================================================================================
    echo == No schema changes to deploy
    echo ========================================================================================================

    GOTO END
)
 
IF %ERRORLEVEL% EQU 63 (
    echo ========================================================================================================
    echo == High Severity Warnings Detected! Aborting the build. 
    echo == Review the deployment script and consider deploying manually.
    echo ========================================================================================================
    rem Aborting deployment because high severity warnings were detected
        SET ERRORLEVEL=1
    GOTO END
)
 
rem This is the happy path where we've identified changes and not detected any high warnings
IF %ERRORLEVEL% EQU 61 (
    echo ========================================================================================================
    echo == Schema changes found to deploy - generating deployment script for review
    echo ========================================================================================================
    rem Set ERRORLEVEL to 0 so the build job doesn't fail
	SET ERRORLEVEL=0
    GOTO END
)
 
:END
EXIT /B %ERRORLEVEL%
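As promised, here's a rough PowerShell-flavoured sketch of the same decision logic, for anyone who would rather avoid cmd. It only covers the exit-code handling – you would run the sco.exe comparison with the same arguments immediately before it – and the wording of the messages is my own:

# Hypothetical PowerShell equivalent of the exit-code handling above (decision logic only)
# Run sco.exe first with the same arguments, then inspect $LASTEXITCODE
switch ($LASTEXITCODE) {
    0  { Write-Host 'No schema changes to deploy.'; exit 0 }
    61 { Write-Host 'Schema changes found - deployment script generated for review.'; exit 0 }
    63 { Write-Error 'High severity warnings detected - review the script and deploy manually.'; exit 1 }
    default { Write-Error "Unexpected Schema Compare exit code: $LASTEXITCODE"; exit $LASTEXITCODE }
}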

I then throw in an agentless job (a Manual Intervention step) and then finally (once I have reviewed the deployment report that is produced) one further cmdline call to actually run the deployment script against my Acceptance target:

echo on
Call exit | sqlplus hr/[passwordRedacted]@//localhost:1521/Acceptance @$(System.DefaultWorkingDirectory)/DeploymentState\deployment_script.sql
echo off

I have saved my pipeline, now it’s time to test. So I’m going to make a very quick change (so that something is produced) and see what happens…

Boom. Pipeline done.

One word on this though – I haven’t included an awful lot of frills (error handling, checks, NuGet instead of Zip etc.) so you’re free to bulk this out how you see fit, but by golly it works! Also make sure you tick this on the second Agent Job, else it’ll wipe out your working directory – something that obviously definitely did not happen to me…

PostgreSQL

This one might be cheating a little. As you know, I've already set up a CI pipeline with Flyway before, using Azure SQL DBs and the Flyway Docker container as part of the build, and in some cases even tSQLt for unit testing too! But this is PostgreSQL, and this is a new blog post, darn it!

Still getting your head around Flyway? Check out the Redgate University videos!

I started out by creating myself a PostgreSQL 10 server in the Azure Portal, because:

  • I can
  • I didn’t want a local install of PostgreSQL
  • I’m not self sabotaging

and I set up a Dev and Test database on it – that is, once I remembered to allow my client IP address through the firewall *sigh* – and then connected from Azure Data Studio:

I already have some basic scripts from my last demo that I can use – so I pulled down the latest version of Flyway (V7) and unzipped it into my files:

Then I created a SQL folder in my local repository for the PostgreSQLPipeline and popped a couple of migrations in (I'm using the StackOverflow scripts, adapted for PostgreSQL from Kendra Little's GitHub – thank you Kendra!). In the previous posts we had to source control the state or initial baseline of the database; however, as we're using Flyway for PostgreSQL, we create, name and order the migrations ourselves, so we have plenty of control over that – hence we can jump straight into writing some scripts this time around.
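The only rule to follow here is Flyway's versioned migration naming convention – V<version>__<description>.sql, with a double underscore between the two – so the folder ends up looking something like this (the file names below are just examples, not my actual scripts):

SQL/
  V1__Create_StackOverflow_Tables.sql
  V2__Add_Indexes.sql
  V3__Add_Constraints.sql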

Finally, I pointed the config file for Flyway at that SQL folder, also taking the opportunity to point it at my Dev DB using the PostgreSQL JDBC URL:
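For anyone following along, the relevant entries in flyway.conf would look roughly like the below – every value here is a placeholder, not my real server, path or key:

# Placeholder values only - substitute your own server, credentials, script path and licence key
flyway.url=jdbc:postgresql://<your-server>.postgres.database.azure.com:5432/<dev-database>
flyway.user=<username>
flyway.password=<password>
flyway.locations=filesystem:<path-to-your-repo>\SQL
flyway.licenseKey=<your-flyway-license-key>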

Now i didn’t really NEED to do this step and try things out against Dev, because I already have the scripts, so I could have just started building the pipeline – but it’s always worthwhile getting local validation first by running things against Dev and then migrating up!

A quick Flyway Info later and we were good to go – the scripts are recognized so we know we’ve set everything up correctly.

One git add / commit / push and everything is in my repo:

Now, as you may know from my other post, we can do one of two things here – we can either build what we push to the repo using a cmdline call (like we did with the Oracle build) on a machine where we have Flyway installed, or we can use the Docker image.

I’m actually going to use Docker again but this time, instead of specifying the various credentials in a config file that was getting passed to the container, I’m actually going to use Azure DevOps environment variables and build the connection string that way – it’s really easy to keep the variables secret in Pipelines, so I can pass my JDBC connection, complete with Username and Password, as well as my Flyway license key, without worrying someone might get hold of them!

I’m actually going to build against a live PostgreSQL database before deploying, so I also created another DB for me to use: demodb_ci

I actually stole the YAML from my previous pipeline (below) and updated the variables accordingly:

trigger:
- master
 
pool:
  vmImage: 'ubuntu-latest'
 
steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: docker run --rm -v $(FLYWAY_LOCATIONS):/flyway/sql flyway/flyway clean -url=$(JDBC) -licenseKey=$(licenseKey) -user=$(userName) -password=$(password) -enterprise 
  displayName: 'Clean build schema'
 
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: docker run --rm -v $(FLYWAY_LOCATIONS):/flyway/sql flyway/flyway migrate -url=$(JDBC) -licenseKey=$(licenseKey) -user=$(userName) -password=$(password) -enterprise 
  displayName: 'Run flyway build'

and it ran just fine! Well actually it failed first, because I didn’t have permissions from the IP address that the container was running from, but fortunately Azure has a handy switch in the PostgreSQL Server settings to simply allow Azure Services traffic through the firewall:

Once that was sorted, the first stage (as always) is to download Docker, and then we have 2 Flyway container steps:

1 – Clean the schema and make sure the database is empty
2 – Migrate the schema changes

Then we have two options – we could do as we did in the Oracle pipeline and zip up the files, publishing them as an artifact and consuming them at the Release stage with a call to Flyway from the command line, or we can go ahead and promote our deployment using the same pipeline.

I’m lazy, so I’m going for the latter!

In a normal "production like" situation I would probably take the opportunity to test and check etc. like I did above, but let's keep this super lean – if the build works, I trust the deployment. Let's go ahead and deploy to Production – I'll add this as an additional task in my YAML:

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: docker run --rm -v $(FLYWAY_LOCATIONS):/flyway/sql flyway/flyway migrate -url=$(ProdJDBC) -licenseKey=$(licenseKey) -user=$(userName) -password=$(password) -enterprise 
  displayName: 'Promote to Production'

And the deployment was successful! Phew – I think I’ve earned a cup of tea!

Conclusion

In this blog post I have demonstrated 3 different (and initially very simple*) approaches to the source control and deployment of database changes – but there's actually a much wider set of combinations we could have adopted: all 3 models with MSSQL, all 3 models with Oracle, and Migrations for up to 18 other systems like DB2, Snowflake and even SAP HANA! But what did I need to do ALL of this? A single solution: Redgate Deploy**.

Thank you for stopping by! Have an amazing week!

*There is a lot missing from the code I have provided, like additional error handling, tests etc. and all of the above CAN be improved – but did we manage to build and deploy across three different systems all using Azure DevOps? Yes we did. If you intend on using any of the above, please ensure you build in the necessary controls and process around it and always pick what is best for you and your team.

**Redgate Deploy is going from strength to strength, expect to see a wide range of improvements made over the coming months – I won’t be surprised if this blog post is already out of date by the time I finish writing it – that’s how awesome the teams working on all of this are!

Using things weirdly – Part 3: Moving from State to Hybrid-ish Source Control with Microsoft SSDT and Redgate SQL Change Automation

“It works on MY machine”
Everyone

Some time back I blogged “which database source control model works well for you” and it has come in SO handy when explaining the different models to people – but the one question I get asked not infrequently, is:

Is there any way that I can use Redgate SQL Change Automation with Visual Studio based SSDT?

It’s always a really difficult question to answer because fundamentally SQL Source Control (Redgate’s state based tool) and SSDT (Microsoft’s state based tool) functionally seek to do the same thing, making them competitors. However there are, on the odd occasion, good reasons as to why I am asked the question and one of those same scenarios came up today:

  • Our developers work in Visual Studio and have already been using SSDT for a long time, it works for them, we just want to extend it with Migrations to handle complex changes.

So the option here is, leave it as it is, or try to work with both. Not always going to be my first choice but it got me thinking.

Starting from a memory

A few months ago, when life was “normal” and I was working in my office with *gulp* PEOPLE, I tried to make this scenario work by simply linking SQL Change Automation to the project folder created by SSDT but unfortunately it was riddled with problems. The SSDT importer and repo look like this:

And if you point SQL Change Automation at the local repo with this in it, it will correctly tell you:

Now of course this wasn’t unexpected. It’s not designed to work this way, is it? No. But way back then I did figure out, shrewdly, that if I used SQL Source Control to carry out an initial commit just to a working folder, it would generate a RedGate.ssc and RedGateDatabaseInfo.xml file and you can copy them into the SSDT repo to trick SQL Change Automation into thinking that it’s a SQL Source Control repo… unfortunately this trick no longer works. Sad.


Add a hop and a step

But what got me thinking today was the context in which the question was asked. It was more about separation of duties: once the developers have delivered the change into the repo, their job is effectively done! "That's how it should look moving forwards. What's next?" – and then I had an idea.

Given that SSDT allows you to push and pull the code and apply it to your own database, what is stopping us from using SQL Change Automation to pick up on the changes against the database we sync our changes to from our SSDT project?

Genius. Evil genius.

So I created a new Database to simulate having another developer on my instance and gave it to Peter Parker:

You can then do a schema compare to another DB from your project, effectively PULL down changes from the remote to your local repository, and then sync them back up to your local development DB; this is how Devs stay up to date with each other but could, in this methodology, be how DBAs or senior developers pull down the changes to their local DB, where they test the new state, and then generate a new migration from it.

So I made a change on my dev database and captured it in the project (right click on the project name > schema compare > dev db compare to project > update) and then committed and pushed:

and sure enough my repository was updated:

But then I simulated pulling down the change and applying it to Peter Parker’s DB (again using Schema Compare) and then I created a SQL Change Automation project in VS, in the same solution but pointing the project to a migrations folder in the repo:

Yes, I accidentally called the project Database1 – don't remind me, I'm embarrassed enough!

Then I added my baseline database:

It created the baseline and the project immediately with no issues and picked up on the changes I had made using SSDT:

and I was able to commit my project and changes into my repository in Azure DevOps:

It was just that easy! Now what this means for the development process is that developers _could_ feasibly work with SSDT, as they are comfortable with it, and then more senior members of the team can generate migration scripts from there, building the database from scratch and deploying in a reliable, repeatable fashion.

Just to prove to you my build even ran green from this:

So in summary, what this gives us is the ability to adapt a regular SSDT workflow – one that developers are comfortable with and which has been in the team for months or years – add in the knowledge of DBAs or team leads, gain a greater separation of duties for high-risk schema changes, and get the control and flexibility (and peace of mind) that comes with a migrations-based deployment process.

Nice.

The fine print

I’m sure by now you’ve realized something: this is not, nor will it ever (I believe) be a supported workflow. If you implement the above in a production sense for something other than just testing then it’s not something you’ll be able to get help with from one of the Redgate engineers if you need to troubleshoot.

Also, if you’re going to introduce a sequence of changes like this to achieve the hybrid model, it does make more sense that you implement SQL Source Control for the state side (given that it’s right there in the SQL Toolbelt with SQL Change Automation anyway).

But IS IT POSSIBLE to achieve a similar, Visual Studio based* hybrid workflow with SSDT and SQL Change Automation by using a database to ‘hop’ the changes across?

Yes, it certainly looks that way!

*If you’re planning on using SSDT in Azure Data Studio too then this workflow could also work for you, SQL Change Automation is present in SSMS and VS so it’s really up to you!