Moving from Redgate SQL Source Control pipelines to Flyway Desktop with Redgate Deploy

“Like all magnificent things, it’s very simple.”
Natalie Babbitt

There has been a lot of change over the years in the Redgate solutions – I hasten to add this is a good thing. Back in my day it was SQL Source Control to store your database in Version Control; at the time it was probably a 50/50 split between people who used Git and people who used other systems like SVN, TFVC (TFS/VSTS) and Vault or Mercurial etc. and you could then use DLM Automation to build and deploy this state-based database project to Test, Prod and so on.

SQL Source Control and DLM Automation (later SQL Change Automation) have formed the basis for many a pipeline for many many years, and they have been reliable, in some cases life changing for those who have used them… but the times, they are-a changing!

These technologies are still a great option and are still present in Redgate Deploy for those whom they work for, however with the rise of still further distributed computing topologies, and the dominance of cloud-hosted architecture and PaaS databases in todays world – something new is needed.

Enter Flyway Desktop.

As you’ve seen in some of my previous posts, Flyway Desktop is really really easy to get up and running with, not only that but it combines the State and Migrations models together creating one repo with ALL the benefits, and none of the deciding which model is best for you. It was architected from the ground up to be 3 things:

  • Ingeniously simple: to set up, to use, to everything.
  • Cloud ready: designed for use with IaaS and PaaS database options
  • A combination of the best of the best: all of the benefits of previous Redgate solutions, few to none of the drawbacks

...but what if you’re already using Redgate?

Yes Flyway Desktop and Redgate Deploy in general are super easy to get up and running with for new databases, even difficult, monolithic databases (thank you Clone as shadow!), but what about projects you already have under source control? Like I mentioned, SQL Source Control has been around for years and is beloved by many, and SQL Change Automation is still in use by thousands too. We want to maintain the history of our changes for reference, and we don’t want to simply disregard the whole pipeline. So the big question is how do we upgrade our state-based pipeline? Let’s find out together!

Note: This post is for people who want to or are interested in moving to a newer solution (and to give them an idea of what to expect) and in no way reflects any level of urgency you should be feeling – I’m certainly not pushing you to move any of your pipelines now, especially if you’re happy with what you have!

Setup

For starters I set up an end to end SQL Source Control and SQL Change Automation pipeline in Azure DevOps – my understanding of the approach I’m going to take is that this should work wherever your pipeline is (TeamCity & Octopus Deploy, Bamboo, whatever) so don’t feel that this post is not for you just because I used Azure DevOps.

I set up a copy of the DMDatabase on my local SQL Developer Instance, and then created an Azure DevOps repo and cloned it down to my machine:

I linked my database to the repo, created a filter to filter out users and committed it to my repo – then I set up the YAML for the build, and the Release steps for SQL Change Automation:

My SQL Source Control Project in Azure DevOps (Git)
The YAML to build my SQL Source Control Project
Release Steps in Azure DevOps
Deployment Steps

Everything seems to be deploying ok, I’ve even set up an Azure SQL Database as the target for my database changes. Now we have this SQL Source Control -> SQL Change Automation pipeline running, lets investigate replacing it.

SQL Source Control

The first thing I did was to open Flyway Desktop and create a new project – I pointed the project at my Dev DB and at the same local repo that I host my SQL Source Control files in:

and without committing the state to my schema-model folder, only linking to the Dev database, we end up with our repo looking like this:

I’m going to delete the Redgate.ssc file, because we’re no longer in SQL Source Control and I’m going to move every other file to the schema-model folder that is now under my project name (DMDatabase) – full on Copy Paste style:

…and then hit refresh in the Schema Model tab of Flyway Desktop:

and… nothing should happen. Absolutely nothing, because the state of your project, the Schema-Model folder should now exactly match the state of your development database (assuming you had everything committed to SQL Source Control!) – so now we come across to the version control tab aaaand…

WAIT!

If we commit now it will break our CI build, because when we trigger with a new push, my YAML will be expecting $(Pipeline.Workspace)/s/Database as the input, but now we have a slightly altered project we want to build a slightly different path. I’m going to temporarily disable my CI trigger in the YAML pipeline:

and now I’m going to Pull (to get the YAML file in my local repo) and then commit and push my changes:

Now I’m going to change my build YAML file to $(Pipeline.Workspace)/s/Database/DMDatabase/schema-model then save and re-enable Continuous Integration:

et voila!

SQL Change Automation sees it as a regular state based repo and builds and deploys it with no issues whatsoever:

and just like that! SQL Source Control is replaced – our teams can now pull down the latest copy of the Repo with the Flyway Desktop project in and open it. All they will need to do is re-specify their Dev Database Connection. If you are only using SQL Source Control or you’re using SQL Source Control with the SQL Compare GUI for more manual deployments currently then you’re done! When you want to extend your pipeline, you can read below.

SQL Change Automation

This is the step where we have to fundamentally change the way the pipeline works. It’s easy to switch across from a SQL Source Control to Flyway Desktop, which means we get immediate upgrades in speed, reliability and stability in our development process, especially where we’re working with Cloud-hosted databases.

With Redgate Deploy though, we’re fundamentally leveraging the Flyway command line capability for smooth, incremental deployments, and this is always a migrations only deployment – to move across to using Flyway then we’re going to need to make a few alterations to how the pipeline works.

First-things-first: We need some migrations, more specifically: THE migration. When you create a Flyway Desktop project usually you create a Baseline script. This script is the state of your Production environment(s), or a copy of them, and is used to basically be the starting point for your incremental migration scripts in the pipeline. The Baseline, once generated, is run against an empty database referred to in Flyway Desktop as the Shadow Database, although this can of course be a Clone too. Not every developer necessarily needs this – only the ones who will be generating the deployable artifacts, the migrations themselves, and putting them into source control, but they are definitely needed for deployments.

Note: I have some clients I’m working with who want every developer to affect schema changes and then immediately generate the migration for this and share with the team, but equally I have others who want 10 or so developers to share the responsibility of schema changes, and then once they’ve reviewed at the end of a sprint, they generate the Migration for the changes, source control it and approve it.

So in Flyway Desktop we set up our erasable database, our Shadow DB:

I use an empty database I stood up quickly in the Azure Portal:

and on the Generate Migrations tab I’m now prompted to create a baseline script:

I’m going to create the Baseline from my “Prod” environment that I’ve been using for my SQL Source Control deployments and hit baseline:

When you save and finish this will now run the baseline against the Shadow DB to recreate everything – and this is going to give you a chance to detect any changes you still have outstanding in the schema model – Flyway Desktop will compare the environments and detect any outstanding Dev changes, allowing you to also produce a migration for them.

Note: If your plan is to use this process to capture any outstanding code in a V002 “Delta” script to bring all environments back into line, you absolutely can but I would advise you to make the script idempotent – if you add all the necessary IF EXISTS statements for the deployment, you should be ok and it will only create or alter the objects that have to be, in order to sync all the environments up.

First Pull any pending changes from your repo then commit and push this into your Git remote:

and it should look a little like this:

Now for second-things-second, the build. This is actually going to be a very simple step, perhaps the easiest to change. We’re already using YAML, and as you know from previous posts it’s really very easy to leverage the Flyway command line as part of your YAML pipeline, so I’m going to simply swap out the SQL Change Automation build YAML with an updated version of the Flyway YAML from that post:

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'
 
steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'
 
- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway clean -url=$(JDBC) -user=$(userName) -password=$(password)
  displayName: 'Clean build schema'
 
- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway migrate -url=$(JDBC) -user=$(userName) -password=$(password)
  displayName: 'Run flyway for build'

My password and username I shall hold back for the JDBC connection variable needs to be encapsulated in quotes, to prevent it being escaped or running partially because of the semi-colon:

jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build”

and the locations variable was my newly created migrations folder:

$(Pipeline.Workspace)/s/Database/DMDatabase/migrations

Fortunately these few changes mean that I now have a green build where I’m cleaning my Build DB and then building all of my files from there:

Deploying to Production is the only thing left. There’s a decision to be made here – because we’re just invoking the Flyway Docker Container, and we already have the YAML pipeline set up for the build we can:

  • As part of the build, zip up the migrations from the repo and publish them as an artifact, which we can then hand off to the Release portion of Azure DevOps, or indeed any other solution such as Octopus Deploy and run Flyway command line from there
  • OR we can simply expand out the YAML file – discard the “Release” pipeline and go FULL pipeline as code (which is also easier to audit changes on).

Given that we’re modernizing our deployment pipeline and introducing lean deployments of these incremental migration scripts, I’m opting for the latter, so I disable and archive my Release pipeline specifically and simply expand my YAML file with an additional step and an additional variable for the ProdJDBC instead of the Build DB:

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run -v $(locations):/flyway/sql flyway/flyway migrate -url=$(ProdJDBC) -user=$(userName) -password=$(password) -baselineOnMigrate=true -baselineVersion=001.20211210091210
  displayName: 'Deploy to Prod'

and of course in that YAML not forgetting the all important –baselineOnMigrate and –baselineVersion switches (which I’ve always been forgetting) – these are important because we’ll be marking the baseline script as deployed against our target and not actually running the baseline script – we don’t want to try to recreate all of the objects that already exist there.

This is the result:

Successful deployment to Prod, successful move to Flyway Desktop

Pre- and Post- Deployment Scripts

You might leverage pre- and post-deployment scripts in your SQL Source Control pipeline, something that has to happen each time before or after a deployment – if you want to maintain these in your new repo moving forwards you’ll need to make use of the Flyway callback functionality; take your pre-deployment scripts and turn them into a beforeMigrate callback and turn your post-deployment into an afterMigrate callback. These can sit in your migrations folder but:

  1. You may not need these now – because you have access to the migrations first deployment model, most changes can now be tailor-made to your deployment needs, such as injecting DML. statements in with your DDL scripts
  2. They will also run every time against your Shadow DB when you generate a new migration – just something to be aware of.

Final Word

It was much much easier than I thought it would be to move across, but I by no means believe that this will be as easy for everyone who needs or wants to move in the medium-long term. I am always an advocate of testing things out prior to setting them up in earnest, and would encourage you to try this workflow out for yourself first, perhaps in tandem with your SQL Source Control pipeline against a dummy Prod DB temporarily to see how comfortable your team is with the process, and to give yourself the time to ask the questions you might have.

3 simple pipelines for database development with Redgate Deploy – Part 3: CircleCI

“There is no place to reach.. only places to rest to carry on.”
Jaya Bhateja

SPOILER ALERT – This is part 3 of a 3 part series on enabling database deployments using Redgate Deploy, so if you have not read at least the Setup and Principles section of my previous post (Part 1 which you can find here, and if you’re interested Part 2 here for GitHub Actions) then I would strongly advise you do so! Thanks!

In my setup post we managed to get 3 Flyway Desktop repositories set up: 1 for each CICD system we’ll be using, and a number of Azure SQL Databases to use as “Dev“, “Build“, “PROD” etc. – I have never used CircleCI before so this will be a new experience as I try to figure it out at the same time as set up a database deployment pipeline… but just to recap the principles of what we’re trying to achieve:

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases, however when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th Database – the “Build” database which will act as our CI validation step and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Builddatabase – this step will remove every object on the database leaving it empty
  • Invoking a Flyway Migrate against the “Builddatabase – this step will build the database from scratch to validate our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

CircleCI

Ok I made my way into CircleCI and it was really easy to get up and running with (the free tier that is) and OHMYGOSH will you look at this sleek beauty:

So far so good – CircleCI seems to be even easier to understand so far than GitLab (and CONSIDERABLY easier than GitHub Actions) – I’m sure there are a lot of major differences (and GitLab was really easy to use) but I’m hoping for a similar experience here by the looks of it!

I create a new project pipeline where it asks me to select a repo for this “project”:

So I hit “Set UP Project” and then “build my own yml script” – now you would think this might just give me a blank script but no, just like GitLab they give us the option of a starter pipe:

I’m going to go ahead and choose the “Hello World” pipeline because normally that’s the easiest to cannibalize!

Much like GitLab it has an indicator to let us know whether our YAML is valid or not (I’m looking at YOU Azure DevOps!!!!) which is a massive help, and in general it’s just pretty easy to see what each step is doing. I built out an example YAML file using similar commands to my GitLab pipeline like so:

version: 2.1
parameters:
  ciJDBC:
    type: string
    default: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
  prodJDBC:
    type: string
    default: jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_CircleCI
  userName:
    type: string
    default: username
  password:
    type: string
    default: password
  migrationPath:
    type: string
    default: .\

jobs:
  clean:
      docker:
        - image: flyway/flyway:latest-alpine
      steps:
        - checkout
        - run:
            name: "Clean Build Database"
            command: "flyway clean -url=${ciJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

  build:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Migrate to Build Database"
          command: "flyway migrate -url=${ciJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

  deploy:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Deploy to Prod"
          command: "flyway migrate -url=${prodJDBC} -user=${userName} -password=${password} -locations=filesystem:${migrationPath}"

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build
      - deploy

and also used the variables reference from the CircleCI documentation which was pretty helpful. But it resulted in this:

Turns out I made a few boo-boos along the way. So variables I was passing in like this: ${Variable} but Circle only really seemed to like it when I used << pipeline.parameters.variable >> because I had defined it at the beginning of the YAML file under parameters.

I also had the jobs running in parallel because I hadn’t defined in my workflow which steps were dependent on which – a lesson I SHOULD really have remembered from GitHub… but oh well. I corrected that:

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build:
          requires: 
            - clean
      - deploy:
          requires: 
            - build

Interestingly everything was still failing and although everything was being passed through correctly, the only thing that was ACTUALLY making it to the Flyway Docker container was the first part of the JDBC connection:

Guess what? I had my quote marks in the wrong place.

destroy GIF

It’s ok though because 2 other things failed:

  1. The Prod deploy failed because it found a non-empty schema, a problem I seem to fall over EVERY SINGLE TIME, but which is easily remedied by providing the 2 switches to the Prod deployment: -baselineOnMigrate=true and -baselineVersion=[YourBaselineScriptVersion]
  2. The filepath specified wasn’t a valid path:

Yes, before anyone tells me I know my top level repo folder is still called “GitLab-Flyway“, I figured that out in the last post and I’m still face-palming. So I’m going to quickly alter the Prod Flyway migrate command and then play around with the filesystem locations first to see if I can find a value it likes…

Oh. It just needed a “.”… neat. Well here is the finished YAML that seems to work a treat:

version: 2.1
parameters:
  ciJDBC:
    type: string
    default: "jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build"
  prodJDBC:
    type: string
    default: "jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_CircleCI"
  userName:
    type: string
    default: "username"
  password:
    type: string
    default: "password"
  migrationPath:
    type: string
    default: "./GitLab-Flyway/migrations"

jobs:
  clean:
      docker:
        - image: flyway/flyway:latest-alpine
      steps:
        - checkout
        - run:
            name: "Clean Build Database"
            command: flyway clean -url="<< pipeline.parameters.ciJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>

  build:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Migrate to Build Database"
          command: flyway migrate -url="<< pipeline.parameters.ciJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >>

  deploy:
    docker:
      - image: flyway/flyway:latest-alpine
    steps:
      - checkout
      - run:
          name: "Deploy to Prod"
          command: flyway migrate -url="<< pipeline.parameters.prodJDBC >>" -user=<< pipeline.parameters.userName >> -password=<< pipeline.parameters.password >> -locations=filesystem:<< pipeline.parameters.migrationPath >> -baselineOnMigrate=true -baselineVersion=001.20211130101136

workflows:
  database-deploy-workflow:
    jobs:
      - clean
      - build:
          requires: 
            - clean
      - deploy:
          requires: 
            - build

and we have ourselves one nice, lean CircleCI build and deployment pipeline:

Baseline Script successfully marked as Deployed, and 2nd migration successfully deployed as shown by Flyway_Schema_History table on DMDatabase_Prod_CircleCI

Conclusion

Was the purpose of these three blog posts for me to build 3 perfect pipelines, with impeccable secrets handling, automated testing, code analysis and all the best practices that mean they can all be rolled out into Production deployment pipelines tomorrow with no editing?

No. No way. Far from it.

But the purpose was to prove something else – that it can be done. This is the bare bones approach to enabling your database pipelines with Redgate Deploy and the Flyway Docker container in 3 different CICD systems; GitLab, GitHub and CircleCI and what we hoped to observe was that they can all in fact be used, with Redgate Deploy, to deploy schema changes to any of the supported RDBMS’.

That is indeed what we did. Happy Migrating!

Thank you to everyone who has stuck it out through all 3 parts, trust me, I did an awful lot of learning here myself and made COUNTLESS YAML mistakes – although I don’t class myself a Level 20 Warlock-slash-CICD-Pipeline-Guru it has been thoroughly interesting and I hope you managed to use the basis for these posts as success for your own pipelines! If you do – let me know, I love to hear from anyone who reads my posts!

3 simple pipelines for database development with Redgate Deploy – Part 2: GitHub Actions

“Finding new, relevant quotes for every blog post is hard”
Chris Unwin

SPOILER ALERT – This is part 2 of a 3 part series on enabling database deployments using Redgate Deploy, so if you have not read at least the Setup section of my previous post (Part 1 which you can find here) then I would strongly advise you do so! Thanks!

Welcome to another edition of “Chris stumbles around in the dark until eventually it starts working, but don’t question it because it might stop.” – today we’re taking on GitHub actions and unlike GitLab, I’ve never used Actions before. In my previous post we managed to get 3 repositories set up: 1 for each CICD system we’ll be using, and a number of Azure SQL Databases to use as “Dev“, “Build“, “PROD” etc. just to recap the principles of what we’re trying to achieve though:

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases, however when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th Database – the “Build” database which will act as our CI validation step and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Builddatabase – this step will remove every object on the database leaving it empty
  • Invoking a Flyway Migrate against the “Builddatabase – this step will build the database from scratch to validate our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

GitHub Actions

Like I said, this is the first time I’ve tried to use Actions all by myself and I’m not 100% sure where to start – I’m familiar with the GitHub interface OBVIOUSLY but at first glance I wouldn’t say it’s immediately obvious… well not to a n00b like me anyway! There’s one little “Actions button” in the center and if you miss it you’ll spend far too long not seeing the wood for the trees!

My GitHub repo with just the Flyway Desktop files in

Well anyway after some investigation it turns out that GitHub Actions is called that for a reason, because you build Actions into your YAML pipeline. Duh. Not sure how I missed that… but it kind of makes perfect sense and on top of that its quite similar in a way to Azure DevOps or Jenkins plugins really.

Unfortunately though there was slim pickins’ on the GitHub “Marketplace” for Flyway Actions, the closest one, created by joshuaavalon only migrates and was specifically put together for PostgreSQL deployments:

joshuaavalon’s Action usage example

So I did what any self-respecting person would do… I took the difficult route, forked Joshua’s repo and made the action generic, so that you could pass in whatever command you wanted to, plus changed it to use the latest Alpine version of the Flyway docker container. Plus I bumped the version number tag and fixed the docs that was using an old version of the Action, and by extension, Flyway container.

Forked repo in GitHub for the Flyway Action

2 things on this:

  1. I would assume that there will be an official Flyway Action at some point in the future, and there’s an argument to be made that using different Flyway commands should be different Actions in themselves — so as a result, I’m not going to publish this Action to the marketplace nor am I going to raise a PR for Joshua’s repo. But it was good to test out the theory.
  2. Thanks to Matt and Andrew here at RG for helping me out with a rather annoying problem involving invoking the correct version – this was solved by them and saved me banging my head against a brick wall for MANY MANY hours.

Anyway once I had the Action itself working (and it turns out it just needs to be sat in a public repo for you to invoke it, which is pretty neat – you don’t HAVE to publish if you don’t want/need to) it was time to move on to properly testing my YAML to build and deploy the contents of GitHub_Flyway. This was what I started off with:

name: CI

on:

  push:
    branches: [ trunk ]
  pull_request:
    branches: [ trunk ]

  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          command: clean

…and everything seemed wonderful:

Successful “build” e.g. Flyway Clean in GitHub Actions
Console output of successful Flyway Clean

But then I tried to extend my YAML to also include the Build and Deploy components for the pipeline…

Build failure in GitHub Actions

I have 2 problems:

  1. It turns out it’s not looking at the correct filesystem location (e.g. “.\SQL doesn’t exist”), which is obvious, because I’m a numpty and forgot to change this to instead point at my “migrations” location instead. So I’ll fix that now.
  2. I forgot to include the Baseline on Migrate and Baseline Migration Version arguments like I did with GitLab. Flyway Desktop generates the baseline, mine being V001_20211130101136__baseline.sql so I need it to baseline (but not run) V001_20211130101136 when Flyway goes to deploy. Otherwise Flyway will, like I experienced, fail the deployment because there are existing objects in the PROD database… obviously.

The solution to #1 is SUPER easy fortunately but the solution to #2 has me thinking. If i add -baselineOnMigrate=true to the action, then it will baseline my Build database too, meaning it won’t run the baseline and my build will fail; so I will need to pass an additional argument to my YAML somehow… *sigh* back to the Action.

I also by luck happened to notice that all my actions steps were running at the same time – but thanks to this blog post from Bryan Hogan I was able to find the neat little “uses” command, so that fixed that issue….

So the final version of the Action was:

name: Flyway Pipeline Action
description: Migrate database in Github Action using latest Flyway Docker Container
branding:
  icon: database
inputs:
  url:
    description: The jdbc url to use to connect to the database
    required: true
  user:
    description: The user to use to connect to the database
    required: false
  password:
    description: The password to use to connect to the database
    required: false
  locations:
    description: Comma-separated list of locations to scan recursively for migrations
    required: true
    default: filesystem:./sql
  command:
    description: Which of the Flyway commands you would like Flyway to run
    required: true
    default: migrate
  disableclean:
    description: Would you like clean disabled (true or false)
    required: false
    default: false
  baselineonmigrate:
    description: Would you like to baseline on migrate (true or false)
    required: false
    default: false
  baselineversion:
    description: What is your baseline version
    required: false
    default: 0.0
runs:
  using: docker
  image: docker://flyway/flyway:latest-alpine
  env:
    FLYWAY_URL: ${{ inputs.url }}
    FLYWAY_USER: ${{ inputs.user }}
    FLYWAY_PASSWORD: ${{ inputs.password }}
    FLYWAY_LOCATIONS: ${{ inputs.locations }}
    FLYWAY_CLEAN_DISABLED: ${{ inputs.disableclean }}
    FLYWAY_BASELINE_ON_MIGRATE: ${{ inputs.baselineonmigrate }}
    FLYWAY_BASELINE_VERSION: ${{ inputs.baselineversion }}
  args:
    - ${{ inputs.command }} 

I just re-invented the wheel (which I know you shouldn’t do) but it was super easy – I added the environment variables for Flyway to provide the -baselineOnMigrate and -baselineVersion switches much like I had to in the last post.

I never thought I’d be so happy to see those little green dots:

This resulted from the following pipeline YAML:

name: CI

on:

  push:
    branches: [ trunk ]
  pull_request:
    branches: [ trunk ]

  workflow_dispatch:

jobs:
  clean-build-schema:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          command: clean
          
  build:
    needs: clean-build-schema
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@master
        with:
          url: jdbc:sqlserver://dmnonproduction.database.windows.net:1433;database=DMDatabase_Build
          user: username
          password: password
          locations: /GitLab-Flyway/migrations
          command: migrate
      
  deploy-to-prod:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ChrisUnwin/flyway-action@v4
        with:
          url: jdbc:sqlserver://dmproduction.database.windows.net:1433;database=DMDatabase_PROD_GitHub
          user: username
          password: password
          locations: /GitLab-Flyway/migrations
          command: migrate
          disableclean: false
          baselineonmigrate: true
          baselineversion: 001.20211130101136

Can we take a minute to appreciate the fact that I have only just now realized that I accidentally named the top level folder GitLab-Flyway across all 3 of my repos? Well, this pipeline has nothing to do with GitLab, but it works and that’s what I care about!:

DMDatabase_Prod_GitHub successfully migrated to V002

Conclusion

Getting my head around how actions work as part of the pipeline was definitely one of the hardest aspects to this challenge – however, given that yet again we can simply include the Docker container for Flyway as part of the pipeline, we can very quickly and easily get up and running with Flyway for our Redgate Deploy pipeline, you just need the right YAML for the job.

3 simple pipelines for database development with Redgate Deploy – Part 1: Setup & GitLab

Society must adapt to diamonds, diamonds don’t adapt to society.
Abhijit Naskar

The world is changed… I feel it in the water… I feel it in the earth… smell it in the air. On a totally unrelated note did you know December 2021 marks the 20th Anniversary of the Lord of the Rings films? Just in case you were looking for your reminder to go and watch those masterpieces again, this is that sign!

Seriously though – gone are the days when I would demonstrate database pipelines on 1 or two different technologies. Over the last 6 years I have walked people through database deployments using an array of CICD options; Jenkins, TeamCity, Octopus Deploy, Bamboo… and most recently I’ve spent most of my time on Azure DevOps. At times it can even feel like Azure DevOps is the only solution you’ll need, but increasingly it’s becoming obvious that isn’t the case and there are new, shiny providers who offer some amazing experiences and awesome functionality.

Now seems like the best time to explore 3 of the ones I’m coming across more and more – CircleCI, GitLab and GitHub Actions.

The interesting part of this is that I genuinely believe that this will be incredibly easy. Maybe I’m naïve but from the looks of all 3 they seem straight-forward, understandable… and of course I’ll be using Flyway in my pipeline, which is the easiest, cross-platform friendly solution to use for this.

Note: I will assume you have some familiarity with Flyway in this post, if you don’t read more about the capabilities of Redgate Deploy here.

The Setup

For this “challenge” (if I can call it that) I’m going to be using Flyway Desktop installed on my Windows laptop, GitHub as my Version Control system and 5 Azure SQL Databases: 2 for “Dev” & “Dev_Shadow” (from which I will generate 3 independent repos) and 3 environments for PROD_GitLab, PROD_CircleCI and PROD_GitHub respectively. The structure of the database will be the DMDatabase, unsurprisingly the database I use for pretty much everything I do on this blog.

Note: Everything I’m doing today uses SQL Server (well… Azure SQL Database) however everything here is cross OS – you can set up similar pipelines for everything from Oracle to PostgreSQL to CockroachDB if you would like!

5 Databases ready to go – as shown in the Azure Portal

Fortunately CTRL+C, CTRL+V exists, so I’ll only have to setup once and then I’ll just copy the files across into the other two repos; I set up a new private Repo in my GitHub specifically for GitLab but you could easily repeat these steps below separately for GitHub Actions or CircleCI:

GitLab repo in GitHub

I clone this down onto my Windows machine using Git Bash and then linked to and created my Flyway Desktop project (don’t know how? Try this!):

Link the development database and the shadow, generate the Schema Model and the Baseline Migration from DMDatabase_PROD_GitLab (I just grab the relevant JDBC connectors from the Azure Portalthis makes it much easier!), don’t forget to specify the list of Schemas, I did and it ain’t pretty (but is an easy enough fix)

Then I commit and push the schema model files and the baseline migration up into GitHub:

For good measure I also changed the DM_CUSTOMER table on the Dev environment and generated a new schema-model and migration change so I know what is going to be deployed to my “Prod” environments as part of this test:

Then after committing and pushing to my repo, I copied all of the files over to my GitHub and CircleCI repositories too:

A quick check of my other repos and everything seems good to go!

Principles

I’m setting up 3 separate pipelines in this post which will all effectively do the same thing, but for different “Prod” copies of databases, however when building and deploying in practice you will have a number of tasks you will want to accomplish in and around the process itself (such as really useful things like Unit Tests, Code Analysis etc.). To keep things simple I will be creating a 6th Database – the “Build” database which will act as our CI validation step and our process for all 3 pipelines will be:

  • Invoking a Flyway Clean against the “Builddatabase – this step will remove every object on the database leaving it empty
  • Invoking a Flyway Migrate against the “Builddatabase – this step will build the database from scratch to validate our baseline script and any further migrations build successfully
  • Invoking a further Flyway Migrate against our respective “Prod” database, to deploy the latest scripts we have generated.

GitLab

After following the Setup instructions above, in GitLab I need to create a New Project and I want it to Build/Deploy from my GitHub repo, so I pick “Run CI/CD for external repository

Fortunately it’s very easy to connect directly from GitHub, but you will have to generate a Personal Authentication Token which you can do by going to https://github.com/settings/tokens and then authorizing the main repo you want to build from – for me this is GitLab_Flyway:

Painless! From here I select CI/CD template and because I’m starting from scratch I’m going to use the starter 3 stage template:

It has a rather neat layout and is pretty darn easy to get up and running with:

I may have tried several combinations to get the Flyway Docker container up and running but essentially the code I ended up running for my pipeline was:

stages:          # List of stages for jobs, and their order of execution
  - build
  - deploy

variables:
    userName: "MyUserName"
    password: "MyPassword"
    prodJDBC: "jdbc:[TheJDBCConnectionToTheProdDBYoureUsing]"
    ciJDBC: "jdbc:[TheJDBCConnectionToTheBuildDBYoureUsing]"
    migrationPath: $CI_PROJECT_DIR

build-job:       # This job runs in the build stage, which runs first.
  image:
    name: flyway/flyway:latest-alpine
    entrypoint: [""]
  stage: build
  script:
    - flyway clean -url="$ciJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath"
    - flyway migrate -url="$ciJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath"

deploy-job:      # This job runs in the deploy stage.
  image:
    name: flyway/flyway:latest-alpine
    entrypoint: [""]
  stage: deploy  # It only runs when *both* jobs in the test stage complete successfully.
  script:
    - flyway migrate -url="$prodJDBC" -user="$userName" -password="$password" -locations="filesystem:$migrationPath" -baselineOnMigrate=true -baselineVersion="MyBaselineVersion"

It was actually quite easy to spot where I had failed in previous runs and GitLab breaks things up quite nicely for us to see:

Some interesting things I noted using this setup:

  • Because we are deploying up stream to an environment that hasn’t been deployed to with Flyway before you have to pass in the -baselineOnMigrate switch, what was interesting though is that I also needed to specify the -baselineVersion, otherwise it tried to baseline V1, which of course did not exist as my baseline was named something completely different (V001_DateTimeStamp_blahlablah)
  • The entrypoint is specified as it is because it drops you right at the Flyway executable so you can issue the Flyway commands – without this it doesn’t work
  • You can ABSOLUTELY pass your variables in (like *cough* password and username) in a much more secure way through GitHub using variables, but this was a great start for me
  • To pass in the files using a hosted repo, I had to use the environment variable $CI_PROJECT_DIR and that’s where the repo is checked out to, where your migrations are

But it ultimately ended up in what I was expecting – the database was migrated using the Flyway command steps:

Conclusion

Is it possible to setup a nice easy pipeline from Dev -> Prod with Redgate Deploy and GitLab? Yes, absolutely it is, and you can build out the pipeline in whatever fashion you want. Thankfully, the Docker container makes things much much easier!

Now, let’s see how we get on with GitHub Actions!

When the unused become used: Using SQL Data Catalog to spot the pitfalls of a data classification process

“The surprise is that you continue to be surprised.”
Jill A. Davis

But what surprise am I alluding to? Of course it is the surprise that “all of a sudden” you have sensitive data in tables where there ought not be!

What I mean is; we have a fantastic process in place: we have completely classified our entire environments, developers move changes to classification up-stream just like schema changes and we have an automatic process to check and update classifications as part of our pipeline.

But as part of this process we “descoped” everything that was non-sensitive, including (crucially) empty tables. Now empty tables are a common occurrence and most people I speak to have a bunch of them, for a multitude of reasons; they’re using an ERP/CRM and aren’t using some of the modules, some of them are older and are gradually being phased out etc.

What happens over time however, is that some tables which we originally weren’t using suddenly start to populate with data, and others which were in-scope now suddenly aren’t. But because this isn’t technically a schema change and we can’t necessarily rely on the developers to know when this change will happen – how do we catch it?

The easiest thing to do in this instance is an automated check – write once, run many times. As we’ve seen from the many, many blog posts before this the answer is pretty straight forward – SQL Data Catalog’s PowerShell module once again saves the day!

(You can find the full documentation on the PowerShell module and a cmdlet reference here in the documentation.)

The Process

The most obvious thing to do is to re-scan our DB so Catalog is up to date and then fetch all of our columns first into a collection:

# Connect to SQL Data Catalog
Connect-SqlDataCatalog -AuthToken $authToken -ServerUrl $serverUrl

# Quickly refresh the scan of the Instance to get accurate row counts
Start-ClassificationDatabaseScan -FullyQualifiedInstanceName $instanceName -DatabaseName $databaseName | Wait-SqlDataCatalogOperation

# Get all columns into a collection 
$allColumns = Get-ClassificationColumn -instanceName $instanceName -databaseName $databaseName

Each of the columns returned by Data Catalog here has a number of properties that we can use:

The properties of each column object returned by the SQL Data Catalog PowerShell

All of these properties can be useful for something (and have been in the past) – in this case though the properties we care about are the id, the tags collection (name property) and the tableRowCount. The reason being that id uniquely identifies our columns so can be used to easily sort and filter on without having to match on text, the tags collection means we can search for any columns that do or do not have the Out of Scope – Unused tag and the tableRowCount lets us know if the table is or is not populated with data.

Its then fairly easy to filter these down:

# Collection for columns that are on empty tables

$emptyTableColumns = $allColumns | Where-Object { $_.tableRowCount -eq 0 }

# Collection for columns that are tagged as empty

$ColumnstaggedAsEmpty = $allColumns | Where-Object { $_.tags.name -eq $unusedTag }

# Collections for columns both tagged as empty but not, and empty columns not tagged as such

$ColumnsNowInUse = $ColumnstaggedAsEmpty | Where-Object { $_.id -notin $emptyTableColumns.id}

$ColumnsMissingClassification = $emptyTableColumns | Where-Object { $_.id -notin $ColumnstaggedAsEmpty.id}

You can easily visualize these in Data Catalog itself too:

Columns that are on empty tables, regardless of tag
Columns that are tagged as Out of Scope Unused regardless of if they have data or not

We can already see a difference in the lists – if we run the entire PowerShell however (which you can find right here in my GitHub, and I’ll include it at the end) then we actually find those differences out:

The PowerShell result: 2 classified as empty columns now with data, 3 columns not classified as out of scope, now on empty tables

So we can correct these by classifying the 3 now empty columns as now Out Of Scope – Unused, and we can change the 2 now-populated columns to be In Scope:

Suppliers is now in use, and corrected to be In-Scope
The missing Out Of Scope – Unused tags are corrected on the empty columns

…and now when we run the PowerShell again we should find everything is rectified:

Everything has been corrected, now no columns are incorrectly labelled or missing labels

Ongoing

Now we have the script we can run to identify these issues, you could implement it on the back-end of the DevOps process described by the blog post or video linked at the start of this blog post – and this would allow you to constantly be sure that your view of the Production database is accurate and up to date.

The key to a good classification process as part of on-going governance and an up-to-date view of your structured data estate is minimizing the manual effort that has to go into maintaining this process, which with the help from the Data Catalog PowerShell module – we can do easily (if we wanted a process that would go out of date after 2 weeks, we could have just used MS Excel…)

Happy Classifying!

SQL Data Catalog, Data Masker and your DevOps pipeline: How do I make sure everything is being masked?

“However fast regulation moves, technology moves faster. Especially as far as data is concerned.”
Elizabeth Denham

You’re probably sick of me constantly talking about how the cataloging of columns should be part of the DevOps upstream deployment process. I’ve blogged about it. I’ve even produced a video demonstrating this in action. But one question that this always throws up is:

If we include cataloging in the upstream process, how do we make sure our masking sets are also staying up to date?

The benefits of including the classifications in the upstream pipeline is that nothing ever gets to Production that hasn’t been classified – we constantly have a perfectly up to date idea of the nature of our structured data estate, how data is distributed, what risk is associated with which systems? etc. but one of the biggest wins is that we constantly know which fields need to be masked when we’re pulling copies back into non-Production, constantly.

Add a table? Add columns? We know about them, they’re classified, they’re deployed… so now they need to be masked on our next refresh. But how? Well it all depends on which approach we want to take:

  • Automated
  • Manual

Doesn’t it always boil down to those options? What I mean is that either we have an appetite to completely generate our masking set afresh every single time based on our classifications (Automated) or do we want to ensure that we configure each rule ourselves (Manual). They both have benefits and drawbacks.

Automated

How: We can generate a masking set using the SQL Data Catalog PowerShell each and every time as part of our pipeline. Add a column, tag it in the pipeline and then simply wait for the necessary rulesets to be generated in the pipeline (perhaps as part of your build) for you to run when you bring a copy back. Check out my walkthrough for how to set this up.

Benefits: The process is automatic. Its headless and you don’t need to think about it at all. As long as the classifications are provided (and if you follow the steps from the blog post and video you should be providing them) then you’re always generating rules for every classified column.

Drawbacks: This process can be fragile. If we don’t classify correctly we can end up masking in the wrong way or trying to mask the wrong field (e.g. a Primary Key, Constraint, Identity etc.) which can cause masking failures and then you have to spend time fixing the pipeline/masking set. This also means that the nature of the masking is dependent on your classifications, and the values you will get masked into the columns will be less realistic as a result (i.e. you can’t generate Row-Internal Sync Rules using the integration).

Drawback Mitigation: To avoid the process breaking, be sure to really focus on how you set up your API settings / how you pick which categories and tags are used to generate the masking rules (like I discussed here). This will at least help you make sure you map common data sets into columns (and don’t hit columns that have constraints or keys).

Only columns marked with Static Masking as the Treatment Intent will get a rule created for them
An Information Type is given to every column we intend on masking – these are then mapped to templates in masker to ensure more realistic data

Manual

How: Either rely on developers to check in masking set changes along side their code changes or build in a manual intervention step to your upstream process to ensure that someone opens and updates the schema and rules within the masking set, and then put this back into the pipeline.

Benefits: This results in more accurate, more likely to run sets that generate significantly more reliable and realistic data as an end result. Masked DB copies can be more easily used for anything beyond simple development changes, including analytics etc. You can include any specialist rules you need and apply your own understanding and knowledge of the database.

Drawbacks: This is obviously a manual process – less than ideal. Anything that involves a human can invariably go wrong because we are humans and we make mistakes (which we learn from of course). This also takes significantly more time as part of the process.

Drawback Mitigation: This is a harder one to mitigate as we’re reliant on manual intervention, however the best way to check this could be with another team member checking (maybe as part of a pull request) or you could include an automated PowerShell script to effectively rationalize the columns to be masked from Catalog vs the columns currently in the Data Masker masking set – this would help you understand if all the necessary columns have been updated or if any were missed. I have included an example of this PowerShell below.

### CHANGE THIS ###

$MaskingSet = "yourmaskingset.DMSMaskSet" # Your masking set including the DMSMaskSet file extension
$instance = "yourinstance" # The Instance as it is shown in Data Catalog that hosts the database
$DatabaseName = "yourdatabase" # The DB you want classification info for
$CatalogServer="http://yourmachine:15156" # The lcoation of your catalog server, ending on :15156
$authToken="redacted" # Your Data Catalog Auth token from the Settings page
$tagName = "Static Masking" # The tag you're using to identify which columns need to be masked

### DONT CHANGE THIS ###

Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken 
$ColumnsMarkedForMasking = Get-ClassificationColumn `
    -InstanceName $instance `
    -DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq $tagName} 
$MaskingSetXML = [xml](Get-Content -Path $MaskingSet)
$subrules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleSubstitution')
$internalrules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleRowInternal')
$shufflerules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleShuffle')
$searchreplacerules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleSearchReplace')
$TablesAndColumns = @()

$subrules | ForEach-Object {`
    $CurrentTable = $_.TargetTableName.value
    $_.DMSPickedColumnAndDataSetCollection.DMSPickedColumnAndDataSet.N2KSQLServerEntity_PickedColumn.ColumnName.value | ForEach-Object {$TablesAndColumns+= $CurrentTable + "." + $_ }
}

$internalrules | ForEach-Object {`
    $TablesAndColumns+= $_.TargetTableName.value + "." + $_.TargetColumnName.value
}

$shufflerules | ForEach-Object {`
    $CurrentTable = $_.TargetTableName.value
    $_.DMSPickedColumnCollection.DMSPickedColumn.N2KSQLServerEntity_PickedColumn.ColumnName.value | ForEach-Object {$TablesAndColumns+= $CurrentTable + "." + $_ }
}

$searchreplacerules | ForEach-Object {`
    $TablesAndColumns+= $_.TargetTableName.value + "." + $_.TargetColumnName.value
}

$result = $TablesAndColumns | Sort -Unique
$ColumnsNeedingRules = $ColumnsMarkedForMasking | Where-Object {($_.tableName + "." + $_.columnName) -notin $result}

"`nThere are " + $ColumnsMarkedForMasking.count + " columns that require masking for database " + $DatabaseName + "in SQL Data Catalog."
"You are masking " + $result.count + " distinct columns in masking set: " + $MaskingSet
"`nThe columns that do not currently have a mask configured are:`n"

$ColumnsNeedingRules | ForEach-Object {$_.tableName + "." + $_.columnName + "     (" + $_.dataType + ")"}

$next = Read-Host -Prompt "`nWould you like to see the columns currently in your masking set? (Y/N)"
if ($next -in ("Y", "y")) {$result}

This code can also be found on my GitHub here.

The output of running the script – 13 columns in the masking set, 14 columns outstanding to be masked

You can technically even use this same check approach for the automated masking set generation option, to ensure that everything has been tagged correctly.

PASS Data Community Summit 2021

“Education is the kindling of a flame, not the filling of a vessel.”
Socrates

I will be speaking at PASS Data Community Summit 2021

I have spoken at previous PASS Summits; both through the virtue of working for Redgate, and off my own back through dedication and passion to the subject matter I speak about: Data Privacy and Protection.

In 2018 I stood on stage with Microsoft to speak about the nature of Static Data Masking, how it differs from Dynamic Masking and what challenges need to be considered for a successful static masking rollout.

In 2019 I stood on stage alone to talk about creating a strategy for masking non-Production environments, including a walkthrough of the dbatools.io masking functionality utilized alongside Azure SQL Database classifications. PASS Summit 2019 was also when Kendra Little encouraged me to set up this blog, for which I’m forever grateful.

In 2020… well. You know what happened.

In 2021 Summit sees a new lease of life. Data Community Summit will be entirely online (no surprises there) but one big surprise you might not know is that it is completely free to attend. Never before will there have been SUCH a swathe of incredible speakers, with such a huge variety of topics and learning pathways for free and available on demand afterwards.

The dates for your diary? November 8-12, 2021

As it happens, I will also be speaking about setting up an end to end deployment pipeline using the Flyway Community Edition, Azure SQL Database and Azure DevOps it would be great to see you but with so much on offer I could absolutely understand if you watched on catch up!

You can see all the speakers here, but here’s a short list of some oft he sessions I will definitely be tuning in to!

  • Erin Stellato – Demystifying Statistics in SQL Server
  • Grant Fritchey – Identify Poorly Performing Queries – Three Tools You Already Own
  • Tracy Boggiano – Azure SQL Fundamentals
  • Angela Tidwell – Azure Devops Dashboards EZ as pie-charts!
  • Indira Bandari – Getting started with Python for Data professionals
  • Jess Pomfret – Azure SQL & Our Toolbox To Manage It
  • Taiob Ali – Think like the Cardinality Estimator
  • Neil Hambly – Azure Notebooks – Data Science fundamentals
  • and many more!

So please go check it out & register, support the community and do a bunch of learning in the process – it will be amazing to see you there and hopefully I’ll even get to see some of you in person in the not-so-far future!

Automating best practice checks at build time using the SQL Code Analysis cmdline (and failing the build)

Quality is not an act, it is a habit.”
Aristotle

I’ve always been thoroughly impressed with the static analysis code report that you can get from SQL Change Automation when it creates a Release Artifact prior to deploying upstream, and the fact we can use SQL Prompt to carry out on-the-fly static analysis as we write our T-SQL code in SSMS, but it has always struck me as odd that there doesn’t appear to be a way to include these checks at build time.

This seems like the perfect opportunity to build the DB from scratch (check), run unit tests (check) and check no poor coding practices have been checked in onto our branch (uh… not check?)

Enter SQL Code Analysis!

Don’t know what I’m talking about? I got you: https://documentation.red-gate.com/scg/sql-code-analysis-documentation/code-analysis-for-sql-server-command-line but to be fair I didn’t know this was a thing myself until yesterday!

One of my esteemed colleagues asked this question yesterday an being me I just couldn’t wait to have a go – is there a way to include this cmdline as part of an Azure DevOps build and FAIL the build if issues are found?

First things’ first: How does it work?

I downloaded the command line and it was pretty simple to get my head around. you can use windows or sql auth, you can point it at a scripts folder or a live DB and you can output the results to xml, html or the console if you’d like. Simple.

SqlCodeGuard.Cmd.exe /s:localhost\TOOLS /d:DMDatabase2019 /out:helloworld.html
SQL Code Analysis Console Output: 4 issues with the DMDatabase2019 found
Code Analysis HTML Output: 4 issues found with the DMDatabase2019, but much easier to read this time

Next: Wrap it in some PowerShell

I am neither a windows command line nor a PowerShell guru, but my first instinct when I’m going to include something like this in a build or deployment is to use PowerShell. It’s easy and mostly non-confusing to pass variables through the pipeline to PowerShell, easy to customize scripts and include if/else logic and to capture exit codes.

So I wrapped the cmdline call in some PowerShell (take it, it’s all yours!) that made it easier to:

  • Import the XML output generated by Code Analysis
  • Count the number of issues generated
  • Exit with error code 1 (failure) if any issues were found
  • Exit clean with 0 if no issues are found
#Set Path for Code Guard, server/instance, database and output location for XML
$codeGuardPath = "C:\Users\chris.unwin\Downloads\SCG-2019-10-17-11-40-22-46"
$server = "REDACTED"
$database = "REDACTED"
$outLocation = "$codeGuardPath\myoutput.xml"
#$user = "REDACTED"
#$password = "REDACTED"

#Invoke SQL Code Guard against the DB (could be the Build Database)
& "$codeGuardPath\SqlCodeGuard.Cmd.exe" /s:$server /d:$database /out:$outLocation #/u:$user /p:$password

#Import output xml file and count contents
$blah = [xml](Get-Content -Path $outLocation)
$files = $blah.SelectNodes('//file') #Objects with issues
$issues = $blah.SelectNodes('//file/issue') #Total issues themselves

#If number of issues > zero, exit with non-zero exit code and output list of affected objects
if ( $issues.count -gt 0 ) {

    "You have: " + $files.count + " objects, containing a total of: " + $issues.count + " issues."
    $files.fullname
    "Please review the xml output for more information."

    exit 1

}

#Else continue with no issues
else {

    "No code issues discovered."

}

This works like a charm:

Static Analysis Output in PowerShell: 2 objects with 30 issues

Finally: Put it in a pipeline

Unsurprisingly, putting it in a pipeline was the easiest part. I took a pipeline I had that was already running a local Azure DevOps agent in my default pool, made sure SQL Code Analysis was present in the correct directory on that machine and voilà! The build fails if it finds any issues.

N.B. I just stuck the raw PowerShell in the pipeline, you would be better off passing connection and location variables to the PowerShell script using custom Azure DevOps environment (and secret) variables. Oh, and having a better install directory for Code Analysis than Downloads, my bad…

Example YAML containing the PowerShell step
Code Analysis finds issues, so the PowerShell exists with Code 1, causing the build to fail

Time to choose.

There are a couple of things I’ve assumed here – I’m running it locally on a server and running against a database, and that database could be one that I’ve just built during my CI pipeline, absolutely – but you could also run this against a scripts folder / set of scripts, so even if you don’t yet have a full build / deploy process, or you have a different process that works for you – you can still include SQL Code Analysis with fairly minimal overhead! Enjoy!

Refreshing SQL Server Development workflows with Redgate SQL Provision

“If you quit on the process, you are quitting on the result.
Idowu Koyenikan

SQL Provision is really cool. But you knew that didn’t you? It’s obvious – we get teeny-tiny clones, based on an image with completely sanitized data we can use for just about anything in dev and test, and if we break them? Boom! There’s a new one.

I’m not just talking about refreshing Dev & Test environments though, oh no! I’m talking:

  • Clones as baseline with SQL Change Automation – baseline scripts for projects are a thing of the past, goodbye invalid object headaches!
  • Clones every single time you switch a branch – keeping everything separate and not cross-pollenating database work between branches
  • Clones to check Pull Requests instead of relying solely on the code itself in Version Control

Watch my session on all 3 of these from Redgate Streamed back in August: https://www.red-gate.com/hub/events/redgate-events/redgate-streamed/redgate-streamed-global-august-26

But one question always comes up about clones in any workflow and that is – how often should I refresh Images and Clones?

This question obviously depends a lot on the process but in reality I think the question should be less about clones and more about the images themselves. Clones are transient and can be flipped at a moments notice, but the image, or the “clone tax” as Steve Jones calls it, is the thing that takes time, resource and space.

I’m going to take my own go at answering this question as I would in any customer meeting or architecture session – but if you want some excellent detailed advice and examples, check out this awesome documentation page here: https://documentation.red-gate.com/clone/how-sql-clone-improves-database-devops/self-service-disposable-databases-for-development-and-testing

Q: So, how often should we refresh it?

A: It depends on your use of the Clone – how often do you need up to date data?

As a rule of thumb though, I tend to see the following behaviours:

  • Customer Support – overnight during the working week: Where you have data that needs people to troubleshoot customer issues, it always helps to have data as close to now as possible to help resolve issues. You want an image on standby ready so that at any second a member of support can pull down a copy to look through (if it NEEDS to have sensitive data for this purpose, you can restrict who can create clones from these images by using SQL Clone’s Teams functionality)
  • BI / MIS and Report testing – once a week (if not more often): Business Intelligence and reporting workflows can just mean that you’re reading a lot from your clones in which case they should stay small and you should be able to move seamlessly between clones. But. If your ETL process puts a very heavy load on your clones (like truncating and re-populating tables) you may cause bloat and need to rethink your refresh frequency to be more often where possible, perhaps overnight so that any transformations are captured in the new images, and clones by extension.
  • “BAU” Development (Schema and Static Data Changes) – Every 1 or 2 weeks: If you’re not affecting a large number of changes to your clone, or they are limited to schema and static data only then you should be absolutely fine with a wider refresh cadence – keeping the clones around for the whole sprint or only refreshing once during the sprint can mean everyone more easily stays up to date with the same environment consistently.
  • Ad-Hoc and Test workflows – once per month: There are going to be times where you occasionally need a copy of the live DB, but the fact it is 99% similar in terms of schema, and the data is a few weeks out of date isn’t a big deal. You can pull one down from this “cold copy” for any kind of test, destructive or even to validate certain behaviors / sense check if an update or query will work. It’s also handy to maintain a slightly older copy where possible if you need to start digging into failed updates made in development, so need to have a milestone to compare from.

Again – these workflows may vary and you may have needs to be more or less frequent based on differences being recorded, bloat, space available on the fileshare etc. but generally I find customers are pretty happy with this.

Q: Once we have our refresh rate in place – how do we move developers across?

This is a great question I get a lot of the time, and it stems from the fact a developer may have made a few dozen changes to a clone, and then the frequent refresh rate blows their clones away (and they forgot to commit to version control – D’oh!) – so it’s important to bear in mind that development work, and as a result the cloning of environments is not “cut and dried“. We should give developers a chance to move across as-and-when they’re ready, so I often end up recommending the below workflow, to ease this process.

For the sake of this proposed workflow I’m assuming a couple of things:

  1. The selected workflow is BAU Development and we want to refresh once per week
  2. We have enough space available on our fileshare to allow for 2 (or more) distinct copies of the primary image
  3. Clones are being delivered to jump boxes / VMs within the network that are always connected (and not developer machines), and we can control when they are deleted
  4. We operate on a standard western work schedule where the week begins on Sunday, Saturday and Sunday are considered non-working days and developers typically work anywhere between 8am and 6pm
  5. This can all be automated using SQL Clone’s PowerShell module

Week 1 – Sunday night

  • We create Image A of Primary Database from most recent backup file onto fileshare, applying data masking

Week 1 – Monday to Friday

  • Developers X, Y and Z create their own clones of Database A as they begin the working week
  • The clones are linked to a Git repo where, using SQL Change Automation, the developers commit all changes they make to their clones throughout the week
  • Developer X finishes with their changes, makes their final commit and push on Thursday and works on a different task on Friday

Week 2 – Sunday night

  • We create Image B of Database A – with slightly more up to date (and sanitized) data and capturing any deployed changes the team committed and pushed to git previously
  • We retain Image A for now but do a check for which developers have clones remaining (Developers Y and Z) and either nudge them in the team stand up that they only have a few days left, or automate the sending of an email to those developers warning them their clones are now 1 week old

Week 2 – Monday morning

  • Developer X creates their new clone from Image B and links it to Git ready to start making changes

Week 2 – Tuesday to Friday

  • Gradually over the course of the week as Developers Y and Z finish with their tasks and commit their changes they remove their clones and create new ones from Image B
  • A final reminder, as an email or a notification in MS Teams / Slack goes out on Friday morning that any clones of Image A will be deleted over the weekend

Week 3 – Sunday night

  • Image A with no clones remaining is deleted (or any remaining clones are deleted first) and Image C is created to begin the cycle again

Conclusion

Although this workflow requires the duplication of the central image, it has a number of benefits:

  • It is easily automated using PowerShell
  • The source control process suffers minimal disruption and developers don’t need to rush to finish anything
  • We don’t accidentally destroy developer work – the onus is on the developer to ensure work is committed
  • If, for any reason the image creation process fails, you still have a persisting image, so you don’t prevent developers from doing any work / waiting for the image process to manually complete
  • Moving to newer clones is a more organic process
  • If you wanted to maintain an image throughout the week and refresh a second image overnight for more up to date data, you can simply re-purpose the above principles. This could then be used for a number of the different teams and workflows simultaneously

Bonus Point – Naming Conventions

Many people choose to append the images they create with a date stamp like Image_A_16102020 so we know when it was taken and what the latest is. This is good practice but be warned if you’re using Clones as baseline or for branch switching etc. you will need to have a persistent name else that link will break. An alternative is always having the same name for the most current image and then simply renaming the older image with the date time stamp e.g. Image_A is current, but before creation of a new Image_A, it is renamed to Image_A_16102020 – this will not disrupt the clones that already exist on it, and it allows you to always know which one is most recent.

Delays are not DevOps, delaying DevOps is worse: Why we need better working practices now more than ever.

“The time is always right to do what is right.”
– Martin Luther King Jr.

Over the past few months, we have been on lock-down. The product of a devastating and deadly disease that has well and truly stamped it’s legacy on human history forever. But it is out of these times that we receive a glimpse, a look into what is possible, and what humanity can do. It is out of this fight, out of these ever decreasing odds that we finally see what a combined effort can do, and what a focus on our fellow people can bring about. Don’t know what I mean, take a look at the Good News Network and subscribe (just like me) to see the best of us.

But it only works, we only triumph, when we work together.

Recent Example: Scientists at Oxford University have seen a tremendous breakthrough with their virus efforts and are making unprecedented strides towards a viable vaccine – but it involves an incredibly strong partnership with pharmaceutical companies and governments worldwide for staged testing, large scale results gathering and continuous improvement.

So. Why then in the last few months have I been speaking to people who say things like:

  • We have delayed our non-BAU process implementation, because we have seen a spike in usage, and we need all hands on deck.
  • We have been investigating tooling and processes to help our development teams, but this is on hold as we come to terms with this newer way of working.
  • Our teams are keen to adopt more agile ways of working, but they are overburdened at the moment, so we have decided to postpone any research into this for the next few months.

It is feasible, amid a global panic, that people and companies will do (and definitely have done) what comes naturally to us; that is to “bunker down“. We believe that if we shift all efforts from projects and ongoing testing / new processes, we can have all hands available to deal with anything that comes our way. Processes are established for a reason, right? Legacy methods of dealing with ad-hoc changes and semi-frequent deployments, waterfall-esk development cycles and decade(s) old systems represent the familiar, the safe… Supposedly.

Now so, more than ever, it is time to actually change direction and to put more effort into some of the key principles and processes that will lead us to DevOps nirvana; it is this trinity of people, processes and tooling that can ultimately be the salvation for many global companies as they try to maintain their agility and competitiveness within an uncertain, shifting post-pandemic international market. There are many reasons why I, and many others, believe this but I have detailed 3 key reasons below:

1 – Delaying DevOps creates waste and costs businesses money

DevOps is a culmination of learning, experience and effort and it cannot be classified as one single thing, however it is possible to define a number of things that DevOps is and what it most certainly is not. One of the things that DevOps is, is “the constant delivery of value to end users“; the idea that by adopting certain technical measures and working practices we can minimize the time to delivery of new features and functions, which equates to greater value for us, our end users and significantly more agility to shift in different directions as required.

These ideas of flow and value streams are covered quite nicely by Lean IT, which extended from Lean Manufacturing principles and it is nicely explained here (and I would highly recommend you read The Phoenix Project if you haven’t already) – but the purpose of it (in a similar vein to agile principles) in this sense, is that it defines a number of things that don’t add value to the resulting product or service. These ‘things’ are referred to as waste and this is precisely what we should be looking to remove from our existing, legacy processes because why would we work on anything that doesn’t deliver any value?

There are a few different types of waste but I want to highlight three important ones that often exist as a result of legacy processes still being in place:

  • Defects Includes lack of testing (poor execution) and hot fixing environments (unauthorized changes)
  • Waiting Including everything from waiting for refreshed environments to waiting for feedback/results and even manual processes like deployment approvals
  • Motion (excess)Effectively doing the same thing over and over again, fire-fighting problems that arise on a near daily basis, engaging and monopolizing resources constantly who could otherwise be working on other, more important or value-add tickets (for those of you who HAVE read the Phoenix Project, see Brent as an example!)

The product of this waste is very simple and it fits into 3 main buckets: poor customer experience, increased costs and lost productivity. All of these things boil down to one fundamental truth – bad practices cost us money and reputation, transformation now could help us prevent this, and people will remember us for stepping up when we needed to.

2 – The workforce is increasingly distributed/remote and needs to collaborate better

At the beginning of 2020, one of the biggest questions faced by companies all around the globe was “how do I find and retain talent?”; this is not a new question and had already been around for years. Companies restricted to their offices (base OR satellite) realized they were increasingly fighting for candidates in one of the most competitive markets served by an ever dwindling local pool of options.

This was a situation which necessitated companies to stretch beyond their existing capabilities to enable a better quality experience for remote and distance workers and/or teams – a situation which would later be exacerbated 100x-fold by the global pandemic crisis. Not only has this crisis confirmed that most companies, certainly those feeding software markets, can work remotely, it has also posed the question of if we should work remotely more often, and has fundamentally changed the way we as a workforce will continue to work in the future.

One thing is for sure, whilst we will try to “return to normal” as much as possible, normal has been forever changed and remote working and collaboration is here to stay. That’s the important word in play here, collaboration.

The spirit of DevOps as I’ve mentioned before is good quality communication, collaboration and accountability. But at the heart of those three ideas is visibility. In an office we can over-hear, we can drop-in or bring things up “over the water cooler”/”at the coffee machine”. In a remote working context, that isn’t possible. So we have to adhere to 2 of the most important practices in modern day software development: transfer knowledge and record decisions.

When we work and communicate better in a remote/distributed workforce, and use tools and processes available to us, people don’t make unauthorized changes, or make decisions that affect you that you weren’t aware of until 3 weeks after they were made. It becomes easier to make decisions and generate better work faster, rather than being paralyzed by indecision and uncertainty as to whether you hold the latest version of the truth or if it is outdated. When we adopt the right processes and tools into our DevOps methodology, we know for certain what we should be doing, and why we should be doing it.

Process-wise this can easily take the form of common functions, many already at the disposal of teams when remote; stand-ups, retrospectives, mob- and pair- programming, OKR and sprint planning, there are lots of different ways for us to work well and know what we’re supposed to be working on at all times (and why). Tooling-wise we can then match these how we will be doing something with the respective record of what is being done, what decisions have been made and crucially, why. Using work management software like Trello, bug/feature tracking software like JIRA or Azure DevOps work items, source controlling everything (even having a strong branching and merging strategy to control workflow) with rigorous testing routines, policies and pull requests and automation all lead to better informed, happier*, well-performing developers with a crucial sense of purpose.

*Important side note: It is also crucial that we don’t simply lose ourselves in the business benefits completely – developers, testers etc. are all human and we all crave job satisfaction and happiness in our roles. Yes you might be able to increase your deployment frequency, minimize costs or complaints, but nothing compares to a satisfied, motivated team, which DevOps can help breed and inspire.

3 – DevOps breeds innovation and improves company performance, with a tangible return on investment (and not just financial!)

Automation is one of the single greatest ways we can modernize our processes, and is often the first principle we think of when adopting DevOps practices; taking something that is manual or held together by legacy scripts that forever fail and cause outages, and instead continuously integrating, continuously testing and continuously improving using the latest processes and tooling available to us. Automation allows us to create high cycle rates, enhance and multiply the feedback options we have within our pipeline(s) and allows us to reduce manual concerns and issues, to dedicate teams to the very thing they were employed to do: innovate. We only need watch how Netflix does DevOps to realize what we can unlock.

On a weekly basis I discuss existing processes with developers all over the world and one key trend always emerges that we need to focus on: they have a process that is currently manual, and it needs improving. I have lost track of the number of times I’ve been told about a process where developers generate scripts, manually test themselves and then “do x” with it, whether that be just deploying to Production themselves (without review), or putting it in an ever mounting pile of scripts in a folder on a file share for someone to sift through periodically.

Across every single one of the conversations mentioned above that I have, there is not a single discussion that doesn’t include some kind of quantifiable cost to the business, whether that be downtime, customer refunds, regulatory penalties and even in extreme cases, high developer churn. All can still be expressed in terms of 2 things: Time and Money.

If we remove those roadblocks for our developers, if we give them tools to enable them to more easily do their jobs, we put in place processes that allow them to more easily deliver that innovation, and tight automated controls to remove error-prone, manual jobs – we end up with something more akin to harmony. By this of course I mean the “constant delivery of value to end users“. This creates a more positive user experience, allows us to respond more quickly in an uncertain market and make decisions on what we should or, just as importantly, shouldn’t do, faster.

The world at the moment is a very uncertain place and has destroyed jobs, companies and whole industries. We should expect that consumer confidence is at an all time low and as we all come out of lock-down across the globe, we should be prepared to metaphorically “put our best foot forward” to help our developers believe in what we’re doing, stimulate faith in our industries and ultimately deliver more value bidirectionally.

But this idea only works if we do this now.

Conclusion

DevOps is more than just “picking up some new tools” or “rolling out agile” to development teams. It is fundamentally a mindset change that can drastically and fundamentally alter the underlying motivations and thinking within an organisation, allowing you to focus on the most important thing – delivering value, faster.

There are always times where delaying large scale roll-outs is a pertinent decision to make, and a hard one at that! But DevOps seeks to unify every part of of the development cycle; giving you greater visibility, communication, accountability and control, with maximum flexibility to test, validate and even pivot where needed.

But the time to do DevOps is not “once everything improves“, “once we get back to normal” or “when we have more time“… it is now. Now is the opportunity we all have to capitalize on the wave of change we have been consistently waiting to implement, to strengthen our position and future growth in our markets. DevOps is how we can come out on the other side of this disaster ready to embrace new technologies and ways of thinking, to respond to our customers needs, and deliver value and speed at scale.