When the unused become used: Using SQL Data Catalog to spot the pitfalls of a data classification process

“The surprise is that you continue to be surprised.”
Jill A. Davis

But what surprise am I alluding to? Of course it is the surprise that “all of a sudden” you have sensitive data in tables where there ought not be!

What I mean is; we have a fantastic process in place: we have completely classified our entire environments, developers move changes to classification up-stream just like schema changes and we have an automatic process to check and update classifications as part of our pipeline.

But as part of this process we “descoped” everything that was non-sensitive, including (crucially) empty tables. Now empty tables are a common occurrence and most people I speak to have a bunch of them, for a multitude of reasons; they’re using an ERP/CRM and aren’t using some of the modules, some of them are older and are gradually being phased out etc.

What happens over time however, is that some tables which we originally weren’t using suddenly start to populate with data, and others which were in-scope now suddenly aren’t. But because this isn’t technically a schema change and we can’t necessarily rely on the developers to know when this change will happen – how do we catch it?

The easiest thing to do in this instance is an automated check – write once, run many times. As we’ve seen from the many, many blog posts before this the answer is pretty straight forward – SQL Data Catalog’s PowerShell module once again saves the day!

(You can find the full documentation on the PowerShell module and a cmdlet reference here in the documentation.)

The Process

The most obvious thing to do is to re-scan our DB so Catalog is up to date and then fetch all of our columns first into a collection:

# Connect to SQL Data Catalog
Connect-SqlDataCatalog -AuthToken $authToken -ServerUrl $serverUrl

# Quickly refresh the scan of the Instance to get accurate row counts
Start-ClassificationDatabaseScan -FullyQualifiedInstanceName $instanceName -DatabaseName $databaseName | Wait-SqlDataCatalogOperation

# Get all columns into a collection 
$allColumns = Get-ClassificationColumn -instanceName $instanceName -databaseName $databaseName

Each of the columns returned by Data Catalog here has a number of properties that we can use:

The properties of each column object returned by the SQL Data Catalog PowerShell

All of these properties can be useful for something (and have been in the past) – in this case though the properties we care about are the id, the tags collection (name property) and the tableRowCount. The reason being that id uniquely identifies our columns so can be used to easily sort and filter on without having to match on text, the tags collection means we can search for any columns that do or do not have the Out of Scope – Unused tag and the tableRowCount lets us know if the table is or is not populated with data.

Its then fairly easy to filter these down:

# Collection for columns that are on empty tables

$emptyTableColumns = $allColumns | Where-Object { $_.tableRowCount -eq 0 }

# Collection for columns that are tagged as empty

$ColumnstaggedAsEmpty = $allColumns | Where-Object { $_.tags.name -eq $unusedTag }

# Collections for columns both tagged as empty but not, and empty columns not tagged as such

$ColumnsNowInUse = $ColumnstaggedAsEmpty | Where-Object { $_.id -notin $emptyTableColumns.id}

$ColumnsMissingClassification = $emptyTableColumns | Where-Object { $_.id -notin $ColumnstaggedAsEmpty.id}

You can easily visualize these in Data Catalog itself too:

Columns that are on empty tables, regardless of tag
Columns that are tagged as Out of Scope Unused regardless of if they have data or not

We can already see a difference in the lists – if we run the entire PowerShell however (which you can find right here in my GitHub, and I’ll include it at the end) then we actually find those differences out:

The PowerShell result: 2 classified as empty columns now with data, 3 columns not classified as out of scope, now on empty tables

So we can correct these by classifying the 3 now empty columns as now Out Of Scope – Unused, and we can change the 2 now-populated columns to be In Scope:

Suppliers is now in use, and corrected to be In-Scope
The missing Out Of Scope – Unused tags are corrected on the empty columns

…and now when we run the PowerShell again we should find everything is rectified:

Everything has been corrected, now no columns are incorrectly labelled or missing labels

Ongoing

Now we have the script we can run to identify these issues, you could implement it on the back-end of the DevOps process described by the blog post or video linked at the start of this blog post – and this would allow you to constantly be sure that your view of the Production database is accurate and up to date.

The key to a good classification process as part of on-going governance and an up-to-date view of your structured data estate is minimizing the manual effort that has to go into maintaining this process, which with the help from the Data Catalog PowerShell module – we can do easily (if we wanted a process that would go out of date after 2 weeks, we could have just used MS Excel…)

Happy Classifying!

SQL Data Catalog, Data Masker and your DevOps pipeline: How do I make sure everything is being masked?

“However fast regulation moves, technology moves faster. Especially as far as data is concerned.”
Elizabeth Denham

You’re probably sick of me constantly talking about how the cataloging of columns should be part of the DevOps upstream deployment process. I’ve blogged about it. I’ve even produced a video demonstrating this in action. But one question that this always throws up is:

If we include cataloging in the upstream process, how do we make sure our masking sets are also staying up to date?

The benefits of including the classifications in the upstream pipeline is that nothing ever gets to Production that hasn’t been classified – we constantly have a perfectly up to date idea of the nature of our structured data estate, how data is distributed, what risk is associated with which systems? etc. but one of the biggest wins is that we constantly know which fields need to be masked when we’re pulling copies back into non-Production, constantly.

Add a table? Add columns? We know about them, they’re classified, they’re deployed… so now they need to be masked on our next refresh. But how? Well it all depends on which approach we want to take:

  • Automated
  • Manual

Doesn’t it always boil down to those options? What I mean is that either we have an appetite to completely generate our masking set afresh every single time based on our classifications (Automated) or do we want to ensure that we configure each rule ourselves (Manual). They both have benefits and drawbacks.

Automated

How: We can generate a masking set using the SQL Data Catalog PowerShell each and every time as part of our pipeline. Add a column, tag it in the pipeline and then simply wait for the necessary rulesets to be generated in the pipeline (perhaps as part of your build) for you to run when you bring a copy back. Check out my walkthrough for how to set this up.

Benefits: The process is automatic. Its headless and you don’t need to think about it at all. As long as the classifications are provided (and if you follow the steps from the blog post and video you should be providing them) then you’re always generating rules for every classified column.

Drawbacks: This process can be fragile. If we don’t classify correctly we can end up masking in the wrong way or trying to mask the wrong field (e.g. a Primary Key, Constraint, Identity etc.) which can cause masking failures and then you have to spend time fixing the pipeline/masking set. This also means that the nature of the masking is dependent on your classifications, and the values you will get masked into the columns will be less realistic as a result (i.e. you can’t generate Row-Internal Sync Rules using the integration).

Drawback Mitigation: To avoid the process breaking, be sure to really focus on how you set up your API settings / how you pick which categories and tags are used to generate the masking rules (like I discussed here). This will at least help you make sure you map common data sets into columns (and don’t hit columns that have constraints or keys).

Only columns marked with Static Masking as the Treatment Intent will get a rule created for them
An Information Type is given to every column we intend on masking – these are then mapped to templates in masker to ensure more realistic data

Manual

How: Either rely on developers to check in masking set changes along side their code changes or build in a manual intervention step to your upstream process to ensure that someone opens and updates the schema and rules within the masking set, and then put this back into the pipeline.

Benefits: This results in more accurate, more likely to run sets that generate significantly more reliable and realistic data as an end result. Masked DB copies can be more easily used for anything beyond simple development changes, including analytics etc. You can include any specialist rules you need and apply your own understanding and knowledge of the database.

Drawbacks: This is obviously a manual process – less than ideal. Anything that involves a human can invariably go wrong because we are humans and we make mistakes (which we learn from of course). This also takes significantly more time as part of the process.

Drawback Mitigation: This is a harder one to mitigate as we’re reliant on manual intervention, however the best way to check this could be with another team member checking (maybe as part of a pull request) or you could include an automated PowerShell script to effectively rationalize the columns to be masked from Catalog vs the columns currently in the Data Masker masking set – this would help you understand if all the necessary columns have been updated or if any were missed. I have included an example of this PowerShell below.

### CHANGE THIS ###

$MaskingSet = "yourmaskingset.DMSMaskSet" # Your masking set including the DMSMaskSet file extension
$instance = "yourinstance" # The Instance as it is shown in Data Catalog that hosts the database
$DatabaseName = "yourdatabase" # The DB you want classification info for
$CatalogServer="http://yourmachine:15156" # The lcoation of your catalog server, ending on :15156
$authToken="redacted" # Your Data Catalog Auth token from the Settings page
$tagName = "Static Masking" # The tag you're using to identify which columns need to be masked

### DONT CHANGE THIS ###

Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken 
$ColumnsMarkedForMasking = Get-ClassificationColumn `
    -InstanceName $instance `
    -DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq $tagName} 
$MaskingSetXML = [xml](Get-Content -Path $MaskingSet)
$subrules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleSubstitution')
$internalrules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleRowInternal')
$shufflerules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleShuffle')
$searchreplacerules = $MaskingSetXML.SelectNodes('//DMSSetContainer_MaskingSet/DMSSetContainer/DMSRuleBindingList/RuleSearchReplace')
$TablesAndColumns = @()

$subrules | ForEach-Object {`
    $CurrentTable = $_.TargetTableName.value
    $_.DMSPickedColumnAndDataSetCollection.DMSPickedColumnAndDataSet.N2KSQLServerEntity_PickedColumn.ColumnName.value | ForEach-Object {$TablesAndColumns+= $CurrentTable + "." + $_ }
}

$internalrules | ForEach-Object {`
    $TablesAndColumns+= $_.TargetTableName.value + "." + $_.TargetColumnName.value
}

$shufflerules | ForEach-Object {`
    $CurrentTable = $_.TargetTableName.value
    $_.DMSPickedColumnCollection.DMSPickedColumn.N2KSQLServerEntity_PickedColumn.ColumnName.value | ForEach-Object {$TablesAndColumns+= $CurrentTable + "." + $_ }
}

$searchreplacerules | ForEach-Object {`
    $TablesAndColumns+= $_.TargetTableName.value + "." + $_.TargetColumnName.value
}

$result = $TablesAndColumns | Sort -Unique
$ColumnsNeedingRules = $ColumnsMarkedForMasking | Where-Object {($_.tableName + "." + $_.columnName) -notin $result}

"`nThere are " + $ColumnsMarkedForMasking.count + " columns that require masking for database " + $DatabaseName + "in SQL Data Catalog."
"You are masking " + $result.count + " distinct columns in masking set: " + $MaskingSet
"`nThe columns that do not currently have a mask configured are:`n"

$ColumnsNeedingRules | ForEach-Object {$_.tableName + "." + $_.columnName + "     (" + $_.dataType + ")"}

$next = Read-Host -Prompt "`nWould you like to see the columns currently in your masking set? (Y/N)"
if ($next -in ("Y", "y")) {$result}

This code can also be found on my GitHub here.

The output of running the script – 13 columns in the masking set, 14 columns outstanding to be masked

You can technically even use this same check approach for the automated masking set generation option, to ensure that everything has been tagged correctly.

PASS Data Community Summit 2021

“Education is the kindling of a flame, not the filling of a vessel.”
Socrates

I will be speaking at PASS Data Community Summit 2021

I have spoken at previous PASS Summits; both through the virtue of working for Redgate, and off my own back through dedication and passion to the subject matter I speak about: Data Privacy and Protection.

In 2018 I stood on stage with Microsoft to speak about the nature of Static Data Masking, how it differs from Dynamic Masking and what challenges need to be considered for a successful static masking rollout.

In 2019 I stood on stage alone to talk about creating a strategy for masking non-Production environments, including a walkthrough of the dbatools.io masking functionality utilized alongside Azure SQL Database classifications. PASS Summit 2019 was also when Kendra Little encouraged me to set up this blog, for which I’m forever grateful.

In 2020… well. You know what happened.

In 2021 Summit sees a new lease of life. Data Community Summit will be entirely online (no surprises there) but one big surprise you might not know is that it is completely free to attend. Never before will there have been SUCH a swathe of incredible speakers, with such a huge variety of topics and learning pathways for free and available on demand afterwards.

The dates for your diary? November 8-12, 2021

As it happens, I will also be speaking about setting up an end to end deployment pipeline using the Flyway Community Edition, Azure SQL Database and Azure DevOps it would be great to see you but with so much on offer I could absolutely understand if you watched on catch up!

You can see all the speakers here, but here’s a short list of some oft he sessions I will definitely be tuning in to!

  • Erin Stellato – Demystifying Statistics in SQL Server
  • Grant Fritchey – Identify Poorly Performing Queries – Three Tools You Already Own
  • Tracy Boggiano – Azure SQL Fundamentals
  • Angela Tidwell – Azure Devops Dashboards EZ as pie-charts!
  • Indira Bandari – Getting started with Python for Data professionals
  • Jess Pomfret – Azure SQL & Our Toolbox To Manage It
  • Taiob Ali – Think like the Cardinality Estimator
  • Neil Hambly – Azure Notebooks – Data Science fundamentals
  • and many more!

So please go check it out & register, support the community and do a bunch of learning in the process – it will be amazing to see you there and hopefully I’ll even get to see some of you in person in the not-so-far future!

Configuring Dynamic Data Masking in Azure SQL Database from SQL Data Catalog using PowerShell

Horror is the removal of masks.”
Robert Bloch

I spend a great deal of my time talking about Data Masking, don’t believe me? Checkout here, here, here and even here. I talk about it a LOT, but I’m always talking about Static Data Masking (SDM), which is the process of masking the data at the file level itself so it is irreversibly altered – this is fabulous for non-Production environments like Dev and Test, especially when you pair it with a good cloning technology.

But what about Staging / Production environments?

I often work with teams to implement SDM and one of the fastest routes to successfully generating your masking sets for cloned environments is, of course, SQL Data Catalog (or cataloging solution of your choice) – you’ve already put the effort in to classify your columns and figure out where sensitive information exists within your databases and instances… so doesn’t it make sense that we can just use THAT as a source of truth and generate masking from there?

Note: I actually produced an end-to-end video showing the process of Cataloging, Masking and Cloning in under an hour here: https://www.red-gate.com/hub/university/courses/sql-data-catalog/end-to-end-data-protection-with-sql-data-catalog-and-sql-provision – if you watch/try it let me know how you get on!

Funnily enough though we wouldn’t want to use SDM for Production (and potentially Staging) environments though – as it irreversibly changes the data, it’s just going to completely mess up all of our Prod data. To tackle this then, many people I work with turn to Dynamic Data Masking.

Dynamic Data Masking (DDM)

DDM is a method of masking the data based on your access rights to the data. As far as customers see they have access to their data through our site or application no issues, but if anyone else needs to query that data, or different people need to see different results when querying environments, DDM has been their way to go.

Whilst a lot of people like to pick up on some of the well known downsides of DDM, it’s not like you’re entrusting the entire security of an environment to it alone – there are a ton of measures we can put in place and DDM is just one; like an ex-colleague of mine (someone very wise whom I admired greatly and am still sad to this day I no longer get to work with them) used to say: “It’s about building a defensible position. The more you do the easier it is to prove you’re doing something and the more likely you are to BE protected.

So when a customer asked this week if it was possible to configure Dynamic Data Masking from SQL Data Catalog (because they’d seen the “Treatment Intent” category and the tag that clearly states “Dynamic Data Masking”), just like we’re able to configure Static Data Masking, well now that was a challenge I couldn’t turn down!

The SQL Data Catalog Taxonomy Page – Treatment Intent Category showing Dynamic Data Masking

DDM in Azure SQL DB

Configuring Dynamic Masking in Azure SQL DB is fairly straight forward through the Azure portal, you can go to your Azure SQL DB, click Dynamic Data Masking and it gives you the option to simply pick and save columns to apply Data Masking to, and to whom these rules apply / don’t apply:

DDM in the Azure Portal for the DMDatabase_Dev, with masks configured on customer_firstname and customer_email

However when we potentially have a lot of columns or DBs to configure masks for this is really going to get very old very fast. As with all things, I turned to PowerShell for the answer and fortunately I found it: https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview?#set-up-dynamic-data-masking-for-your-database-using-powershell-cmdlets – basically I can get existing DDM configurations and set new configurations for columns directly from my Azure SQL DB using PowerShell.

Now I’m not expert on DDM, and Redgate Data Masker for SQL Server is not a DDM solution (so I’ve only ever needed to know SDM really) I don’t pretend to be, but it seemed that I had everything I needed to tie Catalog into DDM.

PowerShell time!

I’ve written so much PowerShell to get classifications out of Data Catalog at this point it’s become second nature, but if you’re using the SDC PowerShell module and you need a reference you can view it here: https://documentation.red-gate.com/sql-data-catalog/automation-with-powershell but the standard “stuff” goes:

  • Pull down the PoSh module
  • Connect to catalog where it’s installed using an Auth token
  • Grab out the classifications with Get-ClassificationColumn
  • Shrink this down to just the columns we care about based on the tags

But the Az PowerShell cmdlets were honestly just as easy to use! I was surprised how easy it was to get up and running:

  • Connect to my Azure subscription
  • Get the current list of columns already with DDM masks
  • Remove these from the Catalog list
  • Update the remaining columns to use the default Mask

This was the full code I ended up using:

# This script is intended to be used with Azure SQL Database and Redgate SQL Data Catalog, however you are welcome to adapt and edit as required
# It will pull columns out of azure that are already being masked, and a list of columns that need to be masked with DDM
# It will then rationalise these, and configure Default DDM masks for any columns not already being masked on that Azure SQL DB

#Variables for Azure SQL DB & Catalog
$ResourceGroup = "DMDb"
$ServerName = "dmnonproduction" # Your instance minus .database.windows.net
$instance = "dmnonproduction.database.windows.net" # The instance or logical SQL Server as displayed in SQL Data Catalog
$DatabaseName = "DMDatabase_Dev"
$CatalogServer="http://pse-lt-chrisu:15156" # Your SQL Data Catalog location, leave off the trailing "/"
$authToken="REDACTED" # Your SQL Data Catalog Auth Token
$AzureSub = "Redacted" # Your Sub ID

# Get the SQL Data Catalog PowerShell Module & Connect
Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken 

#Connect to your Azure Subscription
Connect-AzAccount -Subscription $AzureSub

#Get current active DDM Masks from Azure
$DdmMasks = Get-AzSqlDatabaseDataMaskingRule `
    -ResourceGroupName $ResourceGroup `
    -ServerName $ServerName `
    -DatabaseName $DatabaseName
$ListOfDDMColumns = $DdmMasks | ForEach-Object {$_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName}

#Get columns from Catalog currently marked with "Dynamic Data Masking" as a treatment intent
$CatalogColumns = Get-ClassificationColumn `
    -InstanceName $instance `
    -DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq "Dynamic data masking"} 

#Filter down to a list of columns that need to be masked, that currently aren't configured with DDM
$ColumnsToDDM = $CatalogColumns | Where-Object {($_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName) -notin $ListOfDDMColumns }


#Set default DDM Masks for identified columns
$ColumnsToDDM | ForEach-Object { `
    New-AzSqlDatabaseDataMaskingRule -ResourceGroupName $ResourceGroup `
                                     -ServerName $ServerName `
                                     -DatabaseName $DatabaseName  `
                                     -SchemaName $_.schemaName `
                                     -TableName $_.tableName `
                                     -ColumnName $_.columnName `
                                     -MaskingFunction "Default"

}

But I have also uploaded it to my GitHub here in case anyone would like to take and adapt as they see fit: https://github.com/ChrisUnwin/PowerShell/blob/master/Demos/Redgate%20Demos/DDMFromCatalog.ps1

And this was the result – here were the two columns I had already being masked:

Customer Firstname and Customer Email with DDM Masks Configured

These were the columns I had marked as Dynamic Data Masking in Data Catalog:

Customer firstname, lastname, street addres and email all marked for DDM in Catalog

and after running the PowerShell it deduced that the delta was street_address and lastname and created the default DDM mask for them in Azure:

All columns now being masked dynamically

Considerations

1 – I have used the default mask in this process, however if you wanted to configure the mask (as per the link to the docs above) to be specific numbers or format you could absolutely do this, simply by modifying the PowerShell to look at the Data Type and then just passing into a different New-AzSqlDatabaseDataMaskingRule for each of those types.

2 – This only applies to Azure SQL DB and does not take into account the considerations when using DDM on say, a 2017 SQL Server Instance running on a VM – however you could use the same approach to pass the columns into some dynamic T-SQL which would in turn run the correct command to add DDM to that/those column(s)

3 – I would still use Static Data Masking (SDM) for non-Production environments, because if anyone bypasses the DDM they will have access to the full data, which we don’t really NEED in less secure non-Prod environments anyway, so Static might well be the way to go!

Automating best practice checks at build time using the SQL Code Analysis cmdline (and failing the build)

Quality is not an act, it is a habit.”
Aristotle

I’ve always been thoroughly impressed with the static analysis code report that you can get from SQL Change Automation when it creates a Release Artifact prior to deploying upstream, and the fact we can use SQL Prompt to carry out on-the-fly static analysis as we write our T-SQL code in SSMS, but it has always struck me as odd that there doesn’t appear to be a way to include these checks at build time.

This seems like the perfect opportunity to build the DB from scratch (check), run unit tests (check) and check no poor coding practices have been checked in onto our branch (uh… not check?)

Enter SQL Code Analysis!

Don’t know what I’m talking about? I got you: https://documentation.red-gate.com/scg/sql-code-analysis-documentation/code-analysis-for-sql-server-command-line but to be fair I didn’t know this was a thing myself until yesterday!

One of my esteemed colleagues asked this question yesterday an being me I just couldn’t wait to have a go – is there a way to include this cmdline as part of an Azure DevOps build and FAIL the build if issues are found?

First things’ first: How does it work?

I downloaded the command line and it was pretty simple to get my head around. you can use windows or sql auth, you can point it at a scripts folder or a live DB and you can output the results to xml, html or the console if you’d like. Simple.

SqlCodeGuard.Cmd.exe /s:localhost\TOOLS /d:DMDatabase2019 /out:helloworld.html
SQL Code Analysis Console Output: 4 issues with the DMDatabase2019 found
Code Analysis HTML Output: 4 issues found with the DMDatabase2019, but much easier to read this time

Next: Wrap it in some PowerShell

I am neither a windows command line nor a PowerShell guru, but my first instinct when I’m going to include something like this in a build or deployment is to use PowerShell. It’s easy and mostly non-confusing to pass variables through the pipeline to PowerShell, easy to customize scripts and include if/else logic and to capture exit codes.

So I wrapped the cmdline call in some PowerShell (take it, it’s all yours!) that made it easier to:

  • Import the XML output generated by Code Analysis
  • Count the number of issues generated
  • Exit with error code 1 (failure) if any issues were found
  • Exit clean with 0 if no issues are found
#Set Path for Code Guard, server/instance, database and output location for XML
$codeGuardPath = "C:\Users\chris.unwin\Downloads\SCG-2019-10-17-11-40-22-46"
$server = "REDACTED"
$database = "REDACTED"
$outLocation = "$codeGuardPath\myoutput.xml"
#$user = "REDACTED"
#$password = "REDACTED"

#Invoke SQL Code Guard against the DB (could be the Build Database)
& "$codeGuardPath\SqlCodeGuard.Cmd.exe" /s:$server /d:$database /out:$outLocation #/u:$user /p:$password

#Import output xml file and count contents
$blah = [xml](Get-Content -Path $outLocation)
$files = $blah.SelectNodes('//file') #Objects with issues
$issues = $blah.SelectNodes('//file/issue') #Total issues themselves

#If number of issues > zero, exit with non-zero exit code and output list of affected objects
if ( $issues.count -gt 0 ) {

    "You have: " + $files.count + " objects, containing a total of: " + $issues.count + " issues."
    $files.fullname
    "Please review the xml output for more information."

    exit 1

}

#Else continue with no issues
else {

    "No code issues discovered."

}

This works like a charm:

Static Analysis Output in PowerShell: 2 objects with 30 issues

Finally: Put it in a pipeline

Unsurprisingly, putting it in a pipeline was the easiest part. I took a pipeline I had that was already running a local Azure DevOps agent in my default pool, made sure SQL Code Analysis was present in the correct directory on that machine and voilà! The build fails if it finds any issues.

N.B. I just stuck the raw PowerShell in the pipeline, you would be better off passing connection and location variables to the PowerShell script using custom Azure DevOps environment (and secret) variables. Oh, and having a better install directory for Code Analysis than Downloads, my bad…

Example YAML containing the PowerShell step
Code Analysis finds issues, so the PowerShell exists with Code 1, causing the build to fail

Time to choose.

There are a couple of things I’ve assumed here – I’m running it locally on a server and running against a database, and that database could be one that I’ve just built during my CI pipeline, absolutely – but you could also run this against a scripts folder / set of scripts, so even if you don’t yet have a full build / deploy process, or you have a different process that works for you – you can still include SQL Code Analysis with fairly minimal overhead! Enjoy!

“But I don’t wanna INSTALL it!”: Data Masker on the fly in Azure DevOps (with an Azure SQL DB)

“There is always a way to go if you look for it.”
Ernest A. Fitzgerald

As many of you know, I really enjoy talking about Data Masking. I fundamentally believe it is an absolutely ESSENTIAL part of Test Data Management and specifically the provisioning of Pre-Production environments. If you hold sensitive PII/PHI/PCI in your Production environments, you have no excuses for porting any of that back into Dev and Test.

I also stand firmly behind the belief (and it is just that, my belief) that masking is much safer than anything you can achieve with encryption or limiting access alone. Static masking, when done correctly, means that even if all other security measures are bypassed, or we accidentally expose data somehow, it doesn’t matter because the PII/PHI has been wiped, but it is still fundamentally useful for development and testing environments.

I’m also a huge proponent of using classification and data masking as part of a DevOps process, and often you’ll find me using Azure DevOps to actually kick off my masking process – but one of the most frequent questions I get is “do I have to install Data Masker somewhere?” and the answer I always have to give is… yes.

When we use SQL Provision to spin up our environments in non-Prod, generally we know they have to be in IaaS VMs or On-Prem, due to the nature of the technology (keep your eyes open on THIS because big changes coming soon *squeals in excitement*) but sometimes we’re working entirely with PaaS DBs and VMs don’t even come into our vocabulary – but we still need them to be masked for non-Prod use.

I have some data in an Azure SQL DB – a copy of DMDatabase (get it from my GitHub):

DM_CUSTOMER table in the DMDatabase

and I would like to get this masked before I create copies of it. Now I have written scripts in the past to make sure that Azure SQL DBs can be masked and then copied into non-Prod environments and you can get those scripts here, and you could easily wrap something like the below INTO a script like that – but much of the work I’ve done on this always involves having Data Masker installed somewhere and invoked on that machine – a VM, my laptop, whatever.

So, how do we avoid having to install Data Masker each time?

Data Masker for SQL Server does not (at time of writing) have publicly available a docker container or method for installing using Choco or something like that – once we do, trust me, I’ll be blogging about it A LOT as I will be very excited. But there is an install file available: https://download.red-gate.com/installers/DataMaskerforSQLServer/ and this might be enough.

The Process

Initially I created a Masking Set locally for the DMDatabase copy in Azure, it was nice and simple just masking a few of the fields on the CUSTOMER table:

It relies on SQL auth for the connection and I’m remembering the credentials, though these could be subbed in later on using the PARFILE (documentation on that here).

Next I put this masking set into a newly created Azure DevOps Git repo, which I cloned down onto my local machine – and then committed and pushed my changes up into Azure DevOps:

Now that this was all in ADO, it was time to set up a pipeline for it – so let’s jump into some YAML! Now, Data Masker currently needs to run on a Windows machine so we’ll set the pool to Windows-Latest:

trigger:
- main

pool:
  vmImage: windows-latest

The next step is to grab the installer – which I know I can easily do with PowerShell. I’m sure you could be more clever about this, but with limited time I chose the most recent version and hard coded that in to a PowerShell Invoke-WebRequest cmdlet:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

This PowerShell task is going to grab the most recent exe for Data Masker and pull it down into C:\download on the hosted VM we’re using for the pipeline and it’s also going to create a directory for the Data Masker logs as well (if you wanted to extend the YAML at the end to wrap these logs up and publish them as a result of the pipeline masking, then go for it and tweet me to let me know!)

Next we have to extract and install Data Masker from that download, which is fairly easy to do with a cmd call:

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

Note I’m using the guidance from this page, making sure to accept the EULA and I’m passing in my Redgate License as a variable, that I have specified for the pipeline and kept secret. This will put Data Masker in the default location in C:\Program Files\… and means we will then be able to call it. I do however need to make sure that this now works. So I saved my pipeline and ran it to see what happened:

Fabulous, that all works nicely. Now to pass the DMSMaskSet file to Data Masker and get it to run – ah but I forgot, I’m going to need a PARFILE as per the cmdline documentation that specifies where the files etc. are for the run. So I create my PARFILE.txt as such:

MASKINGSET=C:\download\DMDB_MaskingTime.DMSMaskSet
LOGFILEDIR=C:\download\DMlogs
DATASETSDIR=C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataSets 
REPORTSDIR=C:\download\DMlogs
INTERIM_REPORTS=false

As you can see, very simple. But to make sure these files are “simply” in the right folder, and because I don’t have time to explore how I could pass an environment variable into the .txt file itself, I’m going to add a quick file copy task to make sure that my masking set and PARFILE both make it into that location:

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

Of course now that we have Data Masker installed on the pipeline VM, the masking set AND the PARFILE… let’s get masking!

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

I agree the command is now less than impressive given all the prep work, but when you run all of this in it’s entirety…

Big success! But let’s check the data of course and make sure it is as we expect:

It all works, Lynne has been masked to Muna (and the rest has been masked too!), and I didn’t need to have Data Masker installed on a VM in my environment, with an Azure DevOps self hosted agent to run it – I could just do it programmatically. #Winning.

I’ll put the full YAML below however some caveats:

  • You cannot run this pipeline a LOT in a short space of time, I found that out. The RedGate downloads page is not particularly designed for this process, so it should probably be run sparingly otherwise the pipeline times out because the server, by way of balancing, will prevent you from pulling the file too often.
  • You will need to update the PowerShell step reasonably often if you want the DMS most recent installer, or make it more futureproof to grab the latest version – I just didn’t investigate that.
  • Data Masker is dependent on the power of the machine it is running on in MANY ways and Azure DevOps pipeline VMs are not particularly the most powerful beasts in the world – so if you have a lot of masking that needs doing, I would be weary of this method and might stick to an Application Server VM you’ve got hanging around, just in case.
  • This is likely to change a lot in the future so may not be relevant when you’re reading it after a few months – so before implementing anything like this, if it’s been a few months, contact me or Redgate and just confirm that there isn’t some better way of doing this, if I haven’t already blogged it!

Thanks for stopping by and have a great week!

Full YAML:

trigger:
- main

pool:
  vmImage: windows-latest

steps:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

Moving teams and users between multiple SQL Clone Servers

“Portability should be the default.”
Larry Wall

Greetings one-and-all! It’s been a long time hasn’t it?

Well – I’ll be honest. I’ve not been great… I’ve been struggling. With the pressures of everything going on; from National Lockdown 3: The Lockdown Strikes Back, to working hard, and trying to lose weight it’s all been a bit too much – but that’s not really any excuse for neglecting y’all is it?

As we go further into 2021 I can’t promise that you’ll see LOADS more here, but I’m determined to bring you something, and so long as it helps 1 person out there, I’ll consider it a success.

Well I was asked a question yesterday I wasn’t really sure of the answer to, but is definitely a nice-to-know and I thought I would share it with all of you.

“If I have multiple SQL Clone Servers, or even just a new SQL Clone Server, how do I move the teams and permissions from one to the other without manually setting everything up again?”

Quite the head-scratcher indeed!

Emoji Emotion GIF by Anne Arundel Community College

Well fortunately we know where to start – in SQL Clone every action, every change is recorded by virtue of a back-end SQL Server DB called the SQL Clone Config DB (you may remember this from such classics as Using things weirdly part 1!) and this holds all of the information we might need to move things over.

Let’s start by taking a look at the schema in the DB:

There is already fairly comprehensive documentation on moving the whole SQL Clone Config DB and even the SQL Clone Server to another instance here, but we ONLY care about users and permissions for this particular run, so we’ll focus on that.

The schema is reasonably straight forward and well-named thankfully which means we can isolate the tables we’ll need for the move. Anything that mentions Teams or Principles (Users) is generally going to be in our remit. I took a reasonably fresh install of SQL Clone and added a few users and teams (I’ve been watching a lot of Sword Art Online lately, so you might sense a trend with these teams…

Now when we investigate some of the tables in the config DB we’ll notice that everything is really nicely normalized and mapped together:

But what does this look like in a fresh install with NO teams defined?

Wait… no teams defined/ activated but there’s… a team. The default team.

aaron paul what GIF by Breaking Bad

It turns out once you hit the “Enable Teams” option in the SQL Clone Dashboard, that entry magically goes away though and it looks more like we’re expecting:

Now we know the lay of the land – how do we move our users? Well to answer that I’m going to work on 3 assumptions here:

  • I’m going to stick to just these 3 tables (Principles, Teams and the mappings). There’s a wider question around Images and Clones etc. that we’ve added for those teams and any overlap that may occur, but we’re working on the belief that the SQL Clone Server we’re moving to is segregated and independent (so will not have access to the same images), so it only needs the users and teams.
  • The target SQL Clone Config DB we’re moving these to is brand new, and does not already have any users/permissions or teams set up except clicking the “enable teams” button in the dash.
  • The same user has set both SQL Clone Servers up, and is therefore position 1 in the dbo.SqlClonePrinciples table, however you could easily overwrite this one value manually if needed or use a local copy with the same #1 user to generate the script initially E.g. Chris.Unwin is ID1 in both the Source and Target Config DBs below, and then just add in the missing user manually to the target config DB.

1. SQL Data Compare – my recommended

SQL Data Compare is our friend here. It knows all about constraints and identities and anything we may come across. This is a fairly small scale need however if you open it up and simply compare those 3 tables we’ve been looking at, it should be able to pick up the users and permissions to insert, and generate us a script quickly and easily we can run against the target Config DB:

I’m going to generate the SQL Script and then run it against my target – notice how Data Compare takes care of any issues we might face with constraints and identity inserts to make sure everything 100% ties up? Good isn’t it!?

This was my script:

SET NUMERIC_ROUNDABORT OFF
GO
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS, NOCOUNT ON
GO
SET DATEFORMAT YMD
GO
SET XACT_ABORT ON
GO
SET TRANSACTION ISOLATION LEVEL Serializable
GO
BEGIN TRANSACTION

PRINT(N'Drop constraints from [dbo].[TeamSqlClonePrincipalMappings]')
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] NOCHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_SqlClonePrincipals_SqlClonePrincipalId]
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] NOCHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_Teams_TeamId]

PRINT(N'Drop constraint FK_TeamImageMappings_Teams_TeamId from [dbo].[TeamImageMappings]')
ALTER TABLE [dbo].[TeamImageMappings] NOCHECK CONSTRAINT [FK_TeamImageMappings_Teams_TeamId]

PRINT(N'Drop constraint FK_TeamInstanceMappings_Teams_TeamId from [dbo].[TeamInstanceMappings]')
ALTER TABLE [dbo].[TeamInstanceMappings] NOCHECK CONSTRAINT [FK_TeamInstanceMappings_Teams_TeamId]

PRINT(N'Add rows to [dbo].[SqlClonePrincipals]')
SET IDENTITY_INSERT [dbo].[SqlClonePrincipals] ON
INSERT INTO [dbo].[SqlClonePrincipals] ([Id], [FriendlyName], [Role], [Sid], [IsGroup]) VALUES (2, N'RED-GATE\Chris.Kerswell', 2, 0x010500000000000515000000fd43461e19525f12828ba62895680000, 0)
INSERT INTO [dbo].[SqlClonePrincipals] ([Id], [FriendlyName], [Role], [Sid], [IsGroup]) VALUES (3, N'RED-GATE\David.Ong', 3, 0x010500000000000515000000fd43461e19525f12828ba628cb7b0000, 0)
SET IDENTITY_INSERT [dbo].[SqlClonePrincipals] OFF
PRINT(N'Operation applied to 2 rows out of 2')

PRINT(N'Add rows to [dbo].[Teams]')
SET IDENTITY_INSERT [dbo].[Teams] ON
INSERT INTO [dbo].[Teams] ([Id], [Name], [Color], [IncludeAllPrincipals], [IncludeAllImages], [IncludeAllInstances]) VALUES (2, N'TeamKirito', N'#373737', 0, 0, 0)
INSERT INTO [dbo].[Teams] ([Id], [Name], [Color], [IncludeAllPrincipals], [IncludeAllImages], [IncludeAllInstances]) VALUES (3, N'TeamAsuna', N'#fc9003', 0, 1, 1)
SET IDENTITY_INSERT [dbo].[Teams] OFF
PRINT(N'Operation applied to 2 rows out of 2')

PRINT(N'Add rows to [dbo].[TeamSqlClonePrincipalMappings]')
INSERT INTO [dbo].[TeamSqlClonePrincipalMappings] ([TeamId], [SqlClonePrincipalId]) VALUES (2, 3)
INSERT INTO [dbo].[TeamSqlClonePrincipalMappings] ([TeamId], [SqlClonePrincipalId]) VALUES (3, 2)
PRINT(N'Operation applied to 2 rows out of 2')

PRINT(N'Add constraints to [dbo].[TeamSqlClonePrincipalMappings]')
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_SqlClonePrincipals_SqlClonePrincipalId]
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_Teams_TeamId]
ALTER TABLE [dbo].[TeamImageMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamImageMappings_Teams_TeamId]
ALTER TABLE [dbo].[TeamInstanceMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamInstanceMappings_Teams_TeamId]
COMMIT TRANSACTION
GO

and success! Everything has been replicated as we expect:

I also tried this on a VM that does not have the same AD access as I do locally and the users were imported successfully, but because they can’t be resolved they don’t show in the Clone dash so it doesn’t cause us any dead or invalid relationship issues.

2. & 3. Script from one DB to the other or SSIS load

Another simple-ish option would be to write the INSERT INTO SELECT FROM statement, or create a very simple SSIS package if you can connect to both – you would have to take care of the Constraints and Identity Inserts yourself, but it would be equally effective:

Important Note(s)

You will have to manually tie up any differences if you have already started using the new install with Teams and Permissions as THAT will get MUCH more complicated, especially if you have already assigned images and instances and users to teams – so this process is best done at the very start of life for the new SQL Clone Server.

Also. You will have to restart the SQL Clone Server Service after loading the users and teams in, as SQL Clone won’t recognize they’re there straight away.

Conclusion

This is going to be a rare occasion for most – the SQL Clone Server is not designed nor intended to deal with being duplicated many times. Although there is a relatively easy method for MOVING SQL Clone Servers simply moving a part of the Config like Users & Teams is definitely possible, but is not 100% fool-proof, so on said rare occasion, just be very, very careful!

Creating rollback scripts automatically with SQL Compare and SQL Data Compare in Azure DevOps (migrations approach)

“Life’s under no obligation to give us what we expect.”
Margaret Mitchell

Hello everyone and happy 2021! Wait, what? It’s FEBRUARY!?! Oops.

You may well have noticed I have not been around for a couple of months. That is because I’ve been struggling. A lot. We’ve had some home worries, and with the state of the world playing on our minds constantly, it has never felt right to sit down and blog anything other than some of the recipes I’m working on (purely to get out of my own head!) but this week I was asked about rollbacks when working in the Migrations based approach with SQL Change Automation, and I couldn’t resist writing something.

Before I get into this though, I want to be very clear in my preferences. I don’t believe database changes SHOULD ever be rolled back. Always forward. If for any reason you need to undo a change, either for a disaster or just because you want to, and other code is being rolled back – then this should be a task. A rollback branch should be created, the respective changes should be made in dev and then using your fabulous, flexible and repeatable pipeline should be merged, built, checked, tested and deployed upstream to Prod. This rolls the database forwards and does so in a way that is controlled, auditable and prevents data loss.

Jimmy Fallon Agree GIF

However there MAY be some scenarios where a rollback is unavoidable, for example if you are required to do so immediately, or you’re not sure which of the 3 changes that were just deployed actually caused something to go bang, so you want to blanket undo until you can investigate properly.

With this in mind then there are a couple of ways of rolling back in the migrations approach that can absolutely be a part of your pipeline, one or two of which I want to try and flesh out in this post. This post would of course be much shorter if we were working with the State-Based approach, as we would simply need to compare a previous artifact with our target and generate a down script, but as we know, the state-based method does indeed come with it’s own challenges.

  1. Generate down scripts up front – this is an option that is described fairly well in the SQL Change Automation documentation, and may be preferable to some, however this is limited to VS and does definitely include some additional work for developers, and potentially generates quite a lot of scripts that may have to be rationalized
  2. Ad-hoc rollback with SQL Compare – again, this is described in the documentation but they will have to be changes that are then recaptured in the development pipeline in some way, plus this method is ideal for programmable objects but not so much for table changes.

But I think we’re onto something there with a combination of these 2 approaches. Is there a way to use SQL Compare and SQL Data Compare alongside SQL Change Automation, to automatically generate down scripts as artifacts, before each Production deployment? This is what I want to investigate.

Set Up

As usual, I set up an Azure DevOps repo (this time called RollBackPipe) and created myself a set of local databases that I’ve been using to record videos lately so I now have the MaskMePlease database in Dev, Test, Staging and Production flavors. I’m doing this locally on my machine with an Azure DevOps agent I had hanging around anyway – it saved me firing up a bunch of Azure SQL DBs, and I just used SQL Clone.

Implementation

I got the pipeline running straight through, I was simply using YAML for this pipeline as I didn’t want to “faff” around with the Release stages etc. but this could also have been achieved with Classic Pipelines of course.

Ignore the terrible stage names, essentially I was doing 3 things with SQL Change Automation:

  • Database Build: Producing a Nuget Package to represent the successfully building files stored in my Git repo, to consume later in the pipeline
  • Create Release Artifact: This stage looks at the target environments (Test, STG and Production) and creates a number of reports in my local DatabaseDeploymentResources folder for me to review
  • Deploy From Release Artifact: Take the patch script that has been produced and run this against the target DB

So it was time to try and build in the SQL Compare Command Lines. Note if I was doing this on an Azure Pipelines agent I might need to run this as a container or programmatically install the command lines instead, but fortunately they’re just installed on my machine so I could just add a command line stage.

There was actually a couple of ways to do this, I can either pass in the details of the 2 databases completely via the command line (full documentation here), or I could generate an XML config file to pass in and store that in version control, or I could pre-construct a SQL Compare / Data Compare “Project” and just pass that in. The latter seemed easier, so of course I did that!

I opened up SQL Compare and did a Database to Database comparison for Staging and Production, but in the reverse direction to ensure we create a down script and not an up script!

I then saved this as a project, then using the red SQL Compare logo in the top left I was able to launch SQL Data Compare with the same project, so it loads the settings for you. The reason for doing this is that if we undo anything that was created as part of a migration, then try to redeploy it, it won’t actually recognize those features as missing, because the Migration Log table tracks all the migrations, and it will still be saying that it has been deployed. So we need to compare the MigrationLog tables to re-sync back to the previous state, before we ran the migrations.

However I DID make sure that before comparing I only the __MigrationLog table in the comparison in Tables and Mappings:

I then had to change the Comparison Key being used for the migration log table to [Custom], because the default comparison key includes the date/time executed, so this is what I went for:

I saved this project down with the Compare script:

and then following the rather helpful documentation on the command lines, I was able to make the following calls as part of my pipeline (YAML included):

- task: CmdLine@2
  inputs: 
    script: '"C:\Program Files (x86)\Red Gate\SQL Compare 14\sqlcompare.exe" /project:"C:\DatabaseDeploymentResources\MaskMePlease\_ComparisonResources\ProdToStagingDownScript.scp" /scriptfile:"C:\DatabaseDeploymentResources\MaskMePlease\%BuildId%\SchemaDowngradeScript.sql" /include:Identical'
  env:
    BuildId: $(Build.BuildId)
- task: CmdLine@2
  inputs:
    script: '"C:\Program Files (x86)\Red Gate\SQL Data Compare 14\sqldatacompare.exe" /project:"C:\DatabaseDeploymentResources\MaskMePlease\_ComparisonResources\ProdToStagingDataDownScript.sdc" /scriptfile:"C:\DatabaseDeploymentResources\MaskMePlease\%BuildId%\DataDowngradeScript.sql" /include:Identical'
  env:
    BuildId: $(Build.BuildId)

and sure enough, having this run after the deployment to staging but before the deployment to Production I was able to get the reports being produced into the artifacts folder with the relevant changes document:

So it was time to try it out! I created a BadTable on Dev and generated the migration for it, and committed it to source control:

This then triggered the pipeline to run:

and produced the correct reports and .sql scripts:

Then I ran these scripts to rollback Production (of course I could have modified them to preserve any data I needed to retain, this actually works best with Programmable objects like sprocs). Then I re-ran the deployment pipeline and the BadTable lives once again!

So does this work with multiple changes too? I created a stored procedure called “deletemeplease” and modified our BadTable to add a “randomthing” column, to see if we could roll those back:

Production after deployment:

Production after rollback:

All I have to do to roll back forwards now is run my pipeline:

and they’re back:

This image has an empty alt attribute; its file name is image-24.png
This image has an empty alt attribute; its file name is image-25.png

Conclusions

Is it perfectly possible to have a version specific script output as an artifact as part of your Azure DevOps pipelines? Yes it absolutely is.

Is it perfectly possible to rollback changes that you’ve made when taking a Migrations-First approach to database development? Yes it absolutely is.

Would I recommend this approach? No. Absolutely not.

Just because this is possible doesn’t mean it is good process. I’ve been asked about it enough times that I was more curious than anything – however after actually IMPLEMENTING this process, it is clear to see there are many aspects to it that can fall down and it makes it clearer now, more so than ever, that the best approach would be to either roll forwards from Dev, through the pipeline in an automated, auditable and controllable way, or if we just need to roll back temporarily because of other code being rolled back, then use SQL Compare ad-hoc. This pipeline was very simple, with very few changes and when i made mistakes I could rectify them easily myself – but with a larger dev team, when under pressure to fix things, and many changes to consider, this approach can very quickly fall apart.

Long story short, (I believe) we should follow and trust the documentation.

YAML from Azure DevOps:

trigger:
- main

pool: 'default'

steps:
- task: RedgateSqlChangeAutomationBuild@4
  inputs:
    operation: 'Build SQL Change Automation Project'
    sqlProj: 'RollBackPipe\RollBackPipe.sqlproj'
    packageName: 'MaskMePlease.Database'
    tempServerTypeBuild: 'localDB'
    nugetPackageVersionSelector: 'Specific'
    nugetPackageVersion: '1.0'
    nugetPackageVersionUseBuildId: true
    requiredVersionOfDlma: 'latest'

- task: RedgateSqlChangeAutomationRelease@4
  inputs:
    Operation: 'Create'
    NuGetFile: 'MaskMePlease.Database.1.0.$(Build.BuildID).nupkg'
    ExportPath: 'C:\DatabaseDeploymentResources\MaskMePlease\$(Build.BuildID)\Test'
    AppendEnvironment: false
    TargetDatabaseConnectionType: 'explicit'
    TargetDatabaseServer: '.\TOOLS'
    TargetDatabaseName: 'MaskMePlease_Test'
    TargetAuthMethod: 'windowsAuth'
    TransactionIsolationLevel: 'serializable'
    AbortOnWarning: 'none'
    requiredVersionOfDlma: 'latest'


- task: RedgateSqlChangeAutomationRelease@4
  inputs:
    Operation: 'DeployFromResources'
    ImportPath: 'C:\DatabaseDeploymentResources\MaskMePlease\$(Build.BuildID)\Test'
    AppendEnvironment: false
    TargetDatabaseConnectionType: 'explicit'
    TargetDatabaseServer: '.\TOOLS'
    TargetDatabaseName: 'MaskMePlease_Test'
    TargetAuthMethod: 'windowsAuth'
    requiredVersionOfDlma: 'latest'

- task: RedgateSqlChangeAutomationRelease@4
  inputs:
    Operation: 'Create'
    NuGetFile: 'MaskMePlease.Database.1.0.$(Build.BuildID).nupkg'
    ExportPath: 'C:\DatabaseDeploymentResources\MaskMePlease\$(Build.BuildID)\Staging'
    AppendEnvironment: false
    TargetDatabaseConnectionType: 'explicit'
    TargetDatabaseServer: '.\TOOLS'
    TargetDatabaseName: 'MaskMePlease_STG'
    TargetAuthMethod: 'windowsAuth'
    TransactionIsolationLevel: 'serializable'
    AbortOnWarning: 'none'
    requiredVersionOfDlma: 'latest'


- task: RedgateSqlChangeAutomationRelease@4
  inputs:
    Operation: 'DeployFromResources'
    ImportPath: 'C:\DatabaseDeploymentResources\MaskMePlease\$(Build.BuildID)\Staging'
    AppendEnvironment: false
    TargetDatabaseConnectionType: 'explicit'
    TargetDatabaseServer: '.\TOOLS'
    TargetDatabaseName: 'MaskMePlease_STG'
    TargetAuthMethod: 'windowsAuth'
    requiredVersionOfDlma: 'latest'

- task: CmdLine@2
  inputs: 
    script: '"C:\Program Files (x86)\Red Gate\SQL Compare 14\sqlcompare.exe" /project:"C:\DatabaseDeploymentResources\MaskMePlease\_ComparisonResources\ProdToStagingDownScript.scp" /scriptfile:"C:\DatabaseDeploymentResources\MaskMePlease\%BuildId%\SchemaDowngradeScript.sql" /include:Identical'
  env:
    BuildId: $(Build.BuildId)
- task: CmdLine@2
  inputs:
    script: '"C:\Program Files (x86)\Red Gate\SQL Data Compare 14\sqldatacompare.exe" /project:"C:\DatabaseDeploymentResources\MaskMePlease\_ComparisonResources\ProdToStagingDataDownScript.sdc" /scriptfile:"C:\DatabaseDeploymentResources\MaskMePlease\%BuildId%\DataDowngradeScript.sql" /include:Identical'
  env:
    BuildId: $(Build.BuildId)
- task: RedgateSqlChangeAutomationRelease@4
  inputs:
    Operation: 'Create'
    NuGetFile: 'MaskMePlease.Database.1.0.$(Build.BuildID).nupkg'
    ExportPath: 'C:\DatabaseDeploymentResources\MaskMePlease\$(Build.BuildID)\Production'
    AppendEnvironment: false
    TargetDatabaseConnectionType: 'explicit'
    TargetDatabaseServer: '.\TOOLS'
    TargetDatabaseName: 'MaskMePlease_Production'
    TargetAuthMethod: 'windowsAuth'
    TransactionIsolationLevel: 'serializable'
    AbortOnWarning: 'none'
    requiredVersionOfDlma: 'latest'


- task: RedgateSqlChangeAutomationRelease@4
  inputs:
    Operation: 'DeployFromResources'
    ImportPath: 'C:\DatabaseDeploymentResources\MaskMePlease\$(Build.BuildID)\Production'
    AppendEnvironment: false
    TargetDatabaseConnectionType: 'explicit'
    TargetDatabaseServer: '.\TOOLS'
    TargetDatabaseName: 'MaskMePlease_Production'
    TargetAuthMethod: 'windowsAuth'
    requiredVersionOfDlma: 'latest'

Refreshing SQL Server Development workflows with Redgate SQL Provision

“If you quit on the process, you are quitting on the result.
Idowu Koyenikan

SQL Provision is really cool. But you knew that didn’t you? It’s obvious – we get teeny-tiny clones, based on an image with completely sanitized data we can use for just about anything in dev and test, and if we break them? Boom! There’s a new one.

I’m not just talking about refreshing Dev & Test environments though, oh no! I’m talking:

  • Clones as baseline with SQL Change Automation – baseline scripts for projects are a thing of the past, goodbye invalid object headaches!
  • Clones every single time you switch a branch – keeping everything separate and not cross-pollenating database work between branches
  • Clones to check Pull Requests instead of relying solely on the code itself in Version Control

Watch my session on all 3 of these from Redgate Streamed back in August: https://www.red-gate.com/hub/events/redgate-events/redgate-streamed/redgate-streamed-global-august-26

But one question always comes up about clones in any workflow and that is – how often should I refresh Images and Clones?

This question obviously depends a lot on the process but in reality I think the question should be less about clones and more about the images themselves. Clones are transient and can be flipped at a moments notice, but the image, or the “clone tax” as Steve Jones calls it, is the thing that takes time, resource and space.

I’m going to take my own go at answering this question as I would in any customer meeting or architecture session – but if you want some excellent detailed advice and examples, check out this awesome documentation page here: https://documentation.red-gate.com/clone/how-sql-clone-improves-database-devops/self-service-disposable-databases-for-development-and-testing

Q: So, how often should we refresh it?

A: It depends on your use of the Clone – how often do you need up to date data?

As a rule of thumb though, I tend to see the following behaviours:

  • Customer Support – overnight during the working week: Where you have data that needs people to troubleshoot customer issues, it always helps to have data as close to now as possible to help resolve issues. You want an image on standby ready so that at any second a member of support can pull down a copy to look through (if it NEEDS to have sensitive data for this purpose, you can restrict who can create clones from these images by using SQL Clone’s Teams functionality)
  • BI / MIS and Report testing – once a week (if not more often): Business Intelligence and reporting workflows can just mean that you’re reading a lot from your clones in which case they should stay small and you should be able to move seamlessly between clones. But. If your ETL process puts a very heavy load on your clones (like truncating and re-populating tables) you may cause bloat and need to rethink your refresh frequency to be more often where possible, perhaps overnight so that any transformations are captured in the new images, and clones by extension.
  • “BAU” Development (Schema and Static Data Changes) – Every 1 or 2 weeks: If you’re not affecting a large number of changes to your clone, or they are limited to schema and static data only then you should be absolutely fine with a wider refresh cadence – keeping the clones around for the whole sprint or only refreshing once during the sprint can mean everyone more easily stays up to date with the same environment consistently.
  • Ad-Hoc and Test workflows – once per month: There are going to be times where you occasionally need a copy of the live DB, but the fact it is 99% similar in terms of schema, and the data is a few weeks out of date isn’t a big deal. You can pull one down from this “cold copy” for any kind of test, destructive or even to validate certain behaviors / sense check if an update or query will work. It’s also handy to maintain a slightly older copy where possible if you need to start digging into failed updates made in development, so need to have a milestone to compare from.

Again – these workflows may vary and you may have needs to be more or less frequent based on differences being recorded, bloat, space available on the fileshare etc. but generally I find customers are pretty happy with this.

Q: Once we have our refresh rate in place – how do we move developers across?

This is a great question I get a lot of the time, and it stems from the fact a developer may have made a few dozen changes to a clone, and then the frequent refresh rate blows their clones away (and they forgot to commit to version control – D’oh!) – so it’s important to bear in mind that development work, and as a result the cloning of environments is not “cut and dried“. We should give developers a chance to move across as-and-when they’re ready, so I often end up recommending the below workflow, to ease this process.

For the sake of this proposed workflow I’m assuming a couple of things:

  1. The selected workflow is BAU Development and we want to refresh once per week
  2. We have enough space available on our fileshare to allow for 2 (or more) distinct copies of the primary image
  3. Clones are being delivered to jump boxes / VMs within the network that are always connected (and not developer machines), and we can control when they are deleted
  4. We operate on a standard western work schedule where the week begins on Sunday, Saturday and Sunday are considered non-working days and developers typically work anywhere between 8am and 6pm
  5. This can all be automated using SQL Clone’s PowerShell module

Week 1 – Sunday night

  • We create Image A of Primary Database from most recent backup file onto fileshare, applying data masking

Week 1 – Monday to Friday

  • Developers X, Y and Z create their own clones of Database A as they begin the working week
  • The clones are linked to a Git repo where, using SQL Change Automation, the developers commit all changes they make to their clones throughout the week
  • Developer X finishes with their changes, makes their final commit and push on Thursday and works on a different task on Friday

Week 2 – Sunday night

  • We create Image B of Database A – with slightly more up to date (and sanitized) data and capturing any deployed changes the team committed and pushed to git previously
  • We retain Image A for now but do a check for which developers have clones remaining (Developers Y and Z) and either nudge them in the team stand up that they only have a few days left, or automate the sending of an email to those developers warning them their clones are now 1 week old

Week 2 – Monday morning

  • Developer X creates their new clone from Image B and links it to Git ready to start making changes

Week 2 – Tuesday to Friday

  • Gradually over the course of the week as Developers Y and Z finish with their tasks and commit their changes they remove their clones and create new ones from Image B
  • A final reminder, as an email or a notification in MS Teams / Slack goes out on Friday morning that any clones of Image A will be deleted over the weekend

Week 3 – Sunday night

  • Image A with no clones remaining is deleted (or any remaining clones are deleted first) and Image C is created to begin the cycle again

Conclusion

Although this workflow requires the duplication of the central image, it has a number of benefits:

  • It is easily automated using PowerShell
  • The source control process suffers minimal disruption and developers don’t need to rush to finish anything
  • We don’t accidentally destroy developer work – the onus is on the developer to ensure work is committed
  • If, for any reason the image creation process fails, you still have a persisting image, so you don’t prevent developers from doing any work / waiting for the image process to manually complete
  • Moving to newer clones is a more organic process
  • If you wanted to maintain an image throughout the week and refresh a second image overnight for more up to date data, you can simply re-purpose the above principles. This could then be used for a number of the different teams and workflows simultaneously

Bonus Point – Naming Conventions

Many people choose to append the images they create with a date stamp like Image_A_16102020 so we know when it was taken and what the latest is. This is good practice but be warned if you’re using Clones as baseline or for branch switching etc. you will need to have a persistent name else that link will break. An alternative is always having the same name for the most current image and then simply renaming the older image with the date time stamp e.g. Image_A is current, but before creation of a new Image_A, it is renamed to Image_A_16102020 – this will not disrupt the clones that already exist on it, and it allows you to always know which one is most recent.

3 RDBMS’, 3 models, 3 end-to-end deployment pipelines with Azure DevOps and Redgate Deploy

“Choice is the most powerful tool we have. Everything boils down to choice. Every choice we make shuts an infinite number of doors and opens an infinite number of doors.”
– Lori Deschene (https://tinybuddha.com/)

Picking a Set-Up

One of the hardest parts of my job is that at any moments notice we could be asked to walk through better database change management processes. That’s not the challenge, the problem is that it could be with any kind of tech stack. I might need a Git Repo of some shape or form (Azure DevOps, plain ol’ Git, Bitbucket etc.) and then a CI server of some kind (Azure DevOps, GitLab, TeamCity, Bamboo etc.) and finally something to handle releases (Azure DevOps, Octopus Deploy, Bamboo etc.) – this is fairly easy to reproduce in multiple combinations with automation, terraform etc. but when you’re actually helping someone set it up – you’ve got to know where all the bits go.

The Redgate tools work with all of these options and combinations so making sure we’re setting everything up right usually means questions about the Repo/CI/CD tooling people choose.

The commonality above and the one I run into the most for all 3 stages, is Azure DevOps. Its straightforward to understand, all in the same place and just plain fun to use (AND it supports emojis ^_^).

Finally now, we have to pick a Relational Database Management System (RDBMS) to use – Redgate Deploy is one of the newest offerings from Redgate and it comprises capabilities for “Database DevOps” across MS SQL Server, Oracle Database and 18 (well actually 19 now thanks to Flyway v7!) other RDBMs‘! So instead of choosing, I’m going to pick the two key ones there, and one of the 18 others: MSSQL, Oracle DB and PostgreSQL.

One final question I had to ask of myself was what models I wanted to use. There are a couple of choices available within the Redgate solution, specifically for MSSQL and Oracle at the moment, so I decided that I would do State based deployments for Oracle and Hybrid deployments for MSSQL, given that PostgreSQL will have to be migrations anyway. Fear not though, the setup is not hugely dissimilar when it comes to the actual pipelines!

Setting up Azure DevOps Repos

This stage was relatively easy – I simply created 3 new projects in my DefaultCollection where I’m going to put the repos for each of the DBs.

and then I created 3 readme files, and cloned all 3 git repos down onto my machine as local repos:

and we’re ready to go!

A quick note: I’m using a mixture of Azure DevOps hosted (for PostgreSQL) and Azure DevOps Server locally installed on my Virtual Machine (for MSSQL/Oracle) with a local agent present to run everything below – you can adopt this methodology or you can use the hosted version, but for the Oracle solution below at least you will need a local agent available (unless you use the DockerHub Image for Schema/Data Compare).

Microsoft SQL Server

The first thing I need to do for all of these is to pick the databases I’ll be working on – for me I’m rather lucky as our demonstration environment has a rather nifty set of databases for me to choose from!

I’m going with SQL Source Control (the MSSQL State component in Redgate Deploy) and SQL Change Automation (the MSSQL Migrations component) both plugged into Management Studio (SSMS) with a set of databases called the ScaryDBA_Dev/Test/Prod environments (which I used SQL Clone to create the copies of), in homage to the wonderful Grant Fritchey.

So the first thing we need to do is get Dev under source control – we’ve refreshed back from Prod so there shouldn’t be any differences and we’re using the Hybrid model, so we’ll need to create the State first. I do this by going to SQL Source Control in SSMS, and linking my DB to Git, creating a State Folder in the top level of my local repo as I do so:

Then once linked I go ahead and source control the initial schema (not sure how? Watch the Redgate University videos here):

Next I setup my Migrations project using SQL Change Automation, creating the Migrations folder in the same top level of my local repo, but now instead of pointing to the database, I’m pointing to my SQL Source Control generated State folder:

Now at this point we get the options to choose filters and comparison options – I would recommend if you’re not sure speak to someone at Redgate or look up the documentation – I often see people wanting to filter out Security/Users/Roles at this stage so it might be worth a look! I just carried on as I only have a few objects anyway!

Connect to the target and create a baseline script (i.e. what does Prod look like now?) again, because I have a minimal setup I’ll go straight from my “Prod” database:

Commit and push and we’re on our way – everything is in version control:

Now i may have cheated by doing MSSQL first – because now actually building and deploying the project is pretty straight forward – much like I have done in previous posts here and here I just used the SQL Change Automation plugins from the Azure DevOps marketplace to first build:

and then deploy the project:

and it all succeeded… the 2nd time around when I remembered to specify which DB I was deploying to!

Oracle Database

The first thing I need to do for all of these is to pick the schemas I’ll be working on… wait, Deja Vu! – well once again I have a little set of schemas present on the demonstration machine that will serve me just fine!

Because we’re working in the State setup, out of Redgate Deploy I’m going to use Source Control for Oracle which allows me to specify the remote repo, the folder to create and even the fact I’m using Azure DevOps Git:

(Step 1 was simply providing the connection details to my Oracle Database, hence why I was on step 2!) – I select the Schema I’ll be putting in Source Control and even get a nifty run down of the structure:

Hit next and give a name to the Project (unsurprisingly I went with HR) and then check in all of your initial objects:

Now one thing that you may have noticed if you’re following along that I should clarify (and which I forgot when setting up this blog post):

  1. You don’t need to specify the local repo you cloned down because Source Control for Oracle handles this itself in the back end, if you want it to be part of a local repo with other code in it, use the Working Folder instead
  2. If you are using Git and NOT the working folder, committing will also Push your objects to the remote – you’ve been warned!

As above, I now head over to Pipelines and hit Create New Pipeline! I check out my repo with the schema objects in it, and add a job to my agent. But what am I going to pick? Well unlike SQL Change Automation there’s not a plugin available on the Azure DevOps Marketplace, we’ll need some good old fashioned command line calls!

First, let’s clean out the CI Schema, I’m going to use the script to remove all objects from the Redgate documentation site and make a call to run the script using sqlplus (I’m storing the file locally but you could even include it in your repo under a build folder maybe?)

echo on
Call exit | sqlplus hr/[passwordredacted]@//localhost:1521/CI @C:\DemoFiles\DropAllObjects.sql
echo off

Next we’ll add a call to the cmdline of Schema Compare for Oracle to build the database from our repo, using the files that were checked out by the agent (an Azure DevOps pre-defined environment variable) – again we’re using a similar script from the Redgate DevOps for Oracle site but because we’re deploying ALL objects from version control, we don’t really want a report per say, this is just to test the schema can be built from the ground up:

"C:\Program Files\Red Gate\Schema Compare for Oracle 5\sco.exe" /deploy /source $(Build.SourcesDirectory)\Schema{HR} /target SYSTEM/[passwordredacted]@localhost:1521/CI{HR} AS SYSDBA /indirect 

echo Build database from state:%ERRORLEVEL%
 
rem IF ERRORLEVEL is 0 then there are no changes.
IF %ERRORLEVEL% EQU 0 (
    echo ========================================================================================================
    echo == Warning - No schema changes detected. == echo ========================================================================================================
)
 
rem IF ERRORLEVEL is 61 there are differences, which we expect.
IF %ERRORLEVEL% EQU 61 (
    echo ========================================================================================================
    echo == Objects were found and built. ==
    echo ========================================================================================================
    rem Reset the ERRORLEVEL to 0 so the build doesn't fail 
    SET ERRORLEVEL=0
)

and assuming this all works, we’ll package up the files into a zip and publish them as an artifact so we can consume them at the release stage!

and guess what? It all just worked *cough* on build #23 when I got the syntax right finally…

Of course we can add additional stages to the build as well, such as a check for Invalid Objects and some Unit Testing, but I’ll keep this pretty lean for now!

Now, just like we did for MSSQL we’re going to set up a new deployment pipeline, grab the artifact we’re publishing from the build, enable a CD trigger and we’re going to deploy to, in this case, Acceptance.

Let’s first create a job on the agent to unpack the zip file and see how far we get – I’m just going to dump them in a DeploymentState folder in the working directory:

and… awww thanks Azure DevOps, I needed to hear that!

and now we add yet another command line task, but this one is just going to do a comparison, it’s not actually going to deploy anything – because we’re going to add a manual intervention step to approve the deployment first! I had a little help again from the Redgate docs for this one, because I keep having to catch cmdline error codes – if I was wise like Alex Yates I probably would have just handled this with PowerShell…

echo off
rem  We generate the deployment preview script artifact here
"C:\Program Files\Red Gate\Schema Compare for Oracle 5\sco.exe" /abortonwarnings:high /b:hdre /i:sdwgvac /source $(System.DefaultWorkingDirectory)\DeploymentState\Schema{HR} /target SYSTEM/Redgate1@localhost:1521/Acceptance{HR} AS SYSDBA /indirect /report:$(System.DefaultWorkingDirectory)\DeploymentState\changes_report.html /scriptfile:$(System.DefaultWorkingDirectory)\DeploymentState\deployment_script.sql > $(System.DefaultWorkingDirectory)\DeploymentState\Warnings.txt

echo Warnings exit code:%ERRORLEVEL%
rem In the unlikely event that the exit code is 63, this mean that a deployment warning has exceeded the allowable threshold (eg, data loss may have been detected)
rem If this occurs it is recommended to review the script, customize it, and perform a manual deployment
 
IF %ERRORLEVEL% EQU 0 (
    echo ========================================================================================================
    echo == No schema changes to deploy
    echo ========================================================================================================

    GOTO END
)
 
IF %ERRORLEVEL% EQU 63 (
    echo ========================================================================================================
    echo == High Severity Warnings Detected! Aborting the build. 
    echo == Review the deployment script and consider deploying manually.
    echo ========================================================================================================
    rem Aborting deployment because high severity warnings were detected
        SET ERRORLEVEL=1
    GOTO END
)
 
rem This is the happy path where we've identified changes and not detected any high warnings
IF %ERRORLEVEL% EQU 61 (
    echo ========================================================================================================
    echo == Schema changes found to deploy - generating deployment script for review
    echo ========================================================================================================
    rem Set ERROLEVEL to 0 so the build job doesn't fail
	SET ERRORLEVEL=0
    GOTO END
)
 
:END
EXIT /B %ERRORLEVEL%

I then throw in an agentless job (Manual Intervention Step) and then finally (once I have reviewed the deployment report that is produced) one further cmdline call to actually run the deployment script again my Acceptance target:

echo on
Call exit | sqlplus hr/[passwordRedacted]@//localhost:1521/Acceptance @$(System.DefaultWorkingDirectory)/DeploymentState\deployment_script.sql
echo off

I have saved my pipeline, now it’s time to test. So I’m going to make a very quick change (so that something is produced) and see what happens…

Boom. Pipeline done.

One word on this though – I haven’t included an awful lot of frills (error handling, checks, NuGet instead of Zip etc.) so you’re free to bulk this out how you see fit, but by golly it works! Also make sure you tick this on the second Agent Job, else it’ll wipe out your working directory – something that obviously definitely did not happen to me…

PostgreSQL

This one might be cheating a little. As you know I’ve already setup a CI pipeline with Flyway before, using Azure SQL DBs and the Flyway Docker container as part of the build, and in some cases even tSQLt for Unit Testing too! But this is PostgreSQL, and this is a new blog post, darn it!

Still getting your head around Flyway? Check out the Redgate University videos!

I started out by creating myself a PostgreSQL 10 server in the Azure Portal, because:

  • I can
  • I didn’t want a local install of PostgreSQL
  • I’m not self sabotaging

and I set up a Dev and Test database on it – that is once I remembered to allow my client IP address *sigh* and then connected from Azure Data Studio:

I already have some basic scripts from my last demo that I can use – so I pulled down the latest version of Flyway (V7) and unzipped it into my files:

Then I created a SQL folder in my local repository for the PostgreSQLPipeline (and popped a couple of migrations in – I’m using the StackOverflow scripts, adapted for PostgreSQL from Kendra Little’s GitHub, thank you Kendra!) – in the previous posts we’ve had to source control the state or initial baseline of the database, however as we’re using Flyway for PostgreSQL this requires us to create and name/order the migrations ourselves, so we have plenty of control over that – hence why we can jump straight into building some scripts this time around.

Finally, I pointed the config file for Flyway to that, also taking the opportunity to point it at my Dev DB using the PostgreSQL JDBC:

Now i didn’t really NEED to do this step and try things out against Dev, because I already have the scripts, so I could have just started building the pipeline – but it’s always worthwhile getting local validation first by running things against Dev and then migrating up!

A quick Flyway Info later and we were good to go – the scripts are recognized so we know we’ve set everything up correctly.

One git add / commit / push and everything is in my repo:

Now as you may know from my other post we can do 1 of 2 things here – we can now either build what we eventually push to the repo using a cmdline call (like we did with the Oracle build) to a machine where we have Flyway installed, or we can use the Docker image.

I’m actually going to use Docker again but this time, instead of specifying the various credentials in a config file that was getting passed to the container, I’m actually going to use Azure DevOps environment variables and build the connection string that way – it’s really easy to keep the variables secret in Pipelines, so I can pass my JDBC connection, complete with Username and Password, as well as my Flyway license key, without worrying someone might get hold of them!

I’m actually going to build against a live PostgreSQL database before deploying, so I also created another DB for me to use: demodb_ci

I actually stole the YAML from my previous pipeline (below) and updated the variables accordingly:

trigger:
- master
 
pool:
  vmImage: 'ubuntu-latest'
 
steps:
- task: DockerInstaller@0
  inputs:
    dockerVersion: '17.09.0-ce'
  displayName: 'Install Docker'

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run --rm -v $(FLYWAY_LOCATIONS):/flyway/sql flyway/flyway clean -url=$(JDBC) -licenseKey=$(licenseKey) -user=$(userName) -password=$(password) -enterprise 
  displayName: 'Clean build schema'
 
- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run --rm -v $(FLYWAY_LOCATIONS):/flyway/sql flyway/flyway migrate -url=$(JDBC) -licenseKey=$(licenseKey) -user=$(userName) -password=$(password) -enterprise 
  displayName: 'Run flyway build'

and it ran just fine! Well actually it failed first, because I didn’t have permissions from the IP address that the container was running from, but fortunately Azure has a handy switch in the PostgreSQL Server settings to simply allow Azure Services traffic through the firewall:

Once that was sorted, the first stage (as always) is to download Docker and then we have 2 Flyway containers steps:

1 – Clean the schema and make sure the database is empty
2 – Migrate the schema changes

Then we have two options – we could do like we did in the Oracle pipeline and zip up the files, spitting them out at Release stage and consuming them, either calling Flyway from the command line, or we can go ahead and promote our deployment using the same pipeline.

I’m lazy, so I’m going for the latter!

In a normal “production like” situation I would probably take the opportunity to test and check etc. like I did above, but let’s keep this super lean – if the build works, I trust the deployment. Lets go ahead and deploy to Production – I’ll add this as an additional task in my YAML:

- task: Bash@3
  inputs:
    targettype: 'inline'
    script: docker run --rm -v $(FLYWAY_LOCATIONS):/flyway/sql flyway/flyway migrate -url=$(ProdJDBC) -licenseKey=$(licenseKey) -user=$(userName) -password=$(password) -enterprise 
  displayName: 'Promote to Production'

And the deployment was successful! Phew – I think I’ve earned a cup of tea!

Conclusion

In this blog post I have demonstrated 3 different (and initially very simple*) approaches to the source control and deployment of database changes – but there’s actually a much wider combination we could have adopted – all 3 models with MSSQL, all 3 models with Oracle, and Migrations for up to 18 other systems like DB2, Snowflake and even SAP HANA! But what did I need to do ALL of this? A single solutionRedgate Deploy**.

Thank you for stopping by! Have an amazing week!

*There is a lot missing from the code I have provided, like additional error handling, tests etc. and all of the above CAN be improved – but did we manage to build and deploy across three different systems all using Azure DevOps? Yes we did. If you intend on using any of the above, please ensure you build in the necessary controls and process around it and always pick what is best for you and your team.

**Redgate Deploy is going from strength to strength, expect to see a wide range of improvements made over the coming months – I won’t be surprised if this blog post is already out of date by the time I finish writing it – that’s how awesome the teams working on all of this are!