I spend a great deal of my time talking about Data Masking – don’t believe me? Check out here, here, here and even here. I talk about it a LOT, but I’m always talking about Static Data Masking (SDM), which is the process of masking the data at the file level itself so that it is irreversibly altered – this is fabulous for non-Production environments like Dev and Test, especially when you pair it with a good cloning technology.
But what about Staging / Production environments?
I often work with teams to implement SDM and one of the fastest routes to successfully generating your masking sets for cloned environments is, of course, SQL Data Catalog (or cataloging solution of your choice) – you’ve already put the effort in to classify your columns and figure out where sensitive information exists within your databases and instances… so doesn’t it make sense that we can just use THAT as a source of truth and generate masking from there?
Funnily enough though, we wouldn’t want to use SDM for Production (and potentially Staging) environments – as it irreversibly changes the data, it’s just going to completely mess up all of our Prod data. To tackle this, many people I work with turn to Dynamic Data Masking.
Dynamic Data Masking (DDM)
DDM is a method of masking data based on the access rights of whoever is querying it. Customers accessing their own data through your site or application see it as normal, but if anyone else needs to query that data, or different people need to see different results when querying the same environment, DDM has been the go-to.
Whilst a lot of people like to pick up on some of the well-known downsides of DDM, it’s not like you’re entrusting the entire security of an environment to it alone – there are a ton of measures we can put in place and DDM is just one; like an ex-colleague of mine (someone very wise whom I admired greatly and whom, to this day, I’m sad I no longer get to work with) used to say: “It’s about building a defensible position. The more you do the easier it is to prove you’re doing something and the more likely you are to BE protected.”
So when a customer asked this week if it was possible to configure Dynamic Data Masking from SQL Data Catalog (because they’d seen the “Treatment Intent” category and the tag that clearly states “Dynamic Data Masking”), just like we’re able to configure Static Data Masking, well now that was a challenge I couldn’t turn down!
The SQL Data Catalog Taxonomy Page – Treatment Intent Category showing Dynamic Data Masking
DDM in Azure SQL DB
Configuring Dynamic Data Masking in Azure SQL DB is fairly straightforward through the Azure portal: you go to your Azure SQL DB, click Dynamic Data Masking, and it gives you the option to simply pick and save the columns to apply masking to, and to whom these rules apply / don’t apply:
DDM in the Azure Portal for the DMDatabase_Dev, with masks configured on customer_firstname and customer_email
Now I’m no expert on DDM and I don’t pretend to be – Redgate Data Masker for SQL Server is not a DDM solution, so I’ve only ever really needed to know SDM – but it seemed that I had everything I needed to tie Catalog into DDM.
PowerShell time!
I’ve written so much PowerShell to get classifications out of Data Catalog at this point it’s become second nature, but if you’re using the SDC PowerShell module and you need a reference you can view it here: https://documentation.red-gate.com/sql-data-catalog/automation-with-powershell but the standard “stuff” goes:
Pull down the PoSh module
Connect to catalog where it’s installed using an Auth token
Grab out the classifications with Get-ClassificationColumn
Shrink this down to just the columns we care about based on the tags
But the Az PowerShell cmdlets were honestly just as easy to use! I was surprised how easy it was to get up and running:
Connect to my Azure subscription
Get the current list of columns already with DDM masks
Remove these from the Catalog list
Update the remaining columns to use the default Mask
This was the full code I ended up using:
# This script is intended to be used with Azure SQL Database and Redgate SQL Data Catalog, however you are welcome to adapt and edit as required
# It will pull columns out of azure that are already being masked, and a list of columns that need to be masked with DDM
# It will then rationalise these, and configure Default DDM masks for any columns not already being masked on that Azure SQL DB
#Variables for Azure SQL DB & Catalog
$ResourceGroup = "DMDb"
$ServerName = "dmnonproduction" # Your instance minus .database.windows.net
$instance = "dmnonproduction.database.windows.net" # The instance or logical SQL Server as displayed in SQL Data Catalog
$DatabaseName = "DMDatabase_Dev"
$CatalogServer="http://pse-lt-chrisu:15156" # Your SQL Data Catalog location, leave off the trailing "/"
$authToken="REDACTED" # Your SQL Data Catalog Auth Token
$AzureSub = "Redacted" # Your Sub ID
# Get the SQL Data Catalog PowerShell Module & Connect
Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken
#Connect to your Azure Subscription
Connect-AzAccount -Subscription $AzureSub
#Get current active DDM Masks from Azure
$DdmMasks = Get-AzSqlDatabaseDataMaskingRule `
-ResourceGroupName $ResourceGroup `
-ServerName $ServerName `
-DatabaseName $DatabaseName
$ListOfDDMColumns = $DdmMasks | ForEach-Object {$_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName}
#Get columns from Catalog currently marked with "Dynamic Data Masking" as a treatment intent
$CatalogColumns = Get-ClassificationColumn `
-InstanceName $instance `
-DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq "Dynamic data masking"}
#Filter down to a list of columns that need to be masked, that currently aren't configured with DDM
$ColumnsToDDM = $CatalogColumns | Where-Object {($_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName) -notin $ListOfDDMColumns }
#Set default DDM Masks for identified columns
$ColumnsToDDM | ForEach-Object { `
New-AzSqlDatabaseDataMaskingRule -ResourceGroupName $ResourceGroup `
-ServerName $ServerName `
-DatabaseName $DatabaseName `
-SchemaName $_.schemaName `
-TableName $_.tableName `
-ColumnName $_.columnName `
-MaskingFunction "Default"
}
And this was the result – here were the two columns that were already being masked:
Customer Firstname and Customer Email with DDM Masks Configured
These were the columns I had marked as Dynamic Data Masking in Data Catalog:
Customer firstname, lastname, street address and email all marked for DDM in Catalog
and after running the PowerShell it deduced that the delta was street_address and lastname and created the default DDM mask for them in Azure:
All columns now being masked dynamically
Considerations
1 – I have used the default mask in this process; however, if you wanted to configure the mask (as per the link to the docs above) to use specific numbers or formats, you could absolutely do this, simply by modifying the PowerShell to look at the data type and pass a different masking function into New-AzSqlDatabaseDataMaskingRule for each of those types (there’s a rough sketch of this just after this list).
2 – This only applies to Azure SQL DB and does not take into account the considerations of using DDM on, say, a SQL Server 2017 instance running on a VM – however you could use the same approach to pass the columns into some dynamic T-SQL, which would in turn run the correct command to add DDM to that/those column(s) (again, sketched after this list).
3 – I would still use Static Data Masking (SDM) for non-Production environments, because if anyone bypasses the DDM they will have access to the full data, which we don’t really NEED in less secure non-Prod environments anyway, so Static might well be the way to go!
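As promised, here are those two sketches. First, picking the masking function per column instead of always using “Default” – this is a rough illustration only: the column-name matching and the idea of switching on a data type property are assumptions on my part, so check the objects returned by Get-ClassificationColumn before relying on them.
# Rough sketch: choose the Azure DDM masking function per column rather than
# always passing "Default" - the matching logic below is an example, not a rule
$ColumnsToDDM | ForEach-Object {
    $col = $_
    # anything email-ish gets the Email mask, everything else falls back to Default;
    # you could equally switch on a data type property if your Catalog output exposes one
    $maskingFunction = if ($col.columnName -like "*email*") { "Email" } else { "Default" }
    New-AzSqlDatabaseDataMaskingRule -ResourceGroupName $ResourceGroup `
        -ServerName $ServerName `
        -DatabaseName $DatabaseName `
        -SchemaName $col.schemaName `
        -TableName $col.tableName `
        -ColumnName $col.columnName `
        -MaskingFunction $maskingFunction
}
And second, the on-prem / VM flavour – instead of the Az cmdlet, build the equivalent T-SQL and run it against the instance. This sketch assumes the SqlServer PowerShell module (for Invoke-Sqlcmd) and a SQL Server 2016+ target; the instance name is a placeholder:
# Sketch: apply a default dynamic data mask to each column on an on-prem / VM instance
$ColumnsToDDM | ForEach-Object {
    $sql = "ALTER TABLE [$($_.schemaName)].[$($_.tableName)] " +
           "ALTER COLUMN [$($_.columnName)] ADD MASKED WITH (FUNCTION = 'default()');"
    Invoke-Sqlcmd -ServerInstance "MyOnPremInstance" -Database $DatabaseName -Query $sql
}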
“Quality is not an act, it is a habit.” – Aristotle
I’ve always been thoroughly impressed with the static analysis code report that you can get from SQL Change Automation when it creates a Release Artifact prior to deploying upstream, and the fact we can use SQL Prompt to carry out on-the-fly static analysis as we write our T-SQL code in SSMS, but it has always struck me as odd that there doesn’t appear to be a way to include these checks at build time.
This seems like the perfect opportunity to build the DB from scratch (check), run unit tests (check) and check no poor coding practices have been checked in onto our branch (uh… not check?)
One of my esteemed colleagues asked this question yesterday and, being me, I just couldn’t wait to have a go – is there a way to include this command line as part of an Azure DevOps build and FAIL the build if issues are found?
First things first: How does it work?
I downloaded the command line and it was pretty simple to get my head around. You can use Windows or SQL auth, you can point it at a scripts folder or a live DB, and you can output the results to XML, HTML or the console if you’d like. Simple.
SQL Code Analysis Console Output: 4 issues with the DMDatabase2019 found
Code Analysis HTML Output: 4 issues found with the DMDatabase2019, but much easier to read this time
Next: Wrap it in some PowerShell
I am neither a windows command line nor a PowerShell guru, but my first instinct when I’m going to include something like this in a build or deployment is to use PowerShell. It’s easy and mostly non-confusing to pass variables through the pipeline to PowerShell, easy to customize scripts and include if/else logic and to capture exit codes.
So I wrapped the cmdline call in some PowerShell (take it, it’s all yours!) that made it easier to:
Import the XML output generated by Code Analysis
Count the number of issues generated
Exit with error code 1 (failure) if any issues were found
Exit clean with 0 if no issues are found
#Set Path for Code Guard, server/instance, database and output location for XML
$codeGuardPath = "C:\Users\chris.unwin\Downloads\SCG-2019-10-17-11-40-22-46"
$server = "REDACTED"
$database = "REDACTED"
$outLocation = "$codeGuardPath\myoutput.xml"
#$user = "REDACTED"
#$password = "REDACTED"
#Invoke SQL Code Guard against the DB (could be the Build Database)
& "$codeGuardPath\SqlCodeGuard.Cmd.exe" /s:$server /d:$database /out:$outLocation #/u:$user /p:$password
#Import output xml file and count contents
$blah = [xml](Get-Content -Path $outLocation)
$files = $blah.SelectNodes('//file') #Objects with issues
$issues = $blah.SelectNodes('//file/issue') #Total issues themselves
#If number of issues > zero, exit with non-zero exit code and output list of affected objects
if ( $issues.count -gt 0 ) {
"You have: " + $files.count + " objects, containing a total of: " + $issues.count + " issues."
$files.fullname
"Please review the xml output for more information."
exit 1
}
#Else continue with no issues
else {
"No code issues discovered."
}
This works like a charm:
Static Analysis Output in PowerShell: 2 objects with 30 issues
Finally: Put it in a pipeline
Unsurprisingly, putting it in a pipeline was the easiest part. I took a pipeline I had that was already running a local Azure DevOps agent in my default pool, made sure SQL Code Analysis was present in the correct directory on that machine and voilà! The build fails if it finds any issues.
N.B. I just stuck the raw PowerShell in the pipeline; you would be better off passing connection and location variables to the PowerShell script using custom Azure DevOps environment (and secret) variables. Oh, and having a better install directory for Code Analysis than Downloads, my bad…
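For what it’s worth, here’s a minimal sketch of that parameterised version – the variable names are just examples of my own choosing, and secret variables need to be mapped explicitly onto the task as environment variables:
# Sketch: read locations/connection details from pipeline-supplied environment
# variables instead of hard-coding them (the variable names are examples only)
param(
    [string]$CodeGuardPath = $env:CODEGUARD_PATH,
    [string]$Server        = $env:TARGET_SERVER,
    [string]$Database      = $env:TARGET_DATABASE
)
$outLocation = Join-Path $CodeGuardPath 'myoutput.xml'
& "$CodeGuardPath\SqlCodeGuard.Cmd.exe" /s:$Server /d:$Database /out:$outLocation
$results = [xml](Get-Content -Path $outLocation)
$issues  = $results.SelectNodes('//file/issue')
if ($issues.Count -gt 0) { "Found $($issues.Count) issues - failing the build."; exit 1 }
else { "No code issues discovered." }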
Example YAML containing the PowerShell step
Code Analysis finds issues, so the PowerShell exits with Code 1, causing the build to fail
Time to choose.
There are a couple of things I’ve assumed here – I’m running it locally on a server and running against a database, and that database could be one that I’ve just built during my CI pipeline, absolutely – but you could also run this against a scripts folder / set of scripts, so even if you don’t yet have a full build / deploy process, or you have a different process that works for you – you can still include SQL Code Analysis with fairly minimal overhead! Enjoy!
“There is always a way to go if you look for it.” – Ernest A. Fitzgerald
As many of you know, I really enjoy talking about Data Masking. I fundamentally believe it is an absolutely ESSENTIAL part of Test Data Management and specifically the provisioning of Pre-Production environments. If you hold sensitive PII/PHI/PCI in your Production environments, you have no excuses for porting any of that back into Dev and Test.
I also stand firmly behind the belief (and it is just that, my belief) that masking is much safer than anything you can achieve with encryption or limiting access alone. Static masking, when done correctly, means that even if all other security measures are bypassed, or we accidentally expose data somehow, it doesn’t matter because the PII/PHI has been wiped, but it is still fundamentally useful for development and testing environments.
When we use SQL Provision to spin up our environments in non-Prod, generally we know they have to be in IaaS VMs or On-Prem, due to the nature of the technology (keep your eyes open on THIS because big changes coming soon *squeals in excitement*) but sometimes we’re working entirely with PaaS DBs and VMs don’t even come into our vocabulary – but we still need them to be masked for non-Prod use.
I have some data in an Azure SQL DB – a copy of DMDatabase (get it from my GitHub):
DM_CUSTOMER table in the DMDatabase
and I would like to get this masked before I create copies of it. Now I have written scripts in the past to make sure that Azure SQL DBs can be masked and then copied into non-Prod environments and you can get those scripts here, and you could easily wrap something like the below INTO a script like that – but much of the work I’ve done on this always involves having Data Masker installed somewhere and invoked on that machine – a VM, my laptop, whatever.
So, how do we avoid having to install Data Masker each time?
Data Masker for SQL Server does not (at the time of writing) have a publicly available Docker container or a method for installing via Choco or similar – once we do, trust me, I’ll be blogging about it A LOT as I will be very excited. But there is an install file available: https://download.red-gate.com/installers/DataMaskerforSQLServer/ and this might be enough.
The Process
Initially I created a Masking Set locally for the DMDatabase copy in Azure, it was nice and simple just masking a few of the fields on the CUSTOMER table:
It relies on SQL auth for the connection and I’m remembering the credentials, though these could be subbed in later on using the PARFILE (documentation on that here).
Next I put this masking set into a newly created Azure DevOps Git repo, which I cloned down onto my local machine – and then committed and pushed my changes up into Azure DevOps:
Now that this was all in ADO, it was time to set up a pipeline for it – so let’s jump into some YAML! Now, Data Masker currently needs to run on a Windows machine so we’ll set the pool to Windows-Latest:
trigger:
- main
pool:
vmImage: windows-latest
The next step is to grab the installer – which I know I can easily do with PowerShell. I’m sure you could be more clever about this, but with limited time I chose the most recent version and hard coded that in to a PowerShell Invoke-WebRequest cmdlet:
This PowerShell task is going to grab the most recent exe for Data Masker and pull it down into C:\download on the hosted VM we’re using for the pipeline and it’s also going to create a directory for the Data Masker logs as well (if you wanted to extend the YAML at the end to wrap these logs up and publish them as a result of the pipeline masking, then go for it and tweet me to let me know!)
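In case it’s useful, the shape of that step is roughly the below – note the installer file name/version in the URL is a placeholder, so check the download page above for what’s actually available:
# Sketch of the download step - the installer file name below is a placeholder,
# pick the current version from the Redgate downloads page referenced above
New-Item -ItemType Directory -Path 'C:\download' -Force | Out-Null
New-Item -ItemType Directory -Path 'C:\download\DMlogs' -Force | Out-Null
Invoke-WebRequest -Uri 'https://download.red-gate.com/installers/DataMaskerforSQLServer/<version>/DataMaskerforSQLServer.exe' -OutFile 'C:\download\DataMaskerInstaller.exe'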
Next we have to extract and install Data Masker from that download, which is fairly easy to do with a cmd call:
Note I’m using the guidance from this page, making sure to accept the EULA, and I’m passing in my Redgate license as a variable that I have specified for the pipeline and kept secret. This will put Data Masker in the default location in C:\Program Files\… and means we will then be able to call it. I do however need to make sure that this now works, so I saved my pipeline and ran it to see what happened:
Fabulous, that all works nicely. Now to pass the DMSMaskSet file to Data Masker and get it to run – ah but I forgot, I’m going to need a PARFILE as per the cmdline documentation that specifies where the files etc. are for the run. So I create my PARFILE.txt as such:
MASKINGSET=C:\download\DMDB_MaskingTime.DMSMaskSet
LOGFILEDIR=C:\download\DMlogs
DATASETSDIR=C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataSets
REPORTSDIR=C:\download\DMlogs
INTERIM_REPORTS=false
As you can see, very simple. But to make sure these files are “simply” in the right folder, and because I don’t have time to explore how I could pass an environment variable into the .txt file itself, I’m going to add a quick file copy task to make sure that my masking set and PARFILE both make it into that location:
- task: CopyFiles@2
inputs:
SourceFolder: '$(Build.Repository.LocalPath)'
Contents: '**'
TargetFolder: 'C:\download'
displayName: 'Copy files from working directory'
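(As an aside – if you did want to avoid keeping hard-coded values in the PARFILE itself, you could generate it on the fly in an earlier PowerShell step instead; a quick sketch using the same values as above, which you could swap out for pipeline variables:)
# Sketch: build the PARFILE at runtime so the values could come from pipeline variables
$parfile = @"
MASKINGSET=C:\download\DMDB_MaskingTime.DMSMaskSet
LOGFILEDIR=C:\download\DMlogs
DATASETSDIR=C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataSets
REPORTSDIR=C:\download\DMlogs
INTERIM_REPORTS=false
"@
Set-Content -Path 'C:\download\PARFILE.txt' -Value $parfile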
Of course now that we have Data Masker installed on the pipeline VM, the masking set AND the PARFILE… let’s get masking!
- task: CmdLine@2
inputs:
script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
displayName: 'Run Data Masker'
I agree the command is now less than impressive given all the prep work, but when you run all of this in its entirety…
Big success! But let’s check the data of course and make sure it is as we expect:
It all works, Lynne has been masked to Muna (and the rest has been masked too!), and I didn’t need to have Data Masker installed on a VM in my environment, with an Azure DevOps self hosted agent to run it – I could just do it programmatically. #Winning.
I’ll put the full YAML below however some caveats:
You cannot run this pipeline a LOT in a short space of time, I found that out. The Redgate downloads page is not particularly designed for this process, so it should probably be run sparingly, otherwise the pipeline times out because the download server, by way of load balancing, will prevent you from pulling the file too often.
You will need to update the PowerShell step reasonably often if you want the most recent Data Masker installer, or make it more future-proof so it grabs the latest version automatically – I just didn’t investigate that.
Data Masker is dependent on the power of the machine it is running on in MANY ways, and Azure DevOps pipeline VMs are not particularly the most powerful beasts in the world – so if you have a lot of masking that needs doing, I would be wary of this method and might stick to an Application Server VM you’ve got hanging around, just in case.
This is likely to change a lot in the future and may not be relevant when you’re reading it a few months from now – so before implementing anything like this, if it’s been a while, contact me or Redgate and just confirm there isn’t some better way of doing this, if I haven’t already blogged it!
Greetings one-and-all! It’s been a long time hasn’t it?
Well – I’ll be honest. I’ve not been great… I’ve been struggling. With the pressures of everything going on; from National Lockdown 3: The Lockdown Strikes Back, to working hard, and trying to lose weight it’s all been a bit too much – but that’s not really any excuse for neglecting y’all is it?
As we go further into 2021 I can’t promise that you’ll see LOADS more here, but I’m determined to bring you something, and so long as it helps 1 person out there, I’ll consider it a success.
Well, I was asked a question yesterday I wasn’t really sure of the answer to, but it is definitely a nice-to-know and I thought I would share it with all of you.
“If I have multiple SQL Clone Servers, or even just a new SQL Clone Server, how do I move the teams and permissions from one to the other without manually setting everything up again?”
Quite the head-scratcher indeed!
Well fortunately we know where to start – in SQL Clone every action, every change is recorded by virtue of a back-end SQL Server DB called the SQL Clone Config DB (you may remember this from such classics as Using things weirdly part 1!) and this holds all of the information we might need to move things over.
Let’s start by taking a look at the schema in the DB:
There is already fairly comprehensive documentation on moving the whole SQL Clone Config DB and even the SQL Clone Server to another instance here, but we ONLY care about users and permissions for this particular run, so we’ll focus on that.
The schema is reasonably straightforward and well-named thankfully, which means we can isolate the tables we’ll need for the move. Anything that mentions Teams or Principals (Users) is generally going to be in our remit. I took a reasonably fresh install of SQL Clone and added a few users and teams (I’ve been watching a lot of Sword Art Online lately, so you might sense a trend with these teams…)
Now when we investigate some of the tables in the config DB we’ll notice that everything is really nicely normalized and mapped together:
But what does this look like in a fresh install with NO teams defined?
Wait… no teams defined / activated, but there’s… a team. The default team.
It turns out once you hit the “Enable Teams” option in the SQL Clone Dashboard, that entry magically goes away though and it looks more like we’re expecting:
Now we know the lay of the land – how do we move our users? Well to answer that I’m going to work on 3 assumptions here:
I’m going to stick to just these 3 tables (Principals, Teams and the mappings). There’s a wider question around Images and Clones etc. that we’ve added for those teams and any overlap that may occur, but we’re working on the belief that the SQL Clone Server we’re moving to is segregated and independent (so will not have access to the same images), so it only needs the users and teams.
The target SQL Clone Config DB we’re moving these to is brand new, and does not already have any users/permissions or teams set up except clicking the “enable teams” button in the dash.
The same user has set both SQL Clone Servers up, and is therefore position 1 in the dbo.SqlClonePrincipals table – however you could easily overwrite this one value manually if needed, or use a local copy with the same #1 user to generate the script initially. E.g. Chris.Unwin is ID 1 in both the Source and Target Config DBs below, and then just add in the missing user manually to the target config DB.
1. SQL Data Compare – my recommended approach
SQL Data Compare is our friend here. It knows all about constraints and identities and anything we may come across. This is a fairly small-scale need, however, so if you open it up and simply compare those 3 tables we’ve been looking at, it should be able to pick up the users and permissions to insert, and quickly and easily generate us a script we can run against the target Config DB:
I’m going to generate the SQL Script and then run it against my target – notice how Data Compare takes care of any issues we might face with constraints and identity inserts to make sure everything 100% ties up? Good isn’t it!?
This was my script:
SET NUMERIC_ROUNDABORT OFF
GO
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS, NOCOUNT ON
GO
SET DATEFORMAT YMD
GO
SET XACT_ABORT ON
GO
SET TRANSACTION ISOLATION LEVEL Serializable
GO
BEGIN TRANSACTION
PRINT(N'Drop constraints from [dbo].[TeamSqlClonePrincipalMappings]')
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] NOCHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_SqlClonePrincipals_SqlClonePrincipalId]
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] NOCHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_Teams_TeamId]
PRINT(N'Drop constraint FK_TeamImageMappings_Teams_TeamId from [dbo].[TeamImageMappings]')
ALTER TABLE [dbo].[TeamImageMappings] NOCHECK CONSTRAINT [FK_TeamImageMappings_Teams_TeamId]
PRINT(N'Drop constraint FK_TeamInstanceMappings_Teams_TeamId from [dbo].[TeamInstanceMappings]')
ALTER TABLE [dbo].[TeamInstanceMappings] NOCHECK CONSTRAINT [FK_TeamInstanceMappings_Teams_TeamId]
PRINT(N'Add rows to [dbo].[SqlClonePrincipals]')
SET IDENTITY_INSERT [dbo].[SqlClonePrincipals] ON
INSERT INTO [dbo].[SqlClonePrincipals] ([Id], [FriendlyName], [Role], [Sid], [IsGroup]) VALUES (2, N'RED-GATE\Chris.Kerswell', 2, 0x010500000000000515000000fd43461e19525f12828ba62895680000, 0)
INSERT INTO [dbo].[SqlClonePrincipals] ([Id], [FriendlyName], [Role], [Sid], [IsGroup]) VALUES (3, N'RED-GATE\David.Ong', 3, 0x010500000000000515000000fd43461e19525f12828ba628cb7b0000, 0)
SET IDENTITY_INSERT [dbo].[SqlClonePrincipals] OFF
PRINT(N'Operation applied to 2 rows out of 2')
PRINT(N'Add rows to [dbo].[Teams]')
SET IDENTITY_INSERT [dbo].[Teams] ON
INSERT INTO [dbo].[Teams] ([Id], [Name], [Color], [IncludeAllPrincipals], [IncludeAllImages], [IncludeAllInstances]) VALUES (2, N'TeamKirito', N'#373737', 0, 0, 0)
INSERT INTO [dbo].[Teams] ([Id], [Name], [Color], [IncludeAllPrincipals], [IncludeAllImages], [IncludeAllInstances]) VALUES (3, N'TeamAsuna', N'#fc9003', 0, 1, 1)
SET IDENTITY_INSERT [dbo].[Teams] OFF
PRINT(N'Operation applied to 2 rows out of 2')
PRINT(N'Add rows to [dbo].[TeamSqlClonePrincipalMappings]')
INSERT INTO [dbo].[TeamSqlClonePrincipalMappings] ([TeamId], [SqlClonePrincipalId]) VALUES (2, 3)
INSERT INTO [dbo].[TeamSqlClonePrincipalMappings] ([TeamId], [SqlClonePrincipalId]) VALUES (3, 2)
PRINT(N'Operation applied to 2 rows out of 2')
PRINT(N'Add constraints to [dbo].[TeamSqlClonePrincipalMappings]')
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_SqlClonePrincipals_SqlClonePrincipalId]
ALTER TABLE [dbo].[TeamSqlClonePrincipalMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamSqlClonePrincipalMappings_Teams_TeamId]
ALTER TABLE [dbo].[TeamImageMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamImageMappings_Teams_TeamId]
ALTER TABLE [dbo].[TeamInstanceMappings] WITH CHECK CHECK CONSTRAINT [FK_TeamInstanceMappings_Teams_TeamId]
COMMIT TRANSACTION
GO
and success! Everything has been replicated as we expect:
I also tried this on a VM that does not have the same AD access as I do locally and the users were imported successfully, but because they can’t be resolved they don’t show in the Clone dash so it doesn’t cause us any dead or invalid relationship issues.
2. & 3. Script from one DB to the other or SSIS load
Another simple-ish option would be to write the INSERT INTO SELECT FROM statement, or create a very simple SSIS package if you can connect to both – you would have to take care of the Constraints and Identity Inserts yourself, but it would be equally effective:
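If you went down the scripting route, it would look something like this – a sketch only, assuming a linked server on the target instance (I’ve called it SourceClone) pointing back at the source config DB; the linked server name, config DB name and instance name are all placeholders for whatever yours are called:
# Sketch: copy principals, teams and mappings via a linked server called "SourceClone"
# (linked server, config DB and instance names are placeholders - use your own), taking
# care of the identity inserts and FK order just like the Data Compare script above
$copySql = @"
SET IDENTITY_INSERT dbo.SqlClonePrincipals ON;
INSERT INTO dbo.SqlClonePrincipals (Id, FriendlyName, [Role], [Sid], IsGroup)
SELECT Id, FriendlyName, [Role], [Sid], IsGroup
FROM [SourceClone].[SQLCloneConfigDB].dbo.SqlClonePrincipals src
WHERE NOT EXISTS (SELECT 1 FROM dbo.SqlClonePrincipals t WHERE t.Id = src.Id);
SET IDENTITY_INSERT dbo.SqlClonePrincipals OFF;
SET IDENTITY_INSERT dbo.Teams ON;
INSERT INTO dbo.Teams (Id, [Name], Color, IncludeAllPrincipals, IncludeAllImages, IncludeAllInstances)
SELECT Id, [Name], Color, IncludeAllPrincipals, IncludeAllImages, IncludeAllInstances
FROM [SourceClone].[SQLCloneConfigDB].dbo.Teams src
WHERE NOT EXISTS (SELECT 1 FROM dbo.Teams t WHERE t.Id = src.Id);
SET IDENTITY_INSERT dbo.Teams OFF;
INSERT INTO dbo.TeamSqlClonePrincipalMappings (TeamId, SqlClonePrincipalId)
SELECT TeamId, SqlClonePrincipalId
FROM [SourceClone].[SQLCloneConfigDB].dbo.TeamSqlClonePrincipalMappings;
"@
Invoke-Sqlcmd -ServerInstance 'TargetInstance' -Database 'SQLCloneConfigDB' -Query $copySql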
Important Note(s)
You will have to manually tie up any differences if you have already started using the new install with Teams and Permissions as THAT will get MUCH more complicated, especially if you have already assigned images and instances and users to teams – so this process is best done at the very start of life for the new SQL Clone Server.
Also, you will have to restart the SQL Clone Server service after loading the users and teams in, as SQL Clone won’t recognize they’re there straight away.
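(Something like the below does the trick – the display name filter is an assumption on my part, so check services.msc for exactly what your SQL Clone Server service is called:)
# Sketch: bounce the SQL Clone Server service so it picks up the new users/teams
# (the display name filter is an assumption - verify it in services.msc first)
Get-Service -DisplayName 'SQL Clone*' | Restart-Service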
Conclusion
This is going to be a rare occasion for most – the SQL Clone Server is not designed nor intended to be duplicated many times. Although there is a relatively easy method for MOVING SQL Clone Servers, simply moving a part of the config like Users & Teams is definitely possible, but it is not 100% fool-proof – so on said rare occasion, just be very, very careful!
“Life’s under no obligation to give us what we expect.” – Margaret Mitchell
Hello everyone and happy 2021! Wait, what? It’s FEBRUARY!?! Oops.
You may well have noticed I have not been around for a couple of months. That is because I’ve been struggling. A lot. We’ve had some home worries, and with the state of the world playing on our minds constantly, it has never felt right to sit down and blog anything other than some of the recipes I’m working on (purely to get out of my own head!) but this week I was asked about rollbacks when working in the Migrations based approach with SQL Change Automation, and I couldn’t resist writing something.
Before I get into this though, I want to be very clear about my preferences. I don’t believe database changes SHOULD ever be rolled back. Always forward. If for any reason you need to undo a change, either for a disaster or just because you want to, and other code is being rolled back – then this should be a task. A rollback branch should be created, the respective changes should be made in dev and then, using your fabulous, flexible and repeatable pipeline, they should be merged, built, checked, tested and deployed upstream to Prod. This rolls the database forwards and does so in a way that is controlled, auditable and prevents data loss.
However there MAY be some scenarios where a rollback is unavoidable, for example if you are required to do so immediately, or you’re not sure which of the 3 changes that were just deployed actually caused something to go bang, so you want to blanket undo until you can investigate properly.
With this in mind then, there are a couple of ways of rolling back in the migrations approach that can absolutely be part of your pipeline, one or two of which I want to try and flesh out in this post. This post would of course be much shorter if we were working with the state-based approach, as we would simply need to compare a previous artifact with our target and generate a down script – but as we know, the state-based method does indeed come with its own challenges.
Generate down scripts up front – this is an option that is described fairly well in the SQL Change Automation documentation, and may be preferable to some, however this is limited to VS and does definitely include some additional work for developers, and potentially generates quite a lot of scripts that may have to be rationalized
Ad-hoc rollback with SQL Compare – again, this is described in the documentation but they will have to be changes that are then recaptured in the development pipeline in some way, plus this method is ideal for programmable objects but not so much for table changes.
But I think we’re onto something there with a combination of these 2 approaches. Is there a way to use SQL Compare and SQL Data Compare alongside SQL Change Automation, to automatically generate down scripts as artifacts, before each Production deployment? This is what I want to investigate.
Set Up
As usual, I set up an Azure DevOps repo (this time called RollBackPipe) and created myself a set of local databases that I’ve been using to record videos lately so I now have the MaskMePlease database in Dev, Test, Staging and Production flavors. I’m doing this locally on my machine with an Azure DevOps agent I had hanging around anyway – it saved me firing up a bunch of Azure SQL DBs, and I just used SQL Clone.
Implementation
I got the pipeline running straight through, I was simply using YAML for this pipeline as I didn’t want to “faff” around with the Release stages etc. but this could also have been achieved with Classic Pipelines of course.
Ignore the terrible stage names, essentially I was doing 3 things with SQL Change Automation:
Database Build: Producing a Nuget Package to represent the successfully building files stored in my Git repo, to consume later in the pipeline
Create Release Artifact: This stage looks at the target environments (Test, STG and Production) and creates a number of reports in my local DatabaseDeploymentResources folder for me to review
Deploy From Release Artifact: Take the patch script that has been produced and run this against the target DB
So it was time to try and build in the SQL Compare Command Lines. Note if I was doing this on an Azure Pipelines agent I might need to run this as a container or programmatically install the command lines instead, but fortunately they’re just installed on my machine so I could just add a command line stage.
There were actually a couple of ways to do this: I could either pass in the details of the 2 databases completely via the command line (full documentation here), generate an XML config file to pass in and store that in version control, or pre-construct a SQL Compare / Data Compare “project” and just pass that in. The latter seemed easier, so of course I did that!
I opened up SQL Compare and did a Database to Database comparison for Staging and Production, but in the reverse direction to ensure we create a down script and not an up script!
I then saved this as a project, then using the red SQL Compare logo in the top left I was able to launch SQL Data Compare with the same project, so it loads the settings for you. The reason for doing this is that if we undo anything that was created as part of a migration, then try to redeploy it, it won’t actually recognize those features as missing, because the Migration Log table tracks all the migrations, and it will still be saying that it has been deployed. So we need to compare the MigrationLog tables to re-sync back to the previous state, before we ran the migrations.
However I DID make sure that before comparing I only included the __MigrationLog table in the comparison, in Tables and Mappings:
I then had to change the Comparison Key being used for the migration log table to [Custom], because the default comparison key includes the date/time executed, so this is what I went for:
I saved this project down with the Compare script:
and then following the rather helpful documentation on the command lines, I was able to make the following calls as part of my pipeline (YAML included):
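The calls themselves boil down to something like this – a sketch, with my local paths and example project names, and only the switches I needed; see the command line documentation for report output options and how exit codes behave when the databases are identical:
# Sketch: generate the two "down" scripts from the saved Compare / Data Compare projects
$artifactDir = 'C:\DatabaseDeploymentResources\RollbackArtifacts'
# schema down-script (Staging -> Production compared in reverse, as saved in the project)
& 'C:\Program Files (x86)\Red Gate\SQL Compare 14\SQLCompare.exe' /project:'C:\DatabaseDeploymentResources\RollBackPipe.scp' /scriptfile:"$artifactDir\SchemaDownScript.sql" /force
# __MigrationLog re-sync script, using the custom comparison key described above
& 'C:\Program Files (x86)\Red Gate\SQL Data Compare 14\SQLDataCompare.exe' /project:'C:\DatabaseDeploymentResources\RollBackPipe.sdc' /scriptfile:"$artifactDir\MigrationLogDownScript.sql" /force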
and sure enough, having this run after the deployment to staging but before the deployment to Production I was able to get the reports being produced into the artifacts folder with the relevant changes document:
So it was time to try it out! I created a BadTable on Dev and generated the migration for it, and committed it to source control:
This then triggered the pipeline to run:
and produced the correct reports and .sql scripts:
Then I ran these scripts to rollback Production (of course I could have modified them to preserve any data I needed to retain, this actually works best with Programmable objects like sprocs). Then I re-ran the deployment pipeline and the BadTable lives once again!
So does this work with multiple changes too? I created a stored procedure called “deletemeplease” and modified our BadTable to add a “randomthing” column, to see if we could roll those back:
Production after deployment:
Production after rollback:
All I have to do to roll forwards again now is run my pipeline:
and they’re back:
Conclusions
Is it perfectly possible to have a version specific script output as an artifact as part of your Azure DevOps pipelines? Yes it absolutely is.
Is it perfectly possible to rollback changes that you’ve made when taking a Migrations-First approach to database development? Yes it absolutely is.
Would I recommend this approach? No. Absolutely not.
Just because this is possible doesn’t mean it is good process. I’ve been asked about it enough times that I was more curious than anything – however after actually IMPLEMENTING this process, it is clear to see there are many aspects of it that can fall down, and it makes it clearer now, more so than ever, that the best approach would be to either roll forwards from Dev, through the pipeline, in an automated, auditable and controllable way, or if we just need to roll back temporarily because of other code being rolled back, then use SQL Compare ad-hoc. This pipeline was very simple, with very few changes, and when I made mistakes I could rectify them easily myself – but with a larger dev team, when under pressure to fix things, and many changes to consider, this approach can very quickly fall apart.
“If you quit on the process, you are quitting on the result.“ – Idowu Koyenikan
SQL Provision is really cool. But you knew that didn’t you? It’s obvious – we get teeny-tiny clones, based on an image with completely sanitized data we can use for just about anything in dev and test, and if we break them? Boom! There’s a new one.
I’m not just talking about refreshing Dev & Test environments though, oh no! I’m talking:
Clones as baseline with SQL Change Automation – baseline scripts for projects are a thing of the past, goodbye invalid object headaches!
Clones every single time you switch a branch – keeping everything separate and not cross-pollinating database work between branches
Clones to check Pull Requests instead of relying solely on the code itself in Version Control
But one question always comes up about clones in any workflow and that is – how often should I refresh Images and Clones?
This question obviously depends a lot on the process, but in reality I think the question should be less about clones and more about the images themselves. Clones are transient and can be flipped at a moment’s notice, but the image, or the “clone tax” as Steve Jones calls it, is the thing that takes time, resource and space.
A: It depends on your use of the Clone – how often do you need up to date data?
As a rule of thumb though, I tend to see the following behaviours:
Customer Support – overnight during the working week: Where you have data that needs people to troubleshoot customer issues, it always helps to have data as close to now as possible to help resolve issues. You want an image on standby ready so that at any second a member of support can pull down a copy to look through (if it NEEDS to have sensitive data for this purpose, you can restrict who can create clones from these images by using SQL Clone’s Teams functionality)
BI / MIS and Report testing – once a week (if not more often): Business Intelligence and reporting workflows can just mean that you’re reading a lot from your clones in which case they should stay small and you should be able to move seamlessly between clones. But. If your ETL process puts a very heavy load on your clones (like truncating and re-populating tables) you may cause bloat and need to rethink your refresh frequency to be more often where possible, perhaps overnight so that any transformations are captured in the new images, and clones by extension.
“BAU” Development (Schema and Static Data Changes) – Every 1 or 2 weeks: If you’re not affecting a large number of changes to your clone, or they are limited to schema and static data only then you should be absolutely fine with a wider refresh cadence – keeping the clones around for the whole sprint or only refreshing once during the sprint can mean everyone more easily stays up to date with the same environment consistently.
Ad-Hoc and Test workflows – once per month: There are going to be times where you occasionally need a copy of the live DB, but the fact it is 99% similar in terms of schema, and the data is a few weeks out of date isn’t a big deal. You can pull one down from this “cold copy” for any kind of test, destructive or even to validate certain behaviors / sense check if an update or query will work. It’s also handy to maintain a slightly older copy where possible if you need to start digging into failed updates made in development, so need to have a milestone to compare from.
Again – these workflows may vary and you may have needs to be more or less frequent based on differences being recorded, bloat, space available on the fileshare etc. but generally I find customers are pretty happy with this.
Q: Once we have our refresh rate in place – how do we move developers across?
This is a great question I get a lot of the time, and it stems from the fact a developer may have made a few dozen changes to a clone, and then the frequent refresh rate blows their clones away (and they forgot to commit to version control – D’oh!) – so it’s important to bear in mind that development work, and as a result the cloning of environments is not “cut and dried“. We should give developers a chance to move across as-and-when they’re ready, so I often end up recommending the below workflow, to ease this process.
For the sake of this proposed workflow I’m assuming a couple of things:
The selected workflow is BAU Development and we want to refresh once per week
We have enough space available on our fileshare to allow for 2 (or more) distinct copies of the primary image
Clones are being delivered to jump boxes / VMs within the network that are always connected (and not developer machines), and we can control when they are deleted
We operate on a standard western work schedule where the week begins on Sunday, Saturday and Sunday are considered non-working days and developers typically work anywhere between 8am and 6pm
We create Image A of Primary Database from most recent backup file onto fileshare, applying data masking
Week 1 – Monday to Friday
Developers X, Y and Z create their own clones of Database A as they begin the working week
The clones are linked to a Git repo where, using SQL Change Automation, the developers commit all changes they make to their clones throughout the week
Developer X finishes with their changes, makes their final commit and push on Thursday and works on a different task on Friday
Week 2 – Sunday night
We create Image B of Database A – with slightly more up to date (and sanitized) data and capturing any deployed changes the team committed and pushed to git previously
We retain Image A for now but do a check for which developers have clones remaining (Developers Y and Z) and either nudge them in the team stand up that they only have a few days left, or automate the sending of an email to those developers warning them their clones are now 1 week old
Week 2 – Monday morning
Developer X creates their new clone from Image B and links it to Git ready to start making changes
Week 2 – Tuesday to Friday
Gradually over the course of the week as Developers Y and Z finish with their tasks and commit their changes they remove their clones and create new ones from Image B
A final reminder, as an email or a notification in MS Teams / Slack goes out on Friday morning that any clones of Image A will be deleted over the weekend
Week 3 – Sunday night
Image A with no clones remaining is deleted (or any remaining clones are deleted first) and Image C is created to begin the cycle again
Conclusion
Although this workflow requires the duplication of the central image, it has a number of benefits:
It is easily automated using PowerShell (there’s a rough sketch of the image step just after this list)
The source control process suffers minimal disruption and developers don’t need to rush to finish anything
We don’t accidentally destroy developer work – the onus is on the developer to ensure work is committed
If, for any reason the image creation process fails, you still have a persisting image, so you don’t prevent developers from doing any work / waiting for the image process to manually complete
Moving to newer clones is a more organic process
If you wanted to maintain an image throughout the week and refresh a second image overnight for more up to date data, you can simply re-purpose the above principles. This could then be used for a number of the different teams and workflows simultaneously
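For reference, here’s a minimal sketch of what that Sunday-night image step could look like with the SQL Clone PowerShell cmdlets – the server URL, machine, instance, file share and database names are all placeholders, and I’ve left the data masking modification out for brevity (see the SQL Clone documentation for adding that):
# Sketch of the weekly image creation - all names and paths below are placeholders
Connect-SqlClone -ServerUrl 'http://sqlclone.example.com:14145'
$instance = Get-SqlCloneSqlServerInstance -MachineName 'CLONEHOST' -InstanceName 'SQL2019'
$destination = Get-SqlCloneImageLocation -Path '\\fileshare\CloneImages'
# e.g. creates Image_B while Image_A (and its clones) are still in use
$imageName = "Image_{0}" -f (Get-Date -Format 'ddMMyyyy')
$operation = New-SqlCloneImage -Name $imageName `
    -SqlServerInstance $instance `
    -DatabaseName 'PrimaryDatabase' `
    -Destination $destination
Wait-SqlCloneOperation -Operation $operation
# the following weekend, once the old image has no clones left:
# Get-SqlCloneImage -Name 'Image_A' | Remove-SqlCloneImage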
Bonus Point – Naming Conventions
Many people choose to append the images they create with a date stamp like Image_A_16102020 so we know when it was taken and which is the latest. This is good practice, but be warned: if you’re using clones as a baseline or for branch switching etc. you will need to have a persistent name, else that link will break. An alternative is always having the same name for the most current image and then simply renaming the older image with the date/time stamp, e.g. Image_A is current, but before creation of a new Image_A, it is renamed to Image_A_16102020 – this will not disrupt the clones that already exist on it, and it allows you to always know which one is most recent.
“Choice is the most powerful tool we have. Everything boils down to choice. Every choice we make shuts an infinite number of doors and opens an infinite number of doors.” – Lori Deschene (https://tinybuddha.com/)
PLEASE NOTE (edit 18/12/2021): Two of the three (and soon the third) models below have been superseded by Flyway Desktop – please see my posts on Flyway Desktop setup (first here) and shadow database setup here to set up any workflows past today’s date; the below should only be used for legacy purposes or with Flyway Teams for PostgreSQL.
Picking a Set-Up
One of the hardest parts of my job is that at a moment’s notice we could be asked to walk through better database change management processes. That’s not the challenge – the problem is that it could be with any kind of tech stack. I might need a Git repo of some shape or form (Azure DevOps, plain ol’ Git, Bitbucket etc.), then a CI server of some kind (Azure DevOps, GitLab, TeamCity, Bamboo etc.) and finally something to handle releases (Azure DevOps, Octopus Deploy, Bamboo etc.) – this is fairly easy to reproduce in multiple combinations with automation, Terraform etc., but when you’re actually helping someone set it up – you’ve got to know where all the bits go.
The Redgate tools work with all of these options and combinations so making sure we’re setting everything up right usually means questions about the Repo/CI/CD tooling people choose.
The commonality above, and the one I run into the most for all 3 stages, is Azure DevOps. It’s straightforward to understand, all in the same place and just plain fun to use (AND it supports emojis ^_^).
Finally now, we have to pick a Relational Database Management System (RDBMS) to use – Redgate Deploy is one of the newest offerings from Redgate and it comprises capabilities for “Database DevOps” across MS SQL Server, Oracle Database and 18 (well, actually 19 now thanks to Flyway v7!) other RDBMSs! So instead of choosing, I’m going to pick the two key ones there, and one of the 18 others: MSSQL, Oracle DB and PostgreSQL.
One final question I had to ask of myself was what models I wanted to use. There are a couple of choices available within the Redgate solution, specifically for MSSQL and Oracle at the moment, so I decided that I would do State based deployments for Oracle and Hybrid deployments for MSSQL, given that PostgreSQL will have to be migrations anyway. Fear not though, the setup is not hugely dissimilar when it comes to the actual pipelines!
Setting up Azure DevOps Repos
This stage was relatively easy – I simply created 3 new projects in my DefaultCollection where I’m going to put the repos for each of the DBs.
and then I created 3 readme files, and cloned all 3 git repos down onto my machine as local repos:
and we’re ready to go!
A quick note: I’m using a mixture of Azure DevOps hosted (for PostgreSQL) and Azure DevOps Server locally installed on my Virtual Machine (for MSSQL/Oracle) with a local agent present to run everything below – you can adopt this methodology or you can use the hosted version, but for the Oracle solution below at least you will need a local agent available (unless you use the DockerHub Image for Schema/Data Compare).
Microsoft SQL Server
The first thing I need to do for all of these is to pick the databases I’ll be working on – for me I’m rather lucky as our demonstration environment has a rather nifty set of databases for me to choose from!
I’m going with SQL Source Control (the MSSQL State component in Redgate Deploy) and SQL Change Automation (the MSSQL Migrations component) both plugged into Management Studio (SSMS) with a set of databases called the ScaryDBA_Dev/Test/Prod environments (which I used SQL Clone to create the copies of), in homage to the wonderful Grant Fritchey.
So the first thing we need to do is get Dev under source control – we’ve refreshed back from Prod so there shouldn’t be any differences and we’re using the Hybrid model, so we’ll need to create the State first. I do this by going to SQL Source Control in SSMS, and linking my DB to Git, creating a State Folder in the top level of my local repo as I do so:
Then once linked I go ahead and source control the initial schema (not sure how? Watch the Redgate University videos here):
Next I setup my Migrations project using SQL Change Automation, creating the Migrations folder in the same top level of my local repo, but now instead of pointing to the database, I’m pointing to my SQL Source Control generated State folder:
Now at this point we get the options to choose filters and comparison options – I would recommend if you’re not sure speak to someone at Redgate or look up the documentation – I often see people wanting to filter out Security/Users/Roles at this stage so it might be worth a look! I just carried on as I only have a few objects anyway!
Connect to the target and create a baseline script (i.e. what does Prod look like now?) again, because I have a minimal setup I’ll go straight from my “Prod” database:
Commit and push and we’re on our way – everything is in version control:
Now I may have cheated by doing MSSQL first – because now actually building and deploying the project is pretty straightforward – much like I have done in previous posts here and here, I just used the SQL Change Automation plugins from the Azure DevOps marketplace to first build:
and then deploy the project:
and it all succeeded… the 2nd time around when I remembered to specify which DB I was deploying to!
Oracle Database
The first thing I need to do for all of these is to pick the schemas I’ll be working on… wait, Deja Vu! – well once again I have a little set of schemas present on the demonstration machine that will serve me just fine!
Because we’re working in the State setup, out of Redgate Deploy I’m going to use Source Control for Oracle which allows me to specify the remote repo, the folder to create and even the fact I’m using Azure DevOps Git:
(Step 1 was simply providing the connection details to my Oracle Database, hence why I was on step 2!) – I select the Schema I’ll be putting in Source Control and even get a nifty run down of the structure:
Hit next and give a name to the Project (unsurprisingly I went with HR) and then check in all of your initial objects:
Now, one thing that you may have noticed if you’re following along, and that I should clarify (I forgot it when setting up this blog post):
You don’t need to specify the local repo you cloned down because Source Control for Oracle handles this itself in the back end, if you want it to be part of a local repo with other code in it, use the Working Folder instead
If you are using Git and NOT the working folder, committing will also Push your objects to the remote – you’ve been warned!
As above, I now head over to Pipelines and hit Create New Pipeline! I check out my repo with the schema objects in it, and add a job to my agent. But what am I going to pick? Well unlike SQL Change Automation there’s not a plugin available on the Azure DevOps Marketplace, we’ll need some good old fashioned command line calls!
First, let’s clean out the CI schema. I’m going to use the drop-all-objects script from the Redgate documentation site and make a call to run it using sqlplus (I’m storing the file locally, but you could even include it in your repo under a build folder maybe?)
echo on
Call exit | sqlplus hr/[passwordredacted]@//localhost:1521/CI @C:\DemoFiles\DropAllObjects.sql
echo off
Next we’ll add a call to the command line of Schema Compare for Oracle to build the database from our repo, using the files that were checked out by the agent (an Azure DevOps pre-defined environment variable) – again we’re using a similar script from the Redgate DevOps for Oracle site, but because we’re deploying ALL objects from version control, we don’t really want a report per se; this is just to test the schema can be built from the ground up:
"C:\Program Files\Red Gate\Schema Compare for Oracle 5\sco.exe" /deploy /source $(Build.SourcesDirectory)\Schema{HR} /target SYSTEM/[passwordredacted]@localhost:1521/CI{HR} AS SYSDBA /indirect
echo Build database from state:%ERRORLEVEL%
rem IF ERRORLEVEL is 0 then there are no changes.
IF %ERRORLEVEL% EQU 0 (
echo ========================================================================================================
echo == Warning - No schema changes detected. ==
echo ========================================================================================================
)
rem IF ERRORLEVEL is 61 there are differences, which we expect.
IF %ERRORLEVEL% EQU 61 (
echo ========================================================================================================
echo == Objects were found and built. ==
echo ========================================================================================================
rem Reset the ERRORLEVEL to 0 so the build doesn't fail
SET ERRORLEVEL=0
)
and assuming this all works, we’ll package up the files into a zip and publish them as an artifact so we can consume them at the release stage!
and guess what? It all just worked *cough* on build #23 when I got the syntax right finally…
Of course we can add additional stages to the build as well, such as a check for Invalid Objects and some Unit Testing, but I’ll keep this pretty lean for now!
Now, just like we did for MSSQL we’re going to set up a new deployment pipeline, grab the artifact we’re publishing from the build, enable a CD trigger and we’re going to deploy to, in this case, Acceptance.
Let’s first create a job on the agent to unpack the zip file and see how far we get – I’m just going to dump them in a DeploymentState folder in the working directory:
and… awww thanks Azure DevOps, I needed to hear that!
and now we add yet another command line task, but this one is just going to do a comparison, it’s not actually going to deploy anything – because we’re going to add a manual intervention step to approve the deployment first! I had a little help again from the Redgate docs for this one, because I keep having to catch cmdline error codes – if I was wise like Alex Yates I probably would have just handled this with PowerShell…
echo off
rem We generate the deployment preview script artifact here
"C:\Program Files\Red Gate\Schema Compare for Oracle 5\sco.exe" /abortonwarnings:high /b:hdre /i:sdwgvac /source $(System.DefaultWorkingDirectory)\DeploymentState\Schema{HR} /target SYSTEM/Redgate1@localhost:1521/Acceptance{HR} AS SYSDBA /indirect /report:$(System.DefaultWorkingDirectory)\DeploymentState\changes_report.html /scriptfile:$(System.DefaultWorkingDirectory)\DeploymentState\deployment_script.sql > $(System.DefaultWorkingDirectory)\DeploymentState\Warnings.txt
echo Warnings exit code:%ERRORLEVEL%
rem In the unlikely event that the exit code is 63, this means that a deployment warning has exceeded the allowable threshold (eg, data loss may have been detected)
rem If this occurs it is recommended to review the script, customize it, and perform a manual deployment
IF %ERRORLEVEL% EQU 0 (
echo ========================================================================================================
echo == No schema changes to deploy
echo ========================================================================================================
GOTO END
)
IF %ERRORLEVEL% EQU 63 (
echo ========================================================================================================
echo == High Severity Warnings Detected! Aborting the build.
echo == Review the deployment script and consider deploying manually.
echo ========================================================================================================
rem Aborting deployment because high severity warnings were detected
SET ERRORLEVEL=1
GOTO END
)
rem This is the happy path where we've identified changes and not detected any high warnings
IF %ERRORLEVEL% EQU 61 (
echo ========================================================================================================
echo == Schema changes found to deploy - generating deployment script for review
echo ========================================================================================================
rem Set ERRORLEVEL to 0 so the build job doesn't fail
SET ERRORLEVEL=0
GOTO END
)
:END
EXIT /B %ERRORLEVEL%
I then throw in an agentless job (Manual Intervention Step) and then finally (once I have reviewed the deployment report that is produced) one further cmdline call to actually run the deployment script again my Acceptance target:
echo on
Call exit | sqlplus hr/[passwordRedacted]@//localhost:1521/Acceptance @$(System.DefaultWorkingDirectory)/DeploymentState\deployment_script.sql
echo off
I have saved my pipeline, now it’s time to test. So I’m going to make a very quick change (so that something is produced) and see what happens…
Boom. Pipeline done.
One word on this though – I haven’t included an awful lot of frills (error handling, checks, NuGet instead of Zip etc.) so you’re free to bulk this out how you see fit, but by golly it works! Also make sure you tick this on the second Agent Job, else it’ll wipe out your working directory – something that obviously definitely did not happen to me…
PostgreSQL
This one might be cheating a little. As you know, I’ve already set up a CI pipeline with Flyway before, using Azure SQL DBs and the Flyway Docker container as part of the build, and in some cases even tSQLt for Unit Testing too! But this is PostgreSQL, and this is a new blog post, darn it!
I started out by creating myself a PostgreSQL 10 server in the Azure Portal, because:
I can
I didn’t want a local install of PostgreSQL
I’m not self-sabotaging
and I set up a Dev and Test database on it – that is, once I remembered to allow my client IP address through the firewall *sigh* – and then connected from Azure Data Studio:
I already have some basic scripts from my last demo that I can use – so I pulled down the latest version of Flyway (V7) and unzipped it into my files:
Then I created a SQL folder in my local repository for the PostgreSQLPipeline and popped a couple of migrations in – I’m using the StackOverflow scripts, adapted for PostgreSQL from Kendra Little’s GitHub (thank you Kendra!). In the previous posts we’ve had to source control the state or initial baseline of the database; as we’re using Flyway for PostgreSQL, however, it’s up to us to create and name/order the migrations ourselves, so we have plenty of control over that – hence why we can jump straight into building some scripts this time around.
Finally, I pointed the Flyway config file at that folder, also taking the opportunity to point it at my Dev DB using the PostgreSQL JDBC connection string:
Now I didn’t really NEED to do this step and try things out against Dev, because I already have the scripts, so I could have just started building the pipeline – but it’s always worthwhile getting local validation first by running things against Dev and then migrating up!
A quick Flyway Info later and we were good to go – the scripts are recognized so we know we’ve set everything up correctly.
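If it helps, that local check from the command line looks roughly like this – the Flyway folder, server, user and migration path below are placeholders rather than my real values:
# Placeholder connection details for the Dev database on the Azure PostgreSQL server
$jdbcUrl = "jdbc:postgresql://myflywayserver.postgres.database.azure.com:5432/dev_db?sslmode=require"
# Ask Flyway which migrations it can see and what state the Dev DB is in
& "C:\FlywayCLI\flyway.cmd" info `
    "-url=$jdbcUrl" `
    "-user=flywayadmin@myflywayserver" `
    "-password=$env:FLYWAY_PASSWORD" `
    "-locations=filesystem:C:\Git\PostgreSQLPipeline\SQL"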
One git add / commit / push and everything is in my repo:
Now as you may know from my other post we can do 1 of 2 things here – we can now either build what we eventually push to the repo using a cmdline call (like we did with the Oracle build) to a machine where we have Flyway installed, or we can use the Docker image.
I’m actually going to use Docker again, but this time, instead of specifying the various credentials in a config file that was getting passed to the container, I’m going to use Azure DevOps environment variables and build the connection string that way. It’s really easy to keep the variables secret in Pipelines, so I can pass my JDBC connection, complete with username and password, as well as my Flyway license key, without worrying someone might get hold of them!
I’m actually going to build against a live PostgreSQL database before deploying, so I also created another DB for me to use: demodb_ci
I actually stole the YAML from my previous pipeline (below) and updated the variables accordingly:
and it ran just fine! Well, actually it failed first, because the IP address the container was running from wasn’t allowed through the firewall – but fortunately Azure has a handy switch in the PostgreSQL Server settings to simply allow Azure Services traffic through:
Once that was sorted, the first stage (as always) is to download Docker and then we have 2 Flyway containers steps:
1 – Clean the schema and make sure the database is empty
2 – Migrate the schema changes
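Stripped of the YAML wrapper, those two container steps boil down to something like the calls below – FLYWAY_URL, FLYWAY_USER, FLYWAY_PASSWORD and FLYWAY_LICENSE_KEY are names I’ve invented here for the secret pipeline variables, so map in whatever you’ve actually defined:
# Mount the repo's SQL folder into the container so Flyway can see the migrations
$sqlFolder = "$env:SYSTEM_DEFAULTWORKINGDIRECTORY/SQL"
# 1 - clean demodb_ci so every build starts from an empty schema
docker run --rm -v "${sqlFolder}:/flyway/sql" flyway/flyway `
    "-url=$env:FLYWAY_URL" "-user=$env:FLYWAY_USER" "-password=$env:FLYWAY_PASSWORD" `
    "-licenseKey=$env:FLYWAY_LICENSE_KEY" clean
# 2 - run every migration from scratch to prove the whole chain deploys cleanly
docker run --rm -v "${sqlFolder}:/flyway/sql" flyway/flyway `
    "-url=$env:FLYWAY_URL" "-user=$env:FLYWAY_USER" "-password=$env:FLYWAY_PASSWORD" `
    "-licenseKey=$env:FLYWAY_LICENSE_KEY" migrate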
Then we have two options – we could do what we did in the Oracle pipeline and zip up the files, spit them out at the Release stage and consume them there by calling Flyway from the command line, or we can go ahead and promote our deployment using the same pipeline.
I’m lazy, so I’m going for the latter!
In a normal “production like” situation I would probably take the opportunity to test and check etc. like I did above, but let’s keep this super lean – if the build works, I trust the deployment. Let’s go ahead and deploy to Production – I’ll add this as an additional task in my YAML:
And the deployment was successful! Phew – I think I’ve earned a cup of tea!
Conclusion
In this blog post I have demonstrated 3 different (and initially very simple*) approaches to the source control and deployment of database changes – but there’s actually a much wider set of combinations we could have adopted – all 3 models with MSSQL, all 3 models with Oracle, and Migrations for up to 18 other systems like DB2, Snowflake and even SAP HANA! But what did I need to do ALL of this? A single solution – Redgate Deploy**.
Thank you for stopping by! Have an amazing week!
*There is a lot missing from the code I have provided, like additional error handling, tests etc., and all of the above CAN be improved – but did we manage to build and deploy across three different systems all using Azure DevOps? Yes we did. If you intend on using any of the above, please ensure you build in the necessary controls and process around it, and always pick what is best for you and your team.
**Redgate Deploy is going from strength to strength, expect to see a wide range of improvements made over the coming months – I won’t be surprised if this blog post is already out of date by the time I finish writing it – that’s how awesome the teams working on all of this are!
“Keep it simple, stupid!” – My year 12 & 13 English Lit. Teacher
Recently I’ve been fascinated with something really cool. A couple of my colleagues at Redgate wrote a GitHook which allows you to easily switch branches using Redgate SQL Clone. You can see the hook here with full instructions – and I thought it was pretty neat.
But it got me thinking – I’ve posted a lot about scenarios where people are using just Azure SQL DBs (PaaS) for Masking and DB Change Automation, but when you’re using Azure SQL for Dev and Test DBs as well as Prod, you still don’t have the agility you would expect from a local copy, like a clone.
But, the above GitHook leverages PowerShell (among some other fancy wizardry) so, what if we could do this exact same thing, using the PowerShell Az module to dynamically create and switch Azure SQL DBs in our own private resource groups every time we checkout a branch?
I don’t imagine it would be fast because I’m restricted to using very low tier SQL DBs by my tiny allowance of (as Kendra Little calls them) “Azure Bucks”, but it should absolutely be possible!
So I decided to write a PowerShell script to do just that.
The first question I had to answer was: how do I replace the Clone “Image” in this process? I need something that is effectively a copy of our Production environment (or as near as possible) so we have something to base EVERY copy from – so I created the idea of a Golden Copy within the script. This golden copy could be created by masking and copying back down from Production using something like Redgate Data Masker and my scripts here in GitHub, but as a stop-gap, if the script doesn’t find one in the Resource Group and Server you select, it will simply create one from your Dev DB. Best answer I could think of – you’re welcome to improve it!
All that remained was effectively to go through and just mimic the functionality of the Clone script but using Az: so if you are switching to a new branch where you don’t already have an existing Dev DB, then you get a new copy of Golden. If you’re switching to a branch you’ve checked out before, it renames the DBs to swap you back to the correct branch.
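To give you a feel for it, here’s a very cut-down sketch of the “new branch” path using the Az.Sql cmdlets – the resource group, server and database names are placeholders, and the rename/swap-back logic in the full script is only hinted at in the comments:
Import-Module Az.Sql
Connect-AzAccount | Out-Null   # this is the sign-in prompt you see on checkout
# Placeholder names - update these for your own environment
$resourceGroup = "DMDatabase-RG"
$server        = "dmdatabase-server"
$devDb         = "DMDatabase_Dev"
$goldenDb      = "${devDb}_Golden"
# Work out which branch we've just checked out and make it safe to use in a DB name
$branch   = (git rev-parse --abbrev-ref HEAD) -replace "[/\\]", "-"
$branchDb = "${devDb}_${branch}"
# Stop-gap: if no Golden copy exists yet, create one from the current Dev DB
if (-not (Get-AzSqlDatabase -ResourceGroupName $resourceGroup -ServerName $server `
        -DatabaseName $goldenDb -ErrorAction SilentlyContinue)) {
    New-AzSqlDatabaseCopy -ResourceGroupName $resourceGroup -ServerName $server -DatabaseName $devDb `
        -CopyResourceGroupName $resourceGroup -CopyServerName $server -CopyDatabaseName $goldenDb | Out-Null
}
# New branch with no copy yet? Spin one up from Golden. (The full script then renames databases
# so the copy for whichever branch you're on is always the one called DMDatabase_Dev.)
if (-not (Get-AzSqlDatabase -ResourceGroupName $resourceGroup -ServerName $server `
        -DatabaseName $branchDb -ErrorAction SilentlyContinue)) {
    New-AzSqlDatabaseCopy -ResourceGroupName $resourceGroup -ServerName $server -DatabaseName $goldenDb `
        -CopyResourceGroupName $resourceGroup -CopyServerName $server -CopyDatabaseName $branchDb | Out-Null
}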
Here is an empty AzureSQL DB called DMDatabase_Dev:
When I now issue the git checkout “feature/newfeature” command it asks me to sign in to my Azure account:
and then gets to work:
And… that was it really.
I now have an Azure SQL DB called DMDatabase_Dev_master as I switched from the master branch, and I have a brand new DMDatabase_Dev DB that I can use for my feature branch. You’ll notice I didn’t include -b in my git command; let’s assume a colleague is already working on this branch. I can now just update my copy (of my golden copy) with their work:
And we’re good to go!
But now if I switch back to my main branch, the object is gone and I can carry on with work on this branch:
It was really straightforward – I can’t believe I haven’t seen this in use in more places, but hey, guess what? The PowerShell is yours right here if you want it:
You should have a Dev DB and you should update the values at the top of the script with the Dev DB name, server and resource group it is in
The script makes reference to and creates a Golden Copy DB so that you always have something consistent to create from – so when you get started, create your own “Golden Copy” back from Test/UAT or something if you can (maybe using the scripts mentioned above) – it should be the name of your Dev DB appended with “_Golden”
If you want to change how it is authenticated so you don’t have to enter your credentials each time, then go for it – this was just the simplest method for me (and it’s currently 11:05pm so I’m going to bed!)
Feel free to improve it, I’m sure there are plenty of improvements that can be made, but it’s a starter for 10 for anyone out there just getting started with development in Azure SQL. Plus it’s kinda neat!
“Is there any way that I can use Redgate SQL Change Automation with Visual Studio based SSDT?“
It’s always a really difficult question to answer because fundamentally SQL Source Control (Redgate’s state-based tool) and SSDT (Microsoft’s state-based tool) functionally seek to do the same thing, making them competitors. However there are, on the odd occasion, good reasons why I am asked the question, and one of those same scenarios came up today:
Our developers work in Visual Studio and have already been using SSDT for a long time, it works for them, we just want to extend it with Migrations to handle complex changes.
So the option here is, leave it as it is, or try to work with both. Not always going to be my first choice but it got me thinking.
Starting from a memory
A few months ago, when life was “normal” and I was working in my office with *gulp* PEOPLE, I tried to make this scenario work by simply linking SQL Change Automation to the project folder created by SSDT but unfortunately it was riddled with problems. The SSDT importer and repo look like this:
And if you point SQL Change Automation at the local repo with this in it will correctly tell you:
Now of course this wasn’t unexpected. It’s not designed to work this way, is it? No. But way back then I did figure out, shrewdly, that if I used SQL Source Control to carry out an initial commit just to a working folder, it would generate a RedGate.ssc and RedGateDatabaseInfo.xml file, which you could copy into the SSDT repo to trick SQL Change Automation into thinking that it’s a SQL Source Control repo… unfortunately this trick no longer works. Sad.
Add a hop and a step
But what got me thinking today was the context with which the question was asked. It was more about separation of duties. Once the developers have delivered their change into the repo, their job is effectively done! “That’s how it should look moving forwards. What’s next?” – and then I had an idea.
Given that SSDT allows you to push and pull the code and apply it to your own database, what is stopping us from using SQL Change Automation to pick up the changes on the database we sync our SSDT project to?
Genius. Evil genius.
So I created a new Database to simulate having another developer on my instance and gave it to Peter Parker:
You can then do a schema compare to another DB from your project, effectively PULL down changes from the remote to your local repository, and then sync them back up to your local development DB; this is how Devs stay up to date with each other but could, in this methodology, be how DBAs or senior developers pull down the changes to their local DB, where they test the new state, and then generate a new migration from it.
So I made a change on my dev database and captured it in the project (right click on the project name > schema compare > dev db compare to project > update) and then committed and pushed:
and sure enough my repository was updated:
But then I simulated pulling down the change and applying it to Peter Parker’s DB (again using Schema Compare) and then I created a SQL Change Automation project in VS, in the same solution but pointing the project to a migrations folder in the repo:
Yes, I accidentally called the project Database1 – don’t remind me, I’m embarrassed enough!
Then I added my baseline database:
It created the baseline and the project immediately with no issues and picked up on the changes I had made using SSDT:
and I was able to commit my project and changes into my repository in Azure DevOps:
It was just that easy! Now what this means for the development process is that developers _could_ feasibly work with SSDT, as they are comfortable with it, and then more senior members of the team can generate migration scripts from there, building the database from scratch and deploying in a reliable, repeatable fashion.
Just to prove to you my build even ran green from this:
So in summary, what this gives us is the ability to adapt a regular SSDT workflow – one that developers are comfortable with and which may have been in the team for months or years – add in the knowledge of DBAs or team leads, gain a greater separation of duties for high-risk schema changes, and get the control and flexibility (and peace of mind) that comes with a migrations-based deployment process.
Nice.
The fine print
I’m sure by now you’ve realized something: this is not, nor will it ever (I believe) be a supported workflow. If you implement the above in a production sense for something other than just testing then it’s not something you’ll be able to get help with from one of the Redgate engineers if you need to troubleshoot.
Also, if you’re going to introduce a sequence of changes like this to achieve the hybrid model, it does make more sense that you implement SQL Source Control for the state side (given that it’s right there in the SQL Toolbelt with SQL Change Automation anyway).
But IS IT POSSIBLE to achieve a similar, Visual Studio based* hybrid workflow with SSDT and SQL Change Automation by using a database to ‘hop’ the changes across?
Yes, it certainly looks that way!
*If you’re planning on using SSDT in Azure Data Studio too then this workflow could also work for you – SQL Change Automation is present in SSMS and VS, so it’s really up to you!
“You never know what you can do until you try, and very few try unless they have to.” – C.S. Lewis
Well, I don’t have to, but many of the people I speak to on a daily basis are moving into GitLab, so it’s about time I tried it! Consider this post a testament to the mistakes I make as I try to set up a full end-to-end database change management process with SQL Change Automation and GitLab.
PLEASE NOTE (edit 18/12/2021): If you are just starting out with Redgate source control and deployment processes and are looking into using GitLab for database deployments, please read my updated blog post here using the newest source control and deployment technology, Flyway Desktop.
Will it all work perfectly? I don’t doubt that everything will fall over at some point, but let’s see how we get on all the same, and hopefully if you’re setting up this same pipeline, you’ll be able to avoid the errors and failings I inevitably cause! So here we go!
Let’s set up a GitLab Project (and rename the default branch)
Naturally, I didn’t have a GitLab account, so I had to set one up. I’m assuming that if you’re using it already, or you’ve just started using it, you’re taking advantage of the more business-oriented features, but I’ve just stuck with the good ol’ free account for now! It was remarkably simple: sign up, email address, confirm, and here we are:
Ok there is something very cool I like about setting up a new project, can you tell what it is?
You can set up a completely new blank project, but they have templates, you can import projects OR – and I love this – you can set up a full CI/CD pipeline from another repo! Having done this before in Azure DevOps it was not easy, let me tell you. It really seems like Azure DevOps hates you for setting up CI/CD from an external repo, even though it has plenty of helpful ways of doing so!
So I initialized my repository with a README and updated it:
Don’t ever say I’m not descriptive enough!
The first thing I did was rename my default branch to ‘trunk’ by going to Branches, creating the new branch, then in Settings > Repository changing it to the default, and swapping the protected status over from the now-outdated master:
Then finally delete the old default in Repository > Branches:
Excellent. Now it’s time to clone trunk onto my machine as we will need the local repository to put our change automation project in!
I created a folder called GitLab test and cloned the mostly empty repository into it:
Simple!
Create a new SQL Change Automation project and push it to trunk
In SSMS I opened up the most recent version of SQL Change Automation and created a new project called “DoggosAreCoolDB”, using a copy of a Dev database I had lying around from a previous demonstration (BlogsDotRedgate):
Then I created my baseline as a migration script against the up-stream copy, BlogsDotRedgate_Integration, because who has access to Prod for this? Am I right? *cough & shifty eyes* not me!
I successfully generated my baseline and a change script (I added a column to a table, nice and simple) and then committed them to my local repo, and pushed! Forget branching, today isn’t about that, we’re just PUSHING TO TRUNK, WOO-HOO!
Setting up the CI/CD Pipeline
Now that we have our project and migrations in GitLab we can build out a pipeline! So first stop, I went straight to CI/CD > Pipelines and hit “Get Started”:
They immediately throw you into a Quick Start “Help” style guide, which is a little unintuitive at first but surprisingly helpful if you read the whole thing. Effectively we need a YAML file called .gitlab-ci.yml that will store our pipeline as code, telling GitLab how and where to build, and we need a runner to actually fire up and execute these steps.
In my experience with some other CI/CD tools, it’s been advantageous to actually create the Runner / Agent first on the machines you’re going to be using, so as I just have my laptop to do this on, I will set one up on there! I found the full documentation for a Windows Runner here, and followed it just so I would have it available.
The GitLab Runner was up and running in my services, but I’ll be darned if I could see it anywhere in GitLab…
Aha! So it turns out, after a bit of digging, that you need to register the runner specifically using the CI/CD section in the project settings – that was probably my bad for not reading the documentation thoroughly, but my counter-argument… who actually does? So I issued the register command, applied tags and a description and chose my runner type; I chose shell because I need to be able to run PowerShell on the machine (I’ll need the SQL Change Automation PowerShell components available on the machine where the Build and Deployment are happening, of course):
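For completeness, the register call (run from the folder where gitlab-runner.exe lives) looks something like this – the URL and token come from the project’s Settings > CI/CD > Runners page, and the description/tags are whatever you fancy:
# Register this machine as a project runner using the shell executor with PowerShell,
# so the jobs can call the SQL Change Automation cmdlets directly
.\gitlab-runner.exe register --non-interactive `
    --url "https://gitlab.com/" `
    --registration-token "<token from Settings > CI/CD > Runners>" `
    --executor "shell" `
    --shell "powershell" `
    --description "My laptop runner" `
    --tag-list "sca,windows"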
I’ve got the runner on the machine, I’m using an instance of SQL Server to build against, now I just need the YAML file (fortunately GitLab has full documentation for how to structure this as well!)
So that I can build my project, I’m going to need to know where the repo is cloned to during the process (i.e. to find the .sqlproj file). A bit of digging turned up a list of environment variables that can be used in the YAML file; just to be sure though, I created and committed the most basic YAML file that would simply echo back the location of the cloned files:
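With the shell executor set to PowerShell, the script behind that throwaway job is nothing more exciting than this:
# GitLab exposes its CI/CD variables as environment variables on the runner,
# so this tells us exactly where the repo (and therefore the .sqlproj) ends up
Write-Host "Repo cloned to: $env:CI_PROJECT_DIR"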
Once this confirmed that the environment variable worked and the build pipeline was firing up correctly on my private runner, I tried something a little more ambitious: building the .sqlproj file, using the cmdlet reference from the SQL Change Automation documentation for help:
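That more ambitious step boils down to a single cmdlet call – the project path and local instance below are my guesses/placeholders, so adjust them to your own layout:
# Validate the SQL Change Automation project by building it against a temporary
# database on the local SQL Server instance the runner has access to
$projectPath = "$env:CI_PROJECT_DIR\DoggosAreCoolDB\DoggosAreCoolDB.sqlproj"   # hypothetical path
$validatedProject = Invoke-DatabaseBuild $projectPath -TemporaryDatabaseServer "Data Source=localhost"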
I’m still using the same machine for the release portion too, so naturally I can use the same runner for this; if you have other servers you’re deploying to, you will of course need additional runners.
We can very easily extend what we already have in our YAML file by just telling the process to create and export a new build artifact – I’m going to name it the same as everything else, and then append the BuildId to the end of the file name so we always get something unique:
You’ll notice how I’m exporting the NuGet package to the project directory and then uploading it. This is so that we’ll have access to it at release time, but also so that we can use the artifacts argument in our YAML to upload the file and make it a downloadable package through the GitLab interface (if you go to that SPECIFIC job):
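In cmdlet terms, the create-and-export portion of that job looks roughly like this – the paths are placeholders, and note the $env:CI_JOB_ID in the version, which turns out to matter shortly:
# Build, package and export the project as a NuGet package in the repo's Export folder,
# which the YAML's artifacts section then uploads for the later stages to consume
$validatedProject = Invoke-DatabaseBuild "$env:CI_PROJECT_DIR\DoggosAreCoolDB\DoggosAreCoolDB.sqlproj" `
    -TemporaryDatabaseServer "Data Source=localhost"
$buildArtifact = New-DatabaseBuildArtifact $validatedProject `
    -PackageId "DoggosAreCool.Database" -PackageVersion "1.$env:CI_JOB_ID"
Export-DatabaseBuildArtifact $buildArtifact -Path "$env:CI_PROJECT_DIR\Export"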
Whilst we’re on a roll here (and things haven’t gone wrong for a while) I’m going to add 2 additional stages ALL AT ONCE to both “Create a Database Release Artifact” and “Deploy from a Database Release Artifact” using, once again, the SQL Change Automation PowerShell cmdlets.
Woo-Hoo! I’m invincible!
I broke it.
Can you see what I did wrong? The error is:
New-DatabaseReleaseArtifact : The specified value for the Source parameter is neither a valid
database connection string nor a path to an existing NuGet package file or scripts folder:
'CI_PROJECT_DIR\Export\DoggosAreCool.Database.1.725147351.nupkg'
So, two fun things. 1) I forgot to reference an environment variable properly at one point, so it was just using the literal name of the variable in the path, and 2) it keeps erroring out saying my NuGet file isn’t a NuGet file – weird, right?
On further inspection it is yet another mistake I made. I’m using the job ID to name the NuGet package, which means when it tries to find the file it’s 2 steps ahead because each stage is counted as a different job! Duh!
A few quick changes should hopefully sort this out! I’m going to use the pipeline ID instead ($env:CI_PIPELINE_ID) and see if that makes a difference!
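For the record, here’s roughly the shape those two new stages take once the version uses the pipeline ID, so every job in the same pipeline resolves the same package name – the Integration connection and export path below are placeholders standing in for my real ones:
# Both stages in the same pipeline now resolve to the same package name
$packagePath = "$env:CI_PROJECT_DIR\Export\DoggosAreCool.Database.1.$env:CI_PIPELINE_ID.nupkg"
$integration = New-DatabaseConnection -ServerInstance "localhost" -Database "BlogsDotRedgate_Integration"
# Create a Database Release Artifact: the diff report, deployment script and supporting resources
$release = New-DatabaseReleaseArtifact -Source $packagePath -Target $integration
Export-DatabaseReleaseArtifact $release -Path "C:\DatabaseDeploymentResources\DoggosAreCool\$env:CI_PIPELINE_ID"
# Deploy from the Database Release Artifact to the Integration database
Use-DatabaseReleaseArtifact $release -DeployTo $integration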
Wait. Did it just say the pipeline ran? SUCCESSFULLY? That’s exactly what it said! We can verify that this actually happened by checking the DatabaseDeploymentResources folder for the Release Artifact to Integration:
And everything is there! Note you won’t have a changes.html report just yet because this is the first time we’ve successfully deployed to Integration, however if we run 1 more change through (I’ll add a stored procedure):
Boom.
Now of course we can add additional stages to this, for manual intervention or to promote to other environments, but I’m going to call it a win here and retire (until the next post) gracefully. I’m sure you’re all wondering what my final YAML file looked like too – well (counterintuitively) I’ve popped it all into GitHub for you and pasted it below. Enjoy!