Configuring Dynamic Data Masking in Azure SQL Database from SQL Data Catalog using PowerShell

Horror is the removal of masks.”
Robert Bloch

I spend a great deal of my time talking about Data Masking, don’t believe me? Checkout here, here, here and even here. I talk about it a LOT, but I’m always talking about Static Data Masking (SDM), which is the process of masking the data at the file level itself so it is irreversibly altered – this is fabulous for non-Production environments like Dev and Test, especially when you pair it with a good cloning technology.

But what about Staging / Production environments?

I often work with teams to implement SDM and one of the fastest routes to successfully generating your masking sets for cloned environments is, of course, SQL Data Catalog (or cataloging solution of your choice) – you’ve already put the effort in to classify your columns and figure out where sensitive information exists within your databases and instances… so doesn’t it make sense that we can just use THAT as a source of truth and generate masking from there?

Note: I actually produced an end-to-end video showing the process of Cataloging, Masking and Cloning in under an hour here: https://www.red-gate.com/hub/university/courses/sql-data-catalog/end-to-end-data-protection-with-sql-data-catalog-and-sql-provision – if you watch/try it let me know how you get on!

Funnily enough though we wouldn’t want to use SDM for Production (and potentially Staging) environments though – as it irreversibly changes the data, it’s just going to completely mess up all of our Prod data. To tackle this then, many people I work with turn to Dynamic Data Masking.

Dynamic Data Masking (DDM)

DDM is a method of masking the data based on your access rights to the data. As far as customers see they have access to their data through our site or application no issues, but if anyone else needs to query that data, or different people need to see different results when querying environments, DDM has been their way to go.

Whilst a lot of people like to pick up on some of the well known downsides of DDM, it’s not like you’re entrusting the entire security of an environment to it alone – there are a ton of measures we can put in place and DDM is just one; like an ex-colleague of mine (someone very wise whom I admired greatly and am still sad to this day I no longer get to work with them) used to say: “It’s about building a defensible position. The more you do the easier it is to prove you’re doing something and the more likely you are to BE protected.

So when a customer asked this week if it was possible to configure Dynamic Data Masking from SQL Data Catalog (because they’d seen the “Treatment Intent” category and the tag that clearly states “Dynamic Data Masking”), just like we’re able to configure Static Data Masking, well now that was a challenge I couldn’t turn down!

The SQL Data Catalog Taxonomy Page – Treatment Intent Category showing Dynamic Data Masking

DDM in Azure SQL DB

Configuring Dynamic Masking in Azure SQL DB is fairly straight forward through the Azure portal, you can go to your Azure SQL DB, click Dynamic Data Masking and it gives you the option to simply pick and save columns to apply Data Masking to, and to whom these rules apply / don’t apply:

DDM in the Azure Portal for the DMDatabase_Dev, with masks configured on customer_firstname and customer_email

However when we potentially have a lot of columns or DBs to configure masks for this is really going to get very old very fast. As with all things, I turned to PowerShell for the answer and fortunately I found it: https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview?#set-up-dynamic-data-masking-for-your-database-using-powershell-cmdlets – basically I can get existing DDM configurations and set new configurations for columns directly from my Azure SQL DB using PowerShell.

Now I’m not expert on DDM, and Redgate Data Masker for SQL Server is not a DDM solution (so I’ve only ever needed to know SDM really) I don’t pretend to be, but it seemed that I had everything I needed to tie Catalog into DDM.

PowerShell time!

I’ve written so much PowerShell to get classifications out of Data Catalog at this point it’s become second nature, but if you’re using the SDC PowerShell module and you need a reference you can view it here: https://documentation.red-gate.com/sql-data-catalog/automation-with-powershell but the standard “stuff” goes:

  • Pull down the PoSh module
  • Connect to catalog where it’s installed using an Auth token
  • Grab out the classifications with Get-ClassificationColumn
  • Shrink this down to just the columns we care about based on the tags

But the Az PowerShell cmdlets were honestly just as easy to use! I was surprised how easy it was to get up and running:

  • Connect to my Azure subscription
  • Get the current list of columns already with DDM masks
  • Remove these from the Catalog list
  • Update the remaining columns to use the default Mask

This was the full code I ended up using:

# This script is intended to be used with Azure SQL Database and Redgate SQL Data Catalog, however you are welcome to adapt and edit as required
# It will pull columns out of azure that are already being masked, and a list of columns that need to be masked with DDM
# It will then rationalise these, and configure Default DDM masks for any columns not already being masked on that Azure SQL DB

#Variables for Azure SQL DB & Catalog
$ResourceGroup = "DMDb"
$ServerName = "dmnonproduction" # Your instance minus .database.windows.net
$instance = "dmnonproduction.database.windows.net" # The instance or logical SQL Server as displayed in SQL Data Catalog
$DatabaseName = "DMDatabase_Dev"
$CatalogServer="http://pse-lt-chrisu:15156" # Your SQL Data Catalog location, leave off the trailing "/"
$authToken="REDACTED" # Your SQL Data Catalog Auth Token
$AzureSub = "Redacted" # Your Sub ID

# Get the SQL Data Catalog PowerShell Module & Connect
Invoke-WebRequest -Uri "$CatalogServer/powershell" -OutFile 'data-catalog.psm1' -Headers @{"Authorization"="Bearer $authToken"}
Import-Module .\data-catalog.psm1 -Force
Connect-SqlDataCatalog -ServerUrl $CatalogServer -AuthToken $authToken 

#Connect to your Azure Subscription
Connect-AzAccount -Subscription $AzureSub

#Get current active DDM Masks from Azure
$DdmMasks = Get-AzSqlDatabaseDataMaskingRule `
    -ResourceGroupName $ResourceGroup `
    -ServerName $ServerName `
    -DatabaseName $DatabaseName
$ListOfDDMColumns = $DdmMasks | ForEach-Object {$_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName}

#Get columns from Catalog currently marked with "Dynamic Data Masking" as a treatment intent
$CatalogColumns = Get-ClassificationColumn `
    -InstanceName $instance `
    -DatabaseName $DatabaseName | Where-Object {$_.tags.name -eq "Dynamic data masking"} 

#Filter down to a list of columns that need to be masked, that currently aren't configured with DDM
$ColumnsToDDM = $CatalogColumns | Where-Object {($_.SchemaName + '.' + $_.TableName + '.' + $_.ColumnName) -notin $ListOfDDMColumns }


#Set default DDM Masks for identified columns
$ColumnsToDDM | ForEach-Object { `
    New-AzSqlDatabaseDataMaskingRule -ResourceGroupName $ResourceGroup `
                                     -ServerName $ServerName `
                                     -DatabaseName $DatabaseName  `
                                     -SchemaName $_.schemaName `
                                     -TableName $_.tableName `
                                     -ColumnName $_.columnName `
                                     -MaskingFunction "Default"

}

But I have also uploaded it to my GitHub here in case anyone would like to take and adapt as they see fit: https://github.com/ChrisUnwin/PowerShell/blob/master/Demos/Redgate%20Demos/DDMFromCatalog.ps1

And this was the result – here were the two columns I had already being masked:

Customer Firstname and Customer Email with DDM Masks Configured

These were the columns I had marked as Dynamic Data Masking in Data Catalog:

Customer firstname, lastname, street addres and email all marked for DDM in Catalog

and after running the PowerShell it deduced that the delta was street_address and lastname and created the default DDM mask for them in Azure:

All columns now being masked dynamically

Considerations

1 – I have used the default mask in this process, however if you wanted to configure the mask (as per the link to the docs above) to be specific numbers or format you could absolutely do this, simply by modifying the PowerShell to look at the Data Type and then just passing into a different New-AzSqlDatabaseDataMaskingRule for each of those types.

2 – This only applies to Azure SQL DB and does not take into account the considerations when using DDM on say, a 2017 SQL Server Instance running on a VM – however you could use the same approach to pass the columns into some dynamic T-SQL which would in turn run the correct command to add DDM to that/those column(s)

3 – I would still use Static Data Masking (SDM) for non-Production environments, because if anyone bypasses the DDM they will have access to the full data, which we don’t really NEED in less secure non-Prod environments anyway, so Static might well be the way to go!

“But I don’t wanna INSTALL it!”: Data Masker on the fly in Azure DevOps (with an Azure SQL DB)

“There is always a way to go if you look for it.”
Ernest A. Fitzgerald

As many of you know, I really enjoy talking about Data Masking. I fundamentally believe it is an absolutely ESSENTIAL part of Test Data Management and specifically the provisioning of Pre-Production environments. If you hold sensitive PII/PHI/PCI in your Production environments, you have no excuses for porting any of that back into Dev and Test.

I also stand firmly behind the belief (and it is just that, my belief) that masking is much safer than anything you can achieve with encryption or limiting access alone. Static masking, when done correctly, means that even if all other security measures are bypassed, or we accidentally expose data somehow, it doesn’t matter because the PII/PHI has been wiped, but it is still fundamentally useful for development and testing environments.

I’m also a huge proponent of using classification and data masking as part of a DevOps process, and often you’ll find me using Azure DevOps to actually kick off my masking process – but one of the most frequent questions I get is “do I have to install Data Masker somewhere?” and the answer I always have to give is… yes.

When we use SQL Provision to spin up our environments in non-Prod, generally we know they have to be in IaaS VMs or On-Prem, due to the nature of the technology (keep your eyes open on THIS because big changes coming soon *squeals in excitement*) but sometimes we’re working entirely with PaaS DBs and VMs don’t even come into our vocabulary – but we still need them to be masked for non-Prod use.

I have some data in an Azure SQL DB – a copy of DMDatabase (get it from my GitHub):

DM_CUSTOMER table in the DMDatabase

and I would like to get this masked before I create copies of it. Now I have written scripts in the past to make sure that Azure SQL DBs can be masked and then copied into non-Prod environments and you can get those scripts here, and you could easily wrap something like the below INTO a script like that – but much of the work I’ve done on this always involves having Data Masker installed somewhere and invoked on that machine – a VM, my laptop, whatever.

So, how do we avoid having to install Data Masker each time?

Data Masker for SQL Server does not (at time of writing) have publicly available a docker container or method for installing using Choco or something like that – once we do, trust me, I’ll be blogging about it A LOT as I will be very excited. But there is an install file available: https://download.red-gate.com/installers/DataMaskerforSQLServer/ and this might be enough.

The Process

Initially I created a Masking Set locally for the DMDatabase copy in Azure, it was nice and simple just masking a few of the fields on the CUSTOMER table:

It relies on SQL auth for the connection and I’m remembering the credentials, though these could be subbed in later on using the PARFILE (documentation on that here).

Next I put this masking set into a newly created Azure DevOps Git repo, which I cloned down onto my local machine – and then committed and pushed my changes up into Azure DevOps:

Now that this was all in ADO, it was time to set up a pipeline for it – so let’s jump into some YAML! Now, Data Masker currently needs to run on a Windows machine so we’ll set the pool to Windows-Latest:

trigger:
- main

pool:
  vmImage: windows-latest

The next step is to grab the installer – which I know I can easily do with PowerShell. I’m sure you could be more clever about this, but with limited time I chose the most recent version and hard coded that in to a PowerShell Invoke-WebRequest cmdlet:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

This PowerShell task is going to grab the most recent exe for Data Masker and pull it down into C:\download on the hosted VM we’re using for the pipeline and it’s also going to create a directory for the Data Masker logs as well (if you wanted to extend the YAML at the end to wrap these logs up and publish them as a result of the pipeline masking, then go for it and tweet me to let me know!)

Next we have to extract and install Data Masker from that download, which is fairly easy to do with a cmd call:

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

Note I’m using the guidance from this page, making sure to accept the EULA and I’m passing in my Redgate License as a variable, that I have specified for the pipeline and kept secret. This will put Data Masker in the default location in C:\Program Files\… and means we will then be able to call it. I do however need to make sure that this now works. So I saved my pipeline and ran it to see what happened:

Fabulous, that all works nicely. Now to pass the DMSMaskSet file to Data Masker and get it to run – ah but I forgot, I’m going to need a PARFILE as per the cmdline documentation that specifies where the files etc. are for the run. So I create my PARFILE.txt as such:

MASKINGSET=C:\download\DMDB_MaskingTime.DMSMaskSet
LOGFILEDIR=C:\download\DMlogs
DATASETSDIR=C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataSets 
REPORTSDIR=C:\download\DMlogs
INTERIM_REPORTS=false

As you can see, very simple. But to make sure these files are “simply” in the right folder, and because I don’t have time to explore how I could pass an environment variable into the .txt file itself, I’m going to add a quick file copy task to make sure that my masking set and PARFILE both make it into that location:

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

Of course now that we have Data Masker installed on the pipeline VM, the masking set AND the PARFILE… let’s get masking!

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'

I agree the command is now less than impressive given all the prep work, but when you run all of this in it’s entirety…

Big success! But let’s check the data of course and make sure it is as we expect:

It all works, Lynne has been masked to Muna (and the rest has been masked too!), and I didn’t need to have Data Masker installed on a VM in my environment, with an Azure DevOps self hosted agent to run it – I could just do it programmatically. #Winning.

I’ll put the full YAML below however some caveats:

  • You cannot run this pipeline a LOT in a short space of time, I found that out. The RedGate downloads page is not particularly designed for this process, so it should probably be run sparingly otherwise the pipeline times out because the server, by way of balancing, will prevent you from pulling the file too often.
  • You will need to update the PowerShell step reasonably often if you want the DMS most recent installer, or make it more futureproof to grab the latest version – I just didn’t investigate that.
  • Data Masker is dependent on the power of the machine it is running on in MANY ways and Azure DevOps pipeline VMs are not particularly the most powerful beasts in the world – so if you have a lot of masking that needs doing, I would be weary of this method and might stick to an Application Server VM you’ve got hanging around, just in case.
  • This is likely to change a lot in the future so may not be relevant when you’re reading it after a few months – so before implementing anything like this, if it’s been a few months, contact me or Redgate and just confirm that there isn’t some better way of doing this, if I haven’t already blogged it!

Thanks for stopping by and have a great week!

Full YAML:

trigger:
- main

pool:
  vmImage: windows-latest

steps:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      $source = 'https://download.red-gate.com/installers/DataMaskerforSQLServer/2021-03-15/DataMaskerforSQLServer.exe'
      New-Item -ItemType directory -Path C:\download
      $destination = 'C:\download\DataMaskerforSQLServer.exe'
      Invoke-WebRequest -Uri $source -OutFile $destination
      New-Item -ItemType directory -Path C:\download\DMlogs
  displayName: 'Download Data Masker'

- task: CmdLine@2
  inputs:
    script: |
      "C:\download\DataMaskerforSQLServer.exe" products "Data Masker for SQL Server" log c:\download temp c:\download /IAgreeToTheEula RG_LICESE=%RGLICENSE%
  displayName: 'Install DMS Headlessly'

- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.Repository.LocalPath)'
    Contents: '**'
    TargetFolder: 'C:\download'
  displayName: 'Copy files from working directory'

- task: CmdLine@2
  inputs:
    script: '"C:\Program Files\Red Gate\Data Masker for SQL Server 7\DataMaskerCmdLine.exe" PARFILE=C:\download\PARFILE.txt'
  displayName: 'Run Data Masker'