I had some really positive feedback on my last “Using things weirdly” post, and truth be told, I love to use things weirdly. The number of times I’ve heard: “Oh, well, sure yeah I guess it works that way too…” is just too many to count. So imagine how my eyes lit up when I realized that you can do something weird with one of my favorite things to talk about at the moment, The Hybrid Model.
Now if you don’t know about the Hybrid Model then I would suggest you check out my post here that’s all about the different source control models available for your databases!
Across both the State based and Migrations based offerings within the SQL Toolbelt, you have access to something very cool: the ability to easily (alongside the schema) source control your static data. Now don’t ‘@’ me because you think I should be referring to it as “Semi-Static” because it might change occasionally and ‘that’s not truly static then is it?‘; I could easily also refer to it as ‘Lookup Data’ or ‘Reference Data’, basically whatever you class things like “Country Codes” and “Currency Codes” as.
Whatever you call it, it can go into your VCS like any part of the database schema:
But. One thing that – as of writing this RIGHT NOW (23/07/2020 10:09am BST) – is not officially available in the Hybrid Model combining these two methods… is Static Data. The Data tab in SQL Change Automation even disappears when you set it up as a Hybrid workflow:
And this gives me a sad.
Got your Hybrid Model setup and ready to go? Let’s use it weirdly!
1 – Use SQL Source Control to commit some static data to your state repository. This is as easy as right clicking on your (highlighted green) source control linked database and selecting “Other SQL Source Control Tasks” > “Link/Unlink Static Data“, and picking your tables.
Nothing should be showing in SQL Change Automation:
2 – Unlink SQL Change Automation from the state repository for a moment and link it instead directly to the development database. This will cause it to go into Migrations-First mode. You can do this by clicking the blue source name in SSMS and picking Existing Database instead:
Because it’s technically the same database as you’re source controlling in the state repository, it should all just work™ and should tell you there are no dev changes to the source. Then you’ll see the “Data” tab has been enabled:
3 – Select the same tables to source control as you did with SQL Source Control by using the Add Tables wizard:
Isn’t it now going to generate a migration for our static data?? This isn’t included in the baseline or anything at all so far, so is it going to try and insert all of my static data into tables later on that already have it?
No. Actually we’re fine!
4 – Generate the static data migration script (and look at it for peace of mind). Notice that the script actually has checks in there – because we’re newly adding these tables, the migration will check if the tables are empty before trying to run the script, and only AFTER this migration will SQL Change Automation start generating the differential, incremental static data migrations:
Commit this migration script to your migrations repository, and that’s all we need to do here!
5 – All that’s left now, is to re-hook-up the Hybrid pipeline, follow the same steps you did before in Step 2 but this time, instead of an existing database, just link it back to the state repository like it was before. If you’ve done this right, you should see no changes pending for migrations:
BUT you will notice that if you change any static data with SQL Source Control, it should now show up in SQL Change Automation!
Is it an intended use? No absolutely not, the reason it’s disabled is that with all things at Redgate they are considered heavily to ensure users are offered the best possible user experience, functionality and essentially something that meets requirements across the board.
But. We can use it weirdly to, as i say, just make it work™.
“Doing the best at this moment puts you in the best place for the next moment.” – Oprah Winfrey
One of the things that comes up the most when I’m talking to people about database classification is multi language support. Often people are resigned to:
Waiting on additional language support from their vendor of choice (defer)
Writing their own software to do it in their native tongue (shift effort)
Going the long way around i.e. manual classification (brute force)
Do nothing (abandon)
There are currently over 7000 languages in the world and when it comes down to classification, you can’t expect that whatever your choice of technology for this classification workflow will have packs or native support for your languages and/or the languages your databases have been configured in.
Now, deferring and abandoning as above really aren’t options any more and if you’ve read my post here then you’ll know that database classification is a key concern to address before trying to base any business decisions on data. But how then do we tackle t he above issues of shifting effort and brute force?
Sometimes it’s about finding ways to still be effective and still take advantage of the benefits such a classification technology affords you.
With this in mind – I wanted to share with you 3 tips and tricks in SQL Data Catalog to make it work better for your language(s):
1 – Customize your taxonomy to be relevant / Passen Sie Ihre Taxonomie an, um relevant zu sein / Ajustați-vă taxonomia pentru a fi relevante
The great thing about SQL Data Catalog is that it is incredibly easy to alter the taxonomy to best reflect your view of your data. Not all standard fields will be relevant to you and some fields that are ABSOLUTELY necessary will be missing.
So add them in!
Let’s say, in my country of Christopia, it is mandatory that people identify a certain government specified “Classification Category” from 1-5, where 1 is the most sensitive and 5 is basically public knowledge. I can add that as a category and tag:
That goes equally for fields that we will not be making use of. If it is irrelevant then remove it. Simple.
I will say this for the above step though – don’t think of this as being the easiest and most obvious part, in many ways this is actually the hardest step because you will need buy in. Guess who also cares how you describe data stored in databases? Data Governance, Legal, InfoSec, Database Admins… the list goes on. Just make sure you do the 2 most fundamental parts that are required for success here – communicate & collaborate. If you have buy in from other concerned parties, you will establish a ‘common language’ with which data can be described and consumed, and based on which complex data-driven business decisions can be made.
2 – Customize your search rules / Passen Sie Ihre Suchregeln an / Personalizați-vă regulile de căutare
Going through every single column is a colossal waste of time, especially when we have more than about 6 databases. I’ve spoken with people who have spent day, weeks and even months working on classifications in SSMS or Excel, only to find out that 90% of their columns matched common regular expressions, and moreover that 50-60% of their columns were out of scope anyway!
Fortunately within SQL Data Catalog, you can configure the rules that the tool runs against columns and tables to identify any clearly sensitive columns which it will then suggest to you each time it finds something that matches.
In my experience, between the analytical filters and the suggestions, you can make excellent progress very very quickly, and do you know the best part?
Data Catalog is evergreen.
It constantly keeps track of your ever evolving schema and checks the suggestions against them and that means if somebody makes a change to the schema (like in my post from last week) you’ll get the suggestions coming through and you can easily catch anything that might put your sensitive data in harms way.
3 – Use PowerShell / Verwenden Sie PowerShell / Utilizați PowerShell
Ok I get it – this is my catch all for everything whenever I talk about using Data Catalog – but did you know, using PowerShell with Data Catalog is really easy.
Like even I can do it easy.
But the good thing is, when you use PowerShell you’re not constrained by particular functions or waiting for things to happen, rather you can take control of the process and fill it with your own logic! There’s even a full on guide that shows you:
How to generate an authorization token
A full cmdlet reference
Complete worked examples
and the best part is it’s just part of the documentation! Don’t you love it when people keep documentation up to date and useful? Yeah me too!
Bonus Tip / Bonus-Tipp / Sfaturi Bonus
If you work in an environment that is multi-lingual and has a mix of languages associated with different databases which haven’t been standardized, that should just highlight further that this is a team game.
Pairing and mobbing, just like in coding, can be vital and powerful tools at your disposal – so it’s important to have the right people in place to handle this.
Classification is not infallible, much as you may wish to think it is. Data is data, and if there’s one thing I have learned about data it’s that it always manages to surprise you, so a second pair of eyes (or more) can be invaluable in helping you weed out some of the problems you may be facing.
“Some people try to make everything complicated, be the person who tries to make everything simple.” – Dave Waters
Simplicity is in my blood. That’s not to say I am ‘simple’ in the sense I cannot grasp more than the most basic concepts, but more that I am likely to grasp more complex problems and solutions when they are phrased in simple ways.
This stems from my love of teaching others (on the rare occasion it falls to me to do so), where I find the moment that everything just ‘clicks’ and the realization comes over them to be possibly one of the most satisfying moments one can enjoy in life.
Now recently I’ve been enjoying getting my head around Flyway – an open source JDBC based migrations tool that brings the power of schema versioning and deployments together with the agility that developers need to focus on innovation in Development. There’s something about Flyway that just… ‘clicks’.
It doesn’t really matter what relational database you’re using; MySQL, IBM DB2, even SAP HANA! You can achieve at least the core tenants of database DevOps with this neat and simple little command line tool – there’s not even an installer, you just have to unzip!
Now I’ve had a lot of fun working with Flyway so far and, thanks to a few people (Kendra, Julia– i’m looking at you both!) I have been able to wrap my head around it to, I would say, a fair standard. Caveat on that – being a pure SQL person please don’t ask me about Java based migrations, I’m not quite there yet!! But there is one thing that I kept asking myself:
“When I’m talking to colleagues and customers about Database DevOps, I’m always talking about the benefits of continuous integration; building the database from scratch to ensure that everything builds and validates…” etc. etc. so why haven’t I really come across this with Flyway yet?
Probably for a few reasons. You can include Flyway as a plugin in your Maven and Gradle configurations, so people writing java projects already get that benefit. It can easily form part Flyway itself by virtue is simply small incremental scripts and developers can go backwards and forwards however and as many times as they like with the Flyway Migrate, Undo and Clean commands, so is there really a need for a build? And most importantly, Flyway’s API just allows you to build it in. So naturally you’re building WITH the application.
But naturally when you’re putting your code with other people’s code, things have to be tested and verified, and I like to do this in isolation too – especially for databases that are decoupled from the application, or if you have a number of micro-service style databases you’d want to test all in parallel etc. it’s a great way to shift left. So I started asking myself if there was some way I could implement a CI build using Flyway in Azure DevOps, like I would any of the other database tooling I use on a regular basis? Below you’ll find the product of my tinkering, and a whole heap of help from Julia and Kendra, without whom I would still be figuring out what Baseline does!
Option 1) The simplest option – cmdline
Flyway can be called via the command line and it doesn’t get more simple than that.
You can pass any number of arguments and switches to Flyways command line, including specifying what config files it’s going to be using – which means that all you have to do, is unzip the Flyway components on a dedicated build server (VM or on-prem) and then, after refreshing the migrations available, invoke the command line using Azure DevOps pipelines (or another CI tool) to run Flyway with the commands against a database on the build server (or somewhere accessible to the build server) and Bingo!
And that’s all there is to it! You get to verify that all of the migrations up to the very latest in your VCS will run, and even if you don’t have the VERY base version as a baseline migration, you can still start with a copy of the database – you could even use a Clone for that!
But yes, this does require somewhere for Flyway to exist prior to us running with our migrations… wouldn’t it be even easier if we could do it without even having to unzip Flyway first?
Option 2) Also simple, but very cool! Flyway with Docker
Did you know that Flyway has it’s own docker image? No? Well it does!* Not only that but we can map our own version controlled Migration scripts and Config files to the container so that, if it can point at a database, you sure as heck know it’s going to migrate to it!
This was the method I tried, and it all started with putting a migration into Version Control. Much like I did for my post on using SQL Change Automation with Azure SQL DB – I set up a repo in Azure DevOps, cloned it down to my local machine and I added a folder for the migrations:
Into this I proceeded to add my base script for creating the DMDatabase (the database I use for EVERYTHING, for which you can find the scripts here):
Once I had included my migration I did the standard
Git add .
Git commit -m "Here is some code"
and I had a basis from which to work.
Next step then was making sure I had a database to work with. Now the beauty of Flyway means that it can easily support 20+ RDBMS’ so I was like a child at a candy store! I didn’t know what to pick!
For pure ease and again, simplicity, I went for good ol’ SQL Server – or to be precise, I created an Azure SQL Database (at the basic tier too so it’s only costing £3 per month!):
Now here’s where it gets customizable. You don’t NEED to actually even pass in a whole config file to this process. Because the Flyway container is going to spin up everything that would come with an install of Flyway, you can pass it switches to override the default behavior specified in the config file. You can adapt this either by hard-coding strings or by using Environment Variables alongside the native switches – this means you could pass in everything you might need securely through Azure Pipeline’s own methods.
I, on the other hand, was incredibly lazy and decided to use the same config file I use for my Dev environment, but I swapped out the JDBC connection to instead be my Build database:
I think saved this new conf file in my local repo under a folder named Build Configuration – in case I want to add any logic later on to include in the build (like the tSQLt framework and tests! Hint Hint!)
This means that I would only need to specify 2 things as variables, the location of my SQL migrations, and the config file. So the next challenge was getting the docker container up and running, which fortunately it’s very easy to do in Azure Pipelines, here was the entirety of the YAML to run Flyway in a container (and do nothing with it yet):
So, on any changes to the main branch we’ll be spinning up a Linux VM, grabbing Docker and firing up the Flyway container. That’s it. Simple.
So now I just have to pass in my config file, which is already in my ‘build config’ folder, and my migrations which are in my VCS root. To do this it was a case of mapping where Azure DevOps stores the files from Git during the build to the containers own mount location in which it expects to find the relevant conf and sql files. Fortunately Flyway and Docker have some pretty snazzy and super clear documentation on this – so it was a case of using:
-v [my sql files in vcs]:/flyway/sql
as part of the run – though I had to ensure I also cleaned the build environment first, otherwise it would just be like deploying to a regular database, and we want to make sure we can build from the ground up every single time! This lead to me having the following environment variables:
As, rather helpfully, all of our files from Git are copied to the working directory during the build and we can use the environment variable $(Build.Repository.LocalPath) to grab them! This lead to me updating my YAML to actually do some Flyway running when we spin up the container!
Effectively, this will spin up the VM in ADO, download and install Docker, fire up the Flyway container and then 1) clean the target schema (my Azure SQL DB in this case) and 2) then migrate all of the migrations scripts in the repo up to the latest version – and this all seemed to work great!*
*Note: I have an enterprise Flyway licenses which enables loads of great features and support, different version comparisons can be found described here.
So now, whenever I add Flyway SQL migrations to my repo as part of a branch, I can create a PR, merge them back into Trunk and trigger an automatic build against my Flyway build DB in Azure SQL:
Getting up and running with Flyway is so very very easy, anyone can do it – it’s part of the beauty of the technology, but it turns out getting the build up and running too, when you’re not just embedding it directly within your application, is just as straightforward and it was a great learning curve for me!
The best part about this though – is that everything above can be achieved using pretty much any relational database management system you would like, either via the command line and a dedicated build server, or via the Docker container at build time. So get building!
“The most powerful tool we have as developers is automation.” – Scott Hanselman
It is no secret that I love to talk about data protection, specifically from the perspective of structured data. When we talk about database development practices, we often find ourselves talking about 3 things most often:
Continuous Integration and Continuous Delivery/Deployment (CI/CD)
Some people refer to this as “DataOps“, others refer to it as “DevDataOps” but in reality, it’s all DevOps guys. This may be an unpopular opinion (and if it clashes with yours please forgive me, it’s just my opinion) but just because a certain niche area hasn’t been specifically called out within a subset of DevOps doesn’t mean you have to invent your own term for it!
Now this leads me on to DevSecOps, or as I like to call it… More secure DevOps.
No but seriously this is a slightly different case – DevSecOps is like DevOps but fortified with security from the ground up. There’s a fantastic article and diagram of this on Plutora from Mark Robinson of how this looks (below) and if you haven’t read his article I would definitely go and give it a read!
Good DevOps practice is a combination of different things working together, bringing the right mentality, the principles, processes and amazing tools at our disposal like automation but this all includes security from the ground up too. DevOps is about putting those principles and practices in place to strengthen the pipeline, so why don’t we treat security in the same way?
Take, for example, 3 pieces of legislation that have been very much in the spotlight:
“The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed.“ – GDPR (Europe) Art. 25 “Data protection by design and by default”
“Processing agents shall adopt security, technical and administrative measures able to protect personal data from unauthorized accesses and accidental or unlawful situations of destruction, loss, alteration, communication or any type of improper or unlawful processing.“ – LGPD (Brazil) Chapter VII, Art. 46 “Security and Secrecy of Data”
“A Controller or Processor is required to implement appropriate technical and organisational measures to demonstrate that Processing is performed in accordance with this Law…” – DIFC LAW NO. 5 OF 2020 (Dubai) Part 2D, Art. 14 (2) “Accountability and notification”
There’s a common running theme here and although lots of global legislation will either allude to, or directly tell you ways you can be compliant and what some of these “organizational” and “technical” measures are, it’s still pretty blurry.
How do we know what we can do? How do we know what “default” and “design” mean in this context? Well, we build it into the DevOps process.
Now I could sit here forever and talk about why transforming your database development, deployment and provisioning processes allows us to be more secure, but that’s a lot of material and it might have to come in chunks! So what we’re going to focus on today is as the title suggests: Data Classification and Cataloging.
Why is Cataloging important?
Cataloging structured data is incredibly important because it can be one of the first steps we take to securing sensitive Personally Identifiable Information (PII) or Protected Health Information (PHI) wherever it exists across our database environments. It allows us to make strengthened, contextual decisions about the data we hold including how we treat it in pre-Production, how long we retain it for and which systems and processes consume it.
But the most important part of this is simply: it tells us where the risk is.
Read through any of the most recent data protection laws and you will notice that a few things come up quite a lot including “Data Protection Impact Assessment“, or DPIA. Effectively if you can assess the risk of processing activities you can more readily answer the data protection questions and challenges you may face.
Knowing where your data resides can be the first step to helping you assess this risk, and to more readily answer your own data questions. If you want to read more about Cataloging specifically and why it is useful, you can read more about it on my previous blog here.
Where does Cataloging fit into DevOps?
This one is simple to answer. Once you have fully classified your entire estate, you’re not done. No, if you’re a development house or indeed even a single developer – if you are making any schema changes to the tables holding that sensitive data, you’re never done.
The reason for this is that Cataloging is an evergreen activity – if you update tables by removing columns, adding columns, splitting tables, adding tables… anything! Well then you need to be ready to make sure that you are:
a) Prepared and equipped with knowledge of the tables you’re working on and if this is a high risk activity.
b) Updating classification information to reflect the new “truth”, i.e. if you’re adding a column that will collect people’s Twitter handles, then that column should be classified as sensitive, and this should be reflected the moment it is deployed to Production.
So it is important to have the correct people working on this, with the right knowledge, preparation and processes and using the correct tools ensuring that those updates are persisted properly and securely through your deployment pipelines.
Huh… people, processes and tools… That sounds familiar!
The Process: SQL Data Catalog, SQL Change Automation and Azure DevOps
For this little experiment of mine I used Redgate’s SQL Change Automation (Migrations First approach in SQL Server Management Studio) and SQL Data Catalog to both develop & deploy and classify/categorize respectively, and for simple version control and orchestration of this pipeline I opted for Azure DevOps (with SQL Change Automation CI/CD plugins):
Fast forward a little and I had my example databases, VCS and pipeline all up and running:
Step 2) – The “Theory”: This is where things get interesting. So we have an example pipeline set up and we are able to completely deploy all the way through to “Production” so let’s talk theory.
In SQL Data Catalog I have covered both my Production and Acceptance Databases:
Now, in development we don’t make changes directly to Production, so why should Classification be any different? Now how you adapt the above code is up to you, feel free to split it, move it around, incorporate it into Pull Requests if you want to… But I’m going with a bit more of a simple situation.
Situation: Developer makes a change in Development, which gets committed, reviewed and merged o the main branch, resulting in a build and a deployment, in this case to Acceptance and then it is later deployed to Production.
Now, by Acceptance we should only have the “good work”, i.e. all of our testing is shifted left within DevOps so Acceptance is basically the last stop before Production. Therefore we should classify the work we have done on Acceptance, crucially, before it gets to Production and starts gathering sensitive data, and then copy this classification up on deployment.
Ideal: We should have no columns on Production that have not been classified.
Step 3) – In Practice: Fortunately it’s very easy to automate a lot of these steps with SQL Data Catalog utilizing it’s PowerShell cmdlets and REST API. The cmdlets are fully documented and very easy to use (docs here). This allows us to easily scan, classify and copy classifications up to other databases, but we’ll also need to do some checks and report if there are discrepancies, as part of the deployment pipeline that can be investigated.
Are there any columns on Acceptance that aren’t classified but have been deployed to Production? (failure to comply with process)
Are there any columns on Production that have not been classified? (classification drift)
Are there any unclassified columns on Acceptance that have not yet been deployed to Production (for pipeline hygiene purposes)
The other part of this ‘fun’ is reporting what has been changed in the same process. Now fortunately SQL Change Automation spits out a Changes.json file with its Release Artifacts and we can steal that away and find out how many tables have been created or changed in this release and report that back so we can correlate what has been done and what is missing:
So actually getting this up and running is just going to require 3 things:
Data Catalog available and pointed at Acceptance and Production (or your versions of these environments)
Variables set in Azure DevOps to fill the gaps (e.g. Where is Data Catalog? Whats my PowerShell Auth token? What are my Acceptance and PROD DBs called? etc.)
3 is the last step there so you’ll need something like this to run the script:
DatabaseDeploymentJSON – where the JSON file will be with the latest changes in the Prod release
DataCatalogAuthToken – Your PowerShell Auth token from Settings in Data Catalog
DataCatalogUrl – The full URL to your Data Catalog installation, missing the “\” at the end (ending :15156)
ExportPath (Optional) – I specified the path for my Database Deployment Resources to save typing it out in the Redgate plugins
ProdDB / StageDB – As you would expect, the Production and Acceptance/Staging DBs you’re deploying to/from
ProdInstance / StageInstance – As above, except the instance the Database are located on
In the variables above the Instance and DB names are purely used within Data Catalog, so there’s no need to worry about anything happening to the actual databases themselves!
Once you’ve run through the deployment pipeline a couple of times and the changes.json file is being produced, you can go ahead and copy the script into an inline PowerShell script step in your release and you should find it will fire to life! I simulated an example by modifying my Contacts table and my Articles table, adding 1 column each and deploying both to Acceptance. I then classified just 1 of these in Acceptance in Data Catalog:
and then approved the deployment to Production and tada!
Ok you probably can’t make all that out, but it effectively says:
(Information) Table dbo.Articles was modified in this deployment. (Information) Table dbo.Contacts was modified in this deployment.
That much we knew!
1 column(s) with classifications were discovered on VoiceOfTheDBA Acceptance that are not classified in VoiceOfTheDBA Production: dbo.Articles.TestingPineapple
Excellent, we classified that one so it gets copied up and we can verify that in data catalog against Production:
and finally, we get a warning about Production now containing unclassified columns:
(Alert) The following columns have been discovered on VoiceOfTheDBA Production that require classification: … dbo.Contacts.TestingPineapple … You should classify these columns in VoiceOfTheDBA Acceptance prior to the next deployment.
Just as we expected. Success!
Classification and categorization belongs as part of DevOps, if you expect the context for your business decisions around data to remain evergreen and informed then it cannot sit on the shoulders of one or two people to support it, and it cannot live in a manually updated Excel sheet or document.
By including it within the DevOps process, not only do you add an additional layer of security but you also make it an automated, team activity that can be audited, checked and easily kept up to date.
Is this DevSecOps? Well… not really no. Is this a more secure approach to Database DevOps? Absolutely! Happy DevOpsing!
“But I can hardly sit still. I keep fidgeting, crossing one leg and then the other. I feel like I could throw off sparks, or break a window–maybe rearrange all the furniture.” – Raymond Carver
I understand that starting off a blog about Azure SQL Database with the above quote is a little weird, but honestly I’m _really_ excited about what I’m about to tell you.
***Note before starting: This blog post assumes you’re familiar with the concepts of Database Source Control, CI and CD, Azure SQL Database and pipelines within Azure DevOps, otherwise here be dragons.***
I am a huge fan of SQL Change Automation – mostly because of the migrations functionality. In my mind it represents an ideal workflow for making complex SQL Server database changes. If you’re not sure about the different models (State, Migrations, Hybrid), take a look at my blog post from last week here! But until this time it has had one thing that I could not easily do with it… Platform as a Service, Azure SQL DB.
Now don’t get me wrong, SQL Change Automation could easily deploy to Azure SQL Database but I had a problem. The words:
“Chris how do we benefit from the migrations approach and put the shadow database and build db in Azure SQL too? We don’t have any local instances or VMs we can use for this and Dev, Test and Prod are all in PaaS!”
elicited this response:
But. No. Longer.
Now for those of you who don’t know, the _SHADOW_ database that SQL Change Automation creates is effectively a schema and static data only copy of your database, and it is dropped and built each time you verify, to ensure that all of the migrations run successfully and you can effectively check your work and shift the build left (!!), before you even check into source control.
This shadow database and the build database shared one thing in common and that was that you couldn’t build them in Azure SQL DB, which left 2 choices:
Use an instance of SQL Server. Developer for the shadow locally maybe; a VM in Azure or on-prem hosted instance for building
[For build specifically] Use localDB. Not advisable if your database contains any objects not supported by localDB because (juuuust in case you didn’t know) it is SQL Server Express.
But on May 12th 2020 (and I only found out about this like 2 weeks ago) the SQL Change Automation team at Redgate released version 4.2.20133 of the plugin for SSMS which included a few super cool things like additional Azure SQL support and the Custom Provisioning Scripts feature.*
Now this is great because not only can we now easily create SQL Clones to be used as the development source (and I’ll blog about THAT a little later) but of course you can use it to use an Azure SQL DB for the shadow AND to use a persistent Azure SQL DB for the CI build as well!
Now unfortunately Kendra kinda beat me to the punch here and she produced a fabulous 3 part video series you can watch on using SQL Change Automation solely with Azure SQL DB, and you can view those here if you don’t want to see me try it out:
The first thing I did was make sure that I had all of the necessary environments to try this out – I created 3 Azure SQL Databases to mimic Development, Build and Production environments on 2 separate Prod and Non Prod Azure SQL Servers. I ran the DMDatabase prep scripts (you can find these here) to setup Dev_Chris and Production, but left BuildDB empty.
Next It was time to create my project, so I hopped over into Azure DevOps and created a new project, initialized it with a README and then Cloned it down onto my local environment:
Everything was ready to go so it was time to create my project!
Setting up SQL Change Automation in SSMS
*cough*or if you’re me, update it first because you’re on a REALLY old version*cough*
Then I hit “Create a New Project” and it allowed me to just specify the connection string to the Dev Azure SQL DB and the project location was the checked-out local repo:
Didn’t change any of the options because I’m a rebel and I didn’t feel like filtering anything out! But of course now comes the fun bit… the baseline. I chose my production Azure SQL DB as it’s my only upstream DB at this point, and it’s time to hit “Create Project”.
…and Huzzah! It’s worked and we’re all good!
Now… that’s actually not the best bit! The reason why Andrew there is clapping so hard? Well that little piece of magic has happened in the background! A Shadow database has actually been created for me against my azure server automatically! This is done by using the connection string that is used for dev!
Now… one thing to check, and I didn’t think to do this, but you can specify the connection string in the SQL Change Automation user file but I just left mine for a bit not realizing it created an Azure SQL DB for the Shadow that was CONSIDERABLY higher tier than my dev environment (bye bye Azure credit!), but fortunately I was able to scale it down quickly to basic and that has stuck, but be warned!
So I did what all ‘good devs’ would do now… I committed and pushed my initial commit directly to my main branch! (Don’t tell my boss!)
and safely sat my Database in Azure DevOps:
Setting up the build and deployment stages
This bit was actually just as easy. I used to hate YAML but thanks to a certain (wonderful) Alex Yates I jumped in anyway and it turned out to be just fine!
I created a new basic YAML file within Azure DevOps (and used the assistant to just auto populate the Redgate defaults, if you don’t know YAML or what it can do already, there’s a really good MS article here) and committed it to the main branch again (whoopsie) and the only component was the SQL Change Automation plugin I pulled in from the Azure DevOps marketplace, and I configured the build to target my “nonprod” server and the Build DB I had created previously.
On saving and running the pipeline succeeded!
All that was left to do was to create a Release Pipeline. So naturally, I jumped straight in and created a new pipeline, and I started with an empty job and called it Production – *note* make sure you also choose your Build artifact before configuring your release stage too by clicking the Add an Artifact option!:
I added the SQL Change Automation: Release step to the agent job (note because this is all hosted, I’m using an Azure DevOps hosted agent to do this step):
Now you’ll need to add 2 stages (both the SQL Change Automation: Release plugin) at this point, a “Create Release” and a “Deploy from Database Release Artifact” because one will look at the target and figure everything out for you, and you’ll be able to review exactly what will be deployed, and the other will actually _do_ the deployment:
You’ll definitely want to use the project variables to pick up the right package, and also leave the export path blank in both steps for now:
You can Clone the step by right clicking instead if you want to which will preserve all the connections you’ve already provided! Then once it’s all pointed at the right place, save and queue the release!
And of course, we were successful:
and then finally with a couple of triggers set to automatically build and deploy I made a change to my Contacts table in my Azure SQL Dev DB and a few minutes later, thanks to Azure DevOps and Redgate SQL Change Automation the very same change appeared in Production, with no reliance on anything other than Azure SQL DB and SQL Change Automation:
If you have all of your databases in Azure SQL Database**, fear not because SQL Change Automation to the rescue! You can very easily set up and configure a pipeline in Azure DevOps or indeed any pipeline of your choice, but it’s never been easier to persist development changes all the way through to Production in a low risk, incremental, “DevOps” way!
*An important word from the release notes: Note that it is still generally recommended to locate the shadow database locally where possible as that will usually result in a faster database connection. The default CreateDatabase.sql and DropDatabase.sql scripts can be altered to improve performance or implement custom provisioning logic.
**If you have all of your Databases in Azure and you need them masked for Dev/Test too, check out this previous blog post in which I outlined how to do that using Azure DevOps too!
“[My] kids haven’t responded to my GDPR requests so I don’t think I’m legally allowed to tell them when dinner’s on the table.” – @mrdaveturner
Ah masking. You would have thought I’d be sick of it by now, no? No, fortunately now, more so than ever, I find myself answering question after question and tackling use-case after use-case. So when I was asked this week:
I thought to myself, well that sounds easy enough… and it was! I know what you’re thinking, c’mon Chris, surely there is more to it? But no, it’s actually pretty straight forward!
I pointed them at the PowerShell module and cmdlets for SQL Provision and the Azure DevOps plugin to automate all of their Provisioning and Masking process, thinking all the while “pffft, they could have made this harder!” and then…
#1Figure out where you want Data Masking to run in your process
This empty Azure deployment stage looks good enough for now! If you wanted to chain other processes either side of it, that’s cool too! Maybe you have your own provisioning process in place and you want to point Data Masker at it to sanitize it? Makes sense to me! For now I’m going to stick with a single agent job for simplicity.
#2 Figure out what is actually going to run Data Masker
Data Masker is a client install and as such will need to be installed on a *gasp*actual machine!
No but seriously, any server you have lying around, physical or VM will do the trick as long as it meets these requirements. Now this Server/VM will need to have an Azure DevOps agent on it already, which of course is the ideal candidate for being the “thing” that calls Data Masker – this could be the Staging/Non-Functional/Pre-Prod environment also of course, so you could copy down PROD and then immediately invoke masking.
The PARFILE you could of course dynamically replace with variables so that it only calls the relevant parameter file for that particular database as well, a nice benefit!
My PARFILE just simply looked like this:
It was calling a local Data Masker set “AzureFun” – now the thing to bear in mind is that Data Masker will run with the Windows authentication credentials that are being run as by the Azure DevOps agent, unless you specify otherwise. In this case because the Azure DevOps agent has the correct permissions to update the databases on this instance anyway I’m fine to use Windows Authentication:
It’s very easy to simply call the command line of Data Masker for SQL Server directly from Azure DevOps, does this same approach work from other CI/CD tools? If they can call executables on the target server then absolutely! So it’s very easily included in the process – you just have to think about where Data Masker is installed and what credentials you’re using for it!
Bonus Point – what about if it’s all Azure SQL Database?
You had to do it didn’t you, you had to say it!
“But Chris, now we know we can call this all from Azure DevOps, what if wewanted to mask and copy Azure SQL Databases into Dev/Test etc.?”
Well actually the good thing is, it’s also pretty similar! When you’re connecting Data Masker to an Azure SQL DB you only need to specify this in the connections in the controller. Again, authentication will likely have to be SQL Auth at this point, and you need to be in Cloud mode, and I’d recommend setting the connection timeout to 10s rather than the standard 5s, but it can still be called as normal from the PARFILE:
So the Data Masker element is reasonably straight forward – that’s the good news. But the thing you REALLY need to stop and think about is:
Where are our Dev and Test copies going to BE?
Option #1: If they’re going to be on VMs or local dev and test servers / developer machines then you could follow a similar approach to one I laid out in this blog post for Redgate in which you create a BACPAC file and split it out on premise before importing it and then provisioning from there. And you could use this code in my Github to achieve something very similar. Caveat: I am no PowerShell guru, who do you think I am? Rob Sewell? Chrissy LeMaire? No. Sadly not. So you can build your own logic around my code though, have at it, I don’t mind! ^_^
Option #2: Keeping everything in Azure. You can copy databases around in Azure and it seems to work pretty well! So I wrote this PowerShell (also in my GitHub for y’all) to effectively copy a PROD DB into the same resource group, mask it and then copy it across to a Dev/Test resource group, dropping the temp copy so as not to incur lots of extra Azure costs (this is just one of the methods I’ve seen people use, again it’s up to you!) – again, see the caveat in option #1 above for my statement on PowerShell! The good thing is, you can use the ‘&’ simply from PowerShell to call Data Masker’s command line.
Either of these options can be run from Azure DevOps also as part of your provisioning or working processes, but instead of including a call to the command line, you can run a fun PowerShell script instead:
Second Conclusion *sigh*
There are lots of ways to get what you need into Dev and Test, but these copies should be masked if they contain personal, identifying information. There are some methods above but there are plenty of others out there on the internet and if you’re not sure about getting started with data masking; try my post here – happy masking!
“That’s who you really like. The people you can think out loud in front of.” – John Green
Can we all take a moment to appreciate how awesome Kendra Little is? No really, go on over to her Twitter or something and remind her. Because not only is she a genius but she brings out the best in a lot of folk, and I don’t think she gets enough credit – so my quote above today is for her!
This is one of those times though, where we stumbled on an idea, and together we fleshed it out and thanks to her ingenuity and straight up desire to help we ended up with a full on video about it! Thank you Kendra! So if you don’t want to sit here and read about Reusable Schema Deployments, take a look at the video below instead where we cover everything from the key differences between State and Migrations, what a filter file is and how to use YAML in Azure DevOps!
For those of you who prefer a nice read, grab a coffee or tea and a biscuit (cookie) and read on!
We find ourselves in the unenviable situation where we have a Production database that is delivered to customers to support different applications that we provide to them. When a new customer comes on board and chooses any number of our products, we then deliver them along with a copy of the database containing a set of objects that are specific to their setup.
Example. We produce 3 applications for our customers; Beep, Derp and Doink. In the database that we supply with these applications, we have corresponding schemas ‘Beep’, ‘Derp’ and ‘Doink’, as well as ‘dbo’ which holds a number of objects common across all instances of the database.
The question then is: “How do we deploy only the combinations of these schemas to the relevant customers AND make sure there is as little down time as possible?”
I mean, besides magic, of course!
Solution 1: Less Work, More Schema
There’s a reason why, when you buy a new ERP or CRM system for your company, many times you will receive ALL of the schema! Just bought in a new system to help manage your General Ledger, Accounts Payable and Accounts Receivable, and those are the only modules you’re going to use?
Whooooo-boy-howdy you better believe you’re going to get schema objects for Asset Management, Billing and Risk Management too!
The reason for this is that it is much easier to deliver. It is a single package that needs to be deployed everywhere and if the customer already has the relevant objects in the database then it is MUCH easier to just turn on corresponding application functionality that starts to populate and use those objects!
The problem is, if EVERY customer get’s EVERY object change across EVERY schema… well then it’d be a lot of changes and potentially some quite big changes that could impact our customers, perhaps unnecessarily. This could easily be an argument to be made for the migrations approach to source controlling and deploying changes, but that’s one for another day!
Solution 2: You get a filter, and you get a filter, EVERYBODY gets filters!
In the state based way of deploying we can actually use filter files (.scpf) which allow us to filter at the point of creating a database release artifact. This is a game changer because that means we can have the convenience of developing against a single source database in Dev, source controlling ALL object changes together and it’s only once we actually get to the point of deploying to the customer do we include the filter file in the Create Release Artifact step to include ONLY the necessary schema objects that are relevant to them.
Now this is also a great way of doing it because it means that everything in source control is still our single source of truth and we’re able to validate that everything builds together and we can run unit tests broadly against all objects etc. however it does also mean that we have to either maintain separate filter files for every customer, or for every combination of our Schemas that a customer could receive and update them as and when people add or remove certain applications. It also doesn’t give us any validation that THIS particular release artifact that has been created for that customer actually works independently from the rest of the schema objects and therefore we’re deploying something that hasn’t actually been tested in isolation first!
Finally, the secondary problem with this approach is that it is SLOOOOOOW. Like super slow. This is because the heaviest part of the state based database deployment process is the creation of the release artifact determining what changes should be included in the release that is going out of the door and this is being carried out and putting overhead independently on every. single. customer. Not fun.
Solution 3: Reduce, Reuse, Recycle.
If we take a step back from this problem and look at exactly what we’re trying to do and what we want to do. We want to deliver ONLY the necessary changes to a particular schema in the database that supports that specific application.
But this means that there is a commonality across customers – if for example we assume that we have 30 customers that have a variation of the Beep schema (I’m going to ignore dbo for now because everybody has that), they may also have Derp, or Doink or no other schemas, but the point is all 30 of those customers will require the exact same updates to their Beep schema full stop.
This means, if we can generate a single artifact once, that can be used for all 30 customers receiving this schema, or indeed ANY schema, then we can:
a) Reduce the amount of comparisons taking place to create release artifacts b) Reuse and Recycle the same release artifacts for multiple customers c) TEST these artifacts before they actually go out the door!
This is effectively achieved by adding an additional layer on top of the development and deployment process:
An additional step is introduced to produce a single reusable artifact prior to the final Prod deployment, Pre-Prod all receives the same package which contains every object regardless of schema, however when a Production release needs to go out the release artifact is built against the Beep database (in this case, which only has the Beep and dbo schemas) so the pain of the creation of the artifact actually sits outside of the customer environment AND is created only once, allowing us to now distribute that change to any customer who many require it to upgrade their Beep schema.
The same is done for each schema in turn which means we then deploy the fast reusable artifacts, and the only process change required is the step immediately before deploying to Production, almost like the independent databases are an exact mirror.
Don’t get me wrong there are challenges with this model as well.
What if we want to deploy a completely new database with a set number of schemas? Well. You may have to do an ad-hoc deployment or add an additional process to the pile which does that create for you!
What if we create versions 10.4.1, 10.4.2 but these only make it up to one of these Pre-Prod mirrors and then we want to push 10.4.3 to Customer Production environments? We will no longer receive the artifacts from .1 and .2, only .3! Here you will have to create the specific filtered artifacts ONLY prior to deploying to Production so that EVERY change is captured. In the video above I used a Golden DB which had every deployment on it and used this to test my schema specific deployments prior to deployment but it depends on what setup you want to adopt.
Filters are incredibly powerful and if you have subtle differences in hosted environments across your customers or even across your own DBs they can be an ingenious way of being able to keep all core objects AND key variations within Dev, version control and Pre-Production but then making sure that the target ONLY receives what it needs.
But be aware, subtle variations can snowball and you do have to be careful how you handle them as it is not very easy to scale to multiple customers. Fortunately in this scenario 1 schema was mapped to 1 application being delivered which makes it easy to determine who gets what, but the more differences you have, the harder they will be to continuously integrate and deliver.
But it’s happening more and more now. We have to work from home and this is starting to turn up some problems.
And no. I don’t just mean that my wife is now acutely and accurately aware of how annoying I am.
I’m talking about clones. Specifically SQL Clone, and clones do not work great from home.
Now for anyone out there who has never heard of SQL Clone, where have you been?? It’s an incredibly intuitive, reliable tool for rapid provisioning of database copies (*cough* definitely not just lifted from the website) – in any case, it’s pretty darn cool.
One of the coolest things about this technology though is how it seamlessly plugs into the Microsoft Ecosystem by leveraging the VHD(X) technology available in x64 Windows. This means there’s no special file systems, no hard or software “appliances”, it’s very much plug and play (with a little tinkering around ports 14145 and 14146 of course!)
Naturally though it does come with it’s challenges, as does all technology, and it has been highlighted recently more so than ever.
Whilst SQL Clone is a beautiful, elegant and easily automated answer to the database provisioning problem – there are 2 gotcha’s that you must be aware of when you start using the solution. SQL Clone relies on the relationship between the Image file (the centralized, virtualized parent, as it were that exists on a windows fileshare) and the Clones themselves (the 40mb diff disks on the hosts).
Now many people choose to put Clones onto their workstations, which for many of us is our laptops. We have SQL Server Developer installed and we pull down a few clones for different branches, for testing etc. etc. and all is well with the world.
When you’re in the office that is.
When a user queries or works with a SQL Clone and it requires any of the schema/data that was in the original copy and is not in the changes the user has made, a call is made back to the image file (the VHD mounted copy) to fetch it. When you’re in an office setting (with your cup of coffee in one hand and a colleague sat next to the other telling you about their weekend) this is fine because you’re connected directly to the company network and therefore the link between the clone on your laptop and the image file on the fileshare is short, strong and stable.
At home though this isn’t the case. Many of us work on VPNs that are “ok” on internet connections that can only be described as temperamental. So what happens when a clone tries to call back to the image file across this VPN, which is now much further away, across a sea of uncertainty and poor connection?
Bad things happen. Either the Clone cannot connect to the Image and it decides it no longer knows what it is, and it falls into Recovery Pending for a while until the connection is re-established, or if the connection is present it is just so slow.
This isn’t a fault of the tool, I hasten to add, it is just the nature of the technology. Much as we would expect our RDP sessions to be a little laggy if we were based in Greenland and the server was in Australia, it is just part of life.
So… are there solutions? You bet there are!
Option 1: The “jump box”
Many people I have worked with have found that they are still able to leverage SQL Clone whilst working from home by reducing the physical distance between the Clone and the Image, and have done so by introducing a Dev/Test “jump box”.
The way this works is by having an instance of SQL Server available within the company network onto which the Clones are provisioned.
This works great because it means that the link between the Clone and the Image is once again short and strong and stable, relying on the company network, but you can easily connect to or RDP onto this jump box if you need to work with them. Still using your VPN and internet from home? Check! Able to work with a Clone now though? Check!
Option 2: The Cloud
Welcome… to the world of tomorrow!
By which of course I mean, the world that has been available to us for a while. Infrastructure as a Service (or IaaS) allows us to very easily spin up an additional dev/test server which can be used in the interim. Whilst this does incur a little extra cost in infrastructure it still means that you won’t need as much space on the VMs themselves, thank you SQL Clone!
As long as you have a fileshare available in say Azure or AWS and a Windows VM you’re pretty much good to go and once you’ve got Clones and Images up in the Cloud, you’re reliant on the networks of the providers, and I’ve got to say, they’re not too shabby at all!
Sneaky side note: ALL demos I give of SQL Clone actually run on an EC2 VM with a 91GB Image on a fileshare and it works great!
Option 3: Roll-your-own solution
Ok. I realize this one is not really a solution. But the thing we’ve highlighted here is that it’s all about the distance between the Clone and the Image. Those two love-birds cannot exist long distance.
So if you might have another ingeniously simple way of solving this, I would recommend having a read of this magnificent document written by an incredibly clever ex-colleague of mine and the “How It Works” documentation and then let me know what YOU come up with:
There are a few ways of getting around this problem but they’re not always obvious or clear, and at this trying time, you just want solutions you have in place to work. We all do.
So if you are working with any of Redgate’s tools, or even if you just have a question that I or a colleague might be able to help with – please reach out to us, talking to us won’t cost you anything, but it sure as heck might gain you something!
“What you stay focused on will grow.” – Roy T. Bennett
As of Monday 16th March 2020 I have been a remote worker.
Ever since I left University I have worked in busy, bustling offices where the air runs thick with collaboration, questions and social commentary.
But now… I’m at home. I guess you could say I’m not quite sure how to come to terms with this. It certainly is strange knowing that my commute in the morning has gone from 50 minutes on the bus to 50 seconds from bedroom to dining room (via the kitchen for coffee). It’s even stranger though that I don’t get to see colleagues; friends, who I’ve worked with for years and who’s smiling faces, cheerful demeanor and indomitable spirits have been huge contributing factors to my desire to tackle the working day and make as much of it as i possibly can.
In the past remote working was simply the odd days I worked here and there from home, with nothing but my laptop because it was a quieter day when I could do some learning as well as some other life task like going to the doctor or dropping the car off to be serviced – therein lies the problem.
I had come to associate working from home with quieter periods of time, when I could focus on 1 thing at a time, maybe work from the sofa and treat myself to the occasional snack.
As of Monday however, the entire office has been locked down and we are _all_ working from home… for how long? I don’t think anybody knows. The only certainty in the world right now is that nothing is certain – so it is time to put into practice something that I always preach, but rarely am pushed to exercise. I will have to change my mindset.
I’m not alone in this at all – you may be reading this thinking “but Chris, working from home is so easy, I actually get more done and I’m more focused!” and if that is the case I applaud you, and I wish I could share those same abilities. However if you’re like me and suddenly working from home by directive has jarred you, here are my top 3 tips for working from home that have helped me get to grips with it and maintain my productivity:
1- Find a routine
This is perhaps the most important on my personal list because in the past I have found myself getting out of bed at 7.30/8am on days where I worked from home. This is ~2 hours after I would normally get up when commuting into the office and whilst this sort of lay in is great at the weekend to get some rest, it also leads to me being groggy and not fully awake when your start your morning meetings/calls/work and doesn’t help you focus or build a reasonable mental list of priorities.
Take aspects of your normal daily routine and replace them so that you build up a Monday-Friday (for example) that represents something similar to the structure you enjoyed when working from the office. Here is an example of how I have changed my schedule to adapt to my new working situation; I would normally catch the bus to the office which would take anywhere between 40 and 50 minutes first thing in the morning. Now at exactly the same time in the morning I go for a 30-40 minute brisk walk to simulate that commute – match something you did to something creative, compelling, healthy or otherwise you can do from home instead.
The thing to bear in mind is that finishing work for the day should also factor into your plan. It is very very easy when one works from home, to simply leave the laptop open or code running or join a late call. These things will rapidly eat into your personal life though so once you hit your magic “clocking off” time… clock off.
2 – Create separation between work and home
Even though they are now one and the same, you have to keep a separation between your work and home lives. I’ve heard it described by many people that they have set up in the study or in the office – but what do you do when you live in a small flat? Or if you live with your parents who are _also_ working from home and you have to say, work in your bedroom?
My solution to this, as I have a similar problem, is to make certain touches to transform myself and the space i’m working in to make it feel as though there is a transition between going from home to work.
Once I have come back from my walk I relax and make a coffee and some toast or cereal whilst watching YouTube videos, then at 7.55am precisely, my Amazon Echo buzzes an alarm, I turn the television off and I begin “building” my work space. A process which involves taking out the monitor and accompanying cables, setting them up on the dining room table, neatly arranging where everything should go and plugging in my phone and finally ensuring that I have a full bottle of water on standby to stay hydrated.
Conversely at the end of the day I will go through the same ritual in reverse to the point where we are all able to sit around the dining room table and have dinner with no trace that I was ever there working. This act of transformation somehow makes the space seem different, and gives it a different energy, which means i’m not thinking about work problems whilst I eat with my family.
3 – Communicate, Communicate, Communicate
Ok. This one is a no-brainer, but there are subtle levels to communication that can really help when you’re feeling the weight of working from home hovering above you like so much foreboding.
Whenever you’re feeling the pinch of distraction or you’re lacking motivation, message a colleague you would normally speak to on a daily basis at the office and ask them how they are. It seems simple but if you rotate who this person is and just check in on them you make them feel though about, cared about and appreciated. Not only does this give them the motivation to carry on but is a selfless act that can also revamp your own spirits and the resulting conversation can even inspire the thoughts you need to help with whatever you’re working on.
Turn your webcam on – be seen and see other people and enjoy the smiles and the thinking and the body language you would otherwise be missing out on.
React and be reacted to -if you use Teams or Slack or any kind of collaboration tool, make sure that when people make statements or ask questions, you either offer feedback or at least react. A thumbs up emoji at least lets people know that you are listening, allowing them to feel validated and supported and not think they’re just shouting into the void.
Learn to use a mute button – we have all been in situations where someone has mouth breathed the meeting to postponement or someone has had a coughing fit or their child(ren) have come in asking questions – all of these are perfectly fine because we’re humans… but. That is no excuse for everyone in the meeting internal or external to hear that. Liberally use your mute button on your meeting software or headset and make sure you are able to contribute meaningfully, but that when you’re not contributing, you’re also not hindering others from doing so.
Right now is a difficult time. Very much so – and I hope you’re all doing well and staying healthy and happy. Working from home makes up a small portion of what everyone is feeling and struggling against at this point in time and so we should remember to be mindful and grateful that we even are a part of companies who are able to support us working from home as the ability to even do so is a blessing.
Remember to take your mindset and find a way to turn each of the things you struggle with into positives; recognize them, appreciate them and ask yourself why you’re feeling that way, and then find a way to deal with it that makes you more productive and/or happier – most companies have a wealth of people who are experienced in or who are dedicated to help you working from home, don’t be afraid to ask for help if you need it and see if someone can help you tackle some of the problems you’re facing. Everyone is different and will need different things, just because you don’t feel comfortable working from home, for any reason, doesn’t mean you’re doing things wrong. We’re all learning together.
So stay safe, stay healthy and stay awesome and I’ll leave you with my favorite tweet so far on the matter:
P.S. We also just recorded a DBAle Special Episode on working from home, where all podcast participants were working from their respective homes, if you’re interested, you’ll be able to listen on Spotify, Apple Music and directly here: https://www.red-gate.com/hub/events/entrypage/dbalewhen it goes live in the next few days 🙂
“It’s still magic even if you know how it’s done.” – Terry Pratchett
For a long time now I have worked heavily with Redgate’s SQL Provision software. I was involved from the very beginning when SQL Clone was but a few months old and just finding it’s little sheepish (teehee) legs in the world and before Redgate Data Masker was “a thing”.
The one thing I’ve never comprehensively been able to help a customer with though, was PaaS. Platform as a Service has been something that has plagued my time with this wonderful software and that is because you simply cannot take an Image (one of SQL Clones VHDX files) from an Azure SQL Database or from an Amazon AWS RDS Instance directly, helpfully.
But then in January 2019 I did some research and I wrote this article on how you could achieve Provisioning from Azure, through the BACPAC file export method, this was great and several customers decided this method was good enough for them and have adopted it, and in fact completely PowerShell-ed out the process (links to something similar in my GitHub which I used for my PASS Summit Demo 2019), however this never solved my AWS problem.
I’ll be the first to admit, I didn’t even try. AWS for me was “here be dragons” and I was a complete n00b; I didn’t even know what the dashboard would look like! However, in early December 2019 I was on a call with a customer who mentioned that they would like to Provision directly from RDS SQL Server and they don’t want any “additional hops” like the BACPAC Azure method. On the same day, Kendra Little (sorry Kendra, you seem to be the hero of most of my blogs!) shared some insight that it was possible, with AWS, to output .bak files directly to an S3 bucket. That got me thinking, if we can get access to a .bak file directly from S3, surely we could provision it all the way to dev with little- to-no involvement in the process?
My reaction to this news was that it was the universe telling me to get off my backside and to do some thing about it, so with renewed determination and looking a little bit like this:
I set off into the world of AWS.
1 – Setup
Now naturally, I am not a company. Shock. So i don’t have any pre-existing infrastructure available in AWS for me to tinker with, and that was the first challenge. “Can I use anything in AWS for free?” – The answer? Actually, yes! AWS has a free tier for people like myself who are reeeeeally stingy curious which at the very least will let me better understand how to interact with the various moving parts for this.
First step. I’m going to need a Database in RDS, so I went to my trusty DMDatabase (scripts here for y’all) which I use for EVERYTHING, on-premise, in Azure, EV-ERY-THING.
In AWS I went to RDS and setup a SQL Server Express instance called dmdatabaseprod (which fortunately kept it on the free tier). Luckily, AWS provides an easy getting started tutorial for this which you can find here – why re-invent the wheel? After creating the DB I had some major issues actually connecting to it in SQL Server Management Studio; I thought I had allowed all the correct ports for traffic, put it in the right security groups blah blah blah… and guess what it was?
Public accessibility. Was set. To “No“. *cough* well that problem was sorted quickly so it was onto the next challenge.
After following the advice of Josh and walking through his steps, getting a new S3 bucket setup and configured and creating a new backup of my DMDatabase, I was a good step of the way there! As you can see my .bak was nicely sat in my S3 bucket – marvelous!
3 – Making the S3 bucket visible to SQL Server
This was the tricky bit. My approach to solving this problem was “I need SQL Server to be able to see the .bak file to be able to create an image and clones from it. So, logically, I need it to be mapped as a network drive of some kind?” – simple, no? It turns out that it was the best approach from what I found online but there were a number of ways I found of tackling it.
I started out using this article from Built With Cloud which was super informative and helpful, I managed to get rClone running and the S3 bucket was showing as a local drive, which was exactly what I wanted:
But I ran into a problem – SQL Server could not access the mapped drive.
So is there another way? I found a bunch of resources online for CloudBerry, TnT Drive and MountainDuck but, like I mentioned I’m on a very limited budget ($0) so naturally… I put this on twitter. I received a tonne of replies giving some examples and some ideas and the one idea that kept coming up time and time again was AWS Storage Gateway. I had never heard of it, nor did I have any idea of how it worked.
So. Back to Google (or in my case Ecosia, it’s a search engine that plants trees if you search with them, what’s not to love???)
To simplify it. Storage Gateway is solution that is deployed “on-premise” i.e. as a hardware gateway appliance or a virtual machine, and it allows you to effectively use your S3 (or other AWS cloud storage service) locally by acting as the middle-person between AWS and your on-premise systems, and it does fancy local caching which means super low latency network and disk performance. There are a few different types you can utilize but for this exercise I went with “File Gateway”, from Amazon: “A File Gateway is a type of Storage Gateway used to integrate your existing on-premise application with the Amazon S3. It provides NFS (Network File System) and SMB (Server Message Block) access to data in S3 for any workloads that require working with objects.”
Sounds ideal. Time to set it up!
I have access to VMWare Workstation Pro on my machine so I downloaded the OVF template for VMWare ESXi and loaded it up in VMWare (the default username and password threw me a little but it turns out it’s admin and password as standard, and you can change it as you configure):
Then it was a bit of a checkbox exercise from there:
Now I wasn’t 101% sure of exactly how best to set up my fancy new Gateway, so fortunately I found this super helpful video, funnily enough from Teach Me Cloud as opposed to the aforementioned Built With Cloud, and although it was a little out of date, I also had one of Redgate’s finest engineers (the wonderful Nick) on hand to help out. Between the video and us (mostly Nick) we were able to get everything connected!
But I ran into the same problem. SQL Server couldn’t access the backup file.
Fortunately though, after some frantic Googling we managed to find a very straightforward article that fixed all of our pain! We needed to map the drive in SQL Server itself – Thanks Ahmad! Now, yes, I did use XP_CMDSHELL (insert DBAReactions gif at the mere mention of it here) but this was for testing purposes anyway, I’m sure there are other ways to get around this problem!
…and guess what? It worked. Huzzah!
If you can substitute my poorly named image “blah” and assume instead it says “HeyLookImAnImageFromAnRDSBackupFileArentIClever“, this means I can now schedule my PowerShell process to create new images at the beginning of every week to refresh my DMDatabase Clone environment, no manual steps needed!
Whilst there are a number of steps involved, you can easily take advantage of some of the fantastic features offered by AWS like Storage Gateway and even if your database is hosted in RDS, you can fully provision copies back into IaaS (Infrastructure as a Service) or On-Premise workflows to keep your Pre-Production copies up to date and useful in development!
Just remember to mask it too!
P.S. I’m sure you could probably find some clever way of using the free rClone method I also mentioned and having this readable by SQL Server, but I haven’t figured it out yet, but will blog when I do!