“If you quit on the process, you are quitting on the result.”
– Idowu Koyenikan
SQL Provision is really cool. But you knew that, didn’t you? It’s obvious – we get teeny-tiny clones, based on an image with completely sanitized data, that we can use for just about anything in dev and test, and if we break them? Boom! There’s a new one.
I’m not just talking about refreshing Dev & Test environments though, oh no! I’m talking:
- Clones as baseline with SQL Change Automation – baseline scripts for projects are a thing of the past, goodbye invalid object headaches!
- Clones every single time you switch a branch – keeping everything separate and not cross-pollinating database work between branches
- Clones to check Pull Requests instead of relying solely on the code itself in Version Control
Watch my session on all 3 of these from Redgate Streamed back in August: https://www.red-gate.com/hub/events/redgate-events/redgate-streamed/redgate-streamed-global-august-26
But one question always comes up about clones in any workflow and that is – how often should I refresh Images and Clones?
This question obviously depends a lot on the process, but in reality I think the question should be less about clones and more about the images themselves. Clones are transient and can be flipped at a moment’s notice, but the image, or the “clone tax” as Steve Jones calls it, is the thing that takes time, resources and space.
I’m going to take my own go at answering this question as I would in any customer meeting or architecture session – but if you want some excellent detailed advice and examples, check out this awesome documentation page here: https://documentation.red-gate.com/clone/how-sql-clone-improves-database-devops/self-service-disposable-databases-for-development-and-testing
Q: So, how often should we refresh it?
A: It depends on your use of the clone – how often do you need up-to-date data?
As a rule of thumb though, I tend to see the following behaviours:
- Customer Support – overnight during the working week: Where people need data to troubleshoot customer issues, it always helps for that data to be as close to now as possible. You want an image on standby so that at any second a member of support can pull down a copy to look through (if it NEEDS to contain sensitive data for this purpose, you can restrict who can create clones from these images using SQL Clone’s Teams functionality)
- BI / MIS and Report testing – once a week (if not more often): Business Intelligence and reporting workflows can mean that you’re mostly reading from your clones, in which case they should stay small and you should be able to move seamlessly between them. But. If your ETL process puts a very heavy load on your clones (like truncating and re-populating tables) you may cause bloat and need to rethink your refresh frequency to be more often where possible – perhaps overnight, so that any transformations are captured in the new images, and clones by extension.
- “BAU” Development (Schema and Static Data Changes) – every 1 or 2 weeks: If you’re not making a large number of changes to your clone, or your changes are limited to schema and static data only, then you should be absolutely fine with a wider refresh cadence – keeping the clones around for the whole sprint, or refreshing only once during the sprint, means everyone can more easily stay up to date with the same environment.
- Ad-Hoc and Test workflows – once per month: There are going to be times where you occasionally need a copy of the live DB, but where the schema being 99% similar and the data being a few weeks out of date isn’t a big deal. You can pull a clone down from this “cold copy” for any kind of test, destructive or otherwise, or to validate certain behaviours / sense-check whether an update or query will work. It’s also handy to maintain a slightly older copy where possible in case you need to start digging into failed updates made in development, as it gives you a milestone to compare against.
Again – these workflows may vary, and you may need to refresh more or less frequently based on the differences being recorded, bloat, space available on the fileshare etc., but generally I find customers are pretty happy with this.
Q: Once we have our refresh rate in place – how do we move developers across?
This is a great question I get a lot, and it stems from the fact that a developer may have made a few dozen changes to a clone before the frequent refresh rate blows their clones away (and they forgot to commit to version control – D’oh!) – so it’s important to bear in mind that development work, and as a result the cloning of environments, is not “cut and dried”. We should give developers a chance to move across as and when they’re ready, so I often end up recommending the workflow below to ease this process.
For the sake of this proposed workflow I’m assuming a couple of things:
- The selected workflow is BAU Development and we want to refresh once per week
- We have enough space available on our fileshare to allow for 2 (or more) distinct copies of the primary image
- Clones are being delivered to jump boxes / VMs within the network that are always connected (and not developer machines), and we can control when they are deleted
- We operate on a standard western work schedule, where the week begins on Sunday, Saturday and Sunday are considered non-working days, and developers typically work anywhere between 8am and 6pm
- This can all be automated using SQL Clone’s PowerShell module
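For context, the Sunday-night image step could be scripted along these lines with the SQL Clone PowerShell module. This is a sketch only: the server URL, fileshare paths and masking-set file are placeholder names, and the exact cmdlet parameters should be verified against the module documentation for your version of SQL Clone.

```powershell
# Hypothetical Sunday-night job - all names and paths below are placeholders,
# and cmdlet parameters should be checked against your SQL Clone version.
Connect-SqlClone -ServerUrl 'http://sql-clone.example.com:14145'

$destination = Get-SqlCloneImageLocation -Path '\\fileshare\SQLCloneImages'
$maskingSet  = New-SqlCloneMask -Path '\\fileshare\masking\PrimaryDatabase.DMSMaskSet'

# Build Image A from the most recent backup of the Primary Database,
# applying the masking set so the image only ever holds sanitized data
New-SqlCloneImage -Name 'Image_A' `
    -BackupFileName '\\fileshare\backups\PrimaryDatabase_latest.bak' `
    -Destination $destination `
    -Modifications @($maskingSet) |
    Wait-SqlCloneOperation
```

Scheduled via Windows Task Scheduler or an agent job, this is the whole of the “Sunday night” step below.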
Week 1 – Sunday night
- We create Image A of the Primary Database on the fileshare from the most recent backup file, applying data masking
Week 1 – Monday to Friday
- Developers X, Y and Z create their own clones of Image A as they begin the working week
- The clones are linked to a Git repo where, using SQL Change Automation, the developers commit all changes they make to their clones throughout the week
- Developer X finishes with their changes, makes their final commit and push on Thursday and works on a different task on Friday
Week 2 – Sunday night
- We create Image B of the Primary Database – with slightly more up-to-date (and sanitized) data, capturing any deployed changes the team committed and pushed to Git previously
- We retain Image A for now but check which developers still have clones of it (Developers Y and Z) and either nudge them in the team stand-up that they only have a few days left, or automate an email to those developers warning them their clones are now 1 week old
Week 2 – Monday morning
- Developer X creates their new clone from Image B and links it to Git ready to start making changes
Week 2 – Tuesday to Friday
- Gradually over the course of the week as Developers Y and Z finish with their tasks and commit their changes they remove their clones and create new ones from Image B
- A final reminder, as an email or a notification in MS Teams / Slack, goes out on Friday morning that any clones of Image A will be deleted over the weekend
Week 3 – Sunday night
- Image A with no clones remaining is deleted (or any remaining clones are deleted first) and Image C is created to begin the cycle again
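The Week 3 tidy-up can be sketched with the same module – again with placeholder names, and with the caveat that the property used to match clones to their parent image may differ between versions of the module, so treat this as an outline rather than a finished script.

```powershell
# Hypothetical Week 3 rotation - remove any stragglers, then the old image
Connect-SqlClone -ServerUrl 'http://sql-clone.example.com:14145'

$oldImage = Get-SqlCloneImage -Name 'Image_A'

# Delete any clones still attached to Image A (the Friday warning has gone out)
Get-SqlClone |
    Where-Object { $_.ParentImageId -eq $oldImage.Id } |   # property name may vary by version
    ForEach-Object { Remove-SqlClone $_ | Wait-SqlCloneOperation }

# Then delete the image itself, freeing space on the fileshare for Image C
Remove-SqlCloneImage -Image $oldImage | Wait-SqlCloneOperation
```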
Although this workflow requires the duplication of the central image, it has a number of benefits:
- It is easily automated using PowerShell
- The source control process suffers minimal disruption and developers don’t need to rush to finish anything
- We don’t accidentally destroy developer work – the onus is on the developer to ensure work is committed
- If, for any reason, the image creation process fails, you still have a persisting image, so developers aren’t prevented from working while you wait for the image process to be completed manually
- Moving to newer clones is a more organic process
- If you wanted to maintain an image throughout the week and refresh a second image overnight for more up to date data, you can simply re-purpose the above principles. This could then be used for a number of the different teams and workflows simultaneously
Bonus Point – Naming Conventions
Many people choose to append a date stamp to the images they create, like Image_A_16102020, so we know when each was taken and which is the latest. This is good practice, but be warned: if you’re using clones as a baseline or for branch switching etc., you will need a persistent name or that link will break. An alternative is to always give the most current image the same name and simply rename the older image with the date stamp, e.g. Image_A is current, but before a new Image_A is created, the old one is renamed to Image_A_16102020 – this will not disrupt the clones that already exist on it, and it allows you to always know which image is most recent.
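If you automate this, the date stamp itself is just string formatting. A sketch, leaving the actual rename to whichever mechanism your version of SQL Clone provides (the web UI, or a rename cmdlet if your version of the PowerShell module has one – check the documentation):

```powershell
# Generate the archive name for the outgoing image, e.g. Image_A_16102020
$dateStamp   = (Get-Date).ToString('ddMMyyyy')
$archiveName = "Image_A_$dateStamp"

# Rename the existing Image_A to $archiveName here (web UI, or a rename
# cmdlet if your module version provides one), then create the replacement
# under the persistent name so baselines and branch links keep working:
# New-SqlCloneImage -Name 'Image_A' ... | Wait-SqlCloneOperation
```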