A brief history of PlantBasedSQL

“There is no fundamental difference between man and animals in their ability to feel pleasure and pain, happiness, and misery.”
Charles Darwin

Note: The post below is my viewpoint on going Vegan and is not designed in any way to attack or criticise anyone for the choices they make. I will not describe in depth what I witnessed during my research into making this choice for myself, but I will provide optional links at the end of the post if you wish to start looking into Veganism. Thank you.

In December 2016 I went vegetarian. I had been living with my then-partner, now-wife for about 9 months and things were going great. When we got together though, I was a vehement meat-eater, in fact eater of all things animal; meat, dairy, eggs, you name it.

I even remember arguing with one of our friends over Christmas in 2015 that, and I quote, “I can’t see myself ever NOT eating meat. Ever.”

My wife, though, was mostly pescatarian at that time, and therefore we never really had or cooked meat at home. I love to cook, so from the moment we moved together into a tiny (TINY!!!) flat in March 2016, I saw it mostly as a challenge I could rise to: cooking more vegetarian food. So I started doing some research.

What I found horrified me to my core.

Many of the vegetarian and vegan bloggers I started to check out included, as part of the blogs and recipes I was following, justification for their lifestyle – the reasons why they chose a vegetarian or plant-based life – and I was intrigued. I checked out the references, the sources, studies and documentaries; I made notes and discussed my thoughts with my wife, family and others I knew who were veggie or vegan. I realized I had lived a life in ignorant bliss of the suffering that took place to fulfill my need for a burger, or bacon, or even sweets like wine gums (which I loved, but which are full of gelatin).

So I made the switch and honestly, it shocked everyone around me (particularly my family) that I, the meat eater, the lover of BBQ, meaty curries and Tex-Mex, would give it up for the rest of my life. But everybody blamed my wife for this. Perhaps blame is too strong a word though… they attributed it to my now living with a mostly-vegetarian.

But no, I came to this conclusion myself. From pictures of slaughterhouses, caged animals and intense farming of everything from cows to pigs to fish, I realized that I would never see meat in the same light again, and it’s not as though I didn’t KNOW this happened. When I was a meat eater of course I knew this was the case, but when I really looked, I realized that my personal dinner preferences should never, ever cause something like this.

The research continued, and just over 2 short years later, on 1st January 2019, I did something great: I tried Veganuary. Veganuary is a vegan-January challenge that asks only that you give up animal products and try eating and living a Vegan lifestyle.

At this point the teenager in our house had already been vegan for some time; she had come from a predominantly meat-eating country (Romania) and so at home we were mostly cooking plant-based so we could all eat together anyway! Curries, stews, soups, pasta, pizza, nut-burgers, salads, buddha bowls – our diet was not restrictive – but I was still eating eggs and cheese at work, and when we went out for dinner. It wasn’t long before the articles and documentaries led me to look at the dairy and egg industries.

Again, absolutely terrifying.

Shortly before trying Veganuary, in September or October 2018, I had a nightmare. I won’t go into detail, but it involved trying desperately, in vain, to free cows from a dairy-slash-slaughterhouse, and it was harrowing. I woke up completely drenched in a cold sweat and decided that it would not be long until I completely phased out all animal products, and January was the time to do so.

I don’t miss cheese, or eggs. I thought it would be hard, but it wasn’t. Yes, vegan cheese isn’t quite there yet (unless you’ve tried the new Applewood UK Vegan cheddar OHMIGOSH) but honestly, even if the alternative isn’t there yet, it’s still better than a version that requires us to first exploit a living being that has no means to defend itself.

Now, almost 12 months later, I still maintain that (besides the decisions to marry my wife, to look after our teen and to join Redgate) it was one of the very best decisions I have ever made.

I encourage you, if you’ve ever been curious, to try it for yourself. It’s surprisingly easy, but most of all it gets you to think about what you eat, how you fuel yourself and about the well-being of all life on the planet. If you want to dig in, or want answers to some common questions, I’ve included some resources below:

Where do I get my B12 and Protein? Watch the Game Changers Netflix Documentary – trailer here: https://www.youtube.com/watch?v=iSpglxHTJVM

Is a plant based diet healthy? Watch the Forks Over Knives documentary on Netflix or Youtube – trailer here: https://www.youtube.com/watch?v=DZb-35oV_7E

How would eating vegan help to stop animal cruelty? Watch the Earthlings and Cowspiracy documentaries – trailers here: https://www.youtube.com/watch?v=Hm7Babs_FJU and https://www.youtube.com/watch?v=nV04zyfLyN4

Where are some good resources for plant-based cooking? You can follow the below leaders in this arena (there are loads on YouTube in general though):

Where can I eat Vegan? There are tonnes of good places that have their own vegan menus and options; a few chain restaurants in the UK that offer great vegan alternatives include:

  • Bella Italia
  • Pizza Express
  • Frankie & Benny’s
  • Zizzi
  • Wagamama
  • Byron Burger
  • Pret a Manger
  • Giraffe
  • Pizza Hut
  • Papa Johns
  • and Subway!

Happy holidays and here’s to a happy 2020!

“How long does data masking take?” Part 1

“Exploring the unknown requires tolerating uncertainty.”
Brian Greene

Another post in the data masking series, and I thought I’d go with an easy one to answer. It’s right up there alongside the other easy questions we all ponder, like:

“What is the meaning of life?”
“How do I build a fusion reactor using common household objects?”

and the classic “Can Vegans eat Animal Crackers™?”

The answer to any of the above questions is, of course, it depends.

If you’re content to ‘live life to the fullest, travel and be happy’, if you happen to have spare protium and boron-11 around the house, and if you’re content to bypass the ethical conundrum that although the crackers look like animals they do not actually contain any animal products, then the above are theoretically answerable. But data masking… whooo, that’s a doozy!

There’s no easy way to benchmark this process, honestly. It is subject to many things you may wish to think about prior to the deed:

  • How many columns are on the tables you want to mask?
  • How many rows are in the tables you want to mask? (Which are the biggest tables?)
  • How many indexes are on the tables? Will these get in the way of a static masking process?
  • Is data duplicated across multiple tables and does it need to be synchronized to maintain referential integrity?
  • How powerful is the machine you’re running the process on?
  • WHERE is the Database you’re running this process on? (Same machine, Network, PaaS…)
  • Will anyone be trying to connect to, and run queries against, the Database being masked?
  • Do you need to mask across Databases for consistency?

All of these things will affect the process, both in terms of how you configure it and how long it takes.

I will endeavor, though, to give SOME idea. At the very least it’s a good starting point and hopefully, if you decide to run any testing of your own, you will see similar or even better performance. For this test I will use Data Masker for SQL Server, but in future blogs I’m hoping to test out a few other methods as well, so stay tuned!

For this test, I will be using the following setup. Note that this isn’t what I would expect a real setup to be – I would expect the staging server being used for masking (and potentially Image creation in SQL Clone) to be a teensy bit more powerful!:

  • Microsoft SQL Server Developer 2016 (64-bit)
  • Microsoft Windows NT 6.3 Server hosted in AWS EC2 (c5.2xlarge)
  • 4 Processors (vCPU), 16 GB Memory
  • Redgate Data Masker version 6.3.20.5034 installed
  • DMDatabase (tables below in Data Masker) – this is my demonstration database in simple recovery mode with:
    – 1,000,000 rows on the DM_Customer table
    – 50,000 rows on the DM_Customer_Notes table
Data Masker Tables Tab showing row counts for large tables

For the purposes of this test I will mask: firstname, lastname, company name, street address, region, country, telephone number and zip code on the DM_Customer table and the notes field on DM_Customer_Notes (a free text field). Once I have completed this, I will then create an email address on DM_Customer using the masked first and last name values, and then finally I will synchronize the masked names across to the DM_Customer_Notes table, where they are duplicated.

The output masking set is as below. It uses a substitution rule for the columns on DM_Customer, a row-internal synchronization rule for the email address, a search-replace rule to randomly replace all of the characters in the free text field and then finally a table to table synchronization rule to copy the masks across to DM_Customer_Notes – for the rules I have enabled Bulk Substitution and set the commit frequency to 100,000 to speed up building and committing the statements (hence Simple Recovery mode):
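To give a feel for what one of these rules amounts to, the row-internal synchronization rule for the email address boils down to something like the following T-SQL. This is a sketch only – Data Masker generates and batches its own statements, and the exact column names on DM_Customer are assumptions here:

```sql
-- Build each customer's email address from their ALREADY-MASKED first and
-- last names, so the values within a row stay consistent with one another.
-- (customer_firstname / customer_lastname / customer_email are assumed names.)
UPDATE dbo.DM_Customer
SET    customer_email = LOWER(customer_firstname) + '.'
                      + LOWER(customer_lastname)  + '@example.com';
```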

When run, this process takes… well, actually, I stopped it. It’s paaaaaaainfully slow. This is probably because I’ve not utilized Data Masker’s worker threads to their best and I’m relying on a single connection to carry out the process – we’ll be here for a while!

So, I’m going to make some performance improvements! I’m going to split the substitution and row-internal rules into ‘Split Range Manager’ rules (video on how to do this here) so I can utilize concurrent connections to break this big table up. This is what it looks like now (rules expanded):
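Conceptually, a split range carves the table’s integer key into contiguous, non-overlapping chunks so each worker connection can mask its own slice concurrently. A quick way to see what those chunks would look like (a sketch for 4 workers, assuming customer_id is the split column):

```sql
-- Compute boundary values for 4 roughly equal ranges of DM_Customer;
-- each (range_start, range_end) pair could be masked on its own connection
SELECT chunk,
       MIN(customer_id) AS range_start,
       MAX(customer_id) AS range_end
FROM (
    SELECT customer_id,
           NTILE(4) OVER (ORDER BY customer_id) AS chunk
    FROM dbo.DM_Customer
) AS ranges
GROUP BY chunk
ORDER BY chunk;
```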

…and guess what?

Masking stats report

Total run time to mask and synchronize all of these rows was 1 minute 26 seconds. I would say that’s pretty good, and the results set looks great too!

Results set from masking: Contacts

So in conclusion – how long does data masking take? Well, it still depends.

I didn’t have lots of non-clustered indexes over these tables that I had to take care of first, I didn’t have very complex operations being performed as command rules or table to table rules matching on free-text fields, but I did have an integer field I was able to split the range across, so things went my way… this time.

But it can go fast and you can make improvements, as shown above.

If you have any questions on how to improve the performance of your Data Masker masking sets feel free to let me know! Tweet me or get in touch!

Happy masking!

Oh, and Happy Holidays / Merry Christmas / Happy Tuesday! 🙂

I got to watch a team work fast and it was awesome!

“Unless someone like you cares a whole awful lot, nothing is going to get better. It’s not.”
Dr Seuss

I care a great deal about anything I work on, as I mentioned in my previous post here (ironically, about not working for once), so it goes without saying that I am super invested in any customer I help get up and running with any of the Redgate solutions – and none more so than Data Masker.

Ever since it became part of the Redgate family, Data Masker has been an integral part of my workday – there aren’t many days where I don’t interact with the tool or its concepts in some way. When things go wrong – the tool breaks, something doesn’t work like it should – not only is it less than ideal for me (showing me up), but it’s not delivering value to the customer.

Now that. I hate with a passion.

However, I’m lucky enough that when that happens, that is not the end of it – we don’t pack everything up and say “well… sorry all, that’s your lot.” No – I get to speak directly to our fantastic support team and the equally fabulous and helpful development team behind the tool, and guess what? They care too. Immensely.

I can think of 2 key examples of this team working in the most incredible way – you wouldn’t even believe it (well, maybe you would) – but it goes to show what is possible, especially when you break down the silos in your organisation. I never became just “a ticket from a sales engineer”, and this is how they helped me fix 2 problems:

1 – UTF-8 encoding of strings for substitution rules

I was working closely with one of the Business Development folk (a little side nod there to Kendra for saying “folk” so often I’ve started saying it) in Redgate’s sales team, who was working with a potential customer in a country where Arabic is the primary language. As such, you would expect them to want to use Data Masker to mask in Arabic names like اَمير‎ (“Amir” in English), instead of something like “Frank”, which just doesn’t have the same ring to it.

It turns out that in the port of Data Masker from its older v5.5 to the swanky new v6.0 (yes, this was a little while back), the ability to change the encoding of strings from user-defined data sets had been broken. This meant that the values from Data Masker weren’t being inserted correctly into the table, rendering all of their Arabic sets useless – a huge blocker to their trial, which was under time constraints anyway.

I reported this to the Data Masker team on the 7th February 2019 at 12:53pm, created a support ticket for reference at 2:03pm and had personally spoken to them by 3:00pm. The lead developer, support rep, product designer and I quickly met up to discuss it and agreed that as this was a bug – broken functionality that should exist, and which could block not just this customer but any customer requiring it for other language sets – they would down tools and work on a fix immediately.

By 9.00am the following day a fix went out the door. Built, tested, deployed. Who did they involve in the testing? Me, initially. Then they waited on feedback from the potential customer, who also confirmed it worked after upgrading.

Wow. That’s what I call fast!

2 – Time-out tennis

More recently (think November 2019) I was working with a customer of ours who I had built up a great relationship with – they were super friendly, super responsive and all round great to work with. Unfortunately as we were getting their masking sets set-up (pun definitely intended) we started encountering time out issues when waiting for masking stats to return.

This was a little irksome; it put a slight dent in my relationship and credibility, and it was slowing them down, as it was causing sets not to complete at all.

The problem was, though, that it wasn’t such an easy fix as Number 1 above, as it wasn’t exactly clear what was causing the timeouts and I wasn’t really sure where the best places to check were! Fortunately the “Masketeers”, as they are more commonly known around Redgate towers, did have an idea of where to look. A nominated member of the development team (to whom I will be forever grateful) almost became a little subdivision of the team – it didn’t require their full might, just someone who knew what was happening even more intimately than me!

Through a few back-and-forths with the customer, experimenting with timeouts and making some tweaks, we were able to establish what was going wrong and ultimately provide a fix. This work became a new branch which, after testing once again, was merged into the main base and released the very next day. Finally, the customer let us know it was all working again and sets were completing as they should.

Conclusion?

Sometimes you have the pleasure of working with some unsung heroes where you work, I do it on a regular basis – from the facilities and cleaning team here in the building who do the most incredible work to look after us, to the Sales management team who are constantly looking at ways to make us the best possible company to deal with – Redgate is definitely a place where people can do the best work of their lives.

But on these occasions, I got to witness something special. Cross-functional collaboration. Communication. Empathy. Passion.

And just getting the work done. When it’s about delivering value to customers, feedback, development and testing are everybody’s job, and I’m lucky enough to work with people who put that theory into practice.

Why I don’t work on planes, and why I don’t think you should either.

“Work, love, and play are the great balance wheels of man’s being.”
– Orison Swett Marden

I’m a big supporter of mindfulness, and treating yourself with the kindness and compassion you deserve, and which we as human beings need. When you go to work I’m sure many of you (like I do) give the metaphorical 110% to make sure that when you get a job done, it is done well… but it shouldn’t be at the cost of your mental well-being.

Some countries and businesses practice 4-day work-weeks to help people maintain a work-life balance; others provide longer breaks, mindfulness rooms or meditation sessions, among other things. Why do you think they do this? First and foremost, we can trace it back to the old saying “happy workers are productive workers”. Assuming we, the employees, don’t get complacent about the benefits and perks our employers provide, this really rings true: it enables us to do the best work of our lives and feel valued doing so.

I love what I do (notice I didn’t say my job). Having the ability to help people tackle some of the hardest problems they face in day-to-day life is a benefit all in itself, and hearing about your input being revolutionary to someone else just really can’t be beaten. So when I frequently come into the office at 6am to answer a tide of emails or prepare for multiple back-to-back demonstrations, or when I reply to colleagues in offices across the globe from my phone at gone 9pm, I don’t resent that fact… I thrive on it. I feel like I truly make a difference.

But what about when I’m asked to travel for work?

“Chris, can you take the train down to London? There’s an event we’d like you to be at!”

“I need you to fly to LA/Austin in a couple of weeks to help with some inter-office training”

“We’re attending an event in the Netherlands, can you fly out Friday and spend your Saturday on site?”

That’s great – I can’t wait!

But. Travel is work too. It takes you out of your routine and away from your family and friends for any length of time, and it genuinely requires you to be “on”. The whole process is exhausting: waking up early, getting home late, bustling airports, overcrowded train stations, uncomfortable hotels and sitting in traffic for hours on end – does that not sound like work? Especially for those of you (us) who suffer with anxiety in these sorts of situations. Even if you get that time back in lieu, it is time you will never really get back.

When you work whilst you’re travelling, you just add stress to your existing stress and double-stress™ yourself – and the effects of stress on our bodies are well documented and blogged about (see here and here, and all the smart people here!). This creates negative associations and resentment, and resentment kills your passion for the thing you love to do, turning it into “just work”.

So what do I actually do when I travel instead of pulling out my laptop and replying to someone’s questions about database deployments?

I stop. I relax.

Invest your time in what brings you joy, and turn something tedious, time-consuming and stuffy into a fun and exciting journey. For many this could be anything from watching a film or reading, to personal development and learning something new, or even just watching the trees go by out of the window (not so easy on a plane, but you get my point!) – whatever makes you happy. We can thank Stephen R. Covey for this nugget of wisdom: “Most of us spend too much time on what is urgent and not enough time on what is important.” It’s something we can all stand to be reminded of occasionally.

You don’t need to feel guilty at the thought of not working when travelling: 7–8 hours at work followed by 4, 5, 6+ hours travelling is still, at the very least, an 11+ hour work day. Ultimately, taking time for yourself is the very least you deserve when you are travelling for your business.

If you’re struggling with this same stress though, you might well take the opportunity to explore what matters in your life and build a happier you. Happiness is a choice we make for ourselves.

So, what’re you going to do on your next business trip?

Deterministic data masking – the who, who and who? (and how?)

“Security is always excessive until it’s not enough.”
– Robbie Sinclair

You may not already know this about me, but I kinda like data masking.
Scratch that, I LOVE data masking.

Increasingly, both around Redgate and in general, I seem to be getting a bit of a reputation as “the data masking guy” – and for good reason. To include yet another quote, from Joe Kaeser this time: “Data is the oil, some say the gold, of the 21st century…” Yet more and more I hear stories about people leaving their oil/gold out for everyone to see, opening up the widest possible attack surface by doing things like copying backups down into non-Production environments or exposing test systems to the internet – the list goes on.

This means that people turn to all of the protective methods they have available to them: encryption (TDE, row and column level etc.), static and dynamic data masking, access control… and many combinations of them.

One of the big points I always have to cover when it comes to static data masking though, is something called “deterministic masking”, so let’s start with 2 definitions of my own to make sure we’re on the same page:

Static data masking is the process of de-identifying sensitive data-at-rest within the tables of your Database. It is typically used to provide realistic, Production-like data into non-Production environments like Dev and Test, and even sets that are given to 3rd parties. This relies on retaining non-sensitive business specific fields within rows and taking anything considered PII (Personally Identifiable Information) or PHI (Protected Health Information) and either scrambling or replacing it with similar but ultimately false data.

Deterministic data masking is the process of masking data in a repeatable way, such that a given input value is masked to the same output value in any and all future runs, while values that have not been masked before get a new record. For example, if you were to mask “Chris Unwin” to “Brad Pitt”, it should appear as “Brad Pitt” not only in our (for example) dbo.Contacts table but also in all associated tables (regardless of PK/FK relationships at the DB level), and every single run should produce the same output. This is useful for building up familiarity with the data and for reusing it across future test runs.

Now. I should caveat this blog post (you’ll find I’m always caveating my posts) with the fact that deterministic masking does have its benefits, but the very idea that one value should always become the same other value is, in my eyes, inherently less secure than masking to a different value every time. As such, I will always recommend that, where possible, masking runs should produce differing values. Deterministic masking is also compute-heavy: instead of simply substituting in random values, the process has to check up front whether the value to be replaced has been masked before and, if so, replace it with the corresponding mapped value – another potential downside if speed is a key driver in the process you’re trying to put in place.

Most masking tools either support deterministic masking directly (*cough* dbatools.io *cough*) or require a little bit of configuration to get started, but it is a workflow that should be catered for, for those who need it. So here is a quick getting started guide for deterministic masking, if you’re writing your own scripts or (as I will in this example) you’re using Data Masker for SQL Server from Redgate.

Step 0 – Figure out where you’re going to do the masking

There are lots of different ways to move data around and get things masked before exposing it to development and testing teams (or handing it to 3rd parties). In this example, I’m going to assume that the only thing I have available to me is a backup and restore process for moving data around, nothing fancy like SQL Clone to help.

So for this I will assume I have a staging instance somewhere that I can use as part of this process – let’s call it WIN2019 – and the process I’m going to follow is:

  • Restore a full .bak file to WIN2019
  • Carry out the data masking process on this instance
  • Backup the masked DB
  • Move the new .bak into lower environments (restore / make available to Devs)
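Sketched as plain T-SQL, steps 1 and 3 of the workflow above look something like the following. The file paths and logical file names here are illustrative only, not from a real setup:

```sql
-- 1) Restore the full backup onto the staging instance (WIN2019)
RESTORE DATABASE DMDatabase
FROM DISK = N'E:\Backups\DMDatabase_full.bak'
WITH MOVE N'DMDatabase'     TO N'D:\Data\DMDatabase.mdf',
     MOVE N'DMDatabase_log' TO N'D:\Logs\DMDatabase_log.ldf',
     REPLACE;

-- 2) ...run the Data Masker masking set against this restored copy...

-- 3) Back up the masked copy, ready to move into the lower environments
BACKUP DATABASE DMDatabase
TO DISK = N'E:\Backups\DMDatabase_masked.bak'
WITH INIT, COMPRESSION;
```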

Step 1 – Map it for multiple future runs

The first problem you have to contend with is maintaining a record of how we want to mask things. If we always want to mask the credit card number 3422-567157-24797 to 3488-904546-46471, then we need a place where this is stored. The question to ask ourselves here, though, is WHAT needs to be recorded. There is a huge difference between:

CreditCardBefore      CreditCardAfter
3422-567157-24797     3488-904546-46471

And

CustomerID      CreditCardAfter
1000000001      3488-904546-46471

The latter is obviously far preferable, because it does not contain any sensitive PII – it holds only the masked value and a non-sensitive CustomerID, which only really makes sense within the company as a system identifier.

So we should get a mapping location set up on WIN2019. I don’t want to make my tables too wide, so I’m going to keep this fairly small and atomic – we’ll create a Database on the server with a mapping table for the credit cards:

CreditCardMap table in SSMS - CustomerID as INT (PK) and CreditCard as nvarchar(60)
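In T-SQL, the mapping database and table amount to something like this (a sketch matching the structure shown above):

```sql
-- One-off setup on WIN2019: a persistent home for the mapping
CREATE DATABASE MaskingMapper;
GO
USE MaskingMapper;
GO
-- CustomerID is the non-sensitive key; CreditCard holds only the MASKED value
CREATE TABLE dbo.CreditCardMap
(
    CustomerID INT          NOT NULL PRIMARY KEY,
    CreditCard NVARCHAR(60) NOT NULL
);
```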

This is going to be the basis for our repeated masking. Why have this as a separate DB/table, though?

1) The mapping should persist – if it exists in the same DB then we will just overwrite it every time, rendering the mapping useless.
2) Devs/Testers don’t need the mapping – just the end result.

Step 2 – Set up the masking to cover the tables, regardless of whether there is a mapped value

One of the most important phrases in the GDPR is “Data Protection by Design and Default”, and it’s one of my favorites. In this context I am going to interpret it in a very specific way, namely: “we must mask everything before trying to map it back to an existing value, just in case the link to the MaskingMapper DB were to fail for any reason.”

I first restore a copy of the Database I’m going to mask (the DMDatabase) and then set up a Data Masker substitution rule to process the DM_Customer table, de-identifying the credit card numbers:

Data Masker substitution rule to mask Credit Card Numbers with invalid AMEX CC numbers, also (fake) customer credit card Nos. displayed in SSMS

Step 3 – Copy the distinct values across into the mapping table

This step is as simple as writing a single tSQL statement to copy the values across – in Data Masker I will wrap this in a Command Rule and make it dependent on the previous substitution rule:

-- Copy the distinct, newly-masked credit card values into the mapping table,
-- skipping blank/NULL card numbers and any customers mapped in a previous run.
-- Note the subquery selects CustomerID (the mapping table's own column);
-- selecting customer_id there would silently bind to the outer table and
-- the INSERT would never add any rows.
INSERT INTO MaskingMapper.dbo.CreditCardMap
(
    CustomerID
  , CreditCard
)
SELECT DISTINCT
       customer_id
     , customer_credit_card_number
FROM dbo.DM_CUSTOMER
WHERE customer_credit_card_number <> ''
  AND customer_credit_card_number IS NOT NULL
  AND customer_id NOT IN (SELECT CustomerID FROM MaskingMapper.dbo.CreditCardMap);

Step 4 – Sync everything back together

Finally, we need to bring back any information from the mapping table that was written to it in previous runs. In tSQL we could write an UPDATE with an appropriate WHERE clause, but I’m going to use an additional controller and a Table-to-Table Sync rule in Data Masker to handle this:

Rules in Data Masker - Substitution to mask data, Command rule to update the mapping table and a Table to Table rule to sync back into the table
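For reference, the Table-to-Table Sync rule is doing the equivalent of a hand-written UPDATE along these lines (a sketch, using the same table and column names as the Command Rule above):

```sql
-- Restore any credit card values that were mapped in previous runs,
-- overwriting the freshly-masked values so they persist between runs
UPDATE c
SET    c.customer_credit_card_number = m.CreditCard
FROM   dbo.DM_CUSTOMER AS c
INNER JOIN MaskingMapper.dbo.CreditCardMap AS m
        ON m.CustomerID = c.customer_id;
```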

Result

If we now run this we will have achieved deterministic masking, because we have the following before and afters – first for the DM_Customer table:

DM_Customer Credit cards before
DM_Customer credit cards after masking

and for the CreditCardMap table:

Mapping table prior to masking
mapping table after masking

The mapping table now has 77 rows. If we repeat the masking step by step without changing anything, we can see that the credit card numbers change in the first instance, but then synchronize back to the values that should persist. The images below represent just running the first two steps in isolation (i.e. masking everything regardless – left) and then the synchronization job restoring the predetermined values (right); the mapping table still has 77 rows.

Now, if in the next run one of the NULL/blank fields has a real credit card number, or we add any additional customer IDs (i.e. with a more recent backup containing fresh data), they can be masked, accounted for and persisted between each run.

Conclusion

Deterministic masking is hard, but it is possible. You can use a number of methods to achieve it, such as the one above, but the first question you need to ask yourself (after “do I feel lucky?”) is:

“Do I NEED deterministically masked data, or is it a nice to have?”

9 times out of 10, I’m pretty sure the answer will be that it is not essential, and therefore you should focus on making sure the masking of the data is random, static and fast. Adding compute to this process will only slow it down and, at the end of the day, we just need to make sure our customers’ data is protected.