“Doing the best at this moment puts you in the best place for the next moment.”
– Oprah Winfrey
One of the things that comes up the most when I’m talking to people about database classification is multi language support. Often people are resigned to:
- Waiting on additional language support from their vendor of choice (defer)
- Writing their own software to do it in their native tongue (shift effort)
- Going the long way around i.e. manual classification (brute force)
- Do nothing (abandon)
There are currently over 7000 languages in the world and when it comes down to classification, you can’t expect that whatever your choice of technology for this classification workflow will have packs or native support for your languages and/or the languages your databases have been configured in.
Now, deferring and abandoning as above really aren’t options any more and if you’ve read my post here then you’ll know that database classification is a key concern to address before trying to base any business decisions on data. But how then do we tackle t he above issues of shifting effort and brute force?
Sometimes it’s about finding ways to still be effective and still take advantage of the benefits such a classification technology affords you.
With this in mind – I wanted to share with you 3 tips and tricks in SQL Data Catalog to make it work better for your language(s):
1 – Customize your taxonomy to be relevant / Passen Sie Ihre Taxonomie an, um relevant zu sein / Ajustați-vă taxonomia pentru a fi relevante
The great thing about SQL Data Catalog is that it is incredibly easy to alter the taxonomy to best reflect your view of your data. Not all standard fields will be relevant to you and some fields that are ABSOLUTELY necessary will be missing.
So add them in!
Let’s say, in my country of Christopia, it is mandatory that people identify a certain government specified “Classification Category” from 1-5, where 1 is the most sensitive and 5 is basically public knowledge. I can add that as a category and tag:
That goes equally for fields that we will not be making use of. If it is irrelevant then remove it. Simple.
I will say this for the above step though – don’t think of this as being the easiest and most obvious part, in many ways this is actually the hardest step because you will need buy in. Guess who also cares how you describe data stored in databases? Data Governance, Legal, InfoSec, Database Admins… the list goes on. Just make sure you do the 2 most fundamental parts that are required for success here – communicate & collaborate. If you have buy in from other concerned parties, you will establish a ‘common language’ with which data can be described and consumed, and based on which complex data-driven business decisions can be made.
2 – Customize your search rules / Passen Sie Ihre Suchregeln an / Personalizați-vă regulile de căutare
Going through every single column is a colossal waste of time, especially when we have more than about 6 databases. I’ve spoken with people who have spent day, weeks and even months working on classifications in SSMS or Excel, only to find out that 90% of their columns matched common regular expressions, and moreover that 50-60% of their columns were out of scope anyway!
Fortunately within SQL Data Catalog, you can configure the rules that the tool runs against columns and tables to identify any clearly sensitive columns which it will then suggest to you each time it finds something that matches.
In my experience, between the analytical filters and the suggestions, you can make excellent progress very very quickly, and do you know the best part?
Data Catalog is evergreen.
It constantly keeps track of your ever evolving schema and checks the suggestions against them and that means if somebody makes a change to the schema (like in my post from last week) you’ll get the suggestions coming through and you can easily catch anything that might put your sensitive data in harms way.
3 – Use PowerShell / Verwenden Sie PowerShell / Utilizați PowerShell
Ok I get it – this is my catch all for everything whenever I talk about using Data Catalog – but did you know, using PowerShell with Data Catalog is really easy.
Like even I can do it easy.
But the good thing is, when you use PowerShell you’re not constrained by particular functions or waiting for things to happen, rather you can take control of the process and fill it with your own logic! There’s even a full on guide that shows you:
- How to generate an authorization token
- A full cmdlet reference
- Complete worked examples
and the best part is it’s just part of the documentation! Don’t you love it when people keep documentation up to date and useful? Yeah me too!
Bonus Tip / Bonus-Tipp / Sfaturi Bonus
If you work in an environment that is multi-lingual and has a mix of languages associated with different databases which haven’t been standardized, that should just highlight further that this is a team game.
Pairing and mobbing, just like in coding, can be vital and powerful tools at your disposal – so it’s important to have the right people in place to handle this.
Classification is not infallible, much as you may wish to think it is. Data is data, and if there’s one thing I have learned about data it’s that it always manages to surprise you, so a second pair of eyes (or more) can be invaluable in helping you weed out some of the problems you may be facing.