Rules of Scrubbing Data

After setting up a scrub server, we need to review each database for data that needs to be scrubbed.  The data you need to scrub depends upon the government rules you need to follow.  Discovering the data to be scrubbed will require us to query the database and also talk to our developers.  Here is a list of possible field names that we need to search for;

Email

Address

City

State

Zip

Postal Code

Name

Credit Card

Social Security Number (SSN).

 

Here is a query to search for the fields,

select t.name as TableName

,c.name as ColumnName

,x.name as ColumnType

,c.max_length

from sys.columns c

inner join sys.tables t

on c.object_id = t.object_id

inner join sys.types x

on c.system_type_id = x.system_type_id

where c.name like ‘%email%’

and (x.name = ‘varchar’

or x.name = ‘nvarchar’)

 

Remember, this will only find the fields that match a certain pattern.  Your developers may help you find other fields.

The database might also have fields that have been encrypted with cell-level encryption or a third party product.  If data is encrypted, you will need to change the encryption certification to a non-Production certificate.  I recommend sharing the QA/Dev certificate for a non-Production environment.

After you have discovered all the fields that need to be scrubbed, next we determine how we want to scrub the data.  One thing to note, you may need to do something special when scrubbing the data.  For example, when I worked for a credit card company, we had to maintain the first digit of the credit card number.  Yes, there was some code which had the value information hard coded.

Now, let’s discuss the different methods for scrubbing data:

  1. Change the data. For example, change the social security number by maybe replacing the data with primary key padded with zeros.  With this said, this is assuming your primary key is an integer value.
  2. Scramble the data. With the primary key, pull the data to scrub out of the table and randomize the data.  After the data is randomized we will put the data back into the original table using the primary key.
  3. Remove the data. You can null, zero out or space out the data.  Remember, you are changing the selectivity of the data.
  4. Change all email fields. Remember you don’t want to spam your customers during development, QA testing or UAT testing.

When I am scrubbing data, I want to keep as much of the data the same as I can.  If I can scramble the data, it is my first choice.  For other options of scrubbing or generating data, please check out this article from Red-Gate, https://www.simple-talk.com/sql/database-administration/obfuscating-your-sql-server-data/.

In our next article, we will wrap automating the process of scrubbing data.