Miscellaneous

Securing Vertica with Column Level Encryption

0

This blog will explore options for organizations who are interested in providing protection for sensitive data at the field or database column level. Although the focus on this blog will be on demonstrating different techniques on how to protect sensitive data from credential users such as a regular database user or DBA other topics on protecting fields in flat files will be covered as well.

In a previous blog on transparent encryption, I talked about how to encrypt a database with transparent encryption.  It is important to understand the differences between transparent encryption and column level encryption which is explained in more detail in the next section.

Decision Tree

The following diagram is a decision tree on building a strategy for protecting data in your database.

The left side of this decision tree represents the scenario where you are protecting someone from stealing the actual database files or running a strings command on the data to see clear text.  This access is done at the operating system level.  The right side of this decision tree is for scenarios where you are interested in protecting sensitive data from a credentialed user or database users.

Depending on the kind of threat you are interested in blocking will determine the type of encryption to implement. Keep in mind that it is typically faster to implement transparent encryption since no code changes have to be made.  Because of this, organizations who are interested in blocking credentialed users still might want to first implement transparent encryption to safeguard Operating System(OS) user threats while the development efforts are being implemented for column level encryption.

The diagram above is a good description to help better understand the kind of encryption to implement but a more detailed matrix titled Security Exposure Matrix is provided for reference in the Appendix in the download file at the end of this blog. The next section reviews many of the factors to be considered when implementing column level encryption.

Overview of Column Level Encryption

Before we get into the particulars of how to implement column level encryption a simple example can help clarify how it can be used from an end users perspective.  The idea is to use the custom function like any other database function and be utilized in existing Business Intelligence environments if that is the requirement of the organization.

In the above example the vormetricdetokenizedata is a custom user defined function that detokenizes the value in the database that has already been tokenized. As you can see when it ran the results of the vormetricdetokenizedata matches the pos_tranaction_number. This is for demonstration purposes only and in a production environment the table would not normally keep the un-tokenized column.   As you can see once a person logs on to the database and the sensitive data is not encrypted they will be able to run queries against the sensitive data and very easily dump all that data to a file.  If the data is tokenized or encrypted then it will not be possible to obtain access to the sensitive data, assuming the appropriate method of decrypt has been implemented.

There are many things to consider when implementing column or field level encryption such as:

  1. Options for column level encryption.
  2. Field or database column format considerations.
  3. Impact of using Custom User Defined Functions in a Database.
  4. How to handle unencrypted data already in database or flat file?
  5. How to process new data?
  6. What method to encrypt vs decrypt?

Options for Column Level Encryption

When making the decision to use field level or column level encryption there are many factors to consider like, complexity to implement, what degree of security does the option provide, does the method used provide separation of duties and the cost from an initial product purchase and maintain over time.  In order to best understand the various options available to protect your data a matrix below has been built that provides various factors to consider.

The first thing to consider is to decide if it is absolutely critical that the DBA be blocked 100 % of the time from having access to sensitive data or are there other ways to help reduce the risk of data theft such as monitoring database activities.

For example if it has been determined that it is critical to block the dba or any other database credential user from decrypting sensitive data then the only option that can be used is the last option in the matrix below “Vormetric SQL Function to Encrypt & External App Code to Decrypt” , either all code control and no sql functions or just have an encrypt database function on no decrypt database function.

In most cases the first example showing Native DB option will also be ruled out since it does not provide separation of duties and the limitations of using varbinary datatype and grant access.

Once sensitive data has been categorized, it may make sense to provide a hybrid approach to the various categories of data.  You may choose to use “Vormetric SQL Function to Encrypt & External App Code to Decrypt” for super sensitive data and Vormetric SQL Function credential in code” for other data elements that are not as sensitive.

The matrix below shows a couple of ways to implement column level encryption.

  1. Vormetric Application Encryption (VAE) – using standard pkcs11 libraries to implement encryption.
  2. Vormetric Tokenization Server (VTS) – a tokens server that uses rest calls to either tokenize or detokenize sensitive data.

The differences between these two types of technologies will be described later in this blog.

MethodComplexitySecurity LevelSeparation of DutiesCostMethodNotes
Native DB FunctionsLowLowNoLowNativeDBA has full control over functions and varbinary support only. Audit grants & decrypt function.  Different solution for each DB.
Vormetric SQL Function credential in codeMediumMediumDev., DBA

VAE - Key

VTS - Key & Voradim
Medium. Time to implement.

Upgrades
VAE or VTSOnly provide class file and protect it with obfuscation like proguard.  Audit grants & decrypt function.
Vormetric SQL Function to Encrypt & External App Code to DecryptHighBlock DBA, Most secure.Dev, DBA
VAE – Key
VTS – Key & Voradmin
High upgrades(VAE or VTS)
And Code for decrypt.
Only way to get unencrypted data is thru application.
Not SQL friendly for decrypt.

select vormetricencryptdata(‘-44-545-4′,’FPE’) from dual

Assumptions for various methods listed in the matrix above.

  • If varbinary output is required then VAE will be only option.
  • Key management is provided by the Vormetric DSM for all scenarios but the native db option. Clear text of the key is never exposed and only the key label is referenced.
  • All custom user defined functions scenarios listed below assume the code is written by a developer and the DBA has NO access to the source code. The DBA can only issue grants to the function.  Only the class file will be provided and obfuscated so it cannot be reassembled.
  • Secure Level – Low means less secure environment

Field or Database Column Format Considerations

The first factor to decide on is the format used to encrypt your sensitive data. Most traditional types of encryption generate an encrypted ciphertext that includes binary characters.  Using this type of ciphertext often requires more work from a development perspective since the data type of the original data is most likely will not be varbinary and as a result your files will need more space and the application logic will have to change in many places to accommodate this increase in field type and size.

There are many drawbacks to using varbianry datatype such as:

  • Cannot index a varbinary column.
  • Cannot create primary and foreign key relationships based on varbinary.
  • Have to modify any code that displays this data in a UI.
  • Increase storage required to support varbinary.
  • Degradation in performance from a query optimizer perspective.

Most native database column level encryption methods only provide varbinary as a datatype. Another format supported by external encryption companies like Vormetric is one called format preserved encryption (FPE).  This allows you to still encrypt or tokenize the data and keep its original format. Below is a simple example using a curl command.

Here is a screenshot of the UI for the Vormetric Token Server with showing the template used for the above example. As you can see it is using the FPE format along with accepting digits for the character set.

Impact of using Custom User Defined Functions in a Database

This is for scenarios where you have decided to enable your Business Intelligence(BI) and SQL users’ easy access to columns that have been encrypted in the database. It is important to keep in mind that Business Intelligence Tools will require code changes to use the new UDF. There are a couple of ways to implement this either using database views to access UDF functions or leverage the BI tools meta data layers to implement. For example IBM Cognos has Framework Manager, Business Objects has the  Universe Designer and Tableau allows for custom sql.

Any existing applications that need access via SQL will require code changes to use the UDF as well.

Implementing a Custom Database User Defined Function Process

Once it has been decided that allowing sql access to encrypted data there are three major tasks that you will encounter when writing a custom database UDF.

1.)Decide what encryption/tokenization technology meets requirements. – It is important to look at the architecture of the different kinds of encryption or tokenization capabilities and understand if one better fits your particular requirements.  Time permitting you may want to test both options and see which best fits your needs.  Listed below is a matrix as of 2/2018 that can assist with the various criteria.  This Matrix should be revisited on a regular basis for accuracy since new capabilities are always being added.  Another factor to consider are the various regulations requirements such as PCI scope reduction.

 Agent InstallAD/LDAPAPIKey RotationUIInput Data OptionsData MaskFPEVar BinaryPlatforms
VAEYes*NoPKCS11YesNoLessNoYesYes*
VTSNoYesRESTNoYesMoreYesYesNoAll

Notes:* It is possible to implement a multi tier architecture to enable VAE to work with any particular platform.  Another blog will be published on this topic in the future.

2.)Learn the appropriate encryption vendor API’s. – In this case Vormetric provides pkcs11 for VAE and a rest based API for the VTS Tokenization Server.

3.)Learn the database UDF architecture and decide on a programing language. Some programing languages are faster than others are so it should be a serious consideration when looking at the various options. Also most databases already provide some samples you can review to get started.

For code examples on how to implement custom UDF’s please see Appendix section Sample Usage UDF’s provided in the download file at the end of this blog.

Note: The intent of this blog is NOT to write a production ready UDF but to demonstrate how it can be implemented.  The code provided was built from a demonstration perspective. For example, the VAE code listed to encrypt accepts as a parameter the algorithm that most likely would not be done in a production environment.

How to Handle Unencrypted Data Already in Database or Flat File?

A common use case for most customers is deciding how to handle unencrypted data already in flat files or a database. There are a few different options that can be evaluated such as:

  1. Vendor provided batch utility.
  2. Custom code.
  3. Database UDF.

Option 1. Vendor batch utility.  Vormetric provides a utility called batch data transform (BDT), that takes an input file of sensitive data and produces an output file of encrypted data. This option can be used for regular flat files or an export can be done from a database and then reimported.  For more details see link.

https://www.thalesesecurity.com/products/data-tokenization-masking-and-transformation/vormetric-batch-data-transformation

Option 2.  Custom Code. Another option is to write custom code to loop through each record in a file and then call the encrypt function for the sensitive data and then write the output back out.  This can also be done for a database extract file as well or just write a jdbc application that updates the database. See Appendix (External Code Encrypt Example, provided in the download file at the end of this blog)

Option 3. Database UDF – If the data is in a database then the last option is to use a database function to update a particular column.  Here is an example:

Note: Since the method is FPE above no database changes are required to implement.  If other method was chosen such as CBC which requires varbinary datatype then the above would have to be changed to create a new column in the table to store the value.   See Appendix provided in the download file at the end of this blog for example UDF.

How to Process New Data?

The best method to encrypt data is at the point of entry into your applications.  This might not always be possible if for instance customers are sending flat files to a sftp site.  If this directory was protected with transparent encryption, it will ensure that file is encrypted but once it has been decided that certain columns need to be encrypted before it is loaded into the database then other options need to be evaluated with either code or using a database function or database triggers.   Please see Appendix provided in the download for examples.

Summary

This blog demonstrated how it is possible to prevent sensitive data from getting into the hands of persons who could use this information in an unintended manner.  It also showed how it is possible to even block a DBA from having access to sensitive data by only providing an encrypt function and only allowing access to decrypt using a separate application.

We also explored how it is possible to write a User Defined Function leveraging two different types of technologies

  1. Tokenization with rest API’s
  2. Application encryption using industry standard pkcs11 API’s.

My hope is that organization will aggressively start to implement these technologies to protect us from the possibility of identity theft when this data gets into the wrong hands.  For sample code and more details about this blog download this file.

About the author / 

Mark Warner

Mark Warner previously worked for Vertica for nearly 4 years with the Partner Engineering team.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: