Securing Vertica With Data At Rest Encryption

Time to value and concerns about performance are common reasons companies do not protect their most sensitive data with encryption. Now that security breaches are becoming more common, organizations should treat encryption as an important element of their security architecture, because it reduces the impact of an attacker stealing their most sensitive data. It is important for organizations to understand what kinds of encryption are available today and which risks each type mitigates. Listed below are the different types of encryption available to protect your data.

  • Application or tokenization level
  • File level
  • Disk level

The focus of this blog is file-level (transparent) encryption, since it provides the most protection with the least complexity and risk. Disk encryption, for instance, only mitigates the risk of someone stealing your hard drive or laptop. Once the system is up and running, your files are still fully exposed with disk-level encryption.

In addition to the database files, there are many other files that should be candidates for encryption, such as:

  • Backups
  • Backup config files that may contain passwords in clear text
  • Source files to be ingested into the database that may contain sensitive data
  • Scripts, property files or system log files that might contain user IDs and passwords

Although some database vendors provide their own encryption, there are good reasons to use an external product like Vormetric to encrypt your data, such as:

  1. Policy management and auditing capabilities.
  2. Single solution for all data at rest.
  3. Shielding the DBA from sensitive data with column level encryption or application encryption.
  4. Reduce risk by implementing separation of duties by having a separate repository for encryption keys and encryption key management.

For more information on Vormetric see the following link.

Assumptions

  • The focus of this blog is Vormetric Transparent Encryption (VTE) of data at rest. Future blogs will demonstrate how Vormetric can also encrypt specific Vertica database columns with industry standard PKCS11 APIs or via REST calls using tokenization.
  • Vertica 8.0.1 was used for testing.
  • Vormetric 6.0.2 was used for testing.
  • No optimizations were made to the database. The percentage differences can be used as guidelines and starting points for your own testing.
  • Vertica was running on a VM with 4GB RAM and 2 CPUs with 250GB of total disk. All disk I/O was on the same disk in the VM, which was the bottleneck.
  • No other activity on the database or machine at time of tests.
  • Some knowledge of Vertica or another database will be helpful.

Overview

As indicated above, two of the reasons organizations do not encrypt their data are:

Time to value – Encryption should not be disruptive to implement.  This blog demonstrates how quickly encryption can be implemented without having to write any code or make any modifications to the database or applications accessing data from the database.

Performance impact – This blog shows examples of different workloads, which allows organizations to plan a proper implementation when considering encryption of a database from a performance and application perspective.

A big benefit of using an external encryption product is that a single solution can encrypt any database, so for testing purposes the choice of database does not really matter because the process is the same. Besides being extremely fast, Vertica was chosen for several reasons. Vertica has many system tables that capture the state of the machine while queries or loads are running. These metrics were captured so that each query and load could be compared. Although not tested here, a Vertica cluster can easily be set up to better understand how encryption might be implemented in more complex multi-node environments. Vertica can also act as its own key-value store through a feature called “Routable queries”, so that kind of workload can be tested as well.

Workloads

When testing the impact encryption has on databases, it is important to test with various workloads since each will have different performance profiles. The tests performed for this blog were meant to cover some of the most common workloads such as:

VMARTQUERYTXN – These are typical transactional queries that return only one or two rows and use no database functions. Typical response time is in milliseconds. There were 9 queries in total, each joining a fact table to dimension tables.

VMARTQUERY – These are typical analytical queries that use analytical functions such as sum and max, subselects, and so on. Typical response time is in seconds. There were 9 queries in total, each joining a fact table to dimension tables. These are the out-of-the-box (OOB) queries provided with the Vertica samples.

VMARTLOAD – These are typical batch load scripts that load the OOB VMart tables. Load time is directly related to the number of rows ingested. 15 tables were loaded using the standard COPY statement. No INSERT commands were issued, only COPY commands.

All of the above tables and SQL statements are based on the standard out-of-the-box Vertica example called VMart.

Note: The intent of this blog is NOT to set performance records but to give customers an idea of what percentage difference can be expected for different workloads, and to show how a database can be encrypted without impacting the users or applications that use it.

How “Transparent” (Data at Rest) Encryption Works

The most successful implementations of encryption incorporate three main principles: confidentiality, integrity and availability. For instance, many open source encryption libraries provide confidentiality and integrity, but making the encrypted data available in clear text, on the fly, at the right time and to the right person or process is very difficult to do.

Vormetric Transparent Encryption does NOT require any code to implement, which helps reduce complexity and risk and is a major factor in achieving time to value. An example of how transparent encryption works might be the best way to see the value. It is estimated that 58% of internal breaches are carried out by persons with privileged user access. Most hackers attempt to obtain root access to the system, which is why it is important to block and audit root attempts to access sensitive data. An organization typically does not need the root administrator to actually see clear text data to do their job. Here is an example showing how the root user on the Vertica VM image is blocked from viewing the clear text of the source data to be loaded into Vertica.

[root@vertica801 loadfiles]# head Call_Center_Dimension.tbl

¦0b¦pP5¦N¦¦     ¦<¦rx?V¦¦jz¦j¦¦¦z¦De¦?$¦¦V¦%}¦l?¦C¦¦¦;¦b¦@5¦R?

JA¦¦¦¦;¦¦¦m¦¦¦¦¦¦¦?¦69>¦%?¦cw¦¦L¦¦¦CE:d#V¦¦¦2s¦¦}0¦¦¦¦H¦>¦)`1¦   /¦¦”c:?¦¦KpL+¦¦j@¦:¦T^¦?¦Q¦¦O¦e¦]\F^C¦Z?¦F¦”¦¦K¦?¦”>q¦¦6¦W¦¦¦781w¦¦¦~@L¦¦

¦¦¦c    ¦X¦IF¦¦?¦¦*¦!Xb6>¦0¦q]?d^¦¦u6″6¦b)¦¦d¦¦¦P]FO¦z*¦-¦>¦¦¦,,¦¦4¦]~qd@3y?¦¦SQ

¦:¦,¦¦jJøE¦?C¦54%¦¦(¦L¦¦¦¦J¦¦¦¦

What is needed for the file above is to allow the Vertica dbadmin user to run a job that reads the file in clear text so it can be loaded into Vertica. It is also important to log any unauthorized attempt to access sensitive data, so that a potential security breach can be identified. Here is an example of the audit log entry for the above attempt to view the file.

CGP2601I: [SecFS, 0] PID[110055] [AUDIT] Policy[test-linux-operational-vertica] User[root,uid=0,gid=0\root\] Process[/usr/bin/head] Action[read_file] Res[/data/loadfiles/Call_Center_Dimension.tbl] Key[testkey-AES256-2017] Effect[PERMIT Code (1A,2M)]
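Since each field in the audit record above follows a Name[value] pattern, a record like this is easy to break into fields for filtering or reporting. The following is a minimal Python sketch, assuming that the layout of the single sample line above holds for other records (this blog does not confirm that). The helper name parse_audit_line is hypothetical and is not part of any Vormetric tooling.

```python
import re

# Matches Name[value] pairs such as Action[read_file]. Bracketed tokens
# that have no name in front of them (e.g. "[AUDIT]") are skipped.
FIELD_RE = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_audit_line(line: str) -> dict:
    """Return the Name[value] fields of one audit record as a dict.

    Hypothetical helper: the format is inferred from one sample line only.
    """
    return dict(FIELD_RE.findall(line))


# The sample record from this post, copied verbatim (raw strings keep
# the backslashes in the User field intact).
SAMPLE = (
    r"CGP2601I: [SecFS, 0] PID[110055] [AUDIT] "
    r"Policy[test-linux-operational-vertica] "
    r"User[root,uid=0,gid=0\root\] Process[/usr/bin/head] "
    r"Action[read_file] Res[/data/loadfiles/Call_Center_Dimension.tbl] "
    r"Key[testkey-AES256-2017] Effect[PERMIT Code (1A,2M)]"
)

fields = parse_audit_line(SAMPLE)

# Flag reads of protected files by root, as in the scenario above.
is_root_read = (
    fields["User"].startswith("root,") and fields["Action"] == "read_file"
)
print(fields["Process"], fields["Res"], is_root_read)
```

A dictionary keeps the sketch simple. A real pipeline would probably also want the message ID and the unnamed bracketed tokens, which this pattern deliberately ignores.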

The following screenshot is an example of how to set up a policy in Vormetric allowing user2 to read the data but NOT see clear text.  When user2 attempts to access the data he will see cipher text as shown above.

Most security products allow these kinds of messages to be sent to a security information and event management (SIEM) reporting tool like Splunk, so security administrators can take appropriate action. Another great benefit of a security product like Vormetric is that it can prevent the root user from switching users and assuming their permissions. Vormetric can detect this kind of activity, log it as “fake” user activity, and deny the ability to see clear text. This is why this kind of encryption is called transparent: implementing it is “transparent” to the people and processes that need access to the data.
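The behavior described in this section can be pictured as an ordered list of rules, where each rule decides both whether an access is permitted and whether the encryption key is applied so the caller sees clear text. The Python sketch below is purely conceptual: the Rule class, the evaluate function and the rule list are illustrative assumptions, not Vormetric's policy format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    user: Optional[str]  # None matches any user
    action: str          # e.g. "read" or "write"
    permit: bool         # is the access allowed at all?
    apply_key: bool      # True: caller sees clear text; False: cipher text


def evaluate(rules: List[Rule], user: str, action: str) -> Rule:
    """Return the first matching rule; deny when nothing matches."""
    for rule in rules:
        if rule.user in (None, user) and rule.action == action:
            return rule
    return Rule(user=None, action=action, permit=False, apply_key=False)


# Illustrative policy: dbadmin reads clear text; every other user
# (including root) may read the file but only gets cipher text.
POLICY = [
    Rule(user="dbadmin", action="read", permit=True, apply_key=True),
    Rule(user=None, action="read", permit=True, apply_key=False),
]

for who in ("dbadmin", "root"):
    decision = evaluate(POLICY, who, "read")
    view = "clear text" if decision.apply_key else "cipher text"
    print(who, "permitted" if decision.permit else "denied", view)
```

Separating "permit" from "apply key" mirrors the example in this post, where the root read is permitted in the audit log yet only cipher text is returned.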

Summary of Results

Listed below is a summary of the tests performed. These rates should only be used as a guideline for what to expect with various workloads. Obviously, many factors impact performance. This blog attempts to provide as much detail as possible so that you understand the overall impact encryption may have on your environment. Your workloads may or may not have the same characteristics as those tested, but these results should provide a good starting point when running your own tests.

Workload Type: VMARTLOAD (Batch Load)

Use Case   Configuration                          Avg. Seconds   % Difference
-          Baseline                               451.4          -
2          Encrypted database files (DB)          483.2          7.04%
1          Encrypted source files                 485.8          7.62%
3          Encrypted both DB and source files     505.8          12.05%

Workload Type: VMARTQUERY (Analytical Query)

Use Case   Configuration                          Avg. Time      % Difference
-          Baseline                               16.8           -
2          Encrypted database files (DB)          18.6           10.71%

Workload Type: VMARTQUERYTXN (Transaction)

Use Case   Configuration                          Avg. Time      % Difference
-          Baseline                               4.2            -
2          Encrypted database files (DB)          4.4            4.76%

Workload Type: Vertica Backup

Use Case   Configuration                          Avg. Time      % Difference
-          Baseline                               109            -
4          Encrypted backup files                 134            22.94%

Multiple runs were executed, and the numbers above represent the average times after removing highs and lows. As the results show, the impact of encryption for half of the use cases is in the single digits. The appendix of this document shows some metrics on how I/O bound my VM image was, which was the obvious bottleneck for my tests. Most likely, your environment will be better tuned and, as a result, perform better.
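The percentage differences reported above are simply the increase over the baseline, divided by the baseline. The Python sketch below recomputes them from the averages in the summary. It also includes a trimmed_mean helper that assumes "removing highs and lows" means dropping the single highest and single lowest run, which is an assumption since this post does not give the exact method or the per-run timings. Both function names are hypothetical.

```python
def percent_difference(baseline: float, measured: float) -> float:
    """Overhead of `measured` relative to `baseline`, in percent."""
    return round((measured - baseline) / baseline * 100, 2)


def trimmed_mean(times: list) -> float:
    """Average after dropping one highest and one lowest run.

    Assumption: the post does not state exactly how outliers were removed.
    """
    if len(times) < 3:
        raise ValueError("need at least three runs to trim high and low")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


# (workload, configuration, baseline average, encrypted average),
# taken from the summary of results in this post.
RESULTS = [
    ("VMARTLOAD", "Encrypted database files", 451.4, 483.2),
    ("VMARTLOAD", "Encrypted source files", 451.4, 485.8),
    ("VMARTLOAD", "Encrypted both DB and source files", 451.4, 505.8),
    ("VMARTQUERY", "Encrypted database files", 16.8, 18.6),
    ("VMARTQUERYTXN", "Encrypted database files", 4.2, 4.4),
    ("Vertica Backup", "Encrypted backup files", 109, 134),
]

for workload, config, baseline, encrypted in RESULTS:
    overhead = percent_difference(baseline, encrypted)
    print(f"{workload:15} {config:36} {overhead:6.2f}%")
```

Running the same calculation on your own baseline and encrypted timings gives numbers that can be compared directly with the summary above.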

Some performance degradation should be acceptable to most organizations considering the benefits encryption provides. I hope this blog demonstrated that transparent encryption is not disruptive to implement, and that it gave you a better understanding of how it performs under different workloads, so you can protect your sensitive data and achieve value quickly without complexity. I also hope organizations will adjust their cost-benefit models to look at encryption more seriously and to make sure the right type of encryption is implemented. Doing this will help ensure sensitive data is protected from thieves and protect us from the possibility of identity theft when this data gets into the wrong hands.

For more details on how the tests were done and on the above results, please see the following link.

A drill down dashboard with all associated metrics on the above results can also be viewed at this link.


About the author

Mark Warner

Mark Warner previously worked for Vertica for nearly 4 years with the Partner Engineering team.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.