Best Practices

Performing Update With Multiple Projections

2

Being a column-oriented massive parallel processing solution, Vertica can satisfy the most demanding analytical queries in amazing time. Having this type of optimization means there will be penalties for other operations, such as update. This post will demonstrate what happens to projections during an update and why updates should be properly managed.

In this demonstration, a simple update will take place against a table with a super-projection and a user-defined projection.

In the update below, column b will be used. Note that the user-defined projection excludes column b.

In theory, this update should not modify the user-defined projection. However, the delete vectors show that in fact, both projections were involved in the update:

In this test, having a segmented or unsegmented user-defined projection had the same result. Performing an update requires the affected rows (in all projections on the table) to be deleted and inserted again. Given a large update, this creates a lot of I/O overhead if multiple projections are involved. It is crucial to properly manage updates (or deletes) by limiting the number of affected rows, proper projection design, and purge policies.

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.

2 Comments

  1. Sharon Cutter January 5, 2015 at 12:48 PM -  Reply

    I suggest clarifying your initial statement in the first paragraph that updates “should be avoided”. You state later on that large updates create a lot of overhead. That’s a more accurate statement. I would suggest that deletes and updates need not be avoided but “should be managed”, limiting the number of rows that are DELETEd or UPDATEd, ensuring proper projection designs for optimized DELETE and UPDATE, and also implementing an appropriate purge policy.

    The examples in this post have no RLE on any of the columns. Use of RLE is key to optimized DELETE and UPDATE performance, just as it is for regular queries. I realize that the Vertica documentation also uses projections without RLE in their examples, but let’s not perpetuate that 🙂

    –Sharon

    • Norbert Krupa January 6, 2015 at 1:40 PM -  Reply

      I appreciate your feedback, Sharon. I’ve made some changes based on your input. Great point about the use of RLE!

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: