Optimization, Performance Tuning

Overview of Pre-join Projections

3

This post will give an overview of Vertica’s pre-join projections. Pre-join projections are not frequently used due to the manual set up work and limitations around its usage. However, they can deliver increased performance at query execution by moving the join step into the load.

Vertica’s projections are materialized views that provide physical storage for data. A pre-join projection is more query specific and takes the optimization step out of the actual query execution. It performs additional work at load to give increased performance when queries using joins are executed. The result of the join between two or more tables is sorted in the pre-join projection to provide additional performance to the query.

Pre-join projections have the following limitations:

  • A maximum of one pre-join projection can be used per query; any other pre-join projections are treated as regular projections
  • The query must contain a superset of the join predicates used in the pre-join projection
  • The anchor and fact tables must have primary and foreign keys explicitly defined
  • Due to the primary and foreign key constraints, the query can only be an INNER JOIN
  • Data must be loaded into the anchor table before it can be loaded into the fact table
  • The anchor data cannot be partitioned in order to prevent the dropping of the fact data; which would lead to a huge delete operation

The definition of primary and foreign keys must explicitly contain a NOT NULL in the table’s structure. The uniqueness is enforced with pre-join projections during load. Additionally, changes to the table structure could cause data expansion and impact query performance. A poorly written pre-join could produce incorrect results.

A pre-join is not recommended for tables that experience high deletes and updates. Since the pre-join projection uses an INNER JOIN, the delete must be cascaded between the anchor and fact tables to delete associated data.

The most important factor is being able to complete the load before another attempts to start. When loading to WOS (not directly to disk), if the moveout does not complete in the allotted time, it will be queued and could cause the moveout to time out. However, the moveout interval can be increased and the WOS container size can be shrunk.

Aside from these considerations, the advantage is that there is no need for the optimizer to join the tables at query execution. This saves on IO as there is only a single set of files to read from, and no network or local processing of the join taking place. Also, if the columns used in a GROUP BY exist in the pre-join, a GROUP BY PIPE can be performed. This solution would be ideal for a frequent query that has infrequent loads.

The documentation below provides an example of how a pre-join projection schema is used.

If a pre-join is not a viable solution, the anchor and fact tables could instead be optimized for a merge join. Lastly, the denormalization could take place outside of Vertica (ie. in an ETL tool). If you create a pre-joined object before it comes into Vertica, you aren’t not constrained by the Vertica pre-join limitations.

Documentation

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.

3 Comments

  1. Nirrav June 18, 2015 at 10:21 AM -  Reply

    Hi Norbert,

    This Row shouldn’t not be added and need to generate a missing foreign key error that results in a
    rollback because the reference is to a non-existent dimension row. but it is not happening and row is getting inserted.

    • Norbert Krupa July 12, 2015 at 1:48 PM -  Reply

      Hi,

      I received an error when I attempted an insert after creating the pre-join projection:

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: