
First Look at Live Aggregate Projections


One of the newest features of Vertica 7.1 is Live Aggregate Projections. These projections are designed to improve the performance of frequently run queries that contain aggregates, because the physical data is already aggregated, reducing the work required at query time. That benefit comes at a cost at load time, where the aggregation has to occur. This post explores the considerations, configuration, and performance of these projections.

Introduction

Live Aggregate Projections (LAPs) come in two flavors: aggregate and Top-K. An aggregate LAP requires that the query contain one of the four supported aggregates: SUM, MAX, MIN, or COUNT. A Top-K LAP retrieves the top k rows from a window partition. As mentioned above, the cost of performing these operations is moved from query time to load time, so it's important to weigh the cost of an additional projection and its impact on load speed against the cost of running an aggregate or Top-K statement at query time.
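For reference, a Top-K query in Vertica is expressed with the LIMIT ... OVER clause. A minimal sketch against the VMart fact table (the column names here are assumptions, not the query used later in this post):

-- Hypothetical Top-K query: the three most recent rows per customer.
-- customer_key, sale_date_key, and sales_dollar_amount are assumed column names.
SELECT customer_key, sale_date_key, sales_dollar_amount
FROM online_sales.online_sales_fact
LIMIT 3 OVER (PARTITION BY customer_key ORDER BY sale_date_key DESC);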

Both kinds of LAP require a segmented anchor projection to derive from. Vertica recommends creating LAPs on large tables that are frequently queried with aggregates. Once a LAP is created, data in the anchor table cannot be modified (UPDATE, DELETE, MERGE). This means the LAP would have to be dropped and recreated if anything other than an INSERT occurs. Further, the table definition cannot be altered.

Other restrictions include not being able to use joins, subqueries, or DISTINCT aggregates. The complete list of considerations is available in the documentation.

Configuration

This example will use the online_sales.online_sales_fact table in the VMart example database and evaluate a basic LAP. The query used in the tests below is as follows:
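The original statement is not reproduced here; a representative stand-in of the kind described, a single SUM grouped by one column, using assumed VMart column names, would be:

-- Hypothetical stand-in for the test query; not the original statement.
SELECT customer_key, SUM(sales_dollar_amount) AS total_sales
FROM online_sales.online_sales_fact
GROUP BY customer_key;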

Since the order of the GROUP BY columns must match the order of the columns in the segmentation clause, a suitable anchor projection must first be created:
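A minimal sketch of such an anchor projection, segmented on the assumed grouping column from the stand-in query above:

-- Hypothetical anchor projection, segmented on the GROUP BY column.
CREATE PROJECTION online_sales.online_sales_fact_anchor
AS
SELECT customer_key, sales_dollar_amount
FROM online_sales.online_sales_fact
SEGMENTED BY HASH(customer_key) ALL NODES;

-- Populate the new projection with existing data.
SELECT REFRESH('online_sales.online_sales_fact');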

Next, the LAP can be created:
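Again using the assumed column names, a sketch of the LAP definition:

-- Hypothetical Live Aggregate Projection built over the anchor projection above.
CREATE PROJECTION online_sales.online_sales_fact_lap
AS
SELECT customer_key, SUM(sales_dollar_amount)
FROM online_sales.online_sales_fact
GROUP BY customer_key;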

For the tests below, statistics are updated after each projection is created, output is directed to /dev/null, and timing is enabled. Only the "All rows formatted" time is recorded.
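A sketch of the vsql session setup assumed for these measurements:

-- vsql meta-commands: discard query output and enable timing.
\o /dev/null
\timing
-- Refresh statistics after each projection is created.
SELECT ANALYZE_STATISTICS('online_sales.online_sales_fact');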

Results

Default Superprojection
Test      Time (ms)
1         191.202
2         156.055
3         151.890
4         150.119
5         152.056
Average   160.264

Explain Plan

Live Aggregate Projection
Test      Time (ms)
1         77.382
2         70.829
3         80.585
4         71.553
5         69.943
Average   74.058

Explain Plan

DBD Suggested Projection

The Database Designer generated the following projection based on the query above:
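The DBD output itself is not reproduced here; a query-specific projection that enables GROUP BY PIPELINED is typically sorted on the grouping column, so a sketch (again with the assumed column names) might look like:

-- Hypothetical query-specific projection, sorted on the grouping column
-- so the optimizer can use GROUP BY PIPELINED.
CREATE PROJECTION online_sales.online_sales_fact_dbd
AS
SELECT customer_key, sales_dollar_amount
FROM online_sales.online_sales_fact
ORDER BY customer_key
SEGMENTED BY HASH(customer_key) ALL NODES;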

Test      Time (ms)
1         45.184
2         31.947
3         33.961
4         45.820
5         33.906
Average   38.164

Explain Plan

Conclusion

The results from the tests above may not accurately reflect the performance of a LAP against a production data set. However, the query-specific projection suggested by the DBD returned results 64% faster than the LAP.

Looking at the explain plans for the DBD projection and the LAP, it would seem that as long as the query can be optimized to perform a GROUP BY PIPELINED, it makes more sense to use a query-specific projection, given the restrictions around LAPs. Manually creating a LAP requires much more effort, and until the DBD can suggest LAPs, it doesn't seem that this feature will get much appreciation.

Update

The entire point of the LAP was missed in the tests above, as the query was only using the anchor projection. To query data directly from the LAP, the LAP name is used in place of the table name:
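Using the assumed names from earlier, the test query re-pointed at the LAP would be:

-- Same hypothetical query, reading from the LAP instead of the anchor table.
SELECT customer_key, SUM(sales_dollar_amount) AS total_sales
FROM online_sales.online_sales_fact_lap
GROUP BY customer_key;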

This produces the following times:

Test      Time (ms)
1         22.467
2         23.422
3         22.945
4         22.215
5         22.238
Average   22.657

Explain Plan

The overall times are as follows:
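Test                                      Average time (ms)
Default superprojection                   160.264
Live Aggregate Projection (anchor)        74.058
DBD suggested projection                  38.164
LAP queried directly                      22.657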

That is roughly a 50% improvement over the DBD-suggested projection, though the absolute difference is marginal. The implementation of a LAP seems much more confusing than just adding a couple of projections, and this explicit step of querying the LAP by name seems unnecessary; I would hope Vertica makes this more transparent in future releases.

About the author

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries, which aim to cover monitoring, diagnostics, and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the author's employer.


