Database Design, Optimization, Performance Tuning

Overview of Join Operations

4

There are two operator types, merge and hash; each which can be performed on the local node (a local join) or across two or more nodes (a network join). This post will discuss the two operator types and the differences between a local and network join. The mechanics of each operator may be discussed in future posts.

Merge vs. Hash Joins

The most common type of join, the hash join, may be just as efficient as a merge join if it can build out the hash table in memory. Otherwise, optimization for a merge join should be performed.

The issue with a hash join is that if it spills to disk, you will have wasted valuable resources to process the query. Initially, Vertica will attempt to build out the inner table in memory. If the memory fills up, by either hitting the memory limit of the resource pool or filling up all the memory, the query will fail.

At this point, the query will be reprocessed with a hint indicating that the hash table for the join needs to be built out on disk. When a spill to disk occurs, the hash table is being built out on disk incurring more IO cost than necessary. Most importantly, the processing of the query has been slowed down.

In a merge join, both projections participating in the join are sorted or the subqueries for the join are sorted. This allows matched values to be streamed through memory rather than built out as a table. This type of join is optimal for dealing with large data sets as you do not want a spill to disk to occur.

The merge join will always be more efficient than a hash join, but not necessarily faster. If the hash join is working with a small set of data, it’s possible that it can process it faster. However, the hash join will always use more memory.

Local vs. Network Joins

When optimizing join’s, you want to minimize network activity in order to process the query. If the join being performed is with two segmented projections, a network join will take place. If the projection of the dimension table is replication, the query can be processed without involving other nodes and a local join will take place. If there are two identically segmented projections and segmentation is the same on each node, the join can happen locally.

The following illustration shows a NETWORK JOIN (not identically segmented) because the matching data is not on the same node.

3 node cluster with 2 unsegmented projections

The following illustration shows a LOCAL JOIN (assuming dimension table is replicated) because the matching data resides on each node.

3 node cluster with 1 unsegmented projection

Documentation

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.

4 Comments

  1. Matthew Cornell March 24, 2014 at 7:49 AM -  Reply

    Thanks for the article. As a newbie to Vertica there is so much to learn. You must have studied it for a while. For me, just getting a basic understanding of what is going on under the hood is challenging. EXPLAIN and PROFILE are where I’m at now, but it’s tough for me.

    • Norbert Krupa March 29, 2014 at 12:41 PM -  Reply

      Matthew–thanks for reading. I’ve been using Vertica for about 6 months, but I’ve been fortunate to attend training which broadened my knowledge on the internals. I’m just passing along my knowledge but I have a long ways to go.

  2. Kushal Sen Laskar April 13, 2015 at 6:32 AM -  Reply

    Norbert- Thanks for sharing your experience here. i would try to go throuigh each one of your articles. Apart from the Web based Training videos do you have any good video materials provide by HP Vertica team? Do you attend the Web based training or Classroom led training for Vertica? TIA

    • Norbert Krupa April 13, 2015 at 7:07 AM -  Reply

      Kushal,

      Take a look at the Administration page. It contains links to the official Vertica Web Based Training as well as their Vimeo channel.

      Hope that helps,
      Norbert

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: