Best Practices

Copying Data Between Clusters

4

Some environments may require data to be copied between clusters; i.e. for DR purposes, or application environments. This post will discuss the different methods available to copy data between clusters in Vertica, as well as external scripts.

Depending on the requirements for copying data (full copy, individual tables, diff, etc.), some methods mentioned below may fit better than others. Choosing a suitable method for a long term solution should be carefully approached. Some methods may have more flexibility, but have other restrictions. For example, using the copycluster command with vbr.py requires that the target database be stopped. The COPY and EXPORT statements allow the use of SQL commands to perform the data movement tasks.

Additionally, the COPY FROM VERTICA and EXPORT TO VERTICA methods can benefit from network compression if additional CPU load is acceptable. Enabling the network compression configuration parameter will compress data which is sent over the network. In one case, a 20TB copy had a 30% improvement with compression (mileage will vary).

The following table provides a general functionality of the methods described below:

MethodEXPORT TO VERTICACOPY FROM VERTICAvsqlvbr.pyPyVertica
Exports Table DataYesYesYesYesYes
Exports Table StructureNoNoNoYesYes
Export Specific ColumnsYesYesYesNoNo
Allows SELECT StatementYesNoYesNoNo
Requires CONNECTYesYesNoNoNo
Requires Public Network ConfigYesYesNoNoNo
Target Database StoppedNoNoNoYesNo
Choose Loading ModeAA, D, TYes-No
Version AgnosticMRBMRBYesMVYes

Loading modes: A = AUTO, D = DIRECT, T = TRICKLE
Version agnostic: MRB = Major Release Behind, MV = May Vary

Methods

EXPORT TO VERTICA

The EXPORT TO VERTICA method allows for the export of an entire table, specific columns from a table, or the result of a SELECT statement to another database. This method uses the AUTO loading mode.

Documentation

COPY FROM VERTICA

The COPY FROM VERTICA works similarly to the EXPORT statement, however does not allow for the result of a SELECT statement. This method does allow the loading mode (AUTO, DIRECT, TRICKLE) to be defined.

Documentation

vsql

vsql can be used to import flat files or can be used by piping out an output into an input of another vsql command which runs a COPY statement from STDIN:

This technique is limited to individual tables and copying only data, not the structure.

Documentation

vbr.py

This backup utility comes packaged with Vertica and allows for copying of the entire database to another cluster. It has heavy restrictions including having the same number of nodes, same database name, and same node names in both clusters. The target database must also be stopped. However, being an external script gives it slight more flexibility and allows the backup & restore to occur in a single operation.

Documentation

PyVertica

These Python based scripts from Spil Games include various modules including a high speed loader, batch importer, and migration manager. The migration module allows for data to be migrated between Vertica clusters. It features about a dozen arguments to customize the migration.

Although the last commit for the project was on May 28, 2014, it’s worth checking out. Read more about PyVertica in the their documentation.

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user’s employer.

4 Comments

  1. Lili December 29, 2015 at 1:03 AM -  Reply

    Hi Norbert,

    Thank you very much for this terrific blog to summarize the ways of moving data between clusters.

    I am writing to see whether I understand the row “Allows SELECT Statement” in the table correctly. Does it mean whether the query “SELECT” is allowed during the data migration process using that specific method? If so, is the query “SELECT” not allowed on both old and new database server host?

    Thank you in advance.

    Regards,
    Lili

    • Norbert Krupa December 29, 2015 at 10:44 AM -  Reply

      Lili, the way you understood it is correct. However, logically you wouldn’t perform a SELECT on a target system, as you would just write the statement on the source system which sends the result to the target (unless I misunderstood).

  2. Rikus May 13, 2016 at 7:28 AM -  Reply

    Hi Norbert,

    It looks like COPY FROM VERTICA does not work for Vertica Place data types e.g. GEOMETRY. I get the following error message: “Expression for user-defined type column geom cannot be coerced”.

    Do you have any recommendations for copying Place data between clusters?

    Regards,

    Rikus

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: