Some environments may require data to be copied between clusters; i.e. for DR purposes, or application environments. This post will discuss the different methods available to copy data between clusters in Vertica, as well as external scripts.
Depending on the requirements for copying data (full copy, individual tables, diff, etc.), some methods mentioned below may fit better than others. Choosing a suitable method for a long term solution should be carefully approached. Some methods may have more flexibility, but have other restrictions. For example, using the
copycluster command with
vbr.py requires that the target database be stopped. The
EXPORT statements allow the use of SQL commands to perform the data movement tasks.
COPY FROM VERTICA and
EXPORT TO VERTICA methods can benefit from network compression if additional CPU load is acceptable. Enabling the network compression configuration parameter will compress data which is sent over the network. In one case, a 20TB copy had a 30% improvement with compression (mileage will vary).
The following table provides a general functionality of the methods described below:
|Method||EXPORT TO VERTICA||COPY FROM VERTICA||vsql||vbr.py||PyVertica|
|Exports Table Data||Yes||Yes||Yes||Yes||Yes|
|Exports Table Structure||No||No||No||Yes||Yes|
|Export Specific Columns||Yes||Yes||Yes||No||No|
|Allows SELECT Statement||Yes||No||Yes||No||No|
|Requires Public Network Config||Yes||Yes||No||No||No|
|Target Database Stopped||No||No||No||Yes||No|
|Choose Loading Mode||A||A, D, T||Yes||-||No|
Loading modes: A = AUTO, D = DIRECT, T = TRICKLE
Version agnostic: MRB = Major Release Behind, MV = May Vary
EXPORT TO VERTICA
EXPORT TO VERTICA method allows for the export of an entire table, specific columns from a table, or the result of a
SELECT statement to another database. This method uses the AUTO loading mode.
COPY FROM VERTICA
COPY FROM VERTICA works similarly to the EXPORT statement, however does not allow for the result of a
SELECT statement. This method does allow the loading mode (AUTO, DIRECT, TRICKLE) to be defined.
vsql can be used to import flat files or can be used by piping out an output into an input of another
vsql command which runs a
COPY statement from
vsql -U username -w passwd -h testdb01 -d vmart -At -c "SELECT * from store.store_sales_fact" \
| vsql -U username -w passwd -d vmart -c "COPY store.store_sales_fact FROM STDIN DELIMITER '|';"
This technique is limited to individual tables and copying only data, not the structure.
This backup utility comes packaged with Vertica and allows for copying of the entire database to another cluster. It has heavy restrictions including having the same number of nodes, same database name, and same node names in both clusters. The target database must also be stopped. However, being an external script gives it slight more flexibility and allows the backup & restore to occur in a single operation.
These Python based scripts from Spil Games include various modules including a high speed loader, batch importer, and migration manager. The migration module allows for data to be migrated between Vertica clusters. It features about a dozen arguments to customize the migration.
Although the last commit for the project was on May 28, 2014, it’s worth checking out. Read more about PyVertica in the their documentation.