When using a
COPY bulk load in Vertica, it’s important to properly identify the load using the optional
STREAM NAME parameter. This parameter makes it possible to easily identify loads when monitoring system load streams since they can generally be resource intensive.
For basic query monitoring, I reference the
query_requests system table under the
V_MONITOR schema. When I was examining the columns, I stumbled upon
request_label. This seemed like it could be a really useful resource when it comes to identifying queries and monitoring system resources.
The way I read the documentation is if this parameter is not set, Vertica will try to name the stream (I have yet to see):
By default, HP Vertica names streams by table and file name. For example, if you have two files (f1, f2) in Table A, their stream names are A-f1, A-f2, respectively.
In order to manually set the
STREAM NAME parameter, it just needs to be supplied with the
COPY to_table FROM LOCAL STDIN DIRECT STREAM NAME 'Data load ABC';
The stream name will appear in the
query_requests system tables under
request_label columns respectively. While the documentation suggests that
load_streams maintains historical load metrics, I haven’t found this to be the case. I only seem to find the active load metrics, and per the documentation, there’s no way to increase the retention parameters on this system table.