Administration, Best Practices, Development

Time Series with Last Value

0

This post will briefly cover working with time series stock data in Vertica. In my first experience, I use the TIME_SLICE function to aggregate and window the data as described in documentation. However, due to a lack of a window frame, I received incorrect results when paired with LAST_VALUE. After adding a window frame, the expected results were returned. This post will show how to alternatively return the desired results using the TIMESERIES clause and lastly compare the plans and performance of both.

Setup

To follow along with the demonstration, you can download this setup script.

Background

The examples in this post will be using a 3-second interval to demonstrate the time series functions.

To better understand time slices, it’s easier to see a graphic time representation of the sample data:

Time Slice

The graphic above shows that :00 is the beginning of the first 3-second time slice, and :03 is the end (:02.99999). This concept is further explained in the documentation.

Applying our prices, the following first and last values are the results we expect to see with the queries that follow:

Time Slice

Time Slice with First Value

Following the documentation, the following query produces the expected results based on the FIRST_VALUE of each time slice:

Result

A manual check against our graphic confirms that those prices are the correct first values based on 3 second time slices.

Time Slice with Last Value

Replacing FIRST_VALUE with LAST_VALUE, the following query produces duplicate results:

Result

Working with windows, the above query would require a specific frame (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) to return the expected results. The documentation mentions that if this frame is not specified, it will default to a frame of UNBOUNDED PRECEDING AND CURRENT ROW. Adding the specific frame in the following query returns the desired results:

Result

Timseries with Ts Last Value

While the following query replaces the time slice with the TIMESERIES clause, it semantically performs the same computation as the TIME_SLICE() function:

Result

While not applicable in this example, I feel it’s worth noting that TIMESERIES performs gap filling, where TIME_SLICE does not. Also, the largest restriction with this clause is that GROUP BY and HAVING are not allowed in the same query block.

Query Plan

The query plan for the TIMESERIES has a total cost of 27 and takes two path steps compared to a total cost of 30 and three path steps when using a TIME_SLICE. The extra step for the TIME_SLICE path is to perform a sort based on the time slice. The TIMESERIES plan shows that the time slice and last value functions and sort are performed under the analytical step.

Time Slice

Timeseries

Performance

Measuring performance on this small sample set would not be realistic. Thus, the following results are based off a profile for each last value of TIME_SLICE and TIMESERIES which covered 420,000 rows:

Metric TIME_SLICE TIMESERIES
bytes received 5854045 1565544 (▼ 115.6%)
memory allocated (bytes) 10333689696 9997033952 (▼ 162.5%)
clock time (us) 27071310 (▼ 101.8%) 83160962
execution time (us) 1466254 (▼ 123.3%) 6181952
producer wait (us) 2362281 1609808 (▼ 37.9%)
estimated rows produced 223087437 215945792 (▼ 3.3%)
rows to process 460067444 460078683
intermediate rows to process 309294 154647 (▼ 66.7%)
rows processed 460376738 460233330
rows produced 2049099 796459 (▼ 88.0%)
rows received 309294 155247 (▼ 66.3%)

Conclusion

I feel it’s important to understand working with time series data using an analytical database system such as Vertica. Further, both methods of obtaining last values in this example returned results fairly quickly. A larger test set would be needed to provide an accurate representation of the performance of both methods. However, overall TIMESERIES seems to be more efficient in processing the request while taking longer to execute. Further testing for a larger sample set may be performed in the future. I would advise that users perform their own profiling of each to determine the best method for their environment.

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: