Webinar Recap: Upcoming Dragline Features and Beyond

Vertica held a webinar on its upcoming Dragline release this past Tuesday. You can watch the recorded session on Vertica’s Vimeo channel. The webinar covered the main features of the next release and previewed the roadmap for future releases. Dragline is expected to arrive in as little as a few weeks.

While the focus of Dragline is convergence with the Hadoop storage layer, the webinar also offered some insight into the direction of upcoming releases. Vertica will be investing heavily in functionality for creating value from human information such as text, audio and video, as well as from unstructured data (i.e., machine data).

Live Aggregate Projections

A neat feature of this release is live aggregate projections. As the name implies, they allow projections to be defined on aggregate functions. The benefit is faster queries against these aggregates, at the inherent cost of extra work at load time. If the aggregate is queried often enough, the time saved at query time could offset that added load cost. The webinar did not specify which aggregate functions are supported, beyond “common aggregate queries.” The use case presented was monthly minute usage on a cell phone line. Once this release is public, it will be interesting to test the performance of this feature (the webinar claimed queries up to 10x faster). It was also mentioned that these projections will need to be created manually.
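As a rough illustration of that load-time versus query-time trade-off, the Python sketch below models the cell phone use case. It is a conceptual toy, not how Vertica implements live aggregate projections: the class and field names (CallRecord, LiveMinutesAggregate) and the sample phone lines are made up. Running totals are updated while data loads, so a monthly-minutes query becomes a single lookup instead of a scan over every call record.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    phone_line: str
    month: str      # e.g. "2014-07"
    minutes: float


class LiveMinutesAggregate:
    """Toy model: SUM(minutes) per (phone_line, month), maintained at load time."""

    def __init__(self) -> None:
        self.raw: list[CallRecord] = []
        self.totals: dict[tuple[str, str], float] = defaultdict(float)

    def load(self, batch: list[CallRecord]) -> None:
        # Load-time cost: every incoming row also updates its running total.
        for rec in batch:
            self.raw.append(rec)
            self.totals[(rec.phone_line, rec.month)] += rec.minutes

    def monthly_minutes_scan(self, line: str, month: str) -> float:
        # Without the aggregate: scan every raw row at query time.
        return sum(r.minutes for r in self.raw
                   if r.phone_line == line and r.month == month)

    def monthly_minutes_live(self, line: str, month: str) -> float:
        # With the aggregate: one lookup at query time (.get adds no empty keys).
        return self.totals.get((line, month), 0.0)


agg = LiveMinutesAggregate()
agg.load([
    CallRecord("555-0100", "2014-07", 12.5),
    CallRecord("555-0100", "2014-07", 3.0),
    CallRecord("555-0199", "2014-07", 7.0),
])
print(agg.monthly_minutes_live("555-0100", "2014-07"))  # 15.5
print(agg.monthly_minutes_scan("555-0100", "2014-07"))  # 15.5
```

The extra dictionary update in load() stands in for the load-time cost; in this model it only pays off when the monthly totals are queried often enough.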

Pluggable File Systems

An upcoming feature is a pluggable file system for cloud storage, such as OpenStack Swift in enterprise private clouds, Amazon S3 or Glacier. This sets the stage for tiering data across different file systems based on usage (hot, cool and cold). Within this tier structure, frequently used (“hot”) data will be kept in Vertica’s own storage for the best performance, “cool” data can stay in HDFS format, and “cold” data would be archived. In the Dragline release, data that has not been accessed will be tiered off automatically according to configured policies. This approach lowers the cost of storing data.
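A policy-driven tiering decision can be sketched as a function of how long data has gone without being accessed. This is a hypothetical Python model, not Vertica’s policy syntax: the Tier and TieringPolicy names and the 30-day and 180-day thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "Vertica native storage"
    COOL = "HDFS"
    COLD = "archive"


@dataclass(frozen=True)
class TieringPolicy:
    # Hypothetical thresholds; real policies would be set by an administrator.
    hot_max_idle: timedelta = timedelta(days=30)
    cool_max_idle: timedelta = timedelta(days=180)


def assign_tier(last_accessed: datetime, now: datetime,
                policy: TieringPolicy = TieringPolicy()) -> Tier:
    """Pick a tier from how long the data has gone without being accessed."""
    idle = now - last_accessed
    if idle <= policy.hot_max_idle:
        return Tier.HOT
    if idle <= policy.cool_max_idle:
        return Tier.COOL
    return Tier.COLD


now = datetime(2014, 8, 1)
for name, last in [("daily_sales", datetime(2014, 7, 31)),
                   ("q1_weblogs", datetime(2014, 4, 1)),
                   ("2012_audit", datetime(2013, 1, 15))]:
    print(name, "->", assign_tier(last, now).value)
```

The policy is a frozen dataclass, so sharing one default instance across calls is safe.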

MapR Hadoop Distribution

[Image: Vertica on MapR, slide from the Dragline webinar]

With the MapR Hadoop distribution, integration is tighter because both systems share an NFS storage layer. In this setup, Vertica runs on the same nodes as MapR, so data can be accessed as if it were on local storage. Again, combining Vertica and Hadoop this way saves costs while providing direct access to Hadoop data.

Open Parser API

An Open Parser API will provide parsing functionality beyond the FlexZone parsers, allowing any unstructured data on Hadoop to be queried. External tables can also be used to access Hadoop data without loading it into Vertica.
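To illustrate the general idea of a pluggable parser, here is a minimal Python sketch of a parser plug-in interface plus one parser for key=value machine-data lines. The Parser class, KeyValueLogParser and the PARSERS registry are all hypothetical; this does not reflect the interface of the real Open Parser API.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator

Row = dict[str, str]


class Parser(ABC):
    """Hypothetical plug-in interface: turn raw text lines into rows."""

    @abstractmethod
    def parse(self, lines: Iterable[str]) -> Iterator[Row]:
        ...


class KeyValueLogParser(Parser):
    """Parses machine-data lines such as 'ts=... level=WARN msg=disk_full'."""

    def parse(self, lines: Iterable[str]) -> Iterator[Row]:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row: Row = {}
            for token in line.split():
                key, sep, value = token.partition("=")
                if sep:  # ignore tokens without an '='
                    row[key] = value
            yield row


# A registry lets new parsers be plugged in by name.
PARSERS: dict[str, Parser] = {"kv_log": KeyValueLogParser()}

raw = ["ts=2014-07-29T10:00 level=WARN msg=disk_full",
       "",
       "ts=2014-07-29T10:05 level=INFO msg=ok"]
for row in PARSERS["kv_log"].parse(raw):
    print(row)
```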

HCatalog Integration

Data in Hive tables can be queried directly from Vertica. Native Vertica data can be joined to Hive data or any other unstructured data.

Text Analytics

This feature was summarized in a slide:

  • Provides a more natural way to analyze data for many applications
  • Handles free-form text in weblogs and other machine data
  • Understands opinions expressed in more unstructured data

Management Console Enhancements

Enhancements in the Management Console will provide better insight into resource utilization. It will be possible to move running requests to different resource pools. Long-running requests will be easy to identify and, more importantly, their resources can be freed with a few simple clicks. In addition, a resource pool activity window will show nodes or queries that are not correctly distributed and let users drill down into an individual request.

Dynamic Workload Management

Requests that run longer than a set limit can be moved automatically to a different resource pool.
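The automatic move can be pictured as a cascade of resource pools, each with a running-time cap. The Python sketch below is a hypothetical model: the pool names, the caps in seconds and the cascade_to field are assumptions, and elapsed time is measured from when the request started.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourcePool:
    name: str
    runtime_cap_s: Optional[float]       # None means no cap
    cascade_to: Optional[str] = None     # pool to move to once the cap is exceeded


# Hypothetical pools and caps, for illustration only.
POOLS = {
    "interactive": ResourcePool("interactive", 10.0, "batch"),
    "batch": ResourcePool("batch", 300.0, "long_running"),
    "long_running": ResourcePool("long_running", None),
}


def current_pool(start_pool: str, elapsed_s: float) -> str:
    """Follow the cascade until the request fits under a pool's cap."""
    pool = POOLS[start_pool]
    while (pool.runtime_cap_s is not None
           and elapsed_s > pool.runtime_cap_s
           and pool.cascade_to is not None):
        pool = POOLS[pool.cascade_to]
    return pool.name


for elapsed in (2, 45, 900):
    print(elapsed, "s ->", current_pool("interactive", elapsed))
```

In this sketch a request that exceeds the cap of a pool with no cascade target simply stays there; a real system might cancel it instead.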

Pulse

The “HP Vertica Pulse” solution provides sentiment analysis on social data. This extension takes a feed such as Twitter (it works on any short pieces of text) and allows entities to be extracted quickly.

Place

The “HP Vertica Place” solution provides geospatial analysis of location data. It will also come with “optimized spatial joins with memory-resident geospatial indexing that replaces expensive scans.”
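The idea of memory-resident indexing replacing scans can be illustrated with a simple grid index. The Python sketch below joins points to rectangular regions two ways: a nested-loop scan that tests every point against every region, and an in-memory grid that only tests the regions registered in the point’s cell. Rectangles stand in for real polygons, and the grid is a generic technique chosen for illustration, not a description of how HP Vertica Place builds its index; all names and sample data are hypothetical.

```python
from collections import defaultdict
from math import floor

Point = tuple[float, float]               # (x, y), e.g. (lon, lat)
Box = tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def contains(box: Box, p: Point) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def scan_join(points: list[Point], regions: dict[str, Box]) -> list[tuple[int, str]]:
    # Nested-loop join: every point is tested against every region.
    return sorted((i, name) for i, p in enumerate(points)
                  for name, box in regions.items() if contains(box, p))


def build_grid(regions: dict[str, Box], cell: float) -> dict[tuple[int, int], list[str]]:
    # In-memory index: map each grid cell to the regions overlapping it.
    grid: dict[tuple[int, int], list[str]] = defaultdict(list)
    for name, (x0, y0, x1, y1) in regions.items():
        for gx in range(floor(x0 / cell), floor(x1 / cell) + 1):
            for gy in range(floor(y0 / cell), floor(y1 / cell) + 1):
                grid[(gx, gy)].append(name)
    return grid


def indexed_join(points: list[Point], regions: dict[str, Box],
                 cell: float = 1.0) -> list[tuple[int, str]]:
    grid = build_grid(regions, cell)
    out = []
    for i, p in enumerate(points):
        key = (floor(p[0] / cell), floor(p[1] / cell))
        # Only regions registered in this point's cell are candidates;
        # .get avoids creating empty cells in the defaultdict.
        out.extend((i, name) for name in grid.get(key, [])
                   if contains(regions[name], p))
    return sorted(out)


regions = {"downtown": (0.0, 0.0, 2.0, 2.0), "airport": (5.0, 5.0, 6.5, 6.0)}
points = [(1.0, 1.5), (5.5, 5.2), (9.0, 9.0), (2.0, 0.5)]
print(indexed_join(points, regions))  # [(0, 'downtown'), (1, 'airport'), (3, 'downtown')]
```

Both joins return the same pairs; the grid version just skips regions that cannot contain the point.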

Security Enhancements

More details will be shared at the Big Data Conference; however, SHA-2 password hashing and SSL mutual authentication will be introduced.

Looking Forward

The next minor version codenames are “Excavator” and “Frontloader.” Vertica “Excavator” will add an in-memory data feature, support cubes and expand text analytics for unstructured data. Vertica “Frontloader” will support streaming data as well as unified SQL navigation and joins across HBase and other NoSQL tools. It was also mentioned in the Q&A that the Distributed R feature in “Excavator” will be licensed separately. The roadmap for these releases appears to extend through 2015.

About the author

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries, which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the author’s employer.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.