Vertica held a webinar on its upcoming Dragline release this past Tuesday. You can watch the recorded session through Vertica’s Vimeo channel. The webinar covered the main features of the next release and previewed the roadmap for future releases. Dragline is anticipated to arrive in as little as a few weeks.
While the focus of Dragline is convergence with the Hadoop storage layer, there was some insight into the direction for upcoming releases. Vertica will be doing heavy work on functionality for creating value from human information such as text, audio and video, as well as from unstructured data (i.e., machine data).
Live Aggregate Projections
A neat feature of this release is live aggregate projections. As the name implies, it allows projections to be created on aggregate functions. The benefit is faster query times when using these aggregates, with an inherent cost at load time. However, if the aggregate is queried often enough, the time saved by not performing the aggregation at query time could offset that load-time cost. It was not mentioned which aggregate functions are included, other than support for “common aggregate queries.” The use case mentioned for this feature was the monthly minute usage on a cell phone line. Once this release is public, it will be interesting to test the performance of this feature (the webinar claims it is up to 10x faster). It was mentioned that these projections will need to be created manually.
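The trade-off can be illustrated with a minimal Python sketch. It is only a conceptual stand-in, not Vertica's implementation: the class `LiveMinuteTotals`, its method names, and the sample phone lines are all hypothetical. It shows a running total maintained at load time so that the monthly-minutes query becomes a lookup instead of a scan.

```python
from collections import defaultdict


class LiveMinuteTotals:
    """Hypothetical stand-in for a live aggregate: SUM(minutes) per (line, month)."""

    def __init__(self):
        self.rows = []                  # raw call records (the base table)
        self.totals = defaultdict(int)  # pre-aggregated totals, kept current at load time

    def load(self, line, month, minutes):
        # Load-time cost: every insert also updates the running total.
        self.rows.append((line, month, minutes))
        self.totals[(line, month)] += minutes

    def monthly_minutes_scan(self, line, month):
        # Without the aggregate: scan every raw row at query time.
        return sum(m for l, mo, m in self.rows if l == line and mo == month)

    def monthly_minutes_live(self, line, month):
        # With the aggregate: a single lookup at query time.
        return self.totals.get((line, month), 0)


usage = LiveMinuteTotals()
usage.load("555-0100", "2014-07", 12)
usage.load("555-0100", "2014-07", 30)
usage.load("555-0100", "2014-08", 5)
usage.load("555-0199", "2014-07", 7)
```

Both query methods return the same answer; the difference is that the scan cost grows with the number of raw rows, while the lookup pays its cost incrementally on each load.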
Pluggable File Systems
An upcoming feature will be a pluggable file system for enterprise private clouds such as OpenStack Swift, Amazon S3 or Glacier. This sets the stage for tiering data across different file systems based on usage (hot to cool to cold) in this release. Within this tier structure, frequently used or “hot” data will be kept in Vertica’s file system, which provides the best performance. The next tier, “cool” data, can stay in HDFS format. Lastly, “cold” data would be archived. In the Dragline release, data that has not been accessed will automatically be tiered off based on set policies. This approach lowers the cost of storing data.
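A policy of this kind might look like the following Python sketch. The thresholds, the `choose_tier` function, and the partition names are assumptions made for illustration; the webinar did not describe how policies are actually defined or what values they use.

```python
from datetime import date, timedelta

# Hypothetical policy thresholds; the webinar gave no actual values.
HOT_MAX_AGE = timedelta(days=30)
COOL_MAX_AGE = timedelta(days=365)


def choose_tier(last_accessed, today):
    """Classify data as hot, cool, or cold by time since last access."""
    age = today - last_accessed
    if age <= HOT_MAX_AGE:
        return "hot"    # stays in Vertica's own file system
    if age <= COOL_MAX_AGE:
        return "cool"   # can live in HDFS
    return "cold"       # archived


today = date(2014, 8, 1)
# Hypothetical partitions mapped to their last-access dates.
partitions = {
    "calls_2014_07": date(2014, 7, 28),
    "calls_2014_01": date(2014, 2, 15),
    "calls_2012_06": date(2012, 9, 1),
}
placement = {name: choose_tier(seen, today) for name, seen in partitions.items()}
```

Here `placement` maps each partition name to the tier the policy would assign it on the given date.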
MapR Hadoop Distribution
With the MapR Hadoop Distribution, there is tighter integration because both systems share an NFS storage layer. In this integration, Vertica runs on the same nodes as MapR, which allows data to be accessed as if it were local storage. Again, this brings cost savings by combining Vertica and Hadoop while providing direct access to Hadoop data.
Open Parser API
An Open Parser API will provide functionality beyond the FlexZone parsers. This allows any unstructured data on Hadoop to be queried. External tables can also be used to access Hadoop data without loading it in.
Data in Hive tables can be queried directly. Native Vertica data can be joined to Hive data or any other unstructured data.
This feature was summarized in a slide:
- Provides a more natural way to analyze data for many applications
- Handles free-form text in weblogs and other machine data
- Understands opinions expressed in more unstructured data
Management Console Enhancements
Enhancements in the Management Console will provide better insight into resource utilization. It will be possible to move running requests to different resource pools. Long-running requests will be easy to identify and, more importantly, their resources can be freed with a few simple clicks. In addition, a resource pool activity window will show nodes or queries that are not correctly distributed and give the ability to drill down into a request.
Dynamic Workload Management
Requests with excessive running times can be automatically moved to a different resource pool.
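As a rough illustration of the idea, the Python sketch below moves any request that exceeds a runtime limit into another pool. The `Request` type, the `cascade_long_running` function, the pool names, and the 600-second limit are all hypothetical; the webinar did not show how the rule is configured in Vertica.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    pool: str
    runtime_seconds: float


def cascade_long_running(requests, source_pool, target_pool, max_runtime_seconds):
    """Move requests in source_pool that exceed the runtime limit to target_pool.

    Returns the ids of the requests that were moved.
    """
    moved = []
    for req in requests:
        if req.pool == source_pool and req.runtime_seconds > max_runtime_seconds:
            req.pool = target_pool
            moved.append(req.request_id)
    return moved


# Hypothetical pools and runtimes.
requests = [
    Request(1, "general", 4.0),
    Request(2, "general", 950.0),
    Request(3, "reporting", 1200.0),
]
moved = cascade_long_running(requests, "general", "long_running", 600)
```

Only the request in the `general` pool that ran past the limit is moved; the long request in `reporting` is left alone because the rule applies to a single source pool.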
The “HP Vertica Pulse” solution provides sentiment analysis on social data. This extension takes a feed such as Twitter (it will work on any short pieces of text) and allows entities to be quickly extracted.
The “HP Vertica Place” solution provides geospatial analysis on geographical data. This will also come with “optimized special joins with memory-resident geospatial indexing that replaces expensive scans.”
More detail will come out at the Big Data Conference; however, SHA-2 hashing will be introduced for password hashing, along with SSL mutual authentication.
The next minor version codenames are “Excavator” and “Frontloader.” Vertica “Excavator” will have an in-memory data feature, allow for cubes and expand on text analytics for unstructured data. Vertica “Frontloader” will allow for streaming data as well as unified SQL navigation and joins across HBase and other NoSQL tools. It was also mentioned in the Q&A that in “Excavator” the Distributed R feature will be licensed separately. These next versions appear to extend through 2015.