Review: HP Vertica Essentials

I recently responded to a request to review the first published book on Vertica that I have seen. The book, HP Vertica Essentials (Amazon & Packt Publishing), by Rishabh Agrawal, intends to cover deployment, administration and management of Vertica. This review breaks down the chapters and looks at how well the book covers those topics.

First Impression

I was quite surprised by how light the book is in terms of content. Out of 106 pages, 30 are for opening/closing details (cover, copyright, credits, reviewers, table of contents, index). This leaves some 76 pages to cover the following topics:

  • Chapter 1 – Installing Vertica
  • Chapter 2 – Cluster Management
  • Chapter 3 – Monitoring Vertica
  • Chapter 4 – Backup and Restore
  • Chapter 5 – Performance Improvement
  • Chapter 6 – Bulk Loading

The target audience for this book is “Vertica users and DBAs who want to perform basic administration and fine tuning.” Although prior knowledge of Vertica is not mandatory, in my opinion a user without it will most likely be lost in this book.

Chapter 1 – Installing Vertica

The author begins by outlining how Vertica differs from other MPP databases and mentions that data is stored in a columnar fashion, but misses encoding on top of compression, as well as other critical features of Vertica such as its high availability. I feel the author should also have listed the features that come with each edition (Community vs. Enterprise) of Vertica when mentioning the Management Console. It would have been helpful to mention that the logical design that’s typically performed at the database level can be taken to the schema level.

The book attempts to cover the pre-installation requirements in four short paragraphs. There are many other critical pre-installation steps, such as OS-level configuration and hardware planning, that should have been touched on. The author suggests keeping 20-30 percent of disk space free on each node; however, the official recommendation is 40%. There is also no mention that Vertica can be run on AWS or that it can be run locally using a VM image from the Marketplace.
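To put the disk space figures in perspective, here is a minimal sketch of the arithmetic. The 1,000 GB per-node disk size and the function name are hypothetical, chosen only for illustration; the percentages are the ones quoted above.

```python
def usable_capacity_gb(raw_gb: float, free_percent: int) -> float:
    """Space left for data after reserving free_percent of a node's disk."""
    if not 0 <= free_percent < 100:
        raise ValueError("free_percent must be in the range [0, 100)")
    return raw_gb * (100 - free_percent) / 100


RAW_GB = 1000.0  # hypothetical per-node disk size, for illustration only

for label, free_percent in [
    ("book, lower bound", 20),
    ("book, upper bound", 30),
    ("official figure cited in this review", 40),
]:
    usable = usable_capacity_gb(RAW_GB, free_percent)
    print(f"{label}: keep {free_percent}% free, {usable:.0f} GB usable per node")
```

On these assumed numbers, following the book's lower bound instead of the 40% figure means planning for a third more data per node than the recommendation allows, which is why the difference matters for hardware planning.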

The rest of the chapter steps through the software installation process and states that it aims to cover a two-node cluster installation. I can’t really come up with any good reason to demonstrate a two-node installation, as the most common installation has three nodes. The output from the installation script seems completely unnecessary.

Chapter 2 – Cluster Management

Most of the material in this chapter seems to imply that projections are strictly segmented, when they can obviously also be replicated (especially in the case of smaller dimension tables in a star schema). This isn’t hinted at until Chapter 5. However, the author does a decent job of explaining how skew plays a factor in segmentation.
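The effect of skew on a segmented projection can be illustrated with a small simulation. This is only a sketch: MD5 stands in for the segmentation hash (it is not Vertica's hash function), and the key values, row counts and three-node cluster are made up for the example.

```python
import hashlib
from collections import Counter


def node_for(value, node_count):
    """Map a segmentation key to a node using a stand-in hash (MD5, not Vertica's)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(keys, node_count):
    """Count how many rows land on each node for the given key values."""
    counts = Counter(node_for(key, node_count) for key in keys)
    return [counts.get(node, 0) for node in range(node_count)]


NODES = 3

# High-cardinality key (for example an order ID): rows spread almost evenly.
unique_keys = range(30_000)

# Low-cardinality, skewed key (for example a country code): every "US" row
# hashes to the same node, so that node holds at least 90% of the data.
skewed_keys = ["US"] * 27_000 + ["CA"] * 2_000 + ["MX"] * 1_000

print("segmented by unique key:", rows_per_node(unique_keys, NODES))
print("segmented by skewed key:", rows_per_node(skewed_keys, NODES))
```

The node holding the bulk of the skewed rows does most of the work for every query on that projection, which is the practical reason to segment on a high-cardinality column.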

There appears to be confusion between adding hosts to a cluster and adding nodes to a database. The distinction should be more explicit and each process individually called out as it is with removing the node from the database and removing the host from the cluster.

There is also no mention that a K-safety higher than 2 isn’t really recommended or that a minimum K-safety of 1 is required for production clusters.
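As a rough illustration of why K-safety ties back to cluster size, the sketch below encodes the rule, as I understand it from the Vertica documentation, that a database with K-safety K needs at least 2K + 1 nodes. The function names are hypothetical, and the sketch deliberately ignores the recommendation against going above 2.

```python
def min_nodes_for(k_safety: int) -> int:
    """Smallest cluster that can support the given K-safety (assumes the 2K + 1 rule)."""
    if k_safety < 0:
        raise ValueError("k_safety must be non-negative")
    return 2 * k_safety + 1


def max_k_safety_for(node_count: int) -> int:
    """Highest K-safety a cluster of node_count nodes can support under the same rule."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1) // 2


for nodes in (1, 2, 3, 5):
    print(f"{nodes} node(s): K-safety up to {max_k_safety_for(nodes)}")
```

Under this rule a two-node cluster cannot reach a K-safety of 1, so three nodes is the smallest cluster that meets the production minimum mentioned above.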

The rest of the chapter does a fairly decent job describing node/host management. However, the section on spread isn’t really applicable after version 6.1 since spread is integrated into the OS.

Chapter 3 – Monitoring Vertica

This chapter fell short on a critical part of Vertica: workload management. At the very least, there should have been a mention of resource pools, query requests, background processes, monitoring for potential problems, and the data collector and its role in aggregating system data.

Chapter 4 – Backup and Restore

The author provides a good basic overview of the backup and restore process. Listing the miscellaneous settings and parameters for vbr.py seems unnecessary; a reference to the documentation would have sufficed. When mentioning copycluster, the author could have added more detail about how a dormant node can be used in a production environment for failover. I feel that this chapter could have highlighted more of Vertica’s high availability strategy, as some customers don’t even use backup.

Chapter 5 – Performance Improvement

The author does a poor job of comparing Vertica’s columnar architecture to a traditional row-store. While it is true that Vertica reads only the columns involved in the query, this is also true of a traditional row-store under certain conditions with proper indexes. I feel a better approach would be showing an example of how the physical data is stored in each architecture.
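The kind of example I have in mind could be as simple as the following conceptual sketch. The table, column names and values are invented, and the layouts are a simplification: this is not Vertica's on-disk format, and it ignores encoding, compression and the indexed row-store case noted above.

```python
COLUMNS = ("id", "name", "state", "amount")
ROWS = [
    (1, "alice", "NY", 120.0),
    (2, "bob", "CA", 75.5),
    (3, "carol", "NY", 210.25),
]

# Row store: all values of one row sit next to each other.
row_layout = [value for row in ROWS for value in row]

# Column store: all values of one column sit next to each other.
column_layout = {
    name: [row[index] for row in ROWS] for index, name in enumerate(COLUMNS)
}


def values_read_row_store(needed_columns):
    """A full scan without a covering index touches every value of every row."""
    return len(row_layout)


def values_read_column_store(needed_columns):
    """Only the columns named in the query are read."""
    return sum(len(column_layout[name]) for name in needed_columns)


query_columns = ["amount"]  # e.g. SELECT SUM(amount) FROM sales
print("row layout:   ", row_layout)
print("column layout:", column_layout)
print("values read, row store:   ", values_read_row_store(query_columns))
print("values read, column store:", values_read_column_store(query_columns))
```

Even on three rows the difference is visible: the aggregate needs one column, and only the columnar layout lets the scan skip the other three.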

The first section also incorrectly states that a superprojection gets created when the table is created (it is actually created at the first data load). The remainder of the section does a reasonable job of introducing the concept. With regard to high availability and recovery, I feel it’s important to mention how Vertica uses checkpoint epochs with projections to recover data.

The material on the Database Designer seemed to completely miss the performance design priority (Balanced/Query/Load).

The remainder of the chapter briefly discusses the concept of ROS/WOS and Tuple Mover operations.

Chapter 6 – Bulk Loading

The COPY command is covered with extreme brevity. There should have been some details about monitoring loads.

Closing Thoughts

I feel that the book falls short on the discussed topics. Critical concepts such as the architecture, resource management, and monitoring/troubleshooting are not adequately covered. I couldn’t find anything that isn’t more thoroughly covered in the official documentation. There is too much space used on script outputs and screenshots.

It also seemed that the book tried to be version agnostic; however, many features, such as the installation script, Database Designer and Management Console, have been dramatically improved and overhauled. The author should have explicitly mentioned that this book focuses on version 6.1. The book comes late to the game, as version 7.0 was released late last year.

Covering the material needed to understand the essentials of the platform would probably require at least three books (with 500+ pages). A proper anthology on the platform would probably look like:

  • Fundamentals
  • Administration
  • Querying
  • Development
  • Internals
  • Performance Tuning

I rate the book 2 out of 5 stars.

About the author

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user’s employer.

1 Comment

  1. Jose, September 22, 2015 at 1:37 PM

    Dear,

    OK, that is true and interesting, but I need more….
    At the beginning I bought the book; it was my starting point…

    It would be nice if you or HP could produce a better book besides the known documentation at my.vertica……
    Vertica users are waiting for that, and for much more, in order to promote Vertica. An in-depth Vertica development book
    would be essential and effective, one with complete examples, for example handling flex tables, building UDx UDF UD… or API SDK
    extensions for data/text mining (today just sentiment analysis), or even a complete R UDXYZ UD… integration by example.

    Vertica really is an interesting product, but I need more examples or more information.
    Could you give me some examples of handling flex tables via R-UDF? I would appreciate that.

    Thanks
    Cheers,
    Jose

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.