Hadoop the Pet Rock


Did your CIO/CTO get what they wanted for Christmas? The Hadoop Pet Rock has been a popular gift for many organizations. It makes the perfect pet and comes with its own instruction manual on care and training. But sometimes, when asked to “sit” or “roll over,” it doesn’t play along.


What makes up the perfect Hadoop pet? One that is manageable, useful, meets business needs and doesn’t pee everywhere. When the hype of Big Data peaked in 2014, it was the cool pet to buy. You didn’t want to be a Fortune 500 company walking outside without it. Many organizations drank the Kool-Aid and are now struggling to extract useful and meaningful data from their pets. Most of the time it’s because the pet was really designed to do one thing, and that’s to sit (or store data).

To teach it new tricks, organizations introduced different methods for getting data in and out, such as Sqoop, Flume, and Hive. And that doesn’t include management or security components. By the time Hadoop was useful, organizations had become pet circuses, trying to figure out how to coordinate each pet’s tricks, especially with a new pet (project) being released each week.

Until Spark came along, processing data was slow and inefficient. Very few organizations actually managed their environments correctly and got the value they were promised. The largest benefit of having Hadoop was that it could store unstructured or semi-structured data. But if an organization only had structured data, Hadoop was really not useful beyond storing it. The problem remained: they couldn’t get data out to meet business needs.

When Vertica began to embrace Hadoop, it did so knowing Hadoop could not compete with its performance or scalability. Very few organizations understood that Hadoop can’t be trained to “shake hands” or “attack.” Because Vertica came bundled with connectors to push data to and pull data from Hadoop, its value was immediately realized: getting data in and out was simply fast and easy. Semi-structured data is also not off limits with Vertica’s Flex library. Further, Vertica’s sister (IDOL) can handle any type of unstructured data. Together, they allow any type of data to be quickly acquired, stored, and analyzed.

The beauty of Vertica is that it comes out of the box with a plethora of functionality: Social Media, Geo-spatial, Event-Based Windows, Time Series, Event Series Joins, and Pattern Matching to name a few (all using familiar SQL). IDOL can handle any type of file format, from e-mails to audio and even video scene analysis. Further, IDOL is not a black box and can be custom tuned for different types of applications.

At the end of the day, Hadoop will have its place. But if an organization wants to have its cake and eat it too, it will need more than just a Hadoop pet. That’s what Vertica and IDOL were designed to do. Together, they empower organizations to tap their darkest data and deliver value.

About the author

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.


This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.