Development

Introduction to Flex with JSON

Off

Introduced in Vertica 7.0 as Flex Zone, the ability to load, parse and explore semi-structured and un-structured data became a possibility. The first release of Flex in 7.0 included 3 parsers (fdelimitedparser, fjsonparser, and fcefparser). Vertica 7.1 added fdelimitedpairparser and fregexparser parsers. This introduction will focus on creating Flex tables and loading JSON data. Future posts will expand on how to explore and manage this JSON data.

The JSON schema allows for a semi-structure about the data it contains. There are many API’s which utilize JSON to return requested data, and Vertica makes working with it very easy. Each Vertica edition comes with 1 TB of Flex license and is licensed per raw TB after.

The sample JSON used in this series comes from Adobe Spry Samples (example 5). This sample contains 3 objects with nested arrays varying in size and will be loaded in from a file named sample.json. You can also download it from here.

The sample table will be named sample_json and will be created in the public schema. By default, Vertica does not expect a table structure defined. Once data is loaded into the Flex table, the keys are built out and Vertica will guess at the data type for the column. This information can then be used to promote (materialize) columns or define the table structure.

See Understanding Flex Tables and Creating Flex Tables for more information.

Using Vertica’s COPY statement, the JSON will be loaded using the fjsonparser parser with default parameters.

In the following section, the different flatten parameters are explained.

JSON Parser Options

There are two parameters available for flattening maps and the sub-maps or arrays in the JSON data. These are:

flatten_maps BOOLEAN
[Optional] Default true.

Flattens sub-maps within the JSON data, separating map levels with a period (.).

flatten_arrays BOOLEAN
[Optional] Default false.

Converts lists to sub-maps with integer keys. Lists are not flattened by default.

See FJSONPARSER in the documentation for more information.

The different configurations of these options and how the data is transformed are illustrated below.

flatten_maps = true, flatten_arrays = true

Enabling both flatten attributes will completely flatten the maps separating all levels with a period. If the document contains many deeply nested arrays, Vertica may not be able to build out a view (1600+ columns). However, all attributes could be directly queried without using Flex map functions (i.e. SELECT "batters.batter.3.type" FROM sample_json WHERE "id" = '0001';).

flatten_maps = true, flatten_arrays = false

This is the default configuration which will flatten only the maps separating the levels with a period.

flatten_maps = false, flatten_arrays = true

flatten_maps = false, flatten_arrays = false

This configuration will resemble the original JSON structure.

In the next post, exploring and managing JSON data with Flex will be covered.

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: