Development

Using HP Vertica Place to Map Longitude & Latitude to a Zip Code

1

Vertica Place provides the ability to import ESRI shape (.shp) files and use them for geographic analysis. The objective of this post is to show how to import an ESRI shape file into Vertica and create a function that maps a longitude/latitude point to a US Zip Code.

This example assumes that HP Vertica place has been installed. The ESRI shape file of US Zip codes was obtained from the census bureau as follows.

After unzipping the archive you will see several files including tl_2014_us_zcta510.shp, the shape file we want to import into Vertica. Login to Vertica and run the following command to generate the create table statement you will want to modify. It’s important to include the full path to the file, otherwise you will get an error message.

This will generate the DDL for the zip code table. Create a schema named ref (for reference), and then create the table in this schema. Note that the geom field is huge. This will have a large, very adverse impact on performance.

After creating the table, load the table from the shape file. Since the SRID isn’t known, let’s use a common one, SRID=4326

Next look for invalid records and delete them.

In this case there was only one bad record.

Next create a spatial index on the table.

This will look like:

Now that we have data we can use the lookup functions. First, we need to select a latitude/longitude pair, which can be looked up online. We’ll use Zip Code 28210, which is in Charlotte, North Carolina with a centroid at Latitude (North) 35.1289, Longitude (West) 80.8557. After converting this to a coordinate pair we can look up the longitude and latitude using HP Vertica Place’s functions.

Start by looking up the Zip Code without using the spatial index via the ST_Intersects function.

The result will look like:

An alternative approach is to look up the Zip Code with the spatial index via the STV_Intersect function.

The result will look like:

Not surprisingly, using an index is faster, although it still takes several seconds to lookup the Zip Code associated with one point on my 3 node cluster of DL-380s. This is because the zip code polygons are complex, and have many points and edges. Using less complex polygons would speed up the search.

About the author / 

Doug Harmon

Doug is a Technical Consultant and author's his own blog at datadug.com. He has written the Vertical SQL Toolbelt that contains shell scripts and other helpful diagnostic queries.

1 Comment

  1. scf September 18, 2016 at 4:28 AM -  Reply

    Thank you for this walkthrough, it was very helpful.
    I just want to note two things, after have followed it through:
    1. The file has to be placed in all nodes of the cluster in the same path.
    2. You noted at the end that it takes a few seconds to find a zip code associated with one point. This is incorrect. It takes a few seconds to load the index to memory since it’s so large, however if you query a million locations at the same time, it will still run for just a few seconds.
    3. In case you do run a large query, it is better to use STV_GeometryPoint rather than ST_GeomFromText.

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: