Skip to main content

Enrich data using Iggy (askiggy.com)

Project description

iggy-enrich-python

Iggy makes it easy for data scientists and machine learning engineers to include location-specific features in their models and analyses.

This package helps Iggy data users to enrich the points (or latitude/longitude pairs) in their data with Iggy features using Python.

Getting started

  1. Get Iggy data. Please request access here and we'll immediately send you a link to some sample data to play around with.

    What you'll receive is a sample data file in .tar.gz format. Un-compress it in a location of your choosing, for example:

    tar -xzvf iggy-package-wkt-<iggy_version_id>_<iggy_prefix>.tar.gz
    
  2. Install this library and dependencies.

    Install via pip:

    pip install iggyenrich
    
  3. Enrich some data!

    This repo contains a (very small) sample csv file with the locations of twenty four 7-11 stores in Pinellas County, FL. It has latitude and longitude columns specifying the location of each store and a few additional attributes. The easiest way to enrich a file like this (with all the available Iggy features) is by running:

    python -m iggyenrich.iggy_enrich -f ./sample_data/pinellas_711s.csv --iggy_base_loc <iggy_base_loc> --iggy_version_id <iggy_version_id>
    

    ...where <iggy_base_loc> is the local directory or S3 bucket in which you have your un-compressed Iggy data, and <iggy_version_id> is the version of Iggy data you have (something like "20211110214810").

    After a few seconds you'll find an "enriched" version of the file in sample_data/enriched_pinellas_711s.csv containing its original 24 data rows, but the number of columns has exploded from the original 7 to 2,808. These extra ~2,800 columns contain Iggy features.

Examples

If you want to use IggyEnrich within your own code, here are a few things you can do:

Create a Local Iggy Data Package

This repo assumes that you have your Iggy data saved locally or on s3, and want to load it into memory to do your enrichment.

To do this, you'll first want to create an instance of a LocalIggyDataPackage object, which loads the data from disk or s3:

from iggyenrich.iggy_data_package import LocalIggyDataPackage

pkg_spec = {
    "iggy_version_id": "{your iggy version id}",
    "crosswalk_prefix": "{your iggy crosswalk prefix}",
    "base_loc": "{your iggy data base location on disk or s3 bucket}",
    "iggy_prefix": "{your data's iggy prefix}"
}
pkg = LocalIggyDataPackage(**pkg_spec)

You'll notice that the pkg_spec includes a couple parameters you need to specify. These depend on the Iggy data package you've downloaded or put in an s3 bucket. If you look at one of these packages, you'll see that it has a parent directory like:

/<base_loc>/iggy-package-wkt-<iggy_version_id>_<iggy_prefix>/

e.g.

  • /Users/iggy/data/iggy-package-wkt-20211110214810_fl_pinellas_quadkeys, in which case base_loc="/Users/iggy/data" and iggy_version_id="20211110214810" and iggy_prefix="fl_pinellas_quadkeys", or

  • s3://iggy-bucket/data/iggy-package-wkt-20211110214810_fl_pinellas_quadkeys, in which case base_loc="s3://iggy-bucket/data" and iggy_version_id="20211110214810" and iggy_prefix="fl_pinellas_quadkeys", or

Within that parent directory will be one or more crosswalk files with a name like:

/<crosswalk_prefix>_<iggy_version_id>/000000000000.snappy.parquet

You can specify iggy_version_id, crosswalk_prefix, base_loc, and iggy_prefix based on these naming conventions.

Now, once your package is set up, you can bundle it with IggyEnrich and load the data:

from iggyenrich.iggy_enrich import IggyEnrich

iggy = IggyEnrich(iggy_package=pkg)
iggy.load()

Choose specific boundaries+features

What if I don't want to enrich my data with all of Iggy's features, but rather want to select a few specific boundaries or features? (See our Data README and Data Dictionary) for what's available.)

You can narrow things down by specifing what you want when calling load():

selected_features = [
    "area_sqkm_qk_isochrone_walk_10m",
    "population_qk_isochrone_walk_10m",
    "poi_count_per_capita_qk_isochrone_walk_10m",
    "poi_count_qk_isochrone_walk_10m",
]
selected_boundaries = ["cbg"]

iggy.load(boundaries=selected_boundaries, features=selected_features)

This will load all features corresponding to the cbg boundary, plus the four selected features corresponding to the isochrone_walk_10m boundary.

Enrich a DataFrame with columns for lat/lng

Let's assume you have a pandas DataFrame containing columns with latitude and longitude. You can enrich it with your chosen Iggy features:

import pandas as pd

df = pd.read_csv('sample_data/pinellas_711s.csv')
enriched_df = iggy.enrich_df(df, latitude_col="latitude", longitude_col="longitude")

Enrich a GeoDataFrame

If you prefer working in GeoPandas, the enrich_df function can take a GeoDataFrame too:

import geopandas as gpd
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude), crs="WGS84")
enriched_gdf = iggy.enrich_df(gdf)

More resources

You can find our Data README here and our Data Dictionary here.

Contact us

For questions or issues with using this code, please add a New Issue and we'll respond as quickly as possible.

To get access to Iggy sample data please contact us here!

If you cannot find an answer to a question in here or at any of those links, please do not hesitate to reach out to Iggy Support (support@askiggy.com).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iggyenrich-0.0.0.tar.gz (9.7 kB view hashes)

Uploaded Source

Built Distribution

iggyenrich-0.0.0-py3-none-any.whl (9.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page