Skip to main content

Intake driver for Civis platform

Project description

intake-civis

This is an intake data source for data warehoused in the Civis platform.

Requirements

civis-python
intake

Installation

intake-civis is published on PyPI. You can install it by running the following in your terminal:

pip install intake-civis

Usage

You can specify Civis schemas and tables using a YAML intake catalog:

sources:
  # An entry representing a catalog for an entire schema.
  postgres:
    driver: "civis_schema"
    args:
      database: "City of Los Angeles - Postgres"
      schema: "transporatation"
  # An entry representing a single table
  bike_trips:
    driver: "civis"
    args:
      database: "City of Los Angeles - Postgres"
      table: "bike_trips"
      schema: "transportation"

As a convenience, there is also a top-level function which creates a catalog from the entire Redshift or PostgreSQL databases.

You can create it with

import intake_civis

redshift_cat = intake_civis.open_redshift_catalog()
postgres_cat = intake_civis.open_postgres_catalog()

You can then use these catalogs to drill down to different schemas and tables, e.g.:

bike_trips = postgres_cat.transportation.bike_trips.read()

For more examples, see this demo notebook.

Geospatial support

Both Redshift and Postgres support geospatial values. We can tell the source to read in a table/query as a GeoDataFrame by passing in a string or list of strings in the geometry argument. You can also pass in a GeoPandas-compatible crs argument to set the coordinate reference system for the GeoDataFrame. When more than one column is provided, the primary geometry column for the GeoDataFrame is assumed to be the first in the list.

The CivisSchema object attempts to automatically determine the geometry columns and coordinate reference systems from the database table metadata.

Ibis support

Sometimes a table might be too large to load the entire thing into memory. In that case, it is useful to query a subset of the table. Ibis is a tool that has a pandas-like API for generating SQL queries. Civis table catalog entries have a to_ibis() function which provides a lazy ibis expression. This can then be used to query a smaller amount of data:

# Get a lazy ibis object
bike_trips = postgres_cat.transportation.bike_trips.to_ibis()

# Subset it
bike_trips_subset = bike_trips[bike_trips.start_datetime > "2019-04-01"]

# Execute the query to get an in-memory dataframe:
df = bike_trips_subset.execute()

Important limitation: Due to network restrictions on the Civis databases, you can only use this feature while in the platform. It will be unable to establish a connection from your local machine.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for intake-civis, version 0.2.0
Filename, size File type Python version Upload date Hashes
Filename, size intake_civis-0.2.0-py3-none-any.whl (13.4 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size intake-civis-0.2.0.tar.gz (12.2 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page