Intake driver for Civis platform
Project description
intake-civis
This is an intake data source for data warehoused in the Civis platform.
Requirements
civis-python
intake
Installation
intake-civis is published on PyPI.
You can install it by running the following in your terminal:
pip install intake-civis
Usage
You can specify Civis schemas and tables using a YAML intake catalog:
sources:
  # An entry representing a catalog for an entire schema.
  postgres:
    driver: "civis_schema"
    args:
      database: "City of Los Angeles - Postgres"
      schema: "transportation"
  # An entry representing a single table.
  bike_trips:
    driver: "civis"
    args:
      database: "City of Los Angeles - Postgres"
      table: "bike_trips"
      schema: "transportation"
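A catalog like the one above can be saved to a file and opened with intake. The following is a minimal sketch rather than a tested recipe: the file name catalog.yml and the helper names write_catalog and read_bike_trips are made up for illustration, and reading the table assumes that intake and intake-civis are installed and that Civis API credentials are configured in your environment.

```python
import tempfile
from pathlib import Path

# A single-table catalog, mirroring the bike_trips entry shown above.
CATALOG_YAML = """\
sources:
  bike_trips:
    driver: "civis"
    args:
      database: "City of Los Angeles - Postgres"
      table: "bike_trips"
      schema: "transportation"
"""


def write_catalog(directory):
    """Write the catalog text to <directory>/catalog.yml and return the path."""
    path = Path(directory) / "catalog.yml"
    path.write_text(CATALOG_YAML)
    return path


def read_bike_trips(catalog_path):
    """Open the catalog with intake and read the bike_trips entry.

    Assumption: intake and intake-civis are installed and Civis
    credentials are available; otherwise this call will fail.
    """
    import intake  # imported lazily so the rest of the sketch runs without it

    catalog = intake.open_catalog(str(catalog_path))
    return catalog.bike_trips.read()


# Writing the file needs no Civis access; reading the table does.
with tempfile.TemporaryDirectory() as tmp:
    print(write_catalog(tmp).read_text())
```

Calling read_bike_trips(path) on a machine with Civis access should return the table as an in-memory dataframe, the same way the catalog examples in this README do.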
As a convenience, there are also top-level functions that create a catalog from an entire Redshift or PostgreSQL database.
You can create these catalogs with:
import intake_civis
redshift_cat = intake_civis.open_redshift_catalog()
postgres_cat = intake_civis.open_postgres_catalog()
You can then use these catalogs to drill down to different schemas and tables, e.g.:
bike_trips = postgres_cat.transportation.bike_trips.read()
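When the schema and table names are only known at runtime, the same attribute-style drill-down can be done programmatically. This is a small sketch under assumptions: drill_down is a hypothetical helper, not part of intake-civis, and the stand-in object below only mimics attribute access so that the example runs without a Civis connection.

```python
from functools import reduce
from types import SimpleNamespace


def drill_down(catalog, dotted_path):
    """Follow a dotted path such as "transportation.bike_trips" via attribute access."""
    return reduce(getattr, dotted_path.split("."), catalog)


# Stand-in for a real catalog (assumption: real catalogs expose entries as attributes,
# as in postgres_cat.transportation.bike_trips above).
fake_cat = SimpleNamespace(
    transportation=SimpleNamespace(bike_trips="bike_trips entry")
)

entry = drill_down(fake_cat, "transportation.bike_trips")
print(entry)  # bike_trips entry
```

With a real catalog, drill_down(postgres_cat, "transportation.bike_trips").read() would be equivalent to the attribute chain shown above.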
For more examples, see this demo notebook.
Geospatial support
Both Redshift and Postgres support geospatial values.
We can tell the source to read in a table/query as a GeoDataFrame by passing in a string or list of strings in the geometry argument.
You can also pass in a GeoPandas-compatible crs argument to set the coordinate reference system for the GeoDataFrame.
When more than one column is provided, the primary geometry column for the GeoDataFrame is assumed to be the first in the list.
The CivisSchema object attempts to automatically determine the geometry columns and coordinate reference systems from the database table metadata.
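The rule for the geometry argument (a string or a list of strings, with the first entry treated as the primary column) can be illustrated with a short sketch. This is not the library's implementation: normalize_geometry is a hypothetical helper, and the column names geom, start_point, and end_point are invented for the example.

```python
from typing import List, Optional, Sequence, Tuple, Union


def normalize_geometry(
    geometry: Union[str, Sequence[str], None],
) -> Tuple[List[str], Optional[str]]:
    """Return (all geometry columns, primary geometry column).

    A single string is treated as a one-element list; the primary
    column is the first one listed, or None if no columns are given.
    """
    if geometry is None:
        return [], None
    columns = [geometry] if isinstance(geometry, str) else list(geometry)
    if not columns:
        return [], None
    return columns, columns[0]


print(normalize_geometry("geom"))
print(normalize_geometry(["start_point", "end_point"]))
```

In the second call, start_point would be the primary geometry column because it is listed first.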
Ibis support
Sometimes a table is too large to load into memory in its entirety.
In that case, it is useful to query only a subset of the table.
Ibis is a tool that has a pandas-like API for generating SQL queries.
Civis table catalog entries have a to_ibis() function which provides a lazy ibis expression.
This can then be used to query a smaller amount of data:
# Get a lazy ibis object
bike_trips = postgres_cat.transportation.bike_trips.to_ibis()
# Subset it
bike_trips_subset = bike_trips[bike_trips.start_datetime > "2019-04-01"]
# Execute the query to get an in-memory dataframe:
df = bike_trips_subset.execute()
Important limitation: Due to network restrictions on the Civis databases, this feature only works from within the Civis platform. It cannot establish a connection from your local machine.