ETL tools for spatial data stored in postgres
Project description
pg-data-etl
ETL tools for spatial data stored in postgres.
About
This module exists to make life easier when working with geospatial data in a Postgres database.
You should have the following command-line tools installed, preferably on your system path:
psqlpg_dumpshp2postgisogr2ogr
If you want to use the optional vector tile functions you'll also need:
tippecanoe
Installation
pip install pg_data_etl
Example
The following code blocks import spatial data into Postgres and runs a spatial query:
1) Connect to the database and create it
>>> from pg_data_etl import Database
>>> credentials = {
... "db_name": "sample_database",
... "host": "localhost",
... "un": "username",
... "pw": "my-password",
... "super_un": "postgres",
... "super_pw": "superuser-password"
... }
>>> db = Database.from_parameters(**credentials)
>>> db.admin("CREATE")
2) Import GIS data from the web
>>> data_to_import = [
... ("philly.high_injury_network", "https://phl.carto.com/api/v2/sql?filename=high_injury_network_2020&format=geojson&skipfields=cartodb_id&q=SELECT+*+FROM+high_injury_network_2020"),
... ("philly.playgrounds", "https://opendata.arcgis.com/datasets/899c807e205244278b3f39421be8489c_0.geojson")
... ]
>>> for sql_tablename, source_url in data_to_import:
... kwargs = {
... "filepath": source_url,
... "sql_tablename": sql_tablename,
... "gpd_kwargs": {"if_exists":"replace"}
... }
... db.import_gis(**kwargs)
3) Run a query and get the result as a geopandas.GeoDataFrame
>>> # Define a SQL query as a string in Python
>>> query = """
... select * from philly.high_injury_network
... where st_dwithin(
... st_transform(geom, 26918),
... (select st_transform(st_collect(geom), 26918) from philly.playgrounds),
... 100
... )
... order by st_length(geom) DESC """
>>> # Get a geodataframe from the db using the query
>>> gdf = db.gdf(query)
>>> gdf.head()
index objectid street_name buffer geom uid
0 234 189 BUSTLETON AVE 75 feet LINESTRING (-75.07081 40.03528, -75.07052 40.0... 236
1 65 38 5TH ST 50 feet LINESTRING (-75.14528 39.96913, -75.14502 39.9... 66
2 223 179 ARAMINGO AVE 75 feet LINESTRING (-75.12212 39.97449, -75.12132 39.9... 224
3 148 215 KELLY DR 75 feet LINESTRING (-75.18470 39.96934, -75.18513 39.9... 150
4 156 224 MARTIN LUTHER KING DR 75 feet LINESTRING (-75.17713 39.96327, -75.17775 39.9... 159
To save time and typing, database credentials can be stored in a text file. You can place this file wherever you want,
but by default it's placed into /USERHOME/.pg-data-etl/database_connections.cfg.
To generate one for the first time, run the following from a terminal prompt:
> pg make-config-file
This file uses the following format:
[DEFAULT]
pw = this-is-a-placeholder-password
port = 5432
super_db = postgres
super_un = postgres
super_pw = this-is-another-placeholder-password
[localhost]
host = localhost
un = postgres
pw = your-password-here
Each entry in square brackets is a named connection, and any parameters not explicitly defined are inherited from DEFAULT.
You can have as many connections defined as you'd like, and you can use them like this:
>>> from pg_data_etl import Database
>>> db = Database.from_config("sample_database", "localhost")
Development
Clone or fork this repo:
git clone https://github.com/aaronfraint/pg-data-etl.git
cd pg-data-etl
Install an editable version with poetry:
poetry install
Windows users who prefer to use conda can use the included environment.yml file:
conda env create -f environment.yml
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pg-data-etl-0.3.9.tar.gz.
File metadata
- Download URL: pg-data-etl-0.3.9.tar.gz
- Upload date:
- Size: 32.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.5 CPython/3.8.5 Darwin/21.4.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3cf1e4f00f2f6d6833f181256e67f46b001a42ad6d3eefaf8e99fa4376e7efc0
|
|
| MD5 |
ec04a9728ad878deb5cc008469617e80
|
|
| BLAKE2b-256 |
594dff06fc9c08965638bb54bb9d4401dc7517a460f9b28c25018428d7981245
|
File details
Details for the file pg_data_etl-0.3.9-py3-none-any.whl.
File metadata
- Download URL: pg_data_etl-0.3.9-py3-none-any.whl
- Upload date:
- Size: 39.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.5 CPython/3.8.5 Darwin/21.4.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1b462605d1e3c6c57a38e4b86f5905f551902522a12bea68396d56ac15cadbd0
|
|
| MD5 |
ddd3cebe2db5dbbbfac824fb3233f391
|
|
| BLAKE2b-256 |
b7306f5461857b7dad4d6cd3ee8d16900be8c460ef24c3160eca21bc90dc667b
|