
pg-data-etl

ETL tools for spatial data stored in Postgres.

About

This module exists to make life easier when working with geospatial data in a Postgres database.

You should have the following command-line tools installed, preferably on your system path:

  • psql
  • pg_dump
  • shp2pgsql
  • ogr2ogr

If you want to use the optional vector tile functions you'll also need:

  • tippecanoe
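
Before using the library, it can help to confirm these tools are actually discoverable. The following is a small sketch (not part of pg-data-etl itself) that checks the PATH with the standard library:

```python
import shutil

# CLI tools that pg-data-etl shells out to (tippecanoe is only needed
# for the optional vector tile functions).
REQUIRED = ["psql", "pg_dump", "shp2pgsql", "ogr2ogr"]

missing = [tool for tool in REQUIRED if shutil.which(tool) is None]
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```

`shutil.which` returns `None` for anything not on the system path, so the script prints exactly which prerequisites still need to be installed.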

Installation

pip install pg_data_etl

Example

The following code blocks import spatial data into Postgres and then run a spatial query:

1) Define the database connection and create the database

>>> from pg_data_etl import Database
>>> credentials = {
...     "db_name": "sample_database",
...     "host": "localhost",
...     "un": "username",
...     "pw": "my-password",
...     "super_un": "postgres",
...     "super_pw": "superuser-password"
... }
>>> db = Database.from_parameters(**credentials)
>>> db.admin("CREATE")

2) Import GIS data from the web

>>> data_to_import = [
...     ("philly.high_injury_network", "https://phl.carto.com/api/v2/sql?filename=high_injury_network_2020&format=geojson&skipfields=cartodb_id&q=SELECT+*+FROM+high_injury_network_2020"),
...     ("philly.playgrounds", "https://opendata.arcgis.com/datasets/899c807e205244278b3f39421be8489c_0.geojson")
... ]
>>> for sql_tablename, source_url in data_to_import:
...     kwargs = {
...         "filepath": source_url,
...         "sql_tablename": sql_tablename,
...         "gpd_kwargs": {"if_exists":"replace"}
...     }
...     db.import_gis(**kwargs)

3) Run a query and get the result as a geopandas.GeoDataFrame

>>> # Define a SQL query as a string in Python
>>> query = """
... select * from philly.high_injury_network
... where st_dwithin(
...     st_transform(geom, 26918),
...     (select st_transform(st_collect(geom), 26918) from philly.playgrounds),
...     100
... )
... order by st_length(geom) DESC """
>>> # Get a geodataframe from the db using the query
>>> gdf = db.gdf(query)
>>> gdf.head()
   index  objectid            street_name   buffer                                               geom  uid
0    234       189          BUSTLETON AVE  75 feet  LINESTRING (-75.07081 40.03528, -75.07052 40.0...  236
1     65        38                 5TH ST  50 feet  LINESTRING (-75.14528 39.96913, -75.14502 39.9...   66
2    223       179           ARAMINGO AVE  75 feet  LINESTRING (-75.12212 39.97449, -75.12132 39.9...  224
3    148       215               KELLY DR  75 feet  LINESTRING (-75.18470 39.96934, -75.18513 39.9...  150
4    156       224  MARTIN LUTHER KING DR  75 feet  LINESTRING (-75.17713 39.96327, -75.17775 39.9...  159

To save time and typing, database credentials can be stored in a text file. You can place this file wherever you like, but by default it lives at /USERHOME/.pg-data-etl/database_connections.cfg.
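
A quick way to locate that default file programmatically (interpreting "/USERHOME/" from the text above as the current user's home directory):

```python
from pathlib import Path

# Default config location described above; the directory and file names
# are taken from the documentation text.
config_path = Path.home() / ".pg-data-etl" / "database_connections.cfg"
print(config_path)
```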

To generate one for the first time, run the following from a terminal prompt:

> pg make-config-file

This file uses the following format:

[DEFAULT]
pw = this-is-a-placeholder-password
port = 5432
super_db = postgres
super_un = postgres
super_pw = this-is-another-placeholder-password

[localhost]
host = localhost
un = postgres
pw = your-password-here

Each entry in square brackets is a named connection, and any parameters not explicitly defined are inherited from DEFAULT. You can have as many connections defined as you'd like, and you can use them like this:

>>> from pg_data_etl import Database
>>> db = Database.from_config("sample_database", "localhost")
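The DEFAULT-inheritance described above matches the behavior of Python's standard `configparser` module, which is presumably how the library reads this file. A minimal, self-contained illustration of that inheritance:

```python
import configparser

# A trimmed copy of the config format shown above.
cfg = configparser.ConfigParser()
cfg.read_string("""
[DEFAULT]
pw = this-is-a-placeholder-password
port = 5432

[localhost]
host = localhost
un = postgres
pw = your-password-here
""")

# Values defined under [localhost] win; anything missing falls back to DEFAULT.
print(cfg["localhost"]["pw"])    # your-password-here  (overridden)
print(cfg["localhost"]["port"])  # 5432                (inherited)
```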

Development

Clone or fork this repo:

git clone https://github.com/aaronfraint/pg-data-etl.git
cd pg-data-etl

Install an editable version with poetry:

poetry install

Windows users who prefer to use conda can use the included environment.yml file:

conda env create -f environment.yml
