GeoLibs Dator - A data extractor

Project description

GeoLibs-Dator

Dator is a data extractor (ETL as a library) that uses Pandas DataFrames as in-memory temporary storage.

Features

Source      Extract  Transform  Load
BigQuery    Y        Y
CARTO       Y        Y          Y*
CSV         Y                   Y
Pandas               Y
PostgreSQL  Y        Y          Y

* Note: We are waiting for the append feature in CARTOframes; the one we are using now is a quick hack.

Configuration

Create a config.yml file using config.example.yml as a guide. That file covers all the possible ETL cases.

If you are using BigQuery in your ETL process, you need to set a GOOGLE_APPLICATION_CREDENTIALS environment variable with the path to your Google Cloud credentials.json file.
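
If you prefer to set that variable from Python rather than in the shell, a minimal sketch (the path is a placeholder):

import os

# Must be set before the BigQuery client is created; replace the path with your own key file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/credentials.json'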

You can test these configurations with the example.py file.

Example

dator_config.yml

datastorages:
  bigquery_input:
    type: bigquery
    data:
      query: SELECT * FROM `dataset.table` WHERE updated_at >= '2019-05-04T00:00:00Z' AND updated_at < '2019-06-01T00:00:00Z';

  carto_input:
    type: carto
    credentials:
      url: https://domain.com/user/user/
      api_key: api_key
    data:
      table: table

  postgresql_input:
    credentials:
      ...
    data:
      query: SELECT * FROM somewhere;
      types:
        - name: timeinstant
          type: datetime
        - name: fillinglevel
          type: float
        - name: temperature
          type: int
        - name: category
          type: str

  carto_output:
    type: carto
    credentials:
      url: https://domain.com/user/user/
      api_key: api_key
    data:
      table: table
      append: false

transformations:
  bigquery_agg:
    type: bigquery
    time:
      field: updated_at
      start: "2019-05-02T00:00:00Z"  # As strings, or YAML will parse them as datetimes
      finish: "2019-05-03T00:00:00Z"
      step: 5 MINUTE
    aggregate:
      by:
        - container_id
        - updated_at
      fields:
        field_0: avg
        field_1: max

extract: bigquery_input
transform: bigquery_agg
load: carto_output

How to use

This package is designed to accomplish ETL operations in three steps:

Extract

The extract method is a default method: although it can be overridden, by default it works from the configuration file.

(This section under construction)
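
The override mechanism is not documented here, but as a rough sketch (assuming Dator can be subclassed and that extract returns a Pandas DataFrame; both are assumptions, not documented API), a custom extractor could look like this:

import pandas as pd

from dator import Dator


class CustomDator(Dator):  # hypothetical subclass, shown only as an illustration
    def extract(self):
        # Bypass the config-driven extractor and return any DataFrame you like.
        return pd.DataFrame({
            'container_id': [1, 2],
            'updated_at': pd.to_datetime(['2019-05-02T00:05:00Z', '2019-05-02T00:10:00Z']),
            'fillinglevel': [0.4, 0.9],
        })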

Transform

(This section under construction)
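
In the meantime, here is a purely illustrative pandas sketch of what the bigquery_agg example above expresses: group rows by container_id and 5-minute buckets of updated_at inside the configured time window, averaging field_0 and taking the maximum of field_1. This is one plausible reading of the config; the library presumably runs this step as SQL in BigQuery rather than in pandas.

import pandas as pd

def aggregate_like_bigquery_agg(df):
    # Keep only rows inside the configured time window.
    df = df.copy()
    df['updated_at'] = pd.to_datetime(df['updated_at'])
    mask = (df['updated_at'] >= '2019-05-02T00:00:00Z') & (df['updated_at'] < '2019-05-03T00:00:00Z')
    df = df.loc[mask].copy()
    # Bucket timestamps into 5-minute steps, then aggregate per container and bucket.
    df['updated_at'] = df['updated_at'].dt.floor('5min')
    return (df.groupby(['container_id', 'updated_at'], as_index=False)
              .agg(field_0=('field_0', 'mean'), field_1=('field_1', 'max')))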

Load

The load method is a default method: although it can be overridden, by default it works from the configuration file. It can receive two parameters: the Pandas DataFrame and a dictionary with extra info.

Example

app.py

from dator import Dator

dator = Dator('/usr/src/app/dator_config.yml')
df = dator.extract()
df = dator.transform(df)
dator.load(df)

app.py with extra info

from dator import Dator

def upsert_method():
    # Custom load logic would go here; it is passed to Dator through the extra-info dictionary.
    pass

dator = Dator('/usr/src/app/dator_config.yml')
df = dator.extract()
df = dator.transform(df)
dator.load(df, {'method': upsert_method})

TODOs

  • Better doc.
  • Tests.

