GeoLibs Dator - A data extractor
Dator is a data extractor (ETL as a library) that uses Pandas DataFrames as in-memory temporary storage.
Features
| Source | Extract | Transform | Load |
|---|---|---|---|
| BigQuery | Y | Y | |
| CARTO | Y | Y | Y* |
| CSV | Y | Y | |
| Pandas | Y | | |
| PostgreSQL | Y | Y | Y |
\* Note: We are waiting for the append feature in CARTOframes, because the workaround we are currently using is a quick-and-dirty hack.
Configuration

Create a config.yml file using config.example.yml as a guide. That example file covers all the possible ETL cases.
If you are using BigQuery in your ETL process, you need to add a GOOGLE_APPLICATION_CREDENTIALS environment variable with the path to your Google Cloud credentials.json file.
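For example, the variable can be exported in the shell before running your script; the path below is a placeholder, so point it at your own service-account key file:

```shell
# Placeholder path -- substitute the real location of your credentials.json.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/credentials.json"
```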
You can test your configuration with the example.py file.
Example
dator_config.yml
```yaml
datastorages:
  bigquery_input:
    type: bigquery
    data:
      query: SELECT * FROM `dataset.table` WHERE updated_at >= '2019-05-04T00:00:00Z' AND updated_at < '2019-06-01T00:00:00Z';
  carto_input:
    type: carto
    credentials:
      url: https://domain.com/user/user/
      api_key: api_key
    data:
      table: table
  postgresql_input:
    credentials: ...
    data:
      query: SELECT * FROM somewhere;
      types:
        - name: timeinstant
          type: datetime
        - name: fillinglevel
          type: float
        - name: temperature
          type: int
        - name: category
          type: str
  carto_output:
    type: carto
    credentials:
      url: https://domain.com/user/user/
      api_key: api_key
    data:
      table: table
      append: false

transformations:
  bigquery_agg:
    type: bigquery
    time:
      field: updated_at
      start: "2019-05-02T00:00:00Z"  # As strings, or YAML will parse them as DateTimes
      finish: "2019-05-03T00:00:00Z"
      step: 5 MINUTE
    aggregate:
      by:
        - container_id
        - updated_at
      fields:
        field_0: avg
        field_1: max

extract: bigquery_input
transform: bigquery_agg
load: carto_output
```
How to use
This package is designed to accomplish ETL operations in three steps:
Extract
The extract method is a default method: although it can be overridden, by default it works from the configuration file alone.
(This section under construction)
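Until this section is written, the config-driven extract can be illustrated with a minimal dator_config.yml fragment. This reuses the postgresql_input block from the example above; the `type: postgresql` key is an assumption mirroring the bigquery/carto entries, and the `credentials: ...` ellipsis is kept from the original example:

```yaml
datastorages:
  postgresql_input:
    type: postgresql        # assumed; mirrors the other datastorages entries
    credentials: ...
    data:
      query: SELECT * FROM somewhere;

extract: postgresql_input
```

With such a config, `dator.extract()` returns the query result as a Pandas DataFrame.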
Transform
(This section under construction)
Load
The load method is a default method: although it can be overridden, by default it works from the configuration file alone. It can receive two parameters: the Pandas DataFrame and a dictionary with extra info.
Example
app.py
```python
from dator import Dator

dator = Dator('/usr/src/app/dator_config.yml')

df = dator.extract()
df = dator.transform(df)
dator.load(df)
```
app.py with extra info
```python
from dator import Dator

def upsert_method():
    pass

dator = Dator('/usr/src/app/dator_config.yml')

df = dator.extract()
df = dator.transform(df)
dator.load(df, {'method': upsert_method})
```
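The exact call signature Dator uses for the method passed in the extra-info dictionary is not documented here, so the one-argument form below is an assumption. Purely as an illustration, a custom upsert-style method might deduplicate rows on a key column before writing (the `id` and `updated_at` column names are hypothetical):

```python
import pandas as pd

def upsert_method(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only the newest row per primary key: sort by the timestamp
    # column, then drop earlier duplicates of each 'id'.
    return (df.sort_values("updated_at")
              .drop_duplicates(subset="id", keep="last"))

df = pd.DataFrame({
    "id": [1, 1, 2],
    "updated_at": ["2019-05-01", "2019-05-02", "2019-05-01"],
    "value": [10, 11, 20],
})
print(upsert_method(df))  # two rows: the newest for id 1, the only one for id 2
```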
TODOs
- Better documentation.
- Tests.