GeoLibs Dator - A data extractor
GeoLibs-Dator
Dator is a data extractor (ETL as a library) that uses Pandas DataFrames as temporary in-memory storage.
Features
| Source | Extract | Transform | Load |
|---|---|---|---|
| BigQuery | Y | Y | |
| CARTO | Y | Y | Y* |
| CSV | Y | Y | |
| Pandas | Y | | |
| PostgreSQL | Y | Y | Y |
* Note: We are waiting for CARTOframes to support appending, because the approach we currently use is a hack ("ñapa").
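The `append` option (see `append: false` in the example configuration) presumably chooses between replacing the target table and adding rows to it. The same distinction in plain pandas, using `DataFrame.to_sql` against an in-memory SQLite database, is sketched below. This is an analogy only, not Dator's CARTO code; `write` and `count_rows` are hypothetical helpers.

```python
import sqlite3

import pandas as pd


def write(df: pd.DataFrame, con: sqlite3.Connection, table: str, append: bool) -> None:
    """Replace the table when append is False; add rows to it when append is True."""
    df.to_sql(table, con, if_exists="append" if append else "replace", index=False)


def count_rows(con: sqlite3.Connection, table: str) -> int:
    # Table name is interpolated directly: only use trusted identifiers.
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


con = sqlite3.connect(":memory:")
batch = pd.DataFrame({"container_id": [1, 2], "fillinglevel": [0.5, 0.7]})

write(batch, con, "readings", append=False)  # creates the table with 2 rows
write(batch, con, "readings", append=True)
after_append = count_rows(con, "readings")   # 4
write(batch, con, "readings", append=False)
after_replace = count_rows(con, "readings")  # 2
```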
Configuration
Create a config.yml file using config.example.yml as a guide; that file covers all the supported ETL cases.
If you are using BigQuery in your ETL process, you need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your Google Cloud credentials.json file.
You can test your configuration with the example.py file.
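Before running a BigQuery extraction, it can help to confirm that the variable is actually set (for example with `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json` in the shell) and points to a real file. A minimal standard-library sketch of such a check; `google_credentials_path` is a hypothetical helper, not part of Dator:

```python
import os
from pathlib import Path


def google_credentials_path() -> Path:
    """Return the file named by GOOGLE_APPLICATION_CREDENTIALS, or raise if unusable."""
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```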
Example
dator_config.yml
datastorages:
  bigquery_input:
    type: bigquery
    data:
      query: SELECT * FROM `dataset.table` WHERE updated_at >= '2019-05-04T00:00:00Z' AND updated_at < '2019-06-01T00:00:00Z';
  carto_input:
    type: carto
    credentials:
      url: https://domain.com/user/user/
      api_key: api_key
    data:
      table: table
  postgresql_input:
    credentials:
      ...
    data:
      query: SELECT * FROM somewhere;
      types:
        - name: timeinstant
          type: datetime
        - name: fillinglevel
          type: float
        - name: temperature
          type: int
        - name: category
          type: str
  carto_output:
    type: carto
    credentials:
      url: https://domain.com/user/user/
      api_key: api_key
    data:
      table: table
      append: false

transformations:
  bigquery_agg:
    type: bigquery
    time:
      field: updated_at
      start: "2019-05-02T00:00:00Z"  # Quoted, otherwise YAML parses them as datetimes
      finish: "2019-05-03T00:00:00Z"
      step: 5 MINUTE
    aggregate:
      by:
        - container_id
        - updated_at
      fields:
        field_0: avg
        field_1: max

extract: bigquery_input
transform: bigquery_agg
load: carto_output
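The top-level `extract`, `transform` and `load` keys refer to entries by name, so each value must match a key under `datastorages` or `transformations`. A small pre-flight check could look like the sketch below. `check_config` is a hypothetical helper (not part of Dator); it works on the plain dict that parsing the YAML, for example with PyYAML's `yaml.safe_load`, would produce.

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return problems with the extract/transform/load references (empty list if none)."""
    problems: List[str] = []
    storages = config.get("datastorages", {})
    transformations = config.get("transformations", {})
    # extract and load must name a datastorage.
    for step in ("extract", "load"):
        name = config.get(step)
        if name is not None and name not in storages:
            problems.append(f"{step}: unknown datastorage {name!r}")
    # transform must name a transformation.
    name = config.get("transform")
    if name is not None and name not in transformations:
        problems.append(f"transform: unknown transformation {name!r}")
    return problems


# Trimmed-down version of the example configuration above.
example = {
    "datastorages": {"bigquery_input": {"type": "bigquery"}, "carto_output": {"type": "carto"}},
    "transformations": {"bigquery_agg": {"type": "bigquery"}},
    "extract": "bigquery_input",
    "transform": "bigquery_agg",
    "load": "carto_output",
}
```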
How to use
This package is designed to accomplish ETL operations in three steps:
Extract
The extract method has a default implementation: it can be overridden, but by default it must work from the configuration alone.
(This section is under construction.)
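One way to picture a config-driven default that can still be overridden is a dispatch on the datastorage `type`, plus a subclass that reuses the default. This is a hypothetical sketch, not Dator's internals: `ConfigDrivenETL`, `CustomETL`, `READERS`, `make_config` and the `data.path` key for CSV sources are all names invented for illustration.

```python
import io
from typing import Any, Callable, Dict

import pandas as pd

# Hypothetical registry mapping a datastorage "type" to a reader function.
READERS: Dict[str, Callable[[Dict[str, Any]], pd.DataFrame]] = {
    "csv": lambda storage: pd.read_csv(storage["data"]["path"]),
}


class ConfigDrivenETL:
    """Sketch of an ETL class whose default extract() is driven only by config."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def extract(self) -> pd.DataFrame:
        # The top-level "extract" key names an entry under "datastorages".
        storage = self.config["datastorages"][self.config["extract"]]
        # Dispatch on that entry's "type".
        return READERS[storage["type"]](storage)


class CustomETL(ConfigDrivenETL):
    """Overrides the default extract() while still reusing it."""

    def extract(self) -> pd.DataFrame:
        df = super().extract()
        return df.dropna()  # Example post-processing: drop incomplete rows.


def make_config() -> Dict[str, Any]:
    # io.StringIO stands in for a CSV file path so the sketch runs anywhere.
    csv_text = "a,b\n1,2\n3,\n"
    return {
        "datastorages": {
            "csv_input": {"type": "csv", "data": {"path": io.StringIO(csv_text)}},
        },
        "extract": "csv_input",
    }
```

Each call to `make_config()` builds a fresh in-memory "file", since a `StringIO` can only be read once.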
Transform
(This section is under construction.)
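Until this section is written, here is a hedged illustration of what the `bigquery_agg` transformation in the example configuration appears to describe. Assuming a half-open `[start, finish)` window, `updated_at` is bucketed into 5-minute steps, then rows are grouped by `container_id` and bucket with `avg`/`max`. The pandas equivalent below is one plausible reading of the config, not Dator's implementation (which runs on BigQuery). `time_bucket_aggregate` is a hypothetical name, and the BigQuery interval `5 MINUTE` becomes the pandas frequency string `"5min"`.

```python
import pandas as pd


def time_bucket_aggregate(df, field, start, finish, step, by, fields):
    """Keep rows with start <= field < finish, floor field to `step`, then aggregate."""
    start, finish = pd.Timestamp(start), pd.Timestamp(finish)
    window = df[(df[field] >= start) & (df[field] < finish)].copy()
    window[field] = window[field].dt.floor(step)
    # pandas calls the average "mean"; max, min and sum use the same names.
    how = {col: ("mean" if func == "avg" else func) for col, func in fields.items()}
    return window.groupby(by, as_index=False).agg(how)


readings = pd.DataFrame({
    "container_id": [1, 1, 1, 2],
    "updated_at": pd.to_datetime([
        "2019-05-02T00:01:00Z", "2019-05-02T00:03:00Z",
        "2019-05-02T00:07:00Z", "2019-05-02T00:02:00Z",
    ]),
    "field_0": [1.0, 3.0, 5.0, 10.0],
    "field_1": [4, 2, 6, 1],
})
result = time_bucket_aggregate(
    readings,
    field="updated_at",
    start="2019-05-02T00:00:00Z",
    finish="2019-05-03T00:00:00Z",
    step="5min",
    by=["container_id", "updated_at"],
    fields={"field_0": "avg", "field_1": "max"},
)
```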
Load
The load method also has a default implementation: it can be overridden, but by default it must work from the configuration alone. It takes two parameters: the Pandas DataFrame and an optional dictionary with extra information.
Example
app.py
from dator import Dator
dator = Dator('/usr/src/app/dator_config.yml')
df = dator.extract()
df = dator.transform(df)
dator.load(df)
app.py with extra info
from dator import Dator
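def upsert_method(*args, **kwargs):
    # Placeholder for a custom load method (its expected signature is not documented here).
    pass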
dator = Dator('/usr/src/app/dator_config.yml')
df = dator.extract()
df = dator.transform(df)
dator.load(df, {'method': upsert_method})
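The README does not say what signature `upsert_method` needs. Assuming, purely for illustration, that the `method` entry were forwarded to the `method` argument of pandas' `DataFrame.to_sql` (a real pandas parameter that accepts a callable `(pd_table, conn, keys, data_iter)`), an upsert could look like the sketch below. `load` and `sqlite_upsert` are hypothetical names, and SQLite's `INSERT OR REPLACE` stands in for whatever upsert statement the real target database uses.

```python
import sqlite3

import pandas as pd


def sqlite_upsert(pd_table, conn, keys, data_iter):
    """Custom to_sql method: INSERT OR REPLACE each row (relies on a primary key)."""
    columns = ", ".join(keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = f"INSERT OR REPLACE INTO {pd_table.name} ({columns}) VALUES ({placeholders})"
    rows = list(data_iter)
    conn.executemany(sql, rows)
    return len(rows)


def load(df, con, table, extra=None):
    """Hypothetical load step: an optional extra-info dict may carry a 'method'."""
    extra = extra or {}
    df.to_sql(table, con, if_exists="append", index=False, method=extra.get("method"))


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (container_id INTEGER PRIMARY KEY, fillinglevel REAL)")
load(pd.DataFrame({"container_id": [1, 2], "fillinglevel": [0.5, 0.7]}),
     con, "readings", {"method": sqlite_upsert})
# Row 2 is replaced, row 3 is inserted.
load(pd.DataFrame({"container_id": [2, 3], "fillinglevel": [0.9, 0.1]}),
     con, "readings", {"method": sqlite_upsert})
rows = con.execute(
    "SELECT container_id, fillinglevel FROM readings ORDER BY container_id"
).fetchall()
```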
TODOs
- Better doc.
- Tests.
Download files
File details
Details for the file geolibs-dator-0.0.7.tar.gz.
File metadata
- Download URL: geolibs-dator-0.0.7.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.14 CPython/3.6.8 Linux/4.14.127+
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bedbea6607d6dfcff5a88a6604abf3713a27b7bab51e34ff5cad307848ec1f5 |
| MD5 | 89518dbd0f889cf13827b8f261756e23 |
| BLAKE2b-256 | d4bdb69d924a4fefa9b47bbab7ed1448094c834024e478297fab1c53456e80ca |
File details
Details for the file geolibs_dator-0.0.7-py3-none-any.whl.
File metadata
- Download URL: geolibs_dator-0.0.7-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.14 CPython/3.6.8 Linux/4.14.127+
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0311502b41a5526d92243fa68dac7ef4e0577c7e7fb9d6edb1ea04ea483fbb4 |
| MD5 | cc0a9232355e6c8e854620e97cca0404 |
| BLAKE2b-256 | 70ac4676adc34ca9b396143bfb3c89b85696fa6f2485599dae432ee299dcd837 |