Simple Python ETL tool.

DOMETL (Python ETL Tool)

Dometl is a Python ETL package.

Process

  1. Init - Initializes the database
     dometl -t init
  2. Stage - Moves files into staging tables
     dometl -t stage
  3. Live - Runs transformations to move data from staging to live tables
     dometl -t live
  4. Test - Runs very simple tests on the data
     dometl -t test
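
The four steps can also be driven from a script. The sketch below only assembles the command lines; it uses the `-t`, `-ep`, `-tb` and `-cp` flags that appear in the commands further down this README. `build_command` is a hypothetical helper for illustration and is not part of dometl.

```python
from typing import List, Optional

# Step names taken from the process list above.
STEPS = ("init", "stage", "live", "test")


def build_command(
    step: str,
    config_path: str,
    table: Optional[str] = None,
    extract_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for one dometl step (hypothetical helper)."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    cmd = ["dometl", "-t", step]
    if extract_path is not None:
        cmd += ["-ep", extract_path]  # file or folder to stage
    if table is not None:
        cmd += ["-tb", table]  # target table
    cmd += ["-cp", config_path]  # configuration folder
    return cmd


if __name__ == "__main__":
    # Print the commands in pipeline order instead of running them.
    print(" ".join(build_command("init", "dometl_config")))
    print(" ".join(build_command("live", "dometl_config", table="game")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the pipeline before the next one starts.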

How to Install & Run the Package?

Run the initialization step

dometl -t init -cp dometl_config
# if you don't install the package
# python -c "from dometl import run_dometl; run_dometl()" -t init -cp dometl_config

Run the staging step

dometl -t stage -ep datasets\\game_data\\daily\\20221105_g.csv -tb st_game -cp dometl_config
# if you don't install the package
# python -c "from dometl import run_dometl; run_dometl()" -t stage -ep datasets\\game_data\\daily\\20221105_g.csv -tb st_game -cp dometl_config
# python -c "from dometl import run_dometl; run_dometl()" -t stage -ep datasets\\game_data\\seasons -tb st_game -cp dometl_config
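
The examples above pass `-ep` either a single CSV file or a folder. How dometl expands a folder is not documented here, so the following is only a sketch of one plausible interpretation: a file yields itself, and a folder yields its top-level `*.csv` files in sorted order. `list_extract_files` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def list_extract_files(extract_path: str) -> List[Path]:
    """Return the CSV files named by an extract path (hypothetical helper).

    Assumption: a directory is scanned one level deep for *.csv files only.
    """
    path = Path(extract_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(path.glob("*.csv"))
    raise FileNotFoundError(extract_path)
```

Sorting keeps the load order stable, which matters when file names encode dates such as `20221105_g.csv`.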

Run the live step

dometl -t live -tb game -cp dometl_config
# if you don't install the package
# python -c "from dometl import run_dometl; run_dometl()" -t live -tb game -cp dometl_config

Run the test step

dometl -t test -tb game -cp dometl_config
# if you don't install the package
# python -c "from dometl import run_dometl; run_dometl()" -t test -tb game -cp dometl_config

The test step runs SQL test queries. List the query files for each table under the tests key in config.yaml, as shown below

tests:
  table_name: ["some_test.sql", "other_test.sql"]

Each table can have a set of test queries. Write each query so that it returns 0 rows when the test passes; if it returns one or more rows, the test fails. Ideally, the returned rows should help identify the root cause of the failure.
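
To illustrate the "zero rows means pass" convention, here is a minimal sketch that uses an in-memory SQLite database in place of the real one. The table columns, the test names and the `run_tests` helper are all made up for the example, and the queries are inline strings rather than `.sql` files; this is not how dometl itself is implemented.

```python
import sqlite3
from typing import Dict, List, Tuple


def run_tests(conn: sqlite3.Connection, tests: Dict[str, str]) -> Dict[str, List[Tuple]]:
    """Run each test query and collect the rows returned by failing tests."""
    failures = {}
    for name, query in tests.items():
        rows = conn.execute(query).fetchall()
        if rows:  # any returned row means the test failed
            failures[name] = rows
    return failures


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (id INTEGER, home_score INTEGER)")
# Two deliberate problems: a negative score and a duplicated id.
conn.executemany("INSERT INTO game VALUES (?, ?)", [(1, 3), (2, -1), (2, 0)])

tests = {
    "no_negative_scores": "SELECT id, home_score FROM game WHERE home_score < 0",
    "no_null_ids": "SELECT id FROM game WHERE id IS NULL",
    "no_duplicate_ids": "SELECT id, COUNT(*) FROM game GROUP BY id HAVING COUNT(*) > 1",
}

failures = run_tests(conn, tests)
for name, rows in failures.items():
    print(f"FAILED {name}: {rows}")
```

Because each failing query returns the offending rows (the bad score, the duplicated id and its count), the output points directly at the records to investigate.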

Configuration Folder

\folder
    config.yaml     # structure defined below
    db_create.sql   # custom file which creates and initializes the db
    file1.sql       # custom SQL file
    file2.sql       # custom SQL file
    file3.sql       # custom SQL file
    file4.sql       # custom SQL file
    file5.sql       # custom SQL file

Structure for config.yaml

credentials_path: "path/to/creds.yaml"

init_order: [
  "db_create.sql",
  "file1.sql",
  "file2.sql",
]

etl:
  table_name_1: "file3.sql"  
  table_name_2: "file4.sql"  
  table_name_3: "file5.sql"  
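
A quick sanity check for this structure is to confirm that every SQL file the config mentions exists in the configuration folder. The sketch below works on an already-parsed config (a plain dict, as a YAML loader would return) and covers the `init_order`, `etl` and `tests` keys described in this README. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


def referenced_sql_files(config: Dict[str, Any]) -> List[str]:
    """Collect every SQL file name mentioned in the parsed config."""
    files = list(config.get("init_order") or [])
    files += list((config.get("etl") or {}).values())
    for test_files in (config.get("tests") or {}).values():
        files += list(test_files)
    return files


def missing_sql_files(config: Dict[str, Any], folder: str) -> List[str]:
    """Return referenced files that are not present in the config folder."""
    base = Path(folder)
    return [name for name in referenced_sql_files(config) if not (base / name).is_file()]


# Example config mirroring the structure shown above.
config = {
    "credentials_path": "path/to/creds.yaml",
    "init_order": ["db_create.sql", "file1.sql", "file2.sql"],
    "etl": {"table_name_1": "file3.sql"},
    "tests": {"table_name_1": ["some_test.sql"]},
}
```

Calling `missing_sql_files(config, "dometl_config")` before the init step surfaces typos in file names early, instead of partway through a run.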

Structure for the creds.yaml

db_credentials:
  username: ""
  password: ""
  hostname: ""
  port: ""
  db_name: ""
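
As an illustration of how these fields fit together, the sketch below turns a parsed creds.yaml (a plain dict) into a connection URL. It assumes a PostgreSQL database, which the psql commands in the Bonus section suggest but this README does not state outright. The `build_db_url` helper and the sample values are made up.

```python
from urllib.parse import quote


def build_db_url(creds: dict) -> str:
    """Build a PostgreSQL connection URL from parsed credentials (hypothetical helper)."""
    db = creds["db_credentials"]
    # Percent-encode user and password so characters like '@' or '/' stay unambiguous.
    user = quote(str(db["username"]), safe="")
    password = quote(str(db["password"]), safe="")
    return f"postgresql://{user}:{password}@{db['hostname']}:{db['port']}/{db['db_name']}"


# Sample values for illustration only.
example = {
    "db_credentials": {
        "username": "etl_user",
        "password": "p@ss/word",
        "hostname": "127.0.0.1",
        "port": "5432",
        "db_name": "games",
    }
}
print(build_db_url(example))
```

Keeping creds.yaml outside the configuration folder, referenced only through credentials_path, makes it easier to exclude the file from version control.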

Bonus

Run a script with psql

psql -U postgres -h 127.0.0.1 -d DBNAME -f path\path\file_name.sql

Copy CSV into a table

psql -U postgres -h 127.0.0.1 -d DBNAME -c "COPY table_name FROM '/some_name.csv' WITH (FORMAT csv)"

