Orchestration service for SQL-only ETL workflows.
Project description
Why SQLizer
In many cases you can build ETL (extract/transform/load) pipelines with SQL alone, relying on CTAS (create table as) queries and the built-in import/export features of your RDBMS or data warehouse software (e.g. Redshift).
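As a rough illustration of the CTAS pattern mentioned above, the sketch below runs a single transform step as a `CREATE TABLE ... AS SELECT` statement. It uses Python's built-in `sqlite3` module with an in-memory database purely so that it runs anywhere; the table names (`raw_orders`, `customer_totals`) and the sample rows are made up for the example and are not part of SQLizer. On Redshift or PostgreSQL the same statement shape applies, with loading and unloading handled by the warehouse's own import/export commands.

```python
import sqlite3


def run_ctas_demo():
    """Load a tiny source table, then derive a summary table with one CTAS query."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
    conn.executescript(
        """
        -- "Extract": a hypothetical source table with a few sample rows.
        CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL);
        INSERT INTO raw_orders VALUES
            (1, 'alice', 10.0),
            (2, 'bob', 5.5),
            (3, 'alice', 4.5);

        -- "Transform" and "load" in one step: the CTAS query creates
        -- the target table directly from the SELECT result.
        CREATE TABLE customer_totals AS
        SELECT customer, SUM(amount) AS total
        FROM raw_orders
        GROUP BY customer;
        """
    )
    rows = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


CTAS_ROWS = run_ctas_demo()
print(CTAS_ROWS)
```

Because every step is a plain SQL statement, a pipeline of this kind reduces to running statements in the right order, which is the part an orchestration service takes care of.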
What is SQLizer
A simple orchestration service for SQL-only ETL workflows. This service was born out of the need to orchestrate a complete data processing pipeline on top of AWS Redshift.
Roadmap
- [x] PostgreSQL/Redshift support
- [x] Executing multiple queries from a folder
- [ ] Executing a named query
- [ ] Executing an inline query
- [ ] MySQL/Aurora support
- [ ] MongoDB support
- [ ] Parallel execution of queries in one stage
- [ ] Validation of the workflow
- [ ] DAG for stages
- [ ] Multi-connection support
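To give a feel for what "executing multiple queries from a folder" can look like, here is a minimal sketch. It is not SQLizer's implementation: the helper `run_sql_folder`, the numbered file names, and the rule that files run in sorted filename order are all assumptions made for this example, and `sqlite3` again stands in for a real PostgreSQL or Redshift connection.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_sql_folder(conn, folder):
    """Execute every *.sql file in `folder` in filename order; return the names run."""
    executed = []
    for path in sorted(Path(folder).glob("*.sql")):
        # Each file may hold one or more statements; executescript runs them all.
        conn.executescript(path.read_text(encoding="utf-8"))
        executed.append(path.name)
    return executed


def demo():
    """Write two hypothetical stage files to a temp folder and run them in order."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "01_extract.sql").write_text(
            "CREATE TABLE src AS SELECT 1 AS id UNION ALL SELECT 2;",
            encoding="utf-8",
        )
        Path(tmp, "02_transform.sql").write_text(
            "CREATE TABLE dst AS SELECT id * 10 AS id10 FROM src;",
            encoding="utf-8",
        )
        conn = sqlite3.connect(":memory:")
        names = run_sql_folder(conn, tmp)
        rows = conn.execute("SELECT id10 FROM dst ORDER BY id10").fetchall()
        conn.close()
    return names, rows


NAMES, ROWS = demo()
print(NAMES, ROWS)
```

Numeric prefixes keep the execution order explicit, which matters here because the second file reads the table created by the first.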
Developing SQLizer
Setting up the development environment
python3 -m venv ./.venv
echo ".venv/" >> .gitignore
source .venv/bin/activate
pip install -e .
Optionally install development/test dependencies:
pip install pytest pytest-runner codecov pytest-cov recommonmark
Prepare the Docker image (and test it):
docker build -t sqlizer .
docker run --rm --name sqlizer-runner -e "job_id=sqlizer" -e "bucket=sss" sqlizer
Prepare test data:
aws s3 mb s3://sqlizer-workflows --profile your-profile
aws s3 sync ~/Code/sqlizer/test-data/ s3://sqlizer-workflows --profile your-profile
Add parameters to the Systems Manager's Parameter Store:
aws ssm put-parameter --overwrite --name sqlizer.default.auth --value user:password --type SecureString --description "authentication details for data-source" --profile your-profile
aws ssm put-parameter --overwrite --name sqlizer.default.host --value "some-cluster.redshift.amazonaws.com:5439/database" --type SecureString --description "url access for default data source" --profile your-profile
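The two parameters above hold a `user:password` pair and a `host:port/database` string, and the commented-out `--connection-url` example in the local-run step uses the form `user:password@host:port/database`. The sketch below shows one way the two values could be joined into that form. The function `build_connection_url` is hypothetical, and whether SQLizer combines the parameters exactly like this is an assumption. In a real run the values could be read with boto3's `ssm.get_parameter(Name=..., WithDecryption=True)`; they are passed in as plain strings here so the example runs without AWS access.

```python
def build_connection_url(auth: str, host: str) -> str:
    """Join 'user:password' and 'host:port/database' into one connection URL.

    Hypothetical helper for illustration only; not part of SQLizer's API.
    """
    # Split on the first colon only, so passwords may contain colons.
    user, sep, password = auth.partition(":")
    if not sep or not user:
        raise ValueError("auth must look like 'user:password'")
    if "/" not in host:
        raise ValueError("host must look like 'host:port/database'")
    return f"{user}:{password}@{host}"


# Placeholder values taken from the examples in this document.
URL = build_connection_url(
    "root:some_secret_pass",
    "some-cluster.redshift.amazonaws.com:5439/database",
)
print(URL)
```

Storing both values as `SecureString` keeps the credentials out of the workflow files and the shell history of whoever runs the job.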
Run it locally:
export AWS_PROFILE=your-profile
#sqlizer --connection-url="root:some_secret_pass@some-cluster.redshift.amazonaws.com:5439/database" --bucket="s3://sqlizer-workflows"
sqlizer
Prepare the distribution:
pip install -U setuptools wheel
python setup.py build -vf && python setup.py bdist_wheel
pip install -U twine