
Orchestration service for SQL-only ETL workflows.

Project description

Why SQLizer

In many cases you can build ETL (extract/transform/load) pipelines with SQL alone, relying on CTAS (CREATE TABLE AS) queries and the built-in import/export features of your RDBMS or data warehouse (e.g. Redshift).
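For example, a single transform step can be nothing more than a CTAS query kept in a .sql file. The sketch below is purely illustrative; the file, schema and table names are assumptions, not part of SQLizer:

# Illustrative only: a CTAS transform step saved as a .sql file
# (analytics.daily_sales and raw.orders are placeholder names)
cat > transform_daily_sales.sql <<'SQL'
CREATE TABLE analytics.daily_sales AS
SELECT order_date, SUM(amount) AS total_amount
FROM raw.orders
GROUP BY order_date;
SQL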

What is SQLizer

A simple orchestration service for SQL-only ETL workflows. This service was born out of a need to orchestrate a complete data processing pipeline on top of AWS Redshift.

Roadmap

[x] PostgreSQL/Redshift support
[x] Executing multiple queries from a folder
[ ] Executing a named query
[ ] Executing an inline query
[ ] MySQL/Aurora support
[ ] MongoDB support
[ ] Parallel execution of queries in one stage
[ ] Validation of the workflow
[ ] DAG for stages
[ ] Multi-connection support

Developing SQLizer

Setting up the development environment

python3 -m venv ./.venv
echo ".venv/" >> .gitignore
source .venv/bin/activate
pip install -e .

Optionally install development/test dependencies:

pip install pytest pytest-runner codecov pytest-cov recommonmark
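With those installed you can run the test suite from the repository root (the --cov target assumes the package is importable as sqlizer):

# requires pytest and pytest-cov from the step above
pytest -q --cov=sqlizer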

Prepare the docker image (and test it):

docker build -t sqlizer .
docker run --rm --name sqlizer-runner -e "job_id=sqlizer" -e "bucket=sss" sqlizer

Prepare test data:

aws s3 mb s3://sqlizer-workflows --profile your-profile
aws s3 sync ~/Code/sqlizer/test-data/ s3://sqlizer-workflows --profile your-profile
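Optionally verify the upload by listing the bucket contents:

aws s3 ls s3://sqlizer-workflows --recursive --profile your-profile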

Add parameters to the AWS Systems Manager Parameter Store:

aws ssm put-parameter --overwrite --name sqlizer.default.auth --value user:password --type SecureString --description "authentication details for data-source" --profile your-profile
aws ssm put-parameter --overwrite --name sqlizer.default.host --value "some-cluster.redshift.amazonaws.com:5439/database" --type SecureString --description "url access for default data source" --profile your-profile
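To confirm a parameter was stored correctly, read it back (SecureString values are only decrypted when --with-decryption is passed):

aws ssm get-parameter --name sqlizer.default.host --with-decryption --profile your-profile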

Run it locally:

export AWS_PROFILE=your-profile
#sqlizer --connection-url="root:some_secret_pass@some-cluster.redshift.amazonaws.com:5439/database" --bucket="s3://sqlizer-workflows"
sqlizer

Prepare the distribution:

pip install -U setuptools wheel
python setup.py build -vf && python setup.py bdist_wheel
pip install -U twine
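Uploading to PyPI would then typically be done with twine (assuming the wheel landed in dist/, the default output directory of bdist_wheel):

twine upload dist/*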
