
Orchestration service for SQL-only ETL workflows.


Why SQLizer

In many cases SQL alone is enough for ETL (extract/transform/load) pipelines: you can rely on CTAS (create table as select) queries and the built-in import/export features of your RDBMS or data warehouse software (e.g. Redshift).
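
As a rough illustration of a CTAS-based transform step, the sketch below uses Python's built-in sqlite3 module as a stand-in for Redshift or PostgreSQL. The table and column names (raw_orders, customer_totals) are hypothetical and are not part of SQLizer.

```python
import sqlite3

# SQLite stands in for Redshift/PostgreSQL; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Extract": pretend this table was loaded by the warehouse's import feature.
    CREATE TABLE raw_orders (customer TEXT, amount REAL);
    INSERT INTO raw_orders VALUES ('alice', 10.0), ('alice', 5.5), ('bob', 7.0);

    -- "Transform": a single CTAS query materializes the result as a new table.
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM raw_orders
    GROUP BY customer;
""")

totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

On a real warehouse the load and unload steps would use the engine's own import/export statements rather than INSERT.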

What is SQLizer

A simple orchestration service for SQL-only ETL workflows. This service was born out of the need to orchestrate a complete data processing pipeline on top of AWS Redshift.

Roadmap

[x] PostgreSQL/Redshift support
[x] Executing multiple queries from a folder
[ ] Executing a named query
[ ] Executing an inline query
[ ] MySQL/Aurora support
[ ] MongoDB support
[ ] Parallel execution of queries in one stage
[ ] Validation of the workflow
[ ] DAG for stages
[ ] Multi-connection support
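
The idea of executing multiple queries from a folder can be sketched as follows. This is not SQLizer's actual implementation: the run_folder helper, the file names, and the "run in file-name order" rule are assumptions made for illustration, and SQLite again stands in for the real database.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_folder(conn, folder):
    """Execute every *.sql file in `folder`, sorted by file name; return the names run."""
    executed = []
    for path in sorted(Path(folder).glob("*.sql")):
        conn.executescript(path.read_text())
        executed.append(path.name)
    return executed


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical workflow steps; the numeric prefix fixes the order.
    Path(tmp, "01_extract.sql").write_text(
        "CREATE TABLE src AS SELECT 1 AS id UNION ALL SELECT 2;"
    )
    Path(tmp, "02_transform.sql").write_text(
        "CREATE TABLE doubled AS SELECT id * 2 AS id FROM src;"
    )
    conn = sqlite3.connect(":memory:")
    executed = run_folder(conn, tmp)
    rows = conn.execute("SELECT id FROM doubled ORDER BY id").fetchall()

print(executed, rows)
```

Sorting by file name keeps the ordering explicit in the folder itself, which is one simple option until a DAG for stages exists.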

Developing SQLizer

Setting up the development environment

python3 -m venv ./.venv
echo ".venv/" >> .gitignore
source .venv/bin/activate
pip install -e .

Optionally install development/test dependencies:

pip install pytest pytest-runner codecov pytest-cov recommonmark

Prepare the docker image (and test it):

docker build -t sqlizer .
docker run --rm  --name sqlizer-runner -e "job_id=sqlizer" -e "bucket=sss" sqlizer

Prepare test data:

aws s3 mb s3://sqlizer-workflows --profile your-profile
aws s3 sync ~/Code/sqlizer/test-data/ s3://sqlizer-workflows --profile your-profile

Add parameters to the Systems Manager's Parameter Store:

aws ssm put-parameter --overwrite --name sqlizer.default.auth --value user:password --type SecureString --description "authentication details for data-source" --profile your-profile
aws ssm put-parameter --overwrite --name sqlizer.default.host --value "some-cluster.redshift.amazonaws.com:5439/database" --type SecureString --description "url access for default data source" --profile your-profile
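
The two parameters above hold a "user:password" pair and a "host:port/database" string. Assuming they are meant to be joined into the same shape as the --connection-url value shown in the local run example, the combination could look like the sketch below. build_connection_url is a hypothetical helper, not a SQLizer function.

```python
def build_connection_url(auth, host):
    """Join 'user:password' and 'host:port/database' into one connection URL.

    Assumption: the result mirrors the --connection-url format from the README.
    """
    user, sep, password = auth.partition(":")
    if not (user and sep and password):
        raise ValueError("auth must look like 'user:password'")
    if "/" not in host:
        raise ValueError("host must look like 'host:port/database'")
    return f"{auth}@{host}"


url = build_connection_url(
    "user:password", "some-cluster.redshift.amazonaws.com:5439/database"
)
print(url)
```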

Run it locally:

export AWS_PROFILE=your-profile
#sqlizer --connection-url="root:some_secret_pass@some-cluster.redshift.amazonaws.com:5439/database" --bucket="s3://sqlizer-workflows"
sqlizer

Prepare the distribution:

pip install -U setuptools wheel
python setup.py build -vf && python setup.py bdist_wheel
pip install -U twine
