data-pipelines-cli
CLI for data platform
Documentation
Read the full documentation at https://data-pipelines-cli.readthedocs.io/
Installation
Requirements: Python 3.9-3.12
Required
A dbt adapter extra must be installed:
pip install data-pipelines-cli[snowflake] # Snowflake
pip install data-pipelines-cli[bigquery] # BigQuery
pip install data-pipelines-cli[postgres] # PostgreSQL
pip install data-pipelines-cli[databricks] # Databricks
To pin a specific dbt-core version:
pip install data-pipelines-cli[snowflake] 'dbt-core>=1.8.0,<1.9.0'
Optional
Additional integrations: docker, datahub, looker, gcs, s3, git
Example
pip install data-pipelines-cli[bigquery,docker,datahub,gcs]
Troubleshooting
Pre-release dbt versions: data-pipelines-cli requires stable dbt-core releases. If you encounter errors with beta or RC versions, reinstall with stable versions:
pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'
Usage
First, create a repository with a global configuration file that you or your organization will be using. The repository
should contain a dp.yml.tmpl file similar to this:
_templates_suffix: ".tmpl"
_envops:
  autoescape: false
  block_end_string: "%]"
  block_start_string: "[%"
  comment_end_string: "#]"
  comment_start_string: "[#"
  keep_trailing_newline: true
  variable_end_string: "]]"
  variable_start_string: "[["

templates:
  my-first-template:
    template_name: my-first-template
    template_path: https://github.com/<YOUR_USERNAME>/<YOUR_TEMPLATE>.git

vars:
  username: [[ YOUR_USERNAME ]]
Thanks to copier, you can leverage the tmpl template syntax to create
easily modifiable configuration templates. Just create a copier.yml file next to the dp.yml.tmpl one and configure
the template questions (read more in the copier documentation).
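For example, a minimal copier.yml could define a question for the username used in the template above. This is a sketch, not a prescribed layout: the question name must match the placeholder in dp.yml.tmpl, while the type and help text here are illustrative.

```yaml
# copier.yml — placed next to dp.yml.tmpl.
# Defines the questions copier asks when rendering the template.
# The question name matches the [[ YOUR_USERNAME ]] placeholder above;
# the type and help text are illustrative assumptions.
YOUR_USERNAME:
  type: str
  help: Your GitHub username, used in the template repository URL
```

When the config repository is rendered, copier substitutes the answer into every [[ YOUR_USERNAME ]] occurrence.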
Then, run dp init <CONFIG_REPOSITORY_URL> to initialize dp. You can also drop the <CONFIG_REPOSITORY_URL> argument;
dp will then get initialized with an empty config.
Project creation
You can use dp create <NEW_PROJECT_PATH> to choose one of the templates added before and create the project in the
<NEW_PROJECT_PATH> directory. You can also use dp create <NEW_PROJECT_PATH> <LINK_TO_TEMPLATE_REPOSITORY> to point
directly to a template repository. If <LINK_TO_TEMPLATE_REPOSITORY> proves to be the name of a template defined in
dp's config file, dp create will choose that template by name instead of trying to download the repository.
dp template-list lists all added templates.
Project update
To update your pipeline project, use dp update <PIPELINE_PROJECT-PATH>. It will sync your existing project with the
updated template version selected by the --vcs-ref option (default: HEAD).
Project deployment
dp deploy will sync with your bucket provider. The provider will be chosen automatically based on the remote URL.
Usually, it is worth pointing dp deploy to a JSON or YAML file with provider-specific data like access tokens or
project names. E.g., to connect with Google Cloud Storage, one should run:
echo '{"token": "<PATH_TO_YOUR_TOKEN>", "project_name": "<YOUR_PROJECT_NAME>"}' > gs_args.json
dp deploy --dags-path "gs://<YOUR_GS_PATH>" --blob-args gs_args.json
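Since dp deploy accepts YAML as well as JSON for the args file, the same arguments can be kept in a YAML file instead. This sketch simply restates the keys from the JSON example above:

```yaml
# gs_args.yml — equivalent of the gs_args.json created above
token: <PATH_TO_YOUR_TOKEN>
project_name: <YOUR_PROJECT_NAME>
```

Then pass it with dp deploy --dags-path "gs://<YOUR_GS_PATH>" --blob-args gs_args.yml.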
However, in some cases you do not need to do so, e.g. when using gcloud with properly set local credentials. In such
a case, you can try running just the dp deploy --dags-path "gs://<YOUR_GS_PATH>" command. Please refer to the
documentation for more information.
When finished, call dp clean to remove compilation-related directories.
Variables
You can put a dictionary of variables to be passed to dbt in your config/<ENV>/dbt.yml file, following the convention
presented in the guide at the dbt site.
E.g., if one of the fields of config/<SNOWFLAKE_ENV>/snowflake.yml looks like this:
schema: "{{ var('snowflake_schema') }}"
you should put the following in your config/<SNOWFLAKE_ENV>/dbt.yml file:
vars:
  snowflake_schema: EXAMPLE_SCHEMA
and then run dp run --env <SNOWFLAKE_ENV> (or any similar command).
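Putting the two files together, the relevant fragments of a Snowflake environment config could look like this. The schema field and variable name are taken from the example above; the file pairing is a sketch of how the value flows from dbt.yml into snowflake.yml:

```yaml
# config/<SNOWFLAKE_ENV>/snowflake.yml — references a dbt variable
schema: "{{ var('snowflake_schema') }}"

# config/<SNOWFLAKE_ENV>/dbt.yml — supplies the variable's value
vars:
  snowflake_schema: EXAMPLE_SCHEMA
```

dp passes the vars dictionary to dbt, so at run time var('snowflake_schema') resolves to EXAMPLE_SCHEMA.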
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
File details
Details for the file data_pipelines_cli-0.32.0.tar.gz.
File metadata
- Download URL: data_pipelines_cli-0.32.0.tar.gz
- Size: 59.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20e9c6e2ceb238ae74f1539d3b4eb367ed0e0d807098f9378ae97720e33bbe51 |
| MD5 | b1e07f950f6d6c54f52cf0664e9eee5b |
| BLAKE2b-256 | f22df8a6f0d60f8caa0b40afb185f8f22f9804b5840fc04bcd3b586b27db7f5a |