data-pipelines-cli
CLI for data platform
Documentation
Read the full documentation at https://data-pipelines-cli.readthedocs.io/
Installation
Requirements: Python 3.9-3.12
Required
A dbt adapter extra must be installed:
pip install data-pipelines-cli[snowflake] # Snowflake
pip install data-pipelines-cli[bigquery] # BigQuery
pip install data-pipelines-cli[postgres] # PostgreSQL
pip install data-pipelines-cli[databricks] # Databricks
To pin a specific dbt-core version:
pip install data-pipelines-cli[snowflake] 'dbt-core>=1.8.0,<1.9.0'
Optional
Additional integrations: docker, datahub, looker, gcs, s3, git
Example
pip install data-pipelines-cli[bigquery,docker,datahub,gcs]
Troubleshooting
Pre-release dbt versions: data-pipelines-cli requires stable dbt-core releases. If you encounter errors with beta or RC versions, reinstall with stable versions:
pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'
Usage
First, create a repository with a global configuration file that you or your organization will be using. The repository
should contain a dp.yml.tmpl file similar to this:
_templates_suffix: ".tmpl"
_envops:
  autoescape: false
  block_end_string: "%]"
  block_start_string: "[%"
  comment_end_string: "#]"
  comment_start_string: "[#"
  keep_trailing_newline: true
  variable_end_string: "]]"
  variable_start_string: "[["

templates:
  my-first-template:
    template_name: my-first-template
    template_path: https://github.com/<YOUR_USERNAME>/<YOUR_TEMPLATE>.git

vars:
  username: [[ YOUR_USERNAME ]]
Thanks to copier, you can leverage the tmpl template syntax to create
easily modifiable configuration templates. Just create a copier.yml file next to the dp.yml.tmpl one and configure
the template questions (read more in the copier documentation).
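For example, a minimal copier.yml could define a question for the username used in the template above. This is a sketch, not a prescribed layout: the question name must match the placeholder in dp.yml.tmpl, while the type and help text here are illustrative.

```yaml
# copier.yml — placed next to dp.yml.tmpl.
# Defines the questions copier asks when rendering the template.
# The question name matches the [[ YOUR_USERNAME ]] placeholder above;
# the type and help text are illustrative assumptions.
YOUR_USERNAME:
  type: str
  help: Your GitHub username, used in the template repository URL
```

When the config repository is rendered, copier substitutes the answer into every [[ YOUR_USERNAME ]] occurrence.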
Then, run dp init <CONFIG_REPOSITORY_URL> to initialize dp. You can also drop the <CONFIG_REPOSITORY_URL> argument;
dp will then get initialized with an empty config.
Project creation
You can use dp create <NEW_PROJECT_PATH> to choose one of the templates added before and create the project in the
<NEW_PROJECT_PATH> directory. You can also use dp create <NEW_PROJECT_PATH> <LINK_TO_TEMPLATE_REPOSITORY> to point
directly to a template repository. If <LINK_TO_TEMPLATE_REPOSITORY> proves to be the name of a template defined in
dp's config file, dp create will choose that template by name instead of trying to download the repository.
dp template-list lists all added templates.
Project update
To update your pipeline project, use dp update <PIPELINE_PROJECT-PATH>. It will sync your existing project with the
updated template version selected by the --vcs-ref option (default: HEAD).
Project deployment
dp deploy will sync with your bucket provider. The provider will be chosen automatically based on the remote URL.
Usually, it is worth pointing dp deploy to a JSON or YAML file with provider-specific data like access tokens or
project names. E.g., to connect with Google Cloud Storage, one should run:
echo '{"token": "<PATH_TO_YOUR_TOKEN>", "project_name": "<YOUR_PROJECT_NAME>"}' > gs_args.json
dp deploy --dags-path "gs://<YOUR_GS_PATH>" --blob-args gs_args.json
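Since dp deploy accepts YAML as well as JSON for the args file, the same arguments can be kept in a YAML file instead. This sketch simply restates the keys from the JSON example above:

```yaml
# gs_args.yml — equivalent of the gs_args.json created above
token: <PATH_TO_YOUR_TOKEN>
project_name: <YOUR_PROJECT_NAME>
```

Then pass it with dp deploy --dags-path "gs://<YOUR_GS_PATH>" --blob-args gs_args.yml.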
However, in some cases you do not need to do so, e.g. when using gcloud with properly set local credentials. In such
a case, you can try running just the dp deploy --dags-path "gs://<YOUR_GS_PATH>" command. Please refer to the
documentation for more information.
When finished, call dp clean to remove compilation-related directories.
Variables
You can put a dictionary of variables to be passed to dbt in your config/<ENV>/dbt.yml file, following the convention
presented in the guide at the dbt site.
E.g., if one of the fields of config/<SNOWFLAKE_ENV>/snowflake.yml looks like this:
schema: "{{ var('snowflake_schema') }}"
you should put the following in your config/<SNOWFLAKE_ENV>/dbt.yml file:
vars:
  snowflake_schema: EXAMPLE_SCHEMA
and then run dp run --env <SNOWFLAKE_ENV> (or any similar command).
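Putting the two files together, the relevant fragments of a Snowflake environment config could look like this. The schema field and variable name are taken from the example above; the file pairing is a sketch of how the value flows from dbt.yml into snowflake.yml:

```yaml
# config/<SNOWFLAKE_ENV>/snowflake.yml — references a dbt variable
schema: "{{ var('snowflake_schema') }}"

# config/<SNOWFLAKE_ENV>/dbt.yml — supplies the variable's value
vars:
  snowflake_schema: EXAMPLE_SCHEMA
```

dp passes the vars dictionary to dbt, so at run time var('snowflake_schema') resolves to EXAMPLE_SCHEMA.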
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
File details
Details for the file data_pipelines_cli-0.32.0.tar.gz.
File metadata
- Download URL: data_pipelines_cli-0.32.0.tar.gz
- Size: 59.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20e9c6e2ceb238ae74f1539d3b4eb367ed0e0d807098f9378ae97720e33bbe51 |
| MD5 | b1e07f950f6d6c54f52cf0664e9eee5b |
| BLAKE2b-256 | f22df8a6f0d60f8caa0b40afb185f8f22f9804b5840fc04bcd3b586b27db7f5a |