A configuration-driven framework for building Dagster pipelines

dagster-odp (open data platform)

dagster-odp simplifies data pipeline development by enabling teams to build Dagster pipelines through configuration rather than code. It reduces the learning curve for Dagster while promoting standardization and faster development of data workflows.

Key Features

  • Configuration-Driven Development: Build data pipelines using YAML/JSON instead of Python code

  • Pre-built Tasks:

    • Google Cloud Operations: Transfer and export data between GCS and BigQuery, with support for GCS file downloads.
    • DuckDB Operations: Load files into DuckDB, execute SQL queries, and export table contents to files.
    • Utility Operations: Execute shell commands with configurable environments and working directories.
  • Extensible Framework: Create custom tasks, sensors, and resources that can be used directly in configuration files

  • Enhanced Modern Data Stack Integration:

    • DLT+: Extended integration with automatic asset creation and granular object handling
    • DBT+: Simplified variable management and external source configuration
    • Soda: Configuration-driven data quality checks
  • Enhanced Asset Management:

    • Standardized materialization metadata
    • Simplified dependency management
    • External source handling
  • Flexible Automation: Configuration-based jobs, schedules, sensors, and partitioning

Quick Example

Here's a simple pipeline that downloads data and loads it into DuckDB:

# odp_config/workflows/pipeline.yaml
assets:
  - asset_key: raw_data
    task_type: url_file_download
    params:
      source_url: https://example.com/data.parquet
      destination_file_path: ./data/raw.parquet

  - asset_key: analyzed_data
    task_type: file_to_duckdb
    depends_on: [raw_data]
    params:
      source_file_uri: "{{raw_data.destination_file_path}}"
      destination_table_id: analyzed_table
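
The `depends_on` field above determines the order in which assets can run. The following is a minimal sketch of that idea, not dagster-odp's actual implementation: it assumes a JSON form of the same workflow that mirrors the YAML keys one-to-one, and `execution_order` is a hypothetical helper that checks every dependency refers to a defined asset and derives a valid run order using only the Python standard library.

```python
import json
from graphlib import TopologicalSorter

# JSON equivalent of the YAML workflow above (assumed to use the same keys).
PIPELINE_JSON = """
{
  "assets": [
    {
      "asset_key": "raw_data",
      "task_type": "url_file_download",
      "params": {
        "source_url": "https://example.com/data.parquet",
        "destination_file_path": "./data/raw.parquet"
      }
    },
    {
      "asset_key": "analyzed_data",
      "task_type": "file_to_duckdb",
      "depends_on": ["raw_data"],
      "params": {
        "source_file_uri": "{{raw_data.destination_file_path}}",
        "destination_table_id": "analyzed_table"
      }
    }
  ]
}
"""


def execution_order(config: dict) -> list[str]:
    """Hypothetical helper: order asset keys so each follows its dependencies."""
    assets = config["assets"]
    known = {asset["asset_key"] for asset in assets}
    graph = {}
    for asset in assets:
        deps = asset.get("depends_on", [])
        # Reject references to assets that are not defined in this workflow.
        missing = [dep for dep in deps if dep not in known]
        if missing:
            raise ValueError(
                f"{asset['asset_key']} depends on undefined assets: {missing}"
            )
        graph[asset["asset_key"]] = set(deps)
    # TopologicalSorter maps each node to its predecessors and raises
    # graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(json.loads(PIPELINE_JSON))
print(order)  # ['raw_data', 'analyzed_data']
```

In a real project Dagster builds the asset graph itself; a check like this is only useful as a quick sanity test of a configuration file before loading it.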

Installation

pip install dagster-odp
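
To confirm the installation from Python, one option is to query the installed distribution metadata. This is a small sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, not part of dagster-odp.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed in this environment.
print(installed_version("dagster-odp") or "dagster-odp is not installed")
```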

Getting Started

  1. Create a new project using the Dagster CLI:
dagster project scaffold --name my-odp-project
cd my-odp-project
  2. Create the ODP configuration directories:
mkdir -p odp_config/workflows
  3. Update your definitions.py:
from dagster_odp import build_definitions
defs = build_definitions("odp_config")
  4. Start building pipelines in your workflows directory using YAML/JSON configuration.
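
The directory setup in the steps above can also be scripted. The sketch below assumes the layout shown in this README (`odp_config/workflows`) and seeds it with the first asset from the Quick Example; `init_odp_config` and the `pipeline.yaml` starter content are illustrative choices, not something dagster-odp provides.

```python
import tempfile
from pathlib import Path

# Starter workflow copied from the first asset of the Quick Example.
STARTER_WORKFLOW = """\
assets:
  - asset_key: raw_data
    task_type: url_file_download
    params:
      source_url: https://example.com/data.parquet
      destination_file_path: ./data/raw.parquet
"""


def init_odp_config(project_root: Path) -> Path:
    """Hypothetical helper: create odp_config/workflows and a starter workflow."""
    workflows = project_root / "odp_config" / "workflows"
    # Equivalent to `mkdir -p odp_config/workflows`.
    workflows.mkdir(parents=True, exist_ok=True)
    workflow_file = workflows / "pipeline.yaml"
    # Never overwrite a workflow that already exists.
    if not workflow_file.exists():
        workflow_file.write_text(STARTER_WORKFLOW, encoding="utf-8")
    return workflow_file


# Demonstrate against a throwaway directory; pass your project root instead.
with tempfile.TemporaryDirectory() as tmp:
    created = init_odp_config(Path(tmp))
    print(created.relative_to(tmp))
```

Running the helper a second time leaves an existing `pipeline.yaml` untouched, so it is safe to call from a setup script.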

Check out our Quickstart Guide for a complete walkthrough.

Who Should Use dagster-odp?

  • Data Teams seeking to standardize pipeline creation
  • Data Analysts/Scientists who want to create pipelines without extensive coding
  • Data Engineers looking to reduce boilerplate code and maintenance overhead
  • Organizations adopting Dagster who want to accelerate development

Documentation

Comprehensive documentation is available.

Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on how to submit pull requests, report issues, and contribute to the project.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Download files

Source Distribution

dagster_odp-0.1.4.tar.gz (39.0 kB view details)

Uploaded Source

Built Distribution

dagster_odp-0.1.4-py3-none-any.whl (49.6 kB view details)

Uploaded Python 3

File details

Details for the file dagster_odp-0.1.4.tar.gz.

File metadata

  • Download URL: dagster_odp-0.1.4.tar.gz
  • Size: 39.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for dagster_odp-0.1.4.tar.gz
Algorithm Hash digest
SHA256 fb7f1e71c3be027172a2590f4db304320e3c38dd3ff5d63b70f69ce7dd8454cf
MD5 c241c6ab5a838cb3be947e1c52b6b7bc
BLAKE2b-256 c48653c9145ae46880490a5b36927e61b29bebf8ac0fdd4941884bc9d3ceca29

Provenance

The following attestation bundles were made for dagster_odp-0.1.4.tar.gz:

Publisher: release.yml on runodp/dagster-odp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dagster_odp-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: dagster_odp-0.1.4-py3-none-any.whl
  • Size: 49.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for dagster_odp-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ed26482b16addc46553373cf3c732a17622db81c4a6951979904c325caff933e
MD5 2e4b7d23fd2dc115ad1f6d01e14e31fa
BLAKE2b-256 4842e52523079f4b131f5d518845a4b67e3b6e8d907bb03b5ce4ad4faade2984

Provenance

The following attestation bundles were made for dagster_odp-0.1.4-py3-none-any.whl:

Publisher: release.yml on runodp/dagster-odp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
