ETL Tools for ccflow

ccflow-etl

Domain-neutral ETL building blocks for ccflow callable models.

ccflow-etl provides reusable support primitives for ETL-style workflows built as concrete ccflow CallableModel graphs. Generic execution concerns live in this package; workflow-specific behavior stays in the package or application that owns the workflow.

Install

pip install ccflow-etl

Connector-backed cache and artifact stores are provided by connector packages that own their I/O. Generic checkpointing belongs in ccflow proper.

Package        Type                Integration
ccflow-s3      generic, storage    S3-backed artifact IO and cache
ccflow-db      generic, cache      database-backed cache store
ccflow-email   generic, publisher  email publishers for ETL notifications
ccflow-celery  generic, evaluator  Celery-based evaluator for ETL task execution

Quick Start

ccflow-etl installs shared Hydra entry points for running and explaining configured callables:

cc-etl +context.path=./example-output.json +context.payload.message='hello from ccflow-etl' +context.overwrite=true
cc-etl-explain +context.path=./example-output.json

Most projects provide their own config directory and still use the shared entry point:

cc-etl --config-path ./config --config-name text_stats +context.input_path=./notes.txt +context.output_path=./stats.json
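
The `+key=value` arguments in the commands above are Hydra overrides: a leading `+` adds a key that is not already in the config, and dots in the key address nested fields. The following is a rough, standard-library-only sketch of how such overrides fold into a nested `context` mapping. The `parse_overrides` function is a hypothetical illustration, not part of Hydra or ccflow-etl, and Hydra's real grammar covers much more (lists, dicts, quoting, sweeps, deletions).

```python
from typing import Any


def parse_overrides(overrides: list[str]) -> dict[str, Any]:
    """Fold Hydra-style `+dotted.key=value` overrides into a nested dict.

    Hypothetical helper for illustration only; not Hydra's parser.
    """
    config: dict[str, Any] = {}
    for override in overrides:
        # Drop the leading "+" and split on the first "=" only.
        key, _, raw = override.lstrip("+").partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            # Create intermediate mappings as needed.
            node = node.setdefault(part, {})
        # Only booleans are converted here; everything else stays a string.
        node[leaf] = {"true": True, "false": False}.get(raw.lower(), raw)
    return config


# Arguments as the shell would deliver them (shell quotes already removed).
args = [
    "+context.path=./example-output.json",
    "+context.payload.message=hello from ccflow-etl",
    "+context.overwrite=true",
]
config = parse_overrides(args)
print(config)
```

Under this reading, the first Quick Start command supplies a `context` with a `path`, a nested `payload.message`, and an `overwrite` flag; what the configured callable does with those fields depends on the selected config.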

What It Provides

  • Shared CLI entry points: cc-etl and cc-etl-explain.
  • Date expansion: Interval, BaseCalendar, built-in calendars, BackfillContext, and BackfillModel.
  • Generic credential models and a /credentials Hydra registry for package extension.
  • Generic extract task composition through /tasks, /datasets, and /outputs config selections.
  • Handoff metadata: ETLArtifact for typed stage artifacts.
  • Artifact IO contracts: ArtifactExistsModel, ArtifactWriteModel, ArtifactPublishModel, and NoOpArtifactStore for backend-neutral existence checks, writes, publication, and artifact URIs.
  • Task and output composition: ExtractTaskModel, LocalFileOutput, NoOpArtifactStore, and the /tasks and /outputs config selections.
  • Format-aware writes and cache handoffs: LocalWriteModel, CachePutModel, CacheGetModel, PayloadCodec, LocalCacheStore, and no-op cache defaults.
  • Retry integration: compatibility exports for ccflow RetryPolicy and RetryModel; use ccflow.evaluators.RetryEvaluator for runtime evaluator retries.
  • Execution policy: ExecutionPolicy for shared max-concurrency hints and rate spacing that evaluators and connector models can consume through the /execution Hydra group.
  • Run reporting: RunSummary for structured counts by status and artifact stage.
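
As an illustration of the date-expansion item in the list above, the sketch below expands an inclusive date interval into the individual run dates accepted by a calendar rule, which is the general idea behind a backfill. It uses only the standard library. The names `weekdays_only` and `expand_dates` are hypothetical and do not reflect the actual signatures of Interval, BaseCalendar, BackfillContext, or BackfillModel, which are not documented on this page.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def weekdays_only(day: date) -> bool:
    """Hypothetical calendar rule: Monday through Friday are run dates."""
    return day.weekday() < 5


def expand_dates(
    start: date,
    end: date,
    is_run_date: Callable[[date], bool] = weekdays_only,
) -> Iterator[date]:
    """Yield each date in [start, end] (inclusive) accepted by the rule."""
    day = start
    while day <= end:
        if is_run_date(day):
            yield day
        day += timedelta(days=1)


# 2024-01-05 is a Friday, so the weekend of the 6th and 7th is skipped.
backfill_dates = list(expand_dates(date(2024, 1, 5), date(2024, 1, 9)))
print([d.isoformat() for d in backfill_dates])
# → ['2024-01-05', '2024-01-08', '2024-01-09']
```

A backfill would then invoke the same callable once per yielded date; swapping the rule function is the stand-in here for choosing a different calendar.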

Documentation

Package Boundaries

ccflow-etl owns domain-neutral ETL contracts, generic credential shapes, and helpers. It does not own application workflows, provider clients, connector clients, provider-specific credential semantics, dataset inventories, dataset-specific schemas, run reporting evaluators, checkpointing, or domain-specific rules. Durable store implementations should live in connector packages and integrate through generic cache and artifact IO contracts.
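
The boundary described above, where generic code depends on a contract and connector packages supply the durable implementation, can be sketched with a structural interface. Everything in this example is hypothetical: `CacheStore`, `InMemoryCacheStore`, and `fetch_with_cache` are illustrative names, not the ccflow-etl cache contract, and the in-memory class merely stands in for a store that a package such as ccflow-s3 or ccflow-db would provide.

```python
from typing import Callable, Optional, Protocol


class CacheStore(Protocol):
    """Hypothetical backend-neutral contract; not the ccflow-etl definition."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, payload: bytes) -> None: ...


class InMemoryCacheStore:
    """Stand-in for a durable store that a connector package would own."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, payload: bytes) -> None:
        self._data[key] = payload


def fetch_with_cache(
    store: CacheStore, key: str, compute: Callable[[], bytes]
) -> bytes:
    """Generic helper: relies only on the contract, never on a backend."""
    cached = store.get(key)
    if cached is not None:
        return cached
    payload = compute()
    store.put(key, payload)
    return payload


store = InMemoryCacheStore()
calls = []


def expensive() -> bytes:
    # Record each invocation so cache hits are observable.
    calls.append(1)
    return b"result"


first = fetch_with_cache(store, "k", expensive)
second = fetch_with_cache(store, "k", expensive)
print(first, second, len(calls))
```

Because `fetch_with_cache` only sees the protocol, replacing the in-memory store with a connector-backed one would not require changes to the generic helper, which is the separation the paragraph above argues for.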
