Duckstring is a Python-native data-mesh orchestration framework designed to be as naturally extensible as installing packages.

Project description

Duckstring

Duckstring is a data pipeline framework built around modular, versioned nodes called Ponds. Each Pond specifies its immediate parents (with version), allowing for the formation of a DAG much like one would install packages.

Pond execution is orchestrated within an environment - a Catchment - that controls storage and other global settings. It uses a pull-based system modelled after Kanban, with Outlets (terminal Ponds) sending demand upstream. This allows each Pond to be modified and deployed independently, with any paths in the DAG that are not attached to any Outlet automatically skipped.

Duckstring is built on the philosophy that most data pipelines are not truly "big data" and with good design can execute on a single compute node. It is primarily designed for batch and incremental workloads for tables on the order of tens of millions of rows (e.g. <50M).

The default engine is DuckDB, though this is configurable. Duckstring is however an independent project and is not affiliated with, endorsed by, or maintained by the DuckDB project.

-- Note: As the project is in development, most of the notes below should be read as indended functionality, and most features are not yet implemented.

Core Concepts

Catchment: Control environment - a FastAPI application
Pond: Versioned transformation unit with declared upstream dependencies - the main element of version control
Inlet: Pond with external dependencies and no upstream Ponds
Outlet: Pond with no downstream Ponds (e.g. outputs final data products)
Ripple: Unit operation within a Pond (e.g. a single transformation producing a table)

Installation

pip install duckstring

Quickstart

1) Connect to a Catchment

A Catchment is the execution environment, receiving Ponds and managing runs. It runs either as a local daemon or as a remote server, allowing you to start locally and seamlessly upgrade to a hosted/cloud server if you need to later.

Start a Catchment Server

To run a Catchment locally, run:

duckstring catchment start --name dev --port 5000 --root ~/.duckstring/dev

This will start a server with name 'dev' at port 5000 (the default, if none specified) and store Catchment details at ~/.duckstring/dev (default is ~/.duckstring/{name}). If any of these options are omitted you will be prompted on start.

Connect to a Remote Server

Alternatively, you can connect to a server running a Catchment:

duckstring catchment connect --name dev --path https://path.to.catchment

This will prompt for any necessary auth, and will add the Catchment under the specified name.

Connect to duckstring.com

There are future plans for a dedicated Catchment service at https://duckstring.com. If you're interested, please contact me.

2) Define Pond(s)

Demo Ponds

If you want to see an example sequence of Ponds in action immediately, create three project directories and run one of these commands in each:

duckstring pond demo inlet
duckstring pond demo pond
duckstring pond demo outlet

It's recommended to do this before attempting to make your own so that you can get a feel for the structure.

Custom Pond

Create a project directory and run:

duckstring pond init example_pond

This will create a duckstring pond structure:

root/
|-- src/
|   |-- pond.py
|-- pond.toml
|-- __main__.py
|-- .gitignore
|-- README.md

Here pond.py contains the code for a single Ripple operation (currently blank), and pond.toml specifies the Pond name "example_pond" and version (defaulting to "0.1.0").

3) Deploy to Catchment

From Local

From a Pond's project root run:

duckstring deploy dev

This will read the pond name, version and type (Inlet, Pond, Outlet) from pond.toml and deploy the project contents to the Catchment specified by name (here dev).

Alternatively, you can import the Pond using the Catchment UI.

From Git

If you are using git with a remote, you can deploy with:

duckstring deploy dev --git {branch|commit|tag}

This will use the current branch/commit/tag to define the Pond. Upon each execution the Catchment will clone the repository and run it.

This can also be specified using the Catchment UI.

3) Execute

Ponds are executed by sending a Demand signal from an Outlet. This propagates backwards through the DAG until it reaches each upstream Inlet, causing them to execute, with children beginning upon completion of all of their parents.

These examples will use the Pond outlet, version 1.0.0, as the execution reference. All examples may also be alternatively executed using the Catchment UI.

Pulse

To initiate a single run:

duckstring pulse dev outlet

The pulse mode emits a Demand signal from outlet, and when it begins execution, sends a Stop signal. This causes it to execute exactly once.

This will automatically run against the maximum version available for that Pond. To use a specific version:

duckstring pulse dev outlet --version 1

Wave

To continuously run:

duckstring wave dev outlet

The wave mode emits a Demand signal from outlet, and when it begins execution, sends another Demand signal. This causes it to execute continuously, as frequently as the DAG allows (i.e. at a period equal to the execution time of the slowest Ripple in any Pond).

Tide

To run at a scheduled frequency:

duckstring tide dev outlet 15 2 * * * --local

This would run at 2:15am every day local time, using cron syntax. Omitting the --local flag defaults to UTC.

4) Monitor

To print out a summary of current processes in the Catchment:

duckstring status dev

This will print to CLI a summary for each Pond that is either currently executing or has Demand.

To include all Ponds:

duckstring status dev --all

5) Retrieve Data

Get

The simplest way to retrieve data is to load by the Ripple name. This returns the entire contents of the directory, and does not require that the data be in a tabular format (e.g. SQL-compatible).

duckstring get dev outlet daily

This writes a directory ./ponds/outlet/daily with the 'daily' Ripple's contents. You may also override the default location:

duckstring get dev outlet daily --path ./daily_output

SQL Query

If the target is an SQL-compatible table (e.g. DuckDB or Parquet), an SQL statement may be sent directly, outputting the result to the command line:

duckstring query dev outlet --sql "SELECT * FROM daily WHERE id=1;"

Alternatively, include a file path:

duckstring query dev outlet --sql @path/to/query.sql

Omitting the --sql statement queries with a default SELECT * LIMIT 10 on the specified table:

duckstring query dev outlet daily

Write to file

To output to a file, include a flag for the file format, followed by the file name:

--csv: Comma-separated values --json: JSON records --parquet: Parquet file

This writes by default to ./ponds/outlet/daily/{filename}. To overrite the default location you may use the --path flag.

For example, to execute an sql statement from file query.sql and write the result to CSV at the current directory:

duckstring query dev outlet --sql @query.sql --csv daily.csv --path .

Project details

Release history Release notifications | RSS feed

This version

0.1.0

May 20, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckstring-0.1.0.tar.gz (220.2 kB view details)

Uploaded May 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

duckstring-0.1.0-py3-none-any.whl (235.7 kB view details)

Uploaded May 20, 2026 Python 3

File details

Details for the file duckstring-0.1.0.tar.gz.

File metadata

Download URL: duckstring-0.1.0.tar.gz
Upload date: May 20, 2026
Size: 220.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for duckstring-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f245c307e69fea730d95d39c2587218e5f23e285c5caba7387ea2a2b17cbf977`
MD5	`379c1cf32a96f0f2bc3f5de2e3eb0fb4`
BLAKE2b-256	`2bac07fdf3c50d2498f94ebb78f8056bc09deec74ba47fd02c4c76bae549803e`

See more details on using hashes here.

File details

Details for the file duckstring-0.1.0-py3-none-any.whl.

File metadata

Download URL: duckstring-0.1.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 235.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for duckstring-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8257d0576a3e5dfb90b0ba7d6bf3183889d315670ddf7aac3a7052c891207ab`
MD5	`5918930e5a00435039a8cfff479b71ba`
BLAKE2b-256	`f210b0b1a1ef3efa3c3a1c01a40e2d59f1e6a2d65fbdedc874012bf7dbef5cc7`

See more details on using hashes here.

duckstring 0.1.0

Navigation

Verified details

Maintainers

Meta

Unverified details

Meta

Project description

Duckstring

Core Concepts

Installation

Quickstart

1) Connect to a Catchment

Start a Catchment Server

Connect to a Remote Server

Connect to duckstring.com

2) Define Pond(s)

Demo Ponds

Custom Pond

3) Deploy to Catchment

From Local

From Git

3) Execute

Pulse

Wave

Tide

4) Monitor

5) Retrieve Data

Get

SQL Query

Write to file

Further Reading

Project details

Verified details

Maintainers

Meta

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes