Data pipelines in pure Python with incremental compute and data flow visualization

freshet

A lightweight caching framework for Python data pipelines.

Decorate your pipeline functions with @flow and @source, then instantiate Freshet with your pipeline module. Execution and cache state are scoped to that instance.

Installation

pip install freshet

Quick start

from freshet import Freshet
from my_project import my_pipeline

f = Freshet(my_pipeline)

result = f.tap("final_output")   # computes + caches upstream DAG
result = f.tap("final_output")   # cache hit
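As a rough mental model of the two calls above, tap can be pictured as a recursive, memoized resolve. This is a toy sketch, not freshet's implementation: ToyPipeline, its in-memory dict cache, and the example functions are all hypothetical, and freshet's on-disk cache is not modeled.

```python
import inspect


class ToyPipeline:
    """Hypothetical stand-in for a pipeline runner with an in-memory cache."""

    def __init__(self, functions):
        self.functions = functions  # name -> callable
        self.cache = {}             # name -> computed value
        self.calls = []             # order in which functions actually ran

    def tap(self, name):
        # Cache hit: return the stored value without running anything.
        if name in self.cache:
            return self.cache[name]
        fn = self.functions[name]
        # Resolve each parameter by tapping the artifact of the same name.
        kwargs = {p: self.tap(p) for p in inspect.signature(fn).parameters}
        self.calls.append(name)
        self.cache[name] = fn(**kwargs)
        return self.cache[name]


def raw():
    return [1, 2, 3]


def doubled(raw):
    return [2 * x for x in raw]


def final_output(doubled):
    return sum(doubled)


toy = ToyPipeline({"raw": raw, "doubled": doubled, "final_output": final_output})
first = toy.tap("final_output")   # runs raw, doubled, final_output
second = toy.tap("final_output")  # served from the cache, nothing reruns
```

After both calls, toy.calls lists each function once, which is the behavior the "cache hit" comment describes.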

Base data sources

Use @source to declare external data sources as the roots of your pipeline. The function body returns a File or Directory descriptor:

from freshet import source, File

@source
def raw_data():
    return File("data/raw.parquet")

File-mode outputs

For flows that produce files rather than Python objects, call flow_output() to allocate a cache path, write to it, and return the File:

import types
from freshet import Freshet, flow, flow_output, File

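@flow
def plot_chart() -> File:
    out = flow_output(".png")        # allocate a cache path with a .png extension
    save_plot([1, 2, 3], out.path)   # save_plot is your own helper; it must write the file at out.path
    return out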

pipeline = types.ModuleType("my_pipeline")
pipeline.plot_chart = plot_chart

f = Freshet(pipeline, cache_dir=".freshet")
result = f.tap("plot_chart")  # returns a File pointing to the cached file

Auto-bridging

If a flow expects an in-memory type (e.g. pl.DataFrame) but receives a File, freshet auto-loads it based on the file extension:

import polars as pl
from freshet import source, flow, File

@source
def raw_trades():
    return File("data/raw/trades.parquet")

@flow
def cleaned(raw_trades: pl.DataFrame) -> pl.DataFrame:
    # raw_trades is auto-bridged: File → pl.read_parquet
    return raw_trades.filter(pl.col("price") > 0)
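The idea of loading by extension can be illustrated with the standard library alone. The sketch below is an assumption about the general technique, not freshet's code: LOADERS and bridge are hypothetical names, and JSON stands in for Parquet so the example runs without Polars.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical registry: file extension -> function that loads the file.
LOADERS = {
    ".json": lambda p: json.loads(p.read_text()),
    ".txt": lambda p: p.read_text(),
}


def bridge(path, expected_type):
    """Load `path` with the loader for its extension and check the result type."""
    value = LOADERS[path.suffix](path)
    if not isinstance(value, expected_type):
        raise TypeError(f"{path.name} loaded as {type(value).__name__}")
    return value


with tempfile.TemporaryDirectory() as tmp:
    trades_file = Path(tmp) / "trades.json"
    trades_file.write_text(json.dumps([{"price": 10}, {"price": -1}]))

    # The consumer asks for a list, so the file is loaded before use.
    trades = bridge(trades_file, list)
    positive = [t for t in trades if t["price"] > 0]
```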

DAG introspection

freshet infers a dependency graph from function argument names. If a @flow function takes a parameter named raw_data, and there's a registered artifact called raw_data, freshet records that edge:

artifacts = f.artifacts()  # all registered artifacts
edges = f.edges()          # list of (upstream, downstream) tuples
bases = f.bases()          # only @source artifacts
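Name-based edge inference of this kind can be sketched with inspect.signature. This is a minimal illustration under stated assumptions, not freshet's internals: infer_edges and the registry dict are hypothetical.

```python
import inspect


def infer_edges(functions):
    """Return sorted (upstream, downstream) pairs from parameter names.

    `functions` maps an artifact name to the callable that produces it.
    """
    edges = []
    for name, fn in functions.items():
        for param in inspect.signature(fn).parameters:
            # A parameter that matches a registered name is a dependency.
            if param in functions:
                edges.append((param, name))
    return sorted(edges)


def raw_data():
    return [3, -1, 2]


def cleaned(raw_data):
    return [x for x in raw_data if x > 0]


def total(cleaned):
    return sum(cleaned)


registry = {"raw_data": raw_data, "cleaned": cleaned, "total": total}
edges = infer_edges(registry)
```

One consequence of this design is that renaming a function also renames the artifact, so every downstream parameter with the old name has to be updated with it.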

Custom serializers

By default, Polars DataFrames are serialized as Parquet and everything else uses pickle. You can register your own:

from freshet import Serializer, register_serializer

class MySerializer:
    key = "my_format"
    extension = "bin"

    def can_handle(self, value) -> bool: ...
    def save(self, value, path) -> None: ...
    def load(self, path): ...

register_serializer(MySerializer())
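A filled-in version of the skeleton above might look like the following JSON serializer. It is a sketch under assumptions: JsonSerializer is a hypothetical name, and whether freshet passes path as a str or a Path is not stated here, so the code accepts both. The round trip at the end exercises the class directly without freshet; in a real project the instance would be passed to register_serializer as shown above.

```python
import json
import tempfile
from pathlib import Path


class JsonSerializer:
    """Hypothetical serializer that stores dicts and lists as JSON."""

    key = "json"
    extension = "json"

    def can_handle(self, value) -> bool:
        # Only claim plain containers; everything else falls through.
        return isinstance(value, (dict, list))

    def save(self, value, path) -> None:
        Path(path).write_text(json.dumps(value))

    def load(self, path):
        return json.loads(Path(path).read_text())


serializer = JsonSerializer()
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / f"artifact.{serializer.extension}"
    serializer.save({"rows": 3}, target)
    restored = serializer.load(target)
```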

Configuration

f = Freshet(my_pipeline, cache_dir="/path/to/cache")

By default the cache lives at .freshet/ in the current working directory.

Cache management

f.clear_cache("my_function")  # clear one function's cache
f.clear_cache()               # clear everything

Visualization

widget = f.chart()  # anywidget-compatible DAG visualization

Security

freshet uses pickle as a general-purpose serializer fallback. Loading a pickle can execute arbitrary code, so only point freshet at cache directories you trust.
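The reason is a property of pickle itself rather than anything specific to freshet: a pickled object can name a callable to invoke at load time. The sketch below uses the harmless built-in len to show the mechanism; Payload is a hypothetical class.

```python
import pickle


class Payload:
    def __reduce__(self):
        # Tell pickle to store "call len('abc')" instead of the object.
        return (len, ("abc",))


blob = pickle.dumps(Payload())

# Loading runs the stored call; the result is whatever the callable returns.
result = pickle.loads(blob)
```

Here result is the integer 3, not a Payload instance. A malicious file could name any importable callable in place of len.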

To reduce the risk of accidental code execution, f.chart() does not unpickle cached artifacts for graph details or previews by default. To opt in, set this environment variable:

FRESHET_UNSAFE_PICKLE_INSPECT=1

Source Distribution

freshet-0.1.0.tar.gz (123.3 kB)

Uploaded Source

Built Distribution

freshet-0.1.0-py3-none-any.whl (128.0 kB)

Uploaded Python 3

File details

Details for the file freshet-0.1.0.tar.gz.

File metadata

  • Download URL: freshet-0.1.0.tar.gz
  • Upload date:
  • Size: 123.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for freshet-0.1.0.tar.gz

Algorithm      Hash digest
SHA256         d0b0f371dda1465d731d36c2da87bfe0923f09dac817e6da5fb9b84e594ea132
MD5            f9c63728c83f10b98ef470ce6a19a5f5
BLAKE2b-256    2d629a29b4f6d39413030991e0eb5cd6e01b43867e5e919b97539c43d397e061

File details

Details for the file freshet-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: freshet-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 128.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for freshet-0.1.0-py3-none-any.whl

Algorithm      Hash digest
SHA256         0a0bfbf9c37fe50031b4c7d7d1066a7c3b4500ea2c251057b6a9f88b86a40ec5
MD5            412a801859c3f6b3b3cc01a8369a2c7a
BLAKE2b-256    55817b9588aee192cef2d6f0b5fd3d22d398cfcb4cc3ac01e2fdce9aead7fc3e
