Data pipelines in pure Python with incremental compute and data flow visualization
freshet
A lightweight caching framework for Python data pipelines.
Decorate your pipeline functions with @flow and @source, then instantiate Freshet with your pipeline module. Execution and cache state are scoped to that instance.
Installation
pip install freshet
Quick start
from freshet import Freshet
from my_project import my_pipeline
f = Freshet(my_pipeline)
result = f.tap("final_output") # computes + caches upstream DAG
result = f.tap("final_output") # cache hit
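To make the "computes, then cache hit" behaviour concrete, here is a minimal standard-library sketch of an instance-scoped cache. TinyPipeline, its calls counter, and the two example functions are hypothetical names invented for illustration; this is not freshet's implementation and it keeps results in memory rather than on disk.

```python
import inspect
from typing import Any, Callable, Dict


class TinyPipeline:
    """Hypothetical stand-in for a pipeline runner (not freshet's code)."""

    def __init__(self, functions: Dict[str, Callable[..., Any]]) -> None:
        self._functions = functions
        self._cache: Dict[str, Any] = {}  # cache state lives on this instance only
        self.calls: Dict[str, int] = {name: 0 for name in functions}

    def tap(self, name: str) -> Any:
        if name in self._cache:  # cache hit: skip recomputation
            return self._cache[name]
        func = self._functions[name]
        # Resolve each parameter by tapping the artifact with the same name.
        kwargs = {p: self.tap(p) for p in inspect.signature(func).parameters}
        self.calls[name] += 1
        self._cache[name] = func(**kwargs)
        return self._cache[name]


def raw_numbers():
    return [1, 2, 3]


def final_output(raw_numbers):
    return sum(raw_numbers)


runner = TinyPipeline({"raw_numbers": raw_numbers, "final_output": final_output})
first = runner.tap("final_output")   # computes raw_numbers, then final_output
second = runner.tap("final_output")  # served from the instance cache
```

After both calls, each function has run exactly once, and a second TinyPipeline instance would start with an empty cache of its own.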
Base data sources
Use @source to declare external data sources as the roots of your pipeline. The function body returns a File or Directory descriptor:
from freshet import source, File
@source
def raw_data():
    return File("data/raw.parquet")
File-mode outputs
For flows that produce files rather than Python objects, call flow_output() to allocate a cache path, write to it, and return the File:
import types
from freshet import Freshet, flow, flow_output, File
@flow
def plot_chart() -> File:
    out = flow_output(".png")
    # save_plot stands in for your own plotting code; it is not part of freshet
    save_plot([1, 2, 3], out.path)
    return out
pipeline = types.ModuleType("my_pipeline")
pipeline.plot_chart = plot_chart
f = Freshet(pipeline, cache_dir=".freshet")
result = f.tap("plot_chart") # returns a File pointing to the cached file
Auto-bridging
If a flow expects an in-memory type (e.g. pl.DataFrame) but receives a File, freshet auto-loads it based on the file extension:
import polars as pl
from freshet import source, flow, File
@source
def raw_trades():
    return File("data/raw/trades.parquet")

@flow
def cleaned(raw_trades: pl.DataFrame) -> pl.DataFrame:
    # raw_trades is auto-bridged: File → pl.read_parquet
    return raw_trades.filter(pl.col("price") > 0)
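The idea of choosing a loader from the file extension can be sketched with the standard library alone. The LOADERS table and the bridge function below are hypothetical names, and the CSV/JSON loaders are stand-ins for pl.read_parquet; how freshet actually selects its loaders is not shown here.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_json(path: Path):
    return json.loads(path.read_text(encoding="utf-8"))


def load_csv(path: Path):
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical loader table keyed by file extension (not freshet's registry).
LOADERS = {".json": load_json, ".csv": load_csv}


def bridge(path: Path):
    """Load a file into memory using the loader registered for its suffix."""
    try:
        loader = LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no loader registered for {path.suffix!r}") from None
    return loader(path)


with tempfile.TemporaryDirectory() as tmp:
    trades_path = Path(tmp) / "trades.csv"
    trades_path.write_text("symbol,price\nABC,10\nXYZ,-1\n", encoding="utf-8")
    rows = bridge(trades_path)  # the downstream code never opens the file itself
    positive = [row for row in rows if float(row["price"]) > 0]
```

An unknown extension raises a ValueError in this sketch, so a missing loader fails loudly instead of handing the raw path to the downstream function.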
DAG introspection
freshet infers a dependency graph from function argument names. If a @flow function takes a parameter named raw_data, and there's a registered artifact called raw_data, freshet records that edge:
artifacts = f.artifacts() # all registered artifacts
edges = f.edges() # list of (upstream, downstream) tuples
bases = f.bases() # only @source artifacts
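The name-matching rule described above can be sketched with inspect.signature. The infer_edges helper and the three example functions are hypothetical and only illustrate the rule; freshet's own graph construction may differ in detail.

```python
import inspect
from typing import Callable, Dict, List, Tuple


def infer_edges(artifacts: Dict[str, Callable]) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs from matching parameter names."""
    edges = []
    for downstream, func in artifacts.items():
        for param in inspect.signature(func).parameters:
            if param in artifacts:  # only names that match a registered artifact
                edges.append((param, downstream))
    return edges


def raw_data():
    return "data/raw.parquet"


def cleaned(raw_data):
    return raw_data


def report(cleaned, title="Q1"):
    return (title, cleaned)


edges = infer_edges({"raw_data": raw_data, "cleaned": cleaned, "report": report})
# edges == [("raw_data", "cleaned"), ("cleaned", "report")]
```

The title parameter produces no edge because no artifact is registered under that name.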
Custom serializers
By default, Polars DataFrames are serialized as Parquet and everything else uses pickle. You can register your own:
from freshet import Serializer, register_serializer
class MySerializer:
    key = "my_format"
    extension = "bin"

    def can_handle(self, value) -> bool: ...
    def save(self, value, path) -> None: ...
    def load(self, path): ...

register_serializer(MySerializer())
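As one possible way to fill in those method stubs, here is a JSON serializer with the same attribute and method names. It uses only the standard library and is exercised directly rather than through freshet. It assumes that path arrives as a str or pathlib.Path and that extension is written without a leading dot, as in the example above; neither assumption is confirmed by the text.

```python
import json
import tempfile
from pathlib import Path


class JsonSerializer:
    key = "json"
    extension = "json"

    def can_handle(self, value) -> bool:
        # Claims dicts and lists only; values nested inside must be JSON-encodable.
        return isinstance(value, (dict, list))

    def save(self, value, path) -> None:
        Path(path).write_text(json.dumps(value), encoding="utf-8")

    def load(self, path):
        return json.loads(Path(path).read_text(encoding="utf-8"))


# Round-trip check that does not involve freshet at all.
serializer = JsonSerializer()
payload = {"symbol": "ABC", "prices": [10.0, 10.5]}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / f"artifact.{serializer.extension}"
    serializer.save(payload, target)
    restored = serializer.load(target)
```

To use it in a pipeline, the instance would presumably be passed to register_serializer in the same way as MySerializer above.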
Configuration
f = Freshet(my_pipeline, cache_dir="/path/to/cache")
By default the cache lives at .freshet/ in the current working directory.
Cache management
f.clear_cache("my_function") # clear one function's cache
f.clear_cache() # clear everything
Visualization
widget = f.chart() # anywidget-compatible DAG visualization
Security
freshet uses pickle as a general-purpose serializer fallback. Treat cache directories as trusted input only.
To reduce the risk of accidental code execution, f.chart() does not unpickle cached artifacts to render graph details or previews by default. You can opt in by setting this environment variable:
FRESHET_UNSAFE_PICKLE_INSPECT=1
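The reason cache directories need to be trusted is that loading a pickle can call an arbitrary function chosen by whoever wrote the file. This short demonstration is standard pickle behaviour and is independent of freshet. The Payload class is a made-up example that evaluates a harmless arithmetic expression.

```python
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle to "rebuild" this object by calling eval("6 * 7").
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # the callable chosen by the pickle's author runs here
# result == 42: loading produced the return value of eval, not a Payload instance
```

A malicious file could name any importable callable in place of the arithmetic expression, which is why inspection of pickled artifacts is opt-in.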