barca
The invisible asset orchestrator.
Rust plans it. Python runs it. You just write functions.
Barca is an asset orchestrator that adds zero perceptible overhead to your Python pipelines. A compiled Rust binary handles parsing, DAG construction, and execution planning. Python does what it's best at: running your code.
# pipeline.py
from barca import asset
@asset()
def raw_data() -> list[dict]:
    return [{"x": 1}, {"x": 2}, {"x": 3}]

@asset(inputs={"data": raw_data})
def summary(data: list[dict]) -> dict:
    return {"count": len(data), "total": sum(d["x"] for d in data)}
$ barca get pipeline.py
{"elapsed_seconds":0.27,"final_output":{"count":3,"total":6},"phases":1,"run_id":"...","steps_executed":2}
No config files. No YAML. No daemon. Just functions and a fast binary.
Install
Barca is designed for use with uv:
uv add barca
This gives you:
- The barca CLI binary (compiled Rust)
- Python API: barca.get(), barca.plan()
- Decorator stubs for @asset, @sensor, @task (IDE autocomplete + type checking)
For optional parquet (DataFrame) support:
uv add 'barca[parquet]'
All in one wheel, built with maturin. Requires Python >= 3.12.
From source
git clone https://github.com/ExSidius/barca.git
cd barca
uv sync
cargo build --release
maturin develop --release # installs into current .venv
Quick start
# assets.py
from barca import asset
@asset()
def hello() -> dict:
    return {"message": "Hello from barca!"}
barca get assets.py
That's it. Barca parses your Python source with ruff's AST parser (no import, pure static analysis), builds a dependency graph, generates a phased execution plan, spawns Python workers, and persists results to a local SQLite database -- all in under 40ms for a trivial asset.
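The static-analysis step can be illustrated with Python's standard ast module standing in for ruff's parser. This is a sketch only, not barca's implementation: the helper name discover_assets is hypothetical, and it handles only the @asset(inputs={...}) call form used in this README.

```python
import ast

# The pipeline is held as text; it is parsed but never imported or executed.
SOURCE = '''
from barca import asset

@asset()
def raw_data() -> list[dict]:
    return [{"x": 1}]

@asset(inputs={"data": raw_data})
def summary(data: list[dict]) -> dict:
    return {"count": len(data)}
'''

def discover_assets(source: str) -> dict[str, list[str]]:
    """Map each @asset function name to the names of its upstream inputs."""
    found: dict[str, list[str]] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            # Only the call form @asset(...) is recognised in this sketch.
            is_asset_call = (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Name)
                and dec.func.id == "asset"
            )
            if not is_asset_call:
                continue
            deps: list[str] = []
            for kw in dec.keywords:
                if kw.arg == "inputs" and isinstance(kw.value, ast.Dict):
                    deps = [v.id for v in kw.value.values if isinstance(v, ast.Name)]
            found[node.name] = deps
    return found

print(discover_assets(SOURCE))
```

Because only the syntax tree is inspected, the sketch works even when barca itself is not importable in the analysing process.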
How it works
┌────────────────────────────────────────┐
│ barca get pipeline.py                  │
└──────────────┬─────────────────────────┘
               │
┌──────────────▼─────────────────────────┐
│ Rust binary (barca)                    │
│                                        │
│ 1. Parse Python source (ruff AST)      │
│ 2. Build DAG (petgraph)                │
│ 3. Generate execution plan             │
│ 4. Initialize DB (.barca/metadata.db)  │
│ 5. Spawn Python workers per phase      │
│ 6. Collect outputs, persist to DB      │
└──────────────┬─────────────────────────┘
               │
┌──────────────▼─────────────────────────┐
│ Python worker (per phase)              │
│                                        │
│ - Loads modules via importlib          │
│ - Executes steps in tier order         │
│ - LRU cache for in-process results     │
│ - Emits JSON lines to stdout           │
└────────────────────────────────────────┘
Key design decisions:
- Static analysis only -- Rust never imports your Python code. It parses source text and extracts decorator metadata from the AST.
- Phased execution -- The planner decomposes the DAG into sequential phases. Within each phase, independent streams run in parallel workers.
- No framework lock-in -- Decorators are identity functions. Your code runs standalone without barca installed.
- Single binary -- One pip install gives you everything. No JVM, no Docker, no scheduler service.
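Tier ordering of a dependency graph can be sketched with Python's standard graphlib module. This illustrates the general technique only: barca's planner is written in Rust on petgraph, and how it groups steps into phases and streams is not reproduced here. The tiers helper and the asset names are hypothetical.

```python
from graphlib import TopologicalSorter

def tiers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into tiers; every node in a tier depends only on earlier tiers."""
    sorter = TopologicalSorter(deps)  # maps node -> set of upstream nodes
    sorter.prepare()                  # raises graphlib.CycleError on a cycle
    result: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        result.append(ready)
        sorter.done(*ready)
    return result

# Hypothetical diamond-shaped pipeline.
deps = {
    "raw": set(),
    "clean": {"raw"},
    "stats": {"clean"},
    "plot": {"clean"},
    "report": {"stats", "plot"},
}
print(tiers(deps))
```

Nodes that share a tier have no dependencies on each other, which is what makes them candidates for parallel execution.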
Decorators
from barca import asset, sensor, task, sink, unsafe
from barca import Always, Manual, Schedule
from barca import partitions, partitions_from, collect, asset_ref
@asset
Cached computation node. The workhorse.
@asset()
def prices() -> dict:
    return {"AAPL": 150, "MSFT": 380}

@asset(inputs={"data": prices})
def report(data: dict) -> str:
    return f"Tracked {len(data)} tickers"
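The design notes above describe the decorators as identity functions. A minimal sketch of what such a decorator could look like follows; the asset defined here is a hypothetical stand-in written for illustration, not barca's actual stub.

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

def asset(**_metadata: Any) -> Callable[[F], F]:
    """Hypothetical stand-in: accept keyword metadata, return the function unchanged."""
    def decorate(fn: F) -> F:
        return fn  # no wrapping, so the function behaves exactly as written
    return decorate

@asset()
def prices() -> dict:
    return {"AAPL": 150, "MSFT": 380}

@asset(inputs={"data": prices})
def report(data: dict) -> str:
    return f"Tracked {len(data)} tickers"

# The decorated functions can still be called directly, like plain Python.
print(report(prices()))
```

Since the metadata lives only in the source text, a static parser can read it while the runtime function stays untouched.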
@sensor
Observes external state. Returns (update_detected, output).
from pathlib import Path

@sensor()
def inbox_files() -> tuple[bool, list[str]]:
    files = list(Path("inbox").glob("*.csv"))
    return bool(files), [str(f) for f in files]
@task
Workflow-management step — deploys, notifications, migrations, cache warming.
Always re-runs (never cached). May appear anywhere in the graph and may depend
on assets, sensors, or other tasks, but must not be an input to an asset or
sensor (that would poison caching). Run a task with barca run <task>.
# A task consuming an upstream asset (asset → task).
@task(inputs={"report": report})
def publish(report: str) -> None:
    print(f"Publishing: {report}")
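The rule that a task must not feed an asset or sensor is a simple check over graph edges. The sketch below assumes the graph is available as two plain dicts; the function name and data shapes are hypothetical and say nothing about how barca stores or validates its DAG.

```python
def invalid_task_edges(
    kinds: dict[str, str],
    inputs: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs where a task feeds an asset or sensor."""
    bad: list[tuple[str, str]] = []
    for node, upstreams in inputs.items():
        if kinds[node] == "task":
            continue  # tasks may depend on assets, sensors, or other tasks
        for up in upstreams:
            if kinds[up] == "task":
                bad.append((up, node))
    return bad

# Hypothetical graph: "audit" is an asset that wrongly consumes the "publish" task.
kinds = {"report": "asset", "publish": "task", "audit": "asset"}
inputs = {"publish": ["report"], "audit": ["publish"]}
print(invalid_task_edges(kinds, inputs))
```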
@sink
Declares a sink output target (stacks on @asset; file writing coming soon).
@asset()
@sink("output/data.json", serializer="json")
def my_data() -> dict:
    return {"rows": 42}
Freshness markers
| Marker | Behavior |
|---|---|
| Always | Auto-materializes whenever stale (default for @asset and @task) |
| Manual | Only runs on explicit refresh |
| Schedule("0 5 * * *") | Cron expression |
Partitions
Fan a single asset definition into N independent materializations:
@asset(partitions={"ticker": partitions(["AAPL", "MSFT", "GOOG"])})
def prices(ticker: str) -> dict:
    # get_price is your own lookup function; it is not provided by barca
    return {"ticker": ticker, "price": get_price(ticker)}
| Function | Purpose |
|---|---|
| partitions(values) | Static list of partition keys |
| partitions_from(source) | Derive partitions from upstream asset |
| collect(asset_fn) | Aggregate all partitions of an upstream |
| asset_ref(ref_string) | Canonical asset reference |
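Fanning one definition out into N materializations amounts to enumerating the partition keys. The sketch below shows that expansion with plain dicts; expand_partitions is a hypothetical helper, and taking the cross product when several partition dimensions are given is an assumption, not documented barca behavior.

```python
from itertools import product

def expand_partitions(name: str, partitions: dict[str, list[str]]) -> list[dict]:
    """List one materialization per combination of partition values."""
    keys = list(partitions)
    return [
        {"asset": name, "partition": dict(zip(keys, combo))}
        for combo in product(*(partitions[k] for k in keys))
    ]

runs = expand_partitions("prices", {"ticker": ["AAPL", "MSFT", "GOOG"]})
for run in runs:
    print(run)
```

Each resulting entry is independent of the others, so the three materializations could be cached and executed separately.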
CLI
barca get [target] <file.py> [file.py ...]    Get asset(s), cache-aware
barca plan <file.py> [file.py ...]            Emit execution plan as JSON
barca list <file.py> [file.py ...]            List all definitions with deps
barca history [--limit N]                     Show recent run history
barca stats <target> <file.py> ...            Show timing/cache stats for an asset
barca serve [file.py ...] [--port N]          Run the HTTP API server
barca --help                                  Show help
Shorthand: barca pipeline.py is equivalent to barca get pipeline.py (all assets).
Server
barca serve starts a long-running HTTP server that exposes the orchestrator as a
JSON API — for triggering runs programmatically, polling status, and (in the future)
a web UI. It binds to 127.0.0.1 by default (local only, no auth).
barca serve pipeline.py --port 8274 # default port 8274
barca serve pipeline.py --watch # dev mode: re-parse DAG on file change
Runs are async: POST returns a run_id immediately, then you poll /status/{run_id}.
curl localhost:8274/health # {"status":"ok","version":"0.2.1"}
curl localhost:8274/assets # list assets + deps
curl localhost:8274/plan # execution plan JSON
curl -XPOST localhost:8274/run # → {"run_id":"…"}; poll /status/<id>
curl -XPOST localhost:8274/get/summary # run a single target
curl localhost:8274/status/<run_id> # poll run status + result
See docs/server-api.md for the full endpoint reference.
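The trigger-then-poll flow can be driven from Python with the standard library. The endpoint paths and default port come from the examples above, but the shape of the /status response is not documented here, so the caller supplies the is_done predicate. The names run_and_wait and http_json are hypothetical.

```python
import json
import time
from typing import Callable
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:8274"  # default bind address and port from this README

def http_json(method: str, path: str) -> dict:
    """Send a request with no body and decode the JSON response."""
    req = Request(BASE + path, method=method)
    with urlopen(req) as resp:
        return json.load(resp)

def run_and_wait(
    is_done: Callable[[dict], bool],
    call: Callable[[str, str], dict] = http_json,
    interval: float = 0.5,
    max_polls: int = 120,
) -> dict:
    """POST /run, then poll /status/<run_id> until is_done(status) is true."""
    run_id = call("POST", "/run")["run_id"]
    for _ in range(max_polls):
        status = call("GET", f"/status/{run_id}")
        if is_done(status):  # completion criterion is an assumption left to the caller
            return status
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Passing a different call function makes the loop easy to exercise without a running server, for example in unit tests.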
Python API
import barca
# Get all assets in a file (returns the last asset's value)
value = barca.get("pipeline.py")
print(value) # {"count": 3, "total": 6}
# Get a specific asset's value (cache-aware)
value = barca.get("summary", "pipeline.py")
print(value) # {"count": 3, "total": 6}
# Inspect the execution plan
plan = barca.plan("pipeline.py")
print(plan["total_steps"]) # 2
All output formats work transparently: dicts, lists, sets, DataFrames, and arbitrary Python objects are serialized as JSON, pickle, or parquet and deserialized automatically.
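One way a serializer could choose between formats is to try JSON first and fall back to pickle for values JSON cannot represent. This is a sketch under that assumption; how barca actually selects a format is not stated in this README, and parquet is left out to keep the example standard-library only.

```python
import json
import pickle
from typing import Any

def serialize(value: Any) -> tuple[str, bytes]:
    """Return (format, payload), preferring JSON and falling back to pickle."""
    try:
        return "json", json.dumps(value).encode("utf-8")
    except (TypeError, ValueError):
        # e.g. sets and arbitrary objects are not JSON-serializable
        return "pickle", pickle.dumps(value)

def deserialize(fmt: str, payload: bytes) -> Any:
    """Invert serialize() using the recorded format tag."""
    if fmt == "json":
        return json.loads(payload.decode("utf-8"))
    if fmt == "pickle":
        return pickle.loads(payload)
    raise ValueError(f"unknown format: {fmt}")

for value in ({"count": 3, "total": 6}, {1, 2, 3}):
    fmt, payload = serialize(value)
    print(fmt, deserialize(fmt, payload) == value)
```

Recording the format next to the payload is what lets the reader side deserialize without guessing.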
barca plan -- inspect without running
$ barca plan pipeline.py
{
  "total_steps": 2,
  "phases": [
    {
      "reason": "Initial",
      "streams": [
        {
          "stream_id": "p0-w0",
          "steps": ["pipeline.py:raw_data", "pipeline.py:summary"]
        }
      ]
    }
  ]
}
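The plan is ordinary JSON, so it can be walked with a few lines of Python. The sketch below uses the sample output above as a literal string and relies only on the keys visible in it; steps_by_stream is a hypothetical helper.

```python
import json

# Sample plan copied from the output above.
PLAN_JSON = """
{
  "total_steps": 2,
  "phases": [
    {
      "reason": "Initial",
      "streams": [
        {
          "stream_id": "p0-w0",
          "steps": ["pipeline.py:raw_data", "pipeline.py:summary"]
        }
      ]
    }
  ]
}
"""

def steps_by_stream(plan: dict) -> dict[tuple[int, str], list[str]]:
    """Index steps by (phase number, stream id)."""
    return {
        (phase_no, stream["stream_id"]): stream["steps"]
        for phase_no, phase in enumerate(plan["phases"])
        for stream in phase["streams"]
    }

plan = json.loads(PLAN_JSON)
for (phase_no, stream_id), steps in steps_by_stream(plan).items():
    print(f"phase {phase_no} / {stream_id}: {', '.join(steps)}")
```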
barca get -- execute and get results
Parses source, builds DAG, spawns workers, collects outputs, persists to .barca/metadata.db. With a target, only the target's subgraph runs. Without a target, all assets run.
Output is a JSON summary:
{
  "run_id": "...",
  "elapsed_seconds": 0.27,
  "steps_executed": 2,
  "phases": 1,
  "final_output": {"count": 3, "total": 6}
}
Use --no-cache to skip cache lookups and execute everything fresh.
Diagnostics go to stderr:
[barca] 2/2 steps done in 0.0s
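Cache-aware execution generally rests on a stable key per step. The sketch below derives a SHA-256 key from a function's source text and the digests of its inputs, then consults an in-memory store. What barca actually hashes and where it keeps artifacts is not specified in this README, so every name and the key recipe here are assumptions.

```python
import hashlib
from typing import Callable

def cache_key(source_text: str, input_digests: dict[str, str]) -> str:
    """Hash a function's source text together with the digests of its named inputs."""
    h = hashlib.sha256()
    h.update(source_text.encode("utf-8"))
    for name in sorted(input_digests):  # sorted so the key does not depend on dict order
        h.update(name.encode("utf-8"))
        h.update(input_digests[name].encode("utf-8"))
    return h.hexdigest()

_store: dict[str, bytes] = {}  # in-memory stand-in for a persistent artifact store

def get_or_compute(
    key: str,
    compute: Callable[[], bytes],
    use_cache: bool = True,
) -> tuple[bytes, bool]:
    """Return (artifact, cache_hit); use_cache=False mimics a --no-cache run."""
    if use_cache and key in _store:
        return _store[key], True
    artifact = compute()
    _store[key] = artifact
    return artifact, False

upstream = hashlib.sha256(b"[1, 2, 3]").hexdigest()
key = cache_key("def summary(data): ...", {"data": upstream})
print(get_or_compute(key, lambda: b'{"count": 3}'))  # first call computes
print(get_or_compute(key, lambda: b'{"count": 3}'))  # second call is served from the store
```

Editing the function body or changing any upstream output yields a different key, which is what forces a recompute.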
Benchmarks
All benchmarks were measured with hyperfine (3 warmup runs, 10 measured runs) on the same machine. Barca is compared against Dagster and Prefect running equivalent pipelines.
Trivial (1 asset, zero work)
Measures pure framework overhead -- how long it takes to do nothing.
| Framework | Mean | Relative |
|---|---|---|
| barca | 38.0 ms | 1.00x |
| dagster | 538.1 ms | 14.2x |
| prefect | 3977.7 ms | 104.7x |
Barca's total overhead (parse + plan + spawn + persist) is 38ms. Dagster needs ~0.5s. Prefect needs ~4s.
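The Relative column is each framework's mean divided by barca's mean. A quick check of that arithmetic, using only the figures from the table:

```python
# Mean wall-clock times in milliseconds, copied from the table above.
means_ms = {"barca": 38.0, "dagster": 538.1, "prefect": 3977.7}

baseline = means_ms["barca"]
relative = {name: round(ms / baseline, 1) for name, ms in means_ms.items()}

for name, ratio in relative.items():
    print(f"{name:8s} {means_ms[name]:8.1f} ms  {ratio:6.1f}x")
```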
Benchmark suite
The benchmarks/ directory contains 12 scenarios covering a range of DAG topologies and workloads:
| Benchmark | Assets | Topology | What it tests |
|---|---|---|---|
| trivial | 1 | single node | Pure framework overhead |
| chain_100 | 100 | linear chain | Sequential dependency resolution |
| fan_out_500 | 500 | flat (independent) | Wide parallelism, process spawning |
| fan_out_500_50ms | 500 | flat + 50ms sleep | Parallelism under I/O latency |
| deep_diamond | 18 | diamond (5-wide, 6-deep) | Fan-out/fan-in patterns |
| wide_layers | varies | parallel layers | Tier-based parallel execution |
| large_payloads | varies | varied | JSON serialization overhead |
| map_reduce | varies | map-reduce | Scatter-gather pattern |
| mixed_io_cpu | varies | varied | Mixed I/O and CPU workloads |
| multi_file_discovery | varies | multi-file | Cross-file asset discovery |
| iris_pipeline | varies | diamond | ML pipeline (iris dataset) |
| spaceflights | 10 | diamond (3-wide, 6-deep) | Full ML pipeline (Kedro-style) |
Run any benchmark:
cd benchmarks/trivial
./bench.sh 10 # 10 measured runs
Each benchmark includes equivalent Dagster and Prefect implementations for apples-to-apples comparison.
Architecture
Cargo.toml                  Rust workspace root
crates/
  barca-core/               Engine: parser, DAG, planner, dispatch, DB, cache
  barca-cli/                Thin CLI shell (clap → barca-core)
python/barca/
  __init__.py               Decorator stubs + API exports
  api.py                    Python API (get/plan via subprocess)
  _worker.py                Execution worker (invoked by Rust binary)
  _artifacts.py             Artifact serialization (json/pickle/parquet)
  py.typed                  PEP 561 marker
pyproject.toml              Maturin build config
Tech stack
| Layer | Technology |
|---|---|
| Parser | ruff Python AST (static, no import) |
| DAG | petgraph |
| Database | Turso/libSQL (local SQLite) |
| Serialization | serde + serde_json |
| Hashing | SHA-256 (content-addressed artifacts) |
| Build | maturin (Rust binary + Python stubs in one wheel) |
| Python runtime | Python >= 3.12 |
Node kinds
| Kind | Decorator | Cached | Can be input to |
|---|---|---|---|
| asset | @asset() | Yes | assets, sensors, tasks |
| sensor | @sensor() | No | assets, sensors, tasks |
| task | @task() | No | tasks only (not assets/sensors) |
Development
git clone https://github.com/recursia-io/barca.git
cd barca
# Build
cargo build --release
maturin develop --release
# Test
cargo test
# Run
barca get examples/basic_app/example_project/assets.py
barca plan examples/basic_app/example_project/assets.py
Project status
Barca is in active development. The core pipeline (parse -> DAG -> plan -> execute -> persist) is working and benchmarked. See the guide for a walkthrough.
License