barca

The invisible asset orchestrator.
Rust plans it. Python runs it. You just write functions.


Barca is an asset orchestrator that adds zero perceptible overhead to your Python pipelines. A compiled Rust binary handles parsing, DAG construction, and execution planning. Python does what it's best at: running your code.

# pipeline.py
from barca import asset

@asset()
def raw_data() -> list[dict]:
    return [{"x": 1}, {"x": 2}, {"x": 3}]

@asset(inputs={"data": raw_data})
def summary(data: list[dict]) -> dict:
    return {"count": len(data), "total": sum(d["x"] for d in data)}

$ barca get pipeline.py
{"elapsed_seconds":0.27,"final_output":{"count":3,"total":6},"phases":1,"run_id":"...","steps_executed":2}

No config files. No YAML. No daemon. Just functions and a fast binary.

Install

Barca is designed for use with uv:

uv add barca

This gives you:

  • The barca CLI binary (compiled Rust)
  • Python API: barca.get(), barca.plan()
  • Decorator stubs for @asset, @sensor, @task (IDE autocomplete + type checking)

For optional parquet (DataFrame) support:

uv add 'barca[parquet]'

All in one wheel, built with maturin. Requires Python >= 3.12.

From source

git clone https://github.com/ExSidius/barca.git
cd barca
uv sync
cargo build --release
maturin develop --release    # installs into current .venv

Quick start

# assets.py
from barca import asset

@asset()
def hello() -> dict:
    return {"message": "Hello from barca!"}

barca get assets.py

That's it. Barca parses your Python source with ruff's AST parser (no import, pure static analysis), builds a dependency graph, generates a phased execution plan, spawns Python workers, and persists results to a local SQLite database -- all in under 40ms for a trivial asset.
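The static-analysis step can be illustrated with Python's standard-library `ast` module. This is only a sketch: barca does this work in Rust with ruff's parser, and the helper name `find_assets` is hypothetical. It shows that asset names and their declared inputs can be read from source text without importing or executing it.

```python
import ast

SOURCE = '''
from barca import asset

@asset()
def raw_data() -> list[dict]:
    return [{"x": 1}, {"x": 2}, {"x": 3}]

@asset(inputs={"data": raw_data})
def summary(data: list[dict]) -> dict:
    return {"count": len(data), "total": sum(d["x"] for d in data)}
'''


def find_assets(source: str) -> dict[str, list[str]]:
    """Map each @asset function name to the names of its declared inputs."""
    assets: dict[str, list[str]] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for deco in node.decorator_list:
            # Only the call form used in this README is handled: @asset(...)
            if not (
                isinstance(deco, ast.Call)
                and isinstance(deco.func, ast.Name)
                and deco.func.id == "asset"
            ):
                continue
            deps: list[str] = []
            for kw in deco.keywords:
                if kw.arg == "inputs" and isinstance(kw.value, ast.Dict):
                    deps = [v.id for v in kw.value.values if isinstance(v, ast.Name)]
            assets[node.name] = deps
    return assets


print(find_assets(SOURCE))
```

The source string is never executed, so `from barca import asset` inside it does not need to resolve.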

How it works

                    ┌─────────────────────────────────────┐
                    │          barca get pipeline.py       │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │         Rust binary (barca)          │
                    │                                      │
                    │  1. Parse Python source (ruff AST)   │
                    │  2. Build DAG (petgraph)              │
                    │  3. Generate execution plan           │
                    │  4. Initialize DB (.barca/metadata.db)│
                    │  5. Spawn Python workers per phase    │
                    │  6. Collect outputs, persist to DB    │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │      Python worker (per phase)       │
                    │                                      │
                    │  - Loads modules via importlib        │
                    │  - Executes steps in tier order       │
                    │  - LRU cache for in-process results   │
                    │  - Emits JSON lines to stdout         │
                    └─────────────────────────────────────┘
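
The worker box above can be approximated in a few lines. This is a sketch under stated assumptions, not barca's `_worker.py`: the function `run_steps`, the step table layout, and the JSON field names `step` and `output` are all hypothetical, and a plain dict stands in for the LRU cache.

```python
import json
from typing import Any, Callable

# Hypothetical step table: name -> (function, names of upstream steps).
Step = tuple[Callable[..., Any], list[str]]


def run_steps(
    steps: dict[str, Step],
    order: list[str],
    emit: Callable[[str], Any] = print,
) -> dict[str, Any]:
    """Run steps in the given order, reusing in-process results as inputs."""
    results: dict[str, Any] = {}
    for name in order:
        fn, deps = steps[name]
        results[name] = fn(*(results[d] for d in deps))
        # One JSON object per line; these field names are illustrative only.
        emit(json.dumps({"step": name, "output": results[name]}))
    return results


def raw_data():
    return [{"x": 1}, {"x": 2}, {"x": 3}]


def summary(data):
    return {"count": len(data), "total": sum(d["x"] for d in data)}


steps = {"raw_data": (raw_data, []), "summary": (summary, ["raw_data"])}
results = run_steps(steps, ["raw_data", "summary"])
```

Emitting one JSON object per line lets the parent process read results incrementally instead of waiting for the worker to exit.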

Key design decisions:

  • Static analysis only -- Rust never imports your Python code. It parses source text and extracts decorator metadata from the AST.
  • Phased execution -- The planner decomposes the DAG into sequential phases. Within each phase, independent streams run in parallel workers.
  • No framework lock-in -- Decorators are identity functions. Your code runs standalone without barca installed.
  • Single binary -- One pip install gives you everything. No JVM, no Docker, no scheduler service.
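
The idea behind phased execution can be sketched with the standard library's `graphlib`. This is an illustration only: barca builds its DAG with petgraph in Rust, and its phases and streams need not correspond one-to-one to the tiers computed here. The function name `tiers` and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter

# Node -> set of upstream nodes it depends on (a small diamond).
deps = {
    "raw": set(),
    "left": {"raw"},
    "right": {"raw"},
    "merged": {"left", "right"},
}


def tiers(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into tiers; every node's dependencies sit in earlier tiers."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose inputs are all done
        result.append(ready)
        sorter.done(*ready)
    return result


print(tiers(deps))
```

Nodes that share a tier have no dependencies on each other, so they are the candidates for parallel execution.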

Decorators

from barca import asset, sensor, task, sink, unsafe
from barca import Always, Manual, Schedule
from barca import partitions, partitions_from, collect, asset_ref

@asset

Cached computation node. The workhorse.

@asset()
def prices() -> dict:
    return {"AAPL": 150, "MSFT": 380}

@asset(inputs={"data": prices})
def report(data: dict) -> str:
    return f"Tracked {len(data)} tickers"
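
Because the decorators are described as identity functions, a decorated asset remains an ordinary callable. The stand-in below is a hypothetical re-implementation for illustration, not barca's own stub; it assumes the decorator only accepts keyword metadata and returns the function unchanged.

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def asset(**_metadata: Any) -> Callable[[F], F]:
    """Stand-in for a metadata-only decorator: returns the function unchanged."""
    def decorate(fn: F) -> F:
        return fn  # no wrapping, so calling fn behaves exactly as undecorated
    return decorate


@asset()
def prices() -> dict:
    return {"AAPL": 150, "MSFT": 380}


@asset(inputs={"data": prices})
def report(data: dict) -> str:
    return f"Tracked {len(data)} tickers"


print(report(prices()))
```

This is also what makes unit testing straightforward: call the function directly and pass the upstream value by hand.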

@sensor

Observes external state. Returns (update_detected, output).

from pathlib import Path

@sensor()
def inbox_files() -> tuple[bool, list[str]]:
    # Report an update whenever at least one CSV is waiting in inbox/.
    files = list(Path("inbox").glob("*.csv"))
    return bool(files), [str(f) for f in files]

@task

Workflow-management step — deploys, notifications, migrations, cache warming. Always re-runs (never cached). May appear anywhere in the graph and may depend on assets, sensors, or other tasks, but must not be an input to an asset or sensor (that would poison caching). Run a task with barca run <task>.

# A task consuming an upstream asset (asset → task).
@task(inputs={"report": report})
def publish(report: str) -> None:
    print(f"Publishing: {report}")

@sink

Declares a sink output target (stacks on @asset; file writing coming soon).

@asset()
@sink("output/data.json", serializer="json")
def my_data() -> dict:
    return {"rows": 42}

Freshness markers

Marker                  Behavior
Always                  Auto-materializes whenever stale (default for @asset and @task)
Manual                  Only runs on explicit refresh
Schedule("0 5 * * *")   Cron expression

Partitions

Fan a single asset definition into N independent materializations:

@asset(partitions={"ticker": partitions(["AAPL", "MSFT", "GOOG"])})
def prices(ticker: str) -> dict:
    # get_price is a placeholder for your own lookup function.
    return {"ticker": ticker, "price": get_price(ticker)}

Function                  Purpose
partitions(values)        Static list of partition keys
partitions_from(source)   Derive partitions from an upstream asset
collect(asset_fn)         Aggregate all partitions of an upstream asset
asset_ref(ref_string)     Canonical asset reference
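
The fan-out can be pictured as calling the same function once per partition key. The sketch below is a plain-Python illustration, not barca's implementation; the helpers `expand` and `materialize_all` are hypothetical, and combining several partition dimensions as a Cartesian product is an assumption that the text above does not confirm.

```python
from itertools import product
from typing import Any, Callable


def expand(partitions: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Return one keyword-argument dict per partition key combination."""
    names = list(partitions)
    return [dict(zip(names, combo)) for combo in product(*partitions.values())]


def materialize_all(fn: Callable[..., Any], partitions: dict[str, list[Any]]) -> list[Any]:
    """Call fn once per partition; each call is independent of the others."""
    return [fn(**kwargs) for kwargs in expand(partitions)]


def prices(ticker: str) -> dict:
    # Placeholder body: a real asset would look up the price for this ticker.
    return {"ticker": ticker, "price": None}


runs = materialize_all(prices, {"ticker": ["AAPL", "MSFT", "GOOG"]})
print(len(runs))
```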

CLI

barca get [target] <file.py> [file.py ...]  Get asset(s), cache-aware
barca plan <file.py> [file.py ...]          Emit execution plan as JSON
barca list <file.py> [file.py ...]          List all definitions with deps
barca history [--limit N]                   Show recent run history
barca stats <target> <file.py> ...          Show timing/cache stats for an asset
barca serve [file.py ...] [--port N]        Run the HTTP API server
barca --help                                Show help

Shorthand: barca pipeline.py is equivalent to barca get pipeline.py (all assets).

Server

barca serve starts a long-running HTTP server that exposes the orchestrator as a JSON API — for triggering runs programmatically, polling status, and (in the future) a web UI. It binds to 127.0.0.1 by default (local only, no auth).

barca serve pipeline.py --port 8274      # default port 8274
barca serve pipeline.py --watch          # dev mode: re-parse DAG on file change

Runs are async: POST returns a run_id immediately, then you poll /status/{run_id}.

curl localhost:8274/health                       # {"status":"ok","version":"0.2.1"}
curl localhost:8274/assets                       # list assets + deps
curl localhost:8274/plan                          # execution plan JSON
curl -XPOST localhost:8274/run                    # → {"run_id":"…"}; poll /status/<id>
curl -XPOST localhost:8274/get/summary            # run a single target
curl localhost:8274/status/<run_id>               # poll run status + result

See docs/server-api.md for the full endpoint reference.
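
The submit-then-poll flow can be driven from Python with the standard library alone. This is a minimal sketch: `http_json` and `wait_for_run` are hypothetical helper names, and because the shape of the `/status` payload is not shown here, the caller supplies the `is_finished` predicate instead of the code guessing at field names.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://127.0.0.1:8274"  # default bind address and port from above


def http_json(method: str, path: str) -> Any:
    """Send a request to the barca server and decode the JSON response."""
    req = urllib.request.Request(BASE_URL + path, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_run(
    run_id: str,
    is_finished: Callable[[Any], bool],
    fetch: Callable[[str, str], Any] = http_json,
    interval: float = 0.5,
    timeout: float = 60.0,
) -> Any:
    """Poll /status/<run_id> until is_finished(payload) is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch("GET", f"/status/{run_id}")
        if is_finished(payload):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
        time.sleep(interval)
```

With a server running, `http_json("POST", "/run")["run_id"]` yields the id to pass to `wait_for_run`. The right predicate depends on the status payload described in docs/server-api.md.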

Python API

import barca

# Get all assets in a file (returns the last asset's value)
value = barca.get("pipeline.py")
print(value)  # {"count": 3, "total": 6}

# Get a specific asset's value (cache-aware)
value = barca.get("summary", "pipeline.py")
print(value)  # {"count": 3, "total": 6}

# Inspect the execution plan
plan = barca.plan("pipeline.py")
print(plan["total_steps"])  # 2

All output formats work transparently: dicts, lists, sets, DataFrames, and arbitrary Python objects are serialized as JSON, pickle, or parquet and deserialized automatically.
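
One way such format selection can work is to try JSON first and fall back to pickle. The sketch below is an assumption about the general approach, not the logic in `_artifacts.py`; the function names are hypothetical and parquet is left out to keep it dependency-free.

```python
import json
import pickle
from typing import Any


def serialize(value: Any) -> tuple[str, bytes]:
    """Return (format, payload): JSON when possible, pickle otherwise."""
    try:
        return "json", json.dumps(value).encode("utf-8")
    except (TypeError, ValueError):
        # e.g. sets and most custom objects are not JSON-serializable
        return "pickle", pickle.dumps(value)


def deserialize(fmt: str, payload: bytes) -> Any:
    """Invert serialize() using the recorded format tag."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "pickle":
        return pickle.loads(payload)
    raise ValueError(f"unknown format: {fmt}")


for value in ({"count": 3, "total": 6}, {1, 2, 3}):
    fmt, payload = serialize(value)
    assert deserialize(fmt, payload) == value
    print(fmt)
```

A caveat of any JSON-first scheme is that JSON does not preserve every Python type: tuples come back as lists and non-string dict keys come back as strings.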

barca plan -- inspect without running

$ barca plan pipeline.py
{
  "total_steps": 2,
  "phases": [
    {
      "reason": "Initial",
      "streams": [
        {
          "stream_id": "p0-w0",
          "steps": ["pipeline.py:raw_data", "pipeline.py:summary"]
        }
      ]
    }
  ]
}
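
Since the plan is plain JSON, it is easy to inspect from a script. The sketch below walks the structure exactly as printed above; the helper name `steps_by_stream` is hypothetical, and real plans may carry fields that are not shown here.

```python
import json

# The plan printed above, as a JSON string.
PLAN_JSON = """
{
  "total_steps": 2,
  "phases": [
    {
      "reason": "Initial",
      "streams": [
        {
          "stream_id": "p0-w0",
          "steps": ["pipeline.py:raw_data", "pipeline.py:summary"]
        }
      ]
    }
  ]
}
"""


def steps_by_stream(plan: dict) -> list[tuple[int, str, list[str]]]:
    """Flatten a plan into (phase index, stream id, steps) rows."""
    rows = []
    for i, phase in enumerate(plan["phases"]):
        for stream in phase["streams"]:
            rows.append((i, stream["stream_id"], stream["steps"]))
    return rows


plan = json.loads(PLAN_JSON)
rows = steps_by_stream(plan)
# Sanity check: the per-stream steps add up to total_steps.
assert sum(len(steps) for _, _, steps in rows) == plan["total_steps"]
print(rows)
```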

barca get -- execute and get results

Parses source, builds DAG, spawns workers, collects outputs, persists to .barca/metadata.db. With a target, only the target's subgraph runs. Without a target, all assets run.

Output is a JSON summary:

{
  "run_id": "...",
  "elapsed_seconds": 0.27,
  "steps_executed": 2,
  "phases": 1,
  "final_output": {"count": 3, "total": 6}
}

Use --no-cache to skip cache lookups and execute everything fresh.

Diagnostics go to stderr:

[barca] 2/2 steps done in 0.0s
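
Because the summary goes to stdout and diagnostics go to stderr, a script can capture one without the other. The following is a sketch assuming the `barca` binary is on `PATH` and that `--no-cache` is accepted after the file argument; `parse_summary` and `barca_get` are hypothetical helper names, not part of barca's Python API.

```python
import json
import subprocess


def parse_summary(stdout: str) -> dict:
    """Decode the JSON summary that `barca get` prints on stdout."""
    summary = json.loads(stdout)
    if not isinstance(summary, dict):
        raise ValueError("expected a JSON object")
    return summary


def barca_get(path: str, no_cache: bool = False) -> dict:
    """Run `barca get <path>`; stderr stays attached so diagnostics remain visible."""
    cmd = ["barca", "get", path]
    if no_cache:
        cmd.append("--no-cache")  # flag position is an assumption
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    return parse_summary(proc.stdout)


# Parsing the sample output shown earlier in this README:
sample = (
    '{"elapsed_seconds":0.27,"final_output":{"count":3,"total":6},'
    '"phases":1,"run_id":"...","steps_executed":2}'
)
print(parse_summary(sample)["final_output"])
```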

Benchmarks

All benchmarks measured with hyperfine (3 warmup runs, 10 measured runs) on the same machine. Barca is compared against Dagster and Prefect running equivalent pipelines.

Trivial (1 asset, zero work)

Measures pure framework overhead -- how long it takes to do nothing.

Framework   Mean        Relative
barca       38.0 ms     1.00x
dagster     538.1 ms    14.2x
prefect     3977.7 ms   104.7x

Barca's total overhead (parse + plan + spawn + persist) is 38ms. Dagster needs ~0.5s. Prefect needs ~4s.
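
The Relative column is each mean divided by the fastest mean. A short check of that arithmetic, using only the numbers reported above:

```python
# Mean wall-clock times in milliseconds, copied from the table above.
means_ms = {"barca": 38.0, "dagster": 538.1, "prefect": 3977.7}

baseline = min(means_ms.values())
relative = {name: round(ms / baseline, 1) for name, ms in means_ms.items()}

for name, ratio in relative.items():
    print(f"{name:8s} {means_ms[name]:8.1f} ms  {ratio:.1f}x")
```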

Benchmark suite

The benchmarks/ directory contains 12 scenarios covering a range of DAG topologies and workloads:

Benchmark             Assets  Topology                  What it tests
trivial               1       single node               Pure framework overhead
chain_100             100     linear chain              Sequential dependency resolution
fan_out_500           500     flat (independent)        Wide parallelism, process spawning
fan_out_500_50ms      500     flat + 50ms sleep         Parallelism under I/O latency
deep_diamond          18      diamond (5-wide, 6-deep)  Fan-out/fan-in patterns
wide_layers           varies  parallel layers           Tier-based parallel execution
large_payloads        varies  varied                    JSON serialization overhead
map_reduce            varies  map-reduce                Scatter-gather pattern
mixed_io_cpu          varies  varied                    Mixed I/O and CPU workloads
multi_file_discovery  varies  multi-file                Cross-file asset discovery
iris_pipeline         varies  diamond                   ML pipeline (iris dataset)
spaceflights          10      diamond (3-wide, 6-deep)  Full ML pipeline (Kedro-style)

Run any benchmark:

cd benchmarks/trivial
./bench.sh 10    # 10 measured runs

Each benchmark includes equivalent Dagster and Prefect implementations for apples-to-apples comparison.

Architecture

Cargo.toml                  Rust workspace root
crates/
  barca-core/               Engine: parser, DAG, planner, dispatch, DB, cache
  barca-cli/                Thin CLI shell (clap → barca-core)
python/barca/
  __init__.py               Decorator stubs + API exports
  api.py                    Python API (get/plan via subprocess)
  _worker.py                Execution worker (invoked by Rust binary)
  _artifacts.py             Artifact serialization (json/pickle/parquet)
  py.typed                  PEP 561 marker
pyproject.toml              Maturin build config

Tech stack

Layer           Technology
Parser          ruff Python AST (static, no import)
DAG             petgraph
Database        Turso/libSQL (local SQLite)
Serialization   serde + serde_json
Hashing         SHA-256 (content-addressed artifacts)
Build           maturin (Rust binary + Python stubs in one wheel)
Python runtime  Python >= 3.12
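
Content addressing means an artifact's key is derived from its bytes, so identical outputs map to the same key. The sketch below shows the general technique with `hashlib`; how barca canonicalizes values before hashing is not stated here, so the JSON encoding and the name `content_key` are assumptions.

```python
import hashlib
import json


def content_key(value) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of value."""
    # sort_keys and fixed separators make equal dicts hash identically.
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


a = content_key({"count": 3, "total": 6})
b = content_key({"total": 6, "count": 3})
print(a == b, len(a))
```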

Node kinds

Kind    Decorator  Cached  Can be input to
asset   @asset()   Yes     assets, sensors, tasks
sensor  @sensor()  No      assets, sensors, tasks
task    @task()    No      tasks only (not assets/sensors)
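
The "Can be input to" rules amount to a simple edge check. The sketch below encodes the table as a lookup and reports violating edges; it is an illustration of the rule, not barca's validator, and the names `ALLOWED_CONSUMERS` and `invalid_edges` are hypothetical.

```python
# Which node kinds each kind may feed into, per the table above.
ALLOWED_CONSUMERS = {
    "asset": {"asset", "sensor", "task"},
    "sensor": {"asset", "sensor", "task"},
    "task": {"task"},
}


def invalid_edges(
    kinds: dict[str, str], edges: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return the (upstream, downstream) edges that break the input rules."""
    return [
        (up, down)
        for up, down in edges
        if kinds[down] not in ALLOWED_CONSUMERS[kinds[up]]
    ]


kinds = {"report": "asset", "publish": "task", "audit": "asset"}
edges = [("report", "publish"), ("publish", "audit")]
print(invalid_edges(kinds, edges))
```

Here `publish -> audit` is rejected because a task, which is never cached, would feed a cached asset.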

Development

git clone https://github.com/recursia-io/barca.git
cd barca

# Build
cargo build --release
maturin develop --release

# Test
cargo test

# Run
barca get examples/basic_app/example_project/assets.py
barca plan examples/basic_app/example_project/assets.py

Project status

Barca is in active development. The core pipeline (parse -> DAG -> plan -> execute -> persist) is working and benchmarked. See the guide for a walkthrough.

License

MIT
