# Accrue

**The enrichment pipeline engine.** A composable pipeline for structured data enrichment using LLMs.
Define a pipeline. Point it at your data. Get structured results. Accrue is a Python library for enriching datasets with LLMs. Compose multi-step pipelines, run them across hundreds to tens of thousands of rows, and get validated, structured output back -- with caching, retries, and parallel execution handled for you.
No platform. No markup. Just a pipeline you can version-control, iterate on, and reason about.
```python
from accrue import Pipeline, LLMStep

pipeline = Pipeline([
    LLMStep("analyze", fields={
        "market_size": "Estimate total addressable market in billions USD",
        "competition": {
            "prompt": "Rate competitive intensity with key competitors",
            "enum": ["Low", "Medium", "High"],
            "examples": ["High - Competes with AWS, Google Cloud"],
        },
        "growth_potential": {
            "prompt": "Assess 5-year growth trajectory",
            "type": "String",
            "format": "X% CAGR - reasoning",
        },
    })
])

result = pipeline.run(df)  # DataFrame in, DataFrame out
print(result.data.head())
print(f"Tokens used: {result.cost.total_tokens:,}")
```
## Install

Requires Python 3.10+.

```bash
pip install accrue
```

Set your API key:

```bash
export OPENAI_API_KEY=sk-...
```

That's it. OpenAI is the default provider (zero config, structured outputs auto-enabled). Anthropic and Google are optional:

```bash
pip install "accrue[anthropic]"  # Claude
pip install "accrue[google]"     # Gemini
```
## Claude Code Skill

If you use Claude Code, Accrue ships with a built-in `/accrue` skill that guides you through building pipelines interactively. It designs fields, picks models, estimates costs, and writes your script -- you just review and run.

```
> /accrue
> I have 500 companies in accounts.csv, I need to qualify them for ICP fit
```

The skill walks you through field design, model selection, pipeline architecture, and configuration before writing a production-ready script. See Using the Claude Code Skill for details.
## Why Accrue
You have a spreadsheet of companies, leads, or entities. You need structured fields added to every row -- classifications, summaries, scores, extracted data. You could write a for loop and call the OpenAI API, but then you're building retry logic, rate limiting, caching, progress tracking, and crash recovery. You could use Clay, but you'd pay $500/month for something you can't version-control.
Accrue is the pipeline between a single API call and a full platform:
| | Raw API calls | Accrue | Clay |
|---|---|---|---|
| Scope | One call at a time | Pipeline of steps across rows | Full SaaS platform |
| Multi-step | Manual orchestration | DAG with parallel execution | Sequential drag-and-drop |
| Caching | Build it yourself | SQLite, auto-invalidates on prompt change | Platform-managed |
| Crash recovery | Start over | Checkpoint + row-level cache resume | Platform-managed |
| Iterate on prompts | Re-run everything | Only re-process changed steps/rows | Re-run everything |
| Cost | API costs | API costs | $$$$/month + API costs |
| Version control | Yes | Yes | No |
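The "build it yourself" column is real work. Even the smallest piece of it, a retry wrapper with exponential backoff, is a non-trivial chunk of code. Here is a minimal sketch in plain Python (not part of Accrue) of what you end up writing before you've handled caching, rate limits, or crash recovery:

```python
import random
import time
from functools import wraps

def with_retries(max_attempts=5, base_delay=1.0):
    """Retry a flaky call with exponential backoff plus jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    # Sleep base, 2x base, 4x base, ... plus jitter.
                    time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
        return wrapper
    return decorator

calls = 0

@with_retries(max_attempts=3, base_delay=0.01)
def flaky_api_call():
    global calls
    calls += 1
    if calls < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(flaky_api_call())  # succeeds on the third attempt
```

Multiply this by caching, progress tracking, and checkpointing, and the for loop stops being simple.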
## Quick Example

Chain steps together with `depends_on`. Use `web_search()` to ground LLM answers in live data:

```python
from accrue import Pipeline, FunctionStep, LLMStep, web_search

pipeline = Pipeline([
    FunctionStep("research",
        fn=web_search("Research {company}: market position, competitors, recent news"),
        fields=["__web_context", "sources"],
    ),
    LLMStep("analyze",
        fields={
            "market_size": "Estimate TAM in billions USD",
            "competitors": {"prompt": "List top 3 competitors", "type": "List[String]"},
            "investment_thesis": "One-paragraph investment thesis",
        },
        depends_on=["research"],
    ),
])

result = pipeline.run(companies_df)
```
## Features

- **Multi-step pipelines** -- Chain LLM steps and function steps into a DAG with automatic dependency resolution and parallel execution. Quickstart
- **Provider-agnostic** -- OpenAI, Anthropic (with automatic prompt caching), and Google ship as adapters. Any OpenAI-compatible API works via `base_url`. Custom providers implement one async method. Providers guide
- **7-key field specs** -- Control LLM output with `prompt`, `type`, `format`, `enum`, `examples`, `bad_examples`, and `default`. Drives structured outputs and Pydantic validation automatically. Field specs guide
- **Caching and checkpointing** -- SQLite input-hash cache auto-invalidates on prompt changes. Checkpointing saves after each step for crash recovery. Caching guide
- **Batch API** -- `LLMStep(batch=True)` for 50% cost savings via OpenAI and Anthropic batch endpoints. Cache-aware, auto-chunking, realtime fallback on failures. Batch guide
- **Web search and grounding** -- `web_search()` factory for search-then-analyze pipelines, or `grounding=True` for native provider web search with normalized citations. Web search guide
- **Conditional steps** -- `run_if`/`skip_if` predicates for per-row branching. Skipped rows get defaults, never hit the API. Conditional steps guide
- **Hooks** -- Typed lifecycle events for observability. Sync and async callables, never crash the pipeline. Hooks guide
- **`provider_kwargs`** -- Escape hatch for provider-specific features (extended thinking, effort control, etc.) without waiting for first-class support.
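The input-hash caching idea is worth understanding even if you never configure it: because the cache key covers both the prompt and the row, editing a prompt naturally invalidates exactly the results it affects. A minimal sketch of that idea in plain Python with `sqlite3` (illustrative only, not Accrue's actual schema or code):

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, result TEXT)")

def cache_key(prompt: str, row: dict) -> str:
    # Hash prompt + row together: changing either produces a new key.
    payload = json.dumps({"prompt": prompt, "row": row}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_or_compute(prompt, row, compute):
    key = cache_key(prompt, row)
    hit = conn.execute("SELECT result FROM cache WHERE key = ?", (key,)).fetchone()
    if hit:
        return json.loads(hit[0]), True  # cache hit: no API call
    result = compute(prompt, row)
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, json.dumps(result)))
    return result, False

row = {"company": "Acme"}
_, hit1 = get_or_compute("Estimate TAM", row, lambda p, r: {"tam": "2B"})
_, hit2 = get_or_compute("Estimate TAM", row, lambda p, r: {"tam": "2B"})
_, hit3 = get_or_compute("Estimate TAM (v2)", row, lambda p, r: {"tam": "2B"})
print(hit1, hit2, hit3)  # False True False -- the edited prompt misses the cache
```

Same row, same prompt: cache hit. Same row, edited prompt: recomputed. That is the mechanism behind "only re-process changed steps/rows" in the comparison table.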
## Sweet Spot
Accrue is built for 100 to 50,000 rows -- too many for manual work or single-call tools, too few to justify big data infrastructure.
| Rows | Time (3 steps, 10 workers) | Cost (gpt-4.1-mini) |
|---|---|---|
| 100 | ~30s | ~$0.20 |
| 1,000 | ~5 min | ~$2 |
| 10,000 | ~50 min | ~$20 |
| 50,000 | ~50 min (50 workers) | ~$100 |
With `batch=True`, API costs are halved. Cached steps re-run in seconds.
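The table above is back-of-envelope arithmetic: roughly $0.002 per row for a 3-step pipeline on gpt-4.1-mini, halved on batch endpoints. The per-row figure is an assumed average, not a guarantee; actual cost depends on prompt and output length. As a sketch:

```python
# Assumed average cost per row for a 3-step pipeline on gpt-4.1-mini.
COST_PER_ROW = 0.002

def estimate_cost(rows: int, batch: bool = False) -> float:
    """Estimate pipeline cost in USD; batch endpoints cost 50% less."""
    per_row = COST_PER_ROW * (0.5 if batch else 1.0)
    return rows * per_row

for rows in (100, 1_000, 10_000, 50_000):
    print(f"{rows:>6} rows: ~${estimate_cost(rows):,.2f} "
          f"(batch: ~${estimate_cost(rows, batch=True):,.2f})")
```

Run it against your own per-row estimate before kicking off a large job.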
## Documentation
| Section | Description |
|---|---|
| Getting Started | Installation, first pipeline, core concepts |
| Claude Code Skill | Interactive pipeline builder via /accrue |
| Guides | Field specs, providers, caching, batch API, grounding, hooks, errors, configuration |
| Cookbook | End-to-end examples: company enrichment, lead scoring, content analysis, batch processing |
| API Reference | Complete reference for every public export |
## Contributing

```bash
git clone https://github.com/matt-house-e/accrue.git
cd accrue
pip install -e ".[dev]"
pytest
```
## License

MIT