
A composable pipeline engine for structured data enrichment using LLMs


Accrue
The enrichment pipeline engine.



Define a pipeline. Point it at your data. Get structured results. Accrue is a Python library for enriching datasets with LLMs. Compose multi-step pipelines, run them across hundreds to tens of thousands of rows, and get validated, structured output back -- with caching, retries, and parallel execution handled for you.

No platform. No markup. Just a pipeline you can version-control, iterate on, and reason about.

from accrue import Pipeline, LLMStep

pipeline = Pipeline([
    LLMStep("analyze", fields={
        "market_size": "Estimate total addressable market in billions USD",
        "competition": {
            "prompt": "Rate competitive intensity with key competitors",
            "enum": ["Low", "Medium", "High"],
            "examples": ["High - Competes with AWS, Google Cloud"],
        },
        "growth_potential": {
            "prompt": "Assess 5-year growth trajectory",
            "type": "String",
            "format": "X% CAGR - reasoning",
        },
    })
])

result = pipeline.run(df)  # DataFrame in, DataFrame out
print(result.data.head())
print(f"Tokens used: {result.cost.total_tokens:,}")

Install

Requires Python 3.10+.

pip install accrue

Set your API key:

export OPENAI_API_KEY=sk-...

That's it. OpenAI is the default provider (zero config, structured outputs auto-enabled). Anthropic and Google are optional:

pip install accrue[anthropic]  # Claude
pip install accrue[google]     # Gemini

Claude Code Skill

If you use Claude Code, Accrue ships with a built-in /accrue skill that guides you through building pipelines interactively. It designs fields, picks models, estimates costs, and writes your script -- you just review and run.

> /accrue
> I have 500 companies in accounts.csv, I need to qualify them for ICP fit

The skill walks you through field design, model selection, pipeline architecture, and configuration before writing a production-ready script. See Using the Claude Code Skill for details.

Why Accrue

You have a spreadsheet of companies, leads, or entities. You need structured fields added to every row -- classifications, summaries, scores, extracted data. You could write a for loop and call the OpenAI API, but then you're building retry logic, rate limiting, caching, progress tracking, and crash recovery. You could use Clay, but you'd pay $500/month for something you can't version-control.

Accrue is the pipeline between a single API call and a full platform:

|                    | Raw API calls          | Accrue                                    | Clay                    |
|--------------------|------------------------|-------------------------------------------|-------------------------|
| Scope              | One call at a time     | Pipeline of steps across rows             | Full SaaS platform      |
| Multi-step         | Manual orchestration   | DAG with parallel execution               | Sequential drag-and-drop |
| Caching            | Build it yourself      | SQLite, auto-invalidates on prompt change | Platform-managed        |
| Crash recovery     | Start over             | Checkpoint + row-level cache resume       | Platform-managed        |
| Iterate on prompts | Re-run everything      | Only re-process changed steps/rows        | Re-run everything       |
| Cost               | API costs              | API costs                                 | $$$$/month + API costs  |
| Version control    | Yes                    | Yes                                       | No                      |
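To make "build it yourself" concrete: the retry plumbing you end up writing around raw API calls looks roughly like the sketch below. This is not Accrue's internal implementation, just a minimal illustration of exponential backoff with jitter around an arbitrary callable:

```python
import random
import time

def call_with_retries(fn, max_attempts=5, base_delay=1.0):
    """Retry fn with exponential backoff and jitter -- the kind of
    plumbing a hand-rolled enrichment loop needs for rate limits
    and transient API errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted retries: surface the last error
            # Back off exponentially, with jitter to avoid retry stampedes.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
```

Multiply this by rate limiting, caching, progress tracking, and crash recovery, and the for loop stops being simple.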

Quick Example

Chain steps together with depends_on. Use web_search() to ground LLM answers in live data:

from accrue import Pipeline, FunctionStep, LLMStep, web_search

pipeline = Pipeline([
    FunctionStep("research",
        fn=web_search("Research {company}: market position, competitors, recent news"),
        fields=["__web_context", "sources"],
    ),
    LLMStep("analyze",
        fields={
            "market_size": "Estimate TAM in billions USD",
            "competitors": {"prompt": "List top 3 competitors", "type": "List[String]"},
            "investment_thesis": "One-paragraph investment thesis",
        },
        depends_on=["research"],
    ),
])

result = pipeline.run(companies_df)

Features

  • Multi-step pipelines -- Chain LLM steps and function steps into a DAG with automatic dependency resolution and parallel execution. Quickstart

  • Provider-agnostic -- OpenAI, Anthropic (with automatic prompt caching), and Google ship as adapters. Any OpenAI-compatible API works via base_url. Custom providers implement one async method. Providers guide

  • 7-key field specs -- Control LLM output with prompt, type, format, enum, examples, bad_examples, and default. Drives structured outputs and Pydantic validation automatically. Field specs guide

  • Caching and checkpointing -- SQLite input-hash cache auto-invalidates on prompt changes. Checkpointing saves after each step for crash recovery. Caching guide

  • Batch API -- LLMStep(batch=True) for 50% cost savings via OpenAI and Anthropic batch endpoints. Cache-aware, auto-chunking, realtime fallback on failures. Batch guide

  • Web search and grounding -- web_search() factory for search-then-analyze pipelines, or grounding=True for native provider web search with normalized citations. Web search guide

  • Conditional steps -- run_if / skip_if predicates for per-row branching. Skipped rows get defaults, never hit the API. Conditional steps guide

  • Hooks -- Typed lifecycle events for observability. Sync and async callables, never crash the pipeline. Hooks guide

  • provider_kwargs -- Escape hatch for provider-specific features (extended thinking, effort control, etc.) without waiting for first-class support.
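The input-hash caching idea behind the caching feature can be sketched in a few lines. This is an illustration of the concept, not Accrue's actual code: hash the prompt together with the row's inputs, and any change to either produces a new key, so stale cache entries are simply never hit.

```python
import hashlib
import json
import sqlite3

def input_hash(prompt, row):
    """Key the cache on prompt + row inputs; changing either
    changes the hash, which is what 'auto-invalidates on prompt
    change' means in practice."""
    payload = json.dumps({"prompt": prompt, "row": row}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, result TEXT)")

def cached_call(prompt, row, fn):
    key = input_hash(prompt, row)
    hit = conn.execute("SELECT result FROM cache WHERE key = ?", (key,)).fetchone()
    if hit:
        return json.loads(hit[0])  # cache hit: no API call
    result = fn(prompt, row)       # cache miss: call the LLM
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, json.dumps(result)))
    return result
```

Re-running a pipeline with an unchanged prompt replays rows from the cache; editing one step's prompt re-processes only that step.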

Sweet Spot

Accrue is built for 100 to 50,000 rows -- too many for manual work or single-call tools, too few to justify big data infrastructure.

| Rows   | Time (3 steps, 10 workers) | Cost (gpt-4.1-mini) |
|--------|----------------------------|---------------------|
| 100    | ~30s                       | ~$0.20              |
| 1,000  | ~5 min                     | ~$2                 |
| 10,000 | ~50 min                    | ~$20                |
| 50,000 | ~50 min (50 workers)       | ~$100               |

With batch=True, halve the API costs. Cached steps re-run in seconds.
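The table follows a simple throughput model: total calls = rows × steps, divided across workers. The per-call latency and cost below are assumed round numbers chosen to reproduce the table, not measurements:

```python
def estimate(rows, steps=3, workers=10, secs_per_call=1.0, usd_per_call=0.000667):
    """Back-of-the-envelope time/cost model for an enrichment run.
    Per-call numbers are illustrative assumptions."""
    calls = rows * steps
    minutes = calls * secs_per_call / workers / 60
    return minutes, calls * usd_per_call

minutes, cost = estimate(1_000)  # ~5 minutes, ~$2, matching the table
```

Scaling workers with rows is why 50,000 rows at 50 workers takes about as long as 10,000 at 10.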

Documentation

| Section           | Description                                                                          |
|-------------------|--------------------------------------------------------------------------------------|
| Getting Started   | Installation, first pipeline, core concepts                                          |
| Claude Code Skill | Interactive pipeline builder via /accrue                                             |
| Guides            | Field specs, providers, caching, batch API, grounding, hooks, errors, configuration  |
| Cookbook          | End-to-end examples: company enrichment, lead scoring, content analysis, batch processing |
| API Reference     | Complete reference for every public export                                           |

Contributing

git clone https://github.com/matt-house-e/accrue.git
cd accrue
pip install -e ".[dev]"
pytest

License

MIT
