
ragfactory

Generate production-ready RAG pipelines from a single config file.



You have a document corpus. You need a production-grade RAG system. You don't want to spend three days wiring together chunkers, embedders, vector DBs, rerankers, and LLMs — only to discover on day four that you chose the wrong retrieval strategy.

ragfactory solves this. Describe your pipeline in a single YAML file. Get back a fully-wired, Dockerised Python project with every component integrated, validated, and ready to run.

pip install ragfactory
ragfactory init --name my-rag --vector-db qdrant --llm anthropic --output ./my-rag

That's it. Eight files. Zero boilerplate. Ship in minutes, not days.


What gets generated

Every ragfactory generate produces a complete, standalone Python project:

| File | Description |
|---|---|
| pipeline.py | Query pipeline — retrieval, reranking, generation, fully wired |
| ingestion.py | Indexing pipeline — load, chunk, embed, upsert |
| config.yaml | Serialized, reloadable copy of your exact config |
| pyproject.toml | Pinned dependencies for every chosen component |
| .env.example | Required API keys, pre-filled with the right variable names |
| Dockerfile | Container-ready, production base image |
| docker-compose.yml | Spins up vector DB + app in one command |
| README.md | Component-specific setup guide for your exact combination |

The generated code is not a wrapper. It's idiomatic Python that calls the upstream SDKs directly — no ragfactory runtime dependency at execution time.
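To show the shape of what gets generated, here is a schematic of the query pipeline's wiring with plain-function stand-ins. The real output calls the chosen SDKs (LangChain/LlamaIndex, the vector DB client, the LLM provider); every name and docstring below is illustrative, not the actual generated code.

```python
# Schematic of the generated pipeline.py wiring: retrieve -> rerank -> generate.
# All three functions are illustrative stand-ins for real SDK calls.

def retrieve(query, top_k=20):
    # Stand-in for hybrid retrieval against the vector DB.
    corpus = ["doc about refunds", "doc about shipping", "doc about returns"]
    return corpus[:top_k]

def rerank(query, docs, top_n=5):
    # Stand-in for a reranker: score candidates, keep the best top_n.
    scored = sorted(docs, key=lambda d: sum(w in d for w in query.split()),
                    reverse=True)
    return scored[:top_n]

def generate(query, context):
    # Stand-in for the LLM call: the generated code sends query + context.
    return f"Answer to {query!r} using {len(context)} sources"

def answer(query):
    docs = retrieve(query)
    top = rerank(query, docs)
    return generate(query, top)

print(answer("refund policy"))
```

The generated project keeps the same three-stage structure, but each stage is a direct call into the upstream library for the components you selected.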


Installation

pip install ragfactory

Requirements: Python 3.11+


Quick start

Option A — Interactive wizard

ragfactory init \
  --name customer-support-rag \
  --vector-db qdrant \
  --embedding voyage \
  --llm anthropic \
  --output ./customer-support-rag

Option B — Config-driven generation

# pipeline.yaml
name: customer-support-rag
framework: langchain

indexing:
  chunking:
    type: contextual          # LLM-prepended context — +49-67% recall (Anthropic, 2024)
    chunk_size: 512
    chunk_overlap: 50
    context_model: gpt-4o-mini
  embedding:
    type: voyage              # Best MTEB 2024 retrieval benchmarks
    model: voyage-3-large
  vector_db:
    type: qdrant              # 8500-12000 QPS, best metadata filtering
    url: http://localhost:6333
    collection_name: support-docs

retrieval:
  type: hybrid_rrf            # BM25 + dense via Reciprocal Rank Fusion — +15-30% recall
  top_k: 20
  rrf_k: 60

post_retrieval:
  reranker:
    type: cohere              # API reranker, no GPU required
    top_n: 5

generation:
  llm:
    type: anthropic
    model: claude-sonnet-4-6
    temperature: 0.1

ragfactory generate --config pipeline.yaml --output ./customer-support-rag
cd customer-support-rag
pip install -e .

Option C — Save config, generate later

# Scaffold the YAML without generating code
ragfactory init --name my-rag --save-config ./my-rag.yaml

# Edit my-rag.yaml, then generate
ragfactory generate --config ./my-rag.yaml --output ./my-rag

CLI reference

ragfactory generate

Generate a complete pipeline project from a config file.

ragfactory generate --config pipeline.yaml --output ./my-pipeline

# Skip compatibility validation (advanced use)
ragfactory generate --config pipeline.yaml --output ./my-pipeline --force

| Flag | Description |
|---|---|
| --config | Path to YAML config file (required) |
| --output | Output directory (default: the config's name field) |
| --force | Skip compatibility validation, generate anyway |

ragfactory validate

Validate a config file without generating anything. Surfaces incompatibilities, warnings, and cost alerts.

ragfactory validate --config pipeline.yaml

Exit codes:

  • 0 — Config is valid (warnings may still appear)
  • 1 — Config has incompatible components; generation is blocked

Example output:

✓  Config is valid

  WARN_CHROMADB        ChromaDB is limited to ~7M vectors — prototyping only
  WARN_CONTEXTUAL      Contextual chunking costs ~$1.02/M input tokens

ragfactory init

Scaffold a new pipeline with smart defaults and automatic compatibility enforcement.

ragfactory init \
  --name my-pipeline \
  --framework langchain \
  --vector-db qdrant \
  --embedding openai \
  --chunking recursive \
  --retrieval hybrid_rrf \
  --reranker cohere \
  --llm anthropic \
  --output ./my-pipeline

Auto-retrieval logic: ragfactory picks the best retrieval strategy for your vector DB automatically.

  • chromadb / pinecone → dense (no sparse index support)
  • qdrant / weaviate / milvus / pgvector → hybrid_rrf

You can always override with --retrieval.

| Flag | Default | Description |
|---|---|---|
| --name | required | Pipeline name (lowercase, alphanumeric, hyphens) |
| --framework | langchain | langchain or llamaindex |
| --vector-db | qdrant | Vector database |
| --embedding | openai | Embedding model provider |
| --chunking | recursive | Chunking strategy |
| --retrieval | auto | Retrieval strategy |
| --reranker | none | Reranker (optional) |
| --llm | openai | LLM provider |
| --output | ./name | Output directory |
| --save-config | – | Write YAML only, skip code generation |

ragfactory options

Explore all available components and their descriptions.

# All components
ragfactory options

# Filter by category
ragfactory options --component embedding
ragfactory options --component retrieval

# Machine-readable JSON
ragfactory options --json

Available categories: chunking, embedding, vectordb, retrieval, reranker, llm


Component matrix

Chunking strategies

| Strategy | Best for | Notes |
|---|---|---|
| recursive | General purpose | Production default. Hierarchical separators. |
| fixed | Uniform corpora | Simple baseline. Fixed token windows. |
| semantic | Topic-shifting docs | Embedding-based breakpoints. Handles topic drift. |
| contextual | High-recall pipelines | LLM-prepended context per chunk. +49–67% recall (Anthropic, 2024). Costs ~$1.02/M tokens. |
| late | Long-form retrieval | Jina Late Chunking. Token-level pooling after full-doc encoding. Requires Jina embeddings. |
| page_level | Structured PDFs | One chunk per page. Good for dense reference material. |
| proposition | High-precision facts | Extract atomic propositions before indexing. Higher indexing cost. |
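The chunk_size / chunk_overlap settings shared by these strategies work as a sliding window. A minimal sketch of the fixed strategy, where a "token" is simplified to a whitespace-separated word (real chunkers count model tokens):

```python
# Minimal fixed-window chunker illustrating chunk_size / chunk_overlap.
# Simplification: a "token" here is a whitespace-separated word.

def fixed_chunks(text, chunk_size=512, chunk_overlap=50):
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap          # each window starts `step` tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # final window reached the end
            break
    return chunks

doc = " ".join(f"tok{i}" for i in range(1200))
chunks = fixed_chunks(doc, chunk_size=512, chunk_overlap=50)
print(len(chunks))   # windows start at tokens 0, 462, 924
```

The other strategies differ in where they place the breakpoints (separators, embedding drift, page boundaries), not in this basic size/overlap bookkeeping.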

Embedding models

| Model | Dims | Highlights |
|---|---|---|
| openai | 1536 | text-embedding-3-small/large. Fast, reliable, great default. |
| cohere | 1024 | embed-v4.0. Multilingual. Separate input types for query/document. |
| voyage | 1024 | voyage-3-large. Top MTEB 2024 retrieval benchmarks. |
| gemini | 768 | text-embedding-004. Google ecosystem integration. |
| bge_m3 | 1024 | BAAI/BGE-M3. Self-hosted. Dense + sparse + ColBERT in one model. |
| nomic | 768 | nomic-embed-text-v1.5. Self-hosted or API. Configurable dims: 64–768. |
| jina | 1024 | jina-embeddings-v3. 8192-token context. Native late chunking support. |
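The Dims column matters because a vector DB collection is created with a fixed dimensionality, and dense retrieval scores vectors by cosine similarity. A stdlib-only sketch (the error message is illustrative):

```python
import math

# Cosine similarity, the scoring behind dense retrieval. Vectors must share
# the collection's dimensionality (e.g. 1536 for openai, 1024 for voyage) --
# switching embedding providers means re-creating the collection.

def cosine(a, b):
    if len(a) != len(b):
        raise ValueError("dimension mismatch: re-create the collection "
                         "when switching embedding providers")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 3))   # identical direction -> 1.0
```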

Vector databases

| DB | Scale | Highlights |
|---|---|---|
| chromadb | < 7M vectors | Embedded, in-process. Zero infra. Prototyping only. |
| qdrant | 100M+ | Production default. Rust. 8500–12000 QPS. Best metadata filtering. |
| pinecone | 1.4B+ | Serverless. Zero-ops. Pay-per-query. |
| weaviate | 100M+ | Native hybrid search. Multi-tenancy. GraphQL + REST. |
| milvus | 1B+ | Distributed. GPU-accelerated indexing (CAGRA). Enterprise-grade. |
| pgvector | 10M+ | PostgreSQL extension. Best for teams with an existing Postgres stack. |

Retrieval strategies

| Strategy | Recall delta | Notes |
|---|---|---|
| dense | Baseline | Pure vector similarity. Fastest. |
| hybrid_rrf | +15–30% | BM25 + dense via Reciprocal Rank Fusion. Keyword + semantic. |
| hybrid_weighted | +10–25% | BM25 + dense with tunable alpha weight. More control than RRF. |
| small_to_big | +5–15% | Retrieve child chunks, return parent context. Better answer coherence. |
| sentence_window | +5–10% | Retrieve sentence, expand to window. Native in LlamaIndex; approximated in LangChain. |

Compatibility note: hybrid_rrf and hybrid_weighted require sparse vector support. ChromaDB and Pinecone fall back to dense automatically. sentence_window is native in LlamaIndex. LangChain output uses a sentence retrieval plus context-window approximation.
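Reciprocal Rank Fusion, the algorithm behind hybrid_rrf, needs only the rank positions from each retriever: every document earns 1 / (rrf_k + rank) per list it appears in, and fused order is by summed score. A self-contained sketch (document IDs are made up):

```python
# Reciprocal Rank Fusion: each retriever contributes 1 / (rrf_k + rank)
# per document; summed scores determine the fused ranking.

def rrf_fuse(rankings, rrf_k=60):
    scores = {}
    for ranking in rankings:                      # one ranked id-list per retriever
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25  = ["d3", "d1", "d7"]    # keyword (sparse) ranking
dense = ["d1", "d7", "d9"]    # vector (dense) ranking
print(rrf_fuse([bm25, dense]))  # ['d1', 'd7', 'd3', 'd9']
```

Documents that appear in both lists (d1, d7) outrank documents that top only one list (d3), which is why the fusion tends to lift recall over either retriever alone. The rrf_k constant (default 60) damps the gap between adjacent ranks.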

Rerankers

| Reranker | GPU required | Notes |
|---|---|---|
| cohere | No | Cohere Rerank API. Fast, production-ready, no infra. |
| cross_encoder | Recommended | Best quality. Cross-attention scoring. GPU needed for production throughput. |
| colbert | No | RAGatouille ColBERT. Token-level interaction. Requires 6–10× disk space. |
| flashrank | No | Fastest local reranker. CPU-friendly. Best for latency-first setups. |

LLM providers

| Provider | Models | Default / notes |
|---|---|---|
| openai | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo | gpt-4o-mini |
| anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 | claude-sonnet-4-6 |
| cohere | command-r-plus, command-r, command | Grounding-optimised, inline citation support |
| ollama | Any Ollama model | Local inference, zero API cost |

Frameworks

Both LangChain and LlamaIndex are fully supported. The generated code uses each framework's native APIs and idioms — not a shared abstraction layer on top of them.

framework: langchain    # or: llamaindex

| Feature | LangChain | LlamaIndex |
|---|---|---|
| LCEL pipeline chains | ✓ | – |
| EnsembleRetriever (hybrid) | ✓ | – |
| Small-to-big retrieval | ✓ | ✓ |
| Sentence Window Retrieval (native) | – | ✓ |
| Node post-processors | – | ✓ |
| Streaming support | ✓ | ✓ |

Full config reference

name: my-pipeline                    # required; lowercase, alphanumeric, hyphens; 1-64 chars
version: "1.0"                       # config schema version (default: "1.0")
framework: langchain                 # langchain | llamaindex

# ── Indexing ──────────────────────────────────────────────────────────────────
indexing:
  chunking:
    type: recursive                  # fixed | recursive | semantic | contextual | late | page_level | proposition
    chunk_size: 512                  # tokens per chunk
    chunk_overlap: 50                # overlap between chunks

  embedding:
    type: openai                     # openai | cohere | voyage | gemini | bge_m3 | nomic | jina
    model: text-embedding-3-small    # provider-specific model name (optional)

  vector_db:
    type: qdrant                     # chromadb | qdrant | pinecone | weaviate | milvus | pgvector
    url: http://localhost:6333       # for qdrant, weaviate, milvus
    collection_name: my-collection

# ── Pre-retrieval (optional) ──────────────────────────────────────────────────
pre_retrieval:
  query_rewriting:
    enabled: true
    strategy: multi_query            # multi_query | sub_question | step_back
    num_rewrites: 3
  hyde:
    enabled: true
    num_hypotheses: 3

# ── Retrieval ─────────────────────────────────────────────────────────────────
retrieval:
  type: hybrid_rrf                   # dense | hybrid_rrf | hybrid_weighted | small_to_big | sentence_window
  top_k: 20
  rrf_k: 60                          # hybrid_rrf only (default: 60)

# ── Post-retrieval (optional) ─────────────────────────────────────────────────
post_retrieval:
  reranker:
    type: cohere                     # cohere | cross_encoder | colbert | flashrank
    top_n: 5
  context_assembly:
    ordering: relevance_first        # relevance_first | chronological | reverse_relevance
    max_sources: 5
    source_attribution: true

# ── Generation ────────────────────────────────────────────────────────────────
generation:
  llm:
    type: anthropic                  # openai | anthropic | cohere | ollama
    model: claude-sonnet-4-6
    temperature: 0.1
    max_tokens: 1024

# ── Evaluation (optional) ─────────────────────────────────────────────────────
evaluation:
  framework: ragas                   # ragas | deepeval | both
  metrics:
    - faithfulness
    - answer_relevancy
    - context_precision
  num_test_cases: 50
  pass_threshold: 0.7

Compatibility validation

ragfactory validates every config before generating code. Incompatible combinations are blocked with a clear error code. Warnings surface cost and performance risks without blocking generation.

Blocked combinations

| Error code | Meaning |
|---|---|
| INCOMPAT_HYBRID_RRF_CHROMADB | ChromaDB has no sparse index — hybrid RRF requires it |
| INCOMPAT_HYBRID_WEIGHTED_CHROMADB | Same constraint for weighted hybrid |
| INCOMPAT_HYBRID_RRF_PINECONE | Hybrid RRF not reliably exposed in Pinecone integrations |
| INCOMPAT_LATE_OPENAI | Late chunking requires Jina embeddings, not OpenAI |
| INCOMPAT_LATE_BGE_M3 | Late chunking requires Jina embeddings, not BGE-M3 |
| UNSUPPORTED_ADVANCED_FLARE | FLARE config is parsed but code generation is not implemented in this release |
| UNSUPPORTED_ADVANCED_CRAG | CRAG config is parsed but code generation is not implemented in this release |
| UNSUPPORTED_ADVANCED_AGENTIC | Agentic RAG config is parsed but code generation is not implemented in this release |
| LATE_CHUNKING_FLAG_NOT_SET | chunking.type=late but embedding.late_chunking=false |

Warnings

| Warning code | Meaning |
|---|---|
| WARN_CHROMADB | ChromaDB caps at ~7M vectors — prototyping only |
| WARN_CONTEXTUAL | Contextual chunking costs ~$1.02/M input tokens |
| WARN_SENTENCE_WINDOW | LangChain sentence-window retrieval is generated as an approximation |
| WARN_CROSS_ENCODER | Cross-encoder reranker needs GPU for production throughput |
| WARN_BGE_M3 | BGE-M3 is self-hosted; CPU inference is 10–50× slower than GPU |
| WARN_HYDE | HyDE adds 1 extra LLM call per query (~100ms extra latency) |
| WARN_COLBERT | ColBERT requires 6–10× the disk space of a dense index |
| CONTEXTUAL_CHUNKING_EXTRA_API_KEY | Context model uses a different provider — extra API key required |
| RERANKER_TOP_N_EXCEEDS_TOP_K | reranker.top_n >= retrieval.top_k — likely misconfiguration |
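The validation logic boils down to cross-field checks over the parsed config. A sketch covering a small subset of the rules above (the function and config shape are simplified stand-ins, not ragfactory internals):

```python
# Sketch of cross-field compatibility checks like those listed above.
# Illustrative subset only; the real validator covers many more rules.

def validate(config):
    errors, warnings = [], []
    vdb = config["indexing"]["vector_db"]["type"]
    retrieval = config["retrieval"]["type"]

    if vdb == "chromadb" and retrieval == "hybrid_rrf":
        errors.append("INCOMPAT_HYBRID_RRF_CHROMADB")   # blocks generation
    if vdb == "chromadb":
        warnings.append("WARN_CHROMADB")                # surfaced, not blocking

    reranker = config.get("post_retrieval", {}).get("reranker")
    if reranker and reranker["top_n"] >= config["retrieval"]["top_k"]:
        warnings.append("RERANKER_TOP_N_EXCEEDS_TOP_K")
    return errors, warnings

cfg = {
    "indexing": {"vector_db": {"type": "chromadb"}},
    "retrieval": {"type": "hybrid_rrf", "top_k": 20},
    "post_retrieval": {"reranker": {"type": "cohere", "top_n": 5}},
}
print(validate(cfg))  # (['INCOMPAT_HYBRID_RRF_CHROMADB'], ['WARN_CHROMADB'])
```

Any non-empty error list maps to exit code 1 (generation blocked); warnings alone leave exit code 0.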

Four validated starting points

Prototype — Zero infrastructure

Runs locally with no external services. Great for experimentation.

name: prototype
framework: langchain
indexing:
  chunking: {type: recursive}
  embedding: {type: openai}
  vector_db: {type: chromadb}
retrieval: {type: dense}
generation:
  llm: {type: openai}

Production — Best performance per dollar

Full hybrid retrieval, reranking, contextual chunking.

name: production
framework: langchain
indexing:
  chunking: {type: contextual, context_model: gpt-4o-mini}
  embedding: {type: voyage, model: voyage-3-large}
  vector_db: {type: qdrant, url: http://localhost:6333, collection_name: docs}
retrieval: {type: hybrid_rrf, top_k: 20}
post_retrieval:
  reranker: {type: cohere, top_n: 5}
generation:
  llm: {type: anthropic, model: claude-sonnet-4-6}

Serverless — Zero operations

Fully managed, infinitely scalable. Pay-per-query with no infra to maintain.

name: serverless
framework: langchain
indexing:
  chunking: {type: recursive}
  embedding: {type: openai}
  vector_db: {type: pinecone, index_name: my-index, environment: us-east-1}
retrieval: {type: dense, top_k: 10}
generation:
  llm: {type: openai, model: gpt-4o}

Air-gapped — Fully local

Zero API calls. Everything runs on your hardware. Privacy-first.

name: local
framework: langchain
indexing:
  chunking: {type: recursive}
  embedding: {type: bge_m3}
  vector_db: {type: chromadb}
retrieval: {type: dense}
generation:
  llm: {type: ollama, model: llama3.2, base_url: http://localhost:11434}

Environment setup

Each generated project includes a .env.example pre-filled for your exact component selection. Copy it, fill in your keys, and run.

cp .env.example .env

| Variable | Required for |
|---|---|
| OPENAI_API_KEY | openai embedding or LLM |
| ANTHROPIC_API_KEY | anthropic LLM |
| COHERE_API_KEY | cohere embedding, reranker, or LLM |
| VOYAGE_API_KEY | voyage embedding |
| PINECONE_API_KEY | pinecone vector DB |
| WEAVIATE_API_KEY | weaviate cloud |
| GOOGLE_API_KEY | gemini embedding |
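A generated project can fail fast at startup if a required key is absent. A minimal sketch (the check_env helper is hypothetical, not part of the generated code; variable names come from the table above):

```python
import os

# Fail fast if a required key from .env is missing. Which names are required
# depends on the components you selected.

def check_env(required):
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")

os.environ["ANTHROPIC_API_KEY"] = "sk-demo"   # demo value only
check_env(["ANTHROPIC_API_KEY"])
print("env OK")
```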

Architecture

ragfactory/
├── models/         Pydantic v2 config schemas — every component, every option
├── core/
│   ├── validator.py    Cross-field compatibility rules (errors + warnings + cost alerts)
│   └── generator.py    Jinja2 template renderer → fully-wired Python project
├── templates/
│   ├── stages/         One .j2 template per component (42 templates total)
│   │   ├── chunking/   7 strategies
│   │   ├── embedding/  7 providers
│   │   ├── vectordb/   6 databases
│   │   ├── retrieval/  5 strategies
│   │   ├── reranker/   4 rerankers
│   │   └── llm/        4 providers
│   └── entrypoints/
│       ├── langchain/  pipeline.py.j2, ingestion.py.j2
│       ├── llamaindex/ pipeline.py.j2, ingestion.py.j2
│       └── common/     Dockerfile.j2, pyproject.toml.j2, .env.example.j2, README.md.j2
└── cli/
    └── main.py         Typer CLI: generate, validate, init, options

Possible combinations: 7 chunking × 7 embedding × 6 vector DB × 5 retrieval × 5 reranker options (4 rerankers + none) × 4 LLM × 2 frameworks = 58,800 unique pipeline configurations.


Development

git clone https://github.com/ragfactory/ragfactory
cd ragfactory
pip install -e ".[dev]"

Run tests:

pytest                         # full suite
pytest tests/unit/             # unit tests only
pytest tests/integration/      # CLI integration tests (real filesystem)
pytest --cov=ragfactory        # with coverage report

Lint and type-check:

ruff check .
mypy ragfactory/

Verify a generated project's Python is syntactically valid:

ragfactory generate --config tests/fixtures/quick_start.yaml --output /tmp/test-out
python -c "import ast; ast.parse(open('/tmp/test-out/pipeline.py').read()); print('OK')"
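The same spot check can be extended to every .py file in the output directory. A sketch (check_project is a hypothetical helper; the demo runs against a throwaway directory rather than a real generated project):

```python
import ast
import pathlib
import tempfile

# Parse every .py file under a generated project to confirm the whole
# output is syntactically valid, not just pipeline.py.

def check_project(root):
    bad = []
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            ast.parse(path.read_text())
        except SyntaxError as exc:
            bad.append((str(path), exc.msg))
    return bad

# Demo against a temporary directory with one valid and one broken file:
with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "ok.py").write_text("x = 1\n")
    pathlib.Path(tmp, "broken.py").write_text("def f(:\n")
    print(check_project(tmp))   # one entry, for broken.py
```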

Releasing

Bump the version, tag, and publish to PyPI in one command:

python release.py 0.2.0

That's it. The script updates pyproject.toml, commits, tags, and pushes. GitHub Actions picks up the tag and publishes to PyPI automatically.

Prerequisite: Add your PyPI token as a GitHub secret named PYPI_API_TOKEN under
Settings → Secrets and variables → Actions → New repository secret.


Roadmap

  • Phase 1 — CLI: generate, validate, init, options
  • Phase 2 — REST API (FastAPI): programmatic pipeline generation
  • Phase 3 — Web UI: visual pipeline builder with live validation
  • Advanced generation techniques: CRAG, FLARE, Agentic RAG
  • Evaluation harness: RAGAS and DeepEval integration
  • Ingestion sources: S3, URLs, Google Drive, Notion
  • LangGraph multi-agent pipeline support
  • VS Code extension: config autocomplete + inline validation

License

Apache 2.0 — free for commercial and private use.


Built by developers who got tired of wiring the same RAG stack from scratch for the fifth time.
