
labelrag

labelrag is a Python library for label-driven retrieval-augmented generation pipelines built on top of paralabelgen.

  • PyPI distribution: labelrag
  • Python import package: labelrag
  • Core dependency target: paralabelgen==0.2.0
  • Default extraction path: spaCy via paralabelgen

Install

pip install labelrag

If you want to use the default spaCy-backed labeling path, install a compatible English pipeline such as:

python -m spacy download en_core_web_sm

en_core_web_sm is the recommended default model, but you can point the underlying LabelGeneratorConfig at any other compatible spaCy pipeline you have installed.

Quick Start

Retrieval-only workflow

from labelrag import RAGPipeline, RAGPipelineConfig

paragraphs = [
    "OpenAI builds language models for developers.",
    "Developers use language models in production systems.",
    "Production systems need monitoring and evaluation tooling.",
]

pipeline = RAGPipeline(RAGPipelineConfig())
pipeline.fit(paragraphs)

retrieval = pipeline.build_context("How do developers use language models?")
print(retrieval.prompt_context)
print(retrieval.metadata)

Retrieval plus provider-backed answer generation

from labelrag import (
    OpenAICompatibleAnswerGenerator,
    OpenAICompatibleConfig,
    RAGPipeline,
    RAGPipelineConfig,
)

paragraphs = [
    "OpenAI builds language models for developers.",
    "Developers use language models in production systems.",
    "Production systems need monitoring and evaluation tooling.",
]

pipeline = RAGPipeline(RAGPipelineConfig())
pipeline.fit(paragraphs)

generator = OpenAICompatibleAnswerGenerator(
    OpenAICompatibleConfig(
        model="mistral-small-latest",
        api_key_env_var="MISTRAL_API_KEY",
        base_url="https://api.mistral.ai/v1",
    )
)

answer = pipeline.answer_with_generator(
    "How do developers use language models?",
    generator,
)
print(answer.answer_text)
print(answer.metadata)

Retrieval Model

The current retrieval layer is deterministic and label-driven.

  • fit(...) delegates paragraph analysis to paralabelgen.LabelGenerator
  • build_context(...) maps the question into the fitted label space
  • retrieval uses greedy coverage over query label IDs
  • label-free queries can fall back to deterministic concept overlap
  • require_full_label_coverage=True suppresses partial retrieval results while preserving attempted coverage trace in metadata

Tie-break order for greedy retrieval is:

  1. larger overlap with remaining query labels
  2. larger overlap on query concept IDs
  3. larger total paragraph label count
  4. lexicographically smaller paragraph_id
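The greedy coverage loop and its tie-break order can be sketched in plain Python. This is a hypothetical illustration, not labelrag's actual implementation: the function name and the tuple-based paragraph representation are invented for the sketch.

```python
def greedy_retrieve(query_labels, query_concepts, paragraphs, max_paragraphs=3):
    """Greedy coverage over query label IDs.

    paragraphs: list of (paragraph_id, label_ids, concept_ids) tuples.
    Returns (selected_paragraph_ids, uncovered_query_labels).
    """
    remaining = set(query_labels)
    query_concepts = set(query_concepts)
    candidates = list(paragraphs)
    selected = []

    while remaining and candidates and len(selected) < max_paragraphs:
        def rank(p):
            pid, labels, concepts = p
            return (
                -len(remaining & set(labels)),         # 1. larger overlap with remaining query labels
                -len(query_concepts & set(concepts)),  # 2. larger overlap on query concept IDs
                -len(labels),                          # 3. larger total paragraph label count
                pid,                                   # 4. lexicographically smaller paragraph_id
            )

        best = min(candidates, key=rank)
        if not (remaining & set(best[1])):
            break  # no candidate covers any remaining label; stop early
        selected.append(best[0])
        remaining -= set(best[1])
        candidates.remove(best)

    return selected, remaining
```

Because the ranking key is a tuple, later tie-break criteria only apply when earlier ones are exactly equal, which matches the documented order above.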

OpenAI-Compatible Provider Notes

The built-in answer-generation adapter targets a minimal OpenAI-compatible chat-completions API surface.

It supports:

  • standard base URLs such as https://api.openai.com/v1
  • full endpoint URLs such as https://api.mistral.ai/v1/chat/completions
  • API key injection through explicit config or optional environment-variable lookup
  • non-streaming text generation for answer_with_generator(...)

This adapter is intended to cover providers such as OpenAI, Mistral, and Qwen when they expose an OpenAI-compatible endpoint shape.
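Given those two accepted URL shapes, endpoint normalization and a non-streaming request body might look like the following sketch. Both helper names are hypothetical and not part of labelrag's public API; the payload follows the standard OpenAI-compatible chat-completions shape.

```python
def resolve_chat_completions_url(url: str) -> str:
    """Accept either a base URL or a full chat-completions endpoint URL."""
    url = url.rstrip("/")
    if url.endswith("/chat/completions"):
        return url  # already a full endpoint URL
    return url + "/chat/completions"


def build_payload(model: str, prompt_context: str, question: str) -> dict:
    """Non-streaming chat-completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": prompt_context},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
```

With this normalization, https://api.openai.com/v1 and https://api.mistral.ai/v1/chat/completions both resolve to a full chat-completions endpoint.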

Public API

The main public entrypoints are:

  • RAGPipeline
  • RAGPipelineConfig, RetrievalConfig, PromptConfig
  • IndexedParagraph, LabelRecord, ConceptRecord
  • QueryAnalysis, RetrievedParagraph
  • RetrievalResult, RAGAnswerResult
  • GeneratedAnswer, AnswerGenerator
  • OpenAICompatibleAnswerGenerator, OpenAICompatibleConfig
  • convenience re-export: Paragraph

RAGPipeline also exposes record-oriented inspection helpers for paragraph/label/concept lookup workflows:

  • get_paragraph(...)
  • get_label(...)
  • get_paragraph_labels(...)
  • get_paragraph_concepts(...)
  • get_label_paragraphs(...)
  • get_concept_paragraphs(...)

Lower-level ID-oriented helpers remain available when you only need stable IDs:

  • get_label_paragraph_ids(...)
  • get_paragraph_label_ids(...)
  • get_paragraph_concept_ids(...)
  • get_concept_paragraph_ids(...)
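As a rough mental model, these ID-oriented lookups behave like queries against an inverted index from label IDs to paragraph IDs. The following self-contained sketch shows that shape; it is not labelrag's internal data structure.

```python
from collections import defaultdict


def build_label_index(paragraph_labels: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a paragraph_id -> label_ids mapping into label_id -> paragraph_ids."""
    label_to_paragraphs: dict[str, list[str]] = defaultdict(list)
    for paragraph_id, label_ids in paragraph_labels.items():
        for label_id in label_ids:
            label_to_paragraphs[label_id].append(paragraph_id)
    return dict(label_to_paragraphs)
```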

Detailed API notes are available in docs/public_api.md.

Examples

Runnable examples are available in the examples/ directory.

Persistence Notes

save(path) produces a human-inspectable directory containing:

  • manifest.json
  • config.json
  • label_generator.json
  • corpus_index.json
  • fit_result.json

The persistence layer now supports:

  • json
  • json.gz

Compression is applied to the full saved snapshot rather than mixing compressed and uncompressed artifacts in one directory.
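That all-or-nothing compression choice can be sketched as follows. The helper and its file layout are hypothetical; labelrag's actual persistence code may differ.

```python
import gzip
import json
from pathlib import Path


def write_artifact(directory: Path, name: str, payload: dict, compress: bool) -> Path:
    """Write one snapshot artifact as .json or .json.gz.

    The same `compress` flag is applied to every artifact in a snapshot,
    so a directory never mixes compressed and uncompressed files.
    """
    text = json.dumps(payload, indent=2)
    if compress:
        path = directory / f"{name}.json.gz"
        path.write_bytes(gzip.compress(text.encode("utf-8")))
    else:
        path = directory / f"{name}.json"
        path.write_text(text, encoding="utf-8")
    return path
```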

Snapshots written by the current release include a lightweight manifest describing the saved version, persistence format, and expected artifacts.

Public guarantee:

  • a saved and reloaded pipeline should preserve retrieval behavior for the same fitted state, question, and config

Current update boundary:

  • fit(...) is batch-only
  • adding new paragraphs currently requires a full refit
  • save/load restores a static fitted state rather than an incrementally updateable corpus state

Legacy snapshot note:

  • loading pre-0.0.2 snapshots remains a best-effort compatibility path
  • when older snapshots are missing derived concept inspection tables, load() may rebuild them from paragraph-side concept data that is still present
  • persisted manifests include a non-empty labelrag_version
  • save() fails explicitly if the current package version cannot be determined for manifest writing

Configuration Notes

  • RetrievalConfig.max_paragraphs sets the hard retrieval limit
  • RetrievalConfig.allow_label_free_fallback enables deterministic concept overlap fallback for label-free queries
  • RetrievalConfig.require_full_label_coverage suppresses partial retrieval output when not all query labels can be covered
  • PromptConfig.include_paragraph_ids includes stable paragraph IDs in the rendered prompt context
  • PromptConfig.include_label_annotations includes paragraph label annotations in rendered prompt context
  • PromptConfig.max_context_characters applies a hard cap to rendered context length

Development Checks

.venv/bin/ruff check . --fix
.venv/bin/pyright
.venv/bin/pytest

Release Checks

.venv/bin/python -m build
.venv/bin/python -m twine check dist/*

Distribution Files

Source distribution:

  • labelrag-0.0.3.tar.gz (16.1 kB)
  • SHA256: 7c8f5a36551f583b2214f580ff32722010259553647ae36e2b92e9f7e36001ce
  • Uploaded via twine/6.2.0 on CPython/3.11.15

Built distribution:

  • labelrag-0.0.3-py3-none-any.whl (21.3 kB)
  • SHA256: 3300f211732bba806fff9c360f392721fdbe04c0a1ecb4383811c22d43590d63
  • Uploaded via twine/6.2.0 on CPython/3.11.15