RAG job pack for Tessera: compile a corpus + queries into a validated retrieval eval dataset.
tesserakit-rag
Compile a document corpus plus a set of queries into a validated retrieval eval dataset.
tesserakit-rag reads a directory that holds a corpus/ of documents and a queries file, builds a canonical RagCase dataset (each query with its gold retrieval target documents and an optional expected answer), verifies every document reference, and emits the dataset plus reports.
Scope (v0.1)
This pack builds and validates the dataset. It does not run retrieval (no embeddings, no vector store, no scoring). As with the api pack (no HTTP execution) and evals (no LLM calls), execution is a runtime concern deferred to a later version. v0.1 is the offline pass that answers one question: is this retrieval eval set well-formed and internally consistent?
Input shape
my_rag_eval/
  corpus/
    refunds.md
    billing/disputes.md
  queries.jsonl   (or queries.yaml)
Document ids are the corpus-relative path without suffix: refunds, billing/disputes.
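The id rule above can be sketched with pathlib. This is an illustrative sketch, not the pack's own code: the helper name doc_id is hypothetical, and it assumes only the final suffix is stripped and that ids always use forward slashes.

```python
from pathlib import Path


def doc_id(corpus_root: Path, doc_path: Path) -> str:
    """Return the corpus-relative path without its final suffix, using '/' separators."""
    relative = doc_path.relative_to(corpus_root)  # raises ValueError if outside the corpus
    return relative.with_suffix("").as_posix()


corpus = Path("my_rag_eval/corpus")
print(doc_id(corpus, corpus / "refunds.md"))               # refunds
print(doc_id(corpus, corpus / "billing" / "disputes.md"))  # billing/disputes
```

Under this rule two files that differ only by suffix (for example refunds.md and refunds.txt) would map to the same id, which is presumably the situation the duplicate_doc_id rule reports.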
Each query is one JSON object per line in queries.jsonl, or one item of a YAML list in queries.yaml:
{"id": "q1", "query": "Can I get a refund after 45 days?", "expected_answer": "No, the window is 30 days.", "relevant_docs": ["refunds"], "tags": ["billing"]}
relevant_docs is the gold set the retriever should surface. expected_answer is optional; queries without one are flagged for human review.
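A minimal sketch of reading queries.jsonl in this shape, using only the standard library. The Query dataclass and parse_queries function are hypothetical names for illustration (the pack's real type is RagCase, whose fields are not shown here), and treating a missing id as a parse failure is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:  # hypothetical container, not the pack's RagCase type
    id: str
    query: str
    relevant_docs: list = field(default_factory=list)
    expected_answer: Optional[str] = None
    tags: list = field(default_factory=list)


def parse_queries(text: str):
    """Parse JSONL text; return (queries, 1-based line numbers that failed to parse)."""
    queries, bad_lines = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
            queries.append(Query(
                id=row["id"],  # assumption: a row without an id is unusable
                query=row.get("query", ""),
                relevant_docs=list(row.get("relevant_docs", [])),
                expected_answer=row.get("expected_answer"),
                tags=list(row.get("tags", [])),
            ))
        except (json.JSONDecodeError, KeyError, TypeError):
            bad_lines.append(lineno)
    return queries, bad_lines


sample = (
    '{"id": "q1", "query": "Can I get a refund after 45 days?", "relevant_docs": ["refunds"]}\n'
    '{broken'
)
queries, bad_lines = parse_queries(sample)
needs_review = [q.id for q in queries if q.expected_answer is None]
print(needs_review, bad_lines)  # ['q1'] [2]
```

Queries that end up in needs_review correspond to the "flagged for human review" case described above.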
Compile a RAG eval pack
tessera rag compile --input examples/rag/ --output ./out/rag_pack
Artifacts written:
- dataset.jsonl: canonical RagCase rows (query, expected, relevant doc ids)
- corpus_index.jsonl: RagDocument rows (id, title, counts, sha256)
- validation_report.md: reference + hygiene findings
- coverage_report.md: answer/target coverage, orphan docs, avg targets per query
- retrieval_targets.md: per-query gold document set (with titles)
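The figures named for coverage_report.md could be computed along these lines. This is a sketch under assumptions: the function name coverage, the dictionary keys, and the exact definitions (coverage as a fraction of all queries, the average taken over all queries) are illustrative guesses, not the pack's documented formulas.

```python
def coverage(cases, corpus_ids):
    """Summarise answer/target coverage, orphan docs, and average targets per query."""
    total = len(cases)
    with_answer = sum(1 for c in cases if c.get("expected_answer"))
    with_targets = sum(1 for c in cases if c.get("relevant_docs"))
    target_count = sum(len(c.get("relevant_docs") or []) for c in cases)
    referenced = {d for c in cases for d in (c.get("relevant_docs") or [])}
    return {
        "answer_coverage": with_answer / total if total else 0.0,
        "target_coverage": with_targets / total if total else 0.0,
        "avg_targets_per_query": target_count / total if total else 0.0,
        "orphan_docs": sorted(corpus_ids - referenced),
    }


cases = [
    {"id": "q1", "relevant_docs": ["refunds"], "expected_answer": "No, the window is 30 days."},
    {"id": "q2", "relevant_docs": []},
]
print(coverage(cases, {"refunds", "billing/disputes"}))
```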
Validation rules
- parse_error: a queries line/file could not be parsed
- empty_corpus: no documents found under corpus/
- empty_document, duplicate_doc_id
- missing_query_text, duplicate_query
- dangling_doc_reference: a query references a document not in the corpus
- query_without_relevant_docs: no retrieval target to score against
- query_without_expected_answer: needs human review
- orphan_document: a corpus document no query references
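Several of the rules above reduce to set membership checks between query references and corpus ids. The sketch below shows one way to express them; the validate function and the (rule, subject) finding format are hypothetical, and it leaves out the parse, duplicate, and empty-document rules, whose exact semantics are not spelled out here.

```python
def validate(cases, corpus_ids):
    """Return a list of (rule, subject) findings for reference and hygiene checks."""
    findings = []
    referenced = set()
    for case in cases:
        qid = case.get("id", "?")
        if not (case.get("query") or "").strip():
            findings.append(("missing_query_text", qid))
        docs = case.get("relevant_docs") or []
        if not docs:
            findings.append(("query_without_relevant_docs", qid))
        for doc in docs:
            if doc in corpus_ids:
                referenced.add(doc)
            else:
                findings.append(("dangling_doc_reference", f"{qid} -> {doc}"))
        if not case.get("expected_answer"):
            findings.append(("query_without_expected_answer", qid))
    # Any corpus document never referenced by a query is an orphan.
    for doc in sorted(corpus_ids - referenced):
        findings.append(("orphan_document", doc))
    return findings


cases = [{"id": "q1", "query": "Can I get a refund after 45 days?",
          "relevant_docs": ["refunds", "returns"], "expected_answer": "No."}]
for rule, subject in validate(cases, {"refunds", "billing/disputes"}):
    print(rule, subject)
# dangling_doc_reference q1 -> returns
# orphan_document billing/disputes
```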