Topology-Aware Retrieval for RAG — structure-guided vector search with progressive fallback
Project description
tar-rag Project Description
tar-rag is a vector-store-agnostic Python library that adds structural navigation to RAG pipelines through directory-derived topology maps and progressive filter fallback, giving your retrieval layer the precision a flat semantic search can't.
Features
- Topology-aware retrieval — directory structure becomes filter metadata at query time
- Vector-store-agnostic (OpenAI Vector Stores, Pinecone, Qdrant, Chroma, or any custom adapter)
- Progressive filter fallback from most-specific to global, with parallel attempts
- Built-in confidence scoring with high / medium / low / none tiers and a token-cost gate
- Sync and native async on every entry point (tar.search / tar.asearch)
- Zero mandatory runtime dependencies: vector store SDKs and file extractors are optional extras
- Production CLI (tar-rag crawl) plus a small Python API for embedded use
- Tested end-to-end against a live OpenAI Vector Store; benchmark numbers in benchmark.md
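To make the progressive filter fallback concrete, here is a minimal, self-contained sketch of the idea. It is not tar-rag's implementation: the names fallback_filters, search_with_fallback, toy_search, and the min_score threshold are all hypothetical, the store is an in-memory list, and the attempts run sequentially for clarity, whereas the library describes parallel attempts.

```python
from typing import Callable

Filter = dict[str, str]
Hit = tuple[float, str]  # (similarity score, document name)


def fallback_filters(metadata: Filter, level_order: list[str]) -> list[Filter]:
    """Build filters from most-specific (all levels) down to global (no filter)."""
    keys = [k for k in level_order if k in metadata]
    filters = [{k: metadata[k] for k in keys[:n]} for n in range(len(keys), 0, -1)]
    filters.append({})  # the global attempt: no structural constraint at all
    return filters


def search_with_fallback(
    query: str,
    filters: list[Filter],
    search_fn: Callable[[str, Filter], list[Hit]],
    min_score: float,
) -> tuple[list[Hit], int]:
    """Try each filter in order; stop at the first attempt whose best hit is good enough."""
    for attempt, flt in enumerate(filters, start=1):
        hits = search_fn(query, flt)
        if hits and max(score for score, _ in hits) >= min_score:
            return hits, attempt
    return [], len(filters)


# Hypothetical in-memory "vector store": (metadata, fixed score, document name).
DOCS = [
    ({"kind": "source", "topic": "asyncio"}, 0.42, "taskgroups.py"),
    ({"kind": "docs", "topic": "asyncio"}, 0.91, "asyncio-task.rst"),
]


def toy_search(query: str, flt: Filter) -> list[Hit]:
    """Return every document whose metadata matches all key/value pairs in the filter."""
    return [
        (score, name)
        for meta, score, name in DOCS
        if all(meta.get(k) == v for k, v in flt.items())
    ]


filters = fallback_filters({"kind": "source", "topic": "asyncio"}, ["kind", "topic"])
hits, attempts = search_with_fallback("TaskGroup", filters, toy_search, min_score=0.8)
```

In this toy run the two filtered attempts only surface a weak hit, so the search widens to the global attempt before it returns.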
Usage Example
Use case: index a directory of documents, upload them to an OpenAI Vector Store, and run a query through tar-rag's structural filter + fallback pipeline.
Step 1 — Crawl your corpus
from tar_rag import DirectoryCrawler, build_artifacts
crawler = DirectoryCrawler(
    root="./my-corpus",
    level_names=["kind", "topic"],  # or None to auto-infer
)
documents = crawler.crawl()
bundle = build_artifacts(documents, level_names=crawler.level_names)
bundle.write("./tar_rag_output/")
print(f"Indexed {len(documents)} document(s).")
print("Tune ./tar_rag_output/confidence_config.json before first query if needed.")
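As a rough illustration of what "directory structure becomes filter metadata" can mean, the sketch below maps the folders above a file onto the level names from Step 1. The helper metadata_from_path is hypothetical and written for this example only; the crawler's real inference rules may differ.

```python
from pathlib import PurePosixPath
from typing import Optional


def metadata_from_path(
    relative_path: str, level_names: list[str]
) -> dict[str, Optional[str]]:
    """Map each directory level above the file to its level name.

    Levels the path does not reach are set to None; directories deeper
    than the named levels are ignored.
    """
    parts = PurePosixPath(relative_path).parent.parts
    return {
        name: (parts[i] if i < len(parts) else None)
        for i, name in enumerate(level_names)
    }


nested = metadata_from_path("source/asyncio/taskgroups.py", ["kind", "topic"])
shallow = metadata_from_path("README.md", ["kind", "topic"])
```

A file at the corpus root gets None for every level, which is the kind of value the upload loop in Step 2 drops before sending attributes to the vector store.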
Step 2 — Upload to a vector store (OpenAI shown)
import openai
from tar_rag.manifest import MetadataManifest
client = openai.OpenAI()
manifest = MetadataManifest.load("./tar_rag_output/metadata_manifest.json")
vs = client.vector_stores.create(name=f"my-kb-{manifest.version}")
for doc in manifest:
    with open(doc.relative_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")
    client.vector_stores.files.create(
        vector_store_id=vs.id,
        file_id=uploaded.id,
        attributes={k: v for k, v in doc.metadata.items() if v is not None},
    )
print(f"Uploaded {len(manifest)} document(s) to vector store {vs.id}.")
Step 3 — Query through tar-rag
import openai
from tar_rag import TarRag
from tar_rag.adapters import OpenAIVectorStoreAdapter
tar = TarRag.from_artifacts("./tar_rag_output/")
tar.set_adapter(OpenAIVectorStoreAdapter(
    client=openai.OpenAI(),
    vector_store_id="vs_xxx",
    top_k=6,
))
result = tar.search("What does asyncio.TaskGroup do in the source code?")
print(f"confidence={result.confidence} top_score={result.top_score:.2f}")
print(f"reason={result.reason} attempts_made={result.attempts_made}")
if result.should_answer:
    for chunk in result.results:
        print(chunk.score, chunk.snippet[:200])
else:
    print("Confidence below the gate: forwarding zero chunks to the LLM.")
The same tar.search(...) call works against Pinecone, Qdrant, Chroma, or any custom adapter — only the constructor changes. See the full GitHub README for the multi-store examples, the system architecture diagram, the async path, and the tuning guide.
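The confidence tiers and the gate behind result.should_answer in Step 3 can be pictured with a small sketch. Everything here is an assumption made for illustration: the threshold values, the function names confidence_tier and passes_gate, and the reading of the token-cost gate as a simple budget check are not taken from tar-rag, whose actual settings live in confidence_config.json.

```python
from typing import Optional

TIER_ORDER = ["none", "low", "medium", "high"]


def confidence_tier(
    top_score: Optional[float],
    high: float = 0.75,    # hypothetical threshold
    medium: float = 0.50,  # hypothetical threshold
    low: float = 0.30,     # hypothetical threshold
) -> str:
    """Bucket the best similarity score into one of four tiers."""
    if top_score is None:
        return "none"
    if top_score >= high:
        return "high"
    if top_score >= medium:
        return "medium"
    if top_score >= low:
        return "low"
    return "none"


def passes_gate(
    tier: str,
    estimated_tokens: int,
    min_tier: str = "medium",  # hypothetical minimum tier for answering
    max_tokens: int = 2000,    # hypothetical token budget
) -> bool:
    """Forward chunks only if the tier is high enough and the chunks fit the budget."""
    tier_ok = TIER_ORDER.index(tier) >= TIER_ORDER.index(min_tier)
    return tier_ok and estimated_tokens <= max_tokens


tier = confidence_tier(0.62)
answer = passes_gate(tier, estimated_tokens=900)
```

Under these assumed thresholds a top score of 0.62 lands in the medium tier and passes the gate; a low-tier result, or one whose chunks exceed the budget, would send nothing to the LLM.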
"Data should empower, not overwhelm"
Download files
File details
Details for the file tar_rag-0.1.0.tar.gz.
File metadata
- Download URL: tar_rag-0.1.0.tar.gz
- Upload date:
- Size: 83.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0e14cc2b90c3304256697ba6d4c7de9d21689a3e5a3238f3c30c00cb1a26c8b |
| MD5 | 4e6d047019ff84250d6ba17be7da164b |
| BLAKE2b-256 | 01d4d740034a7421db42dfff3d4701cbd1c66f4d7bffe6c8c0762d9e567b5e49 |
Provenance
The following attestation bundles were made for tar_rag-0.1.0.tar.gz:
Publisher: publish.yml on vamsi-karnam/tar-rag
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: tar_rag-0.1.0.tar.gz
  - Subject digest: c0e14cc2b90c3304256697ba6d4c7de9d21689a3e5a3238f3c30c00cb1a26c8b
  - Sigstore transparency entry: 1527585396
  - Sigstore integration time:
- Permalink: vamsi-karnam/tar-rag@6f7f7a8133a6b00e404e8a83543bac035fc8f7a3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/vamsi-karnam
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6f7f7a8133a6b00e404e8a83543bac035fc8f7a3
- Trigger Event: push
File details
Details for the file tar_rag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tar_rag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 58.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee2a5ad03c4f381a81390c86f154bccf94bcbbd376c0b6442d34e9724ba11b03 |
| MD5 | 149941bccb01f7896d20a96d8b8ce6e1 |
| BLAKE2b-256 | 2f7484aa0b0011555dba2aae32b3740c6d0768c8f6dc289bba60f510f2f6bdff |
Provenance
The following attestation bundles were made for tar_rag-0.1.0-py3-none-any.whl:
Publisher: publish.yml on vamsi-karnam/tar-rag
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: tar_rag-0.1.0-py3-none-any.whl
  - Subject digest: ee2a5ad03c4f381a81390c86f154bccf94bcbbd376c0b6442d34e9724ba11b03
  - Sigstore transparency entry: 1527585486
  - Sigstore integration time:
- Permalink: vamsi-karnam/tar-rag@6f7f7a8133a6b00e404e8a83543bac035fc8f7a3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/vamsi-karnam
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6f7f7a8133a6b00e404e8a83543bac035fc8f7a3
- Trigger Event: push