A topological data analysis library for detecting knowledge gaps in RAG systems.
TopoRAG
TopoRAG is a Topological Data Analysis (TDA) library designed to detect conceptual gaps and blind spots in text documents. It is particularly useful for evaluating the context retrieved in Retrieval-Augmented Generation (RAG) pipelines.
By using Persistent Homology (via Ripser), it identifies topological "holes" in the semantic embedding space of your text, and then uses an LLM to label those missing concepts.
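To make the idea of persistence concrete, here is a minimal, standard-library sketch of the simplest case: 0-dimensional persistent homology, which tracks when separate clusters of points merge as the distance scale grows. This is not TopoRAG's implementation (the library relies on Ripser and may look at higher-dimensional features); the function name `h0_persistence` and the toy 2-D points standing in for embeddings are assumptions made for illustration only.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0; one component dies each time
    two previously separate components are joined by an edge.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(dist)  # one component disappears at this scale
    return deaths


# Toy 2-D "embeddings": two tight clusters separated by a wide gap.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
deaths = h0_persistence(points)
print(sorted(deaths))
```

Most components die at a very small scale, while one survives until the two clusters finally merge. That long-lived feature is the kind of signal a persistence threshold is meant to isolate.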
Installation
pip install toporag
(Note: the library is currently in development. To install from source instead:)
git clone <repository>
cd toporag_lib
pip install -e .
Setup
You need an OpenAI API key for text embeddings (text-embedding-3-small) and gap labeling (gpt-4o-mini).
export OPENAI_API_KEY="sk-..."
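If you prefer to fail early with a clear message when the key is missing, a small check like the following can run before you create the analyzer. The helper `require_api_key` is hypothetical and not part of TopoRAG; it only reads the environment variable named above.

```python
import os


def require_api_key(env=os.environ):
    """Return the OpenAI API key, or raise if it is not set.

    Hypothetical helper for illustration; TopoRAG does not ship it.
    """
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating a TopoAnalyzer."
        )
    return key
```

Calling `require_api_key()` at the top of your script surfaces a missing key before any text is embedded.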
Basic Usage
import asyncio

from toporag import TopoAnalyzer


async def main():
    analyzer = TopoAnalyzer()  # Automatically picks up OPENAI_API_KEY

    texto = """
    We started the project with great enthusiasm. The team was assembled,
    requirements were gathered, and we had a solid plan for the architecture.
    Finally, we deployed the application to production and celebrated our success.
    The customers loved the final result and our metrics improved dramatically.
    """

    gaps = await analyzer.analyze_text(texto)

    for gap in gaps:
        print(f"Missing Topic: {gap['topic_label']}")
        print(f"Explanation: {gap['explanation']}")
        print("---")


if __name__ == "__main__":
    asyncio.run(main())
API
TopoAnalyzer.analyze_text(text: str, threshold: float = 0.15, max_holes: int = 5, generate_suggestions: bool = True)
Splits the text into segments, embeds them, finds gaps with persistence above threshold, and uses the LLM to label up to max_holes gaps.
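The roles of threshold and max_holes can be illustrated with a short sketch. It assumes each topological feature is a (birth, death) pair whose persistence is death minus birth, and that the most persistent features are kept first; the function `select_gaps`, the sample diagram, and the ordering rule are assumptions for illustration, not TopoRAG's actual internals.

```python
def select_gaps(diagram, threshold=0.15, max_holes=5):
    """Keep features whose persistence exceeds threshold, most persistent first.

    diagram is a list of (birth, death) pairs; at most max_holes are returned.
    """
    persistent = [(birth, death) for birth, death in diagram if death - birth > threshold]
    persistent.sort(key=lambda pair: pair[1] - pair[0], reverse=True)
    return persistent[:max_holes]


# Made-up persistence pairs; two of them are short-lived noise.
diagram = [(0.10, 0.15), (0.20, 0.60), (0.05, 0.30), (0.40, 0.45)]
selected = select_gaps(diagram)
print(selected)
```

Under these assumptions, raising threshold discards short-lived features, while lowering max_holes caps how many surviving gaps would be sent to the LLM for labeling.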
TopoAnalyzer.analyze_url(url: str, ...)
Scrapes the URL for readable text and runs the topological analysis.
TopoAnalyzer.analyze_segments(segments: List[str], ...)
Runs the analysis directly on a pre-chunked list of strings.
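If you want control over chunking, you can build the list of strings yourself before calling analyze_segments. The sketch below uses a naive sentence splitter; `split_into_segments` is a hypothetical helper, and its splitting rule is an assumption that may differ from how analyze_text segments text internally.

```python
import re
from typing import List


def split_into_segments(text: str) -> List[str]:
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


segments = split_into_segments(
    "The team was assembled. Requirements were gathered! Did we ship?"
)
print(segments)
```

The resulting list could then be passed along as `await analyzer.analyze_segments(segments)` from inside an async function.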
Download files
File details
Details for the file toporag-0.1.0.tar.gz.
File metadata
- Download URL: toporag-0.1.0.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2d1f4d6b01c5618ab6785bbaea1f9cf438e02f69cd92e2be6724005575351d6 |
| MD5 | ac61d8791b6741797729e8fdf0fafb3f |
| BLAKE2b-256 | 9ee40be1cf34110a1b7ea26c01380bb7814ad81971cd522c3a0d6c014cd930b4 |
File details
Details for the file toporag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: toporag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81f10dcd7aa6cf7847a7426789071ded08fee3f03687082c63e1c655e5a0f9b9 |
| MD5 | 8d7c8dafacc788fa25abb608e273bddb |
| BLAKE2b-256 | c338ae55b18edd891d1ae16d0d881f09c7a9b4f2e40c06fa24fadf09f5560553 |