A topological data analysis library for detecting knowledge gaps in RAG systems.

TopoRAG

TopoRAG is a Topological Data Analysis (TDA) library that detects conceptual gaps and blind spots in text documents. It is particularly useful for evaluating the context retrieved in Retrieval-Augmented Generation (RAG) pipelines.

Using persistent homology (via Ripser), it identifies topological "holes" in the semantic embedding space of your text, then uses an LLM to label the concepts missing from those regions.
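To make the idea of a topological "hole" concrete, here is a minimal from-scratch sketch of 1-dimensional persistent homology on a tiny 2D point cloud. It is not TopoRAG's implementation (which relies on Ripser) and uses only the standard library. The function names `h1_persistence` and `significant_holes` are hypothetical, and the 0.15 cutoff simply mirrors the default `threshold` listed in the API section. Points arranged in a ring should produce one long-lived hole. Real embeddings are high-dimensional, and how TopoRAG measures distances between them is not specified in this README.

```python
import math
from itertools import combinations

def h1_persistence(points):
    """Return (birth, death) pairs for 1-dimensional holes of a Vietoris-Rips filtration."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # A simplex appears once its longest edge fits within the current scale.
    simplices = [((i,), 0.0) for i in range(n)]
    for size in (2, 3):
        for s in combinations(range(n), size):
            simplices.append((s, max(d[a][b] for a, b in combinations(s, 2))))
    # Order by scale, then dimension, so every face precedes its cofaces.
    simplices.sort(key=lambda item: (item[1], len(item[0])))
    position = {s: k for k, (s, _) in enumerate(simplices)}

    reduced = {}  # column index -> reduced boundary (set of row indices, mod 2)
    owner = {}    # pivot row index -> column that owns that pivot
    pairs = []
    for col, (s, death) in enumerate(simplices):
        if len(s) == 1:
            continue  # vertices have an empty boundary
        boundary = {position[f] for f in combinations(s, len(s) - 1)}
        # Standard column reduction: cancel pivots already claimed by earlier columns.
        while boundary and max(boundary) in owner:
            boundary ^= reduced[owner[max(boundary)]]
        if boundary:
            low = max(boundary)
            owner[low] = col
            reduced[col] = boundary
            face, birth = simplices[low]
            if len(face) == 2:  # an edge killed by a triangle is an H1 class
                pairs.append((birth, death))
    return pairs

def significant_holes(points, threshold=0.15):
    """Keep only holes whose persistence (death - birth) exceeds the threshold."""
    return [(b, dth) for b, dth in h1_persistence(points) if dth - b > threshold]

# Eight evenly spaced points on a unit circle: a ring with an empty middle.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
holes = significant_holes(circle)
print(holes)
```

The sketch enumerates every triangle, so it only suits very small inputs. A dedicated library such as Ripser is the practical choice for real embedding sets.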

Installation

pip install toporag

Note: TopoRAG is currently in development. To install from source:

git clone <repository>
cd toporag_lib
pip install -e .

Setup

You need an OpenAI API key for text embeddings (text-embedding-3-small) and gap labeling (gpt-4o-mini).

export OPENAI_API_KEY="sk-..."
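A script can fail early with a clear message when the key is missing, rather than at the first API call. The helper below is a hypothetical convenience, not part of TopoRAG; it only reads the environment variable named above.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a TopoAnalyzer.")
    return key
```

Calling `require_api_key()` at the top of `main()` is one way to use it.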

Basic Usage

import asyncio
from toporag import TopoAnalyzer

async def main():
    analyzer = TopoAnalyzer()  # Automatically picks up OPENAI_API_KEY

    # Sample text that jumps from planning straight to deployment.
    texto = """
    We started the project with great enthusiasm. The team was assembled,
    requirements were gathered, and we had a solid plan for the architecture.

    Finally, we deployed the application to production and celebrated our success.
    The customers loved the final result and our metrics improved dramatically.
    """

    # Each returned gap exposes 'topic_label' and 'explanation' keys.
    gaps = await analyzer.analyze_text(texto)
    for gap in gaps:
        print(f"Missing Topic: {gap['topic_label']}")
        print(f"Explanation: {gap['explanation']}")
        print("---")

if __name__ == "__main__":
    asyncio.run(main())

API

TopoAnalyzer.analyze_text(text: str, threshold: float = 0.15, max_holes: int = 5, generate_suggestions: bool = True)

Splits the text into segments, embeds them, finds gaps with persistence above threshold, and uses the LLM to label up to max_holes gaps.
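The roles of `threshold` and `max_holes` can be sketched as a filter-then-cap step over (birth, death) pairs from a persistence diagram. This is an illustration under assumptions, not the library's code: the function name `select_gaps` is hypothetical, and ranking by persistence before truncating is a guess at how the cap is applied.

```python
def select_gaps(diagram, threshold=0.15, max_holes=5):
    """Keep features whose persistence exceeds `threshold`, most persistent first."""
    scored = [
        {"birth": b, "death": d, "persistence": d - b}
        for b, d in diagram
        if d - b > threshold
    ]
    scored.sort(key=lambda g: g["persistence"], reverse=True)
    return scored[:max_holes]

# Four candidate features; the second is too short-lived to pass the threshold.
diagram = [(0.10, 0.90), (0.20, 0.25), (0.30, 0.70), (0.05, 0.50)]
selected = select_gaps(diagram, threshold=0.15, max_holes=2)
for gap in selected:
    print(gap)
```

Raising `threshold` discards short-lived features that are more likely to be noise, while `max_holes` bounds how many gaps are sent to the LLM for labeling.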

TopoAnalyzer.analyze_url(url: str, ...)

Scrapes the URL for readable text and runs the topological analysis.

TopoAnalyzer.analyze_segments(segments: List[str], ...)

Runs the analysis directly on a pre-chunked list of strings.
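For callers who want control over chunking, a simple splitter might look like the sketch below. The helper `split_into_segments` and its `min_chars` parameter are hypothetical, and TopoRAG's own segmentation strategy is not documented here, so this only shows one plausible way to build the list.

```python
import re

def split_into_segments(text, min_chars=20):
    """Split on blank lines, then on sentence-ending punctuation; drop tiny fragments."""
    segments = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Collapse internal whitespace so wrapped lines become one line.
        paragraph = " ".join(paragraph.split())
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if len(sentence) >= min_chars:
                segments.append(sentence)
    return segments

sample = """
We started the project with great enthusiasm. The team was assembled,
requirements were gathered, and we had a solid plan for the architecture.

Finally, we deployed the application to production and celebrated our success.
"""
segments = split_into_segments(sample)
for segment in segments:
    print(segment)
```

The resulting list could then be passed to `analyzer.analyze_segments(segments)`. The remaining parameters are elided in the signature above, so they are left at their defaults here.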
