Topic-Enhanced Retrieval-Augmented Generation Library

These details have not been verified by PyPI

Project description

topic-rag — Topic-Enhanced Retrieval-Augmented Generation

Install it anywhere with:

pip install topic-rag

What it does

Standard RAG systems retrieve documents purely by text similarity, that is, how closely the query's wording matches each document. topic-rag adds a second layer: it discovers latent topics across your document collection and uses them to boost retrieval accuracy. A query about "neural networks" will score higher against documents that share its topic cluster, even if the exact words differ.
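To make the idea concrete, here is a minimal sketch of blending a TF-IDF similarity score with a topic-overlap score. It is not topic-rag's actual implementation: the function names, the hand-written topic distributions, and the `alpha` weight are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (term -> weight dicts) for a list of texts."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so a term present in every text still gets a nonzero weight
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def topic_overlap(p, q):
    """Overlap of two topic distributions: sum of element-wise minima, in [0, 1]."""
    return sum(min(x, y) for x, y in zip(p, q))

def blended_scores(query, docs, query_topics, doc_topics, alpha=0.7):
    """Score each doc as alpha * lexical similarity + (1 - alpha) * topic overlap."""
    vectors = tfidf_vectors(docs + [query])
    query_vec, doc_vecs = vectors[-1], vectors[:-1]
    return [
        alpha * cosine(query_vec, dv) + (1 - alpha) * topic_overlap(query_topics, dt)
        for dv, dt in zip(doc_vecs, doc_topics)
    ]

docs = [
    "neural networks learn layered representations",
    "deep learning models use backpropagation",
    "sourdough bread needs a long fermentation",
]
# Hypothetical topic distributions over two topics: [machine learning, baking]
doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.05, 0.95]]
query_topics = [0.9, 0.1]

scores = blended_scores("neural networks", docs, query_topics, doc_topics)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The second document shares no words with the query, so its lexical score is zero, yet its topic overlap still ranks it ahead of the unrelated third document.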

Core usage

1. Basic retrieval

from topic_rag import DocumentProcessor, TopicEnhancedRAGRetriever, RAGEvaluator

# Build a topic-aware corpus from your documents
processor = DocumentProcessor(n_topics=10)
corpus = processor.process_corpus(documents)   # list of {id, text, title}

# Retrieve with topic enhancement
retriever = TopicEnhancedRAGRetriever(corpus)
results = retriever.retrieve("What is transfer learning?", k=5)

# Evaluate retrieval quality
evaluator = RAGEvaluator()
metrics = evaluator.evaluate_retrieval(results, relevant_doc_ids)
# → recall@5, precision@5, MRR, NDCG, hit_rate
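The metrics named in the comment above have standard definitions for a single query with binary relevance. The sketch below computes them in plain Python. It illustrates those definitions and is not `RAGEvaluator`'s code; the function name `retrieval_metrics` and its return keys are assumptions.

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k=5):
    """Compute binary-relevance retrieval metrics for one query."""
    relevant = set(relevant_ids)
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant else 0 for doc_id in top_k]

    # Reciprocal rank of the first relevant document (0 if none in the top k).
    # MRR is the mean of this value over all queries.
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break

    # DCG discounts each hit by log2(rank + 1); IDCG is the best achievable DCG
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "recall": sum(hits) / len(relevant) if relevant else 0.0,
        "precision": sum(hits) / k,
        "mrr": reciprocal_rank,
        "ndcg": dcg / idcg if idcg else 0.0,
        "hit_rate": 1.0 if any(hits) else 0.0,
    }

# Hypothetical ranking and ground truth for a single query
metrics = retrieval_metrics(["d3", "d1", "d7", "d2", "d9"], ["d1", "d2", "d4"], k=5)
```

Averaging each value over every query in an evaluation set gives the corpus-level figures.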

2. Benchmarking against standard datasets

from topic_rag import EvaluationPipeline, ExperimentConfig

pipeline = EvaluationPipeline()
config = ExperimentConfig(
    dataset_name="squad_v2",   # ms_marco | natural_questions | hotpot_qa | trivia_qa
    max_documents=500,
    max_queries=100,
    n_topics=10
)
results = pipeline.run_single_experiment(config)

3. Statistical validation

from topic_rag import StatisticalAnalyzer

analyzer = StatisticalAnalyzer()
stats = analyzer.calculate_comparison_statistics(standard_results, enhanced_results)
# → paired t-test, Wilcoxon signed-rank, Cohen's d effect size, confidence intervals
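As a rough illustration of what a paired comparison involves, the sketch below computes the paired t statistic and the paired-samples form of Cohen's d from per-query scores, using only the standard library. The function name and the sample scores are hypothetical, and this is not `StatisticalAnalyzer`'s implementation. For p-values and the Wilcoxon signed-rank test, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` are the usual tools.

```python
import math
import statistics

def paired_comparison(baseline, enhanced):
    """Paired t statistic and Cohen's d (paired form) for per-query scores."""
    if len(baseline) != len(enhanced) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")

    diffs = [e - b for b, e in zip(baseline, enhanced)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    return {
        "mean_diff": mean_diff,
        # t = mean difference divided by its standard error
        "t_statistic": mean_diff / (sd_diff / math.sqrt(n)),
        # Cohen's d for paired samples: mean difference in units of its SD
        "cohens_d": mean_diff / sd_diff,
        "degrees_of_freedom": n - 1,
    }

# Hypothetical per-query recall scores for the two retrievers
baseline = [0.40, 0.55, 0.30, 0.62, 0.48]
enhanced = [0.46, 0.58, 0.41, 0.60, 0.57]
stats = paired_comparison(baseline, enhanced)
```

Pairing matters here: both retrievers are scored on the same queries, so the test works on per-query differences rather than on two independent samples.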

4. Paper generation

from topic_rag import PaperGenerator

gen = PaperGenerator()
files = gen.generate_complete_paper({
    "title": "My RAG Study",
    "authors": ["Your Name"],
    "institution": "Your University",
    "results": experiment_results,
    "output_format": "LaTeX + PDF",   # or "Markdown"
    "sections": {"abstract": True, "methodology": True},   # ... plus further section flags
})

Advantages over plain RAG

Feature	Standard RAG	topic-rag
Retrieval signal	TF-IDF similarity only	TF-IDF + latent topic overlap
Semantic grouping	None	Automatic topic discovery
Evaluation	Manual	Built-in (Recall, MRR, NDCG)
Statistical proof	None	t-test, Wilcoxon, effect sizes
Paper output	None	LaTeX + Markdown auto-generated
Datasets	Bring your own	MS MARCO, NQ, SQuAD, HotpotQA, TriviaQA built-in
Dependencies	Heavy (PyTorch, transformers)	Lightweight (numpy, scikit-learn, scipy)

Key design decisions

No GPU required — uses TF-IDF and a lightweight topic model (no PyTorch, no sentence-transformers)
Self-contained — all benchmark datasets have built-in fallback data, so experiments run offline
Research-ready — statistical tests and paper generation make it suitable for academic submission
AGPL-3.0 — open source; any service built on it must also be open source

Who it's for

Researchers benchmarking retrieval systems
Engineers who want a lightweight RAG baseline without heavy ML infrastructure
Anyone who needs reproducible, statistically validated RAG experiments with automatic paper output

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

See the LICENSE file for the full license text.

What AGPL-3.0 means

Anyone can view, use, and modify the code
Any modified version used to provide a network service must release its source code
Companies cannot embed this in proprietary software without open-sourcing their product

For commercial licensing enquiries, please contact the project maintainers.

Project details

Release history

1.0.3 (Mar 13, 2026)
1.0.2 (Mar 13, 2026)
1.0.1 (Mar 13, 2026, this version)
1.0.0 (Mar 13, 2026)

Download files

Source Distribution

topic_rag-1.0.1.tar.gz (45.9 kB)

Uploaded Mar 13, 2026 Source

Built Distribution

topic_rag-1.0.1-py3-none-any.whl (42.4 kB)

Uploaded Mar 13, 2026 Python 3

File details

Details for the file topic_rag-1.0.1.tar.gz.

File metadata

Download URL: topic_rag-1.0.1.tar.gz
Upload date: Mar 13, 2026
Size: 45.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for topic_rag-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`38f6e3590bde108ae17be4bdbab984c9c9d9c8476d16de01e56283bfa1b20527`
MD5	`3c9e3eddd25afc7a789d71885ff8bdd2`
BLAKE2b-256	`2552744a7737d9ff20ce66a54c1e663bd6e16c629a10007ed95d5b14873982a2`
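
The SHA256 digest listed above can be checked against a downloaded copy of the archive. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify`, and whatever local path you pass in, are assumptions.

```python
import hashlib

# SHA256 digest published for topic_rag-1.0.1.tar.gz
EXPECTED_SHA256 = "38f6e3590bde108ae17be4bdbab984c9c9d9c8476d16de01e56283bfa1b20527"

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected digest."""
    return sha256_of(path) == expected
```

Calling `verify("topic_rag-1.0.1.tar.gz")` on the downloaded file should return `True` if the archive is intact.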

File details

Details for the file topic_rag-1.0.1-py3-none-any.whl.

File metadata

Download URL: topic_rag-1.0.1-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 42.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for topic_rag-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5000c4b5d47ede91fee84675322036ff8621fa40855b4e9494d2d1e1d8660214`
MD5	`82af309ade19626e08e9d3d964d1adf6`
BLAKE2b-256	`f03c44c286703f8f8a37aa9239389fa342995ed7c38db95f29b917f6530deeeb`
