Reverse RAG database - index questions, not chunks

Maintainers: sreeraman

Project description

Isotope

⚠️ Alpha Release (v0.1.0): APIs are stabilizing but may change. Use in production at your own risk.

Your RAG is searching for answers. It should be matching questions.

Traditional RAG embeds your documents and hopes user questions land nearby. But a question and the statement that answers it are phrased differently, so their embeddings often end up far apart. You're matching apples to oranges.

Isotope breaks your documents into atomic facts, generates the questions each fact answers, then indexes those questions. When users ask something, they match question-to-question. Same semantic space. Tighter matches. Better retrieval.

Why "Isotope"?

In chemistry, isotopes are variants of the same element, with the same number of protons but different numbers of neutrons. Isotope does the same for your knowledge: it takes each atomic fact and generates multiple question "isotopes", different phrasings that all point back to the same truth.

One fact. Many questions. All paths lead to the right answer.

Why Isotope?

✅ Question-to-question matching - Tighter semantic alignment than traditional RAG
✅ Confidence scores you can trust - Know when retrieval quality is low
✅ Pluggable architecture - Bring your own LLM provider, embeddings, vector store
✅ CLI + Python API - Use from command line or integrate into your app
✅ Research-backed - Implements peer-reviewed approach (arXiv:2405.12363)
✅ Optional dependencies - Install only what you need

Installation
Quick Start
How It Works
Performance
When to Use
Documentation
Contributing
Citation
License

The 30-Second Pitch

Traditional RAG:  "Who created Python?" → search chunks → hope for the best
                   ↓
                   Semantic gap between questions and statements

Isotope:          "Who created Python?" → search questions → get "Who created Python?"
                   ↓
                   Same semantic space = confident matches

Installation

Requirements: Python 3.11+ and an LLM provider (OpenAI, Anthropic, etc. via LiteLLM)

Quick start (recommended):

pip install "isotope-rag[all]"  # quotes keep shells like zsh from expanding the brackets
export OPENAI_API_KEY=your-key-here  # or the API key for another LiteLLM-supported provider

Minimal install:

pip install isotope-rag  # Core only - bring your own provider/storage
pip install "isotope-rag[litellm,chroma]"  # Add LiteLLM + ChromaDB

Optional extras:

[all] - Everything (recommended for new users)
[litellm] - LiteLLM integration for 100+ LLM providers
[chroma] - ChromaDB vector store
[cli] - Command-line interface
[loaders] - PDF/HTML document loaders
[dev] - Development tools (pytest, ruff, mypy)

Quick Start

Option 1: Command Line (fastest)

# 0. Configure models (writes isotope.yaml)
isotope init --provider litellm --llm-model openai/gpt-4o --embedding-model openai/text-embedding-3-small

# 1. Ingest your docs
isotope ingest docs/

# 2. Ask questions
isotope query "How do I authenticate?"

# 3. See what's indexed
isotope status

Example output (illustrative; counts depend on your documents):

📊 Indexed: 42 chunks → 156 atoms → 2,340 questions
🔍 Top result: "How to authenticate users" (confidence: 0.92)
📄 Source: docs/authentication.md

Option 2: Python API (for integration)

from isotope import Isotope, Chunk, LiteLLMProvider, LocalStorage

# Initialize
iso = Isotope(
    provider=LiteLLMProvider(
        llm="openai/gpt-4o",
        embedding="openai/text-embedding-3-small",
    ),
    storage=LocalStorage("./my_data"),
)

# Ingest
ingestor = iso.ingestor()
chunks = [Chunk(
    content="Python was created by Guido van Rossum in 1991.",
    source="wiki"
)]
ingestor.ingest_chunks(chunks)

# Query
retriever = iso.retriever(llm_model="openai/gpt-4o")
response = retriever.get_answer("Who invented Python?")

print(response.answer)   # "Python was created by Guido van Rossum."
print(response.results)  # [SearchResult(chunk=..., score=0.94)]

What's happening here?

Chunks are broken into atoms ("Python was created by Guido van Rossum" + "Python was created in 1991")
Questions are generated for each atom ("Who created Python?", "When was Python created?")
Questions are embedded and indexed in ChromaDB
Your query matches question-to-question
The LLM synthesizes an answer from matching chunks

How It Works

┌──────────┐    ┌────────┐    ┌───────┐    ┌───────────┐    ┌─────────┐
│ Document │───▶│ Chunks │───▶│ Atoms │───▶│ Questions │───▶│  Index  │
└──────────┘    └────────┘    └───────┘    └───────────┘    └─────────┘
                                                                  │
┌──────────┐    ┌────────┐    ┌───────────────────────────────────┘
│  Answer  │◀───│ Chunks │◀───│ Match user query against questions
└──────────┘    └────────┘

Atomize → Break content into atomic facts
Generate → Create questions each fact answers (15 per atom by default)
Embed & Index → Store question embeddings
Query → User questions match against indexed questions
Retrieve → Return the chunks that answer matched questions
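To make the five steps concrete, here is a self-contained toy version of the flow. It is a sketch under stated assumptions, not Isotope's implementation or API: atomization splits on sentences, the LLM that would generate questions is replaced by a hardcoded dictionary, and the "embedding" is a bag-of-words vector compared with cosine similarity. The names `atomize`, `build_index`, and `retrieve` are hypothetical. What it does preserve is the core idea: every indexed question keeps a pointer back to its source chunk, and a chunk is scored by the best-matching question it produced.

```python
import math
import re
from collections import Counter
from typing import Callable

# Toy sketch of Atomize -> Generate -> Embed & Index -> Query -> Retrieve.
# Assumptions (NOT Isotope's API): sentence-level atomization, a caller-supplied
# question generator standing in for an LLM, and bag-of-words "embeddings".


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def atomize(chunk: str) -> list[str]:
    """Sentence atomization: one atom per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]


def build_index(chunks: dict[str, str], generate_questions: Callable[[str], list[str]]):
    """Index generated questions, each tagged with the chunk it came from."""
    index: list[tuple[Counter, str, str]] = []
    for chunk_id, text in chunks.items():
        for atom in atomize(text):
            for question in generate_questions(atom):
                index.append((embed(question), question, chunk_id))
    return index


def retrieve(query: str, index: list[tuple[Counter, str, str]], k: int = 3) -> list[tuple[str, float]]:
    """Match the query against indexed questions; score each chunk by its best question."""
    query_vec = embed(query)
    best: dict[str, float] = {}
    for question_vec, _question, chunk_id in index:
        best[chunk_id] = max(best.get(chunk_id, 0.0), cosine(query_vec, question_vec))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


chunks = {
    "python": "Python was created by Guido van Rossum. Python was first released in 1991.",
    "go": "Go was designed at Google by Robert Griesemer, Rob Pike, and Ken Thompson.",
}
# Hypothetical question-generator output, hardcoded in place of an LLM call.
QUESTIONS = {
    "Python was created by Guido van Rossum.": ["Who created Python?", "Who is the author of Python?"],
    "Python was first released in 1991.": ["When was Python first released?", "What year did Python come out?"],
    "Go was designed at Google by Robert Griesemer, Rob Pike, and Ken Thompson.": [
        "Who designed Go?",
        "Where was Go designed?",
    ],
}

index = build_index(chunks, lambda atom: QUESTIONS.get(atom, []))
print(retrieve("Who created Python?", index))  # "python" ranks first, "go" a distant second
```

The user query lands on an indexed question rather than on the declarative sentence, which is the question-to-question matching described above. A real deployment would swap in an LLM for question generation and a neural embedding model plus a vector store for `embed` and the linear scan.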

Based on "Question-Based Retrieval using Atomic Units for Enterprise RAG" (Raina & Gales, 2024; arXiv:2405.12363).

Performance

Based on arXiv:2405.12363:

Metric	Reverse RAG	Traditional RAG
Precision@5	Higher	Baseline
MRR	Higher	Baseline
Score Calibration	Meaningful	Less calibrated
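The table compares Precision@5 and MRR (mean reciprocal rank). For reference, here is a generic sketch of the standard definitions. It is not the paper's evaluation code, and the toy rankings are hypothetical.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Share of the top-k retrieved items that are relevant (divides by k, the usual convention)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean over queries of 1/rank of the first relevant item (0 when none is retrieved)."""
    if not runs:
        return 0.0
    total = 0.0
    for ranked, relevant in runs:
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Toy evaluation data (hypothetical): one (ranking, relevant set) pair per query.
runs = [
    (["a", "b", "c"], {"b"}),  # first relevant item at rank 2 -> 1/2
    (["x", "y"], {"x"}),       # rank 1 -> 1
    (["p", "q"], {"z"}),       # never retrieved -> 0
]
print(precision_at_k(["a", "b", "c", "d", "e"], {"b", "e"}))  # 0.4
print(mean_reciprocal_rank(runs))                              # 0.5
```

Precision@5 rewards getting many relevant results into the top five, while MRR rewards putting the first relevant result as high as possible.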

Key findings:

Question diversity matters: Generating diverse questions improves coverage
Deduplication helps: deduplicating down to about 50% of the generated questions still maintains peak retrieval performance
Confidence scores are meaningful: Low scores genuinely indicate poor matches, unlike traditional RAG where scores can be misleadingly high
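Deduplication here means dropping generated questions that are near-copies of ones already kept. One simple way to do that is a greedy pass with a similarity threshold. This is an illustrative assumption, not necessarily how the paper or Isotope deduplicates, and it uses bag-of-words similarity as a stand-in for real embeddings; `deduplicate` and its `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real system would use a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(questions: list[str], threshold: float = 0.8) -> list[str]:
    """Greedy near-duplicate removal: keep a question only if its similarity
    to every already-kept question is below `threshold`."""
    kept: list[tuple[str, Counter]] = []
    for question in questions:
        vec = embed(question)
        if all(cosine(vec, other) < threshold for _, other in kept):
            kept.append((question, vec))
    return [question for question, _ in kept]


generated = [
    "Who created Python?",
    "who created python",  # same words, different casing: dropped
    "Who is the creator of Python?",
    "When was Python released?",
]
kept = deduplicate(generated)
print(f"{len(kept)}/{len(generated)} kept:", kept)
```

Lowering `threshold` removes more questions and so lowers the retention rate. Note that the lexical stand-in keeps "Who is the creator of Python?" even though it paraphrases the first question; a semantic embedding model would be more likely to flag it as a duplicate.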

See the paper for full benchmarks on MS MARCO and other datasets.

Documentation

📚 Learn the Concepts

Reverse RAG Explained - The paper and core insight
Architecture - System design and components

🛠️ Guides & How-Tos

Configuration - Settings and environment variables
Atomization Strategies - Sentence vs LLM atomization
CLI Reference - Command-line usage

🎓 Tutorials

Getting Started - Your first 10 minutes
Coming soon: Building a FAQ bot, Hybrid retrieval, Custom providers

🔌 API Reference

Coming soon: Full API documentation

When to Use Isotope

Great fit:

✅ FAQ-style content where questions are predictable (support docs, technical documentation)
✅ Knowledge bases with factual, Q&A-structured information
✅ When precision matters more than recall (legal, medical, compliance)
✅ When you want meaningful confidence scores (threshold-based filtering)

Consider traditional RAG or hybrid approach when:

❌ Queries are exploratory and unpredictable
❌ You can't afford to miss any relevant content (broad discovery)
❌ Content doesn't map naturally to Q&A format (narrative, creative writing)
❌ You need semantic search over unstructured brainstorming notes

Trade-offs:

Isotope trades recall for precision. If users ask questions that none of the generated questions anticipated, match scores will be low. The upside is that you can tell: the confidence scores are meaningful, so a low score signals a weak match.

Mitigation: run Isotope in hybrid mode, using question matching for high-confidence results and falling back to chunk search when scores are low.
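The fallback logic can be sketched independently of any backend. This assumes two hypothetical search callables that each return `(chunk_id, score)` pairs, and an illustrative threshold of 0.7; neither the function names nor the threshold come from Isotope.

```python
from dataclasses import dataclass
from typing import Callable

# A search function takes (query, k) and returns [(chunk_id, score), ...].
SearchFn = Callable[[str, int], list[tuple[str, float]]]


@dataclass
class Hit:
    chunk_id: str
    score: float
    source: str  # "questions" (question matching) or "chunks" (traditional fallback)


def hybrid_search(query: str, question_search: SearchFn, chunk_search: SearchFn,
                  threshold: float = 0.7, k: int = 5) -> list[Hit]:
    """Prefer question matching; fall back to chunk search when its best score is low."""
    hits = question_search(query, k)
    best = max((score for _, score in hits), default=0.0)
    if best >= threshold:
        return [Hit(cid, score, "questions") for cid, score in hits]
    # Question matching was not confident enough: use plain chunk retrieval instead.
    return [Hit(cid, score, "chunks") for cid, score in chunk_search(query, k)]


# Hypothetical stand-ins for the two retrieval backends.
def fake_question_search(query: str, k: int) -> list[tuple[str, float]]:
    return [("auth.md", 0.91)] if "authenticate" in query else [("misc.md", 0.32)]


def fake_chunk_search(query: str, k: int) -> list[tuple[str, float]]:
    return [("notes.md", 0.55)]


print(hybrid_search("How do I authenticate?", fake_question_search, fake_chunk_search))
# [Hit(chunk_id='auth.md', score=0.91, source='questions')]
```

Each hit is tagged with the index it came from because scores from two different indexes are not necessarily on the same scale, so downstream code should not compare them directly.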

Other strategies:

Query expansion (rephrase queries before search)
Re-ranking with cross-encoders
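Query expansion helps when the user's wording misses every indexed question but a rephrasing hits one. A minimal sketch, assuming a hypothetical `search` callable and rephrasings that would come from an LLM in practice (hardcoded here), runs every variant and keeps each chunk's best score:

```python
from typing import Callable

# A search function takes (query, k) and returns [(chunk_id, score), ...].
SearchFn = Callable[[str, int], list[tuple[str, float]]]


def expanded_search(query: str, rephrasings: list[str], search: SearchFn, k: int = 5) -> list[tuple[str, float]]:
    """Search with the original query and each rephrasing; keep every chunk's best score."""
    best: dict[str, float] = {}
    for variant in [query, *rephrasings]:
        for chunk_id, score in search(variant, k):
            best[chunk_id] = max(best.get(chunk_id, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical question index: only some phrasings match an indexed question well.
def fake_search(query: str, k: int) -> list[tuple[str, float]]:
    table = {
        "How do I log in?": [("faq.md", 0.41)],
        "How do I authenticate?": [("auth.md", 0.92), ("faq.md", 0.40)],
    }
    return table.get(query, [])[:k]


# In practice the rephrasings would be generated by an LLM; here they are hardcoded.
print(expanded_search("How do I log in?", ["How do I authenticate?", "How do I sign in?"], fake_search))
# [('auth.md', 0.92), ('faq.md', 0.41)]
```

Taking the maximum rather than the sum keeps scores on the same scale as a single query, so a confidence threshold still means the same thing after expansion.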

See Limitations for details.

Contributing

Isotope is in active development and we welcome contributions!

Ways to contribute:

🐛 Report bugs
💡 Request features
📖 Improve documentation
🔧 Submit pull requests

Development setup:

git clone https://github.com/sreejithraman/isotope.git
cd isotope
pip install -e ".[dev]"
pre-commit install
pytest  # Run tests

See CONTRIBUTING.md for detailed guidelines (coming soon).

Citation

If you use Isotope in your research, please cite the paper:

@article{raina2024question,
  title={Question-Based Retrieval using Atomic Units for Enterprise RAG},
  author={Raina, Vatsal and Gales, Mark},
  journal={arXiv preprint arXiv:2405.12363},
  year={2024}
}

And consider citing this implementation:

@software{isotope_rag,
  title={Isotope: Reverse RAG Database},
  author={Raman, Sree},
  year={2026},
  url={https://github.com/sreejithraman/isotope}
}

License

MIT

Project details


Release history

This version: 0.1.0 (Jan 5, 2026)

Download files

Source distribution: isotope_rag-0.1.0.tar.gz (79.1 kB), uploaded Jan 5, 2026

Built distribution: isotope_rag-0.1.0-py3-none-any.whl (61.5 kB), uploaded Jan 5, 2026, Python 3

File details

Details for the file isotope_rag-0.1.0.tar.gz.

File metadata

Download URL: isotope_rag-0.1.0.tar.gz
Upload date: Jan 5, 2026
Size: 79.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for isotope_rag-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d5fa0e180329efeb6ee1a18d7007da101fba970eff4713d943af99eb41fcb398`
MD5	`032f0df713637cdf54a7711a97a81244`
BLAKE2b-256	`8c47b65d9ddcaf43bcee1a0885e69d7f005c1838b18a1ea68341fe0675b4a9bf`


Provenance

The following attestation bundles were made for isotope_rag-0.1.0.tar.gz:

Publisher: publish.yml on sreejithraman/isotope

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: isotope_rag-0.1.0.tar.gz
- Subject digest: d5fa0e180329efeb6ee1a18d7007da101fba970eff4713d943af99eb41fcb398
- Sigstore transparency entry: 796257327
- Sigstore integration time: Jan 5, 2026
Source repository:
- Permalink: sreejithraman/isotope@88190e1a0a1592194dd4cd78de529a97f14314fa
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/sreejithraman
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@88190e1a0a1592194dd4cd78de529a97f14314fa
- Trigger Event: push

File details

Details for the file isotope_rag-0.1.0-py3-none-any.whl.

File metadata

Download URL: isotope_rag-0.1.0-py3-none-any.whl
Upload date: Jan 5, 2026
Size: 61.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for isotope_rag-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`202ef91ca851f785abba9c64cf1b89d01919f2d96761b15c9782ba6ba160f19e`
MD5	`9c191e0d3ddab78069024cdfc05534af`
BLAKE2b-256	`1a4424485ded02fe4dea0b6fbc42e31df7098769d90201e8273a4582f7690e50`


Provenance

The following attestation bundles were made for isotope_rag-0.1.0-py3-none-any.whl:

Publisher: publish.yml on sreejithraman/isotope

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: isotope_rag-0.1.0-py3-none-any.whl
- Subject digest: 202ef91ca851f785abba9c64cf1b89d01919f2d96761b15c9782ba6ba160f19e
- Sigstore transparency entry: 796257331
- Sigstore integration time: Jan 5, 2026
Source repository:
- Permalink: sreejithraman/isotope@88190e1a0a1592194dd4cd78de529a97f14314fa
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/sreejithraman
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@88190e1a0a1592194dd4cd78de529a97f14314fa
- Trigger Event: push
