langchain-rag-wiki
RAG meets wiki-style provenance.
LangChain-compatible hybrid knowledge base retrieval with a user-curated personal KB layer: it surfaces documents, learns from usage, and lets users build personal knowledge bases on top of shared vector stores.
Installation · Quickstart · How It Works · Real Vector DB Setup · Configuration · Contributing
What Is This?
Standard RAG is stateless — every query hits the vector database fresh, returns probabilistic results, and forgets what was useful last time.
langchain-rag-wiki wraps any LangChain retriever and adds a personal knowledge layer on top:
- Documents retrieved repeatedly get suggested for saving after a configurable threshold
- Once saved, their chunks are searched semantically using cosine similarity — no external infrastructure, just numpy
- Every response shows a provenance block so you always know which document was used and from where
- A decay model keeps the cache honest — stale documents fade out automatically, and docs that consistently miss queries get demoted immediately
It's a drop-in BaseRetriever. Any LangChain chain, agent, or RAG pipeline that accepts a retriever works with it immediately.
Installation
pip install langchain-rag-wiki
With optional backends:
pip install 'langchain-rag-wiki[sqlite]' # SQLite persistent store
pip install 'langchain-rag-wiki[redis]' # Redis distributed store
pip install 'langchain-rag-wiki[scheduler]' # APScheduler decay jobs
pip install 'langchain-rag-wiki[llama]' # LlamaIndex adapter
pip install 'langchain-rag-wiki[dev]' # all of the above + pytest + fakeredis
Quickstart
Zero infrastructure — uses in-memory state. No API keys needed.
from rag_wiki import RagWikiRetriever, RagWikiRetrieverConfig, MemoryStateStore
retriever = RagWikiRetriever(
user_id = "user-123",
global_retriever = your_existing_retriever, # any LangChain BaseRetriever
state_store = MemoryStateStore(),
config = RagWikiRetrieverConfig(fetch_threshold=3),
)
docs = retriever.invoke("quarterly earnings report")
print(retriever.last_provenance.render()) # see what was retrieved and from where
Run the bundled demo (no API keys, no services required):
python example.py
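Because RagWikiRetriever is a standard BaseRetriever, it can be piped into any LangChain chain unchanged. A minimal LCEL sketch, assuming the retriever from the snippet above plus a local ChatOllama model (the prompt and model choice here are illustrative, not part of this package):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import ChatOllama

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # join the retrieved chunks into one context string
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOllama(model="llama3")
    | StrOutputParser()
)

print(chain.invoke("quarterly earnings report"))
print(retriever.last_provenance.render())  # provenance still reflects the last query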
How It Works
Retrieval Flow
Every query goes through two stages:
1. Semantic cache search → PINNED + CLAIMED docs searched via cosine similarity
(keyword fallback if no embedding model available)
2. Global RAG fallback → your existing vector retriever
Chunk vectors are accumulated upon user claim — when a user accepts a suggestion, the full document is automatically chunked, embedded, and stored in the local cache. From then on, it can be searched semantically without hitting the global DB.
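A minimal sketch of that flow, assuming the Quickstart retriever above (fetch_threshold=3) and an illustrative doc_id of "doc-1" (in practice you use the doc_id attached to the suggested document's metadata):

# Query until the document crosses fetch_threshold and a save suggestion fires
for _ in range(3):
    retriever.invoke("quarterly earnings report")
print(retriever.last_provenance.render())    # includes the save suggestion

# Accepting the suggestion chunks, embeds, and caches the full document locally
retriever.accept_suggestion(doc_id="doc-1")

# Later queries for this topic are served from the local semantic cache
docs = retriever.invoke("quarterly earnings report")
print(retriever.last_provenance.render())    # sources now show "[from your KB]"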
Document Lifecycle
Documents move through six states based on usage patterns:
GLOBAL → SURFACED → SUGGESTED → CLAIMED → PINNED
                                    ↘       ↙
                                     DEMOTED
| State | What it means | Retrieval path |
|---|---|---|
| GLOBAL | In the shared vector DB only | Vector similarity search |
| SURFACED | Retrieved at least once; counter is active | Vector search (counter increments) |
| SUGGESTED | Fetch count ≥ threshold; user prompted | Vector search (pending decision) |
| CLAIMED | User saved it; chunks in local cache | Semantic chunk search |
| PINNED | Consistently relevant; auto-promoted | Semantic chunk search (always included) |
| DEMOTED | Usage dropped or cache misses exceeded; evicted | Returns to vector search |
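The same lifecycle can be sketched as a plain Python transition table. This is illustrative only, one reading of the diagram above; the package's own transition logic lives in rag_wiki/lifecycle/state_machine.py:

from enum import Enum

class DocState(Enum):     # illustrative mirror of the six states
    GLOBAL = "global"
    SURFACED = "surfaced"
    SUGGESTED = "suggested"
    CLAIMED = "claimed"
    PINNED = "pinned"
    DEMOTED = "demoted"

TRANSITIONS = {
    DocState.GLOBAL:    {DocState.SURFACED},                  # first retrieval
    DocState.SURFACED:  {DocState.SUGGESTED},                 # fetch count hits threshold
    DocState.SUGGESTED: {DocState.CLAIMED},                   # user accepts the suggestion
    DocState.CLAIMED:   {DocState.PINNED, DocState.DEMOTED},  # decay score decides
    DocState.PINNED:    {DocState.DEMOTED},
    DocState.DEMOTED:   {DocState.GLOBAL},                    # back to plain vector search
}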
Semantic Cache Search
When a document is CLAIMED or PINNED, its chunks are searched using proper cosine similarity:
# Embed query once, reused across all cached docs
query_vec = embedding_model.embed_query(query)
# For each cached doc: score all chunks, inject top-k above threshold
scores = cosine_similarity(query_vec, chunk_matrix)
top_chunks = chunks[scores >= similarity_threshold][:local_top_k]
Vectors are always normalised before the dot product, so scores are in [-1, 1] regardless of the embedding model's output magnitude.
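A self-contained numpy sketch of that scoring step (names such as chunk_matrix mirror the snippet above; this is not the library's internal code):

import numpy as np

def cosine_scores(query_vec, chunk_matrix):
    # normalise query and chunk vectors so the dot product is a true cosine in [-1, 1]
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    m = np.asarray(chunk_matrix, dtype=float)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    return m @ q

def select_chunks(chunks, scores, similarity_threshold=0.75, local_top_k=3):
    # keep chunks above the cutoff, highest score first, at most local_top_k of them
    order = np.argsort(scores)[::-1]
    keep = [i for i in order if scores[i] >= similarity_threshold]
    return [chunks[i] for i in keep[:local_top_k]]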
The embedding model is resolved automatically — if your global retriever has an embeddings attribute, it's reused. You can also pass one explicitly. If neither is available, the system falls back to keyword matching silently.
Auto-Demotion
If a cached document's chunks consistently fail the similarity check across max_cache_miss_streak consecutive queries, it is demoted immediately — no need to wait for the daily decay job. Its chunk index is deleted and it returns to the global vector search pool.
Threshold and Suggestion
The save suggestion fires when a document has been retrieved fetch_threshold times. If the user declines, the next suggestion is scheduled at fetch_count + threshold × 2 — doubling the gap each time. The reset_threshold setting resets the fetch count if the document hasn't appeared in that many consecutive queries.
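A worked example of that escalation under one reading of the rule, with fetch_threshold = 3 and a user who keeps declining (illustrative, not the exact internal bookkeeping):

fetch_threshold = 3
gap = fetch_threshold
fetch_count = fetch_threshold            # first suggestion fires at fetch 3
prompts = [fetch_count]
for _ in range(3):                       # three declines in a row
    gap *= 2                             # the gap doubles on every decline
    fetch_count += gap                   # 3 + 3×2 = 9, then 21, then 45
    prompts.append(fetch_count)
print(prompts)                           # [3, 9, 21, 45]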
Decay Model
A background job recomputes each document's relevance score daily:
decay_score = weighted_avg(
recency_factor = exp(-λ × days_since_last_fetch), weight: 0.40
frequency_factor = min(fetch_count / freq_cap, 1.0), weight: 0.30
explicit_signal = thumbs_up / thumbs_down value, weight: 0.20
chunk_hit_rate = fraction of chunks ever matched, weight: 0.15
)
Documents above pin_threshold (0.85) get auto-pinned after pin_hold_days. Documents below demotion_threshold (0.15) get evicted after demotion_hold_days.
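A hedged numeric sketch of that score, using the DecayConfig defaults listed below and dividing by the weight sum (the four weights total 1.05). Treating a single thumbs-up as an explicit signal of 1.0 is an assumption made for the example:

import math

def decay_score(days_since_last_fetch, fetch_count, explicit_signal, chunk_hit_rate,
                decay_lambda=0.05, freq_cap=20,
                w_recency=0.40, w_frequency=0.30, w_explicit=0.20, w_chunk_hit=0.15):
    recency = math.exp(-decay_lambda * days_since_last_fetch)   # half-life ≈ 14 days
    frequency = min(fetch_count / freq_cap, 1.0)
    weights = (w_recency, w_frequency, w_explicit, w_chunk_hit)
    factors = (recency, frequency, explicit_signal, chunk_hit_rate)
    return sum(w * f for w, f in zip(weights, factors)) / sum(weights)

# Fetched 10 times, last seen 5 days ago, one thumbs-up, 60% of chunks ever matched
print(round(decay_score(5, 10, 1.0, 0.6), 3))   # ≈ 0.716: above demotion, below pin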
Provenance Block
After every query, retriever.last_provenance.render() outputs:
────────────────────────────────────────────────────────────
📄 Sources used in this response
• Kubernetes Pod Basics [from your KB]
Chunks 0, 2 | Saved to your KB
• Docker Image Guide
Full document | SURFACED (fetched 2×)
💡 "Docker Image Guide" has appeared in your queries 3 times.
Would you like to save it to your personal knowledge base?
────────────────────────────────────────────────────────────
Also available as a structured dict via retriever.last_provenance.to_dict().
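The structured form is handy when you render sources in your own UI or log them, without parsing the text block:

import json

prov = retriever.last_provenance
print(prov.render())                          # human-readable block shown above
print(json.dumps(prov.to_dict(), indent=2))   # structured form for logging or a custom UI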
Benchmark Results
We benchmark the performance of langchain-rag-wiki against a standard "Plain RAG" pipeline using an isolated, real-world query set. The benchmark evaluates both performance (response time, token usage) and retrieval quality using the cross-encoder/ms-marco-MiniLM-L-6-v2 model for calibrated relevance scoring.
Summary against Pinecone (remote, cold start):
- Context Relevance: RAG-Wiki consistently retrieves far more relevant context from the personal knowledge base (avg score 0.327 vs Plain RAG's 0.161). By fetching chunks proven to be useful for the user, it achieves significantly better context fit without introducing noise.
- Chunk Efficiency: RAG-Wiki injects fewer, higher-quality chunks (avg 2.9 chunks vs Plain RAG's fixed 5.0 chunks). This leads to fewer tokens used in context, saving LLM execution cost.
- Cache Hit Rate: 100% of the test queries hit the local cache once the documents were populated, completely bypassing the remote Pinecone index for those documents.
- Overhead: RAG-Wiki adds a modest ~100 ms of overhead for cache maintenance and semantic searching on a cold start, which amortizes heavily as the cache warms up.
Connecting to a Real Vector DB
Ingesting Documents
Use the included ingest.py as a starting point. It loads .txt files, splits them into chunks, injects the required metadata, and stores them in Chroma using Ollama embeddings:
python ingest.py
The metadata fields doc_id, doc_title, and doc_path on each chunk connect the vector store to the lifecycle tracking system. Without them, fetch counting and cache promotion won't work.
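If you ingest with your own script instead of ingest.py, the essential step is attaching those three fields to every chunk. A sketch, assuming plain-text files under docs/ and the same Chroma + Ollama setup shown below (paths, splitter settings, and the doc_id scheme are placeholders):

from pathlib import Path
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = []
for path in Path("docs").glob("*.txt"):
    text = path.read_text(encoding="utf-8")
    for piece in splitter.split_text(text):
        chunks.append(Document(
            page_content=piece,
            metadata={
                "doc_id": path.stem,                              # stable per-document id
                "doc_title": path.stem.replace("_", " ").title(), # human-readable title
                "doc_path": str(path),                            # source file location
            },
        ))

Chroma.from_documents(
    chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text:latest"),
    collection_name="my_docs",
    persist_directory="./chroma_db",
)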
Chroma + Ollama (fully local, no API key)
The embedding model is auto-resolved from the Chroma vectorstore — no need to pass it separately.
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
from rag_wiki import RagWikiRetriever, RagWikiRetrieverConfig
from rag_wiki.storage.sqlite import SQLiteStateStore
import os
os.makedirs("wiki/documents", exist_ok=True)
vectorstore = Chroma(
collection_name = "my_docs",
embedding_function = OllamaEmbeddings(model="nomic-embed-text:latest"),
persist_directory = "./chroma_db",
)
retriever = RagWikiRetriever(
user_id = "user-1",
global_retriever = vectorstore.as_retriever(search_kwargs={"k": 5}),
state_store = SQLiteStateStore("sqlite:///./wiki/rag_wiki_state.db"),
config = RagWikiRetrieverConfig(
fetch_threshold = 3,
similarity_threshold = 0.75,
wiki_save_dir = "wiki/documents",
),
)
docs = retriever.invoke("your query here")
print(retriever.last_provenance.render())
Passing an Embedding Model Explicitly
If your retriever doesn't expose an embeddings attribute, pass the model directly:
from langchain_openai import OpenAIEmbeddings
retriever = RagWikiRetriever(
user_id = "user-1",
global_retriever = your_retriever,
embedding_model = OpenAIEmbeddings(),
)
Pinecone
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
from rag_wiki import RagWikiRetriever, MemoryStateStore
vectorstore = PineconeVectorStore(
index_name = "my-index",
embedding = OpenAIEmbeddings(),
)
retriever = RagWikiRetriever(
user_id = "user-1",
global_retriever = vectorstore.as_retriever(search_kwargs={"k": 5}),
state_store = MemoryStateStore(),
)
User Actions
# Save document to personal KB (chunks already accumulated at retrieval time)
retriever.accept_suggestion(doc_id="doc-1")
# Decline — next suggestion scheduled at escalating interval
retriever.decline_suggestion(doc_id="doc-1")
# Explicit positive signal — boosts decay score
retriever.thumbs_up(doc_id="doc-1")
# Explicit negative signal — reduces decay score
retriever.thumbs_down(doc_id="doc-1")
# Always include this document in every query context
retriever.force_pin(doc_id="doc-1")
# Remove from personal KB entirely (also deletes chunk index)
retriever.force_remove(doc_id="doc-1")
Configuration
RagWikiRetrieverConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| fetch_threshold | int | 3 | Fetch count before save suggestion fires |
| reset_threshold | int | 3 | Queries without a hit before fetch count resets |
| similarity_threshold | float | 0.75 | Cosine similarity cutoff for cache chunk hits |
| local_top_k | int | 3 | Max chunks to inject per cached doc per query |
| wiki_save_dir | str \| None | None | Directory to save accepted doc copies; None disables |
| no_resuggest_days | int | 30 | Deprecated — kept for API compatibility |
| decay | DecayConfig | (see below) | Decay engine settings |
DecayConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| w_recency | float | 0.40 | Weight for recency in decay score |
| w_frequency | float | 0.30 | Weight for frequency |
| w_explicit | float | 0.20 | Weight for explicit user signals |
| w_chunk_hit | float | 0.15 | Weight for chunk hit rate |
| max_cache_miss_streak | int | 10 | Consecutive cache misses before immediate demotion |
| decay_lambda | float | 0.05 | Decay steepness λ (half-life ≈ 14 days) |
| freq_cap | int | 20 | Max fetch count for frequency normalisation |
| pin_threshold | float | 0.85 | Score above which doc is auto-pinned |
| demotion_threshold | float | 0.15 | Score below which doc is auto-demoted |
| pin_hold_days | int | 7 | Days score must hold above threshold before pin fires |
| demotion_hold_days | int | 3 | Days score must hold below threshold before demotion fires |
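Putting the two together, a tuned configuration might look like this (values are illustrative, not recommendations):

from rag_wiki import RagWikiRetrieverConfig, DecayConfig

config = RagWikiRetrieverConfig(
    fetch_threshold=2,                 # suggest saving after two fetches
    similarity_threshold=0.70,         # slightly looser cache-hit cutoff
    local_top_k=4,
    wiki_save_dir="wiki/documents",
    decay=DecayConfig(
        decay_lambda=0.10,             # faster fade-out, half-life ≈ 7 days
        pin_threshold=0.90,
        max_cache_miss_streak=5,
    ),
)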
Storage Backends
Memory (default, zero dependencies)
from rag_wiki import MemoryStateStore
store = MemoryStateStore()
Thread-safe dict. Data lost on process restart. Best for development and testing.
SQLite (single-node persistent)
from rag_wiki.storage.sqlite import SQLiteStateStore
store = SQLiteStateStore("sqlite:///./wiki/rag_wiki_state.db")
Persists to a local file. Good for single-server production deployments.
Redis (distributed)
import redis
from rag_wiki.storage.redis_store import RedisStateStore
client = redis.Redis(host="localhost", port=6379, db=0)
store = RedisStateStore(client)
Required when running multiple API workers or load-balancing. Uses Redis hashes and sets for fast state queries.
Decay Scheduler
from rag_wiki import DecayEngine, DecayConfig, MemoryStateStore
from rag_wiki.lifecycle.state_machine import StateMachine
from rag_wiki.scheduler import DecayScheduler
store = MemoryStateStore()
engine = DecayEngine(store, StateMachine(), config=DecayConfig())
scheduler = DecayScheduler(engine, store, backend="simple", interval_hours=24)
scheduler.start()
scheduler.run_now("user-123") # manual trigger for one user
scheduler.run_all_users() # all users with CLAIMED or PINNED docs
scheduler.stop()
Use backend="apscheduler" for production (requires pip install 'langchain-rag-wiki[scheduler]').
LlamaIndex Adapter
from llama_index.core import VectorStoreIndex
from rag_wiki.adapters.llamaindex import LlamaIndexRetrieverAdapter
from rag_wiki import RagWikiRetriever
# index is an existing LlamaIndex VectorStoreIndex built over your documents
adapter = LlamaIndexRetrieverAdapter(
    llama_retriever=index.as_retriever(similarity_top_k=5)
)
retriever = RagWikiRetriever(
user_id = "user-1",
global_retriever = adapter,
)
Requires pip install 'langchain-rag-wiki[llama]'.
Running Tests
pip install 'langchain-rag-wiki[dev]'
pytest tests/ -v
Project Structure
rag_wiki/
├── __init__.py # public exports
├── retriever.py # RagWikiRetriever — main entry point
├── scheduler.py # DecayScheduler (simple + APScheduler backends)
├── storage/
│ ├── base.py # StateStore ABC + UserDocRecord + DocumentState
│ ├── chunk_store.py # ChunkStore — chunk-level vector cache (disk + memory)
│ ├── memory.py # MemoryStateStore (default, zero deps)
│ ├── sqlite.py # SQLiteStateStore
│ └── redis_store.py # RedisStateStore
├── lifecycle/
│ ├── state_machine.py # pure transition logic
│ ├── fetch_counter.py # threshold tracking + suggestion events
│ └── decay_engine.py # scoring + pin/demotion transitions
├── transparency/
│ └── provenance.py # ProvenanceBlock + ProvenanceBuilder
└── adapters/
└── llamaindex.py # LlamaIndex → LangChain adapter
Contributing
Contributions are very welcome. See CONTRIBUTING.md for the full guide.
License