
A dead simple RAG pipeline with persistent memory and conversation namespaces.

NoBrainer RAG

A dead simple RAG (Retrieval-Augmented Generation) system that just works. Built for developers who want to add memory to their AI applications without overthinking it.

Why NoBrainer?

  • 🚀 Simple API - Just 3 methods: add(), query(), clear()
  • 🔒 Namespace Isolation - Each conversation gets its own isolated namespace
  • 💾 Persistent Memory - Data survives even if your object doesn't
  • 🎯 Smart Retrieval - RecursiveCharacterTextSplitter + FlashRank reranking built-in
  • ⚡ Flexible - Swap embedding models, toggle reranking, adjust on the fly
  • 🌐 Cloud Agnostic - Works with AWS, GCP, or Azure Pinecone regions

Installation

pip install NoBrainerRag

Or clone and install:

git clone https://github.com/AarushSrivatsa/NoBrainerRag.git
cd NoBrainerRag
pip install -e .

What Makes This Actually Good

Most RAG tutorials give you basic vector search and call it a day. NoBrainer comes with production-grade features out of the box:

📊 RecursiveCharacterTextSplitter (Smart Chunking)

Not your average "split every N characters" nonsense. This intelligently splits text by:

  1. Paragraphs first (\n\n)
  2. Then lines (\n)
  3. Then sentences (.)
  4. Then clauses (,)
  5. Finally words as a last resort

Why this matters: Preserves semantic meaning. You don't get chunks that cut off mid-sentence or split important context.
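The idea can be sketched in a few lines of plain Python. This is a simplified illustration of the recursive strategy, not LangChain's actual implementation — the real splitter also merges small pieces back together and applies chunk overlap:

```python
def recursive_split(text, max_len=50, separators=("\n\n", "\n", ".", ",", " ")):
    """Split on the coarsest separator present in the text,
    recursing into any piece that is still too long."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        if sep in text:
            pieces = [p for p in text.split(sep) if p.strip()]
            chunks = []
            for piece in pieces:
                chunks.extend(recursive_split(piece, max_len, separators))
            return chunks
    # No separator left: hard-split by characters as a last resort
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Because paragraph breaks are tried before periods and commas, a chunk only gets cut mid-sentence when nothing coarser fits.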

🎯 FlashRank Reranking (Better Results)

Every single retrieval automatically:

  1. Gets 10 candidate chunks from Pinecone
  2. Runs them through FlashRank (ms-marco-MiniLM-L-12-v2)
  3. Returns only the top 4 most relevant

Why this matters: Vector similarity isn't perfect. Reranking catches what embeddings miss, giving you the actually relevant results.

You can toggle this on/off anytime - just set rag.use_reranking = False if you want raw vector search.
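Conceptually, retrieve-then-rerank is just "over-fetch, rescore, keep the best few". A toy sketch, with a word-overlap scorer standing in for FlashRank's cross-encoder (the real model scores each query/chunk pair with a neural net):

```python
def retrieve_then_rerank(query, ranked_candidates, score_fn, base_k=10, top_n=4):
    """Take base_k candidates (already ordered by vector similarity),
    rescore each against the query, and keep only the top_n."""
    shortlist = ranked_candidates[:base_k]
    rescored = sorted(shortlist, key=lambda chunk: score_fn(query, chunk), reverse=True)
    return rescored[:top_n]

def overlap_score(query, chunk):
    """Toy relevance score: number of words shared with the query."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))
```

The point of the two-stage shape: the cheap vector search narrows thousands of chunks to base_k, then the expensive scorer only has to look at those few.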

🧠 Flexible Embedding Models

Defaults to nomic-embed-text:v1.5 via Ollama - one of the best open-source embedding models. Runs locally.

But you can use ANY embedding model:

  • OpenAI embeddings
  • Cohere embeddings
  • HuggingFace models
  • Any LangChain-compatible embedding

Why this matters: No vendor lock-in. Use what works best for your use case.


Prerequisites

1. Pinecone API Key (Required)

Get your free API key from Pinecone.

Create a .env file:

PINECONE_API_KEY=your_key_here

2. Ollama with Embedding Model (Only if using default embeddings)

If you're using the default Ollama embeddings, install Ollama and pull the model:

ollama pull nomic-embed-text:v1.5

If you're bringing your own embedding model, skip this step.


Quick Start

from NoBrainerRag import NoBrainerRag

# Create a RAG instance with a namespace
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-rag-index"
)

# Insert some knowledge
rag.add("Paris is the capital of France.")
rag.add("Python is a programming language created by Guido van Rossum.")

# Retrieve relevant info
result = rag.query("What is the capital of France?")
print(result)

# Delete when done
rag.clear()

🔥 The Magic: Persistent Memory

Here's the cool part - your data persists even if the object is gone:

# Session 1: Insert data
rag = NoBrainerRag(namespace="user_123", index_name="my-index")
rag.add("Important information here")
del rag  # Object is destroyed

# Session 2: Access the same data later
rag = NoBrainerRag(namespace="user_123", index_name="my-index")  # Same namespace!
result = rag.query("tell me about important information")
# Your data is still there! 🎉

๐Ÿ—‚๏ธ How Pinecone Storage Works

Index = The Database
Your index_name is the top-level Pinecone index where ALL your data lives. Think of it as your database.

Namespace = The Conversation
Your namespace parameter creates an isolated namespace INSIDE that index. Each conversation is completely isolated.

Pinecone Index: "my-chatbot-memory"
├── Namespace: "user_123" (namespace="user_123")
│   ├── chunk_1: "Paris is the capital..."
│   ├── chunk_2: "Python was created..."
│   └── chunk_3: "Machine learning is..."
├── Namespace: "user_456" (namespace="user_456")
│   ├── chunk_1: "Tokyo is in Japan..."
│   └── chunk_2: "JavaScript runs in..."
└── Namespace: "doc_789" (namespace="doc_789")
    └── chunk_1: "This document explains..."

What this means:

  • One Pinecone index can hold thousands of namespaces
  • Each namespace is completely isolated (no data leakage)
  • Same index + same namespace = same memory, always
  • Delete a namespace = only that data is wiped
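The layout can be pictured as nested dictionaries. This is only a mental model — Pinecone stores vectors, not lists of strings, and the helper names here are hypothetical:

```python
# indexes: index_name -> namespace -> list of stored chunks
indexes = {}

def add(index_name, namespace, chunk):
    """Append a chunk to one namespace inside one index."""
    indexes.setdefault(index_name, {}).setdefault(namespace, []).append(chunk)

def clear(index_name, namespace):
    """Wipe a single namespace; every other namespace is untouched."""
    indexes.get(index_name, {}).pop(namespace, None)
```

Two lookups, two keys: change either the index name or the namespace and you land in a different bucket.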

โš ๏ธ CRITICAL: How Memory Persistence Works

To access the exact same memory across sessions, you MUST have:

✅ Same Pinecone index (index_name parameter)
✅ Same namespace (namespace parameter)

# These will access THE SAME memory:
rag1 = NoBrainerRag(namespace="user_123", index_name="my-index")
rag2 = NoBrainerRag(namespace="user_123", index_name="my-index")
# ✅ Same data

# These will have DIFFERENT memory:
rag1 = NoBrainerRag(namespace="user_123", index_name="my-index")
rag2 = NoBrainerRag(namespace="user_456", index_name="my-index")
# โŒ Different namespace = different memory

# These will also have DIFFERENT memory:
rag1 = NoBrainerRag(namespace="user_123", index_name="my-index")
rag2 = NoBrainerRag(namespace="user_123", index_name="other-index")
# โŒ Different index = completely different database = different memory

The Rule: Same index + same namespace = same memory. Change either and you get a fresh memory space.


API Reference

Initialization

rag = NoBrainerRag(
    namespace="user_123",                   # Required: Unique namespace identifier
    index_name="my-rag-index",              # Required: Pinecone index name
    embedding_model=None,                   # Optional: Custom embedding model
    chunk_size=400,                         # Optional: Size of text chunks
    chunk_overlap=75,                       # Optional: Overlap between chunks
    separators=["\n\n", "\n", ".", ",", " ", ""],  # Optional: Split points
    base_k=10,                              # Optional: Initial retrieval count
    top_n=4,                                # Optional: Results after reranking
    use_reranking=True,                     # Optional: Enable FlashRank
    rerank_model="ms-marco-MiniLM-L-12-v2", # Optional: Reranking model
    pinecone_cloud="aws",                   # Optional: Cloud provider
    pinecone_region="us-east-1",            # Optional: Region
    similarity_metric="cosine"              # Optional: Vector similarity metric
)
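Two of these knobs, chunk_size and chunk_overlap, interact like a sliding window: each chunk repeats the tail of the previous one so context isn't lost at the boundary. A minimal illustration (fixed-size windows only, ignoring the separator logic):

```python
def window_chunks(text, chunk_size=400, chunk_overlap=75):
    """Fixed-size chunks where each one starts chunk_overlap
    characters before the previous one ended."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults, the last 75 characters of one chunk are also the first 75 characters of the next.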

Methods

add(text: str)
Insert text into the vector database. Automatically chunks and embeds it.

rag.add("Your text here")
# Returns: "Insertion Successful: 3 chunks created"

query(query: str)
Retrieve relevant content for a query.

results = rag.query("What is the capital of France?")
# Returns formatted string with top relevant chunks

clear()
Delete all documents in this namespace.

rag.clear()
# Returns: "RAG memory of namespace 'user_123' was successfully wiped out"

Advanced Usage

Custom Embedding Models

from langchain_openai import OpenAIEmbeddings
from langchain_cohere import CohereEmbeddings

# Use OpenAI embeddings
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-index",
    embedding_model=OpenAIEmbeddings(model="text-embedding-3-small")
)

# Or Cohere
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-index",
    embedding_model=CohereEmbeddings(model="embed-english-v3.0")
)

Disable Reranking for Speed

# At initialization
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-index",
    use_reranking=False  # Skip reranking for faster results
)

# Or toggle it anytime
rag.use_reranking = False
result = rag.query("fast query")  # Uses raw vector search

rag.use_reranking = True
result = rag.query("precise query")  # Uses reranking

Adjust Retrieval Parameters on the Fly

rag = NoBrainerRag(namespace="user_123", index_name="my-index")

# Start with defaults (base_k=10, top_n=4)
result = rag.query("my query")

# Need more context? Change it
rag.base_k = 20
rag.top_n = 8
result = rag.query("complex query")  # Now retrieves more chunks

# Back to focused results
rag.base_k = 5
rag.top_n = 2
result = rag.query("simple query")

Multi-Region Setup

# Use GCP in Europe
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-index",
    pinecone_cloud="gcp",
    pinecone_region="europe-west1"
)

# Or Azure in East US
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-index",
    pinecone_cloud="azure",
    pinecone_region="eastus"
)

Custom Chunking Strategy

# Larger chunks with more overlap
rag = NoBrainerRag(
    namespace="user_123",
    index_name="my-index",
    chunk_size=800,
    chunk_overlap=150,
    separators=["\n\n\n", "\n\n", "\n"]  # Only split on paragraph and line breaks
)

Common Use Cases

Chatbot with Memory

# When user starts chatting
rag = NoBrainerRag(namespace=user.id, index_name="chatbot-memory")

# As conversation progresses
rag.add(f"User said: {user_message}")
rag.add(f"Assistant replied: {bot_response}")

# When generating responses
context = rag.query(user_message)
# Feed context to your LLM

Document Q&A

rag = NoBrainerRag(namespace="doc_session_456", index_name="documents")

# Load your document
with open("document.txt") as f:
    content = f.read()
    rag.add(content)

# Ask questions
answer = rag.query("What is the main topic?")

Multi-User Application

# Each user gets isolated memory (separate namespace)
user1_rag = NoBrainerRag(namespace=f"user_{user1.id}", index_name="app-memory")
user2_rag = NoBrainerRag(namespace=f"user_{user2.id}", index_name="app-memory")

# Their data never mixes - guaranteed namespace isolation

Under the Hood

NoBrainer RAG uses battle-tested, production-grade tools so you don't have to piece them together yourself:

  • Embeddings: Ollama with nomic-embed-text:v1.5 (768 dimensions, state-of-the-art, runs locally)
  • Chunking: LangChain's RecursiveCharacterTextSplitter - respects semantic boundaries
  • Reranking: FlashRank with ms-marco-MiniLM-L-12-v2 - automatically improves precision
  • Vector Database: Pinecone (serverless, production-scale)
  • Retrieval: LangChain's contextual compression retriever (retrieves 10 → reranks → returns top 4)

The pipeline:

  1. Text → RecursiveCharacterTextSplitter breaks it into semantic chunks
  2. Chunks → Nomic embeddings convert to 768-dim vectors
  3. Store → Pinecone index with namespace isolation
  4. Query → Retrieve top 10 candidates based on vector similarity
  5. Rerank → FlashRank re-scores all 10 and picks the actual best 4 matches
  6. Return → Formatted, contextually relevant results ready to use

This isn't a toy setup - this is the right way to do RAG. The kind of pipeline you'd spend a week researching and building yourself. Except it's already done.


FAQ

Q: Do I need to keep the same NoBrainerRag object alive?
A: Nope! As long as you use the same index name and namespace, you can create new objects anytime and access the same data.

Q: What happens if I use the same namespace twice?
A: That's the point! Same index + same namespace = same memory. It's a feature, not a bug.

Q: Can I use this in production?
A: Yeah, it's built on production-grade tools (Pinecone, LangChain, Ollama). Just make sure your Pinecone plan can handle your scale.

Q: How much does Pinecone cost?
A: They have a generous free tier. Check Pinecone pricing.

Q: Can I change the embedding model?
A: Yes! Pass any LangChain-compatible embedding model to the embedding_model parameter.

Q: Is my data secure?
A: Data is stored in your Pinecone account. Use their security features + keep your API keys safe.

Q: Can I adjust retrieval settings after initialization?
A: Yes! Just change instance variables like rag.base_k = 20 or rag.use_reranking = False and the next query will use the new settings.

Q: What's the difference between index and namespace?
A: Index = your database. Namespace = an isolated partition inside that database. One index can hold many namespaces.


Requirements

  • Python 3.8+
  • Pinecone API key
  • Ollama installed locally (only if using default embeddings)
  • nomic-embed-text:v1.5 model pulled in Ollama (only if using default embeddings)

Contributing

Found a bug? Have an idea? PRs welcome! Keep it simple though - the goal is "no brainer", not "all the features".


License

MIT - do whatever you want with it.


Support

If this saved you hours of work, star the repo ⭐ and help other devs find it!


Built with ❤️ for developers who just want things to work.
