Agent Brain RAG - Intelligent document indexing and semantic search server that gives AI agents long-term memory

Agent Brain RAG Server

Agent Brain (formerly doc-serve) is an intelligent document indexing and semantic search system designed to give AI agents long-term memory.

AI agents need persistent memory to be truly useful. Agent Brain provides the retrieval infrastructure that enables context-aware, knowledge-grounded AI interactions.

Installation

pip install agent-brain-rag

Quick Start

  1. Set environment variables:

    export OPENAI_API_KEY=your-key
    export ANTHROPIC_API_KEY=your-key
    
  2. Start the server:

    agent-brain-serve
    

The server will start at http://127.0.0.1:8000.

Note: The legacy command doc-serve is still available but deprecated. Please use agent-brain-serve for new installations.

Search Capabilities

Agent Brain provides multiple search strategies to match your retrieval needs:

| Search Type | Description | Best For |
|---|---|---|
| Semantic Search | Natural language queries using OpenAI embeddings (text-embedding-3-large) | Conceptual questions, finding related content |
| Keyword Search (BM25) | Traditional keyword matching with BM25 ranking (a TF-IDF-style scoring function) | Exact matches, technical terms, code identifiers |
| Hybrid Search | Combines vector and BM25 results | General-purpose queries, balanced recall/precision |
| GraphRAG | Knowledge graph-based retrieval for relationship-aware queries | Understanding connections, multi-hop reasoning |
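
For intuition about the Keyword Search (BM25) row, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not Agent Brain's implementation: the function name `bm25_scores`, the whitespace tokenization, and the parameter values `k1=1.5` and `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that contain more of the query terms, and rarer ones, score higher; documents with no matching term score zero, which is why keyword search suits exact identifiers.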

Features

  • Document Indexing: Load and index documents from folders (PDF, Markdown, TXT, DOCX, HTML)
  • AST-Aware Code Ingestion: Smart parsing for Python, TypeScript, JavaScript, Java, Go, Rust, C, C++
  • Multi-Strategy Retrieval: Semantic, keyword, hybrid, and graph-based search
  • OpenAI Embeddings: Uses text-embedding-3-large for high-quality embeddings
  • Claude Summarization: AI-powered code summaries for better context
  • Chroma Vector Store: Persistent, thread-safe vector database
  • FastAPI: Modern, high-performance REST API with OpenAPI documentation

Prerequisites

  • Python 3.10+
  • OpenAI API key (for embeddings)
  • Anthropic API key (for summarization)

GraphRAG Configuration (Feature 113)

Agent Brain supports optional GraphRAG (Graph-based Retrieval-Augmented Generation) for enhanced relationship-aware queries.

Enabling GraphRAG

Set the environment variable to enable graph indexing:

export ENABLE_GRAPH_INDEX=true

Configuration Options

| Variable | Default | Description |
|---|---|---|
| ENABLE_GRAPH_INDEX | false | Enable/disable GraphRAG features |
| GRAPH_STORE_TYPE | simple | Graph backend: simple (JSON) or kuzu (embedded DB) |
| GRAPH_MAX_TRIPLETS_PER_CHUNK | 10 | Maximum triplets (subject, relation, object) to extract per document chunk |
| GRAPH_USE_CODE_METADATA | true | Extract relationships from code AST metadata |
| GRAPH_USE_LLM_EXTRACTION | true | Use LLM for entity extraction from documents |
| GRAPH_TRAVERSAL_DEPTH | 2 | Default traversal depth for graph queries |
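
To illustrate what a traversal depth of 2 means, the sketch below runs a breadth-first search over a small set of triplets and stops after a fixed number of hops. The function `neighbors_within_depth`, the triplet layout, and the choice to treat relations as undirected are all assumptions made for illustration, not Agent Brain's graph code.

```python
from collections import deque


def neighbors_within_depth(triplets, start, max_depth=2):
    """Return {entity: hops} for entities reachable from start in at most max_depth hops."""
    adjacency = {}
    for subject, _relation, obj in triplets:
        # Assumption: relations are followed in both directions during retrieval.
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_depth:
            continue  # Do not expand beyond the configured depth.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops
```

A larger depth reaches more distant entities at the cost of pulling in less relevant ones, so a small default is a reasonable starting point.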

Query Modes

With GraphRAG enabled, you have access to additional query modes:

  • graph: Query using only the knowledge graph (entity relationships)
  • multi: Combines vector search, BM25, and graph results using RRF fusion
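
The multi mode relies on Reciprocal Rank Fusion (RRF), which merges ranked lists using only each item's position in each list. Below is a minimal sketch of the general technique; the function name, the tie-breaking rule, and the constant `k=60` (a conventional choice for RRF) are assumptions, and Agent Brain's own settings may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher positions contribute more.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because RRF ignores raw scores, it can combine vector similarities, BM25 scores, and graph hits without normalizing them to a common scale.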

Example: Graph Query

# CLI
agent-brain query "authentication service" --mode graph

# API
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication service", "mode": "graph", "top_k": 10}'

Example: Multi-Mode Query

# CLI
agent-brain query "user login flow" --mode multi

# API
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "user login flow", "mode": "multi", "top_k": 5}'
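
The same requests can be issued from Python with only the standard library. This sketch mirrors the curl calls above; the helper names `build_query_request` and `run_query` are hypothetical, and the response is assumed to be a JSON object.

```python
import json
import urllib.request


def build_query_request(query, mode, top_k=5, base_url="http://localhost:8000"):
    """Build a POST /query request with the same JSON body as the curl examples."""
    payload = {"query": query, "mode": mode, "top_k": top_k}
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request, timeout=30.0):
    """Send the request to a running server and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `run_query(build_query_request("user login flow", mode="multi"))` sends the multi-mode query shown above.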

Rebuilding the Graph Index

To rebuild only the graph index without re-indexing documents:

curl -X POST "http://localhost:8000/index?rebuild_graph=true" \
  -H "Content-Type: application/json" \
  -d '{"folder_path": "."}'

Optional Dependencies

For enhanced GraphRAG features, install optional dependency groups:

# For Kuzu graph store (production workloads)
poetry install --with graphrag-kuzu

# For enhanced entity extraction
poetry install --with graphrag

Two-Stage Reranking (Feature 123)

Agent Brain supports optional two-stage retrieval with reranking for improved search precision. When enabled, the system:

  1. Stage 1: Retrieves more candidates than requested (e.g., 50 candidates for top_k=5)
  2. Stage 2: Reranks candidates using a cross-encoder model for more accurate relevance scoring

Enabling Reranking

Set the following environment variables:

# Enable two-stage reranking (default: false)
ENABLE_RERANKING=true

# Choose provider (default: sentence-transformers)
RERANKER_PROVIDER=sentence-transformers  # or "ollama"

# Choose model (default: cross-encoder/ms-marco-MiniLM-L-6-v2)
RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# Stage 1 retrieval multiplier (default: 10)
RERANKER_TOP_K_MULTIPLIER=10

# Maximum candidates for Stage 1 (default: 100)
RERANKER_MAX_CANDIDATES=100

# Batch size for reranking inference (default: 32)
RERANKER_BATCH_SIZE=32
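
Read together with the Stage 1 example (50 candidates for top_k=5), the multiplier and the candidate cap suggest the following calculation. This is a sketch of one plausible reading; the function name and the exact formula are assumptions rather than the server's code.

```python
def stage_one_candidates(top_k, multiplier=10, max_candidates=100):
    """Estimate how many candidates Stage 1 retrieves before reranking."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Over-fetch by the multiplier, but never exceed the configured cap.
    return min(top_k * multiplier, max_candidates)
```

Under these defaults, a request for 5 results reranks 50 candidates, while a request for 20 results hits the cap of 100.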

Provider Options

| Provider | Model | Latency | Description |
|---|---|---|---|
| sentence-transformers | cross-encoder/ms-marco-MiniLM-L-6-v2 | ~50ms | Recommended. Fast, accurate cross-encoder. |
| sentence-transformers | cross-encoder/ms-marco-MiniLM-L-12-v2 | ~100ms | Slower but more accurate. |
| ollama | llama3.2:1b | ~500ms | Fully local, no HuggingFace download. |

YAML Configuration

You can also configure reranking in config.yaml:

reranker:
  provider: sentence-transformers
  model: cross-encoder/ms-marco-MiniLM-L-6-v2
  params:
    batch_size: 32

Graceful Degradation

If the reranker fails (model unavailable, timeout, etc.), the system automatically falls back to Stage 1 results. This ensures queries never fail due to reranking issues.
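
The fallback behavior can be pictured as a try/except around the reranking step. The sketch below is an illustration of that pattern, not the project's code: `rerank_with_fallback` and the `scorer` callable are hypothetical, and the scorer stands in for whatever cross-encoder or Ollama call is configured.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def rerank_with_fallback(
    query: str,
    candidates: Sequence[str],
    scorer: Callable[[str, Sequence[str]], Sequence[float]],
    top_k: int,
) -> list[str]:
    """Rerank Stage 1 candidates; on any scorer failure, keep the Stage 1 order."""
    try:
        scores = scorer(query, candidates)
        if len(scores) != len(candidates):
            raise ValueError("scorer returned an unexpected number of scores")
        ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
        return [text for text, _score in ranked[:top_k]]
    except Exception:
        # Model unavailable, timeout, malformed output: degrade instead of failing the query.
        logger.warning("Reranking failed; returning Stage 1 results", exc_info=True)
        return list(candidates[:top_k])
```

The caller always receives `top_k` results (or fewer if Stage 1 returned fewer), whether or not the reranker succeeded.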

Response Fields

When reranking is enabled, query results include additional fields:

  • rerank_score: The cross-encoder relevance score
  • original_rank: The position before reranking (1-indexed)

Example response:

{
  "results": [
    {
      "text": "Document content...",
      "source": "docs/guide.md",
      "score": 0.95,
      "rerank_score": 0.95,
      "original_rank": 5,
      "chunk_id": "chunk_abc123"
    }
  ]
}
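
As a usage sketch, the fields above make it easy to see how far reranking moved each result. The helper `rank_changes` is hypothetical, and it assumes results are returned in their final (reranked) order.

```python
def rank_changes(response):
    """Return (chunk_id, original_rank, new_rank) for each result in a query response."""
    changes = []
    for new_rank, item in enumerate(response.get("results", []), start=1):
        # original_rank is only present when reranking is enabled, so read it defensively.
        changes.append((item["chunk_id"], item.get("original_rank"), new_rank))
    return changes
```

For the example response, the single result moved from position 5 in Stage 1 to position 1 after reranking.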

Development Installation

cd agent-brain-server
poetry install

Configuration

Copy the environment template and configure:

cp ../.env.example .env
# Edit .env with your API keys

Required environment variables:

  • OPENAI_API_KEY: Your OpenAI API key for embeddings
  • ANTHROPIC_API_KEY: Your Anthropic API key for summarization
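
Before starting the server, a short pre-flight check can confirm that both keys are set. This is an optional sketch; the helper `missing_keys` is hypothetical and not part of the package.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no arguments inspects the current process environment; an empty list means both keys are present.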

Running the Server

# Development mode
poetry run uvicorn agent_brain_server.api.main:app --reload

# Or use the entry point
poetry run agent-brain-serve

API Documentation

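Once the server is running, FastAPI serves interactive OpenAPI documentation. By default this is available at http://127.0.0.1:8000/docs (Swagger UI) and http://127.0.0.1:8000/redoc (ReDoc), unless the application overrides those paths.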

API Endpoints

Health

  • GET /health - Server health status
  • GET /health/status - Detailed indexing status

Indexing

  • POST /index - Start indexing documents from a folder
  • POST /index/add - Add documents to existing index
  • DELETE /index - Reset the index

Querying

  • POST /query - Run a search query (the mode field in the request body selects the strategy)
  • GET /query/count - Get indexed document count

Example Usage

Index Documents

curl -X POST http://localhost:8000/index \
  -H "Content-Type: application/json" \
  -d '{"folder_path": "/path/to/docs"}'

Query Documents

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I configure authentication?", "top_k": 5}'

Architecture

agent_brain_server/
├── api/
│   ├── main.py           # FastAPI application
│   └── routers/          # Endpoint handlers
├── config/
│   └── settings.py       # Configuration management
├── models/               # Pydantic request/response models
├── indexing/
│   ├── document_loader.py  # Document loading
│   ├── chunking.py         # Text chunking
│   └── embedding.py        # Embedding generation
├── services/
│   ├── indexing_service.py # Indexing orchestration
│   └── query_service.py    # Query execution
└── storage/
    └── vector_store.py     # Chroma vector store

Development

Running Tests

poetry run pytest

Code Formatting

poetry run black agent_brain_server/
poetry run ruff check agent_brain_server/

Type Checking

poetry run mypy agent_brain_server/

License

MIT
