Skip to main content

A lightweight, self-consolidating cognitive memory layer for AI agents. Combines SQLite, vector search, and a knowledge graph with a biologically-inspired sleep/forget cycle.

Project description

๐Ÿฆ… Shaheen DB

PyPI version Python Version License Tests Database Vector Math

pip install shaheen-db

Shaheen DB (named after the Royal Falcon) is a lightweight, zero-ops, self-consolidating cognitive memory layer designed specifically for AI Agents.

Instead of treating agent memory as a massive, unstructured dumping ground of raw vector embeddingsโ€”which leads to bloated context windows, duplicate facts, high token costs, and retrieval noiseโ€”Shaheen DB implements an opinionated, biologically inspired cognitive memory pipeline.


๐Ÿ’ก The Problem with Traditional Vector Databases

Most AI agents use standard vector databases (like Pinecone or Chroma) for memory. This approach has three fatal flaws in production:

  1. Semantic Noise: If a user mentions eating a croissant on Day 1, and asks about their business strategy on Day 30, a standard vector search will often retrieve the croissant log due to query overlaps. This pollutes the LLM's prompt.
  2. Context Bloat: Raw transcripts contain filler words, greetings, and temporary statements. Feeding these directly to LLMs burns tokens and incurs high latency.
  3. No Fact Evolution: If a user says "My name is Saif" on Day 1, and "Actually, write my name as Saif Al-Islam" on Day 5, a vector search will return both statements, forcing the LLM to guess which name is correct.

Shaheen DB solves this by separating sensory ingestion from long-term factual consolidation, and automatically forgetting trivial logs over time.


๐Ÿง  Core Memory Architecture

                  [ Agent Conversations / Logs ]
                                โ”‚
                                โ–ผ  (Immediate Ingestion < 5ms)
                    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                    โ”‚  Sensory Buffer (SQL)  โ”‚ โ—„โ”€โ”€โ”€ Instant Vector Match
                    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ  (Background "Sleep" Loop)
                   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                   โ”‚   Consolidation Engine   โ”‚
                   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                          โ”‚            โ”‚
          (Extract facts) โ”‚            โ”‚ (Extract relationships)
                          โ–ผ            โ–ผ
                    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                    โ”‚ Vectors  โ”‚  โ”‚ Knowledge โ”‚
                    โ”‚  Index   โ”‚  โ”‚   Graph   โ”‚
                    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  1. Sensory Store (Short-term Memory): Immediate, high-throughput writes (under 5ms) to an SQLite buffer running in Write-Ahead Logging (WAL) mode. Vectorized immediately for real-time semantic retrieval.
  2. Consolidation Loop (The "Sleep" Cycle): An asynchronous process that processes raw logs using structured LLM outputs to extract structured entities (nodes) and relationships (edges).
  3. Cognitive Decay (Forgetting): Applies mathematical exponential decay ($decay = e^{-\lambda \Delta t}$) to temporary sensory logs. Trivial information decays and is deleted, while pinned/permanent facts and organized graph connections are preserved permanently.
  4. Hybrid GraphRAG Retrieval: A unified retrieval query that automatically builds a context pack containing:
    • Recent Conversations: Highly relevant active sensory logs.
    • Associated Entities & Facts: Consolidated entities matching the query.
    • Associative Relationships: Sub-graphs of connections between matching entities.

โšก Features at a Glance

  • Zero-Ops GraphRAG: Get the reasoning benefits of a Knowledge Graph and Vector Database in a single local SQLite file. No Neo4j servers, no Docker, no external cloud configurations.
  • Biologically Inspired Sleep Cycle: Runs LLM extraction in the background to convert unstructured dialogue into structured relationships.
  • Native Memory Decay: Automatic time-based cleanup of trivial logs to keep context window sizes minimal and token costs low.
  • Microsecond Graph Traversals: Uses optimized SQL queries inside SQLite to fetch 1-hop and 2-hop entity neighborhoods in microseconds.
  • Dependency-Optional Offline Mode: Support for local CPU/GPU embedding generation via sentence-transformers or local API calls to Ollama to avoid heavy PyTorch dependencies.

๐Ÿš€ Quickstart

1. Installation

Install the package requirements:

pip install -r requirements.txt

2. Basic Usage

from shaheen import Shaheen

# Initialize Shaheen DB (creates a local SQLite file)
db = Shaheen(db_path="memory.db")

# 1. Store sensory memories (fast write, immediate indexing)
db.remember("User: Hello! My name is Saif. I am a software engineer.")
db.remember("User: I am building Shaheen DB in Python to solve agent memory.")
db.remember("User: By the way, I have a pet falcon named Swift.", permanent=True) # Pinned context!

# 2. Trigger the "Sleep" cycle to organize memory into a graph
db.consolidate()

# 3. Recall coordinated memory (returns a unified GraphRAG context)
result = db.recall("Who is Saif and what is his pet?")

print(result["context_text"])

3. Output Context Pack Structure

The generated context pack is cleanly formatted, ready to be injected straight into your LLM prompt template:

### Recent Conversations & Logs:
- [2026-05-20 08:31:13] User: By the way, I have a pet falcon named Swift. [Permanent] (decay relevance: 1.00, query match: 0.85)

### Relevant Entities & Facts:
- saif (Person): role: software engineer (query match: 0.88)
- swift (Animal): type: falcon, owner: saif (query match: 0.84)
- shaheen-db (Software): purpose: AI agent memory, creator: saif (query match: 0.75)

### Associative Relationships:
- saif --[BUILDING]--> shaheen-db (status: active)
- saif --[OWNER_OF]--> swift

โฑ๏ธ Running Consolidation in Production

Because the "Sleep Cycle" (db.consolidate()) calls generative LLMs to extract information, it takes a few seconds to complete. You should never run it inline on the main thread during active user conversations, as it will slow down your agent's response time.

Instead, use one of the following production patterns:

Pattern A: Threshold-Based Trigger (Self-Sleep)

Automatically trigger consolidation in a background thread when the count of raw, unconsolidated memories reaches a threshold (e.g., every 15 conversation turns).

import threading

def handle_incoming_message(text: str):
    # 1. Ingest immediately (takes < 5ms)
    db.remember(f"User: {text}")
    
    # 2. Trigger consolidation in the background if the threshold is met
    unconsolidated_count = len(db.db.get_unconsolidated_memories())
    if unconconsolidated_count >= 15:
        threading.Thread(target=db.consolidate, daemon=True).start()
        
    # 3. Retrieve context & query agent LLM
    context = db.recall(text)
    return agent.respond(text, context)

Pattern B: Inactivity/Idle-Time Trigger

Trigger consolidation only when the user has stopped talking to the agent for a set duration (e.g., 10 minutes of inactivity).

import time
import threading

last_activity = time.time()
has_slept = False

def idle_watcher():
    global last_activity, has_slept
    while True:
        time.sleep(30)
        if time.time() - last_activity > 600 and not has_slept:
            db.consolidate()
            has_slept = True

# Start idle watcher thread on startup
threading.Thread(target=idle_watcher, daemon=True).start()

Pattern C: Scheduled Cron Job

For multi-user SaaS setups, run a nightly script (e.g., at 2:00 AM) that iterates through all active user databases and triggers .consolidate().

# Run daily at 2:00 AM
0 2 * * * python /app/scripts/trigger_nightly_consolidation.py

โš™๏ธ Configuration Scenarios

[!NOTE] Base URL is Optional: Standard OpenAI, Google Gemini, Anthropic, and local Ollama setups are auto-configured by default. You do not need to set SHAHEEN_LLM_BASE_URL unless you are using a third-party gateway like OpenRouter or Groq.

Supported Providers

Provider SHAHEEN_LLM_PROVIDER Native SDK Embedding Source Notes
OpenAI openai โœ… OpenAI API Default provider
Google Gemini gemini โœ… Gemini API Uses google-genai SDK
Anthropic Claude anthropic โœ… SentenceTransformers (local) Anthropic has no embedding API
Ollama (local) openai Via base URL Ollama API Point base URL to localhost:11434
Groq openai Via base URL Groq API Ultra-fast inference
OpenRouter openai Via base URL OpenRouter API Access to 200+ models
DeepSeek openai Via base URL DeepSeek API Cost-effective cloud inference
LM Studio openai Via base URL LM Studio API Local GUI-based model runner
SentenceTransformers local โœ… Python (in-process) 100% offline, no API needed

Shaheen DB is highly flexible and supports 100% cloud, hybrid, and 100% offline stacks. Configure it using these environment variables:

Scenario 1: 100% Google Gemini Cloud

export SHAHEEN_LLM_PROVIDER="gemini"
export GEMINI_API_KEY="your-gemini-api-key"

# Defaults used automatically:
# SHAHEEN_LLM_MODEL="gemini-2.5-flash"
# SHAHEEN_EMBEDDING_MODEL="text-embedding-004"

Scenario 2: 100% OpenAI Cloud

export SHAHEEN_LLM_PROVIDER="openai"
export SHAHEEN_LLM_API_KEY="your-openai-key"

# Defaults used automatically:
# SHAHEEN_LLM_MODEL="gpt-4o-mini"
# SHAHEEN_EMBEDDING_MODEL="text-embedding-3-small"

Scenario 3: Anthropic Claude (Native)

Uses the native Claude SDK with Tool Use to guarantee structured JSON extraction during the sleep cycle. Since Anthropic has no embedding API, vector search runs locally via SentenceTransformers.

export SHAHEEN_LLM_PROVIDER="anthropic"
export ANTHROPIC_API_KEY="your-anthropic-key"

# Defaults used automatically:
# SHAHEEN_LLM_MODEL="claude-3-5-sonnet-20241022"
# SHAHEEN_EMBEDDING_MODEL="all-MiniLM-L6-v2" (runs locally, no API cost)

Scenario 4: Universal Gateway (OpenRouter / Groq / DeepSeek)

Point Shaheen DB at any OpenAI-compatible provider to access hundreds of models with a single API key.

export SHAHEEN_LLM_PROVIDER="openai"

# OpenRouter (access to 200+ models including Claude, Llama, Mistral, Gemini)
export SHAHEEN_LLM_BASE_URL="https://openrouter.ai/api/v1"
export SHAHEEN_LLM_API_KEY="your-openrouter-key"
export SHAHEEN_LLM_MODEL="meta-llama/llama-3-70b-instruct" # or any model slug

# Groq (ultra-fast Llama / Mixtral inference)
# export SHAHEEN_LLM_BASE_URL="https://api.groq.com/openai/v1"
# export SHAHEEN_LLM_API_KEY="your-groq-key"
# export SHAHEEN_LLM_MODEL="llama3-70b-8192"

Scenario 3: Hybrid (Local Search + DeepSeek Cloud via OpenRouter)

Runs vector search 100% locally on your CPU (saving API costs), and only queries OpenRouter's cloud for the background consolidation sleep cycle.

export SHAHEEN_LLM_PROVIDER="local" # Runs SentenceTransformers locally

export SHAHEEN_LLM_BASE_URL="https://openrouter.ai/api/v1"
export SHAHEEN_LLM_API_KEY="your-openrouter-key"
export SHAHEEN_LLM_MODEL="deepseek/deepseek-chat"

Scenario 4: 100% Offline (Ollama-only, Lightweight)

Runs both embeddings and the LLM locally via Ollama. Keeps your Python environment lightweight (does NOT require PyTorch or sentence-transformers).

export SHAHEEN_LLM_PROVIDER="openai"
export SHAHEEN_LLM_BASE_URL="http://localhost:11434/v1" # Required to redirect the API client to Ollama
export SHAHEEN_LLM_API_KEY="ollama"                    # Dummy key

export SHAHEEN_LLM_MODEL="llama3"
export SHAHEEN_EMBEDDING_MODEL="nomic-embed-text"
export SHAHEEN_EMBEDDING_DIM="768"

Scenario 5: 100% Offline (Local Python Embeddings + Ollama LLM)

Uses native SentenceTransformers locally on your CPU for vector search, and calls Ollama for the LLM consolidation sleep cycle.

export SHAHEEN_LLM_PROVIDER="local" # Runs SentenceTransformers locally (no base URL needed for embeddings)

# Points the LLM client to Ollama (Defaults to http://localhost:11434/v1 automatically)
export SHAHEEN_LLM_MODEL="llama3"
export SHAHEEN_LLM_API_KEY="ollama" # Dummy key

๐Ÿ›๏ธ Systems & Engineering Design

  • ACID Transactions: The Knowledge Graph (entities/edges) and vector embeddings reside in the same SQLite database. All consolidation updates happen in single database transactions. If an LLM extraction fails or gets interrupted, the database rolls back cleanly.
  • WAL Mode Concurrency: Configured with SQLite Write-Ahead Logging. Your AI agent can stream messages and write sensory logs on the main thread, while the consolidation loop runs on a background scheduler without locking the database.
  • Lazy Loading: Offline embedding dependencies (sentence-transformers) are only loaded if you explicitly set SHAHEEN_LLM_PROVIDER="local". If you use cloud APIs or Ollama, your application starts instantly without PyTorch memory overhead.

๐Ÿ“œ License

Shaheen DB is open-source software licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shaheen_db-0.1.0.tar.gz (27.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shaheen_db-0.1.0-py3-none-any.whl (19.2 kB view details)

Uploaded Python 3

File details

Details for the file shaheen_db-0.1.0.tar.gz.

File metadata

  • Download URL: shaheen_db-0.1.0.tar.gz
  • Upload date:
  • Size: 27.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for shaheen_db-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5d6212b5c27433f8929eb9f8a9faa38946f53730dfbf254e1d054e0333a062fa
MD5 984dd10b1724771a07561ee01161a700
BLAKE2b-256 8fda2b0ae1a30ab998c3cb00ddcbcfdf6cb3ea2fe7899c2486f38d88db5bf3df

See more details on using hashes here.

File details

Details for the file shaheen_db-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: shaheen_db-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for shaheen_db-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f78807901e2038847e9673a441a8b11d3c3f357ae9fe3fc056d4cf7e2da85393
MD5 5e6c708ae9758c5952b8ac24ee7efbd6
BLAKE2b-256 3642bd13d1e19fee37b2add2bf1d1770f1cb5c608d9ec629b5fa9f96cc451521

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page