Fast, embedded vector + graph memory for AI agents

CortexaDB: SQLite for AI Agents

License: MIT/Apache-2.0 | Status: Beta

CortexaDB is a simple, fast, and hard-durable embedded database designed specifically for AI agent memory. It provides a single-file-like experience (no server required) but with native support for vectors, graphs, and temporal search.

Think of it as SQLite, but with semantic and relational intelligence for your agents.


What's New in v0.1.3

  • Automatic HNSW Persistence - HNSW index is now automatically saved to disk on checkpoint or database close, enabling fast restart without rebuilding the index
  • Improved reliability for production use

Quickstart

Python (Recommended)

CortexaDB is designed to be extremely easy to use from Python via high-performance Rust bindings.

from cortexadb import CortexaDB
from cortexadb.providers.openai import OpenAIEmbedder

# Open database with embedder (auto-embeds text)
db = CortexaDB.open("agent.mem", embedder=OpenAIEmbedder())

# Store memories
# (assumption: remember() returns the ID of the new memory, which connect() needs below)
mid1 = db.remember("The user prefers dark mode.")
mid2 = db.remember("User works at Stripe.")

# Load a file (TXT, MD, JSON, DOCX, PDF)
db.load("document.pdf", strategy="recursive")

# Ask questions (Semantic Search)
hits = db.ask("What does the user like?")
for hit in hits:
    print(f"ID: {hit.id}, Score: {hit.score}")

# Connect memories (Graph Relationships)
db.connect(mid1, mid2, "relates_to")

Installation

Python

CortexaDB is available on PyPI and can be installed via pip:

# Recommended: Install from PyPI
pip install cortexadb

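# With document support (DOCX, PDF)
# Quote the extras so that shells such as zsh do not expand the brackets
pip install "cortexadb[docs]"
pip install "cortexadb[pdf]"

# From GitHub (installs from the repository's default branch)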
pip install "cortexadb @ git+https://github.com/anaslimem/CortexaDB.git#subdirectory=crates/cortexadb-py"

Rust

Add CortexaDB to your Cargo.toml:

[dependencies]
cortexadb-core = { git = "https://github.com/anaslimem/CortexaDB.git" }

Key Features

  • Hybrid Retrieval: Combine vector similarity (semantic), graph relations (structural), and recency (temporal) in a single query.
  • Smart Chunking: Multiple strategies for document ingestion - fixed, recursive, semantic, markdown, json.
  • File Support: Load documents directly - TXT, MD, JSON, DOCX, PDF.
  • HNSW Indexing: Ultra-fast approximate nearest neighbor search using USearch (95%+ recall at millisecond latency).
  • Hard Durability: Write-Ahead Log (WAL) and Segmented logs ensure your agent never forgets, even after a crash.
  • Multi-Agent Namespaces: Isolate memories between different agents or workspaces within a single database file.
  • Deterministic Replay: Record operations to a log file and replay them exactly to debug agent behavior or migrate data.
  • Automatic Capacity Management: Set max_entries or max_bytes and let CortexaDB handle LRU/Importance-based eviction automatically.
  • Crash-Safe Compaction: Background maintenance that keeps your storage lean without risking data loss.

HNSW Indexing

CortexaDB uses USearch for high-performance approximate nearest neighbor search. Switch between exact and HNSW modes based on your needs:

| Mode | Use Case | Recall | Speed |
|---|---|---|---|
| exact | Small datasets (<10K) | 100% | O(n) |
| hnsw | Large datasets | 95%+ | O(log n) |
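
To make the exact row concrete, here is a minimal pure-Python sketch of a brute-force search: every stored vector is scored against the query, which is why recall is 100% and cost grows linearly with the number of entries. The function names and sample vectors are illustrative assumptions, not CortexaDB's Rust implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def exact_search(query, vectors, top_k=3):
    """Score every stored vector against the query: O(n) work, nothing is skipped."""
    scored = [(cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(vec_id, score) for score, vec_id in scored[:top_k]]

# Hypothetical 3-dimensional embeddings keyed by memory ID.
vectors = {
    "m1": [1.0, 0.0, 0.0],
    "m2": [0.9, 0.1, 0.0],
    "m3": [0.0, 1.0, 0.0],
}
hits = exact_search([1.0, 0.0, 0.0], vectors, top_k=2)
```

An HNSW index avoids the full scan by walking a layered neighbour graph, trading a small amount of recall for much less work per query.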

Automatic Persistence

HNSW indexing now includes automatic persistence:

  • On checkpoint() - HNSW index is saved to disk
  • On database close/drop - HNSW index is automatically saved
  • On restart - HNSW index is loaded from disk (fast recovery!)

No extra configuration is needed: set index_mode="hnsw" and persistence is handled for you.

from cortexadb import CortexaDB, HashEmbedder

# Default: exact (brute-force)
db = CortexaDB.open("db.mem", dimension=128)

# Or use HNSW for large-scale search
db = CortexaDB.open("db.mem", dimension=128, index_mode="hnsw")

# HNSW with custom parameters
db = CortexaDB.open("db.mem", dimension=128, index_mode={
    "type": "hnsw",
    "m": 16,           # connections per node
    "ef_search": 50,   # query-time search width
    "ef_construction": 200  # build-time search width
})

HNSW Parameters

| Parameter | Default | Range | Description |
|---|---|---|---|
| m | 16 | 4-64 | Connections per node. Higher = more memory, higher recall. |
| ef_search | 50 | 10-500 | Query search width. Higher = better recall, slower search. |
| ef_construction | 200 | 50-500 | Build search width. Higher = better index, slower build. |

Trade-offs:

  • Speed vs Recall: Increase ef_search for better results, decrease for speed
  • Memory vs Quality: Increase m for higher recall, uses more memory
  • Build Time vs Quality: Increase ef_construction for better index, slower initial build
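
When tuning these parameters it helps to measure recall directly: run the same query in exact mode and in hnsw mode, then compare the returned IDs. The helper below is a small sketch of that comparison; `recall_at_k` and the sample ID lists are hypothetical and not part of the CortexaDB API.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbours that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical results for one query: exact mode is the ground truth.
exact_top5 = ["m1", "m2", "m3", "m4", "m5"]
approx_top5 = ["m1", "m2", "m3", "m5", "m9"]
recall = recall_at_k(approx_top5, exact_top5)  # 4 of the 5 true neighbours were found
```

Averaging this value over a sample of real queries gives a number you can compare against the 95%+ target while adjusting ef_search or m.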

Chunking Strategies

CortexaDB provides 5 smart chunking strategies for document ingestion:

| Strategy | Use Case |
|---|---|
| fixed | Simple character-based with word-boundary snap |
| recursive | General purpose - splits paragraphs → sentences → words |
| semantic | Articles, blogs - split by paragraphs |
| markdown | Technical docs - preserves headers, lists, code blocks |
| json | Structured data - flattens to key-value pairs |

from cortexadb import CortexaDB, chunk

# Use chunk() directly on a string
text = "Long document text..."  # placeholder input
chunks = chunk(text, strategy="recursive", chunk_size=512, overlap=50)

# Or use db.ingest() / db.load()
db.ingest("text...", strategy="markdown")
db.load("document.pdf", strategy="recursive")
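
To illustrate what chunk_size, overlap, and word-boundary snapping mean, here is a pure-Python sketch of a fixed-size chunker. It is an assumption about the general technique, not the Rust implementation behind chunk(); the name `chunk_fixed` is hypothetical.

```python
def chunk_fixed(text, chunk_size=40, overlap=10):
    """Pack whole words into chunks of at most chunk_size characters.

    Trailing words totalling at most `overlap` characters are repeated at the
    start of the next chunk. A single word longer than chunk_size is kept whole.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    current = []  # words in the chunk being built
    length = 0    # character length of " ".join(current)
    for word in text.split():
        added = len(word) if not current else len(word) + 1
        if current and length + added > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing words into the next chunk to preserve context.
            carried, carried_len = [], 0
            for prev in reversed(current):
                extra = len(prev) if not carried else len(prev) + 1
                if carried_len + extra > overlap:
                    break
                carried.insert(0, prev)
                carried_len += extra
            # Drop the overlap if it would push the next chunk past the limit.
            if carried and carried_len + 1 + len(word) > chunk_size:
                carried, carried_len = [], 0
            current, length = carried, carried_len
            added = len(word) if not current else len(word) + 1
        current.append(word)
        length += added
    if current:
        chunks.append(" ".join(current))
    return chunks

chunks = chunk_fixed("the quick brown fox jumps over the lazy dog", chunk_size=20, overlap=5)
# chunks == ["the quick brown fox", "fox jumps over the", "the lazy dog"]
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears with some of its context in both chunks.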

File Format Support

| Format | Extension | Install |
|---|---|---|
| Plain Text | .txt | Built-in |
| Markdown | .md | Built-in |
| JSON | .json | Built-in |
| Word | .docx | pip install cortexadb[docs] |
| PDF | .pdf | pip install cortexadb[pdf] |

API Guide

Core Operations

| Method | Description |
|---|---|
| CortexaDB.open(path, ...) | Opens or creates a database at the specified path. |
| .remember(text, ...) | Stores a new memory. Auto-embeds if an embedder is configured. |
| .ingest(text, ...) | Ingests text with smart chunking. |
| .load(path, ...) | Loads and ingests a file. |
| .ask(query, ...) | Performs a hybrid search across vectors, graphs, and time. |
| .connect(id1, id2, rel) | Creates a directed edge between two memory entries. |
| .namespace(name) | Returns a scoped view of the database for a specific agent/context. |
| .delete_memory(id) | Permanently removes a memory and updates all indexes. |
| .compact() | Reclaims space by removing deleted entries from disk. |
| .checkpoint() | Truncates the WAL and snapshots the current state for fast startup. |

Configuration Options

When calling CortexaDB.open(), you can tune the behavior:

  • sync: "strict" (safest), "async" (fastest), or "batch" (balanced).
  • max_entries: Limits the total number of memories (triggers auto-eviction).
  • record: Path to a log file for capturing the entire session for replay.
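
The effect of max_entries can be pictured with a toy in-memory store. The sketch below evicts the least important entry first and falls back to least-recently-used as a tie-break; this particular ordering, the `BoundedMemory` class, and its method names are assumptions for illustration, since the exact eviction policy is not spelled out here.

```python
import itertools

class BoundedMemory:
    """Toy store that evicts entries once max_entries is exceeded."""

    def __init__(self, max_entries):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._clock = itertools.count()  # logical clock; larger means more recent
        self._entries = {}               # id -> {"text", "importance", "last_used"}

    def remember(self, mem_id, text, importance=0.0):
        """Insert an entry and return the IDs evicted to stay within capacity."""
        self._entries[mem_id] = {
            "text": text,
            "importance": importance,
            "last_used": next(self._clock),
        }
        evicted = []
        while len(self._entries) > self.max_entries:
            # Victim: lowest importance first, least recently used as tie-break.
            victim = min(
                self._entries,
                key=lambda k: (self._entries[k]["importance"], self._entries[k]["last_used"]),
            )
            del self._entries[victim]
            evicted.append(victim)
        return evicted

    def touch(self, mem_id):
        """Mark an entry as recently used, e.g. after a query returned it."""
        self._entries[mem_id]["last_used"] = next(self._clock)

    def ids(self):
        return sorted(self._entries)

store = BoundedMemory(max_entries=2)
store.remember("a", "The user prefers dark mode.", importance=0.9)
store.remember("b", "User works at Stripe.", importance=0.1)
evicted = store.remember("c", "User likes tea.", importance=0.1)
```

Here "b" is evicted: it ties with "c" on importance but was used less recently, while "a" survives because of its higher importance. Note that in this sketch a new entry with the lowest importance would be evicted immediately.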

Technical Essentials: How it's built

Why Rust?

CortexaDB is written in Rust to provide memory safety without a garbage collector, ensuring predictable performance (sub-100ms startup) and low resource overhead—critical for "embedded" use cases where the DB runs inside your agent's process.

The Storage Engine

CortexaDB follows a Log-Structured design:

  1. WAL (Write-Ahead Log): Every command is first appended to a durable log with CRC32 checksums.
  2. Segment Storage: Large memory payloads are stored in append-only segments.
  3. Deterministic State Machine: On startup, the database replays the log into an in-memory state machine. This ensures 100% consistency between the disk and your queries.
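
The append-then-replay cycle can be sketched in a few lines of standard-library Python. The record layout (length, CRC32, payload) and the tab-separated command format are assumptions chosen for illustration; they are not CortexaDB's on-disk format.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (assumed layout)

def encode_record(payload: bytes) -> bytes:
    """Frame one command as header + payload, ready to append to the log."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log: bytes):
    """Yield payloads in order, stopping at the first truncated or corrupt record."""
    offset = 0
    while offset + HEADER.size <= len(log):
        length, checksum = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # torn write or bit rot: ignore everything from here on
        yield payload
        offset = start + length

def apply(state: dict, payload: bytes) -> None:
    """Deterministic state transition: the same log always yields the same state."""
    op, key, value = payload.decode("utf-8").split("\t", 2)
    if op == "put":
        state[key] = value
    elif op == "del":
        state.pop(key, None)

commands = [b"put\t1\tdark mode", b"put\t2\tStripe", b"del\t1\t"]
log = b"".join(encode_record(c) for c in commands)
torn = log + encode_record(b"put\t3\tlost")[:-2]  # simulate a crash mid-append

state = {}
for payload in replay(torn):
    apply(state, payload)
```

The half-written fourth record fails the length check and is skipped, so the rebuilt state contains only the three commands that were fully written.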

Hybrid Query Engine

Unlike standard vector DBs, CortexaDB doesn't just look at distance. Our query planner can:

  • Vector: Find semantic matches using Cosine Similarity.
  • Graph: Discover related concepts by traversing edges created with .connect().
  • Temporal: Boost or filter results based on when they were "remembered".
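
One way to picture how the three signals combine is a weighted sum. The sketch below is an assumption about the general idea, not the actual query planner: the weights, the half-life decay, the hop-based graph score, and all function names are hypothetical.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def graph_hops(edges, seeds, max_hops=2):
    """Breadth-first walk over directed edges; returns {node: hops from a seed}."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nxt in edges.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def hybrid_rank(query_vec, memories, edges, now, top_k=3,
                w_vec=0.6, w_graph=0.25, w_time=0.15, half_life=3600.0):
    """Rank memories by weighted vector similarity, graph proximity, and recency."""
    if not memories:
        return []
    sims = {mid: cosine(query_vec, m["vec"]) for mid, m in memories.items()}
    best = max(sims, key=sims.get)        # best semantic match seeds the graph walk
    hops = graph_hops(edges, [best])
    scored = []
    for mid, m in memories.items():
        graph_score = 1.0 / (1 + hops[mid]) if mid in hops else 0.0
        recency = 0.5 ** ((now - m["ts"]) / half_life)  # halves every half_life seconds
        scored.append((w_vec * sims[mid] + w_graph * graph_score + w_time * recency, mid))
    scored.sort(reverse=True)
    return [mid for _, mid in scored[:top_k]]

memories = {
    "a": {"vec": [1.0, 0.0], "ts": 0.0},
    "b": {"vec": [0.0, 1.0], "ts": 0.0},
    "c": {"vec": [0.0, 1.0], "ts": 0.0},
}
edges = {"a": ["b"]}  # a --relates_to--> b
ranking = hybrid_rank([1.0, 0.0], memories, edges, now=0.0)
```

"b" and "c" have identical embeddings and timestamps, yet "b" ranks higher because it is one hop away from the best semantic match.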

Smart Chunking

The chunking engine is built in Rust for performance:

  • 5 strategies covering most use cases
  • Word-boundary awareness to avoid splitting words
  • Overlap support for context continuity
  • JSON flattening for structured data
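
As an illustration of JSON flattening, the sketch below turns a nested document into path/value pairs that can each be stored as a small chunk. The dotted-path and `[index]` naming is an assumption; the key format produced by the json strategy may differ.

```python
import json

def flatten_json(value, prefix=""):
    """Flatten nested dicts and lists into {path: leaf value} pairs."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten_json(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            items.update(flatten_json(child, f"{prefix}[{index}]"))
    else:
        items[prefix] = value  # leaf: string, number, bool, or None
    return items

doc = json.loads('{"user": {"name": "Ada", "tags": ["admin", "beta"]}, "active": true}')
flat = flatten_json(doc)
chunks = [f"{path}: {value}" for path, value in flat.items()]
```

Each resulting line, such as "user.name: Ada", carries its full path, so a retrieved chunk stays meaningful without the rest of the document. Empty objects and arrays produce no pairs in this sketch.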

Versioned Serialization

We use a custom versioned serialization layer (with a "magic-byte" header). This allows us to update the CortexaDB engine without breaking your existing database files—it knows how to read "legacy" data while writing new records in the latest format.
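
The pattern can be sketched as follows: a fixed header identifies the file type and format version, and the decoder upgrades older layouts on read while the encoder always writes the newest one. The magic bytes, header layout, JSON body, and field names are all hypothetical stand-ins, not CortexaDB's actual format.

```python
import json
import struct

MAGIC = b"CXDB"                 # hypothetical magic bytes
HEADER = struct.Struct("<4sH")  # magic, format version (assumed layout)
CURRENT_VERSION = 2

def encode(record: dict) -> bytes:
    """Always write records in the latest format."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return HEADER.pack(MAGIC, CURRENT_VERSION) + body

def decode(blob: bytes) -> dict:
    """Read any supported version and return the record in the current shape."""
    if len(blob) < HEADER.size:
        raise ValueError("blob too short to contain a header")
    magic, version = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a recognised record")
    body = json.loads(blob[HEADER.size:].decode("utf-8"))
    if version == 1:
        # Hypothetical legacy layout: v1 used 'content' and had no namespace field.
        return {"text": body["content"], "namespace": "default"}
    if version == 2:
        return body
    raise ValueError(f"unsupported format version {version}")

legacy = HEADER.pack(MAGIC, 1) + json.dumps({"content": "dark mode"}).encode("utf-8")
upgraded = decode(legacy)             # legacy bytes are read into the current shape
roundtrip = decode(encode(upgraded))  # new writes use the latest version
```

Because the version travels with every record, old and new records can coexist in the same file and be upgraded lazily.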


License & Status

CortexaDB is currently in Beta (v0.1.3). It is released under the MIT and Apache-2.0 licenses.
We are actively refining the API and welcome feedback!


Note: Windows builds are temporarily unavailable due to a Windows compatibility issue in the usearch library.


CortexaDB — Because agents shouldn't have to choose between speed and a soul (memory).
