Fast, embedded vector + graph memory for AI agents

CortexaDB: SQLite for AI Agents

CortexaDB is a simple, fast, and durable embedded database designed specifically for AI agent memory. It runs inside your process with no server required, much like a single-file database, and adds native support for vectors, graphs, and temporal search.

Think of it as SQLite, but with semantic and relational intelligence for your agents.

What's New in v0.1.5

Benchmark Suite - Added comprehensive benchmarking with HNSW vs Exact comparison
HNSW Performance Fix - Fixed segmentation fault issue with usearch
5x Speedup - HNSW now runs ~5x faster than exact search with 95% recall

What's New in v0.1.4

L2/Euclidean Distance - Added support for L2 distance metric in HNSW
- Set "metric": "l2" in the index_mode config
- Best for image embeddings, recommendation systems, geometric data

Quickstart

Python (Recommended)

CortexaDB is designed to be extremely easy to use from Python via high-performance Rust bindings.

from cortexadb import CortexaDB
from cortexadb.providers.openai import OpenAIEmbedder

# Open database with embedder (auto-embeds text)
db = CortexaDB.open("agent.mem", embedder=OpenAIEmbedder())

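# Store memories and keep their IDs for linking later
# (assumes remember() returns the new memory's ID)
mid1 = db.remember("The user prefers dark mode.")
mid2 = db.remember("User works at Stripe.")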

# Load a file (TXT, MD, JSON, DOCX, PDF)
db.load("document.pdf", strategy="recursive")

# Ask questions (Semantic Search)
hits = db.ask("What does the user like?")
for hit in hits:
    print(f"ID: {hit.id}, Score: {hit.score}")

# Connect memories (Graph Relationships)
db.connect(mid1, mid2, "relates_to")

Installation

Python

CortexaDB is available on PyPI and can be installed via pip:

# Recommended: Install from PyPI
pip install cortexadb

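# With document support: the "docs" extra adds DOCX, the "pdf" extra adds PDF
# (the quotes stop shells such as zsh from treating the brackets as a glob)
pip install "cortexadb[docs]"
pip install "cortexadb[pdf]"

# From GitHub (latest code on the default branch)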
pip install "cortexadb @ git+https://github.com/anaslimem/CortexaDB.git#subdirectory=crates/cortexadb-py"

Rust

Add CortexaDB to your Cargo.toml:

[dependencies]
cortexadb-core = { git = "https://github.com/anaslimem/CortexaDB.git" }

Key Features

Hybrid Retrieval: Combine vector similarity (semantic), graph relations (structural), and recency (temporal) in a single query.
Smart Chunking: Multiple strategies for document ingestion - fixed, recursive, semantic, markdown, json.
File Support: Load documents directly - TXT, MD, JSON, DOCX, PDF.
HNSW Indexing: Ultra-fast approximate nearest neighbor search using USearch (95%+ recall at millisecond latency).
Hard Durability: Write-Ahead Log (WAL) and Segmented logs ensure your agent never forgets, even after a crash.
Multi-Agent Namespaces: Isolate memories between different agents or workspaces within a single database file.
Deterministic Replay: Record operations to a log file and replay them exactly to debug agent behavior or migrate data.
Automatic Capacity Management: Set max_entries or max_bytes and let CortexaDB handle LRU/Importance-based eviction automatically.
Crash-Safe Compaction: Background maintenance that keeps your storage lean without risking data loss.

HNSW Indexing

CortexaDB uses USearch for high-performance approximate nearest neighbor search. Switch between exact and HNSW modes based on your needs:

Mode	Use Case	Recall	Query Complexity
`exact`	Small datasets (<10K)	100%	O(n)
`hnsw`	Large datasets	95%+	O(log n)
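
To make the `exact` row concrete, here is a minimal sketch of brute-force top-k search in plain Python. It is not CortexaDB's implementation; `cosine_similarity`, `exact_top_k`, and the sample vectors are hypothetical names chosen for illustration. Every query scores all n stored vectors, which is where the O(n) cost comes from.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    `vectors` maps an ID to its embedding. This is a linear scan, so the
    cost grows with the number of stored vectors.
    """
    scored = ((cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [(vec_id, score) for score, vec_id in heapq.nlargest(k, scored)]


vectors = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [0.9, 0.1],
}
hits = exact_top_k([1.0, 0.0], vectors, k=2)
print(hits)  # IDs 1 and 3 rank first; ID 2 is orthogonal to the query
```

An HNSW index avoids the full scan by walking a layered neighbor graph, trading a small amount of recall for far fewer distance computations per query.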

Automatic Persistence

HNSW indexing now includes automatic persistence:

On checkpoint() - HNSW index is saved to disk
On database close/drop - HNSW index is automatically saved
On restart - HNSW index is loaded from disk (fast recovery!)

No extra configuration is needed: set index_mode="hnsw" and the index is persisted for you.

from cortexadb import CortexaDB, HashEmbedder

# The calls below are alternatives; use one of them.

# Default: exact (brute-force)
db = CortexaDB.open("db.mem", dimension=128)

# Or use HNSW for large-scale search
db = CortexaDB.open("db.mem", dimension=128, index_mode="hnsw")

# HNSW with custom parameters
db = CortexaDB.open("db.mem", dimension=128, index_mode={
    "type": "hnsw",
    "m": 16,           # connections per node
    "ef_search": 50,   # query-time search width
    "ef_construction": 200,  # build-time search width
    "metric": "cos"    # distance metric: "cos" (cosine) or "l2" (euclidean)
})

# L2/Euclidean metric - best for image embeddings, recommendation systems
db = CortexaDB.open("db.mem", dimension=128, index_mode={
    "type": "hnsw",
    "metric": "l2"
})

HNSW Parameters

Parameter	Default	Range	Description
`m`	16	4-64	Connections per node. Higher = more memory, higher recall.
`ef_search`	50	10-500	Query search width. Higher = better recall, slower search.
`ef_construction`	200	50-500	Build search width. Higher = better index, slower build.
`metric`	`cos`	`cos`, `l2`	Distance metric. `cos` = Cosine, `l2` = Euclidean/L2
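
The defaults and ranges in the table can be checked before a config dict is passed to `CortexaDB.open()`. The helper below is a hypothetical sketch, not part of the CortexaDB API: `build_hnsw_config` and the constant names are assumptions, and only the numbers come from the table above.

```python
# Defaults and ranges copied from the parameter table; the helper is hypothetical.
HNSW_DEFAULTS = {
    "type": "hnsw",
    "m": 16,
    "ef_search": 50,
    "ef_construction": 200,
    "metric": "cos",
}
HNSW_RANGES = {
    "m": (4, 64),
    "ef_search": (10, 500),
    "ef_construction": (50, 500),
}
HNSW_METRICS = {"cos", "l2"}


def build_hnsw_config(**overrides):
    """Merge overrides into the documented defaults and check the ranges."""
    config = {**HNSW_DEFAULTS, **overrides}
    for name, (low, high) in HNSW_RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} is outside {low}-{high}")
    if config["metric"] not in HNSW_METRICS:
        raise ValueError(f"unknown metric {config['metric']!r}")
    return config


config = build_hnsw_config(m=32, metric="l2")
print(config)
```

The resulting dict has the same shape as the `index_mode` argument shown in the examples above.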

Choosing a Distance Metric

Metric	Best For	Description
`cos` (default)	Text/semantic search	Measures angle between vectors. Ignores magnitude.
`l2`	Image embeddings, recommendation systems	Measures straight-line distance. Considers both direction and magnitude.

When to use L2:

Image embeddings where magnitude matters
Recommendation systems comparing user ratings
Geometric data (e.g., GPS coordinates)
When your embedding model was trained with L2 loss
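
The difference between the two metrics is easiest to see on two vectors that point the same way but differ in length. This is a plain-Python sketch for illustration, independent of CortexaDB; the function names are assumptions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance: depends on both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


base = [1.0, 2.0, 3.0]
scaled = [10.0, 20.0, 30.0]  # same direction, 10x the magnitude

print(cosine_distance(base, scaled))  # close to 0: the direction is identical
print(l2_distance(base, scaled))      # large: the magnitudes differ
```

If vector length carries meaning in your data (for example rating intensity), cosine distance discards it and L2 keeps it.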

Trade-offs:

Speed vs Recall: Increase ef_search for better results, decrease for speed
Memory vs Quality: Increase m for higher recall, uses more memory
Build Time vs Quality: Increase ef_construction for better index, slower initial build
Cosine vs L2: Use cos for text/semantic search, l2 for image/recommendation data

Chunking Strategies

CortexaDB provides five chunking strategies for document ingestion:

Strategy	Use Case
`fixed`	Simple character-based with word-boundary snap
`recursive`	General purpose - splits paragraphs → sentences → words
`semantic`	Articles, blogs - split by paragraphs
`markdown`	Technical docs - preserves headers, lists, code blocks
`json`	Structured data - flattens to key-value pairs

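from cortexadb import CortexaDB, chunk
from cortexadb.providers.openai import OpenAIEmbedder

# Sample input and an open database so the calls below have something to work on
text = "Long document text to split into chunks..."
db = CortexaDB.open("agent.mem", embedder=OpenAIEmbedder())

# Use chunk() directly
chunks = chunk(text, strategy="recursive", chunk_size=512, overlap=50)

# Or use db.ingest() / db.load()
db.ingest("text...", strategy="markdown")
db.load("document.pdf", strategy="recursive")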
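
For intuition about what the `fixed` strategy's "word-boundary snap" and the `overlap` parameter mean, here is a rough pure-Python sketch. It is an illustration under assumptions, not the Rust chunker that CortexaDB ships; `chunk_fixed` is a hypothetical name and the real strategy may differ in detail.

```python
def chunk_fixed(text, chunk_size=40, overlap=10):
    """Split text into chunks of at most chunk_size characters.

    Each cut is moved back to the previous space when possible so words
    stay whole, and the next chunk starts up to `overlap` characters
    before the cut, moved forward to the start of a word.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            space = text.rfind(" ", start + 1, end + 1)
            if space != -1:
                end = space  # snap the cut back to a word boundary
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= len(text):
            break
        # Step back by `overlap`, then forward to the next word start.
        next_start = max(end - overlap, start + 1)
        space = text.find(" ", next_start, end)
        start = space + 1 if space != -1 else end
    return chunks


sample = "the quick brown fox jumps over the lazy dog near the quiet river bank"
pieces = chunk_fixed(sample, chunk_size=20, overlap=8)
for piece in pieces:
    print(repr(piece))
```

Each chunk repeats the last word or two of the previous one, which is the context continuity that overlap is meant to provide.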

File Format Support

Format	Extension	Install
Plain Text	`.txt`	Built-in
Markdown	`.md`	Built-in
JSON	`.json`	Built-in
Word	`.docx`	`pip install cortexadb[docs]`
PDF	`.pdf`	`pip install cortexadb[pdf]`

API Guide

Core Operations

Method	Description
`CortexaDB.open(path, ...)`	Opens or creates a database at the specified path.
`.remember(text, ...)`	Stores a new memory. Auto-embeds if an embedder is configured.
`.ingest(text, ...)`	Ingests text with smart chunking.
`.load(path, ...)`	Loads and ingests a file.
`.ask(query, ...)`	Performs a hybrid search across vectors, graphs, and time.
`.connect(id1, id2, rel)`	Creates a directed edge between two memory entries.
`.namespace(name)`	Returns a scoped view of the database for a specific agent/context.
`.delete_memory(id)`	Permanently removes a memory and updates all indexes.
`.compact()`	Reclaims space by removing deleted entries from disk.
`.checkpoint()`	Truncates the WAL and snapshots the current state for fast startup.

Configuration Options

When calling CortexaDB.open(), you can tune the behavior:

sync: "strict" (safest), "async" (fastest), or "batch" (balanced).
max_entries: Limits the total number of memories (triggers auto-eviction).
record: Path to a log file for capturing the entire session for replay.
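
To illustrate what a `max_entries` limit with automatic eviction implies, here is a toy least-recently-used store in plain Python. It is a sketch only: `BoundedMemory` is a hypothetical class, and CortexaDB's actual policy (which also weighs importance) is not reproduced here.

```python
from collections import OrderedDict


class BoundedMemory:
    """Toy in-memory store that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def remember(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def recall(self, key):
        value = self._entries[key]
        self._entries.move_to_end(key)  # reading counts as a use
        return value

    def keys(self):
        return list(self._entries)


store = BoundedMemory(max_entries=2)
store.remember("a", "dark mode")
store.remember("b", "works at Stripe")
store.recall("a")                      # "a" is now more recent than "b"
store.remember("c", "prefers email")   # exceeds the limit, so "b" is evicted
print(store.keys())
```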

Technical Essentials: How it's built

Why Rust?

CortexaDB is written in Rust, which provides memory safety without a garbage collector. That gives predictable performance (startup in under 100 ms) and low resource overhead, both critical for "embedded" use cases where the DB runs inside your agent's process.

The Storage Engine

CortexaDB follows a Log-Structured design:

WAL (Write-Ahead Log): Every command is first appended to a durable log with CRC32 checksums.
Segment Storage: Large memory payloads are stored in append-only segments.
Deterministic State Machine: On startup, the database replays the log into an in-memory state machine, so the state your queries read is rebuilt exactly from what is on disk.
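
The WAL-and-replay idea can be sketched in a few lines of Python. This is an illustration under assumptions, not CortexaDB's on-disk format: the record layout (length, CRC32, payload) and the names `append_record` and `replay` are hypothetical.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(log, payload):
    """Append one length-prefixed, checksummed record to the log buffer."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def replay(log):
    """Yield payloads in order, stopping at the first torn or corrupt record."""
    offset = 0
    while offset + HEADER.size <= len(log):
        length, checksum = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # incomplete or damaged tail: ignore it
        yield payload
        offset = start + length


log = bytearray()
append_record(log, b"remember:dark mode")
append_record(log, b"remember:works at Stripe")
log += b"\x05\x00\x00\x00garbage"  # simulate a write cut short by a crash

state = [record.decode() for record in replay(log)]
print(state)
```

Because replay applies the same records in the same order every time, the rebuilt state is deterministic, and a torn final write is simply dropped rather than corrupting earlier data.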

Hybrid Query Engine

Unlike standard vector DBs, CortexaDB doesn't just look at distance. The query planner combines three signals:

Vector: Finds semantic matches using vector similarity (cosine by default; L2 is available in HNSW mode).
Graph: Discovers related concepts by traversing edges created with .connect().
Temporal: Boosts or filters results based on when they were "remembered".
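
One way to picture how the three signals might be blended into a single ranking is the sketch below. The formula, weights, and half-life are assumptions made up for illustration; the source does not document CortexaDB's actual scoring, and `hybrid_score` is a hypothetical function.

```python
def hybrid_score(similarity, is_graph_neighbor, age_seconds,
                 graph_boost=0.2, recency_weight=0.3, half_life_seconds=86_400.0):
    """Blend vector similarity, a graph bonus, and exponential recency decay.

    All weights and the one-day half-life are illustrative values only.
    """
    recency = 0.5 ** (age_seconds / half_life_seconds)
    score = similarity + recency_weight * recency
    if is_graph_neighbor:
        score += graph_boost
    return score


DAY = 86_400
candidates = [
    # (memory id, vector similarity, linked to a top hit?, age in seconds)
    ("old-but-similar", 0.80, False, 30 * DAY),
    ("recent", 0.70, False, 3_600),
    ("linked", 0.65, True, 7 * DAY),
]
ranked = sorted(candidates, key=lambda c: hybrid_score(c[1], c[2], c[3]), reverse=True)
print([memory_id for memory_id, *_ in ranked])
```

With these made-up weights, a fresh memory and a graph-linked memory both outrank an older memory that has the highest raw similarity.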

Smart Chunking

The chunking engine is built in Rust for performance:

5 strategies covering most use cases
Word-boundary awareness to avoid splitting words
Overlap support for context continuity
JSON flattening for structured data
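
As an illustration of what "JSON flattening" to key-value pairs can look like, here is a small Python sketch. The dotted-path key format and the name `flatten_json` are assumptions; the Rust chunker may format keys differently.

```python
import json


def flatten_json(value, prefix=""):
    """Flatten nested dicts and lists into path -> scalar pairs."""
    if isinstance(value, dict):
        items = ((f"{prefix}.{key}" if prefix else str(key), child)
                 for key, child in value.items())
    elif isinstance(value, list):
        items = ((f"{prefix}[{index}]", child) for index, child in enumerate(value))
    else:
        return {prefix: value}  # scalar leaf: emit one pair
    flat = {}
    for path, child in items:
        flat.update(flatten_json(child, path))
    return flat


doc = json.loads('{"user": {"name": "Ada", "tags": ["admin", "beta"]}, "active": true}')
pairs = flatten_json(doc)
for key, value in pairs.items():
    print(f"{key} = {value}")
```

Each resulting pair is short and self-describing, which makes it a reasonable unit to embed and store as its own chunk.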

Versioned Serialization

We use a custom versioned serialization layer with a "magic-byte" header. This lets us update the CortexaDB engine without breaking your existing database files: the engine reads "legacy" data while writing new records in the latest format.
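
A minimal sketch of the idea, assuming a made-up layout: a 4-byte signature plus a one-byte version, where version 2 adds an importance field that version 1 lacks. The signature `CXDB`, the field layout, and the function names are all hypothetical and do not describe CortexaDB's real file format.

```python
import struct

MAGIC = b"CXDB"                 # hypothetical 4-byte file signature
HEADER = struct.Struct("<4sB")  # magic bytes, format version


def encode_v1(text):
    """Legacy format: header followed by UTF-8 text only."""
    return HEADER.pack(MAGIC, 1) + text.encode("utf-8")


def encode_v2(text, importance):
    """Current format: header, 4-byte float importance, UTF-8 text."""
    return HEADER.pack(MAGIC, 2) + struct.pack("<f", importance) + text.encode("utf-8")


def decode(blob):
    """Read either version and return a record in the latest shape."""
    magic, version = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a recognised record")
    body = blob[HEADER.size:]
    if version == 1:
        # Legacy records have no importance field, so fill in a default.
        return {"text": body.decode("utf-8"), "importance": 1.0}
    if version == 2:
        (importance,) = struct.unpack_from("<f", body)
        return {"text": body[4:].decode("utf-8"), "importance": importance}
    raise ValueError(f"unsupported version {version}")


print(decode(encode_v1("dark mode")))
print(decode(encode_v2("dark mode", 0.5)))
```

The reader dispatches on the version byte, so old records stay readable while every new write uses the newest layout.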

Benchmarks

CortexaDB has been benchmarked with 10,000 embeddings at 384 dimensions (typical sentence-transformer size).

Results

Mode	Indexing Time	Query (p50)	Throughput	Recall
Exact (baseline)	138s	1.34ms	690 QPS	100%
HNSW	151s	0.29ms	3,203 QPS	95%

→ HNSW is ~5x faster than exact search (0.29 ms vs 1.34 ms at p50, about 4.6x) while maintaining 95% recall
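
The ratios behind that summary can be recomputed directly from the results table. The variable names below are illustrative; the numbers are the ones reported above.

```python
# Figures copied from the results table above
exact_p50_ms, hnsw_p50_ms = 1.34, 0.29
exact_qps, hnsw_qps = 690, 3203
exact_index_s, hnsw_index_s = 138, 151

latency_speedup = exact_p50_ms / hnsw_p50_ms
throughput_speedup = hnsw_qps / exact_qps
index_overhead = hnsw_index_s / exact_index_s - 1

print(f"p50 latency speedup: {latency_speedup:.2f}x")
print(f"throughput speedup:  {throughput_speedup:.2f}x")
print(f"extra indexing time: {index_overhead:.1%}")
```

Both query-side ratios come out a little above 4.6x, and the HNSW build takes roughly 9% longer than the exact baseline.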

Benchmark Methodology

Dataset: 10,000 embeddings × 384 dimensions (realistic sentence-transformer size)
Indexing: Time to build fresh index from scratch
Query Latency: p50/p95/p99 measured across 1,000 queries (after 100 warmup queries)
Recall: Percentage of HNSW results that match brute-force exact search
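
Recall as described above can be computed by comparing ID lists from the two search modes. This is a sketch with hypothetical function names, not the code in the benchmark script; when both lists contain k items, "HNSW results that match exact search" and "exact results that HNSW found" are the same fraction.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k results that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


def mean_recall(approx_results, exact_results):
    """Average recall over a batch of queries."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(approx, exact) for approx, exact in pairs) / len(pairs)


exact_results = [[1, 2, 3, 4], [5, 6, 7, 8]]
approx_results = [[1, 2, 3, 9], [5, 6, 7, 8]]  # first query misses one true neighbor
print(mean_recall(approx_results, exact_results))
```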

Running Benchmarks

# 1. Build the Rust extension
cd crates/cortexadb-py
maturin develop --release
cd ../..

# 2. Generate test embeddings
python benchmark/generate_embeddings.py --count 10000 --dimensions 384

# 3. Run benchmarks
python benchmark/run_benchmark.py --index-mode exact   # baseline (100% recall)
python benchmark/run_benchmark.py --index-mode hnsw    # fast mode (~95% recall)

# Results are saved to benchmark/results/

Custom Benchmark Options

python benchmark/run_benchmark.py \
    --count 10000 \
    --dimensions 384 \
    --top-k 10 \
    --warmup 100 \
    --queries 1000 \
    --index-mode hnsw
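
To show how warmup and query counts like `--warmup` and `--queries` typically feed into p50/p95/p99 figures, here is a generic timing sketch. It does not reproduce `run_benchmark.py`; `measure_latency`, `summarize`, and the stand-in query function are assumptions.

```python
import statistics
import time


def measure_latency(run_query, queries, warmup=100):
    """Time each query in milliseconds after discarding warmup runs."""
    for query in queries[:warmup]:
        run_query(query)  # warm caches; these timings are not recorded
    samples = []
    for query in queries[warmup:]:
        started = time.perf_counter()
        run_query(query)
        samples.append((time.perf_counter() - started) * 1000.0)
    return samples


def summarize(samples):
    """Return p50, p95, and p99 from a list of latency samples."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Stand-in workload; a real run would call the database's search here.
samples = measure_latency(lambda q: sum(range(1000)), list(range(1100)), warmup=100)
print(summarize(samples))
```

Discarding the warmup runs keeps one-time costs such as cache fills out of the reported percentiles.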

License & Status

CortexaDB is currently in Beta (v0.1.5). It is released under the MIT and Apache-2.0 licenses.
We are actively refining the API and welcome feedback!

Note: Windows builds are temporarily unavailable due to a Windows compatibility issue in the usearch library.

CortexaDB — Because agents shouldn't have to choose between speed and a soul (memory).

Maintainers

anaslimem

Release history

Version	Release date
1.0.1	Apr 16, 2026
1.0.0	Mar 18, 2026
0.1.8	Mar 8, 2026
0.1.7	Mar 6, 2026
0.1.6	Mar 3, 2026
0.1.5 (this version)	Mar 1, 2026
0.1.4	Feb 28, 2026
0.1.3	Feb 27, 2026
0.1.2	Feb 27, 2026
0.1.1	Feb 25, 2026
0.1.0	Feb 25, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

File	Size	Uploaded	Interpreter	Platform
cortexadb-0.1.5-cp313-cp313-macosx_11_0_arm64.whl	1.3 MB	Mar 1, 2026	CPython 3.13	macOS 11.0+ ARM64
cortexadb-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl	1.5 MB	Mar 1, 2026	CPython 3.12	manylinux: glibc 2.34+ x86-64
cortexadb-0.1.5-cp312-cp312-macosx_11_0_arm64.whl	1.3 MB	Mar 1, 2026	CPython 3.12	macOS 11.0+ ARM64
cortexadb-0.1.5-cp311-cp311-manylinux_2_34_x86_64.whl	1.5 MB	Mar 1, 2026	CPython 3.11	manylinux: glibc 2.34+ x86-64
cortexadb-0.1.5-cp311-cp311-macosx_11_0_arm64.whl	1.3 MB	Mar 1, 2026	CPython 3.11	macOS 11.0+ ARM64

File details

Details for the file cortexadb-0.1.5-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

Download URL: cortexadb-0.1.5-cp313-cp313-macosx_11_0_arm64.whl
Upload date: Mar 1, 2026
Size: 1.3 MB
Tags: CPython 3.13, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cortexadb-0.1.5-cp313-cp313-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`f639cd4c4108753f8bba27c479f92320576e3bec290fa82d1908158788a58593`
MD5	`119ced0b9031bd8834867d7a71d801cf`
BLAKE2b-256	`1bf621579cc93adee160f42f23759e24a957969ff9b6cae01392aa1087b0dc20`

Provenance

The following attestation bundles were made for cortexadb-0.1.5-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on anaslimem/CortexaDB

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cortexadb-0.1.5-cp313-cp313-macosx_11_0_arm64.whl
- Subject digest: f639cd4c4108753f8bba27c479f92320576e3bec290fa82d1908158788a58593
- Sigstore transparency entry: 1006527905
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: anaslimem/CortexaDB@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/anaslimem
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Trigger Event: push

File details

Details for the file cortexadb-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

Download URL: cortexadb-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl
Upload date: Mar 1, 2026
Size: 1.5 MB
Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cortexadb-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`f83c4bf010afb3f0bd4b211c6d88b27b36193d896a3c5676eaac401dd4885b79`
MD5	`6d8739da581cfda0d2ca22c9d31bba31`
BLAKE2b-256	`345f7f14fbb7a5e572f4e8e4965c9589189aaad2183aad83f4d3e4bbb85ca99a`

Provenance

The following attestation bundles were made for cortexadb-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on anaslimem/CortexaDB

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cortexadb-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl
- Subject digest: f83c4bf010afb3f0bd4b211c6d88b27b36193d896a3c5676eaac401dd4885b79
- Sigstore transparency entry: 1006527913
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: anaslimem/CortexaDB@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/anaslimem
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Trigger Event: push

File details

Details for the file cortexadb-0.1.5-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

Download URL: cortexadb-0.1.5-cp312-cp312-macosx_11_0_arm64.whl
Upload date: Mar 1, 2026
Size: 1.3 MB
Tags: CPython 3.12, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cortexadb-0.1.5-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`af5ad7dad9aea116290527bca37f3859b91426aa67944da07cf49741dafa6fbd`
MD5	`5d05416d3f7a0697d3e4d836c3913ab3`
BLAKE2b-256	`f85ce391ee829610810e26cd96c0fc4a43fe322263da82bd53f4911ea2ff919c`

Provenance

The following attestation bundles were made for cortexadb-0.1.5-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on anaslimem/CortexaDB

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cortexadb-0.1.5-cp312-cp312-macosx_11_0_arm64.whl
- Subject digest: af5ad7dad9aea116290527bca37f3859b91426aa67944da07cf49741dafa6fbd
- Sigstore transparency entry: 1006527908
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: anaslimem/CortexaDB@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/anaslimem
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Trigger Event: push

File details

Details for the file cortexadb-0.1.5-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

Download URL: cortexadb-0.1.5-cp311-cp311-manylinux_2_34_x86_64.whl
Upload date: Mar 1, 2026
Size: 1.5 MB
Tags: CPython 3.11, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cortexadb-0.1.5-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`6f403b2769e5d0b649482b10e5dcc25d778214c99d41ad21cecbc5f01550e705`
MD5	`1f63c674fd93a1238d504fbf5b598c66`
BLAKE2b-256	`ff8ba741834569c7b2dcc023facca3d5740643f04c9b82213ea6aed28e072f44`

Provenance

The following attestation bundles were made for cortexadb-0.1.5-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: release.yml on anaslimem/CortexaDB

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cortexadb-0.1.5-cp311-cp311-manylinux_2_34_x86_64.whl
- Subject digest: 6f403b2769e5d0b649482b10e5dcc25d778214c99d41ad21cecbc5f01550e705
- Sigstore transparency entry: 1006527911
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: anaslimem/CortexaDB@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/anaslimem
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Trigger Event: push

File details

Details for the file cortexadb-0.1.5-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

Download URL: cortexadb-0.1.5-cp311-cp311-macosx_11_0_arm64.whl
Upload date: Mar 1, 2026
Size: 1.3 MB
Tags: CPython 3.11, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cortexadb-0.1.5-cp311-cp311-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`8118d5d6d5108bb2eb66835a575979fe240df52deab136a78a112cb0dfb9dda1`
MD5	`5b739b2fd3afa707850af5795ee3eb85`
BLAKE2b-256	`1b01e0ed059cc1400b409608faf762055f0b473f1cbbb96bb0f1aac43c26d706`

Provenance

The following attestation bundles were made for cortexadb-0.1.5-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release.yml on anaslimem/CortexaDB

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cortexadb-0.1.5-cp311-cp311-macosx_11_0_arm64.whl
- Subject digest: 8118d5d6d5108bb2eb66835a575979fe240df52deab136a78a112cb0dfb9dda1
- Sigstore transparency entry: 1006527901
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: anaslimem/CortexaDB@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/anaslimem
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@aeb395b5f59e733f370dbf02937d44782d7a9a80
- Trigger Event: push
