Fast, embedded vector + graph memory for AI agents
CortexaDB: SQLite for AI Agents
CortexaDB is a simple, fast, embedded database with hard durability guarantees, designed specifically for AI agent memory. It provides a single-file-like experience (no server required) with native support for vectors, graphs, and temporal search.
Think of it as SQLite, but with semantic and relational intelligence for your agents.
What's New in v0.1.4
- L2/Euclidean Distance: Added support for the L2 distance metric in HNSW.
  - Use `metric: "l2"` in the `index_mode` config.
  - Best for image embeddings, recommendation systems, and geometric data.
Quickstart
Python (Recommended)
CortexaDB is designed to be extremely easy to use from Python via high-performance Rust bindings.
```python
from cortexadb import CortexaDB
from cortexadb.providers.openai import OpenAIEmbedder

# Open database with embedder (auto-embeds text)
db = CortexaDB.open("agent.mem", embedder=OpenAIEmbedder())

# Store memories
db.remember("The user prefers dark mode.")
db.remember("User works at Stripe.")

# Load a file (TXT, MD, JSON, DOCX, PDF)
db.load("document.pdf", strategy="recursive")

# Ask questions (Semantic Search)
hits = db.ask("What does the user like?")
for hit in hits:
    print(f"ID: {hit.id}, Score: {hit.score}")

# Connect memories (Graph Relationships) using IDs from the search results
if len(hits) >= 2:
    db.connect(hits[0].id, hits[1].id, "relates_to")
```
Installation
Python
CortexaDB is available on PyPI and can be installed via pip:
```bash
# Recommended: Install from PyPI
pip install cortexadb

# Optional extras for document support (DOCX, PDF)
pip install "cortexadb[docs]"
pip install "cortexadb[pdf]"

# From GitHub (latest development version)
pip install "cortexadb @ git+https://github.com/anaslimem/CortexaDB.git#subdirectory=crates/cortexadb-py"
```
Rust
Add CortexaDB to your Cargo.toml:
```toml
[dependencies]
cortexadb-core = { git = "https://github.com/anaslimem/CortexaDB.git" }
```
Key Features
- Hybrid Retrieval: Combine vector similarity (semantic), graph relations (structural), and recency (temporal) in a single query.
- Smart Chunking: Multiple strategies for document ingestion (`fixed`, `recursive`, `semantic`, `markdown`, `json`).
- File Support: Load documents directly (TXT, MD, JSON, DOCX, PDF).
- HNSW Indexing: Ultra-fast approximate nearest neighbor search using USearch (95%+ recall at millisecond latency).
- Hard Durability: Write-Ahead Log (WAL) and Segmented logs ensure your agent never forgets, even after a crash.
- Multi-Agent Namespaces: Isolate memories between different agents or workspaces within a single database file.
- Deterministic Replay: Record operations to a log file and replay them exactly to debug agent behavior or migrate data.
- Automatic Capacity Management: Set `max_entries` or `max_bytes` and let CortexaDB handle LRU/importance-based eviction automatically.
- Crash-Safe Compaction: Background maintenance that keeps your storage lean without risking data loss.
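To make the eviction idea concrete, here is a minimal pure-Python sketch of LRU eviction under a `max_entries` cap. The `LRUMemoryStore` class is hypothetical and only illustrates the policy. It does not model CortexaDB's importance-based eviction, `max_bytes`, or on-disk behavior.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUMemoryStore:
    """Hypothetical bounded store that evicts the least recently used entry."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries = OrderedDict()  # memory id -> text, oldest first
        self._next_id = 0

    def remember(self, text: str) -> int:
        mid = self._next_id
        self._next_id += 1
        self._entries[mid] = text
        # Over capacity: drop the least recently used entry (front of the dict).
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return mid

    def get(self, mid: int) -> Optional[str]:
        if mid not in self._entries:
            return None
        # Reading an entry marks it as most recently used.
        self._entries.move_to_end(mid)
        return self._entries[mid]

    def ids(self) -> List[int]:
        return list(self._entries)


store = LRUMemoryStore(max_entries=2)
a = store.remember("The user prefers dark mode.")
b = store.remember("User works at Stripe.")
store.get(a)  # touch a, so b becomes the least recently used entry
c = store.remember("The user likes hiking.")
print(store.ids())  # [0, 2]
```

Reads refresh an entry's position, so frequently recalled memories survive while untouched ones are evicted first.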
HNSW Indexing
CortexaDB uses USearch for high-performance approximate nearest neighbor search. Switch between exact and HNSW modes based on your needs:
| Mode | Use Case | Recall | Speed |
|---|---|---|---|
| `exact` | Small datasets (<10K) | 100% | O(n) |
| `hnsw` | Large datasets | 95%+ | O(log n) |
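In `exact` mode every stored vector is scored against the query, which is why recall is 100% but cost grows linearly with the dataset. The sketch below illustrates that brute-force top-k search in plain Python. `exact_search` and the toy vectors are hypothetical; CortexaDB's own search runs in its Rust core.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query, vectors, k=2):
    """Brute-force top-k: scores every stored vector, so cost grows as O(n)."""
    scored = [(mid, cosine_similarity(query, vec)) for mid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = {
    "dark_mode": [0.9, 0.1, 0.0],
    "stripe": [0.0, 0.2, 0.9],
    "light_theme": [0.7, 0.3, 0.1],
}
print([mid for mid, _ in exact_search([1.0, 0.0, 0.0], vectors)])
# ['dark_mode', 'light_theme']
```

HNSW avoids this full scan by walking a layered proximity graph, trading a small amount of recall for much lower query cost on large datasets.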
Automatic Persistence
HNSW indexing now includes automatic persistence:
- On `checkpoint()`: the HNSW index is saved to disk.
- On database close/drop: the HNSW index is saved automatically.
- On restart: the HNSW index is loaded from disk (fast recovery!).
No extra configuration is needed: set `index_mode="hnsw"` and the index is persisted and reloaded automatically.
```python
from cortexadb import CortexaDB

# The calls below are alternatives: pick the one that fits your workload.

# Default: exact (brute-force)
db = CortexaDB.open("db.mem", dimension=128)

# Or use HNSW for large-scale search
db = CortexaDB.open("db.mem", dimension=128, index_mode="hnsw")

# HNSW with custom parameters
db = CortexaDB.open("db.mem", dimension=128, index_mode={
    "type": "hnsw",
    "m": 16,                 # connections per node
    "ef_search": 50,         # query-time search width
    "ef_construction": 200,  # build-time search width
    "metric": "cos",         # distance metric: "cos" (cosine) or "l2" (Euclidean)
})

# L2/Euclidean metric: best for image embeddings, recommendation systems
db = CortexaDB.open("db.mem", dimension=128, index_mode={
    "type": "hnsw",
    "metric": "l2",
})
```
HNSW Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| `m` | 16 | 4-64 | Connections per node. Higher = more memory, higher recall. |
| `ef_search` | 50 | 10-500 | Query search width. Higher = better recall, slower search. |
| `ef_construction` | 200 | 50-500 | Build search width. Higher = better index, slower build. |
| `metric` | `cos` | `cos`, `l2` | Distance metric. `cos` = cosine, `l2` = Euclidean/L2. |
Choosing a Distance Metric
| Metric | Best For | Description |
|---|---|---|
| `cos` (default) | Text/semantic search | Measures the angle between vectors. Ignores magnitude. |
| `l2` | Image embeddings, recommendation systems | Measures straight-line distance. Considers both direction and magnitude. |
When to use L2:
- Image embeddings where magnitude matters
- Recommendation systems comparing user ratings
- Geometric data (e.g., projected GPS coordinates)
- When your embedding model was trained with L2 loss
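The difference between the two metrics shows up with vectors that point the same way but differ in length. This minimal sketch uses plain Python with hypothetical helper names: it computes cosine distance as 1 minus cosine similarity and L2 as straight-line distance. It only illustrates the metrics, not how USearch computes them internally.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norms


def l2_distance(a, b):
    """Euclidean distance: depends on both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


short = [1.0, 1.0]
scaled = [3.0, 3.0]  # same direction, three times the magnitude

print(cosine_distance(short, scaled))        # 0.0
print(round(l2_distance(short, scaled), 6))  # 2.828427
```

Cosine treats the two vectors as identical, while L2 reports them as clearly apart, which is exactly the magnitude sensitivity the L2 use cases above rely on.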
Trade-offs:
- Speed vs. Recall: Increase `ef_search` for better results, decrease it for speed.
- Memory vs. Quality: Increase `m` for higher recall, at the cost of more memory.
- Build Time vs. Quality: Increase `ef_construction` for a better index, at the cost of a slower initial build.
- Cosine vs. L2: Use `cos` for text/semantic search and `l2` for image/recommendation data.
Chunking Strategies
CortexaDB provides 5 smart chunking strategies for document ingestion:
| Strategy | Use Case |
|---|---|
| `fixed` | Simple character-based splitting with word-boundary snap |
| `recursive` | General purpose: splits paragraphs → sentences → words |
| `semantic` | Articles, blogs: splits by paragraphs |
| `markdown` | Technical docs: preserves headers, lists, code blocks |
| `json` | Structured data: flattens to key-value pairs |
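```python
from cortexadb import CortexaDB, chunk
from cortexadb.providers.openai import OpenAIEmbedder

text = "Your long document text goes here..."

# Use chunk() directly
chunks = chunk(text, strategy="recursive", chunk_size=512, overlap=50)

# Or use db.ingest() / db.load() to chunk and store in one step
db = CortexaDB.open("agent.mem", embedder=OpenAIEmbedder())
db.ingest(text, strategy="markdown")
db.load("document.pdf", strategy="recursive")
```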
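To illustrate how word-boundary snapping and overlap interact, here is a simplified pure-Python sketch in the spirit of the `fixed` strategy. `fixed_chunks` is a hypothetical helper, not part of the `cortexadb` API, and CortexaDB's Rust chunker may choose boundaries differently. This version packs whole words up to `chunk_size` characters and repeats trailing words totalling at most `overlap` characters at the start of the next chunk.

```python
from typing import List


def fixed_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Pack whole words into chunks of at most chunk_size characters.

    A single word longer than chunk_size becomes its own (oversized) chunk
    rather than being split mid-word.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        if current and len(" ".join(current + [word])) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing words (up to `overlap` chars) into the next chunk.
            tail: List[str] = []
            for w in reversed(current):
                if len(" ".join([w] + tail)) > overlap:
                    break
                tail.insert(0, w)
            # Drop the overlap if it would push the next chunk past chunk_size.
            if len(" ".join(tail + [word])) > chunk_size:
                tail = []
            current = tail
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "CortexaDB stores agent memories in a single embedded database file."
for piece in fixed_chunks(text, chunk_size=30, overlap=10):
    print(piece)
# CortexaDB stores agent
# agent memories in a single
# a single embedded database
# database file.
```

The repeated words at each boundary ("agent", "a single", "database") are the overlap that keeps context continuous across chunks.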
File Format Support
| Format | Extension | Install |
|---|---|---|
| Plain Text | `.txt` | Built-in |
| Markdown | `.md` | Built-in |
| JSON | `.json` | Built-in |
| Word | `.docx` | `pip install "cortexadb[docs]"` |
| PDF | `.pdf` | `pip install "cortexadb[pdf]"` |
API Guide
Core Operations
| Method | Description |
|---|---|
| `CortexaDB.open(path, ...)` | Opens or creates a database at the specified path. |
| `.remember(text, ...)` | Stores a new memory. Auto-embeds if an embedder is configured. |
| `.ingest(text, ...)` | Ingests text with smart chunking. |
| `.load(path, ...)` | Loads and ingests a file. |
| `.ask(query, ...)` | Performs a hybrid search across vectors, graphs, and time. |
| `.connect(id1, id2, rel)` | Creates a directed edge between two memory entries. |
| `.namespace(name)` | Returns a scoped view of the database for a specific agent/context. |
| `.delete_memory(id)` | Permanently removes a memory and updates all indexes. |
| `.compact()` | Reclaims space by removing deleted entries from disk. |
| `.checkpoint()` | Truncates the WAL and snapshots the current state for fast startup. |
Configuration Options
When calling `CortexaDB.open()`, you can tune the behavior:
- `sync`: `"strict"` (safest), `"async"` (fastest), or `"batch"` (balanced).
- `max_entries`: Limits the total number of memories (triggers auto-eviction).
- `record`: Path to a log file for capturing the entire session for replay.
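One way to picture the three `sync` modes is as different fsync policies on an append-only log. The sketch below is an assumption-based illustration, not CortexaDB's implementation: `LogWriter`, its `batch_size`, and the rule that `"async"` still flushes on close are all hypothetical.

```python
import os
import tempfile


class LogWriter:
    """Hypothetical append-only writer showing three fsync policies."""

    def __init__(self, path, sync="strict", batch_size=3):
        if sync not in ("strict", "batch", "async"):
            raise ValueError(f"unknown sync mode: {sync}")
        self._file = open(path, "ab")
        self.sync = sync
        self.batch_size = batch_size
        self._pending = 0
        self.fsync_calls = 0

    def _fsync(self):
        self._file.flush()
        os.fsync(self._file.fileno())  # force the OS to persist the bytes
        self.fsync_calls += 1
        self._pending = 0

    def append(self, record: bytes):
        self._file.write(record + b"\n")
        self._pending += 1
        if self.sync == "strict":
            self._fsync()  # durable before append() returns
        elif self.sync == "batch" and self._pending >= self.batch_size:
            self._fsync()  # amortize one fsync over several writes
        # "async": leave flushing to the OS until close (fastest, least durable)

    def close(self):
        if self._pending:
            self._fsync()
        self._file.close()


counts = {}
with tempfile.TemporaryDirectory() as tmp:
    for mode in ("strict", "batch", "async"):
        writer = LogWriter(os.path.join(tmp, f"{mode}.log"), sync=mode)
        for i in range(7):
            writer.append(f"remember {i}".encode())
        writer.close()
        counts[mode] = writer.fsync_calls
print(counts)  # {'strict': 7, 'batch': 3, 'async': 1}
```

Fewer fsync calls mean higher throughput but a larger window of writes that a crash could lose, which is the safety/speed trade-off the three modes describe.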
Technical Essentials: How it's built
Why Rust?
CortexaDB is written in Rust to provide memory safety without a garbage collector, ensuring predictable performance (sub-100ms startup) and low resource overhead—critical for "embedded" use cases where the DB runs inside your agent's process.
The Storage Engine
CortexaDB follows a Log-Structured design:
- WAL (Write-Ahead Log): Every command is first appended to a durable log with CRC32 checksums.
- Segment Storage: Large memory payloads are stored in append-only segments.
- Deterministic State Machine: On startup, the database replays the log into an in-memory state machine. This ensures 100% consistency between the disk and your queries.
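The WAL-plus-replay idea can be sketched in a few lines of Python. The record layout here (a little-endian length and CRC32 header followed by a text command such as `PUT id text` or `DEL id`) is a hypothetical format chosen for illustration, not CortexaDB's on-disk encoding. The point is how checksums let replay detect a torn or corrupted tail and rebuild the same state every time.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # (payload length, CRC32 of payload)


def encode_record(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> dict:
    """Rebuild in-memory state from the log, stopping at the first bad record."""
    state = {}
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupted tail (e.g. crash mid-write): ignore it
        command = payload.decode("utf-8").split(" ", 2)
        if command[0] == "PUT":
            state[command[1]] = command[2]
        elif command[0] == "DEL":
            state.pop(command[1], None)
        offset = start + length
    return state


log = b"".join(
    encode_record(cmd.encode("utf-8"))
    for cmd in ["PUT m1 prefers dark mode", "PUT m2 works at Stripe", "DEL m1"]
)
# Simulate a crash that left a half-written record at the end of the log.
torn = encode_record(b"PUT m3 half written")[:-5]
print(replay(log + torn))  # {'m2': 'works at Stripe'}
```

Because replay applies the same commands in the same order, the rebuilt state is identical on every restart, and the incomplete final record is simply discarded.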
Hybrid Query Engine
Unlike standard vector DBs, CortexaDB doesn't just look at distance. Our query planner can:
- Vector: Find semantic matches using cosine similarity (or L2 distance when `metric` is `"l2"`).
- Graph: Discover related concepts by traversing edges created with `.connect()`.
- Temporal: Boost or filter results based on when they were "remembered".
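This section does not specify how the planner weights the three signals. The sketch below shows one plausible way to blend them into a single score. The `Memory` record, the weights, the one-day recency half-life, and the rule of boosting neighbors of the best semantic match are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    mid: str
    similarity: float  # vector score for the query (e.g. cosine similarity)
    created_at: float  # Unix timestamp of when it was remembered
    edges: set = field(default_factory=set)  # ids linked via connect()


def hybrid_rank(memories, now, w_vec=0.7, w_graph=0.2, w_time=0.1, half_life=86_400.0):
    """Rank memory ids by a weighted blend of vector, graph, and recency scores."""
    # Treat the best pure-vector match as the seed for graph expansion.
    seed = max(memories, key=lambda m: m.similarity)
    scores = {}
    for m in memories:
        linked = m.mid in seed.edges or seed.mid in m.edges
        graph = 1.0 if linked else 0.0
        # Exponential decay: the recency score halves every `half_life` seconds.
        recency = 0.5 ** ((now - m.created_at) / half_life)
        scores[m.mid] = w_vec * m.similarity + w_graph * graph + w_time * recency
    return sorted(scores, key=scores.get, reverse=True)


now = 1_000_000.0
day = 86_400.0
memories = [
    Memory("dark_mode", 0.80, now - 30 * day, {"ui_prefs"}),
    Memory("ui_prefs", 0.55, now - day),
    Memory("stripe", 0.60, now),
]
print(hybrid_rank(memories, now))  # ['ui_prefs', 'dark_mode', 'stripe']
```

By vector similarity alone the order would be `dark_mode`, `stripe`, `ui_prefs`. The graph link to the top match and the recency term move `ui_prefs` to the front.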
Smart Chunking
The chunking engine is built in Rust for performance:
- 5 strategies covering most use cases
- Word-boundary awareness to avoid splitting words
- Overlap support for context continuity
- JSON flattening for structured data
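As an illustration of JSON flattening, the sketch below turns nested objects and arrays into `dotted.path: value` lines, one per leaf. The key format and the helper name `flatten_json` are assumptions; CortexaDB's `json` strategy may format keys differently.

```python
import json


def flatten_json(value, prefix=""):
    """Flatten nested JSON into 'dotted.path: value' lines, one per leaf."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        # Leaf: json.dumps keeps types distinguishable ("5" vs 5, null).
        return [f"{prefix}: {json.dumps(value)}"]
    lines = []
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        lines.extend(flatten_json(child, path))
    return lines


doc = json.loads('{"user": {"name": "Ada", "theme": "dark"}, "tags": ["vip", "beta"]}')
for line in flatten_json(doc):
    print(line)
# user.name: "Ada"
# user.theme: "dark"
# tags.0: "vip"
# tags.1: "beta"
```

Each flattened line carries its full path, so a chunk stays meaningful even when it is retrieved without the surrounding document.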
Versioned Serialization
We use a custom versioned serialization layer (with a "magic-byte" header). This allows us to update the CortexaDB engine without breaking your existing database files—it knows how to read "legacy" data while writing new records in the latest format.
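A minimal sketch of the magic-byte-plus-version pattern: every blob starts with a fixed marker and a format version, and the reader dispatches on the version so old (v1) records stay readable while new ones are written as v2. The `CXDB` marker and both layouts are hypothetical, not CortexaDB's actual file format.

```python
import struct

MAGIC = b"CXDB"  # hypothetical 4-byte marker, not CortexaDB's real header
HEADER = struct.Struct("<4sH")  # magic bytes + little-endian u16 version
CURRENT_VERSION = 2


def encode(text: str) -> bytes:
    # v2 layout (hypothetical): u32 length prefix followed by UTF-8 bytes.
    body = text.encode("utf-8")
    return HEADER.pack(MAGIC, CURRENT_VERSION) + struct.pack("<I", len(body)) + body


def decode(blob: bytes) -> str:
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a recognised record")
    body = blob[HEADER.size:]
    if version == 1:
        # Legacy v1 layout (hypothetical): raw UTF-8 with no length prefix.
        return body.decode("utf-8")
    if version == 2:
        (length,) = struct.unpack_from("<I", body, 0)
        return body[4:4 + length].decode("utf-8")
    raise ValueError(f"unsupported format version {version}")


legacy = HEADER.pack(MAGIC, 1) + "written by an old engine".encode("utf-8")
print(decode(legacy))                   # written by an old engine
print(decode(encode("written today")))  # written today
```

New writes always use the current version, while the reader keeps a branch for each older layout it still supports.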
License & Status
CortexaDB is currently in Beta (v0.1.4). It is released under the MIT and Apache-2.0 licenses.
We are actively refining the API and welcome feedback!
Windows builds are temporarily unavailable due to a Windows compatibility issue in the usearch library.
CortexaDB — Because agents shouldn't have to choose between speed and a soul (memory).