
FerrumDB — Zero-setup embedded document database for Python. No server. No config. Just open a file and go.


⚡ FerrumDB

A high-performance, embedded document database written from scratch in Rust.
No server. No config files. No migrations. Open a file and go.


What is FerrumDB?

FerrumDB is an embedded key-value database engine built in Rust, designed for applications that need fast local persistence without the overhead of a server process. It is inspired by Bitcask and implements a custom binary log format, in-memory indexing, AES-256-GCM encryption at rest, atomic transactions, and a live web dashboard — all in ~1,000 lines of safe, async Rust.

It also ships Python bindings via PyO3, making it accessible from Python with a single pip install.


🌟 Features

| Feature | Detail |
| --- | --- |
| O(1) reads & writes | Append-only log + in-memory HashMap index rebuilt on startup |
| 📄 Native JSON documents | Store any structured data; values are serde_json::Value |
| 🔍 Secondary indexing | O(1) field lookups via create_index(), maintained live on writes |
| 🔐 AES-256-GCM encryption | Per-block encryption with random nonces; data is protected at rest |
| ⚛️ Atomic transactions | All-or-nothing batches written as a single log entry |
| ⏱️ Configurable fsync policy | Always / Periodic(ms) / Never: tune durability vs. throughput |
| 🖥️ Ferrum Studio | Built-in web dashboard (Axum) at localhost:7474 |
| 🐍 Python bindings | pip install ferrumdb (no Rust toolchain required) |
| 🛡️ Crash resilience | Log compaction via atomic rename(); incomplete records are skipped |
| 📊 Observability | Lock-free atomic metrics: ops/sec, uptime, GET/SET/DELETE counts |
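
To make the "Secondary indexing" row concrete, here is a minimal Python sketch of an exact-match index that is kept up to date on every write. It is an illustration of the general technique only, not FerrumDB's implementation; the class IndexedStore and its internals are hypothetical, and only the method names create_index and find mirror the README.

```python
import json
from collections import defaultdict


class IndexedStore:
    """Hypothetical in-memory store with exact-match secondary indexes."""

    def __init__(self):
        self.docs = {}     # key -> document
        self.indexes = {}  # field -> {encoded value -> set of keys}

    @staticmethod
    def _token(value):
        # Encode the value so unhashable JSON values (lists, objects) can be bucket keys.
        return json.dumps(value, sort_keys=True)

    def create_index(self, field):
        # Build the index once from existing documents.
        buckets = defaultdict(set)
        for key, doc in self.docs.items():
            if isinstance(doc, dict) and field in doc:
                buckets[self._token(doc[field])].add(key)
        self.indexes[field] = buckets

    def _unlink(self, key):
        # Remove a document and drop it from every index bucket it was in.
        doc = self.docs.pop(key, None)
        if isinstance(doc, dict):
            for field, buckets in self.indexes.items():
                if field in doc:
                    buckets[self._token(doc[field])].discard(key)

    def set(self, key, doc):
        self._unlink(key)  # an overwrite must leave its old buckets first
        self.docs[key] = doc
        if isinstance(doc, dict):
            for field, buckets in self.indexes.items():
                if field in doc:
                    buckets[self._token(doc[field])].add(key)

    def delete(self, key):
        self._unlink(key)

    def find(self, field, value):
        # One hash lookup on the field, one on the encoded value.
        return sorted(self.indexes[field].get(self._token(value), ()))
```

The lookup cost is independent of the number of documents, which is where the O(1) claim comes from; the price is extra work on each write and, as the limitations below note, no support for range queries.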

🏗️ Architecture

FerrumDB was built ground-up without using an existing storage library. Every layer is custom:

┌─────────────────────────────────────────┐
│                FerrumDB API              │  ← High-level Rust & Python interface
├─────────────────────────────────────────┤
│             StorageEngine               │  ← Core engine: index + log management
│  ┌─────────────────┐  ┌──────────────┐  │
│  │  In-Memory Index │  │ Secondary    │  │
│  │  HashMap<K,V>   │  │ Indexes      │  │
│  │  RwLock async   │  │ HashMap<F,V> │  │
│  └────────┬────────┘  └──────────────┘  │
│           │ append / reads              │
│  ┌────────▼────────────────────────┐    │
│  │   Append-Only Log (AOF)         │    │  ← Bitcask-inspired binary format
│  │   [len: u64][JSON bytes]...     │    │     length-prefixed, sequential
│  └────────┬────────────────────────┘    │
├───────────┼─────────────────────────────┤
│  ┌────────▼────────────────────────┐    │
│  │  AsyncFileSystem trait          │    │  ← Pluggable I/O abstraction
│  │  ┌──────────┐  ┌─────────────┐  │    │
│  │  │   Disk   │  │  Encrypted  │  │    │  ← Decorator pattern
│  │  │  (tokio) │  │  (AES-GCM)  │  │    │     random nonce per block
│  │  └──────────┘  └─────────────┘  │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
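
The [len: u64][JSON bytes] framing in the diagram can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the actual on-disk format: the little-endian byte order and the shape of each entry (an "op", "key", "value" object) are guesses made for the example.

```python
import io
import json
import struct

# Assumed framing: 8-byte little-endian length, then UTF-8 JSON bytes.
HEADER = struct.Struct("<Q")


def encode_record(entry: dict) -> bytes:
    """Serialize one log entry as a length-prefixed record."""
    payload = json.dumps(entry, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def decode_records(stream):
    """Yield entries in order; stop quietly at EOF or at a truncated tail."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # clean EOF, or a header cut short by a crash
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # incomplete record: skip it, as the README describes
        yield json.loads(payload)


# Two complete records followed by two stray bytes of a torn header.
log = (
    encode_record({"op": "set", "key": "user:1", "value": {"name": "alice"}})
    + encode_record({"op": "delete", "key": "user:1"})
    + b"\x05\x00"
)
entries = list(decode_records(io.BytesIO(log)))
```

Because every record announces its own length, a reader can walk the file sequentially without parsing JSON to find record boundaries, and a short read at the tail is an unambiguous sign of an interrupted write.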

Key design decisions:

  • Bitcask AOF: Writes are append-only (fast, sequential I/O). The in-memory index is the source of truth for reads. On startup, the engine replays the log to rebuild state — making recovery deterministic and crash-safe.
  • Pluggable AsyncFileSystem trait: The I/O layer is fully abstracted. DiskFileSystem and EncryptedFileSystem implement the same trait and are composed via the decorator pattern, so the storage engine can be tested without touching disk.
  • AES-256-GCM per block: Each binary record is individually encrypted with a cryptographically random 12-byte nonce. The nonce is stored alongside the ciphertext. The GCM authentication tag makes decryption fail if a record's ciphertext or nonce has been modified.
  • Tokio async throughout: Reads take the RwLock in shared mode (many concurrent readers); writes serialize through the write lock. Metrics use AtomicU64, so they add no lock contention on the hot path.
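  • Log compaction: A background compact() rewrites only live (non-expired, non-deleted) records to a temp file, then swaps it into place with rename(). Because rename() is atomic on POSIX file systems, a crash during compaction leaves either the old log or the new one in place, never a partial mix.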
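
The replay-on-startup and compaction steps above can be sketched together in Python. This is a simplified illustration, not FerrumDB's code: it uses newline-delimited JSON instead of the binary format, and the entry fields ("op", "key", "value") and the function names replay and compact are assumptions made for the example.

```python
import json
import os
import tempfile


def replay(path):
    """Rebuild the key -> value index by reading the log front to back."""
    index = {}
    with open(path, "r", encoding="utf-8") as log:
        for line in log:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final record: ignore it and anything after
            if entry["op"] == "set":
                index[entry["key"]] = entry["value"]
            elif entry["op"] == "delete":
                index.pop(entry["key"], None)
    return index


def compact(path):
    """Rewrite only live records, then swap the new log into place."""
    live = replay(path)
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same file system for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".compact")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        for key, value in live.items():
            tmp.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
        tmp.flush()
        os.fsync(tmp.fileno())  # make the new log durable before the swap
    os.replace(tmp_path, path)  # readers see the old file or the new one
```

Later entries for a key simply overwrite earlier ones during replay, which is why the log never needs in-place updates and why compaction can discard everything except the final state of each key.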

⚙️ Technical Stack

| Component | Technology |
| --- | --- |
| Language | Rust (2021 edition) |
| Async runtime | Tokio |
| Serialization | serde + serde_json |
| Encryption | aes-gcm (AES-256-GCM) |
| Web dashboard | Axum |
| Python bindings | PyO3 (via maturin) |
| Benchmarking | Criterion |
| Testing | tokio::test + tempfile |

📊 Performance

Benchmarked with Criterion on an append-only log with FsyncPolicy::Never (max throughput):

| Operation | Performance |
| --- | --- |
| Single SET | ~1–3 µs |
| Single GET (in-memory) | < 1 µs |
| 1,000 sequential SETs | ~2–5 ms |
| 100 concurrent SETs (Tokio tasks) | ~3–8 ms |
| Secondary index query (100 docs) | < 1 µs |

Run benchmarks yourself: cargo bench


🐍 Python Installation & Usage

FerrumDB is available on PyPI. Install it using pip:

pip install ferrumdb

from ferrumdb import FerrumDB

# Zero-setup: creates myapp.db if it doesn't exist
db = FerrumDB.open("myapp.db")

# Store documents; values are passed as JSON-encoded strings
db.set("user:1", '{"name": "alice", "role": "admin", "score": 99}')
db.set("user:2", '{"name": "bob",   "role": "user",  "score": 45}')

# Read back
print(db.get("user:1"))       # {"name": "alice", "role": "admin", "score": 99}
print(db.count())             # 2
print(db.keys())              # ["user:1", "user:2"]

# Secondary indexing — O(1) field lookups
db.create_index("role")
admins = db.find("role", '"admin"')   # => ["user:1"]

# Delete
db.delete("user:2")
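
Since the example above passes values and query terms as JSON-encoded strings, a pair of small wrappers can keep the encoding in one place. This is a sketch under assumptions: it relies only on the set, get, and find calls shown above, and it assumes that get returns a JSON string for an existing key and None for a missing one, which the README does not state. The helper names are hypothetical.

```python
import json


def put_doc(db, key, doc):
    """Encode a Python object and store it under key."""
    db.set(key, json.dumps(doc))


def get_doc(db, key):
    """Fetch and decode a document; None if the key is absent (assumed behavior)."""
    raw = db.get(key)
    return None if raw is None else json.loads(raw)


def find_keys(db, field, value):
    """Query an indexed field, encoding the value the way find() expects."""
    return db.find(field, json.dumps(value))
```

With these in place, find_keys(db, "role", "admin") replaces the hand-quoted '"admin"' argument, which is easy to get wrong.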

🦀 Rust Installation & Usage

FerrumDB is available on crates.io. Add it to your project:

cargo add ferrumdb
cargo add tokio -F full
cargo add serde_json

Or manually add to your Cargo.toml:

[dependencies]
ferrumdb = "0.1.0"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

use ferrumdb::{FerrumDB, Config, Transaction, FsyncPolicy};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Standard open (zero-setup, uses ferrum.db)
    let db = FerrumDB::open_default().await?;

    // Store documents
    db.set("user:1".into(), json!({"name": "alice", "role": "admin"})).await?;

    // Secondary index query
    db.create_index("role").await?;
    let admins = db.find("role", &json!("admin")).await;

    // Atomic transaction
    let tx = Transaction::new()
        .set("k1".into(), json!({"tag": "blue"}))
        .set("k2".into(), json!({"tag": "red"}))
        .delete("k1".into());
    db.commit(tx).await?;

    // Encrypted database (AES-256-GCM, random nonce per block)
    let key: [u8; 32] = *b"my_super_secret_key_32_bytes_!!?";
    let db_enc = FerrumDB::open(
        Config::new()
            .with_encryption(key)
            .with_fsync_policy(FsyncPolicy::Periodic(std::time::Duration::from_millis(100)))
    ).await?;

    Ok(())
}
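
The transaction in the example above is described elsewhere in this README as an all-or-nothing batch written as a single log entry. The Python sketch below shows why a single entry gives that property; it is an illustration with a made-up text format (one JSON line per batch), not FerrumDB's binary encoding, and the names encode_batch and apply_log are hypothetical.

```python
import json


def encode_batch(ops):
    """Serialize a whole batch as one newline-terminated log line."""
    return json.dumps({"op": "batch", "ops": ops}) + "\n"


def apply_log(text):
    """Replay batches; a batch without its trailing newline never happened."""
    index = {}
    for line in text.splitlines(keepends=True):
        if not line.endswith("\n"):
            break  # torn final entry: none of its operations are applied
        entry = json.loads(line)
        for op in entry["ops"]:
            if op["op"] == "set":
                index[op["key"]] = op["value"]
            else:
                index.pop(op["key"], None)
    return index


batch = encode_batch([
    {"op": "set", "key": "k1", "value": {"tag": "blue"}},
    {"op": "set", "key": "k2", "value": {"tag": "red"}},
    {"op": "delete", "key": "k1"},
])
state_after_commit = apply_log(batch)
state_after_crash = apply_log(batch[:-10])  # simulate a write cut short
```

Because the reader either sees the whole entry or rejects it, there is no state in which k2 exists but the delete of k1 was lost.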

🖥️ Ferrum Studio

When you run the REPL, Ferrum Studio auto-launches — an embedded web dashboard to browse, query, and inspect your live database, including real-time operation metrics.

cargo run --release
# 🔥 Ferrum Studio → http://localhost:7474

🖥️ CLI REPL

cargo run
cargo run -- --fsync=always   # strongest durability

| Command | Description |
| --- | --- |
| SET <key> <json> | Store a document |
| GET <key> | Retrieve and pretty-print |
| DELETE <key> | Remove a key |
| KEYS | List all keys |
| COUNT | Total number of entries |
| INDEX <field> | Create secondary index on JSON field |
| FIND <field> <value> | Query by indexed field |
| HELP | Show commands + live session metrics |

⚠️ Known Limitations

FerrumDB optimizes for simplicity and embedded use cases. Understand the trade-offs:

| Limitation | Reason | Workaround |
| --- | --- | --- |
| Entire index in RAM | O(1) reads require full HashMap in memory | Best for databases < 1 GB |
| Single-writer only | Append-only log has no cross-process lock protocol | One process per DB file |
| No range queries | Secondary indexes store exact value matches | Use Tantivy for range scans |
| No nested field indexes | Indexes only top-level JSON keys | Flatten documents before storing |
| Blocking compaction | Rewrites the entire log while holding the write lock | Schedule during low-traffic periods |
| No WAL / MVCC | Simpler append-only design | Accept occasional contention |
| No replication | Single-file, embedded design | Handle replication at app level |
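
For the "No nested field indexes" row, the suggested workaround of flattening documents could look like the following Python sketch. The helper flatten and the dotted-key naming scheme are assumptions made for illustration; the README does not say which characters are allowed in indexed field names, so check that before relying on a separator.

```python
def flatten(doc, sep=".", prefix=""):
    """Lift nested objects to top-level keys such as 'address.city'."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty objects; their keys get the parent as prefix.
            flat.update(flatten(value, sep, name))
        else:
            # Scalars, lists, and empty objects are stored as-is.
            flat[name] = value
    return flat


user = {
    "name": "alice",
    "address": {"city": "Oslo", "geo": {"lat": 59.9}},
    "tags": ["admin", "ops"],
}
flat_user = flatten(user)
```

After flattening, a field such as address.city sits at the top level of the stored document, where an exact-match index can see it. Lists are left untouched, so values inside arrays still cannot be indexed this way.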

Best for: local-first apps, desktop tools, embedded caching, session/config stores, write-heavy workloads.

Not for: large datasets (> 1 GB), complex queries (JOINs, aggregations), multi-writer or distributed scenarios.


Environment Config

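Set FERRUMDB_FSYNC before starting your application. In a POSIX shell, use export:

export FERRUMDB_FSYNC=always        # sync every write (safest)
export FERRUMDB_FSYNC=never         # never sync (fastest)
export FERRUMDB_FSYNC=periodic:200  # sync every 200ms

In Windows cmd, use set FERRUMDB_FSYNC=always with no trailing comment, because cmd treats everything after the = as part of the value.

Then open the database with the policy taken from the environment:

let db = FerrumDB::open_from_env().await?;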
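
The FERRUMDB_FSYNC values shown above line up with the Always / Periodic(ms) / Never policies. A parser for that syntax might look like the Python sketch below. It is a guess at reasonable behavior, not a description of what open_from_env actually does: case-insensitive matching, rejection of a zero interval, and the fallback default are all assumptions, as are the function names.

```python
import os


def parse_fsync_policy(raw):
    """Return ("always",), ("never",) or ("periodic", millis)."""
    text = raw.strip().lower()
    if text in ("always", "never"):
        return (text,)
    kind, sep, millis = text.partition(":")
    if (kind == "periodic" and sep and millis.isascii()
            and millis.isdigit() and int(millis) > 0):
        return ("periodic", int(millis))
    raise ValueError(f"unrecognized fsync policy: {raw!r}")


def policy_from_env(default="always"):
    """Read FERRUMDB_FSYNC; the default used when it is unset is hypothetical."""
    return parse_fsync_policy(os.environ.get("FERRUMDB_FSYNC", default))
```

Failing loudly on an unrecognized value is a deliberate choice in this sketch: silently falling back to a weaker durability setting because of a typo would be hard to notice.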

📝 License

MIT — see LICENSE for details.

Built with 🦀 by Muhammad Usman
