
FerrumDB — Zero-setup embedded document database for Python. No server. No config. Just open a file and go.

Project description

⚡ FerrumDB

A high-performance, embedded document database written from scratch in Rust.
No server. No config files. No migrations. Open a file and go.


What is FerrumDB?

FerrumDB is an embedded key-value database engine built in Rust, designed for applications that need fast local persistence without the overhead of a server process. It is inspired by Bitcask and implements a custom binary log format, in-memory indexing, AES-256-GCM encryption at rest, atomic transactions, and a live web dashboard — all in ~1,000 lines of safe, async Rust.

It also ships Python bindings via PyO3, making it accessible from Python with a single pip install.


🌟 Features

| Feature | Detail |
|---|---|
| O(1) reads & writes | Append-only log + in-memory HashMap index, rebuilt on startup |
| 📄 Native JSON documents | Store any structured data; values are serde_json::Value |
| 🔍 Secondary indexing | O(1) field lookups via create_index(), maintained live on writes |
| 🔐 AES-256-GCM encryption | Per-block encryption with random nonces; data is protected at rest |
| ⚛️ Atomic transactions | All-or-nothing batches written as a single log entry |
| ⏱️ Configurable fsync policy | Always / Periodic(ms) / Never: tune durability vs. throughput |
| 🖥️ Ferrum Studio | Built-in web dashboard (Axum) at localhost:7474 |
| 🐍 Python bindings | pip install ferrumdb; no Rust toolchain needed where a prebuilt wheel exists (currently CPython 3.13 on Windows x86-64), otherwise pip builds from the source distribution, which requires Rust |
| 🛡️ Crash resilience | Log compaction via atomic rename(); incomplete records are skipped on replay |
| 📊 Observability | Lock-free atomic metrics: ops/sec, uptime, GET/SET/DELETE counts |
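Secondary indexes that are "maintained live on writes" can be pictured as a second map from a field's value to the set of keys holding that value, updated on every set and delete. The toy Python model below is a sketch under that assumption, not FerrumDB's Rust implementation; IndexedStore and its methods are hypothetical names, and it only handles exact matches on top-level fields with hashable values.

```python
from collections import defaultdict


class IndexedStore:
    """Toy document store with exact-match secondary indexes (illustrative only)."""

    def __init__(self):
        self.docs = {}      # primary index: key -> document (dict)
        self.indexes = {}   # field -> {value -> set of keys}; values must be hashable

    def create_index(self, field):
        index = defaultdict(set)
        for key, doc in self.docs.items():  # backfill documents stored before the index
            if field in doc:
                index[doc[field]].add(key)
        self.indexes[field] = index

    def _unindex(self, key):
        """Remove `key` from every index entry pointing at its current document."""
        old = self.docs.get(key)
        if old is None:
            return
        for field, index in self.indexes.items():
            if field in old:
                keys = index[old[field]]
                keys.discard(key)
                if not keys:
                    del index[old[field]]  # drop empty buckets

    def set(self, key, doc):
        self._unindex(key)  # an overwrite must not leave stale index entries
        self.docs[key] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(key)

    def delete(self, key):
        self._unindex(key)
        self.docs.pop(key, None)

    def find(self, field, value):
        # Average O(1) dict lookup; sorted only to make the output deterministic.
        return sorted(self.indexes[field].get(value, ()))


store = IndexedStore()
store.set("user:1", {"name": "alice", "role": "admin"})
store.set("user:2", {"name": "bob", "role": "user"})
store.create_index("role")                               # backfills existing documents
store.set("user:2", {"name": "bob", "role": "admin"})   # overwrite moves user:2
print(store.find("role", "admin"))  # ['user:1', 'user:2']
store.delete("user:1")
print(store.find("role", "admin"))  # ['user:2']
```

The extra work per write (one index update per indexed field) is the cost of keeping lookups constant-time.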

🏗️ Architecture

FerrumDB is built from the ground up, without an existing storage library. Every layer is custom:

┌─────────────────────────────────────────┐
│               FerrumDB API              │  ← High-level Rust & Python interface
├─────────────────────────────────────────┤
│             StorageEngine               │  ← Core engine: index + log management
│  ┌─────────────────┐  ┌──────────────┐  │
│  │ In-Memory Index │  │ Secondary    │  │
│  │  HashMap<K,V>   │  │ Indexes      │  │
│  │  RwLock async   │  │ HashMap<F,V> │  │
│  └────────┬────────┘  └──────────────┘  │
│           │ append / reads              │
│  ┌────────▼────────────────────────┐    │
│  │   Append-Only Log (AOF)         │    │  ← Bitcask-inspired binary format
│  │   [len: u64][JSON bytes]...     │    │     length-prefixed, sequential
│  └────────┬────────────────────────┘    │
├───────────┼─────────────────────────────┤
│  ┌────────▼────────────────────────┐    │
│  │  AsyncFileSystem trait          │    │  ← Pluggable I/O abstraction
│  │  ┌──────────┐  ┌─────────────┐  │    │
│  │  │   Disk   │  │  Encrypted  │  │    │  ← Decorator pattern
│  │  │  (tokio) │  │  (AES-GCM)  │  │    │     random nonce per block
│  │  └──────────┘  └─────────────┘  │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
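The log format in the diagram, [len: u64][JSON bytes], can be sketched in Python as below. This is an illustration, not FerrumDB's on-disk format: the little-endian byte order and the {"op", "key", "value"} record shape are assumptions, and append_record and replay are hypothetical helpers. Replay shows how the in-memory index is rebuilt on startup and how an incomplete trailing record, such as one cut short by a crash, is skipped.

```python
import io
import json
import struct

# Assumed framing: 8-byte unsigned length (little-endian here) + that many JSON bytes.
HEADER = struct.Struct("<Q")


def append_record(log, record):
    """Append one length-prefixed JSON record to a binary file-like object."""
    payload = json.dumps(record).encode("utf-8")
    log.write(HEADER.pack(len(payload)) + payload)


def replay(log):
    """Rebuild the in-memory index by scanning the log from the start."""
    index = {}
    while True:
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # clean EOF, or a header torn by a crash
        (length,) = HEADER.unpack(header)
        payload = log.read(length)
        if len(payload) < length:
            break  # incomplete trailing record: skip it
        record = json.loads(payload)
        if record["op"] == "set":
            index[record["key"]] = record["value"]
        elif record["op"] == "delete":
            index.pop(record["key"], None)
    return index


buf = io.BytesIO()
append_record(buf, {"op": "set", "key": "user:1", "value": {"name": "alice"}})
append_record(buf, {"op": "set", "key": "user:2", "value": {"name": "bob"}})
append_record(buf, {"op": "delete", "key": "user:2"})
append_record(buf, {"op": "set", "key": "user:3", "value": {"name": "carol"}})
crashed = io.BytesIO(buf.getvalue()[:-5])  # simulate a crash mid-way through the last write
print(replay(crashed))  # {'user:1': {'name': 'alice'}}
```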

Key design decisions:

  • Bitcask AOF: Writes are append-only (fast, sequential I/O). Reads are served from the in-memory index, which is the source of truth. On startup, the engine replays the log to rebuild that index, so recovery is deterministic and an incomplete trailing record left by a crash is skipped rather than applied.
  • Pluggable AsyncFileSystem trait: The I/O layer is fully abstracted. DiskFileSystem and EncryptedFileSystem implement the same trait, and encryption is layered on top of the disk implementation via the decorator pattern. The storage engine can therefore be tested against any implementation of the trait, including one that never touches disk.
  • AES-256-GCM per block: Each binary record is encrypted individually with a cryptographically random 12-byte nonce, which is stored alongside the ciphertext. GCM's authentication tag makes any modification of an encrypted record detectable when it is read.
  • Tokio async throughout: Reads use RwLock (many concurrent readers), writes serialize via write lock. Metrics use AtomicU64 — no lock contention on the hot path.
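  • Log compaction: A background compact() rewrites only live (non-expired, non-deleted) records to a temporary file, then swaps it in with rename(). It holds the write lock while it runs, so writes wait until it finishes. On POSIX filesystems rename() replaces the old log atomically, so a crash leaves either the complete old log or the complete new one; full durability additionally requires fsyncing the new file and its directory.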
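The compaction step (write live records to a temporary file, then rename it over the log) looks roughly like this in Python. It is a standalone sketch, not FerrumDB's code: the little-endian u64 length prefix and the {"op": "set", ...} record shape are assumptions, compact is a hypothetical helper, and os.replace is Python's rename-with-overwrite.

```python
import json
import os
import struct
import tempfile

# Assumed framing per the diagram: u64 length (little-endian here) + JSON bytes.
HEADER = struct.Struct("<Q")


def compact(log_path, live):
    """Rewrite the log so it holds exactly one 'set' record per live key.

    `live` is the current in-memory index (key -> value). Deleted keys are
    absent from it, so their old records simply disappear from the new log.
    """
    # Create the temp file in the same directory (same filesystem) so the
    # final step is a rename rather than a cross-device copy.
    directory = os.path.dirname(os.path.abspath(log_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".compact")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for key, value in live.items():
                record = {"op": "set", "key": key, "value": value}
                payload = json.dumps(record).encode("utf-8")
                tmp.write(HEADER.pack(len(payload)) + payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # the new log is on disk before it replaces the old one
        # os.replace is documented as atomic on POSIX. A fully durable POSIX
        # version would also fsync the containing directory afterwards.
        os.replace(tmp_path, log_path)
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```

If the process dies before os.replace runs, the original log is untouched and only a stray temporary file remains.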

⚙️ Technical Stack

| Component | Technology |
|---|---|
| Language | Rust (2021 edition) |
| Async runtime | Tokio |
| Serialization | serde + serde_json |
| Encryption | aes-gcm (AES-256-GCM) |
| Web dashboard | Axum |
| Python bindings | PyO3 (via maturin) |
| Benchmarking | Criterion |
| Testing | tokio::test + tempfile |

📊 Performance

Benchmarked with Criterion on an append-only log with FsyncPolicy::Never (max throughput):

| Operation | Time |
|---|---|
| Single SET | ~1–3 µs |
| Single GET (in-memory) | < 1 µs |
| 1,000 sequential SETs | ~2–5 ms |
| 100 concurrent SETs (Tokio tasks) | ~3–8 ms |
| Secondary index query (100 docs) | < 1 µs |

Run benchmarks yourself: cargo bench
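Dividing the batch timing by the batch size gives per-operation figures that can be compared with the single-SET row. The snippet only does that arithmetic for the "1,000 sequential SETs" row; per_op is a throwaway helper, not part of FerrumDB.

```python
def per_op(total_ms, ops):
    """Return (microseconds per operation, operations per second) for a batch timing."""
    us = total_ms * 1000 / ops
    return us, 1_000_000 / us


for total_ms in (2, 5):
    us, rate = per_op(total_ms, 1_000)
    print(f"{total_ms} ms / 1,000 SETs = {us:.0f} µs per SET ≈ {rate:,.0f} SETs/s")
# 2 ms / 1,000 SETs = 2 µs per SET ≈ 500,000 SETs/s
# 5 ms / 1,000 SETs = 5 µs per SET ≈ 200,000 SETs/s
```

That is 2 to 5 µs per SET under FsyncPolicy::Never, slightly above the single-SET figure.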


🐍 Python Usage

pip install ferrumdb

from ferrumdb import FerrumDB

# Zero-setup: creates myapp.db if it doesn't exist
db = FerrumDB.open("myapp.db")

# Store documents; values are passed as JSON-encoded strings
db.set("user:1", '{"name": "alice", "role": "admin", "score": 99}')
db.set("user:2", '{"name": "bob",   "role": "user",  "score": 45}')

# Read back
print(db.get("user:1"))       # {"name": "alice", "role": "admin", "score": 99}
print(db.count())             # 2
print(db.keys())              # ["user:1", "user:2"] (order may vary)

# Secondary indexing — O(1) field lookups
db.create_index("role")
admins = db.find("role", '"admin"')   # => ["user:1"]

# Delete
db.delete("user:2")
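Because the Python binding takes JSON text in the example above (including the quoted '"admin"' passed to find), a thin wrapper can handle the encoding. JsonDocs is a hypothetical helper, not part of the ferrumdb package, and it assumes get() returns a JSON string, or None for a missing key.

```python
import json


class JsonDocs:
    """Hypothetical wrapper: Python objects in, Python objects out.

    `db` is any object with set(key, json_text), get(key) and
    find(field, json_value) methods, such as the handle from FerrumDB.open().
    """

    def __init__(self, db):
        self.db = db

    def put(self, key, doc):
        self.db.set(key, json.dumps(doc))

    def fetch(self, key):
        raw = self.db.get(key)
        # Assumption: get() returns JSON text, or None when the key is absent.
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        # Index queries take the JSON encoding of the value: "admin" -> '"admin"'.
        return self.db.find(field, json.dumps(value))
```

For example: docs = JsonDocs(FerrumDB.open("myapp.db")), then docs.put("user:3", {"name": "carol", "role": "admin"}) and docs.find("role", "admin").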

🦀 Rust Usage

# Cargo.toml
[dependencies]
ferrumdb = "0.1.0"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

// src/main.rs
use ferrumdb::{FerrumDB, Config, Transaction, FsyncPolicy};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Standard open (zero-setup, uses ferrum.db)
    let db = FerrumDB::open_default().await?;

    // Store documents
    db.set("user:1".into(), json!({"name": "alice", "role": "admin"})).await?;

    // Secondary index query
    db.create_index("role").await?;
    let admins = db.find("role", &json!("admin")).await;

    // Atomic transaction
    let tx = Transaction::new()
        .set("k1".into(), json!({"tag": "blue"}))
        .set("k2".into(), json!({"tag": "red"}))
        .delete("k1".into());
    db.commit(tx).await?;

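    // Encrypted database (AES-256-GCM, random nonce per block).
    // The hard-coded key is for illustration only; load real keys from a secret store.
    let key: [u8; 32] = *b"my_super_secret_key_32_bytes_!!?";
    let db_enc = FerrumDB::open(
        Config::new()
            .with_encryption(key)
            .with_fsync_policy(FsyncPolicy::Periodic(std::time::Duration::from_millis(100)))
    ).await?;
    db_enc.set("secret:1".into(), json!({"note": "stored encrypted"})).await?;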

    Ok(())
}
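The feature list says a transaction is written as a single log entry. The Python sketch below models why that gives all-or-nothing behavior for the k1/k2 transaction above: the batch is decoded as a whole and applied to a staged copy, so a truncated entry changes nothing. encode_batch, apply_batch, and the {"op": "batch", "ops": [...]} shape are hypothetical, and here a JSON decode failure stands in for the length-prefix check a real log would use.

```python
import json


def encode_batch(ops):
    """Serialize a whole transaction as one log payload (hypothetical shape)."""
    return json.dumps({"op": "batch", "ops": ops}).encode("utf-8")


def apply_batch(index, payload):
    """Apply every op in order, or leave `index` untouched; return True on success."""
    try:
        batch = json.loads(payload)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return False    # torn or corrupt entry: none of its operations take effect
    staged = dict(index)  # work on a copy so a bad op cannot half-apply
    for op in batch["ops"]:
        if op["op"] == "set":
            staged[op["key"]] = op["value"]
        elif op["op"] == "delete":
            staged.pop(op["key"], None)
        else:
            return False
    index.clear()
    index.update(staged)  # publish all changes at once
    return True


payload = encode_batch([
    {"op": "set", "key": "k1", "value": {"tag": "blue"}},
    {"op": "set", "key": "k2", "value": {"tag": "red"}},
    {"op": "delete", "key": "k1"},
])
index = {}
apply_batch(index, payload[:-4])  # crash mid-write: returns False, index stays {}
apply_batch(index, payload)       # returns True
print(index)                      # {'k2': {'tag': 'red'}}
```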

🖥️ Ferrum Studio

When you run the REPL, Ferrum Studio auto-launches — an embedded web dashboard to browse, query, and inspect your live database, including real-time operation metrics.

cargo run --release
# 🔥 Ferrum Studio → http://localhost:7474

🖥️ CLI REPL

cargo run
cargo run -- --fsync=always   # strongest durability
| Command | Description |
|---|---|
| SET <key> <json> | Store a document |
| GET <key> | Retrieve and pretty-print |
| DELETE <key> | Remove a key |
| KEYS | List all keys |
| COUNT | Total number of entries |
| INDEX <field> | Create a secondary index on a JSON field |
| FIND <field> <value> | Query by indexed field |
| HELP | Show commands and live session metrics |

⚠️ Known Limitations

FerrumDB optimizes for simplicity and embedded use cases. Understand the trade-offs:

| Limitation | Reason | Workaround |
|---|---|---|
| Entire index in RAM | O(1) reads require the full HashMap in memory | Best for databases under 1 GB |
| Single-writer only | The append-only log has no cross-process locking protocol | Use one process per DB file |
| No range queries | Secondary indexes store exact value matches | Use Tantivy for range scans |
| No nested field indexes | Only top-level JSON keys are indexed | Flatten documents before storing |
| Blocking compaction | Rewrites the entire log while holding the write lock | Schedule compaction during low-traffic periods |
| No WAL / MVCC | Simpler append-only design | Accept occasional contention |
| No replication | Single-file, embedded design | Handle replication at the application level |
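The "flatten documents before storing" workaround can be a small recursive helper that turns nested objects into dotted top-level keys, which can then be indexed like any other top-level field (for example with db.create_index("address.city")). flatten and the dot separator are a hypothetical convention, not something FerrumDB defines; watch for collisions if documents already contain literal dotted keys.

```python
def flatten(doc, prefix="", sep="."):
    """Flatten nested dicts into top-level keys such as 'address.city'.

    Only dict nesting is expanded; lists, scalars and empty dicts are kept as-is.
    """
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


doc = {"name": "alice", "address": {"city": "Paris", "geo": {"lat": 48.85}}, "tags": ["a", "b"]}
print(flatten(doc))
# {'name': 'alice', 'address.city': 'Paris', 'address.geo.lat': 48.85, 'tags': ['a', 'b']}
```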

Best for: local-first apps, desktop tools, embedded caching, session/config stores, write-heavy workloads.

Not for: large datasets (> 1 GB), complex queries (JOINs, aggregations), multi-writer or distributed scenarios.


Environment Config

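rem Windows cmd.exe: pick ONE. cmd treats everything after "=" as the value,
rem so comments must go on their own lines.
rem Sync every write (safest):
set FERRUMDB_FSYNC=always
rem Never sync (fastest):
set FERRUMDB_FSYNC=never
rem Sync every 200 ms:
set FERRUMDB_FSYNC=periodic:200

On Linux/macOS, use export FERRUMDB_FSYNC=always (and so on) instead. Then open the database from Rust:

let db = FerrumDB::open_from_env().await?;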
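The three FERRUMDB_FSYNC forms above map onto the Always / Periodic(ms) / Never policies. The Python parser below is an illustrative sketch of that mapping, not FerrumDB's code: FsyncPolicy here is a hypothetical stand-in for the Rust enum, and fsync_policy_from_env returns None when the variable is unset because the engine's default is not documented here.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FsyncPolicy:
    """Hypothetical stand-in for the Rust FsyncPolicy enum."""
    mode: str                          # "always", "never", or "periodic"
    interval_ms: Optional[int] = None  # only set for "periodic"


def parse_fsync(value: str) -> FsyncPolicy:
    """Parse the three documented forms: always, never, periodic:<ms>."""
    value = value.strip()
    if value in ("always", "never"):
        return FsyncPolicy(value)
    if value.startswith("periodic:"):
        ms = int(value[len("periodic:"):])  # raises ValueError for "", "x", ...
        if ms <= 0:
            raise ValueError("periodic interval must be a positive number of ms")
        return FsyncPolicy("periodic", ms)
    raise ValueError(f"unrecognized FERRUMDB_FSYNC value: {value!r}")


def fsync_policy_from_env() -> Optional[FsyncPolicy]:
    raw = os.environ.get("FERRUMDB_FSYNC")
    return None if raw is None else parse_fsync(raw)
```

Rejecting unknown values outright, rather than silently falling back, makes a typo such as "alway" fail loudly instead of quietly changing durability.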

📝 License

MIT — see LICENSE for details.

Built with 🦀 by Muhammad Usman



Download files


Source Distribution

ferrumdb-0.1.0.tar.gz (54.2 kB)

Uploaded Source

Built Distribution


ferrumdb-0.1.0-cp313-cp313-win_amd64.whl (503.6 kB)

Uploaded CPython 3.13, Windows x86-64

File details

Details for the file ferrumdb-0.1.0.tar.gz.

File metadata

  • Download URL: ferrumdb-0.1.0.tar.gz
  • Upload date:
  • Size: 54.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for ferrumdb-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 110c43f777cb8eb9510f9dca1f4f6b6efae9d7e6e41dba2422e296debc1baae3 |
| MD5 | ff6a8acde11f4c2faad5fac6c9dce79b |
| BLAKE2b-256 | b4fbea3afede33a20f68fca4307d0040d1eac194afef878aa3123cd26116ef77 |


File details

Details for the file ferrumdb-0.1.0-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for ferrumdb-0.1.0-cp313-cp313-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d38dad0b93afab6941e7a79547053296c969c44d5f5141a905224dd859273407 |
| MD5 | b8c3019282a1231b79f0c8bcd781649d |
| BLAKE2b-256 | 830b77cb42b74d5f478aec2644b7e3f88c36144fdb50aa49a9fb7ae650f67582 |

