A lightweight educational storage engine with mmap, WAL, and a custom B+ Tree index.
Project description
Markdown# Custom Log-Structured Page Storage Engine with B+ Tree Indexing ๐
A high-performance, crash-resilient key-value storage engine implemented from scratch in Python. This project is built to explore and demonstrate the fundamental systems engineering concepts behind modern transactional databases like SQLite, RocksDB, and MongoDB.
๐๏ธ System Architecture
The engine is built around an append-only, page-allocated storage model optimized for low-latency hardware interaction, backed by a dynamic in-memory index.
โโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Client Application โ
โโโโโโโโโโโโโฌโโโโโโโโโโโโโ
โ put() / get() / get_range()
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Concurrency Lock Layer โ <-- Thread-safe multi-user protection
โโโโโโโโโโโโโฌโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโ
โผ โผ
โโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ
โ LRU Memory Cache โ โ Write-Ahead Log โ
โ (Fast RAM Hits) โ โ (wal.log Durability)โ
โโโโโโโโโโโโฌโโโโโโโโโโโ โโโโโโโโโโโโฌโโโโโโโโโโโ
โ โ
โ Cache Miss โ Write-Ahead Protocol
โผ โผ
โโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ
โ Custom B+ Tree Map โโโโโโโโโโโโโ>โ Memory-Mapped Pages โ
โ (Sorted RAM Index) โ โ (4KB data.db) โ
โโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ
๐ ๏ธ Core Engineering FeaturesPage-Based Storage & mmap: The database splits files into fixed 4KB blocks matching physical SSD block sectors. Using Python's mmap module, the database file is mapped directly into virtual memory space, bypassing traditional read/write user-space context switches for direct memory slicing.Custom B+ Tree Indexing: Replaces standard hash-maps with a self-balancing B+ Tree structure. This enables point lookups while introducing native support for sorted alphabetic range queries.Write-Ahead Logging (WAL): Provides atomicity and durability. Mutations are safely flushed to wal.log prior to updating the virtual memory map, guaranteeing zero data loss or database corruption during simulated crashes.Tombstone Deletions & Compaction Engine: Deletions append a 1-byte tombstone marker to disk to prevent slow midpoint file rewrites. A background compaction routine physically sweeps the pages, dropping stale records and dead data to reclaim storage footprints.Thread-Safe Concurrency: Implements fine-grained threading.Lock protection, maintaining complete data integrity across overlapping multi-threaded reader and writer workers.๐ Performance & Benchmark MetricsMeasurements collected sequentially under local hardware workloads (Intel Core Platform / NVMe SSD Platform):Metric OperationThroughput SpeedMechanismSequential Writes~4,900 ops/secLog-Structured Page Appends + WALPoint Reads~220,000+ ops/secVirtual Memory Slicing + LRU CacheRange Queries~14,000 scans/secLinked B+ Tree Leaf Pointer Traversal๐ฌ Architectural Design Decisions & Trade-Offs1. Hash Index vs. B+ Tree IndexApproach A (Hash Map): Offers pure $O(1)$ lookup performance, but makes range sorting mathematically impossible without pulling the whole database into RAM.Approach B (B+ Tree - Selected): Increases lookup complexity slightly to $O(\log n)$, but links leaf nodes together, transforming multi-key range scanning into a highly efficient sequential stroll.2. Append-Only vs. In-Place UpdatesIn-Place Updates: Reduces immediate file footprint sizes, but requires expensive search-and-seek hard disk operations that stall execution threads.Append-Only Logging (Selected): Prioritizes low-latency throughput by treating writes as fast sequential appends, relying on an isolated Compaction cycle to sweep out space bloating during off-peak hours.๐พ Installation & UsageBashpip install custom_db_engine
Basic CRUD OperationsPythonfrom engine import SimpleStorageEngine
# Initialize the storage engine instance
db = SimpleStorageEngine(db_filename="data.db", wal_filename="wal.log")
# Insert data
db.put("user_101", "Alice")
# Low-latency retrieval
record = db.get("user_101")
print(f"Retrieved: {record}") # Output: Alice
# Alphabetic range tracking scan
db.put("user_102", "Bob")
db.put("user_103", "Charlie")
results = db.get_range("user_101", "user_103")
print(results)
db.close()
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file custom_db_engine_keyur-1.0.0.tar.gz.
File metadata
- Download URL: custom_db_engine_keyur-1.0.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6b4e6d0b11d39a26a97f9c434f8528f04f41f75596624bbc1d8b53de6faedd97
|
|
| MD5 |
0c2afcedba8aded3292be7b2294d3ce1
|
|
| BLAKE2b-256 |
4615e2b4d35b32ea9d7c3191bd003e688ad0a5a78357d9ad45e1e2fa23c4bfbb
|
File details
Details for the file custom_db_engine_keyur-1.0.0-py3-none-any.whl.
File metadata
- Download URL: custom_db_engine_keyur-1.0.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0039b288ef9097035d9b99d119edf7c3957d6bac5f2a35eee49da2c8b870c777
|
|
| MD5 |
9e05c5fbd49eba789986547ffc56d34f
|
|
| BLAKE2b-256 |
63d2f32fa6f6b0013b437e6f0a4cb82c2601eade5cb153b793c52bced174999d
|