Skip to main content

A lightweight educational storage engine with mmap, WAL, and a custom B+ Tree index.

Project description

Markdown# Custom Log-Structured Page Storage Engine with B+ Tree Indexing ๐Ÿš€

License: MIT Python 3.8+

A high-performance, crash-resilient key-value storage engine implemented from scratch in Python. This project is built to explore and demonstrate the fundamental systems engineering concepts behind modern transactional databases like SQLite, RocksDB, and MongoDB.

๐Ÿ—๏ธ System Architecture

The engine is built around an append-only, page-allocated storage model optimized for low-latency hardware interaction, backed by a dynamic in-memory index.

              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚   Client Application   โ”‚
              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                          โ”‚  put() / get() / get_range()
                          โ–ผ
              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚ Concurrency Lock Layer โ”‚  <-- Thread-safe multi-user protection
              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                          โ”‚
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
        โ–ผ                                   โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”             โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  LRU Memory Cache   โ”‚             โ”‚   Write-Ahead Log   โ”‚
โ”‚   (Fast RAM Hits)   โ”‚             โ”‚  (wal.log Durability)โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚                                   โ”‚
โ”‚ Cache Miss                        โ”‚ Write-Ahead Protocol
โ–ผ                                   โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”             โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Custom B+ Tree Map  โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€>โ”‚ Memory-Mapped Pages โ”‚
โ”‚ (Sorted RAM Index)  โ”‚             โ”‚   (4KB data.db)     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
๐Ÿ› ๏ธ Core Engineering FeaturesPage-Based Storage & mmap: The database splits files into fixed 4KB blocks matching physical SSD block sectors. Using Python's mmap module, the database file is mapped directly into virtual memory space, bypassing traditional read/write user-space context switches for direct memory slicing.Custom B+ Tree Indexing: Replaces standard hash-maps with a self-balancing B+ Tree structure. This enables point lookups while introducing native support for sorted alphabetic range queries.Write-Ahead Logging (WAL): Provides atomicity and durability. Mutations are safely flushed to wal.log prior to updating the virtual memory map, guaranteeing zero data loss or database corruption during simulated crashes.Tombstone Deletions & Compaction Engine: Deletions append a 1-byte tombstone marker to disk to prevent slow midpoint file rewrites. A background compaction routine physically sweeps the pages, dropping stale records and dead data to reclaim storage footprints.Thread-Safe Concurrency: Implements fine-grained threading.Lock protection, maintaining complete data integrity across overlapping multi-threaded reader and writer workers.๐Ÿ“ˆ Performance & Benchmark MetricsMeasurements collected sequentially under local hardware workloads (Intel Core Platform / NVMe SSD Platform):Metric OperationThroughput SpeedMechanismSequential Writes~4,900 ops/secLog-Structured Page Appends + WALPoint Reads~220,000+ ops/secVirtual Memory Slicing + LRU CacheRange Queries~14,000 scans/secLinked B+ Tree Leaf Pointer Traversal๐Ÿ”ฌ Architectural Design Decisions & Trade-Offs1. Hash Index vs. B+ Tree IndexApproach A (Hash Map): Offers pure $O(1)$ lookup performance, but makes range sorting mathematically impossible without pulling the whole database into RAM.Approach B (B+ Tree - Selected): Increases lookup complexity slightly to $O(\log n)$, but links leaf nodes together, transforming multi-key range scanning into a highly efficient sequential stroll.2. Append-Only vs. In-Place UpdatesIn-Place Updates: Reduces immediate file footprint sizes, but requires expensive search-and-seek hard disk operations that stall execution threads.Append-Only Logging (Selected): Prioritizes low-latency throughput by treating writes as fast sequential appends, relying on an isolated Compaction cycle to sweep out space bloating during off-peak hours.๐Ÿ’พ Installation & UsageBashpip install custom_db_engine
Basic CRUD OperationsPythonfrom engine import SimpleStorageEngine

# Initialize the storage engine instance
db = SimpleStorageEngine(db_filename="data.db", wal_filename="wal.log")

# Insert data
db.put("user_101", "Alice")

# Low-latency retrieval
record = db.get("user_101")
print(f"Retrieved: {record}") # Output: Alice

# Alphabetic range tracking scan
db.put("user_102", "Bob")
db.put("user_103", "Charlie")
results = db.get_range("user_101", "user_103")
print(results) 

db.close()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

custom_db_engine_keyur-1.0.0.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

custom_db_engine_keyur-1.0.0-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

File details

Details for the file custom_db_engine_keyur-1.0.0.tar.gz.

File metadata

  • Download URL: custom_db_engine_keyur-1.0.0.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for custom_db_engine_keyur-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6b4e6d0b11d39a26a97f9c434f8528f04f41f75596624bbc1d8b53de6faedd97
MD5 0c2afcedba8aded3292be7b2294d3ce1
BLAKE2b-256 4615e2b4d35b32ea9d7c3191bd003e688ad0a5a78357d9ad45e1e2fa23c4bfbb

See more details on using hashes here.

File details

Details for the file custom_db_engine_keyur-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for custom_db_engine_keyur-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0039b288ef9097035d9b99d119edf7c3957d6bac5f2a35eee49da2c8b870c777
MD5 9e05c5fbd49eba789986547ffc56d34f
BLAKE2b-256 63d2f32fa6f6b0013b437e6f0a4cb82c2601eade5cb153b793c52bced174999d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page