booklet

A python key-value file database

Introduction

Booklet is a pure Python key-value file database. It supports multiple serializers for both the keys and values. Booklet implements the MutableMapping class API, which is the same as Python’s dictionary, in addition to some dbm methods (i.e. sync and prune). It is thread-safe on reads and writes (using thread locks) and multiprocessing-safe (using file locks).

When an error occurs (e.g. trying to access a key that doesn’t exist), booklet will properly close the file and remove the file locks. Changes are not synced in this case, so the user will lose any changes that had not yet been synced. Some circumstances can still prevent the file from being closed properly, so care still needs to be taken.

Installation

Install via pip:

pip install booklet

Serialization

Both the keys and values stored in Booklet must be bytes when written to disk. This is the default when “open” is called. Booklet allows various serializers to be used for converting input keys and values to bytes. There are many built-in serializers; check the booklet.available_serializers list for what’s available. Some serializers require additional packages to be installed (e.g. orjson, zstd, etc). If you want to serialize to json, then it is highly recommended to use orjson or msgpack, as they are substantially faster than Python’s standard json module.

If built-in serializers are assigned at initial file creation, they are saved in the file and reused for future reading and writing on the same file (i.e. they don’t need to be passed after the first time). Setting a serializer to None will not do any serializing, and the input must be bytes.

The user can also pass custom serializers to the key_serializer and value_serializer parameters. These must have “dumps” and “loads” static methods. This allows the user to chain a serializer and a compressor together if desired. Custom serializers must be passed for both writing and reading, as they are not stored in the booklet file.

import booklet

print(booklet.available_serializers)
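As a rough illustration of the “dumps”/“loads” contract described above, the sketch below chains a serializer and a compressor using only the standard library. The class name JsonZlib is hypothetical and not part of booklet; it only shows the shape a custom serializer is expected to have.

```python
import json
import zlib


class JsonZlib:
    """Hypothetical custom serializer: JSON encoding followed by zlib compression."""

    @staticmethod
    def dumps(obj) -> bytes:
        # Serialize to JSON text, encode to bytes, then compress.
        return zlib.compress(json.dumps(obj).encode("utf-8"))

    @staticmethod
    def loads(data: bytes):
        # Reverse the steps: decompress, decode, then parse the JSON text.
        return json.loads(zlib.decompress(data).decode("utf-8"))


payload = JsonZlib.dumps(["one", 2, "three", 4])
restored = JsonZlib.loads(payload)
```

A class like this would be passed as value_serializer=JsonZlib to booklet.open, both when writing and when reading.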

Usage

The docstrings have a lot of info about the classes and methods. Files should be opened with the booklet.open function. Read the docstrings of the open function for more details.

Write data using the context manager

import booklet

with booklet.open('test.blt', 'n', value_serializer='pickle', key_serializer='str') as db:
  db['test_key'] = ['one', 2, 'three', 4]

Read data using the context manager

with booklet.open('test.blt', 'r') as db:
  test_data = db['test_key']

Notice that you don’t need to pass serializer parameters when reading (and additional writing) when in-built serializers are used. Booklet stores this info on the initial file creation.

In most cases, the user should use Python’s context manager “with” when reading and writing data. This ensures data is properly written and the locks on the file are released. If the context manager is not used, the user must call db.sync() (or db.close()) at the end of a series of writes to ensure the data has been fully written to disk. Only after the writes have been synced can additional reads occur. Make sure you close your file or you’ll run into file deadlocks!

Write data without using the context manager

import booklet

db = booklet.open('test.blt', 'n', value_serializer='pickle', key_serializer='str')

db['test_key'] = ['one', 2, 'three', 4]
db['2nd_test_key'] = ['five', 6, 'seven', 8]

db.sync()  # Normally not necessary if the user closes the file after writing
db.close() # Will also run sync as part of the closing process

Read data without using the context manager

db = booklet.open('test.blt') # 'r' is the default flag

test_data1 = db['test_key']
test_data2 = db['2nd_test_key']

db.close()

Prune deleted items

When a key/value is “deleted”, it’s actually just flagged internally as deleted and the item is ignored on subsequent requests. The same applies to keys that get reassigned. To remove these deleted items from the file completely, the user can run the “prune” method. This should only be performed after a large number of deletes/overwrites, as prune can be computationally intensive. Removing these items from the file gives no performance improvement; it purely regains space.

with booklet.open('test.blt', 'w') as db:
  del db['test_key']
  db.prune()

File metadata

The user can assign overall metadata to the file as a json serializable object (i.e. dict or list). The methods are called set_metadata and get_metadata. The metadata is independent from all of the other key/value pairs assigned in the normal way. The metadata won’t be returned with any other methods. If metadata has not already been assigned, the get_metadata method will return None.

with booklet.open('test.blt', 'w') as db:
  db.set_metadata({'meta_key1': 'This is stored as metadata'})
  meta = db.get_metadata()

Item timestamps

Each assigned item has an associated timestamp. Timestamps are on by default, but can be turned off at file initialization. They are stored and returned as an int of the number of microseconds in POSIX UTC time. There are new methods to set and get the timestamps. It’s quite new…so please test it!

import booklet

file_path = 'test.blt'
key = 'test_key2'
value = ['five', 6, 'seven', 8]

# Assign a value, read its timestamp, then overwrite the timestamp
with booklet.open(file_path, 'w') as f:
    f[key] = value
    ts_old = f.get_timestamp(key)
    ts_new = booklet.utils.make_timestamp_int()
    f.set_timestamp(key, ts_new)

# Reopen read-only and fetch the stored timestamp
with booklet.open(file_path) as f:
    ts_new = f.get_timestamp(key)
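Since the timestamps are described as an int of microseconds in POSIX UTC time, converting to and from datetime objects can be sketched with the standard library alone. The helper names below are hypothetical and not part of booklet, and it is an assumption that booklet’s ints use exactly this convention.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer microseconds since the POSIX epoch."""
    # Integer division of timedeltas avoids float rounding errors.
    return (dt.astimezone(timezone.utc) - _EPOCH) // timedelta(microseconds=1)


def micros_to_datetime(ts: int) -> datetime:
    """Convert integer microseconds since the POSIX epoch to a UTC datetime."""
    return _EPOCH + timedelta(microseconds=ts)


moment = datetime(2024, 1, 1, 12, 0, 0, 250000, tzinfo=timezone.utc)
ts = datetime_to_micros(moment)
round_trip = micros_to_datetime(ts)
```

An int built this way could, under that assumption, be passed to set_timestamp or compared with the result of get_timestamp.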

Auto Reindexing

Booklet now supports (as of version 0.10) automatic reindexing, and consequently the user no longer needs to worry about setting an appropriate n_buckets value. When the load factor (number of keys / number of buckets) exceeds 1.0, booklet will automatically increase the number of buckets and reindex the file to maintain performance. This occurs when the booklet is synced. This ensures that the database remains fast even as it grows beyond the initial n_buckets setting.
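The load factor rule can be illustrated with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the doubling growth policy is an assumption, since the text above does not say by how much the number of buckets is increased.

```python
def load_factor(n_keys: int, n_buckets: int) -> float:
    """Number of keys divided by number of buckets."""
    return n_keys / n_buckets


def needs_reindex(n_keys: int, n_buckets: int, max_load: float = 1.0) -> bool:
    """True when the load factor exceeds the threshold (1.0 in the text)."""
    return load_factor(n_keys, n_buckets) > max_load


def grown_bucket_count(n_keys: int, n_buckets: int, growth: int = 2) -> int:
    """Assumed policy: multiply the bucket count until the load factor is <= 1.0."""
    while needs_reindex(n_keys, n_buckets):
        n_buckets *= growth
    return n_buckets


new_n_buckets = grown_bucket_count(n_keys=5000, n_buckets=1000)
```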

Parallel map

The map method applies a function to items in the booklet using multiple worker processes, writing the results back to the same file or to a separate output booklet. This is useful when you have CPU-intensive transformations to apply to many items.

The user function must be a picklable top-level function (not a lambda or closure) with the signature func(key, value) -> (new_key, new_value) or None to skip an item.

Transform all values in-place

def double_value(key, value):
    return (key, value * 2)

with booklet.open('data.blt', 'w') as db:
    stats = db.map(double_value, n_workers=4)
    # stats == {'processed': N, 'written': N, 'errors': 0}

Write results to a separate output file

def transform(key, value):
    return (key, expensive_computation(value))

with booklet.open('input.blt', 'r') as input_db:
    with booklet.open('output.blt', 'n', value_serializer='pickle', key_serializer='str') as output_db:
        stats = input_db.map(transform, write_db=output_db, n_workers=8)

Process specific keys only

keys_to_process = ['key1', 'key5', 'key10']

with booklet.open('data.blt', 'w') as db:
    stats = db.map(my_func, keys=keys_to_process, n_workers=4)

Skip certain items

def selective_process(key, value):
    if value > threshold:
        return (key, expensive_computation(value))
    return None  # skip this item

with booklet.open('data.blt', 'w') as db:
    stats = db.map(selective_process)
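The callback contract used by the examples above (return a (new_key, new_value) tuple, or None to skip) can be sketched in a single process on a plain dict. The function map_sketch is hypothetical and is not how booklet implements map; in particular, counting failed calls under both 'processed' and 'errors' is an assumption.

```python
def map_sketch(items: dict, func, keys=None) -> dict:
    """Apply func(key, value) to items in place and return summary stats."""
    stats = {"processed": 0, "written": 0, "errors": 0}
    results = {}
    selected = list(items.keys()) if keys is None else list(keys)
    for key in selected:
        stats["processed"] += 1
        try:
            result = func(key, items[key])
        except Exception:
            # Assumption: a failing call is counted and the item is left untouched.
            stats["errors"] += 1
            continue
        if result is None:
            continue  # the callback asked to skip this item
        new_key, new_value = result
        results[new_key] = new_value
        stats["written"] += 1
    items.update(results)  # write results back after iterating
    return stats


def double_large(key, value):
    # Only transform values above 3; skip the rest.
    if value > 3:
        return (key, value * 2)
    return None


data = {"a": 1, "b": 5, "c": 10}
stats = map_sketch(data, double_large)
```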

Custom serializers

import booklet
import orjson

class Orjson:
    # Serializers must expose "dumps" and "loads" as static methods
    @staticmethod
    def dumps(obj):
        return orjson.dumps(obj, option=orjson.OPT_NON_STR_KEYS | orjson.OPT_OMIT_MICROSECONDS | orjson.OPT_SERIALIZE_NUMPY)

    @staticmethod
    def loads(obj):
        return orjson.loads(obj)

with booklet.open('test.blt', 'n', value_serializer=Orjson, key_serializer='str') as db:
    db['test_key'] = ['one', 2, 'three', 4]

The Orjson class is actually already built into the package. You can pass the string ‘orjson’ to either serializer parameter to use the above serializer. This is just an example of a custom serializer.

Here’s another example with compression.

import booklet
import orjson
import zstandard as zstd

class OrjsonZstd:
    # Chain a serializer (orjson) and a compressor (zstandard)
    @staticmethod
    def dumps(obj):
        return zstd.compress(orjson.dumps(obj, option=orjson.OPT_NON_STR_KEYS | orjson.OPT_OMIT_MICROSECONDS | orjson.OPT_SERIALIZE_NUMPY))

    @staticmethod
    def loads(obj):
        return orjson.loads(zstd.decompress(obj))

with booklet.open('test.blt', 'n', value_serializer=OrjsonZstd, key_serializer='str') as db:
    db['big_test'] = list(range(1000000))

# The custom serializer must be passed again when reading
with booklet.open('test.blt', 'r', value_serializer=OrjsonZstd) as db:
    big_test_data = db['big_test']

If you use a custom serializer, then you’ll always need to pass it to booklet.open for additional reading and writing.

The open flag follows the standard dbm options:

Value	Meaning
'r'	Open existing database for reading only (default)
'w'	Open existing database for reading and writing
'c'	Open database for reading and writing, creating it if it doesn’t exist
'n'	Always create a new, empty database, open for reading and writing

Design

VariableValue (default)

There are two groups in a booklet file plus some initial bytes for parameters (the sub index). The sub index is 200 bytes long, but currently only 37 bytes are used. The two groups are the bucket index group and the data block group.

The bucket index group contains the “hash table”. This bucket index contains a fixed number of buckets (n_buckets), and each bucket contains a 6-byte integer giving the position of the first data block associated with that bucket. When the user requests the value for a key, the key is hashed and the hash modulo n_buckets determines which bucket to read. The 6 bytes are read from that bucket and converted to an integer, which tells booklet where the first data block is located in the file.

The data block group contains all of the data blocks, each of which contains the key hash, next data block pos, key length, value length, timestamp (if initialized with timestamps), key, and value (in this order).
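The bucket lookup described above can be sketched as follows. Only the sizes (200-byte sub index, 13-byte key hash, 6-byte bucket) come from the text; the hash function (blake2b), the little-endian byte order, and the bucket index starting immediately after the sub index are assumptions made for illustration, and all function names are hypothetical.

```python
import hashlib

SUB_INDEX_LEN = 200   # from the text
KEY_HASH_LEN = 13     # from the text
BUCKET_LEN = 6        # from the text
BYTE_ORDER = "little" # assumption; the text does not state the byte order


def hash_key(key: bytes) -> bytes:
    # Assumption: blake2b truncated to 13 bytes; the text does not name the hash.
    return hashlib.blake2b(key, digest_size=KEY_HASH_LEN).digest()


def bucket_number(key_hash: bytes, n_buckets: int) -> int:
    # The hash, read as an integer, modulo the number of buckets.
    return int.from_bytes(key_hash, BYTE_ORDER) % n_buckets


def bucket_offset(bucket: int) -> int:
    # Assumption: the bucket index starts right after the sub index.
    return SUB_INDEX_LEN + bucket * BUCKET_LEN


def first_block_pos(file_bytes: bytes, key: bytes, n_buckets: int) -> int:
    offset = bucket_offset(bucket_number(hash_key(key), n_buckets))
    return int.from_bytes(file_bytes[offset:offset + BUCKET_LEN], BYTE_ORDER)


# Build a fake in-memory "file" and store position 4242 in the key's bucket.
n_buckets = 8
fake_file = bytearray(SUB_INDEX_LEN + n_buckets * BUCKET_LEN)
bucket = bucket_number(hash_key(b"test_key"), n_buckets)
offset = bucket_offset(bucket)
fake_file[offset:offset + BUCKET_LEN] = (4242).to_bytes(BUCKET_LEN, BYTE_ORDER)
pos = first_block_pos(bytes(fake_file), b"test_key", n_buckets)
```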

The number of bytes per field in a data block:

Field	Bytes
key hash	13
next data block pos	6
key length	2
value length	4
timestamp	0 (if init without timestamps) or 7
key	variable
value	variable
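To make the byte layout concrete, here is a minimal sketch that packs and unpacks one data block using the field sizes and field order given above. The function names are hypothetical and the little-endian byte order is an assumption; this is not booklet’s own code.

```python
KEY_HASH_LEN = 13
NDBP_LEN = 6
KEY_LEN_LEN = 2
VALUE_LEN_LEN = 4
TIMESTAMP_LEN = 7
BYTE_ORDER = "little"  # assumption; the text does not state the byte order


def header_len(with_timestamps: bool = True) -> int:
    """Bytes that precede the variable-length key and value."""
    fixed = KEY_HASH_LEN + NDBP_LEN + KEY_LEN_LEN + VALUE_LEN_LEN
    return fixed + (TIMESTAMP_LEN if with_timestamps else 0)


def pack_block(key_hash: bytes, ndbp: int, key: bytes, value: bytes, timestamp=None) -> bytes:
    if len(key_hash) != KEY_HASH_LEN:
        raise ValueError("key hash must be 13 bytes")
    parts = [
        key_hash,
        ndbp.to_bytes(NDBP_LEN, BYTE_ORDER),
        len(key).to_bytes(KEY_LEN_LEN, BYTE_ORDER),
        len(value).to_bytes(VALUE_LEN_LEN, BYTE_ORDER),
    ]
    if timestamp is not None:
        parts.append(timestamp.to_bytes(TIMESTAMP_LEN, BYTE_ORDER))
    parts += [key, value]
    return b"".join(parts)


def unpack_block(block: bytes, with_timestamps: bool = True) -> dict:
    pos = 0

    def take(n: int) -> bytes:
        # Read the next n bytes and advance the cursor.
        nonlocal pos
        chunk = block[pos:pos + n]
        pos += n
        return chunk

    key_hash = take(KEY_HASH_LEN)
    ndbp = int.from_bytes(take(NDBP_LEN), BYTE_ORDER)
    key_len = int.from_bytes(take(KEY_LEN_LEN), BYTE_ORDER)
    value_len = int.from_bytes(take(VALUE_LEN_LEN), BYTE_ORDER)
    timestamp = int.from_bytes(take(TIMESTAMP_LEN), BYTE_ORDER) if with_timestamps else None
    key = take(key_len)
    value = take(value_len)
    return {"key_hash": key_hash, "ndbp": ndbp, "timestamp": timestamp, "key": key, "value": value}


block = pack_block(b"\x01" * 13, 1, b"test_key", b"payload", timestamp=1704110400250000)
fields = unpack_block(block)
```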

When the first data block pos is determined through the initial key hashing and bucket reading, the first 19 bytes (key hash and next data block pos) are read. Booklet then checks the next data block pos (ndbp). If the ndbp is 0, then the block has been assigned the delete flag and is ignored. The key hash from the data block is compared to the key hash from the input. If they are the same, then this is the data block we want. If they are different, then we look again at the ndbp. If the ndbp is 1, then this is the last data block in the chain and the input key hash doesn’t exist. If the ndbp is > 1, then we move to the next data block based on the ndbp and repeat the cycle until we either hit a dead end or find the same key hash.
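The chain walk can be sketched on an in-memory model where a dict maps each data block position to its (key hash, ndbp) pair. The function find_block is hypothetical, and treating a deleted block (ndbp of 0) as a dead end is an assumption, since a deleted block no longer carries a pointer to follow.

```python
DELETED = 0       # ndbp value flagging a deleted block (from the text)
END_OF_CHAIN = 1  # ndbp value marking the last block in a chain (from the text)


def find_block(blocks: dict, first_pos: int, key_hash: bytes):
    """Return the position of the block matching key_hash, or None if absent."""
    pos = first_pos
    while True:
        stored_hash, ndbp = blocks[pos]
        if ndbp == DELETED:
            # Assumption: a deleted block is ignored and ends the search here.
            return None
        if stored_hash == key_hash:
            return pos
        if ndbp == END_OF_CHAIN:
            return None  # reached the last block without a match
        pos = ndbp       # ndbp > 1: follow the pointer to the next block


# Three blocks chained together: 1000 -> 1040 -> 1100 (end of chain).
blocks = {
    1000: (b"hash-A", 1040),
    1040: (b"hash-B", 1100),
    1100: (b"hash-C", END_OF_CHAIN),
}
found = find_block(blocks, 1000, b"hash-C")
missing = find_block(blocks, 1000, b"hash-Z")
```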

When we find the identical key hash, Booklet reads 6 bytes (key len and value len) to determine how many bytes need to be read to get the key/value (since they are variable). Depending on whether the user wants the key, value, and/or timestamp, Booklet will read 7 bytes (timestamp len) plus the number of bytes for the key and value.

A delete sets the block’s ndbp to 0 and assigns the block’s original ndbp to the prior data block. This essentially just removes the data block from the key hash data block chain. A delete also happens when a user “overwrites” the same key.
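The two writes involved in a delete can be sketched on the same kind of in-memory model, where a dict maps each block position to its (key hash, ndbp) pair. The names are hypothetical, and the sketch only covers a block that has a prior block in its chain; how the first block of a chain is handled is not described above.

```python
DELETED = 0
END_OF_CHAIN = 1


def unlink_block(blocks: dict, prev_pos: int, pos: int) -> None:
    """Flag the block at pos as deleted and splice it out of the chain."""
    key_hash, original_ndbp = blocks[pos]
    prev_hash, _ = blocks[prev_pos]
    blocks[prev_pos] = (prev_hash, original_ndbp)  # prior block inherits the pointer
    blocks[pos] = (key_hash, DELETED)              # deleted block gets ndbp 0


def chain_positions(blocks: dict, first_pos: int) -> list:
    """Follow ndbp pointers from first_pos and collect the positions visited."""
    positions = []
    pos = first_pos
    while pos > END_OF_CHAIN:
        positions.append(pos)
        pos = blocks[pos][1]
    return positions


blocks = {
    1000: (b"hash-A", 1040),
    1040: (b"hash-B", 1100),
    1100: (b"hash-C", END_OF_CHAIN),
}
before = chain_positions(blocks, 1000)
unlink_block(blocks, prev_pos=1000, pos=1040)
after = chain_positions(blocks, 1000)
```

The deleted block’s bytes remain in place in this model, which mirrors why a separate prune step is needed to regain the space.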

A “prune” method has been created that allows the user to remove “deleted” items. It has one optional parameter. If timestamps have been initialized in booklet, then the user can pass a timestamp that will remove all items older than that timestamp.

FixedValue

The main difference from VariableValue is that the value length is globally fixed. The data block in a FixedValue object does not contain the value length as the value will always be the same global value length. The main advantage of this difference is that any overwrites of the same key can be written back to the same location on the file instead of always being appended to the end of the file. If a use-case includes many overwrites and the values are always the same size, then the FixedValue object is ideal.
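The advantage of a fixed value length can be illustrated with a toy in-memory store: because every value occupies the same number of bytes, an overwrite can reuse the existing slot instead of appending. The class FixedSlots is hypothetical and far simpler than booklet’s FixedValue; it only demonstrates the idea.

```python
class FixedSlots:
    """Toy store where every value has the same length and is overwritten in place."""

    def __init__(self, value_len: int):
        self.value_len = value_len
        self.data = bytearray()  # stands in for the file contents
        self.offsets = {}        # key -> offset of its value slot

    def put(self, key, value: bytes) -> None:
        if len(value) != self.value_len:
            raise ValueError(f"value must be exactly {self.value_len} bytes")
        if key in self.offsets:
            # Overwrite: write back to the same location, no growth.
            off = self.offsets[key]
            self.data[off:off + self.value_len] = value
        else:
            # New key: append a new slot at the end.
            self.offsets[key] = len(self.data)
            self.data += value

    def get(self, key) -> bytes:
        off = self.offsets[key]
        return bytes(self.data[off:off + self.value_len])


store = FixedSlots(value_len=4)
store.put("a", b"1111")
store.put("b", b"2222")
size_before = len(store.data)
store.put("a", b"9999")  # same key, same size: reuses the slot
size_after = len(store.data)
```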

There are currently no timestamps in the FixedValue. This could be enabled in the future.

Benchmarks

From my initial tests, the performance is comparable to other very fast key-value databases (e.g. gdbm, lmdb) and faster than sqlitedict.

Release history

This version

0.12.5

Jul 4, 2026

0.12.4

Jul 2, 2026

0.12.3

Jul 1, 2026

0.12.2

Apr 6, 2026

0.12.1

Mar 1, 2026

0.12.0

Feb 21, 2026

0.11.0

Feb 14, 2026

0.10.2

Feb 11, 2026

0.10.1

Feb 11, 2026

0.10.0

Feb 10, 2026

0.9.3

Feb 9, 2026

0.9.2

Jul 14, 2025

0.9.1

Jun 16, 2025

0.8.0

Jun 14, 2025

0.7.8

Mar 18, 2025

0.7.7

Feb 23, 2025

0.7.6

Dec 30, 2024

0.7.5

Oct 26, 2024

0.7.4

Oct 21, 2024

0.7.3

Oct 20, 2024

0.7.2

Oct 16, 2024

0.7.1

Oct 16, 2024

0.7.0

Oct 15, 2024

0.6.6

Oct 14, 2024

0.6.5

Oct 13, 2024

0.6.4

Oct 12, 2024

0.6.3

Oct 9, 2024

0.6.2

Oct 8, 2024

0.6.1

Oct 7, 2024

0.6.0

Oct 7, 2024

0.5.2

Aug 6, 2024

0.5.1

Aug 6, 2024

0.5.0

Aug 6, 2024

0.4.0

Aug 1, 2024

0.3.0

Jul 23, 2024

0.2.0

Jul 18, 2024

0.1.15

May 9, 2024

0.1.14

Apr 22, 2024

0.1.13

Apr 22, 2024

0.1.12

Mar 14, 2024

0.1.11

Mar 11, 2024

0.1.10

Mar 11, 2024

0.1.9

Mar 11, 2024

0.1.8

Mar 11, 2024

0.1.7

Mar 10, 2024

0.1.6

Mar 10, 2024

0.1.5

Mar 10, 2024

0.1.4

Mar 10, 2024

0.1.3

Mar 10, 2024

0.1.2

Mar 10, 2024

0.1.1

Feb 29, 2024

0.1.0

Feb 29, 2024

0.0.18

Jan 26, 2023

0.0.17

Jan 17, 2023

0.0.16

Jan 15, 2023

0.0.15

Jan 14, 2023

0.0.14

Jan 13, 2023

0.0.13

Jan 13, 2023

0.0.12

Jan 13, 2023

0.0.11

Jan 12, 2023

0.0.10

Jan 12, 2023

0.0.9

Jan 12, 2023

0.0.8

Jan 11, 2023

0.0.7

Jan 11, 2023

0.0.6

Jan 11, 2023

0.0.5.dev2 pre-release

Jan 11, 2023

Download files

Source Distribution

booklet-0.12.5.tar.gz (31.8 kB view details)

Uploaded Jul 4, 2026 Source

Built Distribution

booklet-0.12.5-py3-none-any.whl (33.7 kB view details)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file booklet-0.12.5.tar.gz.

File metadata

Download URL: booklet-0.12.5.tar.gz
Upload date: Jul 4, 2026
Size: 31.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.7

File hashes

Hashes for booklet-0.12.5.tar.gz
Algorithm	Hash digest
SHA256	`9ecdd404044c423ca7bd9db29c2ddfff939d38843619ec1bd06c8b2183abc653`
MD5	`2732abc57a8c2da7878c7ca626451885`
BLAKE2b-256	`948d09f5a6134a6a2f327330da5f1643089dc6d0d8fb26f8b4bce1db2d475b6d`

File details

Details for the file booklet-0.12.5-py3-none-any.whl.

File metadata

Download URL: booklet-0.12.5-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 33.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.7

File hashes

Hashes for booklet-0.12.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b278cd4fb869704c1c33a0812fe655d20851dfde7af406f9c358691d11b813b`
MD5	`df50223dcc9c712641867789b9c7ee84`
BLAKE2b-256	`20d34f05bc13cf8f4f98306daf63a5aa000d23a8bceacd7b94b3df10ed7e4d18`
