
Organise all your data in key/value booklets and sync them with S3


EBooklet

EBooklet is a Python key-value database that syncs with S3 (AWS or any S3-compatible service). It builds on the Booklet package, providing a MutableMapping (dict-like) interface backed by local files and remote S3 storage.

  • S3 sync — push/pull changes between a local database and an S3 bucket
  • Dict-like API — standard MutableMapping plus dbm-style methods
  • Grouped storage — hash keys into N groups stored as single S3 objects, with automatic byte-range reads
  • Concurrency — thread-safe writes (thread locks), multiprocessing-safe (file locks), and S3 object locking for remote writes

Keys must be strings (S3 object name requirement). Values can use any serializer supported by Booklet.

Installation

pip install ebooklet

Booklet vs EBooklet

Booklet is a single-file key/value database used as the foundation for EBooklet. Booklet manages local data, while EBooklet manages the interaction between local and remote data. It is best to familiarize yourself with Booklet before using EBooklet.

EBooklet is designed so you can primarily work with Booklet locally, then push to S3 later via EBooklet. If you're actively collaborating with others, open the data using EBooklet to prevent conflicts.

Unlike Booklet, which uses fast thread locks and OS-level file locks, EBooklet uses S3 object locking when opened for writing. This ensures only one process at a time has write access to a remote database, but it is slower than local file locking.

Quick Start

Connection setup

Create an S3Connection with your credentials and bucket info:

import ebooklet

remote_conn = ebooklet.S3Connection(
    access_key_id='my_key_id',
    access_key='my_secret_key',
    db_key='big_data.blt',
    bucket='my-bucket',
    endpoint_url='https://s3.us-west-001.backblazeb2.com',  # optional, for non-AWS
    db_url='https://my-bucket.org/big_data.blt',            # optional, public URL
)

Read-only shortcut

If you only need to read and have a public URL, pass it directly — no S3Connection needed:

db = ebooklet.open_ebooklet('https://my-bucket.org/big_data.blt', '/tmp/big_data.blt', flag='r')

Open, read, write

with ebooklet.open_ebooklet(remote_conn, '/tmp/big_data.blt', flag='c', value_serializer='pickle') as db:
    db['key1'] = ['one', 2, 'three', 4]
    value = db['key1']

Be careful with flags — using 'n' will delete the remote database in addition to the local one.
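As a sketch of what value_serializer='pickle' implies (this uses only the standard library, independent of EBooklet's internals), a value is stored as pickle bytes and restored on read:

```python
import pickle

# What a value_serializer='pickle' round trip amounts to:
# arbitrary Python objects in, equal objects back out.
value = ['one', 2, 'three', 4]
stored_bytes = pickle.dumps(value)     # what gets written to the local file / S3
restored = pickle.loads(stored_bytes)  # what db['key1'] returns
assert restored == value
```

Any serializer supported by Booklet follows the same dumps/loads contract; pickle is just the most general built-in choice.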

Grouped Storage

By default, each key/value pair is stored as a separate S3 object. When num_groups is set, keys are hashed into N groups, each stored as a single S3 object containing all key/value pairs for that group.

db = ebooklet.open_ebooklet(remote_conn, '/tmp/big_data.blt', flag='n',
                            value_serializer='pickle', num_groups=64)
  • Keys are assigned to groups via blake2b hash mod num_groups
  • Single-key reads use S3 byte-range GET requests to fetch only the needed bytes
  • Multi-key reads from the same group use a single merged byte-range GET
  • On push, entire affected groups are re-packed and uploaded
  • For existing databases, num_groups is read from S3 metadata (user-provided value is ignored)

Use grouped storage when you have many small values — it reduces the number of S3 objects and can improve read performance through byte-range requests.
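The group assignment described above can be sketched in plain Python. This is illustrative only: EBooklet's internal digest size and byte order may differ, but the shape is a blake2b hash reduced modulo num_groups:

```python
import hashlib

def group_for_key(key: str, num_groups: int) -> int:
    # Hash the key with blake2b, then reduce modulo num_groups.
    # (Sketch of the documented scheme, not EBooklet's exact internals.)
    digest = hashlib.blake2b(key.encode()).digest()
    return int.from_bytes(digest, 'big') % num_groups

# Keys that land in the same group live in the same S3 object,
# which is why multi-key reads within a group can merge byte ranges.
groups = {k: group_for_key(k, 64) for k in ('alpha', 'beta', 'gamma')}
```

Because the hash is deterministic, any reader can compute a key's group locally and issue a byte-range GET against just that group object.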

Syncing with S3

The changes() method returns a Change object for inspecting and pushing differences between local and remote:

with ebooklet.open_ebooklet(remote_conn, '/tmp/big_data.blt', 'w') as db:
    db['key1'] = 'new value'

    changes = db.changes()

    for change in changes.iter_changes():
        print(change)

    changes.push()     # upload local changes to S3

Use changes.discard() to remove local changes without pushing, or pass specific keys to discard selectively:

    changes.discard()          # discard all local changes
    changes.discard(['key1'])  # discard only key1
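Conceptually, the Change object tracks which keys differ between the local file and the remote snapshot. A minimal sketch of that idea (EBooklet compares internally tracked state, not raw values, and this sketch ignores deletions):

```python
def pending_changes(local: dict, remote: dict) -> dict:
    # Keys whose local value differs from, or is absent in, the
    # remote snapshot. Plain equality stands in for EBooklet's
    # internal change tracking; deletions are not modeled here.
    return {k: v for k, v in local.items() if remote.get(k) != v}
```

push() then uploads exactly this pending set, and discard() empties it without uploading.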

Other Methods

  • delete_remote(): delete the entire remote database
  • copy_remote(remote_conn): copy the remote database to another S3 location; uses an efficient S3-to-S3 copy when the credentials match, otherwise downloads then uploads
  • load_items(keys=None): download keys/values into the local file without returning them; pass None to load everything
  • get_items(keys): load the given keys, then return an iterator of (key, value) pairs
  • map(func, keys=None, n_workers=None): apply a function to items in parallel using multiprocessing; func(key, value) should return (new_key, new_value), or None to skip the item
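To illustrate the map() contract, here is a hypothetical func (the name and logic are this example's, not EBooklet's) that doubles numeric values and skips everything else:

```python
def double_values(key, value):
    # A func for db.map(): return (new_key, new_value) to emit a
    # transformed item, or None to drop the item from the output.
    if not isinstance(value, (int, float)):
        return None  # skip non-numeric values
    return key, value * 2

# db.map(double_values, n_workers=4) would apply this across items in
# parallel worker processes; shown here is only the per-item behavior.
```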

Remote Connection Groups

Remote connection groups organize and store collections of S3Connection objects. All data from an S3Connection is stored except the access_key and access_key_id. Useful for grouping related or versioned databases together.

They work like a normal EBooklet except they use add instead of set, keys are database UUIDs, and values are dicts of S3Connection parameters plus metadata.

The remote connection must already exist to be added to a group.

remote_conn_rcg = ebooklet.S3Connection(
    access_key_id_rcg, access_key_rcg, db_key_rcg, bucket_rcg,
    endpoint_url=endpoint_url_rcg,
)

with ebooklet.open_rcg(remote_conn_rcg, '/tmp/rcg.blt', 'n') as rcg:
    rcg.add(remote_conn)

    changes = rcg.changes()
    changes.push()

Open Flags

  • 'r': open an existing database for reading only (default)
  • 'w': open an existing database for reading and writing
  • 'c': open for reading and writing, creating the database if it doesn't exist
  • 'n': always create a new, empty database, open for reading and writing
