
Embeddable minimal asynchronous on-disk DB

Project description


Minimal, embeddable on-disk DB, tailored for asyncio.



aiodiskdb is a lightweight, fast, simple append-only database.

To be used in the asyncio event loop.

Install

pip install aiodiskdb

Usage

Start the DB in fire-and-forget fashion:

import asyncio

from aiodiskdb import AioDiskDB, ItemLocation

db = AioDiskDB('/tmp/aiodiskdb')

loop = asyncio.get_event_loop()
loop.create_task(db.start())

Use the db API to write and read data from a coroutine.

async def read_and_write():
    new_data_location: ItemLocation = await db.add(b'data')
    data: bytes = await db.read(new_data_location)
    assert data == b'data'

    noted_location = ItemLocation(
        index=0,
        position=80,
        size=1024333
    )
    prev_saved_data: bytes = await db.read(noted_location)
    assert len(prev_saved_data) == 1024333

Stop the DB before closing the application.

await db.stop()

Be alerted when data is actually persisted to disk:

async def callback(timestamp: int, event: WriteEvent):
    human_time = datetime.fromtimestamp(timestamp).isoformat()
    log(f'{human_time} - {event} persisted to disk.')
    await do_something(event)
    
db.events.on_write = callback

Or hook to other events:

db.events.on_start = ...
db.events.on_stop = ...
db.events.on_failure = ...
db.events.on_index_drop = ...
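The hook pattern above can be modeled in a few lines. This is an illustrative sketch only, not aiodiskdb's real events class: an events object exposes optional async callables, and the DB awaits whichever hooks the user has set.

```python
import asyncio
from typing import Awaitable, Callable, Optional

class SketchEvents:
    """Illustrative model of optional async event hooks (not aiodiskdb's real class)."""
    def __init__(self) -> None:
        # Each hook is either None or an async callable set by the user.
        self.on_write: Optional[Callable[[int, str], Awaitable[None]]] = None

    async def fire_write(self, timestamp: int, event: str) -> None:
        # The DB would call this after persisting data; unset hooks are skipped.
        if self.on_write is not None:
            await self.on_write(timestamp, event)

seen = []

async def main() -> None:
    events = SketchEvents()

    async def callback(timestamp: int, event: str) -> None:
        seen.append((timestamp, event))

    events.on_write = callback
    await events.fire_write(1, 'chunk persisted')

asyncio.run(main())
print(seen)  # [(1, 'chunk persisted')]
```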

Asynchronous non-blocking

File writes are handled without locks: data is appended in RAM and persisted to disk asynchronously, according to customizable settings.
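The persistence triggers can be sketched as follows. This is an illustration only, not aiodiskdb's internal logic; the parameter names mirror the constructor arguments shown later, and the size threshold is assumed to be expressed in MB, matching the documented defaults.

```python
import time
from typing import Optional

def should_flush(buffer_bytes: int, buffer_items: int, last_flush_ts: float,
                 *, max_buffer_size_mb: int = 16, max_buffer_items: int = 1000,
                 flush_interval_s: int = 30, now: Optional[float] = None) -> bool:
    """Return True when any of the three persistence triggers fires."""
    now = time.time() if now is None else now
    return (
        buffer_bytes >= max_buffer_size_mb * 1024 * 1024   # buffer size reached
        or buffer_items >= max_buffer_items                # item count reached
        or now - last_flush_ts >= flush_interval_s         # interval elapsed
    )

# Small buffer, few items, flushed recently: no flush yet.
print(should_flush(1024, 10, last_flush_ts=100.0, now=105.0))  # False
# Flush interval elapsed: flush.
print(should_flush(1024, 10, last_flush_ts=100.0, now=140.0))  # True
```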

Transactional

"All or nothing" commit. Lock all the DB write operations during commits, still allowing the reads. Ensure an arbitrary sequence of data is persisted to disk.

Transactions are scoped. Data added in a transaction is not available outside of it until committed.

transaction = await db.transaction()

transaction.add(b'cafe')
transaction.add(b'babe')
transaction.add(b'deadbeef')

locations: typing.Sequence[ItemLocation] = await transaction.commit()
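The scoped, all-or-nothing semantics can be modeled in plain Python. This is a conceptual sketch, not the real aiodiskdb transaction class: staged data lives in a private buffer and only reaches the shared storage on commit.

```python
from typing import List

class SketchTransaction:
    """Conceptual model of a scoped, all-or-nothing commit (not the real aiodiskdb API)."""
    def __init__(self, db_items: list) -> None:
        self._db_items = db_items        # shared, committed storage
        self._staged: List[bytes] = []   # visible only inside this transaction

    def add(self, data: bytes) -> None:
        self._staged.append(data)

    def commit(self) -> List[int]:
        # Append everything at once and return the new positions ("locations").
        start = len(self._db_items)
        self._db_items.extend(self._staged)
        locations = list(range(start, start + len(self._staged)))
        self._staged.clear()
        return locations

db_items: list = []
tx = SketchTransaction(db_items)
tx.add(b'cafe')
tx.add(b'babe')
print(db_items)     # [] -- staged data is invisible before commit
print(tx.commit())  # [0, 1]
print(db_items)     # [b'cafe', b'babe'] -- visible only after commit
```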

Not-so-append-only

Aiodiskdb is an append-only database: you'll never see methods to delete or remove single entries.

However, data pruning is supported with the following methods (note the enable_overwrite / disable_overwrite guard around them):

db.enable_overwrite()
db.rtrim(0, 400)
db.ltrim(8, 900)
db.drop_index(3)
db.disable_overwrite()

These three pruning methods, respectively:

  • rtrim: prunes data from the right of index 0, from location 400 to the end of the index
  • ltrim: prunes data from the left of index 8, from the beginning up to location 900
  • drop_index: drops the whole index 3, resulting in a file deletion

All item locations not involved in a trim operation remain unchanged, even after an ltrim.
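That stability guarantee can be illustrated with a small conceptual model (not aiodiskdb's internals): an index is a map from item position to data, and an ltrim frees entries below a boundary without renumbering the survivors.

```python
# Conceptual model only: positions recorded before a trim still resolve
# to the same data afterwards, because survivors are never renumbered.
index = {0: b'a', 80: b'b', 900: b'c', 1500: b'd'}

def ltrim(idx: dict, boundary: int) -> dict:
    """Drop everything strictly before `boundary`, keeping positions intact."""
    return {pos: data for pos, data in idx.items() if pos >= boundary}

trimmed = ltrim(index, 900)
print(sorted(trimmed))  # [900, 1500] -- pruned from the left
print(trimmed[1500])    # b'd' -- surviving locations unchanged
```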

Highly customizable

The default parameters:

_FILE_SIZE = 128
_FILE_PREFIX = 'data'
_FILE_ZEROS_PADDING = 5
_BUFFER_SIZE = 16
_BUFFER_ITEMS = 1000
_FLUSH_INTERVAL = 30
_TIMEOUT = 30
_CONCURRENCY = 32

can be easily customized. In the following example the maximum file size is 16 MB, and data is persisted to disk every 1 MB, every 100 new items, or every minute, whichever comes first.

db = AioDiskDB(
    max_file_size=16,
    max_buffer_size=1,
    max_buffer_items=100,
    flush_interval=60
)

The max DB size is max_file_size * max_files. With file_padding=5 the max number of files is 10,000.

A DB created with file_padding=5 and max_file_size=16 is capable of storing up to 160 GB, or 167,772,160,000 one-byte items; at maximum capacity it allocates 10,000 files.
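The capacity figures above can be checked with plain arithmetic. The file-name scheme below is an assumption derived from the _FILE_PREFIX and _FILE_ZEROS_PADDING defaults (the real files may carry a different suffix), and max_files = 10,000 is taken from the statement above.

```python
# Checking the stated capacity: 16 MB per file times 10,000 files.
max_file_size_mb = 16
max_files = 10_000

total_bytes = max_file_size_mb * 1024 * 1024 * max_files
print(total_bytes)  # 167772160000 -- the item count above, assuming 1-byte items

def data_file_name(index: int, prefix: str = 'data', padding: int = 5) -> str:
    """Assumed naming scheme: prefix + zero-padded file index."""
    return f'{prefix}{index:0{padding}d}'

print(data_file_name(0))     # data00000
print(data_file_name(9999))  # data09999
```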

Tries to do its best

Hook the blocking on_stop_signal method to avoid data losses on exit.

import signal
from aiodiskdb import AioDiskDB

db = AioDiskDB(...)

signal.signal(signal.SIGINT, db.on_stop_signal)
signal.signal(signal.SIGTERM, db.on_stop_signal)

Note that SIGKILL cannot be caught or handled, so no handler can protect against it.

Fast enough for some use cases


Concurrency tests, part of the unit tests, can be replicated as a system benchmark. The following results were obtained on a common consumer SSD:

Run 1:
  • Duration: 14.12 s
  • Reads: 2271 (~162/s)
  • Writes: 2014 (~143/s)
  • Bandwidth: 1000 MB (71 MB/s)
  • Avg file size: 508.0 kB

Run 2:
  • Duration: 18.97 s
  • Reads: 10244 (~540/s)
  • Writes: 10245 (~540/s)
  • Bandwidth: 20 MB (1.05 MB/s)
  • Avg file size: 1.0 kB

Limitations

assert len(data) <= max_buffer_size
assert max_transaction_size < RAM
assert max_file_size < 4096

If rtrim is applied on the current index, the space is reused; otherwise it is not. With ltrim, once the space is freed it is not allocated again. With drop_index, the discarded index is not reused.

With a lot of data turn-over (pruning by trimming), it may be necessary to set an unusually high file_padding, increasing the database's potential size.


Credits

Inspired by the raw block data storage of the Bitcoin Core blocks database.

Logo by mepheesto.

Notes

Alpha stage. Still under development, use with care and expect data losses.

Donate :heart: Bitcoin to: 3FVGopUDc6tyAP6t4P8f3GkYTJ5JD5tPwV or paypal

Download files


Source Distribution

aiodiskdb-0.2.3.tar.gz (23.1 kB view details)

Uploaded Source

File details

Details for the file aiodiskdb-0.2.3.tar.gz.

File metadata

  • Download URL: aiodiskdb-0.2.3.tar.gz
  • Upload date:
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5

File hashes

Hashes for aiodiskdb-0.2.3.tar.gz:

  • SHA256: 3aa272de787eb1717764df2bd4cc2d246cfd8f0c2d447f58ddb0b8401e467f9b
  • MD5: 92416f8d52ac4ef91e0703bc9b3e8323
  • BLAKE2b-256: 9e932aca870ac1ff06e216bdb94c89f9282d346bf1a87350364503d56fbb372d

