Python bindings for SQLite's LSM key/value engine

lsm

Fast Python bindings for SQLite's LSM key/value store. The LSM storage engine was initially written as part of the experimental SQLite4 rewrite (now abandoned). More recently, the LSM source code was moved into the SQLite3 source tree and has seen some improvements and fixes. This project uses the LSM code from the SQLite3 source tree.

Features:

  • Embedded zero-conf database.
  • Keys support in-order traversal using cursors.
  • Transactional (including nested transactions).
  • Single writer/multiple reader MVCC based transactional concurrency model.
  • On-disk database stored in a single file.
  • Data is durable in the face of application or power failure.
  • Thread-safe.
  • Releases the GIL for read and write operations (each connection has its own mutex).
  • Page compression (lz4 or zstd).
  • Zero-dependency static library.
  • Python 3.x.

Limitations:

The source for Python lsm is hosted on GitHub.

If you encounter any bugs in the library, please open an issue, including a description of the bug and any related traceback.

Quick-start

Below is a sample interactive console session designed to show some of the basic features and functionality of the lsm Python library.

To begin, instantiate an LSM object, specifying a path to a database file.

from lsm import LSM
db = LSM('test.ldb')
assert db.open()

A more Pythonic variant is to use the context manager, which opens the database on entry:

from lsm import LSM
with LSM("test.ldb") as db:
    assert db.info()

Using a database that has not been opened raises a RuntimeError:

import pytest
from lsm import LSM

db = LSM('test.ldb')

with pytest.raises(RuntimeError):
    db.info()

Binary/string mode

Select the mode when opening the database with the binary: bool = True argument.

For example, when you want to store strings, just pass binary=False:

from lsm import LSM
with LSM("test_0.ldb", binary=False) as db:
    # must be str for keys and values
    db['foo'] = 'bar'
    assert db['foo'] == "bar"

Otherwise, you must pass keys and values as bytes (the default behaviour):

from lsm import LSM

with LSM("test.ldb") as db:
    db[b'foo'] = b'bar'
    assert db[b'foo'] == b'bar'

Key/Value Features

lsm is a key/value store, and has a dictionary-like API:

from lsm import LSM
with LSM("test.ldb", binary=False) as db:
    db['foo'] = 'bar'
    assert db['foo'] == 'bar'

The database applies changes as soon as possible:

import pytest
from lsm import LSM

with LSM("test.ldb", binary=False) as db:
    for i in range(4):
        db[f'k{i}'] = str(i)

    assert 'k3' in db
    assert 'k4' not in db
    del db['k3']

    with pytest.raises(KeyError):
        print(db['k3'])

By default, when you attempt to look up a key, lsm will search for an exact match. You can also search for the closest key, if the specific key you are searching for does not exist:

import pytest
from lsm import LSM, SEEK_LE, SEEK_GE, SEEK_LEFAST


with LSM("test.ldb", binary=False) as db:
    for i in range(4):
        db[f'k{i}'] = str(i)

    # Here we will match "k1".
    assert db['k1xx', SEEK_LE] == '1'

    # Here we will match "k1" but do not fetch a value
    # In this case the value will always be ``True`` or there will
    # be an exception if the key is not found
    assert db['k1xx', SEEK_LEFAST] is True

    with pytest.raises(KeyError):
        print(db['000', SEEK_LEFAST])

    # Here we will match "k2".
    assert db['k1xx', SEEK_GE] == "2"
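
To make the closest-key behaviour concrete, here is a minimal pure-Python sketch that models it with the standard bisect module over a sorted list of keys. It is an illustration only, not how lsm is implemented; the seek() helper and its "EQ"/"LE"/"GE" mode strings are hypothetical stand-ins for the SEEK_* constants.

```python
from bisect import bisect_left, bisect_right


def seek(sorted_keys, key, mode="EQ"):
    """Return the key a lookup would land on, or raise KeyError.

    Hypothetical model: "EQ" is an exact match, "LE" is the closest
    key <= key, and "GE" is the closest key >= key.
    """
    if mode == "LE":
        i = bisect_right(sorted_keys, key) - 1
        if i < 0:
            raise KeyError(key)
        return sorted_keys[i]
    if mode == "GE":
        i = bisect_left(sorted_keys, key)
        if i == len(sorted_keys):
            raise KeyError(key)
        return sorted_keys[i]
    # Exact match: the key must be present at its insertion point.
    i = bisect_left(sorted_keys, key)
    if i == len(sorted_keys) or sorted_keys[i] != key:
        raise KeyError(key)
    return sorted_keys[i]


keys = ["k0", "k1", "k2", "k3"]
assert seek(keys, "k1xx", "LE") == "k1"
assert seek(keys, "k1xx", "GE") == "k2"
```

As in the session above, looking up "000" in "LE" mode raises KeyError in this model, because no key sorts at or before it.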

LSM supports other common dictionary methods such as:

  • keys()
  • values()
  • items()
  • update()

Slices and Iteration

The database can be iterated through directly, or sliced. When you are slicing the database the start and end keys need not exist -- lsm will find the closest key (details can be found in the LSM.fetch_range() documentation).

from lsm import LSM

with LSM("test_slices.ldb", binary=False) as db:

    # clean database
    for key in db.keys():
        del db[key]

    db['foo'] = 'bar'

    for i in range(3):
        db[f'k{i}'] = str(i)

    # Can easily iterate over the database items
    assert (
        sorted(item for item in db.items()) == [
            ('foo', 'bar'), ('k0', '0'), ('k1', '1'), ('k2', '2')
        ]
    )

    # However, you will not read the entire database into memory, as special
    # iterator objects are used.
    assert str(db['k0':'k99']).startswith("<lsm_slice object at")

    # But you can convert it to a list, for example
    assert list(db['k0':'k99']) == [('k0', '0'), ('k1', '1'), ('k2', '2')]

You can use open-ended slices. If the lower or upper bound is outside the range of keys, an empty list is returned.

from lsm import LSM

with LSM("test_slices.ldb", binary=False, readonly=True) as db:
    assert list(db['k0':]) == [('k0', '0'), ('k1', '1'), ('k2', '2')]
    assert list(db[:'k1']) == [('foo', 'bar'), ('k0', '0'), ('k1', '1')]
    assert list(db[:'aaa']) == []

To retrieve keys in reverse order, or to step over more than one item at a time, use a third slice argument as usual. A negative step means reverse order, but the first and second arguments must still be given in ascending order.

from lsm import LSM

with LSM("test_slices.ldb", binary=False, readonly=True) as db:
    assert list(db['k0':'k99':2]) == [('k0', '0'), ('k2', '2')]
    assert list(db['k0'::-1]) == [('k2', '2'), ('k1', '1'), ('k0', '0')]
    assert list(db['k0'::-2]) == [('k2', '2'), ('k0', '0')]
    assert list(db['k0'::3]) == [('k0', '0')]

You can also delete slices of keys, but note that the deletion does not include the boundary keys themselves:

from lsm import LSM

with LSM("test_slices.ldb", binary=False) as db:
    del db['k0':'k99']

    # Note that 'k0' still exists.
    assert list(db.items()) == [('foo', 'bar'), ('k0', '0')]
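
The exclusive bounds of a slice delete can be modelled on a plain dict. This is a sketch based only on the example above; model_delete_range() is a hypothetical helper, not part of lsm.

```python
def model_delete_range(data, start, stop):
    """Return a copy of data without the keys strictly between start and stop."""
    return {k: v for k, v in data.items() if not (start < k < stop)}


data = {"foo": "bar", "k0": "0", "k1": "1", "k2": "2"}
remaining = model_delete_range(data, "k0", "k99")

# 'k0' survives because the lower bound itself is not deleted.
assert remaining == {"foo": "bar", "k0": "0"}
```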

Cursors

While slicing may cover most use-cases, for finer-grained control you can use cursors for traversing records.

from lsm import LSM, SEEK_GE, SEEK_LE

with LSM("test_cursors.ldb", binary=False) as db:
    del db["a":"z"]

    db["spam"] = "spam"

    with db.cursor() as cursor:
        cursor.seek('spam')
        key, value = cursor.retrieve()
        assert key == 'spam'
        assert value == 'spam'

Moving a cursor to the first, last, and previous records:

from lsm import LSM

with LSM("test_cursors.ldb", binary=False) as db:
    db.update({'k0': '0', 'k1': '1', 'k2': '2', 'k3': '3', 'foo': 'bar'})

    with db.cursor() as cursor:

        cursor.first()
        key, value = cursor.retrieve()
        assert key == "foo"
        assert value == "bar"

        cursor.last()
        key, value = cursor.retrieve()
        assert key == "spam"
        assert value == "spam"

        cursor.previous()
        key, value = cursor.retrieve()
        assert key == "k3"
        assert value == "3"

Find the first key that is greater than or equal to 'k0', then move forward until the key passes 'k99':

from lsm import LSM, SEEK_GE

with LSM("test_cursors.ldb", binary=False) as db:
    with db.cursor() as cursor:
        cursor.seek("k0", SEEK_GE)
        results = []

        while cursor.compare("k99") > 0:
            key, value = cursor.retrieve()
            results.append((key, value))
            cursor.next()

    assert results == [('k0', '0'), ('k1', '1'), ('k2', '2'), ('k3', '3')]

Find the last key that is less than or equal to 'k99', then move backward until the key is less than 'k0':

from lsm import LSM, SEEK_LE

with LSM("test_cursors.ldb", binary=False) as db:
    with db.cursor() as cursor:
        cursor.seek("k99", SEEK_LE)
        results = []

        while cursor.compare("k0") >= 0:
            key, value = cursor.retrieve()
            results.append((key, value))
            cursor.previous()

    assert results == [('k3', '3'), ('k2', '2'), ('k1', '1'), ('k0', '0')]

It is very important to close a cursor when you are through using it. For this reason, it is recommended you use the LSM.cursor() context-manager, which ensures the cursor is closed properly.

Transactions

lsm supports nested transactions. The simplest way to use transactions is with the LSM.transaction() method, which returns a context-manager:

from lsm import LSM

with LSM("test_tx.ldb", binary=False) as db:
    del db["a":"z"]
    for i in range(10):
        db[f"k{i}"] = f"{i}"


with LSM("test_tx.ldb", binary=False) as db:
    with db.transaction() as tx1:
        db['k1'] = '1-mod'

        with db.transaction() as tx2:
            db['k2'] = '2-mod'
            tx2.rollback()

    assert db['k1'] == '1-mod'
    assert db['k2'] == '2'

You can commit or roll-back transactions part-way through a wrapped block:

from lsm import LSM

with LSM("test_tx_2.ldb", binary=False) as db:
    del db["a":"z"]
    for i in range(10):
        db[f"k{i}"] = f"{i}"

with LSM("test_tx_2.ldb", binary=False) as db:
    with db.transaction() as txn:
        db['k1'] = 'outer txn'

        # The write operation is preserved.
        txn.commit()

        db['k1'] = 'outer txn-2'

        with db.transaction() as txn2:
            # This is committed after the block ends.
            db['k1'] = 'inner-txn'

        assert db['k1'] == "inner-txn"

        # Rolls back both the changes from txn2 and the preceding write.
        txn.rollback()

        assert db['k1'] == 'outer txn', db['k1']

If you like, you can also explicitly call LSM.begin(), LSM.commit(), and LSM.rollback().

from lsm import LSM

# fill db
with LSM("test_db_tx.ldb", binary=False) as db:
    del db["k":"z"]
    for i in range(10):
        db[f"k{i}"] = f"{i}"


with LSM("test_db_tx.ldb", binary=False) as db:
    # start transaction
    db.begin()
    db['k1'] = '1-mod'

    # nested transaction
    db.begin()
    db['k2'] = '2-mod'
    # rolling back nested transaction
    db.rollback()

    # committing top-level transaction
    db.commit()

    assert db['k1'] == '1-mod'
    assert db['k2'] == '2'
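
One way to picture the nesting is a stack of snapshots: each begin() saves the current state, rollback() restores the most recent save, and commit() discards it while keeping the changes. The sketch below is a toy in-memory model under that assumption; ToyStore is a hypothetical class and says nothing about how the LSM engine actually implements transactions.

```python
class ToyStore:
    """Dict with a stack of snapshots; a hypothetical model, not lsm's engine."""

    def __init__(self):
        self.data = {}
        self._snapshots = []

    def begin(self):
        # Remember the state at the start of this (possibly nested) level.
        self._snapshots.append(dict(self.data))

    def commit(self):
        # Keep the changes; just forget the snapshot for this level.
        self._snapshots.pop()

    def rollback(self):
        # Restore the state saved by the matching begin().
        self.data = self._snapshots.pop()


store = ToyStore()
store.data.update({"k1": "1", "k2": "2"})

store.begin()                 # top-level transaction
store.data["k1"] = "1-mod"
store.begin()                 # nested transaction
store.data["k2"] = "2-mod"
store.rollback()              # undoes only the nested write
store.commit()                # keeps the top-level write

assert store.data == {"k1": "1-mod", "k2": "2"}
```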

Thanks to
