Skip to main content

Python writable in-memory virtual filesystem for SQLite

Project description

sqlite-memory-vfs

PyPI package Test suite Code coverage

Python virtual filesystem for SQLite to read from and write to memory.

While SQLite supports the special filename :memory: that allows the creation of empty databases in memory, sqlite_deserialize allows the population of an in-memory database from raw bytes of a serialized database, and sqlite_serialize allows the extraction of the raw bytes of an in-memory database, there are limitations.

  • The function sqlite_deserialize cannot populate a database from non-contiguous raw bytes.
  • The function sqlite_serialize cannot serialize to non-contiguous bytes.
  • Both of these functions only work with databases that are less than 2GB in total, because SQLite will not allocate more than 2GB in one go.

This virtual filesystem overcomes these limitations. Specifically it allows larger databases to be downloaded and queried without hitting disk, and it allows larger databases to be generated and uploaded without hitting disk.

Based on simonwo's gist and uktrade's sqlite-s3vfs, and inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer and michalc's sqlite-s3-query.

Installation

sqlite-memory-vfs can be installed from PyPI using pip.

pip install sqlite-memory-vfs

This will automatically install APSW along with any other dependencies.

Deserializing (getting a regular SQLite file into the VFS)

This library allows the raw bytes of a SQLite database to be queried without having to save it to disk. This can be done by using the deserialize_iter method of MemoryVFS, passing it an iterable of bytes instances that contain the SQLite database.

from contextlib import closing
import apsw
import httpx
import sqlite_memory_vfs

memory_vfs = sqlite_memory_vfs.MemoryVFS()

# Any iterable of bytes can be used. In this example, they come via HTTP
url = "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite"
with \
        httpx.stream("GET", url) as r, \
        closing(apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name)) as db:

    memory_vfs.deserialize_iter(db, r.iter_bytes())

    cursor = db.cursor()
    cursor.execute('SELECT * FROM quotas;')
    print(cursor.fetchall())

If the deserialize_iter step is ommitted an empty database is automatically created in memory.

See the APSW documentation for more usage examples.

Serializing (getting a regular SQLite file out of the VFS)

The bytes corresponding to each SQLite database in the VFS can be extracted with the serialize_iter function, which returns an iterable of bytes

with \
        open('my_db.sqlite', 'wb') as f, \
        closing(apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name)) as db:

    for chunk in memory_vfs.serialize_iter(db):
        f.write(chunk)

Concurrency

It should be safe for any number of readers and writers to attempt to access the database - locking is implemented by the VFS which blocks access to the database when a write in in-flight.

If connection gets blocked, then it will raise apsw.BusyError. This is normal SQLite behaviour. You can request that SQLite retry certain actions automatically for a period of time to try to reduce the chance that this surfaces to your code. This can be done by setting a busy timeout, for example to set a 500 millisecond timeout:

PRAGMA busy_timeout = 500;

Under the hood writer starvation is avoided by the use of a PENDING lock, much like the default SQLite VFS that writes to disk.

Comparison with sqlite_deserialize

The main reason for using sqlite-memory-vfs over sqlite_deserialize is the lower memory usage for larger databases. For example the following may not even complete due to not being able to allocate enough contiguous memory for the database:

import resource
from contextlib import closing

import apsw
import httpx

url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"

with closing(apsw.Connection(':memory:')) as db:
    db.deserialize('main', httpx.get(url).read())
    cursor = db.cursor()
    cursor.execute('SELECT * FROM measures;')
    print(cursor.fetchall())

print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

But the following does / should output a lower value of memory usage:

import resource
from contextlib import closing

import apsw
import httpx
import sqlite_memory_vfs

url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
memory_vfs = sqlite_memory_vfs.MemoryVFS()

with httpx.stream("GET", url) as r:
    memory_vfs.deserialize_iter('tariff.sqlite', r.iter_bytes())

with closing(apsw.Connection('tariff.sqlite', vfs=memory_vfs.name)) as db:
    cursor = db.cursor()
    cursor.execute('SELECT count(*) FROM measures;')
    print(cursor.fetchall())

print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Tests

The tests require the dev dependencies installed

pip install -e ".[dev]"

and can then run with pytest

pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlite_memory_vfs-0.0.14.tar.gz (7.1 kB view details)

Uploaded Source

Built Distribution

sqlite_memory_vfs-0.0.14-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file sqlite_memory_vfs-0.0.14.tar.gz.

File metadata

  • Download URL: sqlite_memory_vfs-0.0.14.tar.gz
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for sqlite_memory_vfs-0.0.14.tar.gz
Algorithm Hash digest
SHA256 9a31c73e2d2ab7817d590460889552c6d63164460b30e0729f12ddc5f1fa88e2
MD5 9ea0f830622496dcf7163a3842aeb623
BLAKE2b-256 0a9ee35a87bb5ccabf395b6e62bdf7bc8c29fa19e16a18e2fe6062b79353715a

See more details on using hashes here.

File details

Details for the file sqlite_memory_vfs-0.0.14-py3-none-any.whl.

File metadata

File hashes

Hashes for sqlite_memory_vfs-0.0.14-py3-none-any.whl
Algorithm Hash digest
SHA256 9f1b972660f86edbf6407455d0738fa3b4eed847e62c818e8e889f95d3a5c938
MD5 8a95ab527b7f91c8c4e8c8c5a476bb67
BLAKE2b-256 227d9a81714636c1669fd0160bedc70756a9e1836be425b46a918a7d7a9e41ee

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page