Python writable in-memory virtual filesystem for SQLite
Project description
sqlite-memory-vfs
Python virtual filesystem for SQLite to read from and write to memory.
While SQLite supports the special filename :memory:
that allows the creation of empty databases in memory, and sqlite_deserialize
allows the population of an in-memory database from a contiguous block of raw bytes of a serialized database, there is no built-in way to populate such a database using non-contiguous raw bytes of a serialized database. The function sqlite_serialize
can also only serialize a database to a contiguous block of memory. This virtual filesystem overcomes these limitations, and so allows larger databases to be downloaded and queried without hitting disk.
Based on simonwo's gist and uktrade's sqlite-s3vfs, and inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer and michalc's sqlite-s3-query.
Installation
sqlite-memory-vfs can be installed from PyPI using pip
.
pip install sqlite-memory-vfs
This will automatically install APSW along with any other dependencies.
Deserializing (getting a regular SQLite file into the VFS)
This library allows the raw bytes of a SQLite database to be queried without having to save it to disk. This can be done by using the deserialize_iter
method of MemoryVFS
, passing it an iterable of bytes
instances that contain the SQLite database.
import apsw
import httpx
import sqlite_memory_vfs
memory_vfs = sqlite_memory_vfs.MemoryVFS()
# Any iterable of bytes can be used. In this example, they come via HTTP
with httpx.stream("GET", "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite") as r:
memory_vfs.deserialize_iter('quota_balances.sqlite', r.iter_bytes())
with apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name) as db:
cursor = db.cursor()
cursor.execute('SELECT * FROM quotas;')
print(cursor.fetchall())
If the deserialize_iter
step is ommitted an empty database is automatically created in memory.
See the APSW documentation for more usage examples.
Serializing (getting a regular SQLite file out of the VFS)
The bytes corresponding to each SQLite database in the VFS can be extracted with the serialize_iter
function, which returns an iterable of bytes
with open('my_db.sqlite', 'wb') as f:
for chunk in memory_vfs.serialize_iter('my_db.sqlite'):
f.write(chunk)
Concurrency
It should be safe for any number of readers and writers to attempt to access the database - locking is implemented by the VFS which blocks access to the database when a write in in-flight.
If connection gets blocked, then it will raise apsw.BusyError
. This is normal SQLite behaviour. You can request that SQLite retry certain actions automatically for a period of time to try to reduce the chance that this surfaces to your code. This can be done by setting a busy timeout, for example to set a 500 millisecond timeout:
PRAGMA busy_timeout = 500;
Under the hood writer starvation is avoided by the use of a PENDING lock, much like the default SQLite VFS that writes to disk.
Comparison with sqlite_deserialize
The main reason for using sqlite-memory-vfs over sqlite_deserialize
is the lower memory usage for larger databases. For example the following may not even complete due to running out of memory:
import resource
import apsw
import httpx
url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
with apsw.Connection(':memory:') as db:
db.deserialize('main', httpx.get(url).read())
cursor = db.cursor()
cursor.execute('SELECT * FROM measures;')
print(cursor.fetchall())
print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
But the following does / should output a lower value of memory usage:
import resource
import apsw
import httpx
import sqlite_memory_vfs
url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
memory_vfs = sqlite_memory_vfs.MemoryVFS()
with httpx.stream("GET", url) as r:
memory_vfs.deserialize_iter('tariff.sqlite', r.iter_bytes())
with apsw.Connection('tariff.sqlite', vfs=memory_vfs.name) as db:
cursor = db.cursor()
cursor.execute('SELECT count(*) FROM measures;')
print(cursor.fetchall())
print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
Tests
The tests require the dev dependencies installed
pip install -e ".[dev]"
and can then run with pytest
pytest
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sqlite_memory_vfs-0.0.13-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b8c735df8f8e7cda53225ddcfd46e441d1dc0e57e9f35d26377e907052de9ff5 |
|
MD5 | e4a02e522b84ada0694f052380f3e89c |
|
BLAKE2b-256 | 1a959d55bdbf2774575aef3e19af3ac9c07d4f3456b925ad34e7581149746f54 |