Python writable in-memory virtual filesystem for SQLite
Project description
sqlite-memory-vfs
Python virtual filesystem for SQLite to read from and write to memory.
While SQLite supports the special filename :memory:
that allows the creation of empty databases in memory, and sqlite_deserialize
allows the population of an in-memory database from a contiguous block of raw bytes of a serialized database, there is no built-in way to populate such a database using non-contiguous raw bytes of a serialized database. The function sqlite_serialize
can also only serialize a database to a contiguous block of memory. This virtual filesystem overcomes these limitations, and so allows larger databases to be downloaded and queried without hitting disk.
Based on simonwo's gist and uktrade's sqlite-s3vfs, and inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer and michalc's sqlite-s3-query.
Installation
sqlite-memory-vfs can be installed from PyPI using pip
.
pip install sqlite-memory-vfs
This will automatically install APSW along with any other dependencies.
Deserializing (getting a regular SQLite file into the VFS)
This library allows the raw bytes of a SQLite database to be queried without having to save it to disk. This can be done by using the deserialize_iter
method of MemoryVFS
, passing it an iterable of bytes
instances that contain the SQLite database.
import apsw
import httpx
import sqlite_memory_vfs
memory_vfs = sqlite_memory_vfs.MemoryVFS()
# Any iterable of bytes can be used. In this example, they come via HTTP
with httpx.stream("GET", "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite") as r:
memory_vfs.deserialize_iter('quota_balances.sqlite', r.iter_bytes())
with apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name) as db:
cursor = db.cursor()
cursor.execute('SELECT * FROM quotas;')
print(cursor.fetchall())
If the deserialize_iter
step is ommitted an empty database is automatically created in memory.
See the APSW documentation for more usage examples.
Serializing (getting a regular SQLite file out of the VFS)
The bytes corresponding to each SQLite database in the VFS can be extracted with the serialize_iter
function, which returns an iterable of bytes
with open('my_db.sqlite', 'wb') as f:
for chunk in memory_vfs.serialize_iter('my_db.sqlite'):
f.write(chunk)
Comparison with sqlite_deserialize
The main reason for using sqlite-memory-vfs over sqlite_deserialize
is the lower memory usage for larger databases. For example the following may not even complete due to running out of memory:
import resource
import apsw
import httpx
url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
with apsw.Connection(':memory:') as db:
db.deserialize('main', httpx.get(url).read())
cursor = db.cursor()
cursor.execute('SELECT * FROM measures;')
print(cursor.fetchall())
print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
But the following does / should output a lower value of memory usage:
import resource
import apsw
import httpx
import sqlite_memory_vfs
url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
memory_vfs = sqlite_memory_vfs.MemoryVFS()
with httpx.stream("GET", url) as r:
memory_vfs.deserialize_iter('tariff.sqlite', r.iter_bytes())
with apsw.Connection('tariff.sqlite', vfs=memory_vfs.name) as db:
cursor = db.cursor()
cursor.execute('SELECT count(*) FROM measures;')
print(cursor.fetchall())
print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
Tests
The tests require the dev dependencies installed
pip install -e ".[dev]"
and can then run with pytest
pytest
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sqlite_memory_vfs-0.0.6-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ee64365520082dcb1495ee1a09981c673fbcd5c55546f308d3c8abfcf25c087b |
|
MD5 | ca8a65f018a3af951e95074957af3f56 |
|
BLAKE2b-256 | a8b66194c80187467a2c3b7f186c32564e66abc88e1ce7fb823234c8519092b5 |