Python function to query a SQLite file stored on S3
Project description
sqlite-s3-query
Python function to query a SQLite file stored on S3. It uses multiple HTTP range requests per query to avoid downloading the entire file, and so is suitable for large databases.
All the HTTP requests for a query request the same version of the database object in S3, so queries should complete succesfully even if the database is replaced concurrently by another S3 client. Versioning must be enabled on the S3 bucket.
Operations that write to the database are not supported.
Inspired by phiresky's sql.js-httpvfs, and dacort's Stack Overflow answer.
Installation
sqlite-s3-query depends on APSW, which is not available on PyPI, but can be installed directly from GitHub.
pip install sqlite_s3_query
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions
Usage
from sqlite_s3_query import sqlite_s3_query
results_iter = sqlite_s3_query(
'SELECT * FROM my_table WHERE my_column = ?', params=('my-value',),
url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)
for row in results_iter:
print(row)
If in your project you use multiple queries to the same file, functools.partial
can be used to make an interface with less duplication.
from functools import partial
from sqlite_s3_query import sqlite_s3_query
query_my_db = partial(sqlite_s3_query,
url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)
for row in query_my_db('SELECT * FROM my_table WHERE my_col = ?', params=('my-value',)):
print(row)
for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
print(row)
The AWS region and the credentials are taken from environment variables, but this can be changed using the get_credentials
parameter. Below shows the default implementation of this that can be overriden.
import os
from functools import partial
from sqlite_s3_query import sqlite_s3_query
query_my_db = partial(sqlite_s3_query
url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
get_credentials=lambda: (
os.environ['AWS_DEFAULT_REGION'],
os.environ['AWS_ACCESS_KEY_ID'],
os.environ['AWS_SECRET_ACCESS_KEY'],
os.environ.get('AWS_SESSION_TOKEN'), # Only needed for temporary credentials
),
)
for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
print(row)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sqlite_s3_query-0.0.5-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b239465e79ed6e7068b4a1de2a82a23f943dce78fe1d3a293f23ce6aa48785af |
|
MD5 | c23777a3ce75dd93c6318a2e0ea98d25 |
|
BLAKE2b-256 | c0ab10f472e95c388abdd515f780fcaefc15dd789faf6826f7ce90d117a6464e |