Skip to main content

Python context manager to query a SQLite file stored on S3

Project description

sqlite-s3-query CircleCI Test Coverage

Python context manager to query a SQLite file stored on S3. It uses multiple HTTP range requests per query to avoid downloading the entire file, and so is suitable for large databases.

All queries using the same instance of the context will query the same version of the database object in S3. This means that a context is roughly equivalent to a REPEATABLE READ transaction, and queries should complete succesfully even if the database is replaced concurrently by another S3 client. Versioning must be enabled on the S3 bucket.

SQL statements that write to the database are not supported.

Inspired by phiresky's sql.js-httpvfs, and dacort's Stack Overflow answer.

Installation

sqlite-s3-query depends on APSW, which is not available on PyPI, but can be installed directly from GitHub.

pip install sqlite_s3_query
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions

Usage

from sqlite_s3_query import sqlite_s3_query

with sqlite_s3_query(url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite') as query:
    for row in query('SELECT * FROM my_table WHERE my_column = ?', params=('my-value',)):
        print(row)

If in your project you query the same object from multiple places, functools.partial can be used to make an interface with less duplication.

from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query,
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

with query_my_db() as query:
    for row in query('SELECT * FROM my_table WHERE my_col = ?', params=('my-value',)):
        print(row)

with query_my_db() as query:
    for row in query('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
        print(row)

The AWS region and the credentials are taken from environment variables, but this can be changed using the get_credentials parameter. Below shows the default implementation of this that can be overriden.

import os
from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
    get_credentials=lambda: (
        os.environ['AWS_DEFAULT_REGION'],
        os.environ['AWS_ACCESS_KEY_ID'],
        os.environ['AWS_SECRET_ACCESS_KEY'],
        os.environ.get('AWS_SESSION_TOKEN'),  # Only needed for temporary credentials
    ),
)

with query_my_db() as query:
    for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
        print(row)

The HTTP client can be changed by overriding the the default get_http_client parameter, which is shown below.

from functools import partial
import httpx
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query,
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
    get_http_client=lambda: httpx.Client(),
)

with query_my_db() as query:
    for row in query_my_db('SELECT * FROM my_table WHERE my_col = ?', params=('my-value',)):
        print(row)

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlite-s3-query-0.0.8.tar.gz (5.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlite_s3_query-0.0.8-py3-none-any.whl (5.4 kB view details)

Uploaded Python 3

File details

Details for the file sqlite-s3-query-0.0.8.tar.gz.

File metadata

  • Download URL: sqlite-s3-query-0.0.8.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.3

File hashes

Hashes for sqlite-s3-query-0.0.8.tar.gz
Algorithm Hash digest
SHA256 9b0a3f6f52d92b3e7d8c82afef0584fe82c860aefb54bd0048ff1f3efb9de390
MD5 b2b21ab6b3ccffbc48b225a5b4ffd1d0
BLAKE2b-256 aa14cad65f475efbb501da73f4a422278f6291e61b2dc4a91b5c0577228a4c5a

See more details on using hashes here.

File details

Details for the file sqlite_s3_query-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: sqlite_s3_query-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.3

File hashes

Hashes for sqlite_s3_query-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 a89f9d8cf6cce8bab144f1f887860540b297b64cecf33141da4cbc8b3661e292
MD5 b38782d8af46a20e6becfcdff45067a9
BLAKE2b-256 27fa69497f85d130a7d31735506b3cfec8e969985fe480e9f6857ed35098f0ab

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page