
Project description

s3sqlite

Query SQLite databases in S3 using s3fs

An APSW SQLite VFS that enables reading SQLite databases directly from S3. Only read operations are supported; any operation that tries to modify the DB file is ignored.

Inspired by sqlite-s3vfs and sqlite-s3-query.

Example usage

import apsw
import s3fs
import s3sqlite

# Create an S3 filesystem. Check the s3fs docs for more examples:
# https://s3fs.readthedocs.io/en/latest/
s3 = s3fs.S3FileSystem(
    key="somekey",
    secret="secret",
    client_kwargs={"endpoint_url": "http://..."},
)

s3vfs = s3sqlite.S3VFS(name="s3-vfs", fs=s3)

# Define the S3 location
key_prefix = "mybucket/awesome.sqlite3"

# Upload a local database file to S3 (the local path is illustrative)
s3vfs.upload_file("awesome.sqlite3", dest=key_prefix)

# Create a database and query it
with apsw.Connection(
    key_prefix, vfs=s3vfs.name, flags=apsw.SQLITE_OPEN_READONLY
) as conn:
    cursor = conn.execute("...")
    print(cursor.fetchall())
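
Because the database is opened with SQLITE_OPEN_READONLY, SQLite rejects any write before it ever reaches the VFS. A minimal sketch of that behavior, reusing the connection setup above (the table name is illustrative):

with apsw.Connection(
    key_prefix, vfs=s3vfs.name, flags=apsw.SQLITE_OPEN_READONLY
) as conn:
    try:
        # Statements that modify the file fail on a read-only connection.
        conn.execute("INSERT INTO mytable VALUES (1)")
    except apsw.ReadOnlyError as exc:
        print("write rejected:", exc)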

Installation

python3 -m pip install s3sqlite

Or, to install the latest version directly from the repository:

python3 -m pip install git+https://github.com/litements/s3sqlite.git

Run tests

The testing script uses the Chinook database; it modifies (and VACUUMs) the file to test all the possible combinations of journal modes and page sizes. A sketch of that rewrite step follows the steps below.

  1. Download the Chinook database:
curl https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite -o chinook.sqlite3
  2. Make sure you have Docker installed.

The testing script will start a MinIO container to run the tests locally. After the tests finish, the container will be stopped automatically.

  3. Run the tests:
python3 -m pytest test.py
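
For reference, the page-size/journal-mode rewriting the test script performs can be reproduced with the standard sqlite3 module. A minimal sketch (the helper name is hypothetical; it is not part of test.py):

import sqlite3

def rewrite_db(path, page_size, journal_mode):
    # Autocommit mode, so VACUUM is not blocked by an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        # The page size cannot change while the database is in WAL mode,
        # so switch to a rollback journal first.
        conn.execute("PRAGMA journal_mode=DELETE")
        conn.execute(f"PRAGMA page_size={page_size}")
        conn.execute("VACUUM")  # rebuilds the file with the new page size
        conn.execute(f"PRAGMA journal_mode={journal_mode}")
    finally:
        conn.close()

rewrite_db("chinook.sqlite3", 65536, "WAL")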

Alternatives

  • sqlite-s3vfs: This VFS stores the SQLite file as separate DB pages. This enables having a single writer without having to overwrite the whole file.
  • sqlite-s3-query: This VFS is very similar to s3sqlite, but it uses ctypes directly to create the VFS and httpx to make requests to S3.

I decided to create a new VFS that doesn't require ctypes, so that it's easier to understand and maintain, while still keeping a single file in S3 (vs. separate DB pages). At the same time, by using s3fs I know I can use any S3 storage supported by that library.

Other

The Chinook database used for testing can be obtained from: https://github.com/lerocha/chinook-database/

The testing section in this README contains a command you can run to get the file.

License

Distributed under the Apache 2.0 license. See LICENSE for more information.

Download files

Download the file for your platform.

Source Distribution

s3sqlite-0.1.tar.gz (7.8 kB)

Uploaded Source

Built Distribution

s3sqlite-0.1-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file s3sqlite-0.1.tar.gz.

File metadata

  • Download URL: s3sqlite-0.1.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for s3sqlite-0.1.tar.gz:

  • SHA256: 22b65ee92da656f12ed24b41797b5e7a2071489310699eb1a11ba21759f05912
  • MD5: c5b678bbb99a3b7c950da2e4ec034b60
  • BLAKE2b-256: 39b065f3637bff877854a896f52e17c92e4a6caf667f294570fa723f0a2063f4

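To verify a downloaded archive against the published values, a minimal check using Python's hashlib:

import hashlib

# Compare the local file's SHA256 with the digest published above.
with open("s3sqlite-0.1.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == "22b65ee92da656f12ed24b41797b5e7a2071489310699eb1a11ba21759f05912"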

File details

Details for the file s3sqlite-0.1-py3-none-any.whl.

File metadata

  • Download URL: s3sqlite-0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for s3sqlite-0.1-py3-none-any.whl:

  • SHA256: 6b64a49033971488f81a25761616dcba7714a5c0c0bb2938ddca980aa423a90d
  • MD5: ba95c1b736e63f1595e50cd2d1837dfc
  • BLAKE2b-256: 364367f63c4ae1f0d5ab666c6f313a28d9f74af6196eb25ed47db0fd1bbbbfe9

