s3sqlite

Query SQLite databases in S3 using s3fs

APSW SQLite VFS. This VFS enables reading databases from S3 using s3fs. It supports read operations only; any operation that tries to modify the DB file is ignored.

Inspired by sqlite-s3vfs and sqlite-s3-query.

Notes about journal mode

This VFS only works when the DB file is in a journal mode other than WAL. However, it will work if you set the journal mode to something else just before uploading the file to S3. You can (and probably should) use WAL mode while generating the DB, then change the journal mode (and the page size, if you need to) before uploading it to S3.

The test suite includes tests for that use case. Take into account that the page size can't be changed while the database is in WAL mode; you need to change it before enabling WAL mode, or after switching the database back to rollback journal mode. You also need to execute VACUUM; after changing the page size for it to take effect.
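That preparation step can be sketched with the standard library's sqlite3 module (the file name and page size here are illustrative; adjust them to your database):

```python
import sqlite3

conn = sqlite3.connect("awesome.sqlite3")
# Leave WAL mode first: the page size can't be changed while WAL is active.
conn.execute("PRAGMA journal_mode=DELETE;")
conn.execute("PRAGMA page_size=8192;")
conn.execute("VACUUM;")  # the new page size only takes effect after VACUUM
conn.close()
```

After this, the file is in rollback journal mode and ready to be uploaded to S3.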

Example usage

import s3fs
import s3sqlite
import apsw

# Create an S3 filesystem. Check the s3fs docs for more examples:
# https://s3fs.readthedocs.io/en/latest/
s3 = s3fs.S3FileSystem(
    key="somekey",
    secret="secret",
    client_kwargs={"endpoint_url": "http://..."},
)

s3vfs = s3sqlite.S3VFS(name="s3-vfs", fs=s3)

# Define the S3 location
key_prefix = "mybucket/awesome.sqlite3"

# Upload the file to S3
s3vfs.upload_file("awesome.sqlite3", dest=key_prefix)

# Create a database and query it
with apsw.Connection(
    key_prefix, vfs=s3vfs.name, flags=apsw.SQLITE_OPEN_READONLY
) as conn:
    cursor = conn.execute("...")
    print(cursor.fetchall())

Installation

python3 -m pip install s3sqlite

Run tests

The testing script uses the Chinook database; it modifies (and VACUUMs) the file to exercise all possible combinations of journal modes and page sizes.

  1. Download the chinook database:
curl https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite -o chinook.sqlite3
  2. Make sure you have Docker installed.

The testing script will start a MinIO container to run the tests locally. After the tests finish, the container will be stopped automatically.

  3. Run the tests:
python3 -m pytest test.py
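The journal-mode × page-size matrix described above can be sketched with the standard library's sqlite3 module; the PAGE_SIZES and JOURNAL_MODES values below are illustrative, not the exact set test.py exercises:

```python
import itertools
import sqlite3

# Illustrative values; the real test script may use a different set.
PAGE_SIZES = [512, 4096, 65536]
JOURNAL_MODES = ["delete", "truncate", "persist", "memory"]

def prepare(path, page_size, journal_mode):
    """Rewrite the DB at `path` with the given page size and journal mode."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA journal_mode={journal_mode};")
    conn.execute(f"PRAGMA page_size={page_size};")
    conn.execute("VACUUM;")  # applies the new page size
    conn.close()

for size, mode in itertools.product(PAGE_SIZES, JOURNAL_MODES):
    prepare(f"chinook_{mode}_{size}.sqlite3", size, mode)
```

Each resulting file can then be uploaded and queried through the VFS as shown in the usage example.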

Alternatives

  • sqlite-s3vfs: This VFS stores the SQLite file as separate DB pages, which enables a single writer without having to overwrite the whole file. The main difference is that s3sqlite only needs to upload a single file to S3, whereas sqlite-s3vfs splits the database into pages and uploads the pages separately to a bucket prefix. Having a single file has some advantages, like being able to use object versioning in the bucket. I also think that relying on s3fs makes the VFS more flexible than calling boto3 as sqlite-s3vfs does.
  • sqlite-s3-query: This VFS is very similar to s3sqlite, but it uses ctypes directly to create the VFS and httpx to make requests to S3.

I decided to create a new VFS that doesn't require ctypes so that it's easier to understand and maintain, while still keeping a single file in S3 (vs. separate DB pages). At the same time, by using s3fs I know I can use any S3 storage supported by that library.

Other

The Chinook database used for testing can be obtained from: https://github.com/lerocha/chinook-database/

The testing section in this README contains a command you can run to get the file.

License

Distributed under the Apache 2.0 license. See LICENSE for more information.
