Skip to main content

Query SQLite databases on S3 using s3fs or smart_open

Project description

s3sqlite

Query SQLite databases in S3 using s3fs and smart_open

APSW SQLite VFS. This VFS enables reading databases from S3 using s3fs or smart_open. This only supports reading operations, any operation that tries to modify the DB file is ignored.

Clone of s3sqlite with additional smart_open support.

Inspired by sqlite-s3vfs and sqlite-s3-query.

Notes about journal mode

This VFS will only work when the DB file is in any journal mode that is not WAL. However, it will work if you set the journal mode to something else just before uploading the file to S3. You can (and probably should) use WAL mode to generate the DB. Then you can change the journal mode (and the page size if you neeed) before uploading it to S3.

The test suite includes tests for that use case. Take into account that the page size can't be changed when the database is in WAL mode. You need to change it before setting the WAL mode or by setting the database to rollback journal mode. You need to execute VACUUM; after changing the page size in a SQLite database.

Example usage

import s3sqlite
import apsw

sovfs = s3sqlite.SmartOpenVFS(name="so-vfs")


# Define the S3 location
key_prefix = "mybucket/awesome.sqlite3"

# Let's assume awesome.sqlite3 is uploaded to s3

# Create a database and query it
with apsw.Connection(
    key_prefix, vfs=sovfs.name, flags=apsw.SQLITE_OPEN_READONLY
) as conn:

    cursor = conn.execute("...")
    print(cursor.fetchall())

Installation

python3 -m pip install s3sqlite

Run tests

The testing script will use the Chinook database, it will modify (and VACUUM;) the file to use all the possible combinations of journal modes and page sizes

  1. Download the chinook database:
curl https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite -o chinook.sqlite3
  1. Make sure you have Docker installed.

The testing script will start a MinIO container to run the tests locally. After the tests finish, the container will be stopped atuomatically.

  1. Run the tests:
python3 -m pytest test.py

Alternatives

  • s3sqlite: The exact same package as this except without smart_open support.
  • sqlite-s3vfs: This VFS stores the SQLite file as separate DB pages. This enables having a single writer without having to overwrite the whole file. s3sqlite's main difference is that this just needs uploading a single file to S3. sqlite-s3vfs will split the database in pages and upload the pages separately to a bucket prefix. Having just a single file has some advantages, like making use of object versioning in the bucket. I also think that relying on s3fs makes the VFS more flexible than calling boto3 as sqlite3-s3vfs does. s3fs should also handle retries automatically.
  • sqlite-s3-query: This VFS is very similar to s3sqlit, but it uses ctypes directly to create the VFS and uses httpx to make requests to S3.

I decided to create a new VFS that didn't require using ctypes so that it's easier to understand and maintain, but I still want to have a single file in S3 (vs. separate DB pages). At the same time, by using s3f3 I know I can use any S3 storage supported by that library.

Other

The Chinook database used for testing can be obtained from: https://github.com/lerocha/chinook-database/

The testing section in this README contains a command you can run to get the file.

License

Distributed under the Apache 2.0 license. See LICENSE for more information.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sosqlite-0.3.1.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sosqlite-0.3.1-py3-none-any.whl (9.3 kB view details)

Uploaded Python 3

File details

Details for the file sosqlite-0.3.1.tar.gz.

File metadata

  • Download URL: sosqlite-0.3.1.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.12

File hashes

Hashes for sosqlite-0.3.1.tar.gz
Algorithm Hash digest
SHA256 af73c3f429989f778b29bb862b292c952b16675e3a4f107c42cd9bf3210895f4
MD5 f0866a38b8a6d74eb4361c9e0228b490
BLAKE2b-256 3f8a4683047295d4773973a6ccc35a699233c0cbfcae527586490f6bbc241789

See more details on using hashes here.

File details

Details for the file sosqlite-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: sosqlite-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.12

File hashes

Hashes for sosqlite-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9541a2086f395180bd776a881866265b2a00936a2c93f601d07b65589180783b
MD5 2fadb5137a46845b1c4a09eea99b7b70
BLAKE2b-256 b4af731155d10647be31e0db2546564cf485c1d5338ca0b388822eb3a40ad31e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page