datasette-load
API and UI for bulk loading data into Datasette from a URL
Installation
Install this plugin in the same environment as Datasette.
datasette install datasette-load
Configuration
This plugin does not require configuration. By default it downloads files to the system temp directory and swaps them into the current working directory once they have been verified as valid SQLite.
The plugin provides two optional settings to control which directories are used:
plugins:
  datasette-load:
    staging_directory: /tmp
    database_directory: /home/location
staging_directory is used for the initial download. Files will be deleted from here if the download fails.
If the download succeeds (and the database integrity check passes) the file will be moved into the database_directory folder. This defaults to the directory in which the Datasette application was started if you do not otherwise configure it.
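The staging and verification flow described above can be sketched in Python. This is a minimal illustration, not the plugin's actual code: the function name `verify_and_move`, the use of `PRAGMA integrity_check`, the `.db` suffix and the choice to delete a file that fails verification are all assumptions.

```python
import shutil
import sqlite3
from pathlib import Path


def verify_and_move(staged_path, database_directory, name):
    """Check that staged_path is a healthy SQLite file, then move it into place.

    Returns the final path on success. On failure the staged file is deleted
    (an assumption made for this sketch) and ValueError is raised.
    """
    staged_path = Path(staged_path).resolve()
    try:
        # Open read-only so the check cannot modify the staged file
        conn = sqlite3.connect(staged_path.as_uri() + "?mode=ro", uri=True)
        try:
            result = conn.execute("PRAGMA integrity_check").fetchone()[0]
        finally:
            conn.close()
        if result != "ok":
            raise ValueError(f"integrity check failed: {result}")
    except (sqlite3.DatabaseError, ValueError) as ex:
        staged_path.unlink(missing_ok=True)
        raise ValueError(f"not a valid SQLite database: {ex}") from ex
    # Hypothetical naming scheme: <database_directory>/<name>.db
    destination = Path(database_directory) / f"{name}.db"
    shutil.move(str(staged_path), str(destination))
    return destination
```

Note that `shutil.move` falls back to copy-then-delete when the two directories are on different filesystems, so the move is only atomic when staging and database directories share a filesystem.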
To enable WAL mode on the database once it has been saved to the database_directory, include the enable_wal: true option:
plugins:
  datasette-load:
    database_directory: /home/location
    enable_wal: true
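For readers unfamiliar with WAL mode, the following sketch shows what enabling it on a SQLite file looks like using Python's standard `sqlite3` module. The function name `enable_wal` is hypothetical, and this is not necessarily how the plugin implements the option.

```python
import sqlite3


def enable_wal(db_path):
    """Switch the database at db_path to write-ahead logging.

    Returns the journal mode reported by SQLite after the change.
    """
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the resulting journal mode as a single row
        mode = conn.execute("PRAGMA journal_mode=wal").fetchone()[0]
    finally:
        conn.close()
    return mode
```

The journal mode is stored in the database file, so later connections see the database in WAL mode without setting the pragma again.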
Usage
Users and API tokens with the datasette-load permission can visit /-/load. There they can provide the URL of a SQLite database file and the name it should use within Datasette, which triggers a download of that database.
You can assign that permission to the root user by starting Datasette like this:
datasette -s permissions.datasette-load.id root --root
Or with the following configuration in a datasette.yaml file, loaded with datasette -c datasette.yaml:
permissions:
  datasette-load:
    id: root
API tokens with that permission can use this API:
POST /-/load
{"url": "https://s3.amazonaws.com/til.simonwillison.net/tils.db", "name": "tils"}
You can optionally include additional HTTP headers to be used when fetching the URL:
POST /-/load
{
    "url": "https://example.com/db.sqlite",
    "name": "db",
    "headers": {"Authorization": "Bearer XXX"}
}
The first request above tells Datasette to download the SQLite database from the given URL and use it to create (or replace) the /tils database in the Datasette instance.
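A request like the ones above could be built from Python roughly as follows, using only the standard library. This is a sketch under assumptions: `build_load_request`, `base_url` and `token` are hypothetical names, and it assumes the API token is sent in an `Authorization: Bearer` header with a JSON body.

```python
import json
import urllib.request


def build_load_request(base_url, token, url, name, headers=None):
    """Build (but do not send) a POST /-/load request."""
    payload = {"url": url, "name": name}
    if headers:
        # Extra headers for Datasette to use when fetching the URL
        payload["headers"] = headers
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/load",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed: the API token is presented as a bearer token
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen()` would send it. Note that the `headers` argument travels inside the JSON body and is distinct from the headers used to authenticate against Datasette itself.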
That API endpoint returns:
{
    "id": "1D2A2328-199E-4D4D-AF3B-967131ADB795",
    "url": "https://s3.amazonaws.com/til.simonwillison.net/tils.db",
    "name": "tils",
    "done": false,
    "error": null,
    "todo_bytes": 20250624,
    "done_bytes": 0,
    "status_url": "https://blah.datasette/-/load/status/1D2A2328-199E-4D4D-AF3B-967131ADB795"
}
The status_url can be polled for completion. It will return the same JSON format.
When the download has finished the API will return "done": true and either "error": null if it worked or "error": "error description" if something went wrong.
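The polling behaviour described above might be wrapped in a small helper like this one. It is a sketch only: `wait_for_load` and its parameters are hypothetical, and `fetch_status` stands in for whatever code fetches and decodes the JSON from status_url.

```python
import time


def wait_for_load(fetch_status, interval=1.0, max_attempts=60):
    """Call fetch_status() until it reports done; return the final status dict.

    fetch_status is any zero-argument callable returning the decoded JSON
    from status_url. Raises RuntimeError if the job reports an error and
    TimeoutError if it has not finished after max_attempts polls.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status["done"]:
            if status["error"] is not None:
                raise RuntimeError(status["error"])
            return status
        time.sleep(interval)
    raise TimeoutError("load did not finish in time")
```

Taking a callable rather than a URL keeps the helper independent of any particular HTTP client and of how the status request is authenticated.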
Zip support
The URL can point to either a SQLite database file or a zip file containing a SQLite database. If a zip file is provided, the largest file in the archive is extracted and used, after verifying that it is a valid SQLite database. For security, the plugin rejects zip files where the largest file would extract to more than 5x the size of the zip file itself.
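The selection rule and size limit described above can be illustrated with Python's `zipfile` module. This is a sketch of the idea rather than the plugin's code; `pick_database_member` and `max_ratio` are hypothetical names.

```python
import os
import zipfile


def pick_database_member(zip_path, max_ratio=5):
    """Return the ZipInfo of the largest file in the archive.

    Raises ValueError if the archive holds no files, or if the largest file
    would expand to more than max_ratio times the size of the zip itself.
    """
    zip_size = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        members = [info for info in zf.infolist() if not info.is_dir()]
        if not members:
            raise ValueError("zip file contains no files")
        # file_size is the uncompressed size recorded in the archive
        largest = max(members, key=lambda info: info.file_size)
        if largest.file_size > max_ratio * zip_size:
            raise ValueError("largest file is too big relative to the zip")
        return largest
```

Because `file_size` comes from the archive's own metadata, a more defensive implementation might also cap the number of bytes actually written during extraction.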
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-load
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
python -m pytest