
datasette-files


File management for Datasette. Upload, serve, search and manage files through a pluggable storage backend system. Ships with built-in filesystem storage and a plugin hook for adding custom backends (S3, Google Cloud Storage, etc.).

Installation

Install this plugin in the same environment as Datasette.

datasette install datasette-files

Usage

datasette-files manages files through sources — named connections to storage backends. Each source has a slug, a storage type, and backend-specific configuration.

Configuring sources

Define sources in your datasette.yaml (or metadata.yaml) under the datasette-files plugin config:

plugins:
  datasette-files:
    sources:
      my-files:
        storage: filesystem
        config:
          root: /data/uploads

This creates a source called my-files backed by a local directory at /data/uploads. The directory will be created if it doesn't exist.

You can configure multiple sources:

plugins:
  datasette-files:
    sources:
      photos:
        storage: filesystem
        config:
          root: /data/photos
      documents:
        storage: filesystem
        config:
          root: /data/documents

Permissions

All access is denied by default. You must explicitly grant permissions in the permissions: block of your datasette.yaml.

There are four permission actions, each scoped to a source:

Action Description
files-browse Browse, search, view, and download files
files-upload Upload files to a source
files-edit Edit file metadata (e.g. search text)
files-delete Delete files from a source

Grant access to everyone (all sources):

permissions:
  files-browse: true
  files-upload: true

Grant access to a specific user:

permissions:
  files-browse:
    id: alice
  files-upload:
    id: alice

Per-source permissions:

permissions:
  files-browse:
    public-files:
      allow: true
    private-files:
      allow:
        id: alice
  files-upload:
    public-files:
      allow:
        id: alice

Uploading files

Upload a file by sending a POST request with multipart form data to /-/files/upload/{source_slug}:

curl -X POST "http://localhost:8001/-/files/upload/my-files" \
  -F "file=@photo.jpg"

The response includes the file's unique ID and metadata:

{
  "file_id": "df-01j5a3b4c5d6e7f8g9h0jkmnpq",
  "filename": "photo.jpg",
  "content_type": "image/jpeg",
  "size": 48210,
  "url": "/-/files/df-01j5a3b4c5d6e7f8g9h0jkmnpq"
}

File IDs use the format df-{ULID} — the df- prefix makes them instantly recognizable when stored in database columns.
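The same upload from Python, as a minimal sketch using the requests library (the server address and filename are placeholders):

import requests

with open("photo.jpg", "rb") as fp:
    response = requests.post(
        "http://localhost:8001/-/files/upload/my-files",
        files={"file": fp},
    )
response.raise_for_status()
file_id = response.json()["file_id"]  # e.g. "df-01j5a3b4c5d6e7f8g9h0jkmnpq"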

Viewing files

Each file has an HTML info page at /-/files/{file_id} showing its metadata, a preview (for images), and a download link.

Download the file content directly at /-/files/{file_id}/download.

Get file metadata as JSON at /-/files/{file_id}.json.
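For example, a quick Python sketch that fetches a file's metadata and then its content (the file ID and server address are placeholders):

import requests

base = "http://localhost:8001/-/files/df-01j5a3b4c5d6e7f8g9h0jkmnpq"
metadata = requests.get(base + ".json").json()
content = requests.get(base + "/download").content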

Searching files

Visit /-/files/search to search across all files you have permission to browse. The search page supports full-text search over filenames, content types, and custom search text.

The search endpoint is also available as JSON at /-/files/search.json?q=query&source=source-slug.

Each file has an editable search_text field (requires files-edit permission) that is included in the full-text search index. This can be used to add descriptions, tags, or transcriptions to make files more discoverable.
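A sketch of calling the JSON search endpoint from Python (the query and source values are placeholders; the response is printed as-is rather than assuming a particular shape):

import requests

resp = requests.get(
    "http://localhost:8001/-/files/search.json",
    params={"q": "invoice", "source": "documents"},
)
print(resp.json())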

Batch metadata

Fetch metadata for multiple files in a single request:

GET /-/files/batch.json?id=df-abc123&id=df-def456

This returns metadata for all requested files that the current user has permission to browse. The endpoint is used internally by the <datasette-file> web component (see Table cell integration below) to efficiently load file information for table views.
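Since the id parameter repeats, a Python sketch can pass a list of tuples to requests (the IDs below are the placeholder values from the example above):

import requests

resp = requests.get(
    "http://localhost:8001/-/files/batch.json",
    params=[("id", "df-abc123"), ("id", "df-def456")],
)
print(resp.json())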

Listing sources

View all configured sources and their capabilities:

GET /-/files/sources.json

Table cell integration

Any database column containing a df-... file ID will automatically render as a rich file reference in Datasette's table views. The render_cell hook detects file IDs and replaces them with a <datasette-file> web component that displays the filename, content type, and a thumbnail for images.

This works for any text column — store a df-... ID returned from the upload endpoint in a column and it will render as a file link automatically.
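For example, a sketch using sqlite-utils to store an uploaded file's ID (the database, table, and column names are hypothetical):

import sqlite_utils

db = sqlite_utils.Database("mydata.db")
db["documents"].insert(
    {
        "title": "Q3 report",
        "attachment": "df-01j5a3b4c5d6e7f8g9h0jkmnpq",  # ID returned by the upload endpoint
    }
)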

API reference

Method Endpoint Description
GET /-/files/search Search files (HTML)
GET /-/files/search.json?q=&source= Search files (JSON)
GET /-/files/sources.json List configured sources
GET /-/files/batch.json?id=df-...&id=df-... Bulk file metadata
POST /-/files/upload/{source_slug} Upload a file (multipart)
GET /-/files/{file_id} File info page (HTML)
GET /-/files/{file_id}.json File metadata (JSON)
GET /-/files/{file_id}/download Download file content

Plugin hook: register_files_storage_types

datasette-files uses a plugin hook to allow other Datasette plugins to provide custom storage backends. This is how you would build plugins like datasette-files-s3 or datasette-files-gcs.

How it works

The hook is called at startup. Your plugin returns a list of Storage subclasses (not instances). datasette-files handles instantiation, configuration, and lifecycle management.

from datasette import hookimpl

@hookimpl
def register_files_storage_types(datasette):
    from my_plugin.storage import S3Storage
    return [S3Storage]

When a source in datasette.yaml references your storage type, datasette-files will:

  1. Instantiate your class (calling S3Storage())
  2. Call await storage.configure(config, get_secret) with the source's config dict
  3. Use your storage instance for all file operations on that source

The Storage base class

Import the base class and supporting dataclasses from datasette_files.base:

from datasette_files.base import Storage, StorageCapabilities, FileMetadata

StorageCapabilities

A dataclass declaring what your storage backend supports:

@dataclass
class StorageCapabilities:
    can_upload: bool = False
    can_delete: bool = False
    can_list: bool = False
    can_generate_signed_urls: bool = False
    can_generate_thumbnails: bool = False
    requires_proxy_download: bool = False
    max_file_size: Optional[int] = None

  • can_upload: The backend can receive file uploads via receive_upload()
  • can_delete: The backend can delete files via delete_file()
  • can_list: The backend can list files via list_files()
  • can_generate_signed_urls: The backend can produce expiring download URLs via download_url() — if True, file downloads will use a 302 redirect to the signed URL instead of proxying content through Datasette
  • can_generate_thumbnails: The backend can produce thumbnail URLs via thumbnail_url()
  • requires_proxy_download: File content must be proxied through Datasette (e.g. filesystem storage) rather than redirecting to an external URL
  • max_file_size: Optional maximum file size in bytes

FileMetadata

Returned by several storage methods to describe a file:

@dataclass
class FileMetadata:
    path: str                              # Path within the storage backend
    filename: str                          # Human-readable filename
    content_type: Optional[str] = None     # MIME type
    content_hash: Optional[str] = None     # e.g. "sha256:abcdef..."
    size: Optional[int] = None             # Size in bytes
    width: Optional[int] = None            # Image width in pixels
    height: Optional[int] = None           # Image height in pixels
    created_at: Optional[str] = None
    metadata: dict = field(default_factory=dict)

Required methods

Every Storage subclass must implement these:

storage_type (property) — A unique string identifier for this storage type, used in source configuration. This is how datasette-files matches a source's storage: s3 to your class.

@property
def storage_type(self) -> str:
    return "s3"

capabilities (property) — Return a StorageCapabilities instance declaring what this backend supports.

@property
def capabilities(self) -> StorageCapabilities:
    return StorageCapabilities(
        can_upload=True,
        can_delete=True,
        can_generate_signed_urls=True,
    )

configure(config, get_secret) — Called once at startup with the source's config dict from datasette.yaml and a get_secret callable for retrieving secrets from datasette-secrets.

async def configure(self, config: dict, get_secret) -> None:
    self.bucket = config["bucket"]
    self.prefix = config.get("prefix", "")
    self.region = config.get("region", "us-east-1")

get_file_metadata(path) — Return a FileMetadata for the given path, or None if the file doesn't exist.

async def get_file_metadata(self, path: str) -> Optional[FileMetadata]:
    # Check if the file exists in your backend and return its metadata
    ...

read_file(path) — Return the full content of a file as bytes. Raise FileNotFoundError if missing.

async def read_file(self, path: str) -> bytes:
    # Read and return the file content
    ...

Optional methods

Override these based on the capabilities you declared:

receive_upload(path, content, content_type) — Store file content. Return a FileMetadata with at least the content_hash and size populated. Required if can_upload is True.

async def receive_upload(self, path: str, content: bytes, content_type: str) -> FileMetadata:
    # Store the file and return metadata
    ...

delete_file(path) — Delete a file. Required if can_delete is True.

list_files(prefix, cursor, limit) — List files, returning (files, next_cursor). Required if can_list is True.

download_url(path, expires_in) — Return a signed/expiring download URL. Required if can_generate_signed_urls is True.

stream_file(path) — Yield file content in chunks as an async iterator. The default implementation reads the entire file with read_file() and yields it as a single chunk.
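A minimal sketch of a chunked override for a filesystem-style backend, where _full_path() is a hypothetical helper that maps a storage path to a location on disk:

from typing import AsyncIterator

async def stream_file(self, path: str) -> AsyncIterator[bytes]:
    # Yield the file in 64 KB chunks instead of loading it all at once
    with open(self._full_path(path), "rb") as fp:
        while chunk := fp.read(65536):
            yield chunk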

thumbnail_url(path, width, height) — Return a URL for a thumbnail of the file, or None.

Full example: S3 storage plugin

Here's a complete example of what a datasette-files-s3 plugin would look like:

# datasette_files_s3/__init__.py
from datasette import hookimpl
from datasette_files.base import Storage, StorageCapabilities, FileMetadata
import boto3
import hashlib
from typing import Optional


class S3Storage(Storage):
    storage_type = "s3"
    capabilities = StorageCapabilities(
        can_upload=True,
        can_delete=True,
        can_list=True,
        can_generate_signed_urls=True,
        requires_proxy_download=False,
    )

    async def configure(self, config: dict, get_secret) -> None:
        self.bucket = config["bucket"]
        self.prefix = config.get("prefix", "")
        self.region = config.get("region", "us-east-1")
        self.client = boto3.client("s3", region_name=self.region)

    def _key(self, path: str) -> str:
        return f"{self.prefix}{path}" if self.prefix else path

    async def get_file_metadata(self, path: str) -> Optional[FileMetadata]:
        try:
            resp = self.client.head_object(
                Bucket=self.bucket, Key=self._key(path)
            )
            return FileMetadata(
                path=path,
                filename=path.split("/")[-1],
                content_type=resp.get("ContentType"),
                size=resp.get("ContentLength"),
            )
        except self.client.exceptions.ClientError:
            return None

    async def read_file(self, path: str) -> bytes:
        resp = self.client.get_object(
            Bucket=self.bucket, Key=self._key(path)
        )
        return resp["Body"].read()

    async def receive_upload(
        self, path: str, content: bytes, content_type: str
    ) -> FileMetadata:
        self.client.put_object(
            Bucket=self.bucket,
            Key=self._key(path),
            Body=content,
            ContentType=content_type,
        )
        content_hash = "sha256:" + hashlib.sha256(content).hexdigest()
        return FileMetadata(
            path=path,
            filename=path.split("/")[-1],
            content_type=content_type,
            content_hash=content_hash,
            size=len(content),
        )

    async def download_url(self, path: str, expires_in: int = 300) -> str:
        return self.client.generate_presigned_url(
            "get_object",
            Params={"Bucket": self.bucket, "Key": self._key(path)},
            ExpiresIn=expires_in,
        )

    async def delete_file(self, path: str) -> None:
        self.client.delete_object(
            Bucket=self.bucket, Key=self._key(path)
        )

    async def list_files(
        self, prefix: str = "", cursor: Optional[str] = None, limit: int = 100
    ) -> tuple[list[FileMetadata], Optional[str]]:
        kwargs = {
            "Bucket": self.bucket,
            "Prefix": self._key(prefix),
            "MaxKeys": limit,
        }
        if cursor:
            kwargs["ContinuationToken"] = cursor
        resp = self.client.list_objects_v2(**kwargs)
        files = [
            FileMetadata(
                path=obj["Key"].removeprefix(self.prefix),
                filename=obj["Key"].split("/")[-1],
                size=obj["Size"],
            )
            for obj in resp.get("Contents", [])
        ]
        next_cursor = resp.get("NextContinuationToken")
        return files, next_cursor


@hookimpl
def register_files_storage_types(datasette):
    return [S3Storage]

The plugin's pyproject.toml would register itself as a Datasette plugin:

[project.entry-points.datasette]
files_s3 = "datasette_files_s3"

Then configure it in datasette.yaml:

plugins:
  datasette-files:
    sources:
      product-images:
        storage: s3
        config:
          bucket: my-photos-bucket
          prefix: "uploads/"
          region: us-west-2

Built-in filesystem storage reference

The built-in FilesystemStorage stores files on the local filesystem. It supports upload, delete, and listing but does not support signed URLs — file downloads are proxied through Datasette.

Configuration options:

Key Required Description
root Yes Absolute path to the directory where files are stored
max_file_size No Maximum upload size in bytes

Capabilities:

Capability Value
can_upload True
can_delete True
can_list True
can_generate_signed_urls False
requires_proxy_download True
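For example, a filesystem source with a hypothetical 10 MB upload limit:

plugins:
  datasette-files:
    sources:
      uploads:
        storage: filesystem
        config:
          root: /data/uploads
          max_file_size: 10485760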

Development

To set up this plugin locally, first check out the code, then run the tests with uv:

cd datasette-files
uv run pytest

To run a test server during development:

./dev-server.sh
