
S3 Storage Plugin

The S3 storage plugin allows Indico to store materials and other files on Amazon S3 or an S3-compatible object storage service (Ceph S3, MinIO, etc.) instead of the local file system.

Warning

This plugin has only been tested with Ceph S3 so far, so if you encounter any problems using, e.g., the actual Amazon S3, please let us know!

It is currently used in production on multiple Indico instances, so we believe it is stable. Be advised, however, that we do not provide a way to move files back from S3 to local storage (though it would of course be possible to write a script for this).

Changelog

3.3

  • Support (and require) Python 3.12
  • Fix incorrect download filename formatting when using signed URLs or nginx proxying

3.2.2

  • Support Python 3.11

3.2.1

  • Stop using deprecated URL utils from werkzeug

3.2

  • Update translations

3.1.2

  • No technical changes, just fixing a mistake in the README change from 3.1.1

3.1.1

  • No technical changes, just adding the missing README to PyPI and updating the nginx config snippet to correctly work with the changes from 3.1 (avoiding an nginx bug)

3.1

  • Fix "invalid signature" S3 error in some cases when using proxy=nginx for downloads

3.0

  • Initial release for Indico 3.0

Configuration

Configuration is done using the STORAGE_BACKENDS entry of indico.conf; add a new key with a name of your choice (e.g. s3) and specify the details of the S3 storage in the value.

For a single bucket, all you need to specify is the bucket name:

STORAGE_BACKENDS = {
    # ...
    's3': 's3:bucket=indico-test'
}

If you want to dynamically create buckets for each year, month or week, you can do this as well. A periodic task automatically creates each new bucket some time before it becomes active.

STORAGE_BACKENDS = {
    # ...
    's3': 's3-dynamic:bucket_template=indico-test-<year>,bucket_secret=somethingrandom'
}

For authentication and general S3 config (e.g. to use subdomains for bucket names), the preferred way is to use the standard files, i.e. ~/.aws/credentials and ~/.aws/config, but you can also specify all settings in the storage backend entry like this:

STORAGE_BACKENDS = {
    # ...
    's3': 's3:bucket=my-indico-test-bucket,access_key=12345,secret_key=topsecret'
}
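
For reference, here is a minimal sketch of those standard AWS files; the section names and all values are placeholders, not something the plugin mandates:

# ~/.aws/credentials
[default]
aws_access_key_id = 12345
aws_secret_access_key = topsecret

# ~/.aws/config
[default]
s3 =
    addressing_style = virtual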

Available config options

Multiple options can be specified by separating them with commas; a combined example follows this list. These options are available:

  • host -- the host where S3 is running, in case you use a custom S3-compatible storage.
  • profile -- the name of a specific S3 profile (used in the ~/.aws/ config files)
  • access_key -- the S3 access key; prefer setting it in ~/.aws/credentials instead
  • secret_key -- the S3 secret key; prefer setting it in ~/.aws/credentials instead
  • addressing_style -- the S3 addressing style (virtual or path); prefer setting it in ~/.aws/config instead
  • bucket_policy_file -- the path to a file containing an S3 bucket policy; this only applies to new buckets created by this plugin
  • bucket_versioning -- whether to enable S3 versioning on the bucket; this only applies to new buckets created by this plugin
  • proxy -- whether to proxy downloads. If set to true, each file is downloaded into memory and then sent to the client by Indico; this may have performance implications for large files. A better option is setting it to nginx, which requires some extra configuration (see below) but lets nginx handle proxying downloads transparently. If you do not use proxying at all, downloading a file redirects the user to a temporary S3 URL that is valid for a few minutes. Generally this works fine, but people may accidentally copy (and forward) these temporary links, which expire quickly.
  • meta -- a custom string that is included in the bucket info API of the plugin. You generally do not need this unless you are using custom scripts accessing that API and want to include some extra data there.
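
As a combined example, a hypothetical backend entry for a self-hosted S3-compatible service could look like this (the host and bucket names are made up):

STORAGE_BACKENDS = {
    # ...
    's3': 's3:host=s3.example.com,bucket=indico-test,addressing_style=path,proxy=nginx'
}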

When using the s3 backend (single static bucket), the following extra option is available:

  • bucket (required) -- the name of the bucket

When using the s3-dynamic backend, the following extra options are available:

  • bucket_template (required) -- a template specifying how the bucket names should be generated. Needs to contain at least one of <year>, <month> or <week>
  • bucket_secret (required unless set in aws config) -- a random secret used to make bucket names unguessable (as bucket names need to be globally unique on S3); may also be specified as indico_bucket_secret in ~/.aws/credentials
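
If you prefer to keep the secret out of indico.conf, here is a sketch of the credentials-file variant (all values are placeholders):

# ~/.aws/credentials
[default]
aws_access_key_id = 12345
aws_secret_access_key = topsecret
indico_bucket_secret = somethingrandom

STORAGE_BACKENDS = {
    # ...
    's3': 's3-dynamic:bucket_template=indico-test-<year>'
}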

Proxying downloads through nginx

If you want to use the proxy=nginx option, which avoids redirecting users to the actual S3 URL for file downloads without the extra load and memory usage of proxy=on (downloading a possibly large attachment into memory first), you need to add the following to the server block in your nginx config that is responsible for Indico.

location ~ ^/\.xsf/s3/(?<download_protocol>https?)/(?<download_host>[^/]+)/(?<download_path>.+)$ {
        internal;
        set $download_url $download_protocol://$download_host/$download_path;
        resolver YOUR_RESOLVER;
        proxy_set_header Host $download_host;
        proxy_set_header Authorization '';
        proxy_set_header Cookie '';
        proxy_hide_header X-Amz-Request-Id;
        proxy_hide_header Bucket;
        proxy_max_temp_file_size 0;
        proxy_intercept_errors on;
        error_page 301 302 307 = @s3_redirect;
        proxy_pass $download_url$is_args$args;
}

location @s3_redirect {
        internal;
        resolver YOUR_RESOLVER;
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
}

Replace YOUR_RESOLVER with the hostname or IP address of a nameserver nginx can use to resolve the S3 hosts. You may find a suitable IP in your /etc/resolv.conf or by asking someone from your IT department. If you are running a local caching nameserver, localhost would work as well.
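
For example, with a local caching nameserver the directive would simply be:

resolver 127.0.0.1;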

If you are interested in how this works, check this blog post on which this config is based.

Migration of existing data

The plugin comes with a migration tool, accessible through the indico s3 migrate CLI. It can be used without downtime of your service, as it consists of two steps: first copying the files, then updating the references in your database. Please have a look at its --help output if you want to use it; we have not had time to write detailed documentation for it yet.
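
For example, to see the available steps and options (the exact output depends on the plugin version):

indico s3 migrate --help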

The step that updates the database can be reversed in case you want to switch back from S3 to local storage for whatever reason, but it will only affect migrated files: any file stored directly on S3 later (and thus not present on the local file system) will not be reverted. You would need to write your own script to download those files from S3.
