
S3 file storage support for Invenio.

Project description

oarepo-s3


This package, built on top of the invenio-s3 library, offers integration with any AWS S3 REST API compatible object storage backend. Beyond what invenio-s3 provides, it minimizes processing of file requests on the Invenio server side and uses direct access to the S3 storage backend wherever possible (neither multipart file uploads nor downloads pass through the Invenio server itself).

Installation

To start using this library:

  1. Install the following packages in your project's venv:

    git clone https://github.com/CESNET/s3-client
    cd s3-client
    poetry install
    pip install oarepo-s3
    
  2. Create an S3 account and bucket on your S3 storage provider of choice.

  3. Put the S3 access configuration into your Invenio server config (e.g. invenio.cfg):

    INVENIO_S3_TENANT=None
    INVENIO_S3_ENDPOINT_URL='https://s3.example.org'
    INVENIO_S3_ACCESS_KEY_ID='your_access_key'
    INVENIO_S3_SECRET_ACCESS_KEY='your_secret_key'
    
  4. Create an Invenio files location targeting the S3 bucket:

    invenio files location --default 'default-s3' s3://oarepo-bucket
    

Usage

To use this library as an Invenio Files storage in your project, put the following into your Invenio server config:

FILES_REST_STORAGE_FACTORY = 'oarepo_s3.storage.s3_storage_factory'

This storage overrides the save() method of the invenio-s3 storage and adds support for direct S3 multipart uploads. All other functionality is handled by the underlying invenio-s3 storage library.
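Regular stream uploads are unaffected by this; only MultipartUpload instances (described in the next section) take the direct path. A minimal sketch, assuming a record instance with a files accessor as in the example below:

from io import BytesIO

# Ordinary stream uploads still go through the underlying invenio-s3
# storage; only MultipartUpload instances take the direct S3 path.
record.files['plain.txt'] = BytesIO(b'hello world')  # `record` is an assumed Record instance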

Direct multipart upload

To create a direct multipart upload to the S3 backend, provide an instance of MultipartUpload instead of the usual stream when assigning a file to a record, e.g.:

from oarepo_s3.api import MultipartUpload
files = record.files  # FilesIterator of the Record instance
mu = MultipartUpload(key='filename',
                     base_uri=files.bucket.location.uri,
                     expires=3600,
                     size=1024*1024*1000)  # total file size

# Assigning a MultipartUpload to the FilesIterator here will
# trigger the multipart upload creation on the S3 storage backend.
files['test'] = mu

This will configure the passed-in MultipartUpload instance with all the information an uploader client needs to process and complete the upload. The multipart upload session configuration can be found under the MultipartUpload.session field.
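For example, to hand the session details over to an uploader client (a sketch, assuming the session dictionary is JSON-serializable):

import json

# Serialize the upload session for the uploader client; the exact
# fields inside `session` depend on the storage configuration.
print(json.dumps(mu.session, indent=2, default=str))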

To complete or abort an ongoing multipart upload after an uploader client has finished uploading all parts to the S3 backend, register the resources provided by oarepo_s3.views in the app blueprints (a client-side sketch of the part-upload step follows the list):

Create multipart upload

  • files/<key>/?multipart=True

Create presigned URL for part upload

  • files/<key>/<upload_id>/<part_num>/presigned

List uploaded parts for a given multipart upload

  • files/<key>/<upload_id>/parts

Complete a multipart upload

  • files/<key>/<upload_id>/complete

Abort an ongoing multipart upload

  • files/<key>/<upload_id>/abort
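On the client side, each part is uploaded directly to S3 through its presigned URL, and the ETag response header returned by S3 is collected for the final completion request. A sketch of the per-part step using the requests library; the base URL prefix and the url response field name are assumptions of this sketch:

import requests

def upload_part(base, key, upload_id, part_num, data):
    """Upload a single part directly to S3 and return its ETag record."""
    # Ask the server for a presigned URL for this part.
    resp = requests.get(f'{base}/files/{key}/{upload_id}/{part_num}/presigned')
    resp.raise_for_status()
    presigned_url = resp.json()['url']  # response field name is an assumption

    # PUT the part payload straight to the S3 backend, bypassing Invenio.
    s3_resp = requests.put(presigned_url, data=data)
    s3_resp.raise_for_status()

    # S3 returns the part's ETag in a response header; it is needed
    # later to complete the multipart upload.
    return {'ETag': s3_resp.headers['ETag'], 'PartNumber': part_num}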

OARepo Records Draft integration

This library works best together with the oarepo-records-draft library. When it is integrated into draft endpoints, there is no need to manually register the completion resources on blueprints, and multipart upload creation is handled automatically.

To set up the drafts integration, run the following:

pip install oarepo-records-draft oarepo-s3

and configure the draft endpoints according to that library's README. Doing so will auto-register the following file API actions on the draft endpoints (an end-to-end client sketch follows the list):

Create multipart upload

POST /draft/records/<pid>/files/?multipart=True
{
  "key": "filename.txt",
  "ctype": "text/plain",
  "size": 1024
}

Get presigned URL for part upload

GET /draft/records/<pid>/files/<key>/<upload_id>/<part_num>/presigned

List uploaded parts for a given multipart upload

GET /draft/records/<pid>/files/<key>/<upload_id>/parts

Complete multipart upload

POST /draft/records/<pid>/files/<key>/<upload_id>/complete
{
  "parts": [{"ETag": <uploaded_part_etag>, "PartNumber": <part_num>},...]
}

Abort multipart upload

DELETE /draft/records/<pid>/files/<key>/<upload_id>/abort
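An end-to-end sketch of this flow against the draft endpoints (the base URL, pid, chunking, and response field names are assumptions; upload_part() is the helper sketched above):

import requests

BASE = 'https://repo.example.org/api'  # assumed server base URL
PID = 'abcde-12345'                    # assumed draft record pid

# 1. Create the multipart upload.
resp = requests.post(
    f'{BASE}/draft/records/{PID}/files/?multipart=True',
    json={'key': 'filename.txt', 'ctype': 'text/plain', 'size': 1024 * 1024 * 1000},
)
resp.raise_for_status()
upload_id = resp.json()['upload_id']  # response field name is an assumption

# 2. Upload the parts directly to S3 and collect their ETags
#    (`chunks` is an assumed iterable of part payloads).
parts = [upload_part(f'{BASE}/draft/records/{PID}', 'filename.txt',
                     upload_id, num, chunk)
         for num, chunk in enumerate(chunks, start=1)]

# 3. Complete the multipart upload with the collected parts.
requests.post(
    f'{BASE}/draft/records/{PID}/files/filename.txt/{upload_id}/complete',
    json={'parts': parts},
).raise_for_status()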

Tasks

This library provides a Celery task that looks up expired ongoing file uploads that can no longer be completed and removes them from the associated record's bucket. To use this task in your Celery cron schedule, configure it in your Invenio server config like this:

from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    'cleanup_expired_multipart_uploads': {
        'task': 'oarepo_s3.tasks.cleanup_expired_multipart_uploads',
        'schedule': timedelta(minutes=60),
    },
    ...
}
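The task can also be triggered on demand, assuming a running Celery worker:

from oarepo_s3.tasks import cleanup_expired_multipart_uploads

# Queue a one-off cleanup run outside the beat schedule.
cleanup_expired_multipart_uploads.delay()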

Download files


Source Distribution

oarepo-s3-1.4.5.tar.gz (23.9 kB)

Built Distribution

oarepo_s3-1.4.5-py3-none-any.whl (30.4 kB)

File details

Details for the file oarepo-s3-1.4.5.tar.gz.

File metadata

  • Download URL: oarepo-s3-1.4.5.tar.gz
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for oarepo-s3-1.4.5.tar.gz
  • SHA256: 1bc8fcebec61e9982ee763ae047c6867f4305136f8bd5f36e0bbfff4d7a23167
  • MD5: b7f776c2ebb319d02df626709dcedd41
  • BLAKE2b-256: ea6340d4bbfa79b69244b6cd6f8df44bf5055e3ce5f6c479e6410d0b840843b4


File details

Details for the file oarepo_s3-1.4.5-py3-none-any.whl.

File metadata

  • Download URL: oarepo_s3-1.4.5-py3-none-any.whl
  • Size: 30.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for oarepo_s3-1.4.5-py3-none-any.whl
  • SHA256: f677c9b5caafb056b4b48662ef8be5c8f556816f7c94c00bb60673839bfd7c1a
  • MD5: e95f456f6a69dc7ca6373e20b5c260ac
  • BLAKE2b-256: 00e3ca0b05676e0acd87b6f7f078061ddb64dc8779b26a900f90e83438c5a2e8

