S3 file storage support for Invenio.
oarepo-s3
This package, built on top of the invenio-s3 library, offers integration with any object storage backend compatible with the AWS S3 REST API. Compared to invenio-s3, it tries to minimize the processing of file requests on the Invenio server side and uses direct access to the S3 storage backend as much as possible (neither multipart file uploads nor downloads are processed by the Invenio server itself).
Installation
To start using this library:
- Install the following packages in your project's venv:
  git clone https://github.com/CESNET/s3-client
  cd s3-client
  poetry install
  pip install oarepo-s3
- Create an S3 account and bucket on your S3 storage provider of choice.
- Put the S3 access configuration into your Invenio server config (e.g. invenio.cfg):
  INVENIO_S3_TENANT=None
  INVENIO_S3_ENDPOINT_URL='https://s3.example.org'
  INVENIO_S3_ACCESS_KEY_ID='your_access_key'
  INVENIO_S3_SECRET_ACCESS_KEY='your_secret_key'
- Create an Invenio files location targeting the S3 bucket:
  invenio files location --default 'default-s3' s3://oarepo-bucket
Usage
To use this library as an Invenio Files storage in your projects, put the following into your Invenio server config:
FILES_REST_STORAGE_FACTORY = 'oarepo_s3.storage.s3_storage_factory'
This storage overrides the save() method of the InvenioS3 storage and adds support for direct S3 multipart uploads. All other functionality is handled by the underlying InvenioS3 storage library.
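The save() override can be pictured as a subclass that intercepts only that one method and defers to its parent for everything else. The sketch below is purely illustrative: BaseS3Storage, MultipartAwareStorage and the simplified MultipartUpload dataclass are hypothetical stand-ins, not the actual invenio-s3 or oarepo-s3 classes, and their method signatures are assumptions made for this example.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Union


@dataclass
class MultipartUpload:
    """Hypothetical, simplified stand-in for oarepo_s3.api.MultipartUpload."""
    key: str
    expires: int
    size: int
    session: dict = field(default_factory=dict)


class BaseS3Storage:
    """Hypothetical stand-in for the underlying InvenioS3 storage."""

    def __init__(self, base_uri: str) -> None:
        self.base_uri = base_uri

    def save(self, stream: BinaryIO) -> str:
        # Regular path: the server reads the file data from the stream itself.
        data = stream.read()
        return f"stored {len(data)} bytes at {self.base_uri}"

    def delete(self) -> str:
        # Stands in for every operation the subclass does not override.
        return f"deleted {self.base_uri}"


class MultipartAwareStorage(BaseS3Storage):
    """Overrides only save(); everything else is inherited unchanged."""

    def save(self, incoming: Union[BinaryIO, MultipartUpload]) -> str:
        if isinstance(incoming, MultipartUpload):
            # Hypothetical: attach the session details an uploader client
            # would need, without any file data passing through the server.
            incoming.session = {
                "base_uri": self.base_uri,
                "key": incoming.key,
                "expires": incoming.expires,
                "size": incoming.size,
            }
            return f"multipart session created for {incoming.key}"
        # Any ordinary stream goes through the parent implementation.
        return super().save(incoming)


if __name__ == "__main__":
    storage = MultipartAwareStorage("s3://oarepo-bucket/abc")
    print(storage.save(io.BytesIO(b"hello")))
    mu = MultipartUpload(key="filename", expires=3600, size=1024)
    print(storage.save(mu), mu.session)
```

Keeping the override this narrow means that downloads, deletes and every other storage operation keep the parent's behaviour without any extra code.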
Direct multipart upload
To create a direct multipart upload to the S3 backend, provide an instance of MultipartUpload instead of the usual stream when assigning a file to a record, e.g.:
from oarepo_s3.api import MultipartUpload

files = record.files  # the record's FilesIterator
mu = MultipartUpload(key='filename',
                     base_uri=files.bucket.location.uri,
                     expires=3600,
                     size=1024*1024*1000)  # total file size
# Assigning a MultipartUpload to the FilesIterator here will
# trigger the multipart upload creation on the S3 storage backend.
files['test'] = mu
This configures the passed-in MultipartUpload instance with all the information any uploader client needs to process and complete the upload. The multipart upload session configuration can be found in the MultipartUpload.session field.
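Before requesting presigned URLs, an uploader client has to split the file into numbered parts. The sketch below computes such a layout under the general Amazon S3 multipart limits (every part except the last must be at least 5 MiB, and an upload may have at most 10,000 parts), assuming, as the 1024*1024*1000 value above suggests, that size is given in bytes. The function name plan_parts() and the 5 MiB default are choices made for this illustration; the part size oarepo-s3 actually expects may be defined differently in MultipartUpload.session.

```python
import math
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last one
MAX_PARTS = 10_000       # S3 maximum number of parts per multipart upload


def plan_parts(total_size: int,
               preferred_part_size: int = MIN_PART_SIZE) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering total_size bytes.

    Part numbers start at 1, as in the S3 API.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, math.ceil(total_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

For the 1000 MiB example above, plan_parts(1024*1024*1000) yields 200 parts of 5 MiB each; each tuple tells the client which byte range to send to the presigned URL of that part number.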
To be able to complete or abort an ongoing multipart upload after an uploader client finishes uploading all the parts to the S3 backend, register the resources provided by oarepo_s3.views in the app blueprints:
- Create multipart upload: files/<key>/?multipart=True
- Create presigned URL for part upload: files/<key>/<upload_id>/<part_num>/presigned
- List uploaded parts for a given multipart upload: files/<key>/<upload_id>/parts
- Complete a multipart upload: files/<key>/<upload_id>/complete
- Abort an ongoing multipart upload: files/<key>/<upload_id>/abort
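The resource paths listed above differ only in the identifiers they carry, so a client can build them in one place. The helper below is a hypothetical sketch: multipart_paths() and its prefix argument (wherever your blueprint is mounted) are assumptions, and percent-encoding the key and upload id is a choice of this example, not something the library is documented to require.

```python
from urllib.parse import quote


def multipart_paths(prefix: str, key: str, upload_id: str, part_num: int) -> dict:
    """Build the multipart resource paths for one file key (illustrative only)."""
    base = f"{prefix.rstrip('/')}/files/{quote(key, safe='')}"
    upload = f"{base}/{quote(upload_id, safe='')}"
    return {
        # Creation does not need an upload id yet.
        "create": f"{base}/?multipart=True",
        # One presigned URL is requested per part number.
        "presigned": f"{upload}/{part_num}/presigned",
        "parts": f"{upload}/parts",
        "complete": f"{upload}/complete",
        "abort": f"{upload}/abort",
    }
```

For example, multipart_paths("/api/records/1", "my file.txt", "abc123", 3)["presigned"] gives "/api/records/1/files/my%20file.txt/abc123/3/presigned".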
OARepo Records Draft integration
This library works best together with the oarepo-records-draft library. When it is integrated into draft endpoints, there is no need to register the completion resources in the blueprints manually, and multipart upload creation is handled automatically as well.
To set up the drafts integration, run:
pip install oarepo-records-draft oarepo-s3
and configure the draft endpoints according to that library's README. This automatically registers the following file API actions on the draft endpoints:
- Create multipart upload
  POST /draft/records/<pid>/files/?multipart=True
  {
    "key": "filename.txt",
    "ctype": "text/plain",
    "size": 1024
  }
- Get presigned URL for part upload
  GET /draft/records/<pid>/files/<key>/<upload_id>/<part_num>/presigned
- List uploaded parts for a given multipart upload
  GET /draft/records/<pid>/files/<key>/<upload_id>/parts
- Complete multipart upload
  POST /draft/records/<pid>/files/<key>/<upload_id>/complete
  {
    "parts": [{"ETag": <uploaded_part_etag>, "PartNumber": <part_num>},...]
  }
- Abort multipart upload
  DELETE /draft/records/<pid>/files/<key>/<upload_id>/abort
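The complete action expects the list of uploaded parts together with their ETags. S3 requires the parts in a multipart completion request to be listed in ascending part-number order, so a client that uploads parts concurrently should sort them before sending. The minimal sketch below builds the request body in the format shown above; the function name completion_body() and its input (a mapping from part number to the ETag returned for that part's upload) are assumptions of this example.

```python
import json
from typing import Mapping


def completion_body(etags: Mapping[int, str]) -> str:
    """Serialize uploaded parts into the JSON body for the .../complete call."""
    if not etags:
        raise ValueError("at least one uploaded part is required")
    # Sort by part number, since parts may finish uploading in any order.
    parts = [
        {"ETag": etag, "PartNumber": number}
        for number, etag in sorted(etags.items())
    ]
    return json.dumps({"parts": parts})
```

ETag values returned by S3 usually include surrounding double quotes; the sketch passes them through unchanged.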
Tasks
This library provides a task that looks up expired ongoing file uploads that can no longer be completed and removes them from the associated record's bucket. To use this task in your Celery beat schedule, configure it in your Invenio server config like this:
from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    'cleanup_expired_multipart_uploads': {
        'task': 'oarepo_s3.tasks.cleanup_expired_multipart_uploads',
        'schedule': timedelta(minutes=60),
    },
    ...
}
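Conceptually, each run of the cleanup task has to decide which upload sessions are past their expiration. The sketch below shows only that decision; the PendingUpload record, its fields and find_expired() are hypothetical and do not reflect how oarepo-s3 actually stores or queries ongoing uploads, nor how it removes them from the bucket.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class PendingUpload:
    """Hypothetical record of an ongoing multipart upload."""
    key: str
    upload_id: str
    created: datetime  # timezone-aware creation time
    expires: int       # lifetime in seconds, like MultipartUpload(expires=...)

    @property
    def expires_at(self) -> datetime:
        return self.created + timedelta(seconds=self.expires)


def find_expired(uploads: Iterable[PendingUpload],
                 now: Optional[datetime] = None) -> List[PendingUpload]:
    """Return the uploads whose expiration time has already passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [u for u in uploads if u.expires_at <= now]
```

With the hourly schedule configured above, an upload created with expires=3600 would be picked up by the first run after its hour has elapsed.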
File details
Details for the file oarepo-s3-1.4.5.tar.gz.
File metadata
- Download URL: oarepo-s3-1.4.5.tar.gz
- Upload date:
- Size: 23.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1bc8fcebec61e9982ee763ae047c6867f4305136f8bd5f36e0bbfff4d7a23167
MD5 | b7f776c2ebb319d02df626709dcedd41
BLAKE2b-256 | ea6340d4bbfa79b69244b6cd6f8df44bf5055e3ce5f6c479e6410d0b840843b4
File details
Details for the file oarepo_s3-1.4.5-py3-none-any.whl.
File metadata
- Download URL: oarepo_s3-1.4.5-py3-none-any.whl
- Upload date:
- Size: 30.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | f677c9b5caafb056b4b48662ef8be5c8f556816f7c94c00bb60673839bfd7c1a
MD5 | e95f456f6a69dc7ca6373e20b5c260ac
BLAKE2b-256 | 00e3ca0b05676e0acd87b6f7f078061ddb64dc8779b26a900f90e83438c5a2e8