
Manages data storage for DCOR



This plugin manages how data are stored in DCOR. There are two types of files in DCOR:

  1. Resources uploaded by users, imported from figshare, or imported from a data archive

  2. Ancillary files that are generated upon resource creation, such as condensed DC data and preview images (see ckanext-dc_view).

This plugin implements:

  • A background job that uploads resources to S3 in after_resource_create if they were previously uploaded via the legacy upload route. This is useful for legacy setups that do not use S3 object storage by default.

  • CLI for importing datasets from figshare: existing datasets on figshare are parsed, CKAN datasets are created, and their resources are uploaded to S3 object storage. Running the following command creates the “figshare-import” organization and adds the datasets listed in figshare_dois.txt to CKAN:

    ckan dcor-import-figshare --limit 2
  • CLI for running all background jobs (migration to S3):

    ckan run-jobs-dcor-depot
  • CLI for appending a resource to a dataset:

    ckan append-resource /path/to/file dataset_id --delete-source
  • CLI for migrating data from block storage to an S3-compatible object storage service. For this, the following configuration keys must be specified in the ckan.ini file:

    dcor_object_store.access_key_id = ACCESS_KEY_ID
    dcor_object_store.secret_access_key = SECRET_ACCESS_KEY
    dcor_object_store.endpoint_url = S3_ENDPOINT_URL
    dcor_object_store.ssl_verify = true
    # By default, the bucket name is derived from the circle ID. Resources
    # are stored under the "RES/OUR/CEID-SCHEME" in that bucket.
    dcor_object_store.bucket_name = circle-{organization_id}

    Usage:

    ckan dcor-migrate-resources-to-object-store --modified-days 2 --delete-after-migration --verify-checksum
  • CLI for listing all S3 objects for a dataset:

    ckan dcor-list-s3-objects-for-dataset c7a98a04-4e0a-98a7-fb0b-eca379d1f219
  • CLI for listing all resources:

    ckan list-all-resources
  • CLI for pruning stale multipart uploads:

    ckan dcor-prune-stale-multipart-uploads --initiated-before-days 5 --dry-run
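
The pruning command above targets multipart uploads that were started but never completed. As a rough illustration of the selection step only, the following sketch filters upload entries by age. It assumes entries shaped like the items of the `Uploads` list returned by boto3's `list_multipart_uploads` (`Key`, `UploadId`, `Initiated`). The function names `select_stale_uploads` and `prune` are hypothetical and not part of this plugin, whose actual implementation may differ.

```python
from datetime import datetime, timedelta, timezone


def select_stale_uploads(uploads, initiated_before_days, now=None):
    """Return the uploads that were initiated more than N days ago.

    `uploads` is a list of dicts with the keys "Key", "UploadId", and
    "Initiated" (a timezone-aware datetime), as found in the "Uploads"
    list of an S3 ListMultipartUploads response.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=initiated_before_days)
    return [u for u in uploads if u["Initiated"] < cutoff]


def prune(uploads, initiated_before_days, abort, dry_run=True, now=None):
    """Abort stale uploads via the `abort` callable unless `dry_run` is set.

    Hypothetical helper: returns the list of stale uploads in both modes,
    so a dry run reports what would have been removed.
    """
    stale = select_stale_uploads(uploads, initiated_before_days, now)
    if not dry_run:
        for upload in stale:
            abort(upload["Key"], upload["UploadId"])
    return stale
```

With boto3, the `abort` callable could wrap the client's `abort_multipart_upload` method, with the bucket name supplied by the caller. Keeping the selection logic free of network calls makes it easy to check what a dry run would report.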

Installation

pip install ckanext-dcor_depot

Add this extension to the plugins in ckan.ini and set the storage path and the name of the users depot:

ckan.plugins = [...] dcor_depot
ckan.storage_path = /data/ckan-HOSTNAME
ckanext.dcor_depot.users_depot_name = users-HOSTNAME
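
Before starting CKAN, it can help to confirm that ckan.ini contains the settings mentioned in this README, including the object-store keys listed earlier. The sketch below is one possible way to do that with Python's `configparser`. It assumes that the options live in the `[app:main]` section, which is the usual place in a CKAN configuration file. The helper names `load_settings`, `missing_keys`, and `bucket_name_for` are hypothetical and not part of this plugin.

```python
import configparser

# Keys named in this README; whether the plugin treats each one as
# strictly required is an assumption of this sketch.
EXPECTED_KEYS = [
    "ckan.storage_path",
    "ckanext.dcor_depot.users_depot_name",
    "dcor_object_store.access_key_id",
    "dcor_object_store.secret_access_key",
    "dcor_object_store.endpoint_url",
    "dcor_object_store.ssl_verify",
    "dcor_object_store.bucket_name",
]


def load_settings(ini_text, section="app:main"):
    """Parse ini text and return the options of one section as a dict."""
    # Disable interpolation so that "%" in values is kept literally, and
    # split only on "=" so that URLs containing ":" stay intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)
    return dict(parser[section])


def missing_keys(settings):
    """Return the expected keys that are absent or empty."""
    return [key for key in EXPECTED_KEYS if not settings.get(key, "").strip()]


def bucket_name_for(settings, organization_id):
    """Fill the {organization_id} placeholder of the bucket name template."""
    template = settings["dcor_object_store.bucket_name"]
    return template.format(organization_id=organization_id)
```

Calling `missing_keys(load_settings(text))` on the contents of ckan.ini returns an empty list when all of the keys above are present.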

This plugin stores resources under /data. Create the depot directory and make www-data its owner:

mkdir -p /data/depots/users-$(hostname)
chown -R www-data /data/depots/users-$(hostname)
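
For provisioning scripts written in Python, the same two steps can be sketched with the standard library. This is only an illustration under stated assumptions: the function name `create_users_depot` is hypothetical, `socket.gethostname()` stands in for `$(hostname)`, and changing ownership normally requires root privileges.

```python
import shutil
import socket
from pathlib import Path


def create_users_depot(data_root="/data", hostname=None, owner=None):
    """Create <data_root>/depots/users-<hostname> and return its path.

    If `owner` is given, ownership of the depot directory and everything
    below it is changed to that user.
    """
    hostname = hostname or socket.gethostname()
    depot = Path(data_root) / "depots" / f"users-{hostname}"
    # Same effect as `mkdir -p`: create parents, accept existing directories.
    depot.mkdir(parents=True, exist_ok=True)
    if owner is not None:
        # Rough equivalent of `chown -R` on the depot directory.
        for path in [depot, *depot.rglob("*")]:
            shutil.chown(path, user=owner)
    return depot
```

Calling `create_users_depot(owner="www-data")` as root would mirror the shell commands above.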

Testing

If CKAN/DCOR is installed and set up for testing, this extension can be tested with pytest:

pytest ckanext

Automated testing runs via GitHub Actions. You may also set up a local Docker container with CKAN and MinIO. Take a look at the GitHub Actions workflow for more information.
