GHGA Data Steward Kit - A utils package for GHGA data stewards.

GHGA Data Steward Kit

Utilities for data stewards interacting with GHGA infrastructure.

Installation:

This package can be installed using pip:

pip install ghga-datasteward-kit

Usage:

An overview of all commands is provided using:

ghga-datasteward-kit --help

The following paragraphs provide additional help for using the different commands:

s3-upload

This command facilitates encrypting files using Crypt4GH and uploading the encrypted content to a (remote) S3-compatible object storage. This process consists of multiple steps:

  1. Generate a unique file id
  2. Create unencrypted file checksum
  3. Encrypt file
  4. Extract file secret and remove Crypt4GH envelope
  5. Upload encrypted file content
  6. Download encrypted file content, decrypt and verify checksum
  7. Write file/upload information to output file
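
As an illustration of steps 1 and 2 above, the following is a minimal sketch and not the tool's actual implementation. It assumes the file id is a random UUID and that the checksum is a SHA256 digest read in chunks; the function names and the chunk size are hypothetical.

```python
import hashlib
import uuid
from pathlib import Path

# Hypothetical read size; chosen only to avoid loading large files into memory.
CHUNK_SIZE = 1024 * 1024


def generate_file_id() -> str:
    """Return a random identifier (assumption: a UUID4 string)."""
    return str(uuid.uuid4())


def sha256_of_file(path: Path) -> str:
    """Compute the SHA256 hex digest of a file's unencrypted content."""
    digest = hashlib.sha256()
    with open(path, "rb") as source:
        # Read fixed-size chunks until the end of the file is reached.
        while chunk := source.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing the checksum before encryption is what makes the verification in step 6 possible: the downloaded and decrypted content can be hashed the same way and compared.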

The user needs to provide a config yaml containing information as described here.

In addition to these general configuration options, each invocation of this command needs two additional parameters passed on the command line:

  1. The path to the file on the local file system
  2. A human readable alias for the file (choose a unique one)

An output file is written to the specified output directory under <alias>.json. If such a file already exists, an error is thrown.
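
The behaviour described above (refuse to overwrite an existing <alias>.json) could be sketched as follows. This is a hedged example, not the package's code: the function name `write_output_file` and the `info` dictionary are hypothetical, and the owner read-only mode 0o400 assumes a POSIX file system.

```python
import json
import os
from pathlib import Path


def write_output_file(output_dir: Path, alias: str, info: dict) -> Path:
    """Write info to <alias>.json, failing if the file already exists."""
    path = output_dir / f"{alias}.json"
    # O_EXCL makes creation fail with FileExistsError if the path exists.
    # Mode 0o400 requests owner read-only permissions (POSIX assumption).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w", encoding="utf-8") as target:
        json.dump(info, target, indent=2)
    return path
```

Creating the file exclusively avoids a race between checking for an existing file and writing it, so an earlier output file holding a secret is never silently replaced.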

The resulting file is owner read-only and contains the following information:

  1. The file alias
  2. A unique identifier for the file
  3. The local file path
  4. A SHA256 checksum over the unencrypted content
  5. MD5 checksums over all encrypted file parts
  6. SHA256 checksums over all encrypted file parts
  7. The file encryption/decryption secret
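
To illustrate items 5 and 6, per-part checksums over encrypted content could be computed roughly as below. This is a sketch under assumptions: the part size is passed in by the caller, and the function name `part_checksums` is hypothetical rather than part of the package.

```python
import hashlib
from pathlib import Path


def part_checksums(path: Path, part_size: int) -> tuple[list[str], list[str]]:
    """Return (MD5 digests, SHA256 digests), one entry per file part."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    md5_sums: list[str] = []
    sha256_sums: list[str] = []
    with open(path, "rb") as source:
        # Each read yields one part; the last part may be shorter.
        while part := source.read(part_size):
            md5_sums.append(hashlib.md5(part).hexdigest())
            sha256_sums.append(hashlib.sha256(part).hexdigest())
    return md5_sums, sha256_sums
```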

Attention: Keep this output file in a safe, private location. If this file is lost, the uploaded file content becomes inaccessible.

generate-catalog-accessions

A command for generating accessions for the metadata catalog. Accessions will be stored in a text file.

Development

For setting up the development environment, we rely on the devcontainer feature of vscode in combination with Docker Compose.

To use it, you need Docker Compose as well as vscode with its "Remote - Containers" extension (ms-vscode-remote.remote-containers) installed. Then open this repository in vscode and run the command Remote-Containers: Reopen in Container from the vscode "Command Palette".

This will give you a full-fledged, pre-configured development environment including:

  • infrastructural dependencies (databases, etc.)
  • all relevant vscode extensions pre-installed
  • pre-configured linting and auto-formatting
  • a pre-configured debugger
  • automatic license-header insertion

If you prefer not to use vscode, you could get a similar setup (without the editor specific features) by running the following commands:

# Execute in the repo's root dir:
cd ./.devcontainer

# build and run the environment with docker-compose
docker-compose up

# attach to the main container:
# (you can open multiple shell sessions like this)
docker exec -it devcontainer_app_1 /bin/bash

License

This repository is free to use and modify according to the Apache 2.0 License.
