GHGA Data Steward Kit - A utils package for GHGA data stewards.
GHGA Data Steward Kit
Utilities for data stewards interacting with GHGA infrastructure.
Installation:
This package can be installed using pip:
pip install ghga-datasteward-kit
Usage:
An overview of all commands is provided using:
ghga-datasteward-kit --help
The following paragraphs provide additional help for using the different commands:
files (batch-)upload
This command facilitates encrypting files using Crypt4GH and uploading the encrypted content to a (remote) S3-compatible object storage. This process consists of multiple steps:
- Generate a unique file id
- Create unencrypted file checksum
- Encrypt file
- Extract file secret and remove Crypt4GH envelope
- Upload encrypted file content
- Download encrypted file content, decrypt and verify checksum
- Write file/upload information to output file
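The id generation and checksum steps above, and the checksum comparison in the verification step, can be sketched with the Python standard library alone. This is a minimal illustration, not the kit's implementation. It assumes that a random UUID4 is an acceptable unique file id. It leaves out the Crypt4GH encryption and the S3 transfer entirely. The helper names are hypothetical.

```python
import hashlib
import uuid
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files never sit fully in memory


def generate_file_id() -> str:
    """Return a random unique identifier (assumption: a UUID4 is sufficient)."""
    return str(uuid.uuid4())


def sha256_of_file(path: Path) -> str:
    """Compute the SHA256 hex digest over the unencrypted file content, chunk by chunk."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_roundtrip(original_checksum: str, decrypted: bytes) -> bool:
    """Check that re-downloaded, decrypted content hashes to the original checksum."""
    return hashlib.sha256(decrypted).hexdigest() == original_checksum
```

Checksumming before encryption and again after the download-and-decrypt round trip confirms that what is stored remotely really decrypts to the original bytes.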
The user needs to provide a YAML configuration file containing the information described here.
An overview of the most important information about each upload is written to a file called <alias>.json in the output directory.
It contains the following information:
- The file alias
- A unique identifier for the file
- The local file path
- A SHA256 checksum over the unencrypted content
- MD5 checksums over all encrypted file parts
- SHA256 checksums over all encrypted file parts
- The file encryption/decryption secret
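The output lists one MD5 and one SHA256 checksum for every encrypted file part. The sketch below shows how such per-part checksums could be computed over already-encrypted content. The 8 MiB part size is an assumption for illustration and is not taken from the kit. `part_checksums` is a hypothetical helper, not part of the kit's API.

```python
import hashlib
from collections.abc import Iterator

PART_SIZE = 8 * 1024 * 1024  # assumed part size of 8 MiB; not taken from the kit


def split_into_parts(data: bytes, part_size: int = PART_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of at most part_size bytes."""
    for start in range(0, len(data), part_size):
        yield data[start : start + part_size]


def part_checksums(encrypted: bytes, part_size: int = PART_SIZE) -> tuple[list[str], list[str]]:
    """Return (md5_list, sha256_list) with one hex digest per encrypted part."""
    md5s: list[str] = []
    sha256s: list[str] = []
    for part in split_into_parts(encrypted, part_size):
        md5s.append(hashlib.md5(part).hexdigest())
        sha256s.append(hashlib.sha256(part).hexdigest())
    return md5s, sha256s
```

Per-part digests let each uploaded part be checked on its own, which matches how S3-compatible storage handles multipart uploads.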
Attention: Keep this output file in a safe, private location. If this file is lost, the uploaded file content becomes inaccessible.
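On a POSIX system, you can follow this advice by creating the output file so that only its owner can read it. The sketch below writes a summary dictionary to `<alias>.json` with permission bits `0o600` and refuses to overwrite an existing file. The function name and the key names in the usage example are hypothetical and do not reflect the kit's actual output schema.

```python
import json
import os
from pathlib import Path


def write_private_json(output_dir: Path, alias: str, summary: dict) -> Path:
    """Write summary to <alias>.json, readable and writable by the owner only (POSIX)."""
    path = output_dir / f"{alias}.json"
    # O_EXCL makes the call fail if the file exists, so an earlier record is never overwritten.
    # The umask can only clear bits from 0o600, so the file is never group- or world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(summary, handle, indent=2)
    return path
```

For example, `write_private_json(Path("output"), "sample_alias", {"alias": "sample_alias"})` would create `output/sample_alias.json`. The file still has to be backed up somewhere safe, because losing it means losing access to the uploaded content.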
metadata
The metadata label groups metadata-related commands.
Some of them require a configuration file as described here.
load
The load command makes files and metadata available to users in the running system.
It needs configuration parameters as described here.
generate-catalog-accessions
A command for generating accessions for the metadata catalog. Accessions will be stored in a text file.
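The sketch below illustrates only the output format. It generates a batch of unique accessions and writes them to a text file, one per line. The prefix, the zero-padded sequential counter, and the function names are all hypothetical. This README does not describe the accession scheme that the command actually uses.

```python
from pathlib import Path


def generate_accessions(prefix: str, start: int, count: int, width: int = 8) -> list[str]:
    """Return count sequential accessions such as PREFIX00000001 (hypothetical scheme)."""
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + count)]


def write_accessions(path: Path, accessions: list[str]) -> None:
    """Store one accession per line in a plain text file."""
    path.write_text("\n".join(accessions) + "\n", encoding="utf-8")
```

A plain one-per-line text file can be read back with `Path.read_text().splitlines()` or any line-oriented tool.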
Development
For setting up the development environment, we rely on the devcontainer feature of vscode in combination with Docker Compose.
To use it, you need Docker Compose as well as vscode with its "Remote - Containers" extension (ms-vscode-remote.remote-containers) installed.
Then open this repository in vscode and run the command
Remote-Containers: Reopen in Container
from the vscode "Command Palette".
This will give you a full-fledged, pre-configured development environment including:
- infrastructural dependencies (databases, etc.)
- all relevant vscode extensions pre-installed
- pre-configured linting and auto-formatting
- a pre-configured debugger
- automatic license-header insertion
If you prefer not to use vscode, you can get a similar setup (without the editor-specific features) by running the following commands:
# Execute in the repo's root dir:
cd ./.devcontainer
# build and run the environment with docker-compose
docker-compose up
# attach to the main container:
# (you can open multiple shell sessions like this)
docker exec -it devcontainer_app_1 /bin/bash
License
This repository is free to use and modify according to the Apache 2.0 License.
Download files
Source Distribution
Built Distribution
File details
Details for the file ghga_datasteward_kit-0.6.0.tar.gz.
File metadata
- Download URL: ghga_datasteward_kit-0.6.0.tar.gz
- Upload date:
- Size: 27.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88cbbbc8807a4d6325ea2b82e77ca2fbfcfda97d77898fb3ac2feba4d6ac0f4f
MD5 | d1548779c27ab3ba166b4efe68ea923d
BLAKE2b-256 | b830607cebd96a30c71b745b00fba5a23c45ddb8355c3962defeb5393d1f143b
File details
Details for the file ghga_datasteward_kit-0.6.0-py3-none-any.whl.
File metadata
- Download URL: ghga_datasteward_kit-0.6.0-py3-none-any.whl
- Upload date:
- Size: 38.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | f786ab57143d090c0e94d2c4ee5ec48e543f538fa53c0af3fab8422f57ed07b6
MD5 | 8d07cfc86fe8a03fa43893fc2db797f2
BLAKE2b-256 | 2eeebdc1c0f5e06e1e665c7532829166cf86c743eaa9664e7ae88f4680609f22
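You can check a downloaded distribution file against its published SHA256 digest by hashing the file locally and comparing the two values. Below is a minimal sketch that uses only the Python standard library. The function name is illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Hash the file in 64 KiB chunks and compare against the published hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # Normalise the expected value (strip whitespace, lower-case) before comparing.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

For example, `matches_sha256(Path("ghga_datasteward_kit-0.6.0.tar.gz"), "88cbbbc8807a4d6325ea2b82e77ca2fbfcfda97d77898fb3ac2feba4d6ac0f4f")` should return `True` for an intact source distribution.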