Tool for validation, encryption and upload of MV submissions to GDCs.

Project description

GRZ CLI

A command-line tool for validating, encrypting, uploading and downloading submissions to/from a GDC/GRZ (Genomrechenzentrum).

Introduction

This tool provides a way to validate files, encrypt/decrypt files using the crypt4gh library, and upload/download the encrypted files to/from an S3 bucket of a GDC/GRZ. It also logs the progress and outcomes of these operations in a metadata file.

It is recommended to have the following folder structure for a single submission:

EXAMPLE_SUBMISSION
├── files
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read1.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read2.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.vcf
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read1.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read2.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.vcf
│   └── target_regions.bed
└── metadata
    └── metadata.json
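A quick pre-flight check of this layout can be scripted in the shell. The following is a minimal sketch, not part of grz-cli: the function name check_submission_layout is hypothetical, and it only checks the directory and file names shown above, while grz-cli validate performs the actual validation.

```shell
# Hypothetical helper (not part of grz-cli): checks that a submission
# directory follows the recommended layout before running grz-cli.
check_submission_layout() {
  local dir="$1"
  # Both sub-directories must exist.
  if [ ! -d "$dir/files" ] || [ ! -d "$dir/metadata" ]; then
    echo "missing files/ or metadata/ in $dir" >&2
    return 1
  fi
  # The metadata file must exist and be non-empty.
  if [ ! -s "$dir/metadata/metadata.json" ]; then
    echo "missing or empty $dir/metadata/metadata.json" >&2
    return 1
  fi
  # files/ must contain at least one regular file.
  if [ -z "$(find "$dir/files" -type f | head -n 1)" ]; then
    echo "no data files found in $dir/files" >&2
    return 1
  fi
  echo "layout OK: $dir"
}

# Example: check_submission_layout EXAMPLE_SUBMISSION
```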

The current version of the tool requires the working directory to have at least as much free disk space as the total size of the submitted data, because the encrypted copies of the files are written there.
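This requirement can be checked before encrypting. The following is a minimal sketch, assuming a POSIX shell with the standard du, df and awk tools; check_free_space is a hypothetical helper, not part of grz-cli, and it compares sizes in KiB.

```shell
# Hypothetical helper (not part of grz-cli): compares the size of the data
# to be submitted with the free space available in the working directory.
check_free_space() {
  local data_dir="$1" work_dir="${2:-.}"
  local needed available
  # Total size of the submission data in KiB.
  needed=$(du -sk "$data_dir" | awk '{print $1}')
  # Free KiB on the file system that holds the working directory
  # (-P forces the portable one-line-per-file-system output format).
  available=$(df -Pk "$work_dir" | awk 'NR == 2 {print $4}')
  if [ "$available" -lt "$needed" ]; then
    echo "not enough space: ${needed} KiB needed, ${available} KiB available in ${work_dir}" >&2
    return 1
  fi
  echo "enough space: ${needed} KiB needed, ${available} KiB available"
}

# Example: check_free_space EXAMPLE_SUBMISSION/files .
```

The second argument defaults to the current directory, which mirrors the encrypt behaviour described below when no working directory is given.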

Features

  • Validation: Validate file checksums, basic file metadata and BfArM requirements.
  • Encryption: Encrypt files using crypt4gh.
  • Decryption: Decrypt files using crypt4gh.
  • Upload: Upload encrypted files directly to a GRZ (via built-in boto3).
  • Download: Download encrypted files from a GRZ (via built-in boto3).
  • Logging: Log progress and results of operations.
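The validation step checks file checksums. As a hedged sketch, assuming your metadata records SHA-256 checksums (confirm this against the metadata schema provided by your GRZ), the digests of all data files can be listed with sha256sum for a manual cross-check. list_sha256 is a hypothetical helper, not part of grz-cli.

```shell
# Hypothetical helper (not part of grz-cli): prints "<sha256>  ./<path>" for
# every regular file below the given directory, sorted by path.
# Assumption: the checksums in metadata.json are SHA-256 digests.
list_sha256() {
  local files_dir="$1"
  # Run in a subshell inside files_dir so the printed paths are relative.
  (cd "$files_dir" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Example: list_sha256 EXAMPLE_SUBMISSION/files
```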

Installation

Requirements

Besides the disk space required for the submission data, this tool also requires a Linux environment, e.g.:

  • Linux server
  • Virtual machine running Linux
  • Docker container
  • Windows Subsystem for Linux (WSL)
  • ...

End-user setup

The recommended method to install this tool is using the conda package manager.

Installation via conda (recommended)

If conda is not yet available on your system, we recommend installing the Miniforge conda distribution by running the following commands:

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

There are also alternative ways to install conda; see the official conda installation documentation.

Next, install the grz-cli tool:

# create conda environment and activate it
conda create -n grz-tools -c conda-forge -c bioconda "grz-cli"
conda activate grz-tools

Update instructions:

Use the following command to update the tool:

conda update -n grz-tools -c conda-forge -c bioconda "grz-cli"

Installation via pip (not recommended)

While installation via pip is possible, it is not recommended, because users must ensure that a suitable Python version is already installed and that they are working in a virtual Python environment.

pip install grz-cli

Update instructions:

Use the following command to update the tool:

pip install --upgrade grz-cli

Development setup

For development purposes, you can clone the repository and install the package in editable mode:

git clone https://codebase.helmholtz.cloud/grz-mv-genomseq/grz-cli
# create conda environment and activate it
conda env create -f grz-cli/environment-dev.yaml -n grz-tools-dev
conda activate grz-tools-dev
# install the grz-cli tool
pip install -e grz-cli/

Usage

Configuration

Your associated GRZ will provide the configuration file; please place it at ~/.config/grz-cli/config.yaml.

The tool requires a configuration file in YAML format that specifies the S3 bucket and other options. For an example configuration, see resources/config.yaml.

The S3 access key and secret key can be specified either in the config file or as environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
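Both options can be set up from the shell. This is a minimal sketch, assuming a POSIX shell: install_grz_config is a hypothetical helper (not part of grz-cli) that copies the file received from your GRZ to the location given above, and the key values in the commented exports are placeholders.

```shell
# Hypothetical helper (not part of grz-cli): installs the configuration file
# received from your GRZ at ~/.config/grz-cli/config.yaml by default.
install_grz_config() {
  local src="$1"
  local dest_dir="${2:-$HOME/.config/grz-cli}"
  mkdir -p "$dest_dir"
  # Copy with mode 600, since the file may contain S3 credentials.
  install -m 600 "$src" "$dest_dir/config.yaml"
}

# Example: install_grz_config ./config-from-grz.yaml

# Alternative: keep the S3 credentials out of the file and export them
# as environment variables (placeholders, replace with your own keys).
# export AWS_ACCESS_KEY_ID="<access-key>"
# export AWS_SECRET_ACCESS_KEY="<secret-key>"
```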

Example submission procedure

After preparing your submission as outlined above, you can use the following commands to validate, encrypt and upload the submission:

# Validate the submission
grz-cli validate --submission-dir EXAMPLE_SUBMISSION

# Encrypt the submission
grz-cli encrypt --submission-dir EXAMPLE_SUBMISSION

# Upload the submission
grz-cli upload --submission-dir EXAMPLE_SUBMISSION
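Because encryption and upload should only follow a successful validation, the three steps can be chained so that the first failure stops the procedure. This is a sketch under the assumption that grz-cli exits with a non-zero status when a step fails; submit is a hypothetical wrapper name, not a grz-cli subcommand.

```shell
# Hypothetical wrapper (not part of grz-cli): runs the three documented
# steps in order and stops at the first failing step, so nothing is
# encrypted or uploaded after a failed validation.
# Assumption: grz-cli returns a non-zero exit status on failure.
submit() {
  local dir="$1"
  grz-cli validate --submission-dir "$dir" || return 1
  grz-cli encrypt --submission-dir "$dir" || return 1
  grz-cli upload --submission-dir "$dir"
}

# Example: submit EXAMPLE_SUBMISSION
```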

Troubleshooting

In case of issues, please re-run your commands with grz-cli --log-level DEBUG --log-file <your-log-file.log> [...] and submit the log file to the GRZ data steward!
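To avoid typing the debug options each time, a small wrapper can prepend them to any subcommand. This is a hypothetical helper (grz_debug, with a timestamped log file name chosen for illustration) that uses only the --log-level and --log-file options shown above.

```shell
# Hypothetical wrapper (not part of grz-cli): re-runs any grz-cli command
# with the documented debug options and writes the log to a timestamped
# file in the current directory.
grz_debug() {
  local logfile
  logfile="grz-cli-debug-$(date +%Y%m%d-%H%M%S).log"
  echo "writing debug log to $logfile" >&2
  grz-cli --log-level DEBUG --log-file "$logfile" "$@"
}

# Example: grz_debug upload --submission-dir EXAMPLE_SUBMISSION
```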

Command-Line Interface

grz-cli provides a command-line interface with the following subcommands:

validate

Validate a submission. Running this command is recommended before continuing with encryption and upload. Progress files are stored relative to the submission directory.

  • --submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]

Example usage:

grz-cli validate --submission-dir foo

This command is intended for use at hospitals (Leistungserbringer) and at GDCs/GRZs.

encrypt

If no working directory is provided, the current directory is used. Log files are stored in a sub-folder of the working directory, and the encrypted files are stored in a sub-folder named encrypted_files.

  • -s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]
  • -c, --config-file: Path to config file [optional]

Example usage:

grz-cli encrypt --submission-dir foo

This command is intended for use at hospitals (Leistungserbringer). Please contact your GDC/GRZ to obtain a valid config file.

decrypt

Decrypt a submission using the GRZ private key.

  • -s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'encrypted_files/' directories [Required]
  • -c, --config-file: Path to config file [optional]

Example usage:

grz-cli decrypt --submission-dir foo

This command is intended for use at a GDC/GRZ.

upload

Upload the encrypted submission to the S3 bucket of a GRZ.

  • -s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'encrypted_files/' directories [Required]
  • -c, --config-file: Path to config file [optional]

Example usage:

grz-cli upload --submission-dir foo

This command is intended for use at hospitals (Leistungserbringer). Please contact your GDC/GRZ to obtain a valid config file.

download

Download an encrypted submission from a GRZ.

  • -s, --submission-id: S3 submission prefix [Required]
  • -o, --output-dir: Path to the target submission output directory [Required]
  • -c, --config-file: Path to config file [optional]

Example usage:

grz-cli download --submission-id foo --output-dir bar

This command is intended for use at a GDC/GRZ.
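At a GDC/GRZ, the download and decrypt subcommands can be used one after the other. The sketch below chains them, assuming that the downloaded directory has the metadata/ and encrypted_files/ layout that decrypt expects and that grz-cli exits with a non-zero status on failure; fetch_and_decrypt is a hypothetical name, not a grz-cli subcommand.

```shell
# Hypothetical wrapper (not part of grz-cli) for use at a GRZ: downloads a
# submission and then decrypts it in the same directory.
# Assumptions: download creates the metadata/ + encrypted_files/ layout,
# and grz-cli returns a non-zero exit status on failure.
fetch_and_decrypt() {
  local submission_id="$1" out_dir="$2"
  grz-cli download --submission-id "$submission_id" --output-dir "$out_dir" || return 1
  grz-cli decrypt --submission-dir "$out_dir"
}

# Example: fetch_and_decrypt foo bar
```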

Testing

To run the tests, navigate to the root directory of the repository and invoke pytest. Alternatively, install uv and tox and run uv run tox.

Contributing

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Parts of the crypt4gh code are used in modified form.

Download files

Source Distribution

grz_cli-0.2.0.tar.gz (42.5 kB)

Uploaded Source

Built Distribution

grz_cli-0.2.0-py3-none-any.whl (36.9 kB)

Uploaded Python 3

File details

Details for the file grz_cli-0.2.0.tar.gz.

File metadata

  • Download URL: grz_cli-0.2.0.tar.gz
  • Upload date:
  • Size: 42.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for grz_cli-0.2.0.tar.gz
Algorithm Hash digest
SHA256 5068acd0c6c2b1e076902992e42c9f58e62414f3165e3c6f6ff6698f00be3582
MD5 d087ba68a40a465d3bcfaa18f943a5e8
BLAKE2b-256 ae2fc59a700a20c0109082218437f0ac9efd2e79eae6436d66d5ed17ecbe5c1d
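To check a downloaded source distribution against the SHA256 digest listed above, sha256sum can read the expected digest from standard input. verify_sha256 is a hypothetical helper; the same approach applies to the wheel and its digest.

```shell
# Hypothetical helper: checks a downloaded file against a published SHA-256
# digest. sha256sum -c reads "<digest>  <path>" lines from standard input
# and exits with a non-zero status on a mismatch.
verify_sha256() {
  local file="$1" expected="$2"
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c
}

# Example (after downloading the sdist into the current directory):
# verify_sha256 grz_cli-0.2.0.tar.gz 5068acd0c6c2b1e076902992e42c9f58e62414f3165e3c6f6ff6698f00be3582
```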

Provenance

The following attestation bundles were made for grz_cli-0.2.0.tar.gz:

Publisher: pypi.yml on BfArM-MVH/grz-cli

File details

Details for the file grz_cli-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: grz_cli-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 36.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for grz_cli-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e856f512778641617b04e6866f9db732b8ba1c4603609ad6b778d4f601ce03f2
MD5 383a0f001edeb9bd1ea76f7dbb17e3db
BLAKE2b-256 623a14feb4381f3bcb979b1c1cab1118b9efc7738e96d9a12eecf0044286ad68

Provenance

The following attestation bundles were made for grz_cli-0.2.0-py3-none-any.whl:

Publisher: pypi.yml on BfArM-MVH/grz-cli
