Tool for validation, encryption and upload of MV submissions to GDCs.
GRZ CLI
A command-line tool for validating, encrypting, uploading and downloading submissions to/from a GDC/GRZ (Genomrechenzentrum).
Table of Contents
- Introduction
- Features
- Installation
- Usage
- Command-Line Interface
- Testing
- Contributing
- License
- Acknowledgements
Introduction
This tool provides a way to validate files, encrypt/decrypt files using the crypt4gh library and upload/download the encrypted files to an S3 bucket of a GDC/GRZ. It also logs the progress and outcomes of these operations in a metadata file.
It is recommended to have the following folder structure for a single submission:
EXAMPLE_SUBMISSION
├── files
│ ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read1.fastq.gz
│ ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read2.fastq.gz
│ ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.vcf
│ ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read1.fastq.gz
│ ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read2.fastq.gz
│ ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.vcf
│ ├── target_regions.bed
└── metadata
└── metadata.json
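The recommended layout can be checked before running the tool. The following is a minimal sketch, not part of grz-cli: the function name `check_submission_layout` is hypothetical, and it only tests for the `files/` directory and `metadata/metadata.json` shown in the tree above.

```python
from pathlib import Path


def check_submission_layout(submission_dir: str) -> list[str]:
    """Return a list of problems found in the submission folder layout.

    Hypothetical helper: it only checks the two entries shown in the
    recommended folder structure, nothing grz-cli specific.
    """
    root = Path(submission_dir)
    problems = []
    if not (root / "files").is_dir():
        problems.append("missing directory: files/")
    if not (root / "metadata" / "metadata.json").is_file():
        problems.append("missing file: metadata/metadata.json")
    return problems


if __name__ == "__main__":
    # Print every layout problem for the example submission, if any.
    for problem in check_submission_layout("EXAMPLE_SUBMISSION"):
        print(problem)
```

An empty list means both expected entries exist; it says nothing about whether the submission would pass `grz-cli validate`.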
The current version of the tool requires the working_dir to have at least as much free disk space as the total size of the data being submitted.
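The disk space requirement can be estimated up front. This is a rough sketch using only the Python standard library; the helper names `submission_size` and `has_enough_space` are hypothetical and not part of grz-cli, and the comparison simply mirrors the rule stated above (free space at least equal to the total size of the data).

```python
import shutil
from pathlib import Path


def submission_size(submission_dir: str) -> int:
    """Total size in bytes of all regular files below the submission directory."""
    return sum(p.stat().st_size for p in Path(submission_dir).rglob("*") if p.is_file())


def has_enough_space(submission_dir: str, working_dir: str) -> bool:
    """True if working_dir has at least as much free space as the submission occupies."""
    free_bytes = shutil.disk_usage(working_dir).free
    return free_bytes >= submission_size(submission_dir)


if __name__ == "__main__":
    # Assumes the current directory is used as the working directory.
    print(has_enough_space("EXAMPLE_SUBMISSION", "."))
```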
Features
- Validation: Validate file checksums, basic file metadata and BfArM requirements.
- Encryption: Encrypt files using crypt4gh.
- Decryption: Decrypt files using crypt4gh.
- Upload: Upload encrypted files directly to a GRZ (via built-in boto3).
- Download: Download encrypted files from a GRZ (via built-in boto3).
- Logging: Log progress and results of operations.
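To illustrate what checksum validation involves, here is a minimal sketch that hashes a file in chunks and compares the result with an expected value. It assumes SHA-256 checksums, which this README does not state; the function names are hypothetical and this is not the tool's own implementation.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with an expected hex string (assumed SHA-256)."""
    return sha256_of_file(path) == expected_hex.lower()
```

Reading in chunks keeps memory use constant, which matters for large FASTQ files.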
Installation
Requirements
Besides the disk space requirements for the submission data, this tool requires a Linux environment, e.g.:
- Linux server
- Virtual machine running Linux
- Docker container
- Windows Subsystem for Linux
- ...
End-user setup
The recommended method to install this tool is using the conda package manager.
Installation via conda (recommended)
If conda is not yet available on your system, we recommend installing the Miniforge conda distribution by running the following commands:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
There are also alternative ways to install conda:
- Micromamba, a single executable that does not require a base environment
- Official installation instructions
Next, install the grz-cli tool:
# create conda environment and activate it
conda create -n grz-tools -c conda-forge -c bioconda "grz-cli"
conda activate grz-tools
Update instructions
Use the following command to update the tool:
conda update -n grz-tools -c conda-forge -c bioconda grz-cli
Installation via pip (not recommended)
While installation via pip is possible, it is not recommended because users must ensure that the correct Python version is already installed and that they are using a Python virtual environment.
pip install grz-cli
Update instructions:
Use the following command to update the tool:
pip install --upgrade grz-cli
Docker
Docker images are available via biocontainers at https://biocontainers.pro/tools/grz-cli.
The build process can take at least a few days after the Bioconda release, so double-check that the latest version in Bioconda is also the latest Docker image version.
Usage
Configuration
The configuration file will be provided by your associated GRZ. Please place it at ~/.config/grz-cli/config.yaml.
The tool requires a configuration file in YAML format to specify the S3 bucket and other options. For an example configuration, see resources/config.yaml.
The S3 access key and secret key can be set either in the config file or as environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
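When the keys are supplied through the environment rather than the config file, a quick check that both variables are set can catch misconfiguration before an upload. This is a small sketch; the helper `missing_credentials` is hypothetical, and it is only meaningful if the keys are not already listed in the config file.

```python
import os

# Variable names as given in the configuration notes above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
```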
Example submission procedure
After preparing your submission as outlined above, you can use the following commands to validate, encrypt and upload the submission:
# Validate the submission
grz-cli validate --submission-dir EXAMPLE_SUBMISSION
# Encrypt the submission
grz-cli encrypt --submission-dir EXAMPLE_SUBMISSION
# Upload the submission
grz-cli upload --submission-dir EXAMPLE_SUBMISSION
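The three steps can also be scripted so that a failure stops the sequence. The sketch below assumes `grz-cli` is on the PATH and that it exits with a non-zero status on failure (not confirmed here); the wrapper functions `build_commands` and `run_submission` are hypothetical and only invoke the commands shown above.

```python
import subprocess

# Subcommands in the order given in the procedure above.
STEPS = ("validate", "encrypt", "upload")


def build_commands(submission_dir: str) -> list[list[str]]:
    """Build one grz-cli invocation per step for the given submission directory."""
    return [["grz-cli", step, "--submission-dir", submission_dir] for step in STEPS]


def run_submission(submission_dir: str) -> None:
    """Run the steps in order; check=True raises CalledProcessError on a non-zero exit."""
    for command in build_commands(submission_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_submission("EXAMPLE_SUBMISSION")
```

Stopping at the first failing step avoids encrypting or uploading a submission that did not validate.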
Troubleshooting
In case of issues, please re-run your commands with grz-cli --log-level DEBUG --log-file <your-log-file.log> [...] and submit the log file to the GRZ data steward!
Command-Line Interface
grz-cli provides a command-line interface with the following subcommands:
validate
It is recommended to run this command before continuing with encryption and upload. Progress files are stored relative to the submission directory.
--submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]
Example usage:
grz-cli validate --submission-dir foo
encrypt
If a working directory is not provided, the current directory is used. Log files are stored in a sub-folder of the working directory.
Encrypted files are stored in a folder named encrypted_files, a sub-folder of the working directory.
-s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]
-c, --config-file: Path to config file [optional]
Example usage:
grz-cli encrypt --submission-dir foo
upload
Upload the submission into an S3 structure of a GRZ.
-s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'encrypted_files/' directories [Required]
-c, --config-file: Path to config file [optional]
Example usage:
grz-cli upload --submission-dir foo
Testing
Please note that the binary files used for testing are managed with Git LFS, which must be installed to fetch them when cloning the git repository.
To run the tests, navigate to the root directory of your project and invoke pytest.
Alternatively, install uv and tox and run uv run tox.
Contributing
License
This project is licensed under the MIT License; see the LICENSE file for details.
Acknowledgements
Parts of the crypt4gh code are used in modified form.