Tool for validation, encryption and upload of MV submissions to GDCs.
grz-cli
A command-line tool for validating, encrypting, and uploading submissions to a genomDE Model Project GDC (Genome Data Center).
Installation
Requirements
The current version of the tool requires the working directory to have at least as much free disk space as the total size of the data being submitted.
Besides the disk space requirement for the submission data, this tool requires a Linux environment. For example:
- Linux server
- Virtual machine running Linux
- Docker container
- Windows Subsystem for Linux
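The disk space requirement above can be checked before starting a submission. The following is a minimal sketch, not part of grz-cli: `has_enough_space` is a hypothetical helper that compares the size of the submission directory with the free space of the working directory, using only the standard `du` and `df` utilities.

```shell
# Hypothetical helper (not part of grz-cli).
# Succeeds if WORK_DIR has at least as much free space as SUBMISSION_DIR occupies.
# Usage: has_enough_space SUBMISSION_DIR [WORK_DIR]
has_enough_space() {
    submission_dir=$1
    work_dir=${2:-.}
    # du -sk: total size of the submission in KiB
    needed_kb=$(du -sk "$submission_dir" | awk '{print $1}')
    # df -Pk: POSIX output format; line 2, column 4 is the available space in KiB
    free_kb=$(df -Pk "$work_dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$needed_kb" ]
}
```

For example, `has_enough_space EXAMPLE_SUBMISSION . || echo "not enough free disk space" >&2` prints a warning if the current directory is too small.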
Using Conda (recommended)
If Conda is not yet available on your system, we recommend installing it through the Miniforge Conda installer by running the following commands:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
Next, install the grz-cli tool:
conda create -n grz-tools -c conda-forge -c bioconda grz-cli
conda activate grz-tools
grz-cli --help
Updating
Use the following command to update the tool:
conda update -n grz-tools -c conda-forge -c bioconda grz-cli
Using pip (not recommended)
Installation via pip is possible but not recommended, because you must create and manage a Python virtual environment yourself and ensure that the correct Python version is used.
pip install grz-cli
Updating
Use the following command to update the tool:
pip install --upgrade grz-cli
Using Docker
Docker images are available via biocontainers at https://biocontainers.pro/tools/grz-cli.
The Docker image build can lag the Bioconda release by a few days or more, so double-check that the latest Docker image version matches the latest version in Bioconda.
Usage
Configuration
The configuration file will be provided by your associated GRZ. Do not create this file as an LE.
The tool requires a configuration file in YAML format to specify the S3 API parameters and other validation options.
This file may be placed at ~/.config/grz-cli/config.yaml or provided each time to grz-cli using the --config-file option on the command line.
The S3 secrets can either be stored directly in the config file or be supplied through the usual AWS environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
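If you use the environment-variable route, a quick check that both variables are set can save a failed upload later. This is a sketch only: `s3_secrets_in_env` is a hypothetical helper, and the values in the commented example are placeholders.

```shell
# Hypothetical helper (not part of grz-cli).
# Succeeds only if both standard AWS credential variables are set and non-empty.
s3_secrets_in_env() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# Example (placeholder values, not real credentials):
#   export AWS_ACCESS_KEY_ID="<access key provided by your GRZ>"
#   export AWS_SECRET_ACCESS_KEY="<secret key provided by your GRZ>"
#   s3_secrets_in_env || echo "S3 secrets are not set in the environment" >&2
```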
Submission Layout
It is recommended to have the following folder structure for a single submission:
EXAMPLE_SUBMISSION
├── files
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read1.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.read2.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_normal.vcf
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read1.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.read2.fastq.gz
│   ├── aaaaaaaa00000000aaaaaaaa00000000_blood_tumor.vcf
│   └── target_regions.bed
└── metadata
    └── metadata.json
The only requirements are that metadata/metadata.json exists and the files/ directory contains all of the other files.
Data files may be nested under subfolders inside files/ for better organization.
For example, each donor could have their own folder for files.
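The two layout requirements can be checked with a few lines of shell before running grz-cli. This is a rough sketch: `check_layout` is a hypothetical helper, it only tests that the expected paths exist, and it does not replace the tool's own validation.

```shell
# Hypothetical helper (not part of grz-cli).
# Succeeds if SUBMISSION_DIR contains metadata/metadata.json and a files/ directory.
# Usage: check_layout SUBMISSION_DIR
check_layout() {
    dir=$1
    if [ ! -f "$dir/metadata/metadata.json" ]; then
        echo "missing $dir/metadata/metadata.json" >&2
        return 1
    fi
    if [ ! -d "$dir/files" ]; then
        echo "missing $dir/files/ directory" >&2
        return 1
    fi
    return 0
}
```

For example, `check_layout EXAMPLE_SUBMISSION` returns a non-zero exit status and names the missing path if the layout is incomplete.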
Example submission procedure
After preparing your submission as outlined above, you can use the following command to validate, encrypt and upload the submission:
grz-cli submit --submission-dir EXAMPLE_SUBMISSION
Troubleshooting
In case of issues, please re-run your commands with grz-cli --log-level DEBUG --log-file path/to/write/file.log [...] and submit the log file to the GDC data steward.
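To avoid retyping the logging options, the re-run can be wrapped in a small function. This is a sketch under assumptions: `grz_debug` and the `GRZ_CLI` override variable are hypothetical names introduced here, and the timestamped log file name is just one possible convention.

```shell
# Hypothetical wrapper (not part of grz-cli).
# Re-runs a grz-cli command with DEBUG logging into a timestamped log file.
# GRZ_CLI is an assumed override for the executable name, e.g. GRZ_CLI=echo for a dry run.
# Usage: grz_debug submit --submission-dir EXAMPLE_SUBMISSION
grz_debug() {
    log_file="grz-cli-$(date +%Y%m%dT%H%M%S).log"
    status=0
    # The logging options go before the subcommand, as in the command shown above.
    "${GRZ_CLI:-grz-cli}" --log-level DEBUG --log-file "$log_file" "$@" || status=$?
    echo "Requested log file: $log_file" >&2
    return "$status"
}
```

The function returns the exit status of the wrapped command, so it can still be used in scripts that stop on failure.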
Command-Line Interface
grz-cli provides a command-line interface with the following subcommands:
submit
The submit command is the recommended command for submitting data to genome data centers.
It combines the validate, encrypt, and upload commands (see below).
-s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]
-c, --config-file: Path to config file [optional]
grz-cli submit --submission-dir foo
validate
It is recommended to run this command before continuing with encryption and upload.
--submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]
Example usage:
grz-cli validate --submission-dir foo
encrypt
If a working directory is not provided, then the current directory is used automatically.
Files are stored in a folder named encrypted_files as a sub-folder of the working directory.
-s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'files/' directories [Required]
-c, --config-file: Path to config file [optional]
grz-cli encrypt --submission-dir foo
upload
Upload the submission to the S3 storage of a GRZ.
-s, --submission-dir: Path to the submission directory containing both 'metadata/' and 'encrypted_files/' directories [Required]
-c, --config-file: Path to config file [optional]
Example usage:
grz-cli upload --submission-dir foo
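If you prefer to run the individual steps by hand instead of using submit, they can be chained so that a failure stops the sequence. This is a minimal sketch: `submit_stepwise` and the `GRZ_CLI` override are hypothetical names, and it assumes that the default working directory places encrypted_files/ where upload expects it.

```shell
# Hypothetical helper (not part of grz-cli).
# Runs validate, encrypt and upload in order and stops at the first failing step.
# GRZ_CLI is an assumed override for the executable name, e.g. GRZ_CLI=echo for a dry run.
# Usage: submit_stepwise SUBMISSION_DIR
submit_stepwise() {
    dir=$1
    "${GRZ_CLI:-grz-cli}" validate --submission-dir "$dir" &&
        "${GRZ_CLI:-grz-cli}" encrypt --submission-dir "$dir" &&
        "${GRZ_CLI:-grz-cli}" upload --submission-dir "$dir"
}
```

Running it with `GRZ_CLI=echo` only prints the three commands' arguments, which is a cheap way to review them before a real run.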
get-id
Available in grz-cli v1.2.0 or higher.
Compute and print the submission ID from a submission's JSON metadata.
This is useful in case you forget to store the ID printed during upload.
Example usage:
grz-cli get-id path/to/metadata.json
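To keep the ID next to your own records, the output can be redirected to a file. This sketch assumes that get-id writes the ID to standard output; `save_submission_id` and the `GRZ_CLI` override are hypothetical names.

```shell
# Hypothetical helper (not part of grz-cli).
# Stores whatever `grz-cli get-id` prints on standard output in ID_FILE.
# GRZ_CLI is an assumed override for the executable name.
# Usage: save_submission_id path/to/metadata.json ID_FILE
save_submission_id() {
    metadata_json=$1
    id_file=$2
    "${GRZ_CLI:-grz-cli}" get-id "$metadata_json" > "$id_file"
}
```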
Contributing
Running unreleased/development versions
First, install uv.
We recommend using Conda or Pixi.
After cloning the desired branch of the grz-tools repo locally, you can run grz-cli directly from the repo using:
uv run --project path/to/cloned/grz-tools grz-cli --help
Testing
Please note that the binary files used for testing are managed with Git LFS, so Git LFS must be installed to fetch them when cloning the repository.
To run the tests, navigate to the root directory of your project and invoke pytest.
Alternatively, install uv and tox and run uv run tox.
License
This project is licensed under the MIT License — see the LICENSE file for details.
Acknowledgements
Parts of Crypt4GH are used in modified form.