GHGA Connector - A CLI client application for interacting with the GHGA system.

Ghga Connector

Description

The GHGA Connector is a command line client facilitating interaction with the file storage infrastructure of GHGA. To this end, it provides commands for the up- and download of files that interact with the RESTful APIs exposed by the Upload Controller Service (https://github.com/ghga-de/upload-controller-service) and Download Controller Service (https://github.com/ghga-de/download-controller-service), respectively.

When uploading, the Connector expects an unencrypted file that is subsequently encrypted according to the Crypt4GH standard (https://www.ga4gh.org/news_item/crypt4gh-a-secure-method-for-sharing-human-genetic-data/) and only afterwards uploaded to the GHGA storage infrastructure.

When downloading, the resulting file is still encrypted in this manner and can be decrypted using the Connector's decrypt command. Because users are expected to download multiple files, this command takes a directory as input. An optional output directory can also be provided; it is created if it does not yet exist. If none is provided, the output defaults to the current working directory.
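The output directory behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the Connector's actual code: the function name resolve_output_dir is hypothetical, and creating missing parent directories is an assumption that the text does not confirm.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(output_dir: Optional[Path] = None) -> Path:
    """Return the directory that decrypted files would be written to.

    Hypothetical helper: falls back to the current working directory
    when no output directory is given, and creates the directory if it
    does not exist yet.
    """
    target = Path.cwd() if output_dir is None else output_dir
    # Assumption: missing parent directories are created as well.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling resolve_output_dir() without arguments returns the current working directory, while passing a path that does not exist yet creates it first.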

Most of the commands need the submitter's private key that matches the public key announced to GHGA. The private key is used for file encryption in the upload path and decryption of the work package access and work order tokens during download. Additionally, the decrypt command needs the private key to decrypt the downloaded file.

Installation

We recommend using the provided Docker container.

A pre-built version is available on Docker Hub:

docker pull ghga/ghga-connector:0.3.5

Or you can build the container yourself from the ./Dockerfile:

# Execute in the repo's root dir:
docker build -t ghga/ghga-connector:0.3.5 .

For production-ready deployment, we recommend using Kubernetes. For simple use cases, however, you can run the service using Docker on a single server:

# The entrypoint is preconfigured:
docker run -p 8080:8080 ghga/ghga-connector:0.3.5 --help

If you prefer not to use containers, you may install the service from source:

# Execute in the repo's root dir:
pip install .

# To run the service:
ghga_connector --help

Configuration

Parameters

The service requires the following configuration parameters:

  • upload_api (string): URL to the root of the upload controller API. Default: https://hd-dev.ghga-dev.de/ucs.

  • download_api (string): URL to the root of the DRS-compatible API used for download. Default: https://hd-dev.ghga-dev.de/drs3/ga4gh/drs/v1.

  • max_retries (integer): Number of times to retry failed API calls. Default: 5.

  • max_wait_time (integer): Maximum time in seconds to wait before quitting without a download. Default: 3600.

  • part_size (integer): The part size to use for download. Default: 16777216.

  • server_pubkey (string): Base64 encoded current GHGA public key for Crypt4GH encryption.

  • wps_api_url (string): URL to the root of the WPS API.
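To make the parameter list easier to scan, the following sketch collects the documented parameters and defaults in a plain Python dataclass. It is not the Connector's real configuration class (the class name ConnectorConfig and the helper part_count are hypothetical), and it assumes that server_pubkey and wps_api_url must always be supplied because no default is listed for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorConfig:
    """Hypothetical container mirroring the documented parameters."""

    # No default is documented for these two, so they are required here.
    server_pubkey: str
    wps_api_url: str
    upload_api: str = "https://hd-dev.ghga-dev.de/ucs"
    download_api: str = "https://hd-dev.ghga-dev.de/drs3/ga4gh/drs/v1"
    max_retries: int = 5
    max_wait_time: int = 3600  # seconds
    part_size: int = 16777216  # bytes, i.e. 16 * 1024 * 1024 (16 MiB)


def part_count(file_size: int, part_size: int) -> int:
    """Number of parts needed to cover file_size bytes, rounding up."""
    return -(-file_size // part_size)
```

With the default part_size, a file that is one byte larger than 16 MiB would be split into two parts.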

Usage:

A template YAML for configuring the service can be found at ./example-config.yaml. Please adapt it, rename it to .ghga_connector.yaml, and place it into one of the following locations:

  • in the current working directory where you execute the service (on unix: ./.ghga_connector.yaml)
  • in your home directory (on unix: ~/.ghga_connector.yaml)

The config yaml will be automatically parsed by the service.
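The lookup of the two locations could work roughly as in the sketch below. The function find_config is hypothetical, and the search order (current working directory first, then home directory) is an assumption; the text above does not state which location takes precedence.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".ghga_connector.yaml"


def find_config(
    cwd: Optional[Path] = None, home: Optional[Path] = None
) -> Optional[Path]:
    """Return the first existing config file, or None if there is none.

    Assumption: the current working directory is checked before the
    home directory.
    """
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    for candidate in (cwd / CONFIG_NAME, home / CONFIG_NAME):
        if candidate.is_file():
            return candidate
    return None
```

The cwd and home arguments exist only so that the lookup can be tried against temporary directories.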

Important: If you are using containers, the locations refer to paths within the container.

All parameters mentioned in the ./example-config.yaml can also be set using environment variables or file secrets.

To name the environment variables, prefix the parameter name with ghga_connector_. For example, for the host, set an environment variable named ghga_connector_host. Both upper and lower case are accepted, but it is standard to define all environment variables in upper case.
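The naming rule can be illustrated with a small Python sketch. Both helper functions are hypothetical and only demonstrate the prefixing and the case-insensitive matching; they are not how the Connector itself reads its settings.

```python
import os
from typing import Mapping, Optional

PREFIX = "ghga_connector_"


def env_name(parameter: str) -> str:
    """Conventional (upper-case) environment variable name for a parameter."""
    return (PREFIX + parameter).upper()


def read_parameter(
    parameter: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Look up a parameter in the environment, ignoring case.

    Returns the raw string value, or None if the variable is not set.
    """
    environ = os.environ if environ is None else environ
    wanted = (PREFIX + parameter).lower()
    for key, value in environ.items():
        if key.lower() == wanted:
            return value
    return None
```

For the max_retries parameter, env_name("max_retries") yields GHGA_CONNECTOR_MAX_RETRIES, and read_parameter finds the value whether the variable was defined in upper or lower case.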

To use file secrets, please refer to the corresponding section of the pydantic documentation.

Architecture and Design:

This is a Python-based client enabling interaction with GHGA's file services. Contrary to the design of the actual services, the client does not follow the triple-hexagonal architecture. The client is roughly structured into three parts:

  1. A command line interface using typer is provided at the highest level of the package, i.e. directly within the ghga_connector directory.
  2. Functionality dealing with intermediate transformations, delegating work and handling state is provided within the core module.
  3. core.api_calls provides abstractions over S3 and work package service interactions.

Development

For setting up the development environment, we rely on the devcontainer feature of vscode in combination with Docker Compose.

To use it, you have to have Docker Compose as well as vscode with its "Remote - Containers" extension (ms-vscode-remote.remote-containers) installed. Then open this repository in vscode and run the command Remote-Containers: Reopen in Container from the vscode "Command Palette".

This will give you a full-fledged, pre-configured development environment including:

  • infrastructural dependencies of the service (databases, etc.)
  • all relevant vscode extensions pre-installed
  • pre-configured linting and auto-formatting
  • a pre-configured debugger
  • automatic license-header insertion

Moreover, inside the devcontainer, a convenience command, dev_install, is available. It installs the service with all development dependencies and installs pre-commit.

The installation is performed automatically when you build the devcontainer. However, if you update dependencies in the ./setup.cfg or the ./requirements-dev.txt, please run it again.

License

This repository is free to use and modify according to the Apache 2.0 License.

Readme Generation

This readme is autogenerated; please see readme_generation.md for details.
