Biohub data CLI

data-cli

Command-line tool for downloading datasets published by CZ Biohub. Resolves a collection ID to its constituent datasets and downloads files from S3 and HTTP, with progress bars, size estimates, and dry-run accounting.

Installation

To install the OPS data CLI, run:

pip install biohub-data-cli

Quick start

See what a collection contains without downloading:

ops-data download collection <collection-id> --dry-run

Download a collection to the current directory:

ops-data download collection <collection-id>

Download multiple collections to a specific directory, skipping the prompt:

ops-data download collection <id-a> <id-b> -o ./data -y

Files land under <outdir>/<collection-slug>/<dataset-slug>/.
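The layout above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the destination_dir helper and the example slugs are hypothetical, and how the CLI derives slugs from collection and dataset names is not described here.

```python
from pathlib import Path


def destination_dir(outdir: str, collection_slug: str, dataset_slug: str) -> Path:
    """Return <outdir>/<collection-slug>/<dataset-slug> (hypothetical helper)."""
    return Path(outdir) / collection_slug / dataset_slug


# The slugs below are made up for illustration.
target = destination_dir("./data", "example-collection", "example-dataset")
print(target.as_posix())
```

With -o ./data, every file of that dataset would be expected somewhere beneath the printed directory.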

Commands

ops-data download collection IDS...

Download one or more collections by ID.

Option             Description
-o, --outdir PATH  Output directory. Defaults to . (the current directory).
-y, --yes          Skip the size-estimate confirmation prompt.
--dry-run          Print per-dataset size statistics without downloading. Mutually exclusive with -y.

Dry run resolves every S3 URI (listing prefixes and issuing HEAD requests for individual objects) to report exact byte totals per dataset. HTTP URLs are not sized during dry run and surface as a warning in the summary.
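The accounting described above might look roughly like the sketch below. It assumes the S3 sizes have already been looked up; the summarize function, the (dataset, uri, size) tuple shape, and the bucket and host names are all hypothetical and not taken from the tool.

```python
from urllib.parse import urlparse


def summarize(entries):
    """Aggregate known sizes per dataset and collect URLs that were not sized.

    `entries` is an iterable of (dataset, uri, size_in_bytes) tuples, where
    size_in_bytes is None when no size is available. Hypothetical shape.
    """
    totals = {}
    unsized = []
    for dataset, uri, size in entries:
        totals.setdefault(dataset, 0)
        if urlparse(uri).scheme == "s3" and size is not None:
            totals[dataset] += size
        else:
            # Not sized during a dry run (for example HTTP URLs): warn later.
            unsized.append(uri)
    return totals, unsized


# Made-up entries for illustration only.
entries = [
    ("dataset-a", "s3://example-bucket/a/file1.bin", 1024),
    ("dataset-a", "s3://example-bucket/a/file2.bin", 2048),
    ("dataset-b", "https://example.org/b/file.zip", None),
]
totals, unsized = summarize(entries)
for dataset, total in totals.items():
    print(f"{dataset}: {total} bytes")
if unsized:
    print(f"warning: {len(unsized)} URL(s) could not be sized")
```

Keeping the unsized URLs in a separate list means the byte totals stay exact for what was measured, while the summary can still flag that the real download will be larger.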

Confirmation prompt shows the aggregate size estimate before any bytes move. Pass -y to skip it in scripts.

Failures are collected and reported at the end. A failed download does not stop the others, so one bad URL won't abort the run, but the process exits non-zero if any download failed.
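The collect-then-report behavior can be illustrated with a minimal sketch. It is a generic pattern under stated assumptions, not the tool's implementation: run_all, the task callables, and the URLs are hypothetical.

```python
def run_all(tasks):
    """Run every task, collecting failures instead of stopping at the first.

    `tasks` maps a label (for example a URL) to a zero-argument callable.
    Returns (failures, exit_code), where exit_code is 1 if anything failed.
    """
    failures = []
    for label, task in tasks.items():
        try:
            task()
        except Exception as exc:
            # Record the failure and keep going with the remaining tasks.
            failures.append((label, exc))
    for label, exc in failures:
        print(f"FAILED {label}: {exc}")
    return failures, (1 if failures else 0)


def bad():
    # Stand-in for a download that fails.
    raise OSError("connection refused")


completed = []
tasks = {
    "s3://example-bucket/good-1": lambda: completed.append("good-1"),
    "https://example.org/bad": bad,
    "s3://example-bucket/good-2": lambda: completed.append("good-2"),
}
failures, exit_code = run_all(tasks)
print(f"completed={completed} exit_code={exit_code}")
```

In a real script the returned exit code would typically be passed to sys.exit, so that a calling shell or CI job can detect the partial failure.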

Development

This project uses uv for dependency management.

Install dependencies (including dev extras):

uv sync

Run tests:

uv run pytest

Run tests with coverage report:

uv run pytest --cov=biohub_data_cli --cov-report=term-missing

Run the CLI from a checkout:

uv run ops-data --help

Integration tests

Tests marked integration hit real S3 buckets / HTTP servers and are deselected by default. Run them explicitly:

uv run pytest -m integration

Code of Conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.

Reporting Security Issues

If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.

Download files

Source Distribution

biohub_data_cli-0.4.0.tar.gz (21.8 kB view details)

Uploaded Source

Built Distribution

biohub_data_cli-0.4.0-py3-none-any.whl (23.9 kB view details)

Uploaded Python 3

File details

Details for the file biohub_data_cli-0.4.0.tar.gz.

File metadata

  • Download URL: biohub_data_cli-0.4.0.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for biohub_data_cli-0.4.0.tar.gz
Algorithm Hash digest
SHA256 4b49a3b81dcc46ab002d3856a4b0cee27bf86cb8e5921c3d196f3ec04c5beab2
MD5 8e7fb97a8be88449e1f0fcc8ce3f76d5
BLAKE2b-256 8f393702c43f1bc0dda818923f3c855ea1b2524fcf1ed6f51d9504c5b88b2b30

Provenance

The following attestation bundles were made for biohub_data_cli-0.4.0.tar.gz:

Publisher: release-please.yml on chanzuckerberg/biohub-data-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file biohub_data_cli-0.4.0-py3-none-any.whl.

File hashes

Hashes for biohub_data_cli-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 40b75846821acb8a1d1ce5f72ea7a1c0de715157de731d30a65bc7ed4ceb01b6
MD5 51aac736e26f97d5daaff6b44dcdcfc9
BLAKE2b-256 b2d51290356ed873da39892ee073af57b02928bcca2e2d9b2d7d1783abc1f027

Provenance

The following attestation bundles were made for biohub_data_cli-0.4.0-py3-none-any.whl:

Publisher: release-please.yml on chanzuckerberg/biohub-data-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
