Biohub data CLI

data-cli

Command-line tool for downloading datasets published by CZ Biohub. Resolves a collection ID to its constituent datasets and downloads files from S3 and HTTP, with progress bars, size estimates, and dry-run accounting.

Installation

To install the OPS data CLI, run:

pip install biohub-data-cli

Quick start

See what a collection contains without downloading:

ops-data download collection <collection-id> --dry-run

Download a collection to the current directory:

ops-data download collection <collection-id>

Download multiple collections to a specific directory, skipping the prompt:

ops-data download collection <id-a> <id-b> -o ./data -y

Files land under <outdir>/<collection-slug>/<dataset-slug>/.
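As a rough illustration of that layout, here is a minimal Python sketch. The slug rule in slugify is hypothetical, because the README does not say how data-cli derives slugs. dataset_dir is also an illustrative helper and is not part of the package's API.

```python
from __future__ import annotations

import re
from pathlib import Path


def slugify(name: str) -> str:
    # Hypothetical slug rule for illustration only: lowercase, collapse
    # runs of non-alphanumeric characters into "-", trim stray hyphens.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def dataset_dir(outdir: str | Path, collection_name: str, dataset_name: str) -> Path:
    # Mirrors the documented layout <outdir>/<collection-slug>/<dataset-slug>/.
    return Path(outdir) / slugify(collection_name) / slugify(dataset_name)


print(dataset_dir("./data", "Cell Atlas v2", "Plate 01 (raw)"))
```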

Commands

ops-data download collection IDS...

Download one or more collections by ID.

Option              Description
-o, --outdir PATH   Output directory. Defaults to the current directory (.).
-y, --yes           Skip the size-estimate confirmation prompt.
--dry-run           Print per-dataset size statistics without downloading. Mutually exclusive with -y.
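The option rules in this table can be sketched with Python's argparse. This is a hypothetical reconstruction for illustration only. data-cli may use a different CLI framework, and build_parser is not part of the package.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argparse rendering of the documented options; the real
    # CLI may be built with a different framework.
    parser = argparse.ArgumentParser(prog="ops-data")
    sub = parser.add_subparsers(dest="command", required=True)
    download = sub.add_parser("download")
    targets = download.add_subparsers(dest="target", required=True)
    collection = targets.add_parser("collection")
    collection.add_argument("ids", nargs="+", metavar="IDS")
    collection.add_argument("-o", "--outdir", default=".", help="Output directory.")
    # -y and --dry-run cannot be combined in the same invocation.
    mode = collection.add_mutually_exclusive_group()
    mode.add_argument("-y", "--yes", action="store_true")
    mode.add_argument("--dry-run", action="store_true")
    return parser


args = build_parser().parse_args(["download", "collection", "id-a", "id-b", "-o", "./data", "-y"])
```

If you pass both -y and --dry-run, argparse exits with a usage error instead of guessing which mode you meant.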

A dry run resolves every S3 URI (listing prefixes and heading objects) to report exact byte totals per dataset. HTTP URLs are not sized during a dry run. The summary flags them with a warning instead.
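Below is a minimal sketch of how this kind of sizing could work. It assumes a boto3-style S3 client; get_paginator("list_objects_v2") and head_object are real boto3 calls. Two things are assumptions made for illustration and do not describe data-cli's actual logic: treating a URI that ends in "/" as a prefix, and the helper names s3_size and dry_run_summary.

```python
from __future__ import annotations

from urllib.parse import urlparse


def s3_size(client, uri: str) -> int:
    """Return the total byte size behind an s3:// URI.

    Assumption: a URI ending in "/" (or naming a bare bucket) is a prefix
    and is sized by listing every object under it; anything else is a
    single object sized with a HEAD request. `client` is expected to
    behave like a boto3 S3 client.
    """
    parsed = urlparse(uri)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if key == "" or key.endswith("/"):
        total = 0
        paginator = client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=key):
            # Pages with no matching objects omit the "Contents" key.
            total += sum(obj["Size"] for obj in page.get("Contents", []))
        return total
    return client.head_object(Bucket=bucket, Key=key)["ContentLength"]


def dry_run_summary(client, uris: list[str]) -> tuple[int, list[str]]:
    # S3 URIs are sized exactly; other URLs (e.g. HTTP) are skipped and
    # returned so the caller can surface them as a warning.
    total, unsized = 0, []
    for uri in uris:
        if uri.startswith("s3://"):
            total += s3_size(client, uri)
        else:
            unsized.append(uri)
    return total, unsized
```

The client is passed in rather than created inside the function. This keeps the sketch testable without network access. With boto3 installed, boto3.client("s3") would supply a real client.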

The confirmation prompt shows the aggregate size estimate before any bytes are transferred. Pass -y to skip it in scripts.
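Here is a sketch of a size-estimate prompt that -y bypasses. The prompt wording and the 1024-based units are hypothetical, as are the names human_size and confirm_download.

```python
def human_size(num_bytes: int) -> str:
    # Binary (1024-based) units; the CLI's own formatting may differ.
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    i = 0
    while size >= 1024 and i < len(units) - 1:
        size /= 1024
        i += 1
    return f"{size:.1f} {units[i]}"


def confirm_download(total_bytes: int, assume_yes: bool, ask=input) -> bool:
    # With -y (assume_yes=True) the prompt is skipped entirely.
    if assume_yes:
        return True
    # Hypothetical prompt text; anything other than y/yes declines.
    answer = ask(f"About to download ~{human_size(total_bytes)}. Continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Injecting ask lets you test the prompt without a terminal. In normal use it falls back to input().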

Failures are collected and reported at the end of the run. A failed download does not stop the others, so one bad URL won't abort the run. The process still exits with a non-zero status if any download failed.
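The collect-then-report pattern can be sketched in Python as follows. download_all and its fetch callback are hypothetical names. The sketch shows only the control flow and is not data-cli's implementation.

```python
from __future__ import annotations

import sys
from typing import Callable, Iterable


def download_all(urls: Iterable[str], fetch: Callable[[str], None]) -> int:
    """Run `fetch` on every URL, keep going after errors, return an exit code.

    `fetch` is a hypothetical per-file download function; any exception it
    raises is recorded instead of stopping the loop.
    """
    failures: list[tuple[str, Exception]] = []
    for url in urls:
        try:
            fetch(url)
        except Exception as exc:  # one bad URL must not abort the run
            failures.append((url, exc))
    # Report every failure once, after all downloads have been attempted.
    for url, exc in failures:
        print(f"FAILED {url}: {exc}", file=sys.stderr)
    return 1 if failures else 0
```

A caller would finish with sys.exit(download_all(urls, fetch)). That way the process exit status reflects whether anything failed.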

Development

This project uses uv for dependency management.

Install dependencies (including dev extras):

uv sync

Run tests:

uv run pytest

Run tests with coverage report:

uv run pytest --cov=biohub_data_cli --cov-report=term-missing

Run the CLI from a checkout:

uv run ops-data --help

Integration tests

Tests marked integration hit real S3 buckets and HTTP servers and are deselected by default. Run them explicitly:

uv run pytest -m integration

Code of Conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.

Reporting Security Issues

If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.

Download files

Source Distribution

biohub_data_cli-0.3.0.tar.gz (21.4 kB)

Built Distribution

biohub_data_cli-0.3.0-py3-none-any.whl (23.4 kB)

File details

Details for the file biohub_data_cli-0.3.0.tar.gz.

File metadata

  • Download URL: biohub_data_cli-0.3.0.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for biohub_data_cli-0.3.0.tar.gz
Algorithm Hash digest
SHA256 12cbf94ad2e4b36acc9b8eb9b415053ea1918a4abec73193839f89411659a478
MD5 18cb66f2c8f603a9537a3408e918dc1f
BLAKE2b-256 557f94b72624e345cdf094a442a45f390bc7c60c62c1218b23c0e1fd4a6024aa

Provenance

The following attestation bundles were made for biohub_data_cli-0.3.0.tar.gz:

Publisher: release-please.yml on chanzuckerberg/biohub-data-cli

File details

Details for the file biohub_data_cli-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for biohub_data_cli-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 16ae3f403656c3f06f971a3e0ced950763f66b21efe133e07b5cadfb90456272
MD5 9a157e69d9b8dbb5391b4fdf085e77eb
BLAKE2b-256 99660a937964d4e6123e26a59f214c20766ddbc271996bf7922606ee028365aa

Provenance

The following attestation bundles were made for biohub_data_cli-0.3.0-py3-none-any.whl:

Publisher: release-please.yml on chanzuckerberg/biohub-data-cli
