Biohub data CLI
data-cli
Command-line tool for downloading datasets published by CZ Biohub. Resolves a collection ID to its constituent datasets and downloads files from S3 and HTTP, with progress bars, size estimates, and dry-run accounting.
Installation
To install the OPS data CLI, run:
pip install biohub-data-cli
Quick start
See what a collection contains without downloading:
ops-data download collection <collection-id> --dry-run
Download a collection to the current directory:
ops-data download collection <collection-id>
Download multiple collections to a specific directory, skipping the prompt:
ops-data download collection <id-a> <id-b> -o ./data -y
Files land under <outdir>/<collection-slug>/<dataset-slug>/.
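The layout above can be sketched in a few lines of Python, which may help when a script needs to locate files after a download. The function name and the slugs below are hypothetical, used only for illustration; how the CLI derives slugs from collection and dataset names is not described here.

```python
from pathlib import Path


def dataset_dir(outdir: str, collection_slug: str, dataset_slug: str) -> Path:
    """Return <outdir>/<collection-slug>/<dataset-slug> as a Path."""
    return Path(outdir) / collection_slug / dataset_slug


# Hypothetical slugs, for illustration only.
target = dataset_dir("./data", "example-collection", "example-dataset")
print(target.as_posix())
```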
Commands
ops-data download collection IDS...
Download one or more collections by ID.
| Option | Description |
|---|---|
| `-o`, `--outdir PATH` | Output directory. Defaults to `.`. |
| `-y`, `--yes` | Skip the size-estimate confirmation prompt. |
| `--dry-run` | Print per-dataset size statistics without downloading. Mutually exclusive with `-y`. |
A dry run resolves every S3 URI (listing prefixes and issuing HEAD requests for objects) to report exact byte totals per dataset. HTTP URLs are not sized during a dry run; they surface as a warning in the summary.
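The accounting described above might look roughly like the following sketch. It is not the CLI's actual implementation: `dry_run_summary`, the `s3_size` callback, and the example URIs are all assumptions, and the S3 lookup is replaced by a plain dictionary so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple


def dry_run_summary(
    datasets: Dict[str, List[str]],
    s3_size: Callable[[str], int],
) -> Tuple[Dict[str, int], List[str]]:
    """Return (bytes per dataset, URLs that could not be sized)."""
    totals: Dict[str, int] = {}
    unsized: List[str] = []
    for name, uris in datasets.items():
        total = 0
        for uri in uris:
            if uri.startswith("s3://"):
                # S3 URIs are sized via the supplied callback.
                total += s3_size(uri)
            else:
                # Anything else (here: HTTP URLs) is recorded, not sized.
                unsized.append(uri)
        totals[name] = total
    return totals, unsized


# Stand-in for a real S3 listing: hypothetical URIs mapped to byte counts.
fake_sizes = {
    "s3://example-bucket/a/": 1_500,
    "s3://example-bucket/b.tif": 250,
}
datasets = {
    "dataset-one": ["s3://example-bucket/a/", "s3://example-bucket/b.tif"],
    "dataset-two": ["https://example.org/c.csv"],
}

totals, unsized = dry_run_summary(datasets, fake_sizes.__getitem__)
for name, size in totals.items():
    print(f"{name}: {size} bytes")
if unsized:
    print(f"warning: {len(unsized)} HTTP URL(s) were not sized")
```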
The confirmation prompt shows the aggregate size estimate before any bytes are transferred. Pass -y to skip it in scripts.
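For scripted use, one option is to assemble the command line in Python before handing it to `subprocess`. The helper below is a hypothetical wrapper, not part of the package; it only uses the flags documented in this README and rejects the `--dry-run` plus `-y` combination up front.

```python
from typing import List, Sequence


def build_download_command(
    ids: Sequence[str],
    outdir: str = ".",
    yes: bool = False,
    dry_run: bool = False,
) -> List[str]:
    """Build an argv list for `ops-data download collection`."""
    if not ids:
        raise ValueError("at least one collection ID is required")
    if yes and dry_run:
        # The README documents these two flags as mutually exclusive.
        raise ValueError("--dry-run and -y are mutually exclusive")
    cmd = ["ops-data", "download", "collection", *ids, "-o", outdir]
    if yes:
        cmd.append("-y")
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Placeholder IDs; substitute real collection IDs.
cmd = build_download_command(["id-a", "id-b"], outdir="./data", yes=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if the process exits non-zero.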
Failures are collected and reported at the end. One bad URL won't abort the run: the remaining downloads continue, and the process exits non-zero if any download failed.
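The collect-then-report behavior can be illustrated with a minimal sketch. This is a generic pattern under stated assumptions, not the CLI's own code: `download_all`, `exit_code`, `fake_fetch`, and the URLs are hypothetical.

```python
import sys
from typing import Callable, Iterable, List, Tuple


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Try every URL; return (url, reason) pairs for the ones that failed."""
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            fetch(url)
        except Exception as exc:
            # Record the failure and keep going with the remaining URLs.
            failures.append((url, str(exc)))
    return failures


def exit_code(failures: List[Tuple[str, str]]) -> int:
    """Non-zero if anything failed, zero otherwise."""
    return 1 if failures else 0


def fake_fetch(url: str) -> None:
    """Stand-in downloader that fails for URLs containing 'bad'."""
    if "bad" in url:
        raise OSError("simulated failure")


failures = download_all(
    ["https://example.org/ok-1", "https://example.org/bad", "https://example.org/ok-2"],
    fake_fetch,
)
for url, reason in failures:
    print(f"FAILED {url}: {reason}", file=sys.stderr)
status = exit_code(failures)
# A real script would finish with sys.exit(status).
```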
Development
This project uses uv for dependency management.
Install dependencies (including dev extras):
uv sync
Run tests:
uv run pytest
Run tests with coverage report:
uv run pytest --cov=biohub_data_cli --cov-report=term-missing
Run the CLI from a checkout:
uv run ops-data --help
Integration tests
Tests marked integration hit real S3 buckets and HTTP servers and are deselected by default. Run them explicitly:
uv run pytest -m integration
Code of Conduct
This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
Reporting Security Issues
If you believe you have found a security issue, please disclose it responsibly by contacting us at security@chanzuckerberg.com.
File details
Details for the file biohub_data_cli-0.3.0.tar.gz.
File metadata
- Download URL: biohub_data_cli-0.3.0.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12cbf94ad2e4b36acc9b8eb9b415053ea1918a4abec73193839f89411659a478 |
| MD5 | 18cb66f2c8f603a9537a3408e918dc1f |
| BLAKE2b-256 | 557f94b72624e345cdf094a442a45f390bc7c60c62c1218b23c0e1fd4a6024aa |
Provenance
The following attestation bundles were made for biohub_data_cli-0.3.0.tar.gz:
Publisher: release-please.yml on chanzuckerberg/biohub-data-cli
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: biohub_data_cli-0.3.0.tar.gz
  - Subject digest: 12cbf94ad2e4b36acc9b8eb9b415053ea1918a4abec73193839f89411659a478
- Sigstore transparency entry: 1693322051
- Sigstore integration time:
- Permalink: chanzuckerberg/biohub-data-cli@e53a386ff39e05ab7176053da13746f3e39505a8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/chanzuckerberg
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@e53a386ff39e05ab7176053da13746f3e39505a8
- Trigger Event: push
File details
Details for the file biohub_data_cli-0.3.0-py3-none-any.whl.
File metadata
- Download URL: biohub_data_cli-0.3.0-py3-none-any.whl
- Upload date:
- Size: 23.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16ae3f403656c3f06f971a3e0ced950763f66b21efe133e07b5cadfb90456272 |
| MD5 | 9a157e69d9b8dbb5391b4fdf085e77eb |
| BLAKE2b-256 | 99660a937964d4e6123e26a59f214c20766ddbc271996bf7922606ee028365aa |
Provenance
The following attestation bundles were made for biohub_data_cli-0.3.0-py3-none-any.whl:
Publisher: release-please.yml on chanzuckerberg/biohub-data-cli
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: biohub_data_cli-0.3.0-py3-none-any.whl
  - Subject digest: 16ae3f403656c3f06f971a3e0ced950763f66b21efe133e07b5cadfb90456272
- Sigstore transparency entry: 1693322311
- Sigstore integration time:
- Permalink: chanzuckerberg/biohub-data-cli@e53a386ff39e05ab7176053da13746f3e39505a8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/chanzuckerberg
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@e53a386ff39e05ab7176053da13746f3e39505a8
- Trigger Event: push