Reproducible bundles for EBP genome assemblies
genomebundle
genomebundle bundles metadata and files for Earth BioGenome Project (EBP) genome assemblies into a single, reproducible package.
The core idea is simple: when you use a genome assembly in your research, you should be able to document exactly what you downloaded, when, and from where — with checksums. genomebundle does this by aggregating data from three sources into a machine-readable manifest.json:
- GoaT (Genomes on a Tree) — taxonomy and cross-references
- NCBI Datasets — assembly statistics and FTP file URLs
- BlobToolKit — BUSCO completeness results (assembly quality metrics)
This makes it easier to cite the data precisely and to keep pipelines reproducible over time.
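To make the "what, when, and from where, with checksums" idea concrete, here is a minimal sketch of how one downloaded file could be described. It is not genomebundle's own code: the function names and the record fields (`name`, `source_url`, `retrieved_at`, `size_bytes`, `sha256`) are assumptions chosen for illustration, and the real manifest.json layout may differ.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte FASTA files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_record(path, source_url):
    """Describe one downloaded file (hypothetical field names)."""
    path = Path(path)
    return {
        "name": path.name,                                       # what
        "source_url": source_url,                                # from where
        "retrieved_at": datetime.now(timezone.utc).isoformat(),  # when (UTC)
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),                               # checksum
    }
```

Storing the timestamp in UTC and the digest next to the source URL means a later reader can re-download the file and confirm it is byte-identical to the one originally used.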
Installation
pip install genomebundle
Basic CLI usage
# Download FASTA and GFF
genomebundle fetch GCF_040938575.1 --files fasta,gff
# Download all associated files
genomebundle fetch GCF_040938575.1 --files all
# Build manifest only (no download)
genomebundle fetch GCF_040938575.1 --no-download
# Verify checksums of a downloaded bundle
genomebundle verify ./GCF_040938575.1/
# Print manifest of an existing bundle
genomebundle show ./GCF_040938575.1/
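For readers curious what a checksum verification step involves, the sketch below recomputes SHA256 digests for the files listed in a bundle's manifest.json and reports any that are missing or altered. It illustrates the general technique only, not the `genomebundle verify` implementation; the manifest layout it assumes (a top-level `files` list whose entries have `name` and `sha256` keys) is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def verify_bundle(bundle_dir, chunk_size=1 << 20):
    """Return names of listed files that are missing or whose SHA256 differs.

    Assumes a hypothetical manifest layout:
    {"files": [{"name": ..., "sha256": ...}, ...]}
    """
    bundle_dir = Path(bundle_dir)
    manifest = json.loads((bundle_dir / "manifest.json").read_text(encoding="utf-8"))
    failures = []
    for entry in manifest.get("files", []):
        target = bundle_dir / entry["name"]
        if not target.is_file():
            failures.append(entry["name"])  # listed but not present on disk
            continue
        digest = hashlib.sha256()
        with open(target, "rb") as handle:
            # Hash in chunks so large assemblies do not have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        if digest.hexdigest() != entry["sha256"]:
            failures.append(entry["name"])  # content changed since download
    return failures
```

An empty return value means every listed file is present and matches its recorded digest.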
Python API
from genomebundle import fetch_assembly, fetch_assembly_report, fetch_busco
goat = fetch_assembly("GCF_040938575.1")
ncbi = fetch_assembly_report("GCF_040938575.1")
btk = fetch_busco("GCF_040938575.1")
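The three results can then be combined into a single record. The sketch below shows one way that might look, assuming each call returns a JSON-serialisable dictionary (the return types are not documented here). `build_manifest`, `write_manifest`, and the key names are hypothetical helpers for illustration, not part of the genomebundle API.

```python
import json
from datetime import datetime, timezone


def build_manifest(accession, goat, ncbi, busco):
    """Combine per-source metadata into one dictionary (hypothetical layout)."""
    return {
        "accession": accession,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "sources": {
            "goat": goat,            # taxonomy and cross-references
            "ncbi_datasets": ncbi,   # assembly statistics and file URLs
            "blobtoolkit": busco,    # BUSCO completeness results
        },
    }


def write_manifest(manifest, path):
    """Write the manifest as indented JSON with sorted keys for stable diffs."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

If the assumption about return types holds, `build_manifest("GCF_040938575.1", goat, ncbi, btk)` followed by `write_manifest(...)` would give a single file that records all three sources side by side.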
Output
Each bundle contains:
- manifest.json: machine-readable, includes SHA256 checksums and source URLs
- README.txt: human-readable summary
- downloaded files (optional)
References
- Challis et al. (2023). GoaT: Genomes on a Tree. Wellcome Open Research. https://doi.org/10.12688/wellcomeopenres.18658.1
- Byrd et al. (2024). Best practices for genetic and genomic data archiving. Nature Ecology & Evolution. https://doi.org/10.1038/s41559-024-02423-7
- Dainat et al. (2025). Guidelines for gene and genome assembly nomenclature. Genetics. https://doi.org/10.1093/genetics/iyaf006
License
MIT