Fetch chromosome and assembled sequence tables from the NCBI Datasets API.
ChromoRetriever
ChromoRetriever is a lightweight Python library and CLI for retrieving chromosome-level sequence metadata from the NCBI Datasets API and exporting it to CSV or TSV. It is designed for bioinformatics workflows that need a simple way to pull chromosome tables for one or many genome assemblies.
Features
- Retrieve chromosome-level sequence reports from NCBI Datasets
- Support single-accession and batch workflows
- Export to CSV or TSV
- Filter out unplaced assembled sequences by default
- Preserve clean chromosome ordering for common naming conventions
- Use as either a Python library or a command-line tool
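The README does not say how chromosome ordering or unplaced filtering is implemented. The sketch below shows one way this behaviour could work, using a natural-sort key. It is not ChromoRetriever's actual code. The helper names, the set of special chromosome labels, and the assumption that unplaced sequences are labelled "Un" (a common convention in NCBI sequence reports) are all illustrative.

```python
import re
from typing import Iterable

# Hypothetical ranks for non-numbered chromosomes, placed after the numbered ones.
SPECIAL_ORDER = {"X": 0, "Y": 1, "Z": 2, "W": 3, "MT": 4, "M": 4, "PLTD": 5}

def chromosome_sort_key(name: str) -> tuple:
    """Natural-sort key: numbered chromosomes first (numerically), then known
    special labels, then everything else alphabetically."""
    # Strip a leading "chromosome" or "chr" prefix (longest alternative first).
    label = re.sub(r"^(chromosome|chr)\s*", "", name.strip(), flags=re.IGNORECASE)
    if label.isdecimal():
        return (0, int(label), "")
    upper = label.upper()
    if upper in SPECIAL_ORDER:
        return (1, SPECIAL_ORDER[upper], "")
    return (2, 0, upper)

def filter_and_sort(names: Iterable[str], include_unplaced: bool = False) -> list:
    """Drop 'Un' (assumed unplaced) entries unless requested, then sort naturally."""
    kept = [n for n in names if include_unplaced or n.strip().upper() != "UN"]
    return sorted(kept, key=chromosome_sort_key)

print(filter_and_sort(["10", "2", "X", "1", "Un", "MT", "chr3"]))
# ['1', '2', 'chr3', '10', 'X', 'MT']
```

A plain string sort would place "10" before "2". The tuple key avoids this by comparing numbered chromosomes as integers.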
Installation
From a local checkout
pip install .
Development install
pip install -e ".[dev]"
Command-line usage
Single accession
chromoretriever GCF_000001735.4
This writes GCF_000001735.4_chromosomes.csv in the current directory.
Batch mode
chromoretriever --file examples/genomes.txt --output chromosomes.csv
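The README does not document the format of examples/genomes.txt. The sketch below assumes one assembly accession per line. Under that assumption, a hypothetical reader could skip blank lines and # comments and reject malformed IDs before any API calls are made. The accession pattern follows NCBI's assembly accession convention: GCA_ (GenBank) or GCF_ (RefSeq), nine digits, and a version suffix.

```python
from __future__ import annotations

import re
from pathlib import Path

# NCBI assembly accessions: GCA_/GCF_ prefix, nine digits, ".<version>".
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")

def read_genome_ids(path: str | Path) -> list[str]:
    """Read one accession per line (assumed format), ignoring blanks and '#' comments."""
    ids: list[str] = []
    for lineno, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line:
            continue
        if not ACCESSION_RE.match(line):
            raise ValueError(f"line {lineno}: {line!r} is not a GCA_/GCF_ accession")
        ids.append(line)
    return ids
```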
Include unplaced assembled sequences
chromoretriever GCF_000001735.4 --include-unplaced
Export as TSV
chromoretriever GCF_000001735.4 --format tsv
Exclude columns
chromoretriever GCF_000001735.4 --exclude-col refseq gc_content_percent
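The effect of --format and --exclude-col can be illustrated with Python's csv module. This sketch shows the behaviour only and is not ChromoRetriever's export code. write_table is a hypothetical helper, the column order is taken from the Output columns section, and the row values are placeholders, not real NCBI data.

```python
import csv
import io

# Column order from the README's "Output columns" section.
COLUMNS = ["genome_id", "taxon", "chromosome", "genbank", "refseq", "size_bp", "gc_content_percent"]

def write_table(rows, out, fmt="csv", exclude_cols=()):
    """Hypothetical helper: write dict rows as CSV or TSV, dropping excluded columns."""
    if fmt not in ("csv", "tsv"):
        raise ValueError(f"unsupported format: {fmt}")
    excluded = set(exclude_cols)
    fields = [c for c in COLUMNS if c not in excluded]
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        delimiter="\t" if fmt == "tsv" else ",",
        extrasaction="ignore",  # keys for excluded columns are silently skipped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)

# Placeholder row for illustration only.
row = {"genome_id": "GCF_000001735.4", "taxon": "Example species", "chromosome": "1",
       "genbank": "GB_PLACEHOLDER", "refseq": "RS_PLACEHOLDER", "size_bp": 1000,
       "gc_content_percent": 40.0}

buf = io.StringIO()
write_table([row], buf, fmt="tsv", exclude_cols=["refseq", "gc_content_percent"])
print(buf.getvalue(), end="")
```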
Python usage
from chromoretriever import NCBIDatasetsClient, export_records
client = NCBIDatasetsClient()
result = client.fetch_chromosome_table("GCF_000001735.4")
print(result.organism_name)
print(len(result.records))
export_records(result.records, "arabidopsis.csv")
Batch processing from Python
from chromoretriever import process_genome_ids
results = process_genome_ids(
    genome_ids=["GCF_000001735.4", "GCF_009914755.1"],
    output_path="chromosomes.tsv",
    fmt="tsv",
)
for result in results:
    print(result.genome_id, result.organism_name, len(result.records))
Output columns
- genome_id
- taxon
- chromosome
- genbank
- refseq
- size_bp (sequence length in base pairs)
- gc_content_percent (GC content, %)
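Because the export is a plain CSV or TSV file, downstream steps can read it with the standard library. The following minimal sketch sums chromosome lengths per assembly. It assumes the column names listed above and integer values in size_bp. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

def total_size_by_genome(handle, delimiter=","):
    """Sum size_bp per genome_id from an exported table (pass delimiter='\\t' for TSV)."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter=delimiter):
        totals[row["genome_id"]] += int(row["size_bp"])
    return dict(totals)

sample = io.StringIO(
    "genome_id,taxon,chromosome,genbank,refseq,size_bp,gc_content_percent\n"
    "GCF_A,Example,1,GB1,RS1,1000,40.0\n"
    "GCF_A,Example,2,GB2,RS2,500,41.5\n"
    "GCF_B,Other,1,GB3,RS3,750,38.2\n"
)
print(total_size_by_genome(sample))
# {'GCF_A': 1500, 'GCF_B': 750}
```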
Project structure
ChromoRetriever/
├── src/chromoretriever/
│ ├── __init__.py
│ ├── api.py
│ ├── cli.py
│ ├── export.py
│ ├── models.py
│ └── utils.py
├── tests/
├── examples/
├── pyproject.toml
└── README.md
API notes
The current implementation uses these NCBI Datasets endpoints:
- /genome/accession/{accession}/sequence_reports
- /genome/accession/{accession}/dataset_report
If NCBI changes the API contract, the client may need to be adjusted.
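The following minimal standard-library sketch shows what a direct call to the sequence_reports endpoint might look like. Several details are assumptions to check against NCBI's API reference: the v2 base URL, the report field names, and the use of role == "unplaced-scaffold" to detect unplaced sequences. The taxon column, which presumably comes from dataset_report, is left out. The v2 API may also paginate long sequence reports, and this sketch reads only a single response.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the public NCBI Datasets v2 REST API.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"

def sequence_reports_url(accession: str) -> str:
    """Build the sequence_reports URL named in the API notes."""
    return f"{BASE_URL}/genome/accession/{urllib.parse.quote(accession)}/sequence_reports"

def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode the JSON body (no retries, paging, or API key)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

def reports_to_rows(genome_id: str, payload: dict, include_unplaced: bool = False) -> list:
    """Map sequence-report entries onto the README's output columns.

    Field names (chr_name, role, genbank_accession, refseq_accession,
    length, gc_percent) are assumptions about the v2 sequence report schema.
    """
    rows = []
    for report in payload.get("reports", []):
        # Skip unplaced scaffolds unless the caller asks for them.
        if not include_unplaced and report.get("role") == "unplaced-scaffold":
            continue
        rows.append({
            "genome_id": genome_id,
            "chromosome": report.get("chr_name"),
            "genbank": report.get("genbank_accession"),
            "refseq": report.get("refseq_accession"),
            "size_bp": report.get("length"),
            "gc_content_percent": report.get("gc_percent"),
        })
    return rows

# Network call left commented out so the sketch runs offline:
# payload = fetch_json(sequence_reports_url("GCF_000001735.4"))
# rows = reports_to_rows("GCF_000001735.4", payload)
```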
Development
Run tests:
pytest
Build distributions:
python -m build
License
MIT License. See LICENSE.
Download files
Source Distribution: chromoretriever-0.1.2.tar.gz
Built Distribution: chromoretriever-0.1.2-py3-none-any.whl
File details
Details for the file chromoretriever-0.1.2.tar.gz.
File metadata
- Download URL: chromoretriever-0.1.2.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b58b93b9ead611438fc512522cec225bf152ff21183225323703908660257f63 |
| MD5 | 38206fcfaaf022f7c7db6e917bd743fb |
| BLAKE2b-256 | 3fee17c80afdacd690839feb6fc050e866f0b9541fcab42160a7c6f4abda8c53 |
File details
Details for the file chromoretriever-0.1.2-py3-none-any.whl.
File metadata
- Download URL: chromoretriever-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 926bd7013234994b427ba1d1ae2eba796f2d11dd3aeacad8f676c5d597241a46 |
| MD5 | 8cc36c5ab37f87c17c5e97a59f00814a |
| BLAKE2b-256 | 89edacd95e9fb2f69cb6908191f7d2c9c855502d9d3bcfc9ad525c4c534b2961 |
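The published SHA256 digests can be used to check a manually downloaded file before installing it. The following minimal hashlib sketch copies the digests from the tables above. verify_download is a hypothetical helper name. pip's hash-checking mode (--require-hashes) is a built-in alternative for the same check.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 0.1.2.
EXPECTED_SHA256 = {
    "chromoretriever-0.1.2.tar.gz": "b58b93b9ead611438fc512522cec225bf152ff21183225323703908660257f63",
    "chromoretriever-0.1.2-py3-none-any.whl": "926bd7013234994b427ba1d1ae2eba796f2d11dd3aeacad8f676c5d597241a46",
}

def sha256_of(path, chunk_size=1 << 16) -> str:
    """Hash a file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path) -> bool:
    """Compare a file's SHA256 with the published digest for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```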