Fetch chromosome and assembled sequence tables from the NCBI Datasets API.

ChromoRetriever

ChromoRetriever is a lightweight Python library and CLI for retrieving chromosome-level sequence metadata from the NCBI Datasets API and exporting it to CSV or TSV. It is designed for bioinformatics workflows that need a simple way to pull chromosome tables for one or many genome assemblies.

Features

  • Retrieve chromosome-level sequence reports from NCBI Datasets
  • Support single-accession and batch workflows
  • Export to CSV or TSV
  • Filter out unplaced assembled sequences by default
  • Preserve clean chromosome ordering for common naming conventions
  • Use as either a Python library or a command-line tool
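The chromosome-ordering feature can be pictured as a natural sort. The following is a minimal sketch, not ChromoRetriever's actual implementation: `chromosome_sort_key` is a hypothetical helper, and the rule it applies (numbered chromosomes first in numeric order, then named ones such as X, Y, or MT alphabetically, with an optional "chr" prefix ignored) is an assumption about what clean ordering means here.

```python
import re


def chromosome_sort_key(name: str) -> tuple:
    """Hypothetical sort key: numeric chromosomes first, then named ones."""
    # Drop an optional "chr" prefix so "chr2" and "2" sort together.
    label = re.sub(r"^chr", "", name.strip(), flags=re.IGNORECASE)
    if label.isdigit():
        # Numeric labels compare as integers, so "2" sorts before "10".
        return (0, int(label), "")
    # Everything else sorts after the numbers, alphabetically.
    return (1, 0, label.upper())


names = ["10", "X", "2", "MT", "1", "Y"]
ordered = sorted(names, key=chromosome_sort_key)
print(ordered)
```

A plain string sort would place "10" before "2"; the tuple key avoids that without special-casing any organism.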

Installation

From a local checkout

pip install .

Development install

pip install -e ".[dev]"

Command-line usage

Single accession

chromoretriever GCF_000001735.4

This writes GCF_000001735.4_chromosomes.csv in the current directory.

Batch mode

chromoretriever --file examples/genomes.txt --output chromosomes.csv
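The layout of the accession file is not documented here. As a rough sketch, assuming one accession per line with blank lines and `#` comments ignored, a file like examples/genomes.txt could be parsed as follows. `parse_accessions` is a hypothetical helper, not part of the ChromoRetriever API.

```python
def parse_accessions(text: str) -> list[str]:
    """Hypothetical parser: one accession per line, '#' starts a comment."""
    accessions = []
    for line in text.splitlines():
        # Keep only the part before any comment marker.
        entry = line.split("#", 1)[0].strip()
        # Skip blanks and duplicates while preserving input order.
        if entry and entry not in accessions:
            accessions.append(entry)
    return accessions


sample = "GCF_000001735.4\n\n# second assembly\nGCF_009914755.1\nGCF_000001735.4\n"
parsed = parse_accessions(sample)
print(parsed)
```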

Include unplaced assembled sequences

chromoretriever GCF_000001735.4 --include-unplaced

Export as TSV

chromoretriever GCF_000001735.4 --format tsv
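CSV and TSV differ only in the field delimiter. The sketch below shows one way to map a format name to a delimiter with the standard `csv` module; `render_table` and `DELIMITERS` are hypothetical names used for illustration, and this is not necessarily how ChromoRetriever's exporter is written.

```python
import csv
import io

# Assumed mapping from format name to field delimiter.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def render_table(rows, fmt="csv"):
    """Hypothetical helper: serialize a list of dict rows as CSV or TSV text."""
    if fmt not in DELIMITERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0]),  # column order follows the first row
        delimiter=DELIMITERS[fmt],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"genome_id": "GCF_000001735.4", "chromosome": "1"}]
tsv_text = render_table(rows, "tsv")
print(tsv_text)
```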

Exclude columns

chromoretriever GCF_000001735.4 --exclude-col refseq gc_content_percent
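Conceptually, excluding columns means dropping the named keys from every record before export. A minimal sketch of that idea, assuming records are plain dictionaries keyed by column name; `drop_columns` is a hypothetical helper and the row values are placeholders:

```python
def drop_columns(rows, excluded):
    """Hypothetical helper: return copies of rows without the excluded keys."""
    excluded = set(excluded)
    return [
        {key: value for key, value in row.items() if key not in excluded}
        for row in rows
    ]


# Placeholder values for illustration only.
row = {
    "genome_id": "GCF_000001735.4",
    "chromosome": "1",
    "refseq": "placeholder",
    "gc_content_percent": "placeholder",
}
trimmed = drop_columns([row], ["refseq", "gc_content_percent"])
print(trimmed)
```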

Python usage

from chromoretriever import NCBIDatasetsClient, export_records

client = NCBIDatasetsClient()
result = client.fetch_chromosome_table("GCF_000001735.4")

print(result.organism_name)
print(len(result.records))

export_records(result.records, "arabidopsis.csv")

Batch processing from Python

from chromoretriever import process_genome_ids

results = process_genome_ids(
    genome_ids=["GCF_000001735.4", "GCF_009914755.1"],
    output_path="chromosomes.tsv",
    fmt="tsv",
)

for result in results:
    print(result.genome_id, result.organism_name, len(result.records))

Output columns

  • genome_id
  • taxon
  • chromosome
  • genbank
  • refseq
  • size_bp
  • gc_content_percent
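An exported file with these columns can be read back with the standard `csv` module. The sketch below uses made-up rows (not real NCBI values) and assumes that size_bp holds an integer; it checks the header and sums the chromosome lengths.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "genome_id",
    "taxon",
    "chromosome",
    "genbank",
    "refseq",
    "size_bp",
    "gc_content_percent",
]

# Made-up rows for illustration only; these are not real NCBI records.
sample_csv = (
    "genome_id,taxon,chromosome,genbank,refseq,size_bp,gc_content_percent\n"
    "EXAMPLE_1,Example species,1,GB_0001,RS_0001,1000,40.0\n"
    "EXAMPLE_1,Example species,2,GB_0002,RS_0002,2500,35.5\n"
)

reader = csv.DictReader(io.StringIO(sample_csv))
header_ok = reader.fieldnames == EXPECTED_COLUMNS
rows = list(reader)

# csv returns strings, so convert size_bp before summing.
total_bp = sum(int(row["size_bp"]) for row in rows)
print(header_ok, len(rows), total_bp)
```

For a TSV export, passing `delimiter="\t"` to `csv.DictReader` is the only change needed.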

Project structure

ChromoRetriever/
├── src/chromoretriever/
│   ├── __init__.py
│   ├── api.py
│   ├── cli.py
│   ├── export.py
│   ├── models.py
│   └── utils.py
├── tests/
├── examples/
├── pyproject.toml
└── README.md

API notes

The current implementation uses these NCBI Datasets endpoints:

  • /genome/accession/{accession}/sequence_reports
  • /genome/accession/{accession}/dataset_report

If NCBI changes the API contract, the client may need to be adjusted.
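To make the two endpoint templates concrete, the sketch below fills in an accession and joins the path to a base URL. The base URL shown is an assumption (it is not stated in this README), and `build_url` and `ENDPOINTS` are hypothetical names rather than part of the ChromoRetriever client.

```python
from urllib.parse import quote

# Assumed base URL for the NCBI Datasets REST API; not stated in this README.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"

# Path templates copied from the list above.
ENDPOINTS = {
    "sequence_reports": "/genome/accession/{accession}/sequence_reports",
    "dataset_report": "/genome/accession/{accession}/dataset_report",
}


def build_url(kind: str, accession: str) -> str:
    """Hypothetical helper: build the request URL for one accession."""
    # Percent-encode the accession so unexpected characters cannot alter the path.
    path = ENDPOINTS[kind].format(accession=quote(accession, safe=""))
    return BASE_URL + path


url = build_url("sequence_reports", "GCF_000001735.4")
print(url)
```

Keeping the templates in one dictionary means a change in the API contract touches a single place.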

Development

Run tests:

pytest

Build distributions:

python -m build

License

MIT License. See LICENSE.
