Fetch chromosome and assembled sequence tables from the NCBI Datasets API.

ChromoRetriever

ChromoRetriever is a lightweight Python library and CLI for retrieving chromosome-level sequence metadata from the NCBI Datasets API and exporting it to CSV or TSV. It is designed for bioinformatics workflows that need a simple way to pull chromosome tables for one or many genome assemblies.

Features

  • Retrieve chromosome-level sequence reports from NCBI Datasets
  • Support single-accession and batch workflows
  • Export to CSV or TSV
  • Filter out unplaced assembled sequences by default
  • Preserve clean chromosome ordering for common naming conventions
  • Use as either a Python library or a command-line tool
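The chromosome ordering mentioned above can be illustrated with a small sort key. This is only a sketch of one plausible approach, not the library's actual logic: the function name `chromosome_sort_key` and the handling of the `chr` prefix are assumptions made for illustration.

```python
import re


def chromosome_sort_key(name: str):
    """Hypothetical sort key: numbered chromosomes first, then named ones."""
    # Drop a leading "chr" prefix (case-insensitive) if present.
    label = re.sub(r"^chr", "", name.strip(), flags=re.IGNORECASE)
    if label.isdecimal():
        # Numeric labels sort by integer value, so "2" comes before "10".
        return (0, int(label), "")
    # Everything else (X, Y, MT, ...) sorts alphabetically after the numbers.
    return (1, 0, label.upper())


names = ["chr10", "chrX", "chr2", "chrMT", "chr1"]
ordered = sorted(names, key=chromosome_sort_key)
print(ordered)
```

A plain string sort would place `chr10` before `chr2`; keying on the integer value avoids that.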

Installation

From a local checkout

pip install .

Development install

pip install -e ".[dev]"

Command-line usage

Single accession

chromoretriever GCF_000001735.4

This writes GCF_000001735.4_chromosomes.csv in the current directory.

Batch mode

chromoretriever --file examples/genomes.txt --output chromosomes.csv
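The format of the input file is not described here. As a sketch, assuming one assembly accession per line with blank lines and `#` comments ignored, a list of accessions could be parsed and validated like this. The helper `parse_accessions` is hypothetical and not part of ChromoRetriever.

```python
import re
from typing import Iterable, List

# GenBank (GCA_) and RefSeq (GCF_) assembly accessions: prefix, 9 digits, version.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def parse_accessions(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: collect accessions, skipping blanks and comments."""
    accessions = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed convention, not confirmed by the project
        if not ACCESSION_RE.match(line):
            raise ValueError(f"Not an assembly accession: {line!r}")
        accessions.append(line)
    return accessions


sample = ["GCF_000001735.4\n", "\n", "# second genome\n", "GCF_009914755.1\n"]
print(parse_accessions(sample))
```

Passing an open file object instead of `sample` works the same way, since files iterate line by line.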

Include unplaced assembled sequences

chromoretriever GCF_000001735.4 --include-unplaced
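How the tool decides which sequences count as unplaced is not documented here. The sketch below assumes each record carries a `role` field with values such as `assembled-molecule` and `unplaced-scaffold`, as NCBI sequence reports appear to provide. Both the field name and the helper `filter_records` are assumptions for illustration.

```python
from typing import Dict, List


def filter_records(
    records: List[Dict[str, str]], include_unplaced: bool = False
) -> List[Dict[str, str]]:
    """Hypothetical filter: keep only assembled molecules unless told otherwise."""
    if include_unplaced:
        return list(records)
    # Assumption: "role" marks whether a sequence is an assembled molecule.
    return [r for r in records if r.get("role") == "assembled-molecule"]


# Placeholder records, not real NCBI data.
records = [
    {"name": "1", "role": "assembled-molecule"},
    {"name": "scaffold_17", "role": "unplaced-scaffold"},
]
print(len(filter_records(records)))
print(len(filter_records(records, include_unplaced=True)))
```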

Export as TSV

chromoretriever GCF_000001735.4 --format tsv

Exclude columns

chromoretriever GCF_000001735.4 --exclude-col refseq gc_content_percent
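For readers who want the same effect in their own scripts, column exclusion can be sketched with the standard `csv` module. The function `write_table` and the row values are hypothetical; only the column names are taken from this README, and the library's own export code may work differently.

```python
import csv
import io

COLUMNS = [
    "genome_id", "taxon", "chromosome", "genbank",
    "refseq", "size_bp", "gc_content_percent",
]


def write_table(rows, exclude=(), delimiter=","):
    """Hypothetical export: write rows without the excluded columns."""
    dropped = set(exclude)
    kept = [c for c in COLUMNS if c not in dropped]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=kept,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip keys that were excluded
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row, not real NCBI data.
row = {
    "genome_id": "EXAMPLE_1", "taxon": "example taxon", "chromosome": "1",
    "genbank": "GB_0001", "refseq": "RS_0001",
    "size_bp": 1000, "gc_content_percent": 40.0,
}
text = write_table([row], exclude=["refseq", "gc_content_percent"])
print(text)
```

Passing `delimiter="\t"` yields TSV from the same function.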

Python usage

from chromoretriever import NCBIDatasetsClient, export_records

client = NCBIDatasetsClient()
result = client.fetch_chromosome_table("GCF_000001735.4")

print(result.organism_name)
print(len(result.records))

export_records(result.records, "arabidopsis.csv")

Batch processing from Python

from chromoretriever import process_genome_ids

results = process_genome_ids(
    genome_ids=["GCF_000001735.4", "GCF_009914755.1"],
    output_path="chromosomes.tsv",
    fmt="tsv",
)

for result in results:
    print(result.genome_id, result.organism_name, len(result.records))

Output columns

  • genome_id
  • taxon
  • chromosome
  • genbank
  • refseq
  • size_bp
  • gc_content_percent
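As a sketch of how these columns might be consumed downstream, the snippet below reads a table with `csv.DictReader` and computes the total assembly length and a length-weighted GC percentage. The rows are placeholders rather than real output, and the snippet assumes `size_bp` and `gc_content_percent` are written as plain numbers.

```python
import csv
import io

# Placeholder table with the documented column names; values are made up.
sample_csv = (
    "genome_id,taxon,chromosome,genbank,refseq,size_bp,gc_content_percent\n"
    "EXAMPLE_1,example taxon,1,GB_0001,RS_0001,1000,40.0\n"
    "EXAMPLE_1,example taxon,2,GB_0002,RS_0002,3000,50.0\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Total length across all chromosomes.
total_bp = sum(int(r["size_bp"]) for r in rows)

# Weight each chromosome's GC percentage by its length.
weighted_gc = (
    sum(int(r["size_bp"]) * float(r["gc_content_percent"]) for r in rows)
    / total_bp
)

print(total_bp, weighted_gc)
```

To read a real export, replace `io.StringIO(sample_csv)` with `open("chromosomes.csv", newline="")`.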

Project structure

ChromoRetriever/
├── src/chromoretriever/
│   ├── __init__.py
│   ├── api.py
│   ├── cli.py
│   ├── export.py
│   ├── models.py
│   └── utils.py
├── tests/
├── examples/
├── pyproject.toml
└── README.md

API notes

The current implementation uses these NCBI Datasets endpoints:

  • /genome/accession/{accession}/sequence_reports
  • /genome/accession/{accession}/dataset_report

If NCBI changes the API contract, the client may need to be adjusted.
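To make the endpoint templates concrete, here is a small sketch that expands them into full URLs. The base URL `https://api.ncbi.nlm.nih.gov/datasets/v2` is an assumption (this README does not state it), and `build_url` is a hypothetical helper rather than part of the client.

```python
from urllib.parse import quote

# Assumption: the public NCBI Datasets v2 REST base URL.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"

# Path templates as listed in the API notes.
ENDPOINTS = {
    "sequence_reports": "/genome/accession/{accession}/sequence_reports",
    "dataset_report": "/genome/accession/{accession}/dataset_report",
}


def build_url(kind: str, accession: str, base_url: str = BASE_URL) -> str:
    """Hypothetical helper: fill an endpoint template with an accession."""
    path = ENDPOINTS[kind].format(accession=quote(accession, safe=""))
    return base_url.rstrip("/") + path


print(build_url("sequence_reports", "GCF_000001735.4"))
print(build_url("dataset_report", "GCF_000001735.4"))
```

Keeping the templates in one mapping means a change in the API contract only needs to be handled in one place.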

Development

Run tests:

pytest

Build distributions:

python -m build

License

MIT License. See LICENSE.
