
Fetch chromosome and assembled sequence tables from the NCBI Datasets API.


ChromoRetriever

ChromoRetriever is a lightweight Python library and CLI for retrieving chromosome-level sequence metadata from the NCBI Datasets API and exporting it to CSV or TSV. It is designed for bioinformatics workflows that need a simple way to pull chromosome tables for one or many genome assemblies.

Features

  • Retrieve chromosome-level sequence reports from NCBI Datasets
  • Support single-accession and batch workflows
  • Export to CSV or TSV
  • Filter out unplaced assembled sequences by default
  • Preserve clean chromosome ordering for common naming conventions
  • Use as either a Python library or a command-line tool
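The exact ordering rule is not documented here. As a rough illustration, a natural sort key might place numbered chromosomes in numeric order (2 before 10), followed by sex chromosomes and organelles. The function name `chromosome_sort_key` and the precedence table are hypothetical and are not taken from ChromoRetriever's code.

```python
import re

# Hypothetical precedence for named chromosomes (an assumption, not the
# package's rule): sex chromosomes first, then organelles.
_NAMED_ORDER = {"X": 0, "Y": 1, "Z": 2, "W": 3, "MT": 4, "M": 4, "PT": 5}


def chromosome_sort_key(name: str) -> tuple:
    """Sort key: numbered chromosomes numerically, then named ones, then the rest."""
    # Drop an optional "chr" prefix so "chr3" and "3" sort together.
    label = re.sub(r"^chr", "", name.strip(), flags=re.IGNORECASE)
    if label.isdecimal():
        return (0, int(label), "")
    upper = label.upper()
    if upper in _NAMED_ORDER:
        return (1, _NAMED_ORDER[upper], "")
    # Anything else (e.g. "Un") sorts last, alphabetically.
    return (2, 0, upper)


names = ["10", "2", "X", "1", "MT", "chr3", "Y"]
print(sorted(names, key=chromosome_sort_key))
# ['1', '2', 'chr3', '10', 'X', 'Y', 'MT']
```

Using a tuple key keeps each group (numeric, named, other) separate, so integers and strings are never compared with each other.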

Installation

From a local checkout

pip install .

Development install

pip install -e ".[dev]"

Command-line usage

Single accession

chromoretriever GCF_000001735.4

This writes GCF_000001735.4_chromosomes.csv in the current directory.

Batch mode

chromoretriever --file examples/genomes.txt --output chromosomes.csv
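The format of examples/genomes.txt is not shown in this README. A minimal sketch of a reader, assuming one accession per line with blank lines and `#` comments ignored; `read_genome_ids` is a hypothetical helper, not part of the package API.

```python
from __future__ import annotations

from pathlib import Path


def read_genome_ids(path: str | Path) -> list[str]:
    """Hypothetical reader: one accession per line, '#' starts a comment.

    Blank lines are skipped and duplicates are dropped, keeping first-seen order.
    """
    ids: list[str] = []
    seen: set[str] = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        accession = raw.split("#", 1)[0].strip()
        if accession and accession not in seen:
            seen.add(accession)
            ids.append(accession)
    return ids
```

Dropping duplicates is a design choice in this sketch; it avoids fetching the same assembly twice in one batch.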

Include unplaced assembled sequences

chromoretriever GCF_000001735.4 --include-unplaced
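How "unplaced" is detected is not specified here. One plausible rule is sketched next, assuming records shaped like NCBI Datasets sequence-report entries with `role` and `chr_name` fields; `is_unplaced` and `filter_sequences` are hypothetical names.

```python
from __future__ import annotations


# Assumption: "role" is e.g. "assembled-molecule" or "unplaced-scaffold", and
# unplaced sequences may carry chr_name "Un". ChromoRetriever's rule may differ.
def is_unplaced(report: dict) -> bool:
    return report.get("role") == "unplaced-scaffold" or report.get("chr_name") == "Un"


def filter_sequences(reports: list[dict], include_unplaced: bool = False) -> list[dict]:
    """Drop unplaced sequences unless include_unplaced is True (mirrors the CLI default)."""
    if include_unplaced:
        return list(reports)
    return [r for r in reports if not is_unplaced(r)]
```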

Export as TSV

chromoretriever GCF_000001735.4 --format tsv

Exclude columns

chromoretriever GCF_000001735.4 --exclude-col refseq gc_content_percent
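To illustrate what --format and --exclude-col control, here is a sketch using the standard library `csv` module. The column list comes from the output columns documented in this README; `write_table` and its parameters are hypothetical and may not match how `export_records` is implemented.

```python
import csv
from typing import IO, Iterable, Mapping

# Column order from the "Output columns" list in this README.
COLUMNS = ["genome_id", "taxon", "chromosome", "genbank",
           "refseq", "size_bp", "gc_content_percent"]
DELIMITERS = {"csv": ",", "tsv": "\t"}


def write_table(rows: Iterable[Mapping], out: IO[str], fmt: str = "csv",
                exclude: Iterable[str] = ()) -> None:
    """Hypothetical exporter: write rows as CSV/TSV, skipping excluded columns."""
    if fmt not in DELIMITERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    excluded = set(exclude)
    fieldnames = [c for c in COLUMNS if c not in excluded]
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter=DELIMITERS[fmt],
                            extrasaction="ignore",  # drop keys for excluded columns
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.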

Python usage

from chromoretriever import NCBIDatasetsClient, export_records

client = NCBIDatasetsClient()
result = client.fetch_chromosome_table("GCF_000001735.4")

print(result.organism_name)
print(len(result.records))

export_records(result.records, "arabidopsis.csv")

Batch processing from Python

from chromoretriever import process_genome_ids

results = process_genome_ids(
    genome_ids=["GCF_000001735.4", "GCF_009914755.1"],
    output_path="chromosomes.tsv",
    fmt="tsv",
)

for result in results:
    print(result.genome_id, result.organism_name, len(result.records))

Output columns

  • genome_id
  • taxon
  • chromosome
  • genbank
  • refseq
  • size_bp
  • gc_content_percent
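A sketch of how one sequence-report entry could map onto these columns. The source field names (`chr_name`, `genbank_accession`, `refseq_accession`, `length`, `gc_percent`) are assumed from the NCBI Datasets v2 JSON, and `taxon` is assumed to come from the organism name in the dataset report; `ChromosomeRow` and `row_from_report` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChromosomeRow:
    """One output row; field order follows the documented output columns."""
    genome_id: str
    taxon: str
    chromosome: str
    genbank: Optional[str]
    refseq: Optional[str]
    size_bp: Optional[int]
    gc_content_percent: Optional[float]


def row_from_report(genome_id: str, taxon: str, report: dict) -> ChromosomeRow:
    """Map one sequence-report dict onto the output columns (assumed field names)."""
    length = report.get("length")
    gc = report.get("gc_percent")
    return ChromosomeRow(
        genome_id=genome_id,
        taxon=taxon,
        chromosome=str(report.get("chr_name", "")),
        genbank=report.get("genbank_accession"),
        refseq=report.get("refseq_accession"),
        # Coerce numeric fields, keeping None when the API omits them.
        size_bp=int(length) if length is not None else None,
        gc_content_percent=float(gc) if gc is not None else None,
    )
```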

Project structure

ChromoRetriever/
├── src/chromoretriever/
│   ├── __init__.py
│   ├── api.py
│   ├── cli.py
│   ├── export.py
│   ├── models.py
│   └── utils.py
├── tests/
├── examples/
├── pyproject.toml
└── README.md

API notes

The current implementation uses these NCBI Datasets endpoints:

  • /genome/accession/{accession}/sequence_reports
  • /genome/accession/{accession}/dataset_report

If NCBI changes the API contract, the client may need to be adjusted.
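For reference, a minimal sketch of calling these endpoints with only the standard library. The base URL `https://api.ncbi.nlm.nih.gov/datasets/v2` is an assumption (the README lists only the paths), and `endpoint_url` and `fetch_json` are hypothetical helpers, not methods of `NCBIDatasetsClient`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Datasets v2 base URL; only the paths are given in the README.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"
REPORTS = {"sequence_reports", "dataset_report"}


def endpoint_url(accession: str, report: str) -> str:
    """Build /genome/accession/{accession}/{report} for one of the two endpoints."""
    if report not in REPORTS:
        raise ValueError(f"unknown report type: {report!r}")
    acc = urllib.parse.quote(accession, safe="")
    return f"{BASE_URL}/genome/accession/{acc}/{report}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping URL construction separate from the HTTP call makes the path logic easy to test offline and isolates the part most likely to change if NCBI revises the API.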

Development

Run tests:

pytest

Build distributions:

python -m build

License

MIT License. See LICENSE.



Download files


Source Distribution

chromoretriever-0.1.0.tar.gz (6.9 kB)

Uploaded Source

Built Distribution


chromoretriever-0.1.0-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file chromoretriever-0.1.0.tar.gz.

File metadata

  • Download URL: chromoretriever-0.1.0.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for chromoretriever-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b8f2d2308fcf401d5e88893c89cb225755ce364ece18eb6277e3685628036da4
MD5 b54e8e8ae269cded4748b30f5b16a7f0
BLAKE2b-256 33c4c7b6fdb205e5fd4a5e5e9b53b6a6cea48abc47ba6cda79b24107c67ddc33


File details

Details for the file chromoretriever-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for chromoretriever-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3221616ac8d4f017683cc0ee567d11a8d72e8b09b282ad71d840a345eb8e636b
MD5 00f4354b93fec21d89ec8c55f83c43b3
BLAKE2b-256 a37cfca1cf961543cee639b7eb4834930222515c3547e9b505c11ddfa2e9271a

