
Detect chromosome-level scaffolds in genome assemblies with inconsistent naming conventions


ChromDetect


A toolkit for genome assembly classification, validation, and quality control.


Overview

ChromDetect helps you work with genome assemblies by providing six key capabilities:

Feature                  Description
Scaffold Classification  Identify chromosomes vs. unplaced scaffolds from naming patterns and size
Assembly Validation      Validate FASTA files against NCBI assembly reports
Karyotype Checking       Verify chromosome counts against a built-in database of 29 species
Name Standardization     Convert names between UCSC, Ensembl, RefSeq, and GenBank conventions
Version Tracking         Compare assembly versions and detect scaffold changes
QC Dashboard             Generate comparative reports across multiple assemblies

Installation

pip install chromdetect

Quick Examples

# Classify scaffolds in an assembly
chromdetect assembly.fasta

# Validate against NCBI report
chromdetect assembly.fasta --assembly-report report.txt --validate

# Check chromosome count for human
chromdetect assembly.fasta --check-karyotype human

# Convert to UCSC naming (chr1, chr2, chrX)
chromdetect assembly.fasta --rename ucsc -o renamed.fasta

# Compare two assembly versions
chromdetect v1.fasta --compare-versions v2.fasta

# Generate QC dashboard for multiple assemblies
chromdetect --dashboard *.fasta -o dashboard.html --format html

Use Cases

Preparing assemblies for submission

Before submitting to NCBI, check compliance and standardize names:

# Check if names meet NCBI requirements
chromdetect assembly.fasta --check-compliance

# Rename to standard convention
chromdetect assembly.fasta --rename refseq -o submission_ready.fasta
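UCSC and Ensembl names for primary chromosomes differ mainly by the `chr` prefix and the mitochondrial name (`chrM` vs. `MT`), whereas RefSeq and GenBank names are accessions (e.g. `NC_000001.11`, `CM000663.2`) that can only be mapped through a lookup table such as an NCBI assembly report. The sketch below illustrates only the prefix-based case. `ucsc_to_ensembl` and `ensembl_to_ucsc` are hypothetical helpers, not part of ChromDetect's API, and any name that is not a primary chromosome is deliberately passed through unchanged.

```python
import re

# Hypothetical helpers (not ChromDetect's API). They only handle primary
# chromosome names; unplaced/alt names (e.g. chrUn_KI270302v1) are returned
# unchanged because converting them needs a lookup table, not string edits.
_PRIMARY = r"\d+|[XYZW]"


def ucsc_to_ensembl(name: str) -> str:
    """chr1 -> 1, chrX -> X, chrM -> MT; other names pass through."""
    if name == "chrM":
        return "MT"
    match = re.fullmatch(rf"chr({_PRIMARY})", name)
    return match.group(1) if match else name


def ensembl_to_ucsc(name: str) -> str:
    """1 -> chr1, X -> chrX, MT -> chrM; other names pass through."""
    if name == "MT":
        return "chrM"
    return f"chr{name}" if re.fullmatch(_PRIMARY, name) else name


print([ucsc_to_ensembl(n) for n in ["chr1", "chrX", "chrM", "chrUn_KI270302v1"]])
# -> ['1', 'X', 'MT', 'chrUn_KI270302v1']
```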

Quality control across projects

Compare multiple assemblies from different sources:

# Generate comparative dashboard
chromdetect --dashboard sample1.fa sample2.fa sample3.fa -o qc_report.html --format html

Validating downloaded assemblies

Verify a FASTA matches its NCBI assembly report:

chromdetect GRCh38.fasta --assembly-report GRCh38_report.txt --validate --strict
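NCBI assembly reports are tab-delimited text in which `#` comment lines precede the data and the last comment line names the columns (`Sequence-Name`, `GenBank-Accn`, `RefSeq-Accn`, `Sequence-Length`, `UCSC-style-name`, among others). Below is a minimal sketch of a name-and-length check. It assumes that layout and a `{name: length}` dict already read from the FASTA. `parse_assembly_report` and `compare_to_report` are hypothetical helpers, not functions from `chromdetect.validation`.

```python
# Hypothetical helpers (not chromdetect.validation). Assumes the NCBI
# assembly-report layout: '#'-prefixed comment lines, the last of which is
# the tab-separated column header, followed by tab-separated data rows.
NAME_COLUMNS = ("Sequence-Name", "GenBank-Accn", "RefSeq-Accn", "UCSC-style-name")


def parse_assembly_report(text: str) -> list[dict[str, str]]:
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Each comment line overwrites the header; the column header comes last.
            header = line.lstrip("#").strip().split("\t")
        elif line.strip():
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def compare_to_report(rows, fasta_lengths):
    """Return (missing, mismatched) for report rows vs. a {name: length} dict."""
    missing, mismatched = [], []
    for row in rows:
        # A FASTA header may use any of the report's naming columns; 'na' marks absent.
        aliases = [row.get(col, "") for col in NAME_COLUMNS]
        found = next((a for a in aliases if a not in ("", "na") and a in fasta_lengths), None)
        if found is None:
            missing.append(row.get("Sequence-Name", "?"))
        elif str(fasta_lengths[found]) != row.get("Sequence-Length", ""):
            mismatched.append(found)
    return missing, mismatched
```

Matching on every naming column lets the same check work whether the FASTA uses UCSC names, RefSeq accessions, or GenBank accessions.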

Tracking assembly improvements

See what changed between versions:

chromdetect old_assembly.fasta --compare-versions new_assembly.fasta

Output shows promotions, demotions, and metric changes:

SCAFFOLD CHANGES:
  Promoted:    2 scaffolds (unplaced → chromosome)
  Unchanged:   1,150 scaffolds
  N50 change:  +6.7 Mb (+14.6%)
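Promotions and demotions can be derived by comparing per-scaffold classifications from the two versions. The sketch below assumes that scaffold names are stable between versions and treats every classification other than `"chromosome"` as non-chromosomal. Real assembly revisions often rename scaffolds, so a real comparison would first need a mapping between old and new names, which this sketch does not attempt. `compare_versions` and `n50_change` are hypothetical helpers, not `chromdetect.version` functions.

```python
def compare_versions(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Bucket scaffolds by how their classification changed (matched by name)."""
    changes: dict[str, list[str]] = {
        "promoted": [], "demoted": [], "unchanged": [], "added": [], "removed": [],
    }
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes["removed"].append(name)
        elif name not in old:
            changes["added"].append(name)
        else:
            was_chrom = old[name] == "chromosome"
            is_chrom = new[name] == "chromosome"
            if was_chrom == is_chrom:
                changes["unchanged"].append(name)
            elif is_chrom:
                changes["promoted"].append(name)  # e.g. unplaced -> chromosome
            else:
                changes["demoted"].append(name)  # chromosome -> unplaced
    return changes


def n50_change(old_n50: int, new_n50: int) -> tuple[int, float]:
    """Absolute and percentage N50 change; old_n50 must be non-zero."""
    return new_n50 - old_n50, 100.0 * (new_n50 - old_n50) / old_n50
```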

Checking species-specific karyotype

Verify your assembly has the expected chromosomes:

# List available species
chromdetect --list-species

# Check against expected karyotype
chromdetect mouse_assembly.fasta --check-karyotype mouse
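A bare count can hide problems: 24 chromosome-level sequences could be chr1 to chr22 plus X and Y, or a missing autosome plus an extra fragment. Comparing names against the expected set is therefore more informative. Below is a minimal sketch that assumes UCSC- or Ensembl-style names. The human (22 autosomes plus X/Y) and mouse (19 autosomes plus X/Y) karyotypes are standard, but `KARYOTYPES` and `check_karyotype` are hypothetical illustrations, not the database or API in `chromdetect.karyotype`.

```python
# Hypothetical table: species -> (number of autosomes, sex chromosome names).
KARYOTYPES = {
    "human": (22, {"X", "Y"}),
    "mouse": (19, {"X", "Y"}),
}


def check_karyotype(chrom_names: list[str], species: str) -> dict[str, list[str]]:
    n_autosomes, sex_chroms = KARYOTYPES[species]
    # Normalize UCSC-style names (chr1 -> 1) and ignore the mitochondrial genome.
    names = {n[3:] if n.startswith("chr") else n for n in chrom_names} - {"M", "MT"}
    autosomes = {n for n in names if n.isdigit()}
    expected = {str(i) for i in range(1, n_autosomes + 1)}
    return {
        "missing_autosomes": sorted(expected - autosomes, key=int),
        "unexpected_autosomes": sorted(autosomes - expected, key=int),
        # A female assembly legitimately lacks Y, so sex chromosomes are reported, not failed.
        "sex_chromosomes_found": sorted(names & sex_chroms),
    }
```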

Output Formats

Format   Flag              Use case
Summary  --format summary  Quick terminal inspection (default)
JSON     --format json     Programmatic processing
TSV      --format tsv      Spreadsheet analysis
HTML     --format html     Visual reports with charts
BED      --format bed      Genomics pipelines (bedtools, etc.)
GFF      --format gff      Genome browsers
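BED and GFF use different coordinate conventions. BED intervals are 0-based and half-open, so a whole scaffold of length L spans `0` to `L`. GFF3 features are 1-based and closed, so the same scaffold spans `1` to `L`. Below is a minimal sketch that writes one whole-scaffold record per sequence, assuming `(name, length, classification)` tuples. Placing the classification in the BED name column and in a `classification=` GFF3 attribute is an illustrative choice, not necessarily what `--format bed` or `--format gff` emit.

```python
def to_bed(records: list[tuple[str, int, str]]) -> str:
    """One BED line per scaffold: chrom, start (0-based), end (exclusive), name."""
    return "".join(f"{name}\t0\t{length}\t{cls}\n" for name, length, cls in records)


def to_gff3(records: list[tuple[str, int, str]]) -> str:
    """One GFF3 'region' feature per scaffold, using 1-based closed coordinates."""
    lines = ["##gff-version 3"]
    for name, length, cls in records:
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        # Names containing ';', '=' or ',' would need percent-encoding, skipped here.
        lines.append(f"{name}\t.\tregion\t1\t{length}\t.\t.\t.\tID={name};classification={cls}")
    return "\n".join(lines) + "\n"
```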

Python API

from chromdetect import classify_fasta

# Classify an assembly
results, stats = classify_fasta("assembly.fasta")
print(f"Chromosomes: {stats.chromosome_count}")
print(f"N50: {stats.n50 / 1e6:.1f} Mb")

# Filter to just chromosomes
chromosomes = [r for r in results if r.classification == "chromosome"]
for c in chromosomes:
    print(f"  {c.name}: {c.length:,} bp")
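N50 is conventionally defined as follows: sort sequence lengths in descending order, then report the length at which the running total first reaches half of the total assembly size. Assuming `stats.n50` uses this standard definition, here is a minimal standalone sketch. The `n50` function is hypothetical, not part of `chromdetect`.

```python
def n50(lengths: list[int]) -> int:
    """Length at which the longest-first cumulative sum first reaches 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point halving.
        if 2 * running >= total:
            return length
    return 0  # empty input


print(n50([10, 8, 6, 4, 2]))  # -> 8 (10 + 8 = 18 is the first running total >= 30 / 2)
```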

Additional modules for specific tasks:

# Validation
from chromdetect.validation import validate_fasta_against_report

# Karyotype checking
from chromdetect.karyotype import validate_karyotype, KaryotypeDatabase

# Name standardization
from chromdetect.standardize import standardize_fasta, check_ncbi_compliance

# Version comparison
from chromdetect.version import compare_fasta_files

# Multi-assembly dashboard
from chromdetect.dashboard import analyze_multiple_assemblies, generate_dashboard_html

Supported Species (Karyotype Database)

ChromDetect ships karyotype data for 29 species, including:

Mammals: Human, mouse, rat, dog, cat, horse, cow, pig, sheep, goat, rabbit, guinea pig

Other vertebrates: Chicken, zebrafish, frog

Invertebrates: Fruit fly, C. elegans

Plants: Arabidopsis, rice, maize, wheat, soybean, tomato

Microorganisms: Yeast (S. cerevisiae), E. coli

Use chromdetect --list-species to see all available species with chromosome counts.

Recognized Naming Patterns

ChromDetect automatically recognizes common scaffold naming conventions:

  • Chromosome prefixes: chr1, Chr_1, chromosome_1, Chromosome1
  • Super scaffolds: Super_scaffold_1, Superscaffold_1, SUPER_1
  • Linkage groups: LG1, LG_1, linkage_group_1
  • NCBI accessions: NC_000001.11, CM000663.2
  • Assembly tools: HiC_scaffold_1, Scaffold_1_RaGOO
  • Simple numeric: 1, 2, X, MT
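Pattern recognition of this kind can be done with anchored regular expressions, plus a size fallback for names such as `HiC_scaffold_1` whose prefix alone says nothing about scale. The sketch below encodes the patterns listed above. The regular expressions, the `classify` function, and the 10 Mb `min_chrom_len` cutoff are hypothetical illustrations, not ChromDetect's actual rules or defaults.

```python
import re

# Hypothetical regexes mirroring the naming patterns listed above (case-insensitive).
CHROMOSOME_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"(?:chr|chromosome)_?(?:\d+|[XYZW]|MT?)",  # chr1, Chr_1, chromosome_1, Chromosome1
        r"super_?(?:scaffold_?)?\d+",               # Super_scaffold_1, Superscaffold_1, SUPER_1
        r"(?:lg|linkage_group)_?\d+",               # LG1, LG_1, linkage_group_1
        r"(?:NC_|CM)\d+\.\d+",                      # NC_000001.11, CM000663.2
        r".+_RaGOO",                                # Scaffold_1_RaGOO
        r"\d+|[XYZW]|MT",                           # 1, 2, X, MT
    )
]


def classify(name: str, length: int, min_chrom_len: int = 10_000_000) -> str:
    """Return 'chromosome' or 'unplaced' from the name, falling back to size."""
    if any(p.fullmatch(name) for p in CHROMOSOME_PATTERNS):
        return "chromosome"
    # Names like HiC_scaffold_N carry no scale information, so size decides.
    return "chromosome" if length >= min_chrom_len else "unplaced"
```

Using `fullmatch` rather than `search` keeps names like `chrUn_KI270302v1` or `chr1_KI270706v1_random` from being mistaken for `chr1`.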

Custom patterns can be added via YAML configuration files.

Limitations

ChromDetect relies on naming patterns and size heuristics, so it cannot:

  • Detect misassemblies or sequence errors
  • Validate sequence correctness
  • Perform synteny or homology analysis

For comprehensive assembly validation, use ChromDetect alongside tools like QUAST or Merqury.

Citation

If you use ChromDetect in your research, please cite:

@software{chromdetect,
  author = {Handley, Scott A.},
  title = {ChromDetect: A toolkit for genome assembly classification and QC},
  url = {https://github.com/shandley/chromdetect},
  version = {0.6.0},
  doi = {10.5281/zenodo.17945062},
  year = {2025}
}

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.
