Detect chromosome-level scaffolds in genome assemblies with inconsistent naming conventions
ChromDetect
A toolkit for genome assembly classification, validation, and quality control.
Overview
ChromDetect helps you work with genome assemblies by providing six key capabilities:
| Feature | Description |
|---|---|
| Scaffold Classification | Identify chromosomes vs unplaced scaffolds based on naming patterns and size |
| Assembly Validation | Validate FASTA files against NCBI assembly reports |
| Karyotype Checking | Verify chromosome counts against a built-in database of 29 species |
| Name Standardization | Convert between UCSC, Ensembl, RefSeq, and GenBank conventions |
| Version Tracking | Compare assembly versions and detect scaffold changes |
| QC Dashboard | Generate comparative reports across multiple assemblies |
Installation
pip install chromdetect
Quick Examples
# Classify scaffolds in an assembly
chromdetect assembly.fasta
# Validate against NCBI report
chromdetect assembly.fasta --assembly-report report.txt --validate
# Check chromosome count for human
chromdetect assembly.fasta --check-karyotype human
# Convert to UCSC naming (chr1, chr2, chrX)
chromdetect assembly.fasta --rename ucsc -o renamed.fasta
# Compare two assembly versions
chromdetect v1.fasta --compare-versions v2.fasta
# Generate QC dashboard for multiple assemblies
chromdetect --dashboard *.fasta -o dashboard.html --format html
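To make the UCSC rename in the examples above more concrete, here is a minimal sketch of what such a conversion involves for simple names. The helper `to_ucsc` is hypothetical and is not part of ChromDetect's API; it only handles numeric and sex-chromosome names plus the mitochondrial genome, and leaves everything else untouched.

```python
import re

# Hypothetical helper for illustration; not ChromDetect's implementation.
_PREFIX = re.compile(r"^(chromosome|chr)_?", re.IGNORECASE)


def to_ucsc(name):
    """Return a UCSC-style name (chr1, chrX, chrM) for simple chromosome names."""
    core = _PREFIX.sub("", name, count=1).upper()
    # UCSC labels the mitochondrial genome chrM rather than MT.
    if core in {"MT", "M"}:
        return "chrM"
    if core.isdigit() or core in {"X", "Y", "Z", "W"}:
        return "chr" + core
    # Anything else (unplaced scaffolds, accessions) is left unchanged.
    return name


for original in ["1", "Chromosome_2", "chrX", "MT", "scaffold_17"]:
    print(original, "->", to_ucsc(original))
```

Accession-based names such as RefSeq or GenBank identifiers cannot be converted by pattern alone; they need a lookup table, which is one reason an assembly report is useful.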
Use Cases
Preparing assemblies for submission
Before submitting to NCBI, check compliance and standardize names:
# Check if names meet NCBI requirements
chromdetect assembly.fasta --check-compliance
# Rename to standard convention
chromdetect assembly.fasta --rename refseq -o submission_ready.fasta
Quality control across projects
Compare multiple assemblies from different sources:
# Generate comparative dashboard
chromdetect --dashboard sample1.fa sample2.fa sample3.fa -o qc_report.html --format html
Validating downloaded assemblies
Verify a FASTA matches its NCBI assembly report:
chromdetect GRCh38.fasta --assembly-report GRCh38_report.txt --validate --strict
Tracking assembly improvements
See what changed between versions:
chromdetect old_assembly.fasta --compare-versions new_assembly.fasta
Output shows promotions, demotions, and metric changes:
SCAFFOLD CHANGES:
Promoted: 2 scaffolds (unplaced → chromosome)
Unchanged: 1,150 scaffolds
N50 change: +6.7 Mb (+14.6%)
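For readers unfamiliar with the metrics in this report, the sketch below shows one common way to compute N50 and to count classification changes between two versions. The function names and the `"unplaced"` label are assumptions made for illustration; ChromDetect's own comparison logic may differ.

```python
def n50(lengths):
    """Length of the scaffold at which the running total, taken from the
    largest scaffold down, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def diff_classifications(old, new):
    """Count how scaffolds present in both versions changed class.

    `old` and `new` map scaffold names to labels such as "chromosome"
    or "unplaced" (labels assumed for this sketch).
    """
    counts = {"promoted": 0, "demoted": 0, "unchanged": 0, "other": 0}
    for name in old.keys() & new.keys():
        before, after = old[name], new[name]
        if before == after:
            counts["unchanged"] += 1
        elif after == "chromosome":
            counts["promoted"] += 1
        elif before == "chromosome":
            counts["demoted"] += 1
        else:
            counts["other"] += 1
    return counts


old_labels = {"s1": "unplaced", "s2": "chromosome", "s3": "unplaced"}
new_labels = {"s1": "chromosome", "s2": "chromosome", "s3": "unplaced"}
print(diff_classifications(old_labels, new_labels))
print(n50([5, 4, 3, 2, 1]))
```

This sketch only compares scaffolds whose names appear in both versions; scaffolds that were renamed, merged, or split between releases would need sequence-level matching.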
Checking species-specific karyotype
Verify your assembly has the expected chromosomes:
# List available species
chromdetect --list-species
# Check against expected karyotype
chromdetect mouse_assembly.fasta --check-karyotype mouse
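Conceptually, a karyotype check compares the chromosome names found in an assembly with the set expected for the species. The sketch below uses a tiny hand-written table (22 autosomes for human, 19 for mouse, plus X and Y) purely for illustration; it does not use ChromDetect's database, and the function names are hypothetical.

```python
import re

# Toy table for this sketch only: number of autosomes per species.
# Both species are also expected to have X and Y.
AUTOSOMES = {"human": 22, "mouse": 19}

_CHR_PREFIX = re.compile(r"^chr_?", re.IGNORECASE)


def expected_chromosomes(species):
    """Return the expected nuclear chromosome names, e.g. {"1", ..., "X", "Y"}."""
    names = {str(i) for i in range(1, AUTOSOMES[species] + 1)}
    return names | {"X", "Y"}


def check_karyotype(chromosome_names, species):
    """Return (missing, unexpected) chromosome names after stripping a chr prefix."""
    observed = {_CHR_PREFIX.sub("", n).upper() for n in chromosome_names}
    expected = expected_chromosomes(species)
    return sorted(expected - observed), sorted(observed - expected)


names = [f"chr{i}" for i in range(1, 20)] + ["chrX"]
print(check_karyotype(names, "mouse"))
```

A missing Y chromosome, as in this toy input, is not necessarily an error: assemblies built from a female individual will not contain one. Organelle sequences such as the mitochondrial genome would show up as unexpected here because the toy table lists nuclear chromosomes only.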
Output Formats
| Format | Flag | Use Case |
|---|---|---|
| Summary | --format summary | Quick terminal inspection (default) |
| JSON | --format json | Programmatic processing |
| TSV | --format tsv | Spreadsheet analysis |
| HTML | --format html | Visual reports with charts |
| BED | --format bed | Genomics pipelines (bedtools, etc.) |
| GFF | --format gff | Genome browsers |
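As background for the BED option, a BED record describing a whole scaffold starts at 0 and ends at the scaffold length, because BED coordinates are 0-based and half-open. The sketch below shows that convention with a hypothetical helper; the exact columns ChromDetect writes are not documented here and may differ.

```python
def scaffold_to_bed(name, length, label):
    """Format one whole-scaffold interval as a four-column BED line."""
    # BED is 0-based and half-open, so the full scaffold spans [0, length).
    return f"{name}\t0\t{length}\t{label}"


for name, length, label in [("chr1", 1000, "chromosome"), ("scaffold_17", 250, "unplaced")]:
    print(scaffold_to_bed(name, length, label))
```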
Python API
from chromdetect import classify_fasta
# Classify an assembly
results, stats = classify_fasta("assembly.fasta")
print(f"Chromosomes: {stats.chromosome_count}")
print(f"N50: {stats.n50 / 1e6:.1f} Mb")
# Filter to just chromosomes
chromosomes = [r for r in results if r.classification == "chromosome"]
for c in chromosomes:
    print(f" {c.name}: {c.length:,} bp")
Additional modules for specific tasks:
# Validation
from chromdetect.validation import validate_fasta_against_report
# Karyotype checking
from chromdetect.karyotype import validate_karyotype, KaryotypeDatabase
# Name standardization
from chromdetect.standardize import standardize_fasta, check_ncbi_compliance
# Version comparison
from chromdetect.version import compare_fasta_files
# Multi-assembly dashboard
from chromdetect.dashboard import analyze_multiple_assemblies, generate_dashboard_html
Supported Species (Karyotype Database)
ChromDetect includes karyotype data for 29 species, including:
Mammals: Human, mouse, rat, dog, cat, horse, cow, pig, sheep, goat, rabbit, guinea pig
Other vertebrates: Chicken, zebrafish, frog
Invertebrates: Fruit fly, C. elegans
Plants: Arabidopsis, rice, maize, wheat, soybean, tomato
Microorganisms: Yeast (S. cerevisiae), E. coli
Use chromdetect --list-species to see all available species with chromosome counts.
Recognized Naming Patterns
ChromDetect automatically recognizes common scaffold naming conventions:
- Chromosome prefixes: chr1, Chr_1, chromosome_1, Chromosome1
- Super scaffolds: Super_scaffold_1, Superscaffold_1, SUPER_1
- Linkage groups: LG1, LG_1, linkage_group_1
- NCBI accessions: NC_000001.11, CM000663.2
- Assembly tools: HiC_scaffold_1, Scaffold_1_RaGOO
- Simple numeric: 1, 2, X, MT
Custom patterns can be added via YAML configuration files.
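Name-based recognition of this kind can be approximated with a short list of regular expressions. The patterns below are written to match the example names listed above and are assumptions made for this sketch; they are not ChromDetect's internal rules, and `match_pattern` is a hypothetical helper.

```python
import re

# Illustrative patterns modelled on the example names in this section.
PATTERNS = [
    ("chromosome_prefix", r"(?:chromosome|chr)_?(?:\d+|[XYZW]|MT?)"),
    ("super_scaffold", r"super_?(?:scaffold)?_?\d+"),
    ("linkage_group", r"(?:linkage_group|lg)_?\d+"),
    ("ncbi_accession", r"(?:NC_|CM)\d+\.\d+"),
    ("assembly_tool", r"HiC_scaffold_\d+|Scaffold_\d+_RaGOO"),
    ("simple", r"\d+|[XYZW]|MT?"),
]
COMPILED = [(label, re.compile(p, re.IGNORECASE)) for label, p in PATTERNS]


def match_pattern(name):
    """Return the label of the first pattern matching the whole name, else None."""
    for label, regex in COMPILED:
        if regex.fullmatch(name):
            return label
    return None


for example in ["Chr_1", "SUPER_1", "LG_1", "NC_000001.11", "HiC_scaffold_1", "X", "contig_12345"]:
    print(example, "->", match_pattern(example))
```

Using `fullmatch` rather than `search` keeps names like `contig_12345` from matching just because they contain digits. Names that match no pattern would then fall back to size-based heuristics.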
Limitations
ChromDetect relies on naming patterns and size heuristics. It cannot:
- Detect misassemblies or sequence errors
- Validate sequence correctness
- Perform synteny or homology analysis
For comprehensive assembly validation, use ChromDetect alongside tools like QUAST or Merqury.
Citation
If you use ChromDetect in your research, please cite:
@software{chromdetect,
author = {Handley, Scott A.},
title = {ChromDetect: A toolkit for genome assembly classification and QC},
url = {https://github.com/shandley/chromdetect},
version = {0.6.0},
doi = {10.5281/zenodo.17945062},
year = {2025}
}
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.