Project description
Corecomb: create an XMFA file from Panaroo core gene alignments to detect recombination in the core genome using ClonalFrameML.
Installation
pip install corecomb
Quick start
If you are in a Panaroo output directory, just run:
corecomb
Get help
$ corecomb --help
Usage: corecomb [OPTIONS]
Create XMFA file from ClonalFrameML input from Panaroo core-genome gene alignments
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --gene_al_dir TEXT Path to directory containing core-genome gene alignments [default: core_gene_alignments] │
│ --pan_fa TEXT Path to Panaroo pan_genome_reference.fa [default: pan_genome_reference.fa] │
│ --extension TEXT File extension of core-genome gene alignments [default: fas] │
│ --outfile TEXT Path to output XMFA file [default: corecomb.xmfa] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Why
In theory, given the individual core-gene multiple sequence alignments in Panaroo's core_gene_alignments directory, one could just run a sed command to concatenate them into an XMFA file, appending a = separator line after each alignment:
sed -e '$s/$/\n=/' -s ../tests/data/aligned_gene_sequences_raw/*.fas > core_gene_alignment.xmfa
However, this approach suffers from three issues:
- Sequence names need to be cleaned
- Ambiguous IUPAC characters other than N need to be handled (CFML only accepts A, T, G, C, N, -)
- Genomes with missing genes will cause CFML to crash (when the core genome is defined at less than 100%)
CoRecomb addresses all three of these issues. Additionally, CoRecomb uses the gene order defined in pan_genome_reference.fa to re-order the genes in the XMFA file; this order is preserved in the CFML output core_gene_test_cfml.filtered.fasta.
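The three issues listed above can be illustrated with a minimal Python sketch. This is not CoRecomb's actual implementation: the function names are hypothetical, the headers are assumed to look like genome;gene_id, and padding a missing genome with gaps is just one possible choice.

```python
ALLOWED = set("ATGCN-")  # the only characters CFML accepts


def clean_name(header: str) -> str:
    """Keep only the genome name (assumes 'genome;gene_id' headers)."""
    return header.lstrip(">").strip().split(";")[0]


def mask_ambiguous(seq: str) -> str:
    """Upper-case the sequence and replace anything CFML rejects with N."""
    return "".join(c if c in ALLOWED else "N" for c in seq.upper())


def parse_fasta(text: str) -> dict[str, str]:
    """Parse one gene alignment into {genome: cleaned sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = clean_name(line)
            records[name] = ""
        elif name is not None:
            records[name] += mask_ambiguous(line)
    return records


def build_xmfa(gene_alignments: list[str], genomes: list[str]) -> str:
    """Write one block per gene, with every genome present in every block."""
    blocks = []
    for text in gene_alignments:
        records = parse_fasta(text)
        if not records:
            continue  # skip empty alignments
        length = len(next(iter(records.values())))
        lines = []
        for genome in genomes:
            # A genome lacking this gene gets an all-gap row (an assumption).
            lines.append(f">{genome}\n{records.get(genome, '-' * length)}")
        lines.append("=")  # block separator, as in the sed command
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


# Toy input: gene_b is missing from genome g2 and contains ambiguous bases.
gene_a = ">g1;a_1\nATGR\n>g2;a_7\nATGC\n"
gene_b = ">g1;b_3\nttyg\n"
print(build_xmfa([gene_a, gene_b], ["g1", "g2"]))
```

In this toy example the R and Y bases become N, and g2 receives a row of four gaps in the second block so that both blocks list the same genomes.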
Test it for yourself
poetry run pytest -vv
Test data can be found in tests/data. To run corecomb on it:
corecomb \
--gene_al_dir tests/data/aligned_gene_sequences_raw \
--pan_fa tests/data/pan_genome_reference.fa \
--extension fas \
--outfile corecomb.xmfa
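Before handing the file to ClonalFrameML, it can be useful to sanity-check it. The sketch below is a hypothetical helper, not part of corecomb; it assumes blocks are terminated by a line containing only = and that each record is a > header followed by sequence lines. It checks that every block lists the same genomes, that sequences within a block have equal length, and that only A, T, G, C, N and - occur.

```python
ALLOWED = set("ATGCN-")


def split_blocks(text: str) -> list[dict[str, str]]:
    """Split XMFA text into blocks of {genome: sequence}."""
    blocks, current, name = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "=":  # end of the current block
            blocks.append(current)
            current, name = {}, None
        elif line.startswith(">"):
            name = line[1:]
            current[name] = ""
        elif name is not None:
            current[name] += line
    if current:  # tolerate a final block without a trailing '='
        blocks.append(current)
    return blocks


def check_xmfa(text: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    blocks = split_blocks(text)
    if not blocks:
        return ["no alignment blocks found"]
    problems = []
    expected = set(blocks[0])
    for i, block in enumerate(blocks, start=1):
        if set(block) != expected:
            problems.append(f"block {i}: genome set differs from block 1")
        if len({len(seq) for seq in block.values()}) > 1:
            problems.append(f"block {i}: sequences have different lengths")
        bad = {c for seq in block.values() for c in seq} - ALLOWED
        if bad:
            problems.append(f"block {i}: unexpected characters {sorted(bad)}")
    return problems


if __name__ == "__main__":
    with open("corecomb.xmfa") as handle:
        for problem in check_xmfa(handle.read()):
            print(problem)
```

An empty result does not guarantee that ClonalFrameML will accept the file; it only rules out the three failure modes described in the Why section.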
Use the XMFA with ClonalFrameML
ClonalFrameML \
input_tree.nwk \
corecomb.xmfa \
cfml_output_basename \
-xmfa_file true \
-show_progress true \
-output_filtered true