Project description
Corecomb: create an XMFA file from Panaroo core gene alignments to detect recombination in the core genome using ClonalFrameML.
Installation
pip install corecomb
Quick start
If you are in the Panaroo output directory, just run:
corecomb
Get help
$ corecomb --help
Usage: corecomb [OPTIONS]
Create XMFA file from ClonalFrameML input from Panaroo core-genome gene alignments
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --gene_al_dir TEXT Path to directory containing core-genome gene alignments [default: core_gene_alignments] │
│ --pan_fa TEXT Path to Panaroo pan_genome_reference.fa [default: pan_genome_reference.fa] │
│ --extension TEXT File extension of core-genome gene alignments [default: fas] │
│ --outfile TEXT Path to output XMFA file [default: corecomb.xmfa] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Why
In theory, using the individual core-gene multiple sequence alignments from the core_gene_alignments directory of Panaroo, one could just run a sed command to concatenate them into an XMFA file.
sed -e '$s/$/\n=/' -s ../tests/data/aligned_gene_sequences_raw/*.fas > core_gene_alignment.xmfa
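For readers less familiar with sed, the same naive concatenation can be sketched in Python. This is only an illustration of the XMFA block layout (each alignment followed by a line containing `=`), not code taken from corecomb; the function names `join_blocks` and `naive_xmfa` are hypothetical.

```python
from pathlib import Path


def join_blocks(blocks: list[str]) -> str:
    """Terminate every alignment block with a line holding a single '='."""
    return "".join(block.rstrip("\n") + "\n=\n" for block in blocks)


def naive_xmfa(alignment_dir: str, extension: str = "fas") -> str:
    """Read every alignment file in the directory and join them as XMFA text.

    Files are taken in sorted name order, which roughly mirrors the order
    a shell glob would pass to sed.
    """
    paths = sorted(Path(alignment_dir).glob(f"*.{extension}"))
    return join_blocks([p.read_text() for p in paths])
```

Like the sed one-liner, this sketch copies the sequences and their names unchanged.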
However, this approach suffers from three issues:
- Sequence names need to be cleaned
- Ambiguous non-N IUPAC characters need to be taken care of (CFML only accepts A, T, G, C, N, -)
- Genomes with missing genes will cause CFML to crash (core-genome defined at less than 100%)
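To make the three issues concrete, here is a minimal Python sketch of one way each could be handled. It is not corecomb's actual implementation. It assumes that sequence headers look like `genome;gene_id`, that ambiguous characters are masked with N, and that a genome lacking a gene is filled with gaps; the helper names are hypothetical.

```python
ALLOWED = set("ATGCN-")  # the only characters CFML accepts


def clean_name(header: str) -> str:
    """Keep only the genome name (assumed to precede the first ';')."""
    return header.lstrip(">").split(";")[0].strip()


def mask_ambiguous(seq: str) -> str:
    """Replace every character outside A, T, G, C, N, - with N."""
    return "".join(c if c in ALLOWED else "N" for c in seq.upper())


def pad_missing(block: dict[str, str], genomes: list[str]) -> dict[str, str]:
    """Return one row per genome, using an all-gap row where the gene is absent."""
    length = len(next(iter(block.values())))
    return {g: block.get(g, "-" * length) for g in genomes}
```

For example, `pad_missing({"a": "ATG"}, ["a", "b"])` gives genome `b` the row `---`, so every block lists the same set of genomes.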
CoRecomb addresses all three of these issues. Additionally, CoRecomb uses the order of the genes defined in pan_genome_reference.fa to re-order the genes in the XMFA file (an order that is preserved in the CFML output core_gene_test_cfml.filtered.fasta).
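The re-ordering step can be pictured with a short Python sketch. It is an illustration under stated assumptions rather than corecomb's code: it assumes each header in pan_genome_reference.fa starts with the gene name, and that each alignment file name starts with the same gene name followed by a dot. Both function names are hypothetical.

```python
from pathlib import Path


def reference_gene_order(pan_fa_text: str) -> list[str]:
    """Return gene names in the order their headers appear in the FASTA text."""
    return [
        line[1:].split()[0]
        for line in pan_fa_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def order_alignments(files: list[str], gene_order: list[str]) -> list[str]:
    """Sort alignment files by the position of their gene in the reference."""
    rank = {gene: i for i, gene in enumerate(gene_order)}

    def gene_of(name: str) -> str:
        # Assumption: the file name up to the first '.' is the gene name.
        return Path(name).name.split(".")[0]

    # Files whose gene is not in the reference are dropped in this sketch.
    known = [f for f in files if gene_of(f) in rank]
    return sorted(known, key=lambda f: rank[gene_of(f)])
```

Writing the blocks in this order means the concatenated alignment follows the gene order of the pan-genome reference instead of the alphabetical order of the file names.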
Test it for yourself
poetry run pytest -vv
Test data can be found in tests/data:
corecomb \
--gene_al_dir tests/data/aligned_gene_sequences_raw \
--pan_fa tests/data/pan_genome_reference.fa \
--extension fas \
--outfile corecomb.xmfa
Use the XMFA with ClonalFrameML
ClonalFrameML \
input_tree.nwk \
corecomb.xmfa \
cfml_output_basename \
-xmfa_file true \
-show_progress true \
-output_filtered true