Corecomb: create an XMFA file from Panaroo core gene alignments to detect recombination in the core genome using ClonalFrameML.

Installation

pip install corecomb

Quick start

If you are in a Panaroo output directory, just run:

corecomb

Get help

$ corecomb --help

 Usage: corecomb [OPTIONS]

 Create XMFA file from ClonalFrameML input from Panaroo core-genome gene alignments

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --gene_al_dir    TEXT  Path to directory containing core-genome gene alignments [default: core_gene_alignments]                 │
│ --pan_fa         TEXT  Path to Panaroo pan_genome_reference.fa [default: pan_genome_reference.fa]                               │
│ --extension      TEXT  File extension of core-genome gene alignments [default: fas]                                             │
│ --outfile        TEXT  Path to output XMFA file [default: corecomb.xmfa]                                                        │
│ --help                 Show this message and exit.                                                                              │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Why

In theory, given the individual core-gene multiple sequence alignments in Panaroo's core_gene_alignments directory, one could simply run a sed command to concatenate them into an XMFA file, appending a "=" separator line after each alignment:

sed -e '$s/$/\n=/' -s ../tests/data/aligned_gene_sequences_raw/*.fas > core_gene_alignment.xmfa

However, this approach suffers from three issues:

  • Sequence names need to be cleaned
  • Ambiguous non-N IUPAC characters need to be handled (CFML only accepts A, T, G, C, N and -)
  • Genomes with missing genes will cause CFML to crash (this happens when the core genome is defined at a threshold below 100%)
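
To make these three issues concrete, here is a minimal Python sketch of how each could be handled. It is not Corecomb's actual implementation. The function names are hypothetical, the header layout `genome;locus_tag` is an assumption about the input, and filling absent genomes with gap-only rows is just one possible choice (all-N rows would be another).

```python
# Hypothetical helpers illustrating the three fixes; not taken from Corecomb.

ALLOWED = set("ATGCN-")  # characters CFML accepts, as listed above


def clean_name(header: str) -> str:
    """Reduce a FASTA header to a genome name.

    Assumption: headers look like '>genome;locus_tag'.
    """
    return header.lstrip(">").split(";")[0].strip()


def mask_ambiguous(seq: str) -> str:
    """Upper-case the sequence and replace anything outside ALLOWED with N."""
    return "".join(c if c in ALLOWED else "N" for c in seq.upper())


def pad_missing(block: dict, genomes: list) -> dict:
    """Return one gene block containing every genome.

    Genomes absent from `block` get a gap-only row of the alignment length.
    """
    length = len(next(iter(block.values())))
    return {g: block.get(g, "-" * length) for g in genomes}
```

With these helpers, `mask_ambiguous("acgtRYn-")` gives `"ACGTNNN-"`, and a block missing genome `b` gains a row of gaps as long as the other sequences.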

Corecomb addresses all three of these issues. Additionally, Corecomb re-orders the genes in the XMFA file to follow the gene order defined in pan_genome_reference.fa; this order is preserved in the CFML output core_gene_test_cfml.filtered.fasta.
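
The re-ordering step can be pictured with the short Python sketch below. It is an illustration under assumptions, not Corecomb's code. Both function names are hypothetical, and it assumes that each alignment file name starts with the gene name found in the matching pan_genome_reference.fa header (for example `geneA.aln.fas` for `>geneA`).

```python
from pathlib import Path


def reference_gene_order(fasta_lines):
    """Collect gene names from FASTA header lines, in file order."""
    return [line[1:].split()[0] for line in fasta_lines if line.startswith(">")]


def order_alignments(paths, gene_order):
    """Sort alignment paths by the position of their gene in gene_order.

    Assumption: the gene name is the file name up to the first dot.
    Files whose gene is not in the reference are dropped.
    """
    rank = {gene: i for i, gene in enumerate(gene_order)}

    def gene_name(path):
        return Path(path).name.split(".")[0]

    known = [p for p in paths if gene_name(p) in rank]
    return sorted(known, key=lambda p: rank[gene_name(p)])
```

In practice the header lines would come from the reference file, for example `reference_gene_order(open("pan_genome_reference.fa"))`.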

Test it for yourself

poetry run pytest -vv

Test data can be found in tests/data:

corecomb \
    --gene_al_dir tests/data/aligned_gene_sequences_raw \
    --pan_fa tests/data/pan_genome_reference.fa \
    --extension fas \
    --outfile corecomb.xmfa
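
Before passing the XMFA file to ClonalFrameML, it can be useful to check that it meets the constraints described in the Why section. The Python sketch below is a hypothetical stand-alone check, not part of Corecomb. It assumes simple `>name` headers and blocks terminated by a line starting with `=`, as produced by the sed command above.

```python
ALLOWED = set("ATGCN-")  # characters CFML accepts


def parse_xmfa(text):
    """Split XMFA text into blocks; each block maps sequence name -> sequence."""
    blocks, current, name = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("="):  # end of one gene block
            if current:
                blocks.append(current)
            current, name = {}, None
        elif line.startswith(">"):
            name = line[1:].strip()
            current[name] = ""
        elif name is not None:
            current[name] += line
    if current:  # tolerate a missing final '='
        blocks.append(current)
    return blocks


def check_blocks(blocks):
    """Return a list of human-readable problems (empty if none were found)."""
    if not blocks:
        return ["no alignment blocks found"]
    problems = []
    expected = set(blocks[0])
    for i, block in enumerate(blocks, 1):
        if set(block) != expected:
            problems.append(f"block {i}: sequence names differ from block 1")
        if len({len(seq) for seq in block.values()}) > 1:
            problems.append(f"block {i}: sequences have unequal lengths")
        bad = set("".join(block.values()).upper()) - ALLOWED
        if bad:
            problems.append(f"block {i}: unexpected characters {sorted(bad)}")
    return problems
```

A typical use would be `check_blocks(parse_xmfa(open("corecomb.xmfa").read()))`, which returns an empty list when every block has the same genomes, equal-length rows, and only accepted characters.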

Use the XMFA with ClonalFrameML

ClonalFrameML \
    input_tree.nwk \
    corecomb.xmfa \
    cfml_output_basename \
    -xmfa_file true \
    -show_progress true \
    -output_filtered true
