Corecomb: create an XMFA file from Panaroo core-gene alignments to detect recombination in the core genome using ClonalFrameML (CFML).

Installation

pip install corecomb

Quick start

If you are in the Panaroo output directory, just run:

corecomb

Get help

$ corecomb --help

 Usage: corecomb [OPTIONS]

 Create XMFA file from ClonalFrameML input from Panaroo core-genome gene alignments

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --gene_al_dir    TEXT  Path to directory containing core-genome gene alignments [default: core_gene_alignments]                 │
│ --pan_fa         TEXT  Path to Panaroo pan_genome_reference.fa [default: pan_genome_reference.fa]                               │
│ --extension      TEXT  File extension of core-genome gene alignments [default: fas]                                             │
│ --outfile        TEXT  Path to output XMFA file [default: corecomb.xmfa]                                                        │
│ --help                 Show this message and exit.                                                                              │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Why

In theory, one could simply run a sed command to concatenate the individual core-gene multiple sequence alignments from Panaroo's core_gene_alignments directory into an XMFA file:

sed -e '$s/$/\n=/' -s ../tests/data/aligned_gene_sequences_raw/*.fas > core_gene_alignment.xmfa

However, this approach suffers from three issues:

  • Sequence names need to be cleaned
  • Ambiguous IUPAC characters other than N need to be handled (CFML only accepts A, T, G, C, N and -)
  • Genomes with missing genes will cause CFML to crash (this happens when the core genome is defined at less than 100%)
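The three issues above can be illustrated with a minimal Python sketch. This is not Corecomb's actual implementation: the function names are hypothetical, the headers are assumed to follow a `genome;gene_id` pattern, and padding absent genomes with gaps is only one possible way to handle missing genes.

```python
ALLOWED = set("ACGTN-")  # characters CFML accepts, per the list above


def clean_name(header: str) -> str:
    """Reduce a FASTA header to the genome name.

    Assumption: headers look like '>genome;gene_id'.
    """
    return header.lstrip(">").split(";")[0].strip()


def mask_ambiguous(seq: str) -> str:
    """Upper-case the sequence and replace any other character with N."""
    return "".join(c if c in ALLOWED else "N" for c in seq.upper())


def normalise_block(records: dict, genomes: list) -> dict:
    """Return one gene alignment with an entry for every genome.

    `records` maps raw FASTA headers to aligned sequences for one gene.
    Genomes lacking the gene get an all-gap sequence (an assumption).
    """
    cleaned = {clean_name(h): mask_ambiguous(s) for h, s in records.items()}
    length = len(next(iter(cleaned.values())))
    return {g: cleaned.get(g, "-" * length) for g in genomes}


# Hypothetical input: genome g3 lacks this gene, g1 carries an ambiguous R.
block = normalise_block(
    {">g1;g1_0001": "ATGRC-", ">g2;g2_0040": "atgcc-"},
    ["g1", "g2", "g3"],
)
```

After this step every block lists the same genomes in the same order, which is what lets the blocks be joined with `=` separators.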

CoRecomb addresses all three of these issues. Additionally, CoRecomb re-orders the genes in the XMFA file to follow the gene order defined in pan_genome_reference.fa. CFML keeps this order in its filtered output (for example core_gene_test_cfml.filtered.fasta).
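The re-ordering step could look roughly like the sketch below. It is an illustration only, under two assumptions that the text above does not confirm: that the first word of each header in pan_genome_reference.fa is the gene name, and that each alignment file name starts with that gene name followed by a dot. Both function names are hypothetical.

```python
from pathlib import Path


def reference_gene_order(pan_fa_lines):
    """Gene names in the order their headers appear in the reference FASTA."""
    return [
        line[1:].split()[0]
        for line in pan_fa_lines
        if line.startswith(">") and line[1:].strip()
    ]


def order_alignment_files(paths, gene_order):
    """Sort alignment files by reference order, skipping unknown genes."""
    rank = {gene: i for i, gene in enumerate(gene_order)}

    def gene_of(path):
        # Assumption: 'rpoB.aln.fas' -> 'rpoB'
        return Path(path).name.split(".")[0]

    known = [p for p in paths if gene_of(p) in rank]
    return sorted(known, key=lambda p: rank[gene_of(p)])


# Hypothetical reference content and file names.
order = reference_gene_order([">rpoB\n", "ATG\n", ">group_12 extra\n", "CCC\n", ">gyrA\n"])
files = order_alignment_files(
    ["aln/gyrA.aln.fas", "aln/rpoB.aln.fas", "aln/accessory.fas"], order
)
```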

Test it for yourself

poetry run pytest -vv

Test data can be found in tests/data:

corecomb \
    --gene_al_dir tests/data/aligned_gene_sequences_raw \
    --pan_fa tests/data/pan_genome_reference.fa \
    --extension fas \
    --outfile corecomb.xmfa

Use the XMFA with ClonalFrameML

ClonalFrameML \
    input_tree.nwk \
    corecomb.xmfa \
    cfml_output_basename \
    -xmfa_file true \
    -show_progress true \
    -output_filtered true
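Before starting a long CFML run, it can help to check that the XMFA file is consistent. The following is a small sketch, not part of Corecomb; `parse_xmfa` and `check_blocks` are hypothetical helpers that assume blocks are terminated by a line starting with `=`, as in the sed example above.

```python
def parse_xmfa(text):
    """Split XMFA text into blocks, each a dict of name -> sequence."""
    blocks, current, name = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("="):        # end of one alignment block
            if current:
                blocks.append(current)
            current, name = {}, None
        elif line.startswith(">"):      # new sequence in the block
            name = line[1:]
            current[name] = ""
        elif name is not None:          # sequence line, possibly wrapped
            current[name] += line
    if current:                         # tolerate a missing final '='
        blocks.append(current)
    return blocks


def check_blocks(blocks):
    """Return a list of human-readable problems (empty if none found)."""
    if not blocks:
        return ["no alignment blocks found"]
    problems = []
    expected = set(blocks[0])
    for i, block in enumerate(blocks, start=1):
        if set(block) != expected:
            problems.append(f"block {i}: sequence names differ from block 1")
        if len({len(seq) for seq in block.values()}) > 1:
            problems.append(f"block {i}: sequences have unequal lengths")
        if set("".join(block.values()).upper()) - set("ACGTN-"):
            problems.append(f"block {i}: characters other than A,C,G,T,N,-")
    return problems


good = ">g1\nATG\n>g2\nA-G\n=\n>g1\nCC\n>g2\nCN\n=\n"
bad = ">g1\nATG\n>g2\nA-G\n=\n>g1\nCR\n=\n"
```

For a real file, the text could be read with `open("corecomb.xmfa").read()` and passed to the same two functions.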
