Skip to main content

Annotate orthogenes and create Roary-like plots

Project description

OrthoFinder Tools

Idea

  • Calculate the most common gene name of each orthogroup by majority vote: annotate_orthogroups
  • Create plots analogous to roary_plots: orthofinder_plots

Setup

pip install orthofinder-tools

Usage

annotate_orthogroups

Prerequisites

Your FASTA sequences must have some description, e.g.:

>gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]
MIDVNKFESMQIGLASPDKIRMWSYGEVKKPETINYRTLKPEKDGLFDERIFGPTKDYECACGKYKRIRY
...

From this protein, DNA-directed RNA polymerase subunit beta' will be extracted.

Command line usage

annotate_orthogroups --help

annotate_orthogroups \
    --orthogroups_tsv /path/to/N0_or_Orthogroups.tsv \
    --hog True \
    --fasta_dir /path/to/fastas \
    --file_endings faa \
    --out outfile.tsv \
    --simple True

If --simple=False resulting tsv looks like this:

HOG Best Gene Name Gene Name Occurrences
N0.HOG0000000 amino acid ABC transporter {JSON}
N0.HOG0000001 IS30 family transposase {JSON}
N0.HOG0000002 IS5/IS1182 family transposase {JSON}

The JSON is a dictionary with key='gene name' -> value=occurrence, for example:

{
  'Integrase core domain protein': 47,
  'hypothetical protein': 15,
  'IS30 family transposase': 126
}

If --simple=True resulting tsv looks like this (no header):

N0.HOG0000000 amino acid ABC transporter
N0.HOG0000001 IS30 family transposase
N0.HOG0000002 IS5/IS1182 family transposase

Usage as python class

# load class
from orthofinder_tools import OrthogroupToGeneName

PATH_TO_ORTHOFINDER_FASTAS = '/path/to/OrthoFinder/fastas'
CURRENT_FOLDER = 'Results_Mon00'

otg = OrthogroupToGeneName(
    fasta_dir=PATH_TO_ORTHOFINDER_FASTAS,
    file_endings='faa',
)
otg.load_hog(
    hog_tsv=F'{PATH_TO_ORTHOFINDER_FASTAS}/OrthoFinder/{CURRENT_FOLDER}/Phylogenetic_Hierarchical_Orthogroups/N0.tsv'
)

otg.majority_dict will be a python dict with key='orthogroup' -> value='best name', for example:

{
  'N0.HOG0000000': 'amino acid ABC transporter',
  'N0.HOG0000001': 'IS30 family transposase',
  'N0.HOG0000002': 'IS5/IS1182 family transposase',
}

otg.save_majority_df(outfile='path/to/outfile.tsv) writes the following file:

HOG Best Gene Name  Gene Name Occurrences
N0.HOG0000000   amino acid ABC transporter Counter({'amino acid ABC transporter': 43})
...

otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv) writes the following file (no header):

N0.HOG0000000   gene_1  gene_2
N0.HOG0000001   gene_3  gene_4  gene_5
...

otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv) writes the following file (no header):

N0.HOG0000000	amino acid ABC transporter ATP-binding protein
N0.HOG0000001	ATP-binding cassette domain-containing protein
...

orthofinder_plots

Disclaimer: This script is a port of roary_plots by Marco Galardini (marco@ebi.ac.uk).

# Command line usage:
orthofinder_plots --help
orthofinder_plots --tree data/SpeciesTree_rooted.txt --orthogroups_tsv data/Orthogroups.tsv --out output

Three files will be created:




Usage as python class

# load class
from orthofinder_tools import create_plots

create_plots(
    tree='/path/to/SpeciesTree_rooted.txt',
    orthogroups_tsv='/path/to/Orthogroups.tsv',
    format='svg',
    no_labels=False,
    out='/path/to/output/folder'
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orthofinder-tools-0.0.2.tar.gz (9.4 kB view details)

Uploaded Source

File details

Details for the file orthofinder-tools-0.0.2.tar.gz.

File metadata

  • Download URL: orthofinder-tools-0.0.2.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for orthofinder-tools-0.0.2.tar.gz
Algorithm Hash digest
SHA256 f413dfad4792c47aee580f1c3d10129204c46ee1ce4a4b4d75de61665f354e06
MD5 73fa0af1b58a7be118a9246b14b8697d
BLAKE2b-256 9184a5771873da9dcddf39f1a6219234700107d0059380cd3f383578f0830581

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page