Annotate orthogenes and create Roary-like plots

OrthoFinder Tools

Idea

  • Calculate the most common gene name of each orthogroup by majority vote: annotate_orthogroups
  • Create plots analogous to roary_plots: orthofinder_plots
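The majority vote from the first point can be sketched with the standard library. This is a minimal illustration, not the package's actual implementation: the helper name `majority_name` is hypothetical, and ties are resolved in favour of the name seen first, which is simply how `Counter.most_common` behaves.

```python
from collections import Counter


def majority_name(gene_names):
    """Return the most frequent name and the full tally (hypothetical helper)."""
    counts = Counter(gene_names)
    # most_common(1) returns a list holding one (name, count) pair
    best_name, _ = counts.most_common(1)[0]
    return best_name, counts


# Illustrative input: the names of all genes in one orthogroup
names = ['IS30 family transposase'] * 3 + ['hypothetical protein']
best, counts = majority_name(names)
print(best)
```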

Setup

pip install orthofinder-tools

Usage

annotate_orthogroups

Prerequisites

Each FASTA header must contain a description after the sequence ID, e.g.:

>gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]
MIDVNKFESMQIGLASPDKIRMWSYGEVKKPETINYRTLKPEKDGLFDERIFGPTKDYECACGKYKRIRY
...

From this header, the gene name DNA-directed RNA polymerase subunit beta' will be extracted.
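One way such an extraction could work is sketched below. The parsing rules are assumptions inferred from the example header (ID is the first whitespace-delimited token, the organism sits in trailing square brackets), and `description_from_header` is a hypothetical helper, not part of the package.

```python
import re


def description_from_header(header):
    """Extract the free-text description from a FASTA header (assumed layout)."""
    # Drop the leading '>' and the first whitespace-delimited token (the ID)
    parts = header.lstrip('>').split(maxsplit=1)
    description = parts[1] if len(parts) > 1 else ''
    # Remove a trailing "[organism]" annotation, if present
    return re.sub(r'\s*\[[^\]]*\]\s*$', '', description)


header = ">gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]"
print(description_from_header(header))
```

A header that carries only an ID yields an empty string under this sketch, which is why a description is a prerequisite.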

Command line usage

annotate_orthogroups --help

annotate_orthogroups \
    --orthogroups_tsv /path/to/N0_or_Orthogroups.tsv \
    --hog True \
    --fasta_dir /path/to/fastas \
    --file_endings faa \
    --out outfile.tsv \
    --simple True \
    --header True

If --simple=False, the resulting TSV looks like this:

HOG            Best Gene Name                 Gene Name Occurrences
N0.HOG0000000  amino acid ABC transporter     {JSON}
N0.HOG0000001  IS30 family transposase        {JSON}
N0.HOG0000002  IS5/IS1182 family transposase  {JSON}

The JSON is a dictionary that maps each gene name to its number of occurrences, for example:

{
  'Integrase core domain protein': 47,
  'hypothetical protein': 15,
  'IS30 family transposase': 126
}

If --simple=True, the resulting TSV looks like this (no header):

N0.HOG0000000  amino acid ABC transporter
N0.HOG0000001  IS30 family transposase
N0.HOG0000002  IS5/IS1182 family transposase
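A file in this simple layout can be read back into a dictionary with the standard library. The sketch below assumes two tab-separated columns and no header row; `read_simple_tsv` is a hypothetical helper, and the inline sample stands in for outfile.tsv.

```python
import csv
import io


def read_simple_tsv(handle):
    """Map orthogroup ID -> best gene name from a two-column, tab-separated file."""
    # QUOTE_NONE keeps quote characters inside gene names untouched
    reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Inline sample standing in for outfile.tsv
sample = io.StringIO(
    'N0.HOG0000000\tamino acid ABC transporter\n'
    'N0.HOG0000001\tIS30 family transposase\n'
)
names = read_simple_tsv(sample)
print(names['N0.HOG0000001'])
```

For a file on disk, pass `open('outfile.tsv', newline='')` instead of the `StringIO` object.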

Usage as python class

# load class
from orthofinder_tools import OrthogroupToGeneName

PATH_TO_ORTHOFINDER_FASTAS = '/path/to/OrthoFinder/fastas'
CURRENT_FOLDER = 'Results_Mon00'

otg = OrthogroupToGeneName(
    fasta_dir=PATH_TO_ORTHOFINDER_FASTAS,
    file_endings='faa',
)
otg.load_hog(
    hog_tsv=f'{PATH_TO_ORTHOFINDER_FASTAS}/OrthoFinder/{CURRENT_FOLDER}/Phylogenetic_Hierarchical_Orthogroups/N0.tsv'
)

otg.majority_dict is then a Python dict that maps each orthogroup to its best name, for example:

{
  'N0.HOG0000000': 'amino acid ABC transporter',
  'N0.HOG0000001': 'IS30 family transposase',
  'N0.HOG0000002': 'IS5/IS1182 family transposase',
}
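Because this is a plain dict, it can be post-processed with ordinary Python. As a small sketch, the mapping can be inverted to find orthogroups that share a best name; the dict below is a stand-in for otg.majority_dict, and its fourth entry is a made-up addition for illustration only.

```python
from collections import defaultdict

# Stand-in for otg.majority_dict; the last entry is hypothetical
majority_dict = {
    'N0.HOG0000000': 'amino acid ABC transporter',
    'N0.HOG0000001': 'IS30 family transposase',
    'N0.HOG0000002': 'IS5/IS1182 family transposase',
    'N0.HOG0000003': 'IS30 family transposase',
}

# Invert the mapping: best name -> list of orthogroups carrying that name
name_to_orthogroups = defaultdict(list)
for orthogroup, best_name in majority_dict.items():
    name_to_orthogroups[best_name].append(orthogroup)

# Keep only names that were assigned to more than one orthogroup
shared = {name: ogs for name, ogs in name_to_orthogroups.items() if len(ogs) > 1}
print(shared)
```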

otg.save_majority_df(outfile='path/to/outfile.tsv') writes the following file:

HOG Best Gene Name  Gene Name Occurrences
N0.HOG0000000   amino acid ABC transporter Counter({'amino acid ABC transporter': 43})
...

otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv') writes the following file (no header):

N0.HOG0000000   gene_1  gene_2
N0.HOG0000001   gene_3  gene_4  gene_5
...
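A file in this layout, with a variable number of gene IDs per row, can be parsed line by line. The sketch assumes tab-separated fields with the orthogroup ID first; `read_orthogroup_gene_ids` is a hypothetical helper and the inline sample mirrors the rows shown above.

```python
import io


def read_orthogroup_gene_ids(handle):
    """Map orthogroup ID -> list of gene IDs from a headerless, tab-separated file."""
    groups = {}
    for line in handle:
        fields = line.rstrip('\n').split('\t')
        if fields[0]:  # skip blank lines
            # Drop empty fields that trailing tabs would produce
            groups[fields[0]] = [gene for gene in fields[1:] if gene]
    return groups


sample = io.StringIO(
    'N0.HOG0000000\tgene_1\tgene_2\n'
    'N0.HOG0000001\tgene_3\tgene_4\tgene_5\n'
)
groups = read_orthogroup_gene_ids(sample)
print(len(groups['N0.HOG0000001']))
```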

otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv') writes the following file (no header):

N0.HOG0000000	amino acid ABC transporter ATP-binding protein
N0.HOG0000001	ATP-binding cassette domain-containing protein
...

orthofinder_plots

Disclaimer: This script is a port of roary_plots by Marco Galardini (marco@ebi.ac.uk).

# Command line usage:
orthofinder_plots --help
orthofinder_plots --tree data/SpeciesTree_rooted.txt --orthogroups_tsv data/Orthogroups.tsv --out output

Three plot files will be created in the output folder.

Usage as Python function

# import function
from orthofinder_tools import create_plots

create_plots(
    tree='/path/to/SpeciesTree_rooted.txt',
    orthogroups_tsv='/path/to/Orthogroups.tsv',
    format='svg',
    no_labels=False,
    out='/path/to/output/folder'
)
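Roary-style plots summarise in how many genomes each gene family is present. The sketch below shows how such counts could be derived from an Orthogroups.tsv file; it is not what create_plots does internally. It assumes the usual OrthoFinder layout (an 'Orthogroup' column followed by one column per genome, cells holding comma-separated gene IDs), and both `genomes_per_orthogroup` and the sample rows are hypothetical.

```python
import csv
import io


def genomes_per_orthogroup(handle):
    """Count, per orthogroup, how many genome columns contain at least one gene."""
    reader = csv.reader(handle, delimiter='\t')
    next(reader)  # skip header: 'Orthogroup' followed by one column per genome
    return {
        row[0]: sum(1 for cell in row[1:] if cell.strip())
        for row in reader
        if row
    }


# Made-up sample in the assumed layout; the second orthogroup is absent from genome_B
sample = io.StringIO(
    'Orthogroup\tgenome_A\tgenome_B\tgenome_C\n'
    'OG0000000\tA_1, A_2\tB_1\tC_1\n'
    'OG0000001\tA_3\t\tC_2\n'
)
counts = genomes_per_orthogroup(sample)
print(counts)
```

Orthogroups whose count equals the number of genome columns would correspond to the core genome in Roary's terminology.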
