Annotate orthogenes and create Roary-like plots
OrthoFinder Tools
Idea
- Calculate the most common gene name of each orthogroup by majority vote: annotate_orthogroups
- Create plots analogous to roary_plots: orthofinder_plots
Setup
pip install orthofinder-tools
Usage
annotate_orthogroups
Prerequisites
Each FASTA record must have a description after the sequence identifier, e.g.:
>gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]
MIDVNKFESMQIGLASPDKIRMWSYGEVKKPETINYRTLKPEKDGLFDERIFGPTKDYECACGKYKRIRY
...
From this record, the gene name DNA-directed RNA polymerase subunit beta' will be extracted.
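The extraction step can be illustrated with a minimal sketch. It assumes the header layout shown above (identifier, then description, then an optional organism tag in square brackets); the helper name extract_gene_name is hypothetical and is not part of orthofinder-tools, whose actual parsing may differ.

```python
import re


def extract_gene_name(header: str) -> str:
    """Return the description part of a FASTA header line.

    Assumed layout: '>identifier description [organism]'.
    Hypothetical helper for illustration only.
    """
    # Drop the leading '>' and split off the identifier at the first space.
    _, _, description = header.lstrip('>').strip().partition(' ')
    # Remove a trailing '[organism]' tag if one is present.
    description = re.sub(r'\s*\[[^\]]*\]\s*$', '', description)
    return description.strip()


header = ">gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]"
print(extract_gene_name(header))
```

A header without any description yields an empty string under this sketch, which is why records need a description for the annotation to be meaningful.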
Command line usage
annotate_orthogroups --help
annotate_orthogroups \
--orthogroups_tsv /path/to/N0_or_Orthogroups.tsv \
--hog True \
--fasta_dir /path/to/fastas \
--file_endings faa \
--out outfile.tsv \
--simple True \
--header True
If --simple=False, the resulting TSV looks like this:
| HOG | Best Gene Name | Gene Name Occurrences |
|---|---|---|
| N0.HOG0000000 | amino acid ABC transporter | {JSON} |
| N0.HOG0000001 | IS30 family transposase | {JSON} |
| N0.HOG0000002 | IS5/IS1182 family transposase | {JSON} |
The JSON is a dictionary that maps each gene name to its number of occurrences, for example:
{
'Integrase core domain protein': 47,
'hypothetical protein': 15,
'IS30 family transposase': 126
}
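The majority vote over such an occurrence dictionary can be sketched with collections.Counter, the same type that appears in the saved output further down. The function name best_gene_name is hypothetical, and how the package itself breaks ties is not stated in this document; in this sketch, ties go to the name encountered first.

```python
from collections import Counter


def best_gene_name(gene_names):
    """Return (most common name, Counter of all names) for one orthogroup.

    Hypothetical helper for illustration; expects a non-empty list of names.
    """
    counts = Counter(gene_names)
    # most_common(1) returns a list with one (name, count) pair.
    best_name, _ = counts.most_common(1)[0]
    return best_name, counts


# Rebuild the example above as a flat list of gene names.
names = (
    ['Integrase core domain protein'] * 47
    + ['hypothetical protein'] * 15
    + ['IS30 family transposase'] * 126
)
best, counts = best_gene_name(names)
print(best)
```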
If --simple=True, the resulting TSV looks like this (no header):
| N0.HOG0000000 | amino acid ABC transporter |
| N0.HOG0000001 | IS30 family transposase |
| N0.HOG0000002 | IS5/IS1182 family transposase |
Usage as python class
# load class
from orthofinder_tools import OrthogroupToGeneName
PATH_TO_ORTHOFINDER_FASTAS = '/path/to/OrthoFinder/fastas'
CURRENT_FOLDER = 'Results_Mon00'
otg = OrthogroupToGeneName(
    fasta_dir=PATH_TO_ORTHOFINDER_FASTAS,
    file_endings='faa',
)
otg.load_hog(
    hog_tsv=f'{PATH_TO_ORTHOFINDER_FASTAS}/OrthoFinder/{CURRENT_FOLDER}/Phylogenetic_Hierarchical_Orthogroups/N0.tsv'
)
otg.majority_dict is a Python dict that maps each orthogroup to its best gene name, for example:
{
'N0.HOG0000000': 'amino acid ABC transporter',
'N0.HOG0000001': 'IS30 family transposase',
'N0.HOG0000002': 'IS5/IS1182 family transposase',
}
otg.save_majority_df(outfile='path/to/outfile.tsv') writes the following file:
HOG Best Gene Name Gene Name Occurrences
N0.HOG0000000 amino acid ABC transporter Counter({'amino acid ABC transporter': 43})
...
otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv') writes the following file (no header):
N0.HOG0000000 gene_1 gene_2
N0.HOG0000001 gene_3 gene_4 gene_5
...
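A file of this shape can be read back into a dictionary with the standard library. This is a minimal sketch that assumes the columns are tab-separated (the listing above does not show the delimiter); read_orthogroup_to_gene_ids is a hypothetical helper, not a function of the package.

```python
import csv
import io


def read_orthogroup_to_gene_ids(handle):
    """Map each orthogroup ID to the list of gene IDs on its row.

    Assumes a header-less, tab-separated file with a variable number of columns.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter='\t'):
        if not row:
            continue  # skip blank lines
        orthogroup, *gene_ids = row
        mapping[orthogroup] = gene_ids
    return mapping


# In-memory stand-in for the file written above.
example = (
    "N0.HOG0000000\tgene_1\tgene_2\n"
    "N0.HOG0000001\tgene_3\tgene_4\tgene_5\n"
)
mapping = read_orthogroup_to_gene_ids(io.StringIO(example))
print(mapping['N0.HOG0000001'])
```

For a real file, pass an open file handle instead of the io.StringIO object.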
otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv') writes the following file (no header):
N0.HOG0000000 amino acid ABC transporter ATP-binding protein
N0.HOG0000001 ATP-binding cassette domain-containing protein
...
orthofinder_plots
Disclaimer: This script is a port of roary_plots by Marco Galardini (marco@ebi.ac.uk).
# Command line usage:
orthofinder_plots --help
orthofinder_plots --tree data/SpeciesTree_rooted.txt --orthogroups_tsv data/Orthogroups.tsv --out output
Three files will be created in the output folder.
Usage as python class
# load class
from orthofinder_tools import create_plots
create_plots(
    tree='/path/to/SpeciesTree_rooted.txt',
    orthogroups_tsv='/path/to/Orthogroups.tsv',
    format='svg',
    no_labels=False,
    out='/path/to/output/folder'
)
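Roary-style plots are typically derived from a presence/absence view of the orthogroup table. The following sketch shows one way to compute that view with the standard library. It assumes the usual Orthogroups.tsv layout (a header row, one column per genome, cells holding comma-separated gene IDs or nothing); genomes_per_orthogroup is a hypothetical helper and does not describe how create_plots works internally.

```python
import csv
import io
from collections import Counter


def genomes_per_orthogroup(handle):
    """Count, for each orthogroup, how many genomes have at least one gene.

    Returns (presence dict, total number of genomes).
    """
    reader = csv.reader(handle, delimiter='\t')
    header = next(reader)  # first column: orthogroup ID, rest: genomes
    n_genomes = len(header) - 1
    presence = {}
    for row in reader:
        orthogroup, cells = row[0], row[1:]
        # A genome counts as present if its cell is not empty.
        presence[orthogroup] = sum(1 for cell in cells if cell.strip())
    return presence, n_genomes


# Tiny made-up table with three genomes.
example = (
    "Orthogroup\tgenomeA\tgenomeB\tgenomeC\n"
    "OG0000000\ta1, a2\tb1\tc1\n"
    "OG0000001\ta3\t\tc2\n"
    "OG0000002\t\tb2\t\n"
)
presence, n_genomes = genomes_per_orthogroup(io.StringIO(example))
# Number of orthogroups found in exactly k genomes, for each k.
frequency = Counter(presence.values())
print(frequency)
```

Orthogroups present in all genomes correspond to the core genome; the frequency counts are the kind of data a gene-frequency histogram would show.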