extract orthomap from OrthoFinder output for query species
orthomap
orthologous maps - evolutionary age index
orthomap is a Python package to extract orthologous maps (in other words, the evolutionary age of a given orthologous group) from OrthoFinder or eggNOG results.
Orthomap results (gene ages per orthogroup) can be further used to calculate and visualize weighted expression data
(transcriptome evolutionary index) from scRNA sequencing objects.
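For orientation, the transcriptome evolutionary index is usually written as an expression-weighted mean of gene ages. The formula below is a sketch of that common definition; the symbols are chosen here for illustration and are not taken from the orthomap documentation.

```latex
\mathrm{TEI}_c = \frac{\sum_{i=1}^{n} ps_i \, e_{ic}}{\sum_{i=1}^{n} e_{ic}}
```

Here $ps_i$ is the gene age (phylostratum number) assigned to gene $i$, $e_{ic}$ is the expression of gene $i$ in cell $c$, and $n$ is the number of genes that have both an age and an expression value. With small numbers as gene ages for old genes, a lower value means the cell's transcriptome is dominated by evolutionarily older genes.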
Documentation
Online documentation can be found here.
Installing orthomap
More installation options can be found here.
orthomap installation using conda and pip
We recommend installing orthomap in an independent conda environment to avoid dependency conflicts.
Please create a new Python environment for orthomap and install its dependencies in it.
If you do not have a working installation of Python 3.8 (or later), consider installing Anaconda or Miniconda.
To create and activate the environment run:
$ git clone https://github.com/kullrich/orthomap.git
$ cd orthomap
$ conda env create --file environment.yml
$ conda activate orthomap_env
Then install orthomap via PyPI:
$ pip install orthomap
Quick usage
Detailed tutorials on how to use orthomap can be found here.
Update/download local ncbi taxonomic database:
The following command downloads or updates your local copy of the
NCBI taxonomy database (~300MB). The database is saved at ~/.etetoolkit/taxa.sqlite.
>>> from orthomap import ncbitax
>>> ncbitax.update_ncbi()
Step 1 - Get query species taxonomic lineage information:
You can query a species' lineage information based on its name or its
taxID. For example, Danio rerio with taxID 7955:
>>> from orthomap import qlin
>>> qlin.get_qlin(q = 'Danio rerio')
>>> qlin.get_qlin(qt = '7955')
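To make the idea of a lineage concrete, here is a minimal standard-library sketch that walks a child-to-parent map from a species up to the root. The map `TOY_PARENT` and the function `lineage` are hypothetical and only for illustration; the map is a heavily truncated stand-in for the NCBI taxonomy (most intermediate nodes are omitted), and this is not how `qlin.get_qlin` is implemented.

```python
# Hypothetical, heavily truncated child -> parent map of NCBI taxIDs.
# Most intermediate nodes of the real taxonomy are deliberately left out.
TOY_PARENT = {
    7955: 7742,     # Danio rerio -> Vertebrata (intermediate nodes omitted)
    7742: 7711,     # Vertebrata -> Chordata (intermediate nodes omitted)
    7711: 33208,    # Chordata -> Metazoa (intermediate nodes omitted)
    33208: 2759,    # Metazoa -> Eukaryota (intermediate nodes omitted)
    2759: 131567,   # Eukaryota -> cellular organisms
    131567: 1,      # cellular organisms -> root
}


def lineage(taxid, parent=TOY_PARENT):
    """Return the taxIDs from the root down to `taxid`."""
    path = [taxid]
    # Follow parent links until a node without a parent (the root) is reached.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


danio_lineage = lineage(7955)
# Position in the lineage: 0 is the root, larger numbers are younger nodes.
rank_of = {taxid: i for i, taxid in enumerate(danio_lineage)}
print(danio_lineage)
```

The position of a node in this list is what later serves as a numeric "age": nodes closer to the root get smaller numbers.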
You can get the query species topology as a tree.
For example, for Danio rerio with taxID 7955:
>>> from orthomap import qlin
>>> query_topology = qlin.get_lineage_topo(qt = '7955')
>>> query_topology.write()
Step 2 - Get query species orthomap from OrthoFinder results:
The following code extracts the orthomap for Danio rerio based on pre-calculated OrthoFinder results and ensembl release-105:
OrthoFinder results (-S diamond_ultra_sens) using translated, longest-isoform coding sequences from ensembl release-105 have been archived and can be found here.
>>> from orthomap import datasets, of2orthomap
>>> datasets.ensembl105(datapath='.')
>>> query_orthomap = of2orthomap.get_orthomap(
... seqname='Danio_rerio.GRCz11.cds.longest',
... qt='7955',
... sl='ensembl_105_orthofinder_species_list.tsv',
... oc='ensembl_105_orthofinder_Orthogroups.GeneCount.tsv',
... og='ensembl_105_orthofinder_Orthogroups.tsv',
... out=None, quiet=False, continuity=True, overwrite=True)
>>> query_orthomap
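Conceptually, an orthogroup is dated by the oldest node of the query lineage that it shares with any species present in the group. The sketch below illustrates that idea on toy lineages; the function names `shared_depth` and `orthogroup_age` and the simplified lineages are assumptions made for this example, and the code does not reproduce orthomap's own implementation (for instance, it ignores the `continuity` option).

```python
def shared_depth(query_lineage, other_lineage):
    """Index in query_lineage of the youngest node shared with other_lineage."""
    depth = -1
    for i, (a, b) in enumerate(zip(query_lineage, other_lineage)):
        if a != b:
            break
        depth = i
    return depth


def orthogroup_age(query_lineage, lineages, gene_counts):
    """Oldest shared node (smallest index) over all species present in the group."""
    present = [species for species, n in gene_counts.items() if n > 0]
    return min(shared_depth(query_lineage, lineages[s]) for s in present)


# Simplified, hypothetical lineages (root first); real lineages are much longer.
lineages = {
    "danio": ["root", "Eukaryota", "Metazoa", "Vertebrata", "Danio rerio"],
    "mouse": ["root", "Eukaryota", "Metazoa", "Vertebrata", "Mus musculus"],
    "fly": ["root", "Eukaryota", "Metazoa", "Drosophila melanogaster"],
    "yeast": ["root", "Eukaryota", "Saccharomyces cerevisiae"],
}
query = lineages["danio"]

# Gene counts per species for three toy orthogroups.
og_vertebrate = {"danio": 2, "mouse": 1, "fly": 0, "yeast": 0}
og_eukaryote = {"danio": 1, "mouse": 0, "fly": 0, "yeast": 1}
og_specific = {"danio": 3, "mouse": 0, "fly": 0, "yeast": 0}

for name, og in [("vertebrate", og_vertebrate), ("eukaryote", og_eukaryote), ("specific", og_specific)]:
    idx = orthogroup_age(query, lineages, og)
    print(name, idx, query[idx])
```

An orthogroup found only in the query species gets the youngest index, while one shared with a distantly related species is pushed back to their common ancestor.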
Step 3 - Map OrthoFinder gene names and scRNA gene/transcript names:
The following code extracts the gene to transcript table for Danio rerio:
GTF file obtained from here.
>>> from orthomap import datasets, gtf2t2g
>>> gtf_file = datasets.zebrafish_gtf(datapath='.')
>>> query_species_t2g = gtf2t2g.parse_gtf(
... gtf=gtf_file,
... g=True, b=True, p=True, v=True, s=True, q=True)
>>> query_species_t2g
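For readers unfamiliar with GTF files, the following sketch shows how a transcript-to-gene table can be derived from the attribute column of `transcript` features using only the standard library. The function `transcripts_to_genes`, the toy records and the IDs `GENE1` and `TX1` are hypothetical; `gtf2t2g.parse_gtf` offers more options and may behave differently.

```python
import re

# Matches attribute pairs such as: gene_id "GENE1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_to_genes(lines):
    """Collect one row per 'transcript' feature found in GTF-formatted lines."""
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue  # only transcript features carry the pairing we need
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs["transcript_id"]
        version = attrs.get("transcript_version")
        rows.append({
            "gene_id": attrs["gene_id"],
            "transcript_id": tid,
            # Append the version suffix only when the GTF provides one.
            "transcript_id_version": f"{tid}.{version}" if version else tid,
        })
    return rows


# Toy GTF records with made-up identifiers.
sample = [
    "#!genome-build toy\n",
    'chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "GENE1"; gene_version "3";\n',
    'chr1\ttoy\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE1"; gene_version "3"; '
    'transcript_id "TX1"; transcript_version "2";\n',
    'chr1\ttoy\texon\t1\t50\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1"; '
    'transcript_version "2";\n',
]
t2g = transcripts_to_genes(sample)
print(t2g)
```

The versioned transcript ID is kept as its own column because sequence names in OrthoFinder input often carry the version suffix, while expression matrices are typically indexed by gene ID.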
Now import the scRNA dataset of the query species.
Example: Danio rerio - http://tome.gs.washington.edu (Qiu et al. 2022)
The AnnData file can be found here.
>>> import scanpy as sc
>>> from orthomap import datasets, orthomap2tei
>>> # download zebrafish scRNA data here: https://doi.org/10.5281/zenodo.7243602
>>> # or download with datasets.qiu22_zebrafish(datapath='.')
>>> zebrafish_data = datasets.qiu22_zebrafish(datapath='.')
>>> zebrafish_data
>>> # check overlap of transcript table <gene_id> and scRNA data <var_names>
>>> orthomap2tei.geneset_overlap(zebrafish_data.var_names, query_species_t2g['gene_id'])
The replace_by helper function can be used to add a new column to the orthomap dataframe by matching e.g. gene isoform names and their corresponding gene names.
>>> # convert orthomap transcript IDs into GeneIDs and add them to orthomap
>>> query_orthomap['geneID'] = orthomap2tei.replace_by(
... x_orig = query_orthomap['seqID'],
... xmatch = query_species_t2g['transcript_id_version'],
... xreplace = query_species_t2g['gene_id'])
>>> # check overlap of orthomap <geneID> and scRNA data
>>> orthomap2tei.geneset_overlap(zebrafish_data.var_names, query_orthomap['geneID'])
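The ID conversion and the overlap check above boil down to a dictionary lookup and a set intersection. The following sketch illustrates both on toy data; `map_ids` and `overlap_counts` are hypothetical stand-ins rather than the orthomap functions, and keeping unmatched IDs unchanged is an assumption of this example.

```python
def map_ids(x_orig, xmatch, xreplace):
    """Replace each ID in x_orig by the xreplace value paired with it in xmatch.

    IDs without a match are returned unchanged (an assumption of this sketch).
    """
    lookup = dict(zip(xmatch, xreplace))
    return [lookup.get(x, x) for x in x_orig]


def overlap_counts(a, b):
    """Count unique IDs in each collection and in their intersection."""
    set_a, set_b = set(a), set(b)
    return {"a": len(set_a), "b": len(set_b), "shared": len(set_a & set_b)}


# Toy data with made-up identifiers.
seq_ids = ["TX1.2", "TX2.1", "TX9.1"]          # IDs as used in the orthomap
transcript_versions = ["TX1.2", "TX2.1", "TX3.4"]
gene_ids = ["GENE1", "GENE2", "GENE3"]
var_names = ["GENE1", "GENE2", "GENE5"]         # IDs as used in the scRNA data

converted = map_ids(seq_ids, transcript_versions, gene_ids)
print(converted)
print(overlap_counts(var_names, converted))
```

A low shared count after conversion usually points to a mismatch in ID type or version suffix between the two data sources.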
Step 4 - Get transcriptome evolutionary index (TEI) values and add them to scRNA dataset:
Now that the gene names in the orthomap and the scRNA adata object correspond to each other,
one can calculate the transcriptome evolutionary index (TEI) values and add them to the scRNA dataset (adata object).
>>> # add TEI values to existing adata object
>>> orthomap2tei.get_tei(adata=zebrafish_data,
... gene_id=query_orthomap['geneID'],
... gene_age=query_orthomap['PSnum'],
... keep='min',
... layer=None,
... add=True,
... obs_name='tei',
... boot=False,
... bt=10,
... normalize_total=False,
... log1p=False,
... target_sum=1e6)
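As a rough illustration of what such a calculation involves, the sketch below computes an expression-weighted mean gene age per cell on a tiny count matrix. It is a simplified stand-in under stated assumptions, not the `get_tei` implementation: the helper names are hypothetical, duplicated gene IDs keep their smallest age (mirroring the idea behind `keep='min'`), and no normalization, log transform or bootstrapping is applied.

```python
import math


def min_age_per_gene(gene_ids, gene_ages):
    """Keep the smallest (oldest) age when a gene ID occurs more than once."""
    ages = {}
    for gene, age in zip(gene_ids, gene_ages):
        ages[gene] = min(age, ages.get(gene, age))
    return ages


def tei_per_cell(counts, var_names, ages):
    """Expression-weighted mean gene age for each row (cell) of `counts`."""
    # Only genes that have an assigned age contribute to the index.
    keep = [j for j, gene in enumerate(var_names) if gene in ages]
    result = []
    for row in counts:
        total = sum(row[j] for j in keep)
        weighted = sum(row[j] * ages[var_names[j]] for j in keep)
        result.append(weighted / total if total else float("nan"))
    return result


# Toy data with made-up gene IDs; rows are cells, columns follow var_names.
ages = min_age_per_gene(["geneA", "geneA", "geneB", "geneC"], [1, 3, 2, 5])
var_names = ["geneA", "geneB", "geneX"]   # geneX has no age and is ignored
counts = [
    [10, 0, 7],
    [5, 5, 1],
    [0, 0, 3],
]
teis = tei_per_cell(counts, var_names, ages)
print(teis)
```

Cells with no expression in any dated gene get NaN here; how such cells should be handled in a real analysis is a separate decision.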
Step 5 - Downstream analysis
Once the gene age data has been added to the scRNA dataset, one can e.g. plot the corresponding transcriptome evolutionary index (TEI) values by any given observation pre-defined in the scRNA dataset.
Violin plot of TEI per stage:
>>> sc.pl.violin(adata=zebrafish_data,
... keys=['tei'],
... groupby='stage',
... rotation=90,
... palette='Paired',
... stripplot=False,
... inner='box')
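Alongside the plot, a numeric summary per group can help when comparing stages. The sketch below groups per-cell values by a label using only the standard library; `summarize_by_group`, the TEI values and the stage labels are made up for illustration and do not come from the zebrafish dataset.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_group(values, groups):
    """Return count, mean and median of `values` for each label in `groups`."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {
        group: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for group, vals in buckets.items()
    }


# Hypothetical per-cell TEI values and stage labels.
tei_values = [2.0, 4.0, 1.0, 3.0, 5.0]
stages = ["stage_a", "stage_a", "stage_b", "stage_b", "stage_b"]

summary = summarize_by_group(tei_values, stages)
for stage, stats in summary.items():
    print(stage, stats)
```

With an AnnData object, the same inputs would typically come from two columns of the observation table, such as the `tei` and `stage` columns used above.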
orthomap via Command Line
orthomap can also be used via the command line.
Command line documentation can be found here.
$ orthomap
usage: orthomap <sub-command>
orthomap
optional arguments:
-h, --help show this help message and exit
sub-commands:
{cds2aa,gtf2t2g,ncbitax,of2orthomap,plaza2orthomap,qlin}
sub-commands help
cds2aa translate CDS to AA and optional retain longest
isoform <cds2aa -h>
gtf2t2g extracts transcript to gene table from GTF <gtf2t2g
-h>
ncbitax update local ncbi taxonomy database <ncbitax -h>
of2orthomap extract orthomap from OrthoFinder output for query
species <orthomap -h>
plaza2orthomap extract orthomap from PLAZA gene family data for query
species <of2orthomap -h>
qlin get query lineage based on ncbi taxonomy <qlin -h>
To retrieve e.g. the lineage information for Danio rerio, run the following command:
$ orthomap qlin -q "Danio rerio"
Development Version
To work with the latest version on GitHub, clone the repository and cd into its root directory:
$ git clone https://github.com/kullrich/orthomap.git
$ cd orthomap
Install orthomap into your current python environment:
$ pip install -e .
Testing orthomap
orthomap has an extensive test suite which is run each time a new contribution
is made to the repository. To run the test suite locally, run:
$ pytest tests
Contributing Code
If you would like to contribute to orthomap, please file an issue first so that we can establish a statement of need, avoid redundant work, and track progress on your contribution.
Before you open a pull request, you should always file an issue and make sure that someone from the orthomap developer team agrees that it is a problem and is happy with your basic proposal for fixing it.
Once an issue has been filed and we have identified how best to align your contribution with package development as a whole, fork the main repo, branch off a feature branch from master, commit and push your changes to your fork, and submit a pull request for orthomap:master.
By contributing to this project, you agree to abide by the Code of Conduct terms.
Bug reports
Please post problems or questions on the GitHub repository issue tracker. Please also check the closed issues; they might already answer your question.
Inquiry for collaboration or discussion
Please send us an e-mail if you would like to discuss anything with us.
Principal code developer: Kristian Ullrich
E-mail address can be found here.
Code of Conduct - Participation guidelines
This repository adheres to the Contributor Covenant code of conduct for any interactions you have within this project. (see Code of Conduct)
See also the policy against sexualized discrimination, harassment and violence for the Max Planck Society Code-of-Conduct.
By contributing to this project, you agree to abide by its terms.
References
see references here
File details
Details for the file orthomap-0.0.1.tar.gz.
File metadata
- Download URL: orthomap-0.0.1.tar.gz
- Upload date:
- Size: 56.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | fbb7d76f60c214b49f948a4dd52140d4cdd9e9cb0fb38bf602b5284f61ce7fa6
MD5 | 53ddbd7a526699bdecacc59a07d9bbcc
BLAKE2b-256 | b66ea642140b29484dc675c8395bd8e56239c31c78cfd17b9a76fe4a8ba1dc03
File details
Details for the file orthomap-0.0.1-py3-none-any.whl.
File metadata
- Download URL: orthomap-0.0.1-py3-none-any.whl
- Upload date:
- Size: 56.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2771568503bad85e1cd83c3a481a12dd3e93698f6226c5925619c928809900f2
MD5 | 9e1c5fca84efbde70b86510a352763fc
BLAKE2b-256 | 219a57f7ba8f7e9a09fc5561517eaf7bc253a53c5652788078fd76a83329ce5f