obogo: Yet another ontology crawler to perform ORA!!!

Installation

pip install obogo

Prerequisites

You will need the latest Gene Ontology description in .obo format, which can be downloaded from the official GO site. In this plain-text format, each GO term is encoded as a paragraph of key: value lines:

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

Obviously, is_a is the main relationship of interest for building the DAG of GO terms. Note that the replaced_by and alias_to properties are also handled by this package.
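To make the stanza layout concrete, here is a minimal sketch of how such key: value paragraphs could be read with the standard library only. It is not obogo's parser: the function name parse_obo_terms and the shortened snippet are illustrative assumptions, and real .obo files contain further stanza types and escaping rules that this sketch ignores.

```python
from collections import defaultdict

# Shortened copy of the stanza shown above (the def and synonym lines are omitted).
OBO_SNIPPET = '''\
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
'''


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to its list of values.

    Hypothetical helper for illustration only; not part of obogo.
    """
    terms = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts here; only [Term] stanzas are collected.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif line and current is not None and ":" in line:
            key, _, value = line.partition(":")
            # Drop the trailing "! human-readable name" comment, if present.
            value = value.split(" ! ")[0].strip()
            current[key.strip()].append(value)
    return terms


terms = parse_obo_terms(OBO_SNIPPET)
parents = terms[0]["is_a"]  # identifiers of the parent terms
print(terms[0]["id"][0], "is_a", parents)
```

Tags such as is_a can repeat within a stanza, which is why each tag maps to a list rather than a single value.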

Build the GO DAG structure

Simply read from a flat .obo file:

from obogo import create_tree_from_obo
obogo_tree = create_tree_from_obo('../data/go-basic.obo')

You can query a GO term by its name or by its GO identifier:

obogo_tree.view_go_node('GO:1903507')
obogo_tree.view_go_node('biological process')

Build a protein collection

In order to set the background population of proteins for each GO term, you will need to build a collection of UniProt data containers. In obogo, these are called UniprotDatum and can be created directly from a UniProt proteome XML reference file (e.g. E. coli K12). Suppose we downloaded the E. coli K12 proteome XML file mentioned above, named uniprotkb_proteome_UP000000625.xml. The collection of UniProt containers can then be built this way:

from uniprot_redis.store.mockup import UniprotStoreDummy
from uniprot_redis.wrapper import Collector

# populate the store
store = UniprotStoreDummy()
load_data = store.load_uniprot_xml(file="path/to/uniprotkb_proteome_UP000000625.xml")
store.save_collection('ecoli_K12', load_data)

# retrieve the collection
my_collection = Collector(store, 'ecoli_K12')

This collection can be iterated over, sliced, or indexed by UniProt accession number.

print(my_collection['P19636'])
for uniprot_datum in my_collection:
    print(uniprot_datum.id, uniprot_datum.go)

Assign whole proteome to the GO structure

Each protein of the collection now has to be attached to the GO terms that are described in its UniprotDatum go field (see above).

obogo_tree.load_proteins('background', my_collection)

The obogo_tree represents each GO term as a plain networkx node. The load_proteins call sets, for each node, the value of its 'background' key to a list of UniprotDatum.

In most ORA analyses, the population of proteins attached to a given node is the union of the proteins attached to the node itself and to all of its descendants ("GO annotation goes up": "any specific GO term implicitly carries the meaning of a less specific"). Hence, an additional operation is required to propagate protein populations up the tree.

obogo_tree.percolate(percol_type="background")

This currently takes around 2 minutes for the E. coli proteome.
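The idea behind this upward propagation can be sketched on a toy DAG using plain dictionaries. This is only an illustration of the principle, not obogo's implementation: the function name propagate_up, the term names, and the protein labels below are all made up for the example.

```python
# Toy DAG: each term maps to its parents (the direction of is_a links).
PARENTS = {
    "root": [],
    "transport": ["root"],
    "ion transport": ["transport"],
    "metal ion transport": ["ion transport"],
    "binding": ["root"],
    "metal ion binding": ["binding"],
}

# Proteins directly annotated with each term (hypothetical labels).
DIRECT = {
    "metal ion transport": {"P1", "P2"},
    "ion transport": {"P3"},
    "metal ion binding": {"P2", "P4"},
}


def propagate_up(parents, direct):
    """Return, for every term, its own proteins plus those of all its descendants."""
    propagated = {term: set(direct.get(term, ())) for term in parents}
    for term, proteins in direct.items():
        stack = list(parents[term])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                # A term reachable through several paths is visited only once.
                continue
            seen.add(ancestor)
            propagated[ancestor] |= proteins
            stack.extend(parents[ancestor])
    return propagated


populations = propagate_up(PARENTS, DIRECT)
for term, proteins in populations.items():
    print(term, sorted(proteins))
```

Because populations are stored as sets, a protein annotated with several descendant terms (P2 here) is counted once in each ancestor.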

NB: At this stage, serializing the obogo_tree could be handy:

import pickle
pik_fpath = "obogo_ecoliK12.pik"
with open(pik_fpath, "wb") as fp:
    pickle.dump(obogo_tree, fp)
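Reading the tree back in a later session avoids repeating the loading and propagation steps. The following sketch shows the save and load round trip; since it has to run without obogo installed, a plain dictionary stands in for the tree, and the helper names save_tree and load_tree are assumptions made for the example. Only unpickle files that you created yourself or otherwise trust.

```python
import pickle
import tempfile
from pathlib import Path


def save_tree(tree, path):
    """Serialize any picklable object to the given path."""
    with open(path, "wb") as fp:
        pickle.dump(tree, fp)


def load_tree(path):
    """Read back an object written by save_tree."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


# Stand-in for the real obogo_tree, used here only to keep the example self-contained.
stand_in_tree = {"GO:0006811": {"background": ["P02930", "P03819"]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    pik_fpath = Path(tmp_dir) / "obogo_ecoliK12.pik"
    save_tree(stand_in_tree, pik_fpath)
    restored_tree = load_tree(pik_fpath)

print(restored_tree == stand_in_tree)
```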

Load the experimental protein set

For this tutorial, we will create a dummy collection of experimental proteins based on a slice of 1200 proteins from the proteome and load it into obogo_tree. Note that this time, it is loaded using the 'measured' argument. We then propagate this additional protein population up the tree as well.

obogo_tree.load_proteins('measured', my_collection[1000:2200])
obogo_tree.percolate(percol_type='measured')

Define sample protein set

We now define a subset of the measured proteins as "of interest" (e.g. over-abundant).

sample = my_collection[1100:1180]

Compute ORA of the GO terms within the sample

For a particular GO term

from obogo.statistics import compute_node_ora
print(compute_node_ora(obogo_tree, sample, 'GO:0006811'))
print(compute_node_ora(obogo_tree, sample, 'metal ion transport'))
print(compute_node_ora(obogo_tree, sample, 'GO:0006811', norm='measured'))

The sample parameter can also be a plain iterable of UniProt accession numbers (e.g. ['P02930', 'P03819', 'P0A910']). The `norm` parameter controls the reference population for the Fisher statistic:

'background' : the whole proteome (default)
'measured' : the proteins of the experiment

The returned value is a tuple of the form:

(GO_identifier, GO_name, number of sample proteins carrying the GO term, Fisher test log odds, Fisher test p-value, contingency table)
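To illustrate what such a Fisher statistic measures, the sketch below builds a 2x2 contingency table and computes a one-sided enrichment p-value with the standard library. It is not obogo's code: the table layout, the one-sided alternative, the 0.5 pseudo-count in the log odds, and all names and protein labels are assumptions chosen for the example. Switching the reference population, as `norm` does, corresponds here to passing a different population argument.

```python
from math import comb, log


def contingency_table(sample, term_proteins, population):
    """Build [[a, b], [c, d]] with rows = in sample / not in sample
    and columns = carries the term / does not carry the term."""
    population = set(population)
    sample = set(sample) & population
    term_proteins = set(term_proteins) & population
    a = len(sample & term_proteins)
    b = len(sample - term_proteins)
    c = len(term_proteins - sample)
    d = len(population) - a - b - c
    return [[a, b], [c, d]]


def fisher_enrichment_pvalue(table):
    """One-sided Fisher exact test: probability of drawing at least `a`
    term-carrying proteins in the sample under the hypergeometric law."""
    (a, b), (c, d) = table
    n_sample, n_term, n_total = a + b, a + c, a + b + c + d
    upper = min(n_sample, n_term)
    favourable = sum(
        comb(n_term, k) * comb(n_total - n_term, n_sample - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n_total, n_sample)


def log_odds(table):
    """Log odds ratio with a 0.5 pseudo-count so that empty cells do not fail."""
    (a, b), (c, d) = table
    return log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


# Hypothetical data: 20 proteins, 5 carry the term, 4 are in the sample.
population = [f"P{i}" for i in range(20)]
term_proteins = ["P0", "P1", "P2", "P3", "P4"]
sample = ["P0", "P1", "P2", "P10"]

table = contingency_table(sample, term_proteins, population)
pvalue = fisher_enrichment_pvalue(table)
print(table, round(pvalue, 4), round(log_odds(table), 3))
```

A small p-value means that the term is carried by more sample proteins than would be expected from a random draw of the same size from the reference population.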

A similar operation can be applied to the entire tree, in which case a generator of score tuples is returned:

from obogo.statistics import score_ora_tree
for go_score in score_ora_tree(obogo_tree, sample):
    print(go_score)
