Access OrgDB annotations
orgdb
OrgDb provides an interface to access and query Organism Database (OrgDb) SQLite files in Python. It mirrors functionality from the R/Bioconductor AnnotationDbi package, enabling seamless integration of organism-wide gene annotation into Python workflows.
[!NOTE]
If you are looking to access TxDb databases, check out the txdb package.
Install
To get started, install the package from PyPI:
pip install orgdb
Usage
Using OrgDbRegistry
The registry downloads AnnotationHub's metadata SQLite file and filters it for all available OrgDb databases. You can then fetch standard organism databases through the registry.
from orgdb import OrgDbRegistry
# Initialize registry and list available organisms
registry = OrgDbRegistry()
available = registry.list_orgdb()
print(available[:5])
# ["org.'Caballeronia_concitans'.eg", "org.'Chlorella_vulgaris'_C-169.eg", ...]
# Load the database for Homo sapiens (downloads and caches automatically)
db = registry.load_db("org.Hs.eg.db")
print(db.species)
# 'Homo sapiens'
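To make the filtering step concrete, here is a minimal sketch of the idea using only the standard library. The `resources` table, its `title` and `rdataclass` columns, and the `list_orgdb_titles` helper are hypothetical stand-ins for illustration; the real AnnotationHub metadata schema and the registry's internals may look different.

```python
import sqlite3

# Hypothetical, simplified stand-in for a hub metadata table.
# The real AnnotationHub schema may differ from this toy layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (title TEXT, rdataclass TEXT)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?)",
    [
        ("org.Hs.eg.db.sqlite", "OrgDb"),
        ("TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite", "TxDb"),
        ("org.Mm.eg.db.sqlite", "OrgDb"),
    ],
)


def list_orgdb_titles(connection):
    """Return the sorted titles of all rows whose class is 'OrgDb'."""
    rows = connection.execute(
        "SELECT title FROM resources WHERE rdataclass = ? ORDER BY title",
        ("OrgDb",),
    )
    return [title for (title,) in rows]


orgdb_titles = list_orgdb_titles(conn)
print(orgdb_titles)
```

The point is only that the registry's list is a filtered view of a larger metadata table, which is why TxDb entries do not show up in `list_orgdb()`.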
Inspecting metadata
Explore the available columns and key types in the database.
# List available columns (and keytypes)
cols = db.columns()
print(cols[:5])
# ['ENTREZID', 'PFAM', 'IPI', 'PROSITE', 'ACCNUM']
# Check available keys for a specific keytype
entrez_ids = db.keys("ENTREZID")
print(entrez_ids[:5])
# ['1', '2', '9', '10', '11']
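One practical use of `keys()` is checking identifiers before querying. The sketch below is a plain-Python helper, not part of orgdb; `split_valid_keys` is a hypothetical name, and the `available` list simply reuses the five example IDs printed above in place of a real `db.keys("ENTREZID")` call.

```python
def split_valid_keys(requested, available):
    """Split requested keys into those present in `available` and those missing.

    Order of the requested keys is preserved in both returned lists.
    """
    available_set = set(available)  # set lookup keeps this fast for large key lists
    valid = [key for key in requested if key in available_set]
    missing = [key for key in requested if key not in available_set]
    return valid, missing


# In real use, `available` would come from db.keys("ENTREZID").
available = ["1", "2", "9", "10", "11"]
valid, missing = split_valid_keys(["1", "10", "not-a-gene"], available)
print(valid)
print(missing)
```

Only the `valid` list would then be passed on as `keys=` to a query, while `missing` can be logged or reported.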
Querying Annotations (using select)
The select method retrieves data as a BiocFrame. It automatically handles complex joins across tables.
# Retrieve Gene Symbols and Gene Names for a list of Entrez IDs
res = db.select(
keys=["1", "10"],
columns=["SYMBOL", "GENENAME"],
keytype="ENTREZID"
)
print(res)
# BiocFrame with 2 rows and 3 columns
#                    GENENAME ENTREZID SYMBOL
#                      <list>   <list> <list>
# [0] alpha-1-B glycoprotein        1   A1BG
# [1]  N-acetyltransferase 2       10   NAT2
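For readers curious what "joins across tables" means here, the following is a rough sketch against a toy in-memory database. The `genes` and `gene_info` tables and their columns are assumptions loosely modeled on how OrgDb SQLite files are commonly laid out, and `select_symbol_and_name` is a hypothetical helper; it is not how orgdb itself builds its queries.

```python
import sqlite3

# Toy schema: an internal _id links the Entrez ID table to the annotation table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genes (_id INTEGER PRIMARY KEY, gene_id TEXT);
    CREATE TABLE gene_info (_id INTEGER, gene_name TEXT, symbol TEXT);
    INSERT INTO genes VALUES (1, '1'), (2, '10');
    INSERT INTO gene_info VALUES
        (1, 'alpha-1-B glycoprotein', 'A1BG'),
        (2, 'N-acetyltransferase 2', 'NAT2');
    """
)


def select_symbol_and_name(connection, entrez_ids):
    """Join the two tables on _id and return (gene_id, symbol, gene_name) rows."""
    placeholders = ", ".join("?" for _ in entrez_ids)
    query = (
        "SELECT g.gene_id, i.symbol, i.gene_name "
        "FROM genes AS g JOIN gene_info AS i ON g._id = i._id "
        f"WHERE g.gene_id IN ({placeholders}) ORDER BY g._id"
    )
    return connection.execute(query, list(entrez_ids)).fetchall()


rows = select_symbol_and_name(conn, ["1", "10"])
for row in rows:
    print(row)
```

`select` spares you from writing this kind of join by hand and returns the result as a BiocFrame rather than raw tuples.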
[!NOTE]
If you request "GO" columns, the result will automatically expand to include "EVIDENCE" and "ONTOLOGY" columns, matching Bioconductor behavior.
go_res = db.select(
keys="1",
columns=["GO"],
keytype="ENTREZID"
)
print(go_res)
# BiocFrame with 12 rows and 4 columns
#      ONTOLOGY ENTREZID         GO EVIDENCE
#        <list>   <list>     <list>   <list>
#  [0]       BP        1 GO:0002764      IBA
#  [1]       CC        1 GO:0005576      HDA
#  [2]       CC        1 GO:0005576      IDA
#           ...      ...        ...      ...
#  [9]       CC        1 GO:0070062      HDA
# [10]       CC        1 GO:0072562      HDA
# [11]       CC        1 GO:1904813      TAS
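The column expansion described in the note can be pictured with a small helper. This is an illustrative sketch only: `expand_columns` is a hypothetical function, and the rule it encodes (requesting "GO" also adds "EVIDENCE" and "ONTOLOGY") is taken from the note above rather than from the package source.

```python
def expand_columns(columns):
    """Return the requested columns, adding EVIDENCE and ONTOLOGY after GO.

    Duplicates are dropped and the first-seen order is preserved.
    """
    expanded = []
    for column in columns:
        if column not in expanded:
            expanded.append(column)
        if column == "GO":
            # GO terms are reported together with their evidence code and ontology.
            for extra in ("EVIDENCE", "ONTOLOGY"):
                if extra not in expanded:
                    expanded.append(extra)
    return expanded


go_only = expand_columns(["GO"])
mixed = expand_columns(["SYMBOL", "GO", "EVIDENCE"])
print(go_only)
print(mixed)
```

Note also that the result has one row per gene, GO term, and evidence code, which is why a single Entrez ID yields 12 rows in the output above.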
Accessing Genomic Ranges
Extract gene coordinates as a GenomicRanges object (requires the chromosome_locations table in the OrgDb database).
gr = db.genes()
print(gr)
# GenomicRanges with 52232 ranges and 1 metadata column
# seqnames ranges strand gene_id
# <str> <IRanges> <ndarray[int8]> <list>
# 1 19 -58345182 - -58336872 * | 1
# 2 12 -9067707 - -9019495 * | 2
# 2 12 -9067707 - -9019185 * | 2
# ... ... ... | ...
# 116804918 11 121024101 - 121191490 * | 116804918
# 117779438 1 20154213 - 20160568 * | 117779438
# 118142757 6 42155405 - 42180056 * | 118142757
# ------
# seqinfo(369 sequences): 1 10 10_GL383545v1_alt ... X_KI270913v1_alt Y Y_KZ208924v1_fix
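Some ranges in the output above have negative coordinates while the strand is reported as `*`. In Bioconductor's OrgDb chromosome location mappings, a leading minus sign conventionally marks a location on the antisense strand. Assuming the same convention applies to these values, a small post-processing helper could recover positive coordinates and a strand. `normalize_location` is a hypothetical function and not part of orgdb.

```python
def normalize_location(start, end):
    """Convert signed coordinates into (start, end, strand) with positive values.

    Assumption: two negative values mean the antisense ('-') strand and
    two non-negative values mean the sense ('+') strand.
    """
    if start < 0 and end < 0:
        low, high = sorted((abs(start), abs(end)))
        return low, high, "-"
    if start >= 0 and end >= 0:
        low, high = sorted((start, end))
        return low, high, "+"
    raise ValueError("start and end must have the same sign")


minus_gene = normalize_location(-58345182, -58336872)
plus_gene = normalize_location(121024101, 121191490)
print(minus_gene)
print(plus_gene)
```

Check the convention against the database you loaded before relying on the inferred strand.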
Note
This project has been set up using BiocSetup and PyScaffold.