Access OrgDB annotations

orgdb

OrgDb provides an interface to access and query Organism Database (OrgDb) SQLite files in Python. It mirrors functionality from the R/Bioconductor AnnotationDbi package, enabling seamless integration of organism-wide gene annotation into Python workflows.

[!NOTE]

If you are looking to access TxDb databases, check out the txdb package.

Install

To get started, install the package from PyPI:

pip install orgdb

Usage

Using OrgDbRegistry

The registry downloads AnnotationHub's metadata SQLite file and filters it for all available OrgDb databases. You can then fetch standard organism databases through the registry.

from orgdb import OrgDbRegistry

# Initialize registry and list available organisms
registry = OrgDbRegistry()
available = registry.list_orgdb()
print(available[:5])
# ["org.'Caballeronia_concitans'.eg", "org.'Chlorella_vulgaris'_C-169.eg", ...]

# Load the database for Homo sapiens (downloads and caches automatically)
db = registry.load_db("org.Hs.eg.db")
print(db.species)
# 'Homo sapiens'
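
Because the listing can be long, it may help to search it before calling load_db. The sketch below is plain Python and does not touch the registry: find_orgdb is a hypothetical helper, and the hard-coded list stands in for the output of registry.list_orgdb() (it assumes "org.Hs.eg.db" appears in that listing, as the load_db call above suggests).

```python
def find_orgdb(names, query):
    """Return the registry names that contain `query`, ignoring case."""
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


# Stand-in for registry.list_orgdb(); a real listing is much longer.
available = [
    "org.'Caballeronia_concitans'.eg",
    "org.'Chlorella_vulgaris'_C-169.eg",
    "org.Hs.eg.db",
]

matches = find_orgdb(available, "chlorella")
print(matches)
```

A name returned this way could then be passed to registry.load_db.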

Inspecting metadata

Explore the available columns and key types in the database.

# List available columns (and keytypes)
cols = db.columns()
print(cols[:5])
# ['ENTREZID', 'PFAM', 'IPI', 'PROSITE', 'ACCNUM']

# Check available keys for a specific keytype
entrez_ids = db.keys("ENTREZID")
print(entrez_ids[:5])
# ['1', '2', '9', '10', '11']
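
The keys above are strings, so integer IDs would not match them as-is. As a minimal sketch, assuming you want to check a request against the known keys before querying, the hypothetical helper partition_keys below normalises each requested key to a string and splits the request into found and missing keys. The list known is a stand-in for db.keys("ENTREZID").

```python
def partition_keys(requested, known):
    """Split `requested` into (found, missing) after casting each key to str."""
    known_set = set(known)
    found, missing = [], []
    for key in requested:
        key = str(key)  # Entrez IDs are stored as strings, e.g. '10'
        (found if key in known_set else missing).append(key)
    return found, missing


# Stand-in for db.keys("ENTREZID").
known = ["1", "2", "9", "10", "11"]

found, missing = partition_keys([1, "10", "3"], known)
print(found)
print(missing)
```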

Querying Annotations (using select)

The select method retrieves data as a BiocFrame. It automatically handles complex joins across tables.

# Retrieve Gene Symbols and Gene Names for a list of Entrez IDs
res = db.select(
    keys=["1", "10"],
    columns=["SYMBOL", "GENENAME"],
    keytype="ENTREZID"
)

print(res)
# BiocFrame with 2 rows and 3 columns
#                    GENENAME ENTREZID SYMBOL
#                      <list>   <list> <list>
# [0] alpha-1-B glycoprotein        1   A1BG
# [1]  N-acetyltransferase 2       10   NAT2
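
To give a feel for the kind of join select performs, here is a rough sketch using only the standard sqlite3 module on an in-memory database. The two-table layout (genes and gene_info linked by an internal _id) is a simplified assumption modelled on Bioconductor OrgDb files, and the query is illustrative rather than the SQL that orgdb actually generates.

```python
import sqlite3

# Toy in-memory database; the schema is an assumption, not orgdb's real layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genes (_id INTEGER PRIMARY KEY, gene_id TEXT);
    CREATE TABLE gene_info (_id INTEGER, gene_name TEXT, symbol TEXT);
    INSERT INTO genes VALUES (1, '1'), (2, '10');
    INSERT INTO gene_info VALUES
        (1, 'alpha-1-B glycoprotein', 'A1BG'),
        (2, 'N-acetyltransferase 2', 'NAT2');
    """
)

keys = ["1", "10"]
# One "?" placeholder per key keeps the query parameterised.
placeholders = ", ".join("?" for _ in keys)
query = f"""
    SELECT gi.gene_name, g.gene_id, gi.symbol
    FROM genes AS g
    JOIN gene_info AS gi ON gi._id = g._id
    WHERE g.gene_id IN ({placeholders})
    ORDER BY g._id
"""
rows = conn.execute(query, keys).fetchall()
conn.close()

for row in rows:
    print(row)
```

Each requested column that lives in a different table adds another join of this shape, which is the bookkeeping select takes care of.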

[!NOTE]

If you request "GO" columns, the result will automatically expand to include "EVIDENCE" and "ONTOLOGY" columns, matching Bioconductor behavior.

go_res = db.select(
    keys="1",
    columns=["GO"],
    keytype="ENTREZID"
)
print(go_res)
# BiocFrame with 12 rows and 4 columns
#      ONTOLOGY ENTREZID         GO EVIDENCE
#        <list>   <list>     <list>   <list>
#  [0]       BP        1 GO:0002764      IBA
#  [1]       CC        1 GO:0005576      HDA
#  [2]       CC        1 GO:0005576      IDA
#           ...      ...        ...      ...
#  [9]       CC        1 GO:0070062      HDA
# [10]       CC        1 GO:0072562      HDA
# [11]       CC        1 GO:1904813      TAS
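
The expansion rule can be stated as a small function. This is only a sketch of the behaviour described in the note, not the package's implementation; expand_go_columns and GO_COMPANIONS are hypothetical names, and the order of the added columns is an arbitrary choice.

```python
# Columns that accompany "GO" in the result, per the note above.
GO_COMPANIONS = ("EVIDENCE", "ONTOLOGY")


def expand_go_columns(columns):
    """Return `columns`, adding the GO companion columns when "GO" is requested."""
    expanded = list(columns)
    if "GO" in expanded:
        for extra in GO_COMPANIONS:
            if extra not in expanded:
                expanded.append(extra)
    return expanded


print(expand_go_columns(["GO"]))
print(expand_go_columns(["SYMBOL"]))
```

Knowing the expanded column set in advance is useful when downstream code expects a fixed number of columns.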

Accessing Genomic Ranges

Extract gene coordinates as a GenomicRanges object (requires the chromosome_locations table in the OrgDb database).

gr = db.genes()
print(gr)
# GenomicRanges with 52232 ranges and 1 metadata column
#           seqnames                ranges          strand     gene_id
#              <str>             <IRanges> <ndarray[int8]>      <list>
#         1       19 -58345182 - -58336872               * |         1
#         2       12   -9067707 - -9019495               * |         2
#         2       12   -9067707 - -9019185               * |         2
#                ...                   ...             ... |       ...
# 116804918       11 121024101 - 121191490               * | 116804918
# 117779438        1   20154213 - 20160568               * | 117779438
# 118142757        6   42155405 - 42180056               * | 118142757
# ------
# seqinfo(369 sequences): 1 10 10_GL383545v1_alt ... X_KI270913v1_alt Y Y_KZ208924v1_fix
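
Some ranges above have negative coordinates while the strand is reported as "*". In Bioconductor's OrgDb packages, chromosome locations on the antisense strand are stored with a leading minus sign, and the output looks like that convention passed through unchanged; treating it that way here is an assumption. Under that assumption, a hypothetical helper such as decode_signed_range could recover positive coordinates and a strand:

```python
def decode_signed_range(start, end):
    """Convert signed coordinates to (start, end, strand).

    Assumes the Bioconductor convention that a leading minus sign
    marks a location on the antisense ("-") strand.
    """
    strand = "-" if start < 0 or end < 0 else "+"
    low, high = sorted((abs(start), abs(end)))
    return low, high, strand


# Coordinates taken from the first and last rows of the output above.
print(decode_signed_range(-58345182, -58336872))
print(decode_signed_range(42155405, 42180056))
```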

Note

This project has been set up using BiocSetup and PyScaffold.
