A Python library that enhances biological queries by expanding terms (cell types, tissues, etc.) to include subtypes and parts using ontologies, ensuring comprehensive data retrieval.

cxg-query-enhancer

The cellxgene-census library supports access to arbitrary slices of the CELLxGENE corpus via filters that include cell type, tissue, developmental stage, and disease.

If you query cellxgene_census for "T cells in lung", you get 71,000 cells. This might look like a reasonable result, but it misses 630,000 cells annotated with terms for types of T cell or parts of lung. When you filter for "macrophage", you don't automatically get "alveolar macrophage" or "Kupffer cell". Filter for "kidney" and you miss "renal cortex" and "nephron". The data is there, annotated with precise ontology terms, but simple queries can't reach it.

cxg-query-enhancer fixes this. Wrap your query in enhance() and the library automatically expands your query to include all subtypes and parts, using the Ubergraph knowledge graph built from biomedical ontologies.

Quick Example

from cxg_query_enhancer import enhance

# Your normal query—now enhanced
obs_value_filter = enhance(
    "cell_type in ['T cell'] and tissue in ['lung']",
    organism="homo_sapiens"
)
# Expands filter to include 76 T-cell type terms and 15 lung part terms used in annotation in the CxG corpus
# If used in a cellxgene_census query, returns ~700,000 cells instead of ~71,000

The enhance() function expands "T cell" to include all its subtypes (CD4+, CD8+, regulatory T cells, etc.) and "lung" to include its anatomical parts—then filters against terms actually present in CELLxGENE Census.
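To make the expansion step concrete, here is a minimal sketch of collecting a term together with its subtypes and parts by walking a small graph. The edge tables and the `expand_term` helper are hypothetical toy stand-ins for what Ubergraph provides; they are not the library's internals, and the handful of terms shown is illustrative rather than a faithful slice of any ontology.

```python
from collections import deque

# Toy edges (child -> parents). Illustrative only, not real ontology content.
SUBCLASS_OF = {
    "alpha-beta T cell": ["T cell"],
    "CD4-positive, alpha-beta T cell": ["alpha-beta T cell"],
    "regulatory T cell": ["T cell"],
}
PART_OF = {
    "lung parenchyma": ["lung"],
    "alveolus of lung": ["lung parenchyma"],
}


def expand_term(term, *edge_maps):
    """Return the term plus every term reachable by following edges downward."""
    # Invert child -> parents into parent -> children so we can walk down.
    children = {}
    for edges in edge_maps:
        for child, parents in edges.items():
            for parent in parents:
                children.setdefault(parent, []).append(child)

    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


t_cells = expand_term("T cell", SUBCLASS_OF)
lung_parts = expand_term("lung", SUBCLASS_OF, PART_OF)
print(t_cells)
print(lung_parts)
```

The traversal is transitive, so a subtype of a subtype (or a part of a part) is picked up as well, which is why a single parent term can fan out into dozens of labels.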

Complete Working Example

This example runs in under a minute and demonstrates the core value—subtypes you'd otherwise miss:

import cellxgene_census
from cxg_query_enhancer import enhance

with cellxgene_census.open_soma(census_version="latest") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        var_value_filter="feature_id in ['ENSG00000161798', 'ENSG00000188229']",
        obs_value_filter=enhance(
            "sex == 'female' and cell_type in ['medium spiny neuron']",
            organism="homo_sapiens",
        ),
        obs_column_names=[
            "assay",
            "cell_type",
            "tissue",
            "tissue_general",
            "suspension_type",
            "disease",
        ],
    )

print(adata.obs)

Output: ~5,400 cells across three cell types—the parent term plus both subtypes:

assay      cell_type                             tissue           disease
10x 3' v3  indirect pathway medium spiny neuron  caudate nucleus  normal
10x 3' v3  direct pathway medium spiny neuron    caudate nucleus  normal
10x 3' v3  medium spiny neuron                   cerebral cortex  normal

Without enhance(), a query for just "medium spiny neuron" misses the pathway-specific subtypes entirely.

What It Expands

Category    Example            Expands To Include
Cell types  macrophage         alveolar macrophage, Kupffer cell, microglial cell...
Tissues     kidney             renal cortex, nephron, kidney blood vessel...
Diseases    diabetes mellitus  type 1 diabetes, type 2 diabetes...
Dev stages  adult              25-year-old, 40-year-old...

Supported ontologies:

Installation

pip install cxg-query-enhancer

Requires Python 3.10 or 3.11.

Usage

Basic: Wrap Your Existing Query

import cellxgene_census
from cxg_query_enhancer import enhance

with cellxgene_census.open_soma(census_version="latest") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        obs_value_filter=enhance(
            "sex == 'female' and cell_type in ['medium spiny neuron']",
            organism="homo_sapiens",
        ),
    )

Flexible Input

The library accepts terms as:

  • Labels: 'neuron', 'kidney'
  • Ontology IDs: 'CL:0000540', 'UBERON:0002113'
  • Synonyms
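As a rough illustration of what accepting all three input forms involves, the sketch below maps a label, an ontology ID, or a synonym onto a single ontology ID. The lookup tables and the `resolve_term` helper are hypothetical; the library's actual resolution is not described here, and a real implementation would consult the ontologies rather than hard-coded dictionaries. The synonym entry is an assumed example.

```python
import re

# Hypothetical lookup tables for illustration only.
LABEL_TO_ID = {"neuron": "CL:0000540", "kidney": "UBERON:0002113"}
SYNONYM_TO_LABEL = {"nerve cell": "neuron"}  # assumed synonym, for the sketch

# Matches CURIE-style identifiers such as CL:0000540.
ID_PATTERN = re.compile(r"^[A-Za-z]+:\d+$")


def resolve_term(term):
    """Map a label, ontology ID, or synonym to an ontology ID (None if unknown)."""
    term = term.strip()
    if ID_PATTERN.match(term):
        return term  # already an ID, pass it through unchanged
    label = SYNONYM_TO_LABEL.get(term.lower(), term.lower())
    return LABEL_TO_ID.get(label)


for candidate in ["neuron", "CL:0000540", "Nerve cell", "kidney"]:
    print(candidate, "->", resolve_term(candidate))
```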

Control Census Filtering

By default, expanded terms are filtered against the latest CELLxGENE Census (only terms actually in the data are included).

# Use a specific Census version for reproducibility
enhance(query, organism="homo_sapiens", census_version="2024-12-01")

# Disable Census filtering (pure ontology expansion)
enhance(query, census_version=None)

Multiple Categories

query = """
    cell_type in ['medium spiny neuron']
    and tissue in ['kidney']
    and disease in ['diabetes mellitus']
"""

enhanced = enhance(query, organism="homo_sapiens")

# Expands all three categories simultaneously

A Note on Organism and Development Stage

The organism parameter is critical when querying developmental stages for non-human data.

Why: Human and mouse use different stage ontologies (HsapDv vs MmusDv). A query for "adult" in human expands to "25-year-old human stage," "40-year-old human stage," etc. The same query in mouse expands to "8-week-old stage," "6-month-old stage," and so on.

The default: If you don't specify organism, the library assumes homo_sapiens and logs a warning when expanding developmental stages. This prevents silent mismatches—but if you're querying mouse data, you'll get the wrong stages unless you specify:

# Critical for non-human developmental stage queries
enhance(
    "development_stage in ['adult'] and cell_type in ['neuron']",
    organism="mus_musculus"  # Without this, you get human stages
)

For cell types and tissues, the organism parameter is used for Census filtering (ensuring expanded terms exist in your target species), but the ontology expansion itself is species-agnostic.
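The organism-dependent choice of stage ontology can be pictured with a small sketch. The `stage_ontology_for` helper, the logger name, and the normalization of spellings such as "Homo sapiens" are assumptions made for illustration; only the HsapDv/MmusDv split and the default-to-human-with-a-warning behaviour come from the description above.

```python
import logging

logger = logging.getLogger("stage_ontology_sketch")  # hypothetical logger name

# Organism -> developmental stage ontology, as described above.
STAGE_ONTOLOGY = {"homo_sapiens": "HsapDv", "mus_musculus": "MmusDv"}


def stage_ontology_for(organism=None):
    """Pick the stage ontology, defaulting to human and warning about it."""
    if organism is None:
        logger.warning(
            "No organism given; assuming homo_sapiens for developmental stages."
        )
        organism = "homo_sapiens"
    # Assumed convenience: accept 'Homo sapiens' as well as 'homo_sapiens'.
    key = organism.strip().lower().replace(" ", "_")
    if key not in STAGE_ONTOLOGY:
        raise ValueError(f"Unsupported organism: {organism!r}")
    return STAGE_ONTOLOGY[key]


print(stage_ontology_for("mus_musculus"))
print(stage_ontology_for())
```

Because "adult" resolves against a different ontology per organism, the same query string yields disjoint stage lists for human and mouse, which is why the default matters.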

Function Reference

enhance(query_filter, categories=None, organism=None, census_version="latest")

Parameter       Type          Description
query_filter    str           Your original query string.
categories      list or None  Categories to expand. Default: auto-detect from query. Options: "cell_type", "tissue", "tissue_general", "disease", "development_stage".
organism        str or None   "homo_sapiens" or "mus_musculus". Used for Census filtering and for choosing the developmental stage ontology. If omitted, homo_sapiens is assumed.
census_version  str or None   Census version for filtering. Default: "latest". Set to None to disable.

Returns: Enhanced query string with expanded terms.
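One plausible way to auto-detect categories when `categories=None` is to scan the filter string for the supported column names. The `detect_categories` helper below is a hypothetical sketch of that idea, not the library's parser; it only recognises `==` and `in` comparisons.

```python
import re

# The expandable categories listed in the parameter table above.
EXPANDABLE = ["cell_type", "tissue", "tissue_general", "disease", "development_stage"]


def detect_categories(query_filter):
    """Return the expandable column names that appear in a comparison."""
    found = []
    for name in EXPANDABLE:
        # \b plus the trailing operator keeps 'tissue' from matching
        # inside 'tissue_general' (the underscore is a word character).
        pattern = re.compile(rf"\b{re.escape(name)}\b\s*(?:==|in\b)")
        if pattern.search(query_filter):
            found.append(name)
    return found


print(detect_categories("sex == 'female' and cell_type in ['medium spiny neuron']"))
print(detect_categories("tissue_general in ['lung'] and disease == 'normal'"))
```

Columns such as `sex` are ignored because they are not in the expandable list, so they pass through a rewrite untouched.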

How It Works

  1. Parse: Identifies terms in your query that can be expanded
  2. Expand: Queries Ubergraph for all subclasses and part-of relationships
  3. Filter: Keeps only terms present in CELLxGENE Census (unless disabled)
  4. Rewrite: Returns your query with expanded term lists
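The four steps above can be sketched end to end on toy data. Everything here is hypothetical: `EXPANSIONS` stands in for an Ubergraph lookup, `TERMS_IN_DATA` for the labels present in a Census release, and `rewrite` for the overall flow. The sketch handles only `column in [...]` clauses with single-quoted terms and is not the library's implementation.

```python
import re

# Toy stand-ins; the last expansion entry is a made-up label that is
# deliberately absent from the data so the filter step has something to drop.
EXPANSIONS = {
    "medium spiny neuron": [
        "medium spiny neuron",
        "direct pathway medium spiny neuron",
        "indirect pathway medium spiny neuron",
        "hypothetical subtype absent from data",
    ],
}
TERMS_IN_DATA = {
    "medium spiny neuron",
    "direct pathway medium spiny neuron",
    "indirect pathway medium spiny neuron",
}

# Step 1 (parse): find expandable 'column in [...]' clauses.
IN_CLAUSE = re.compile(r"\b(cell_type|tissue|disease)\s+in\s+\[([^\]]*)\]")


def rewrite(query, terms_in_data=TERMS_IN_DATA):
    """Expand each clause, optionally filter, and rebuild the query string."""

    def replace_clause(match):
        column = match.group(1)
        terms = re.findall(r"'([^']*)'", match.group(2))
        expanded = []
        for term in terms:
            # Step 2 (expand): look up subtypes/parts; unknown terms stay as-is.
            for candidate in EXPANSIONS.get(term, [term]):
                # Step 3 (filter): None disables filtering entirely.
                keep = terms_in_data is None or candidate in terms_in_data
                if keep and candidate not in expanded:
                    expanded.append(candidate)
        # Step 4 (rewrite): emit the clause with the expanded list.
        quoted = ", ".join(f"'{t}'" for t in expanded)
        return f"{column} in [{quoted}]"

    return IN_CLAUSE.sub(replace_clause, query)


query = "sex == 'female' and cell_type in ['medium spiny neuron']"
print(rewrite(query))
print(rewrite(query, terms_in_data=None))
```

Passing `terms_in_data=None` mirrors the idea of disabling Census filtering: the expanded list then contains every ontology descendant, including labels that no cell in the data carries.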

Acknowledgments
