
A Python library that enhances biological queries by expanding terms (cell types, tissues, etc.) to include subtypes and parts using ontologies, ensuring comprehensive data retrieval.


cxg-query-enhancer

The cellxgene-census library supports access to arbitrary slices of the CELLxGENE corpus via filters that include cell type, tissue, developmental stage, and disease.

If you query cellxgene_census for "T cells in lung", you get 71,000 cells. This might look like a reasonable result, but it misses 630,000 cells annotated with terms for types of T cell or parts of lung. When you filter for "macrophage," you don't automatically get "alveolar macrophage" or "Kupffer cell." Filter for "kidney" and you miss "renal cortex" and "nephron." The data is there, annotated with precise ontology terms, but simple queries can't reach it.

cxg-query-enhancer fixes this. Wrap your query in enhance() and the library automatically expands your query to include all subtypes and parts, using the Ubergraph knowledge graph built from biomedical ontologies.

Quick Example

from cxg_query_enhancer import enhance

# Your normal query—now enhanced
obs_value_filter = enhance(
    "cell_type in ['T cell'] and tissue in ['lung']",
    organism="homo_sapiens"
)
# Expands filter to include 76 T-cell type terms and 15 lung part terms used in annotation in the CxG corpus
# If used in a cellxgene_census query, returns ~700,000 cells instead of ~71,000

The enhance() function expands "T cell" to include all its subtypes (CD4+, CD8+, regulatory T cells, etc.) and "lung" to include its anatomical parts—then filters against terms actually present in CELLxGENE Census.
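To make the rewrite concrete, here is a minimal sketch of how a list of expanded labels could be rendered back into a filter clause. The helper `build_in_clause` and the three-term list are hypothetical illustrations only; they are not the library's internals, and the real expansion for "T cell" is much longer.

```python
def build_in_clause(column: str, terms: list) -> str:
    """Render a `column in [...]` filter clause from a list of labels."""
    # Deduplicate and sort so the generated clause is stable between runs.
    unique_terms = sorted(set(terms))
    # repr() quotes each label and copes with labels that contain apostrophes.
    quoted = ", ".join(repr(term) for term in unique_terms)
    return f"{column} in [{quoted}]"


# Hypothetical, heavily truncated expansion of "T cell".
t_cell_terms = ["T cell", "regulatory T cell", "CD4-positive, alpha-beta T cell"]
clause = build_in_clause("cell_type", t_cell_terms)
print(clause)
```

The point is only that the enhanced filter remains an ordinary string, so it can be passed anywhere the original obs_value_filter was accepted.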

Complete Working Example

This example runs in under a minute and demonstrates the core value—subtypes you'd otherwise miss:

import cellxgene_census
from cxg_query_enhancer import enhance

with cellxgene_census.open_soma(census_version="latest") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        var_value_filter="feature_id in ['ENSG00000161798', 'ENSG00000188229']",
        obs_value_filter=enhance(
            "sex == 'female' and cell_type in ['medium spiny neuron']",
            organism="Homo sapiens",
        ),
        obs_column_names=[
            "assay",
            "cell_type",
            "tissue",
            "tissue_general",
            "suspension_type",
            "disease",
        ],
    )

print(adata.obs)

Output: ~5,400 cells across three cell types—the parent term plus both subtypes:

assay      cell_type                             tissue           disease
10x 3' v3  indirect pathway medium spiny neuron  caudate nucleus  normal
10x 3' v3  direct pathway medium spiny neuron    caudate nucleus  normal
10x 3' v3  medium spiny neuron                   cerebral cortex  normal

Without enhance(), a query for just "medium spiny neuron" misses the pathway-specific subtypes entirely.

What It Expands

Category    Example            Expands To Include
Cell types  macrophage         alveolar macrophage, Kupffer cell, microglial cell...
Tissues     kidney             renal cortex, nephron, kidney blood vessel...
Diseases    diabetes mellitus  type 1 diabetes, type 2 diabetes...
Dev stages  adult              25-year-old, 40-year-old...

Supported ontologies:

Installation

pip install cxg-query-enhancer

Requires Python 3.10 or 3.11.

Usage

Basic: Wrap Your Existing Query

import cellxgene_census
from cxg_query_enhancer import enhance

with cellxgene_census.open_soma(census_version="latest") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        obs_value_filter=enhance(
            "sex == 'female' and cell_type in ['medium spiny neuron']",
            organism="Homo sapiens",
        ),
    )

Flexible Input

The library accepts terms as:

  • Labels: 'neuron', 'kidney'
  • Ontology IDs: 'CL:0000540', 'UBERON:0002113'
  • Synonyms
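As an illustration of how mixed input might be told apart, the sketch below separates ontology IDs from free-text terms with a regular expression. The function `looks_like_ontology_id` and its pattern are assumptions made for this example, not the library's actual parsing logic. Note that a label and a synonym cannot be distinguished by shape alone; that requires a lookup in the ontology itself.

```python
import re

# Assumed ID shape: an ontology prefix, a colon, then digits (e.g. CL:0000540).
ONTOLOGY_ID_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")


def looks_like_ontology_id(term: str) -> bool:
    """Return True when the term has the prefix:digits shape of an ontology ID."""
    return ONTOLOGY_ID_PATTERN.match(term.strip()) is not None


for term in ["neuron", "CL:0000540", "UBERON:0002113", "kidney"]:
    kind = "ontology ID" if looks_like_ontology_id(term) else "label or synonym"
    print(f"{term!r}: {kind}")
```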

Control Census Filtering

By default, expanded terms are filtered against the latest CELLxGENE Census (only terms actually in the data are included).

# Use a specific Census version for reproducibility
enhance(query, organism="homo_sapiens", census_version="2024-12-01")

# Disable Census filtering (pure ontology expansion)
enhance(query, census_version=None)

Multiple Categories

query = """
    cell_type in ['medium spiny neuron']
    and tissue in ['kidney']
    and disease in ['diabetes mellitus']
"""

enhanced = enhance(query, organism="homo_sapiens")

# Expands all three categories simultaneously

A Note on Organism and Development Stage

The organism parameter is critical when querying developmental stages for non-human data.

Why: Human and mouse use different stage ontologies (HsapDv vs MmusDv). A query for "adult" in human expands to "25-year-old human stage," "40-year-old human stage," etc. The same query in mouse expands to "8-week-old stage," "6-month-old stage," and so on.

The default: If you don't specify organism, the library assumes homo_sapiens and logs a warning when expanding developmental stages. This prevents silent mismatches—but if you're querying mouse data, you'll get the wrong stages unless you specify:

# Critical for non-human developmental stage queries
enhance(
    "development_stage in ['adult'] and cell_type in ['neuron']",
    organism="mus_musculus"  # Without this, you get human stages
)

For cell types and tissues, the organism parameter is used for Census filtering (ensuring expanded terms exist in your target species), but the ontology expansion itself is species-agnostic.
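The organism-dependent choice of stage ontology can be pictured as a small lookup with a warning on the default path. This is a sketch under stated assumptions: `stage_ontology_for`, the logger name, and the normalization of names such as "Homo sapiens" are all hypothetical and are not taken from the library's code. Only the HsapDv/MmusDv pairing and the warn-and-default behavior come from the description above.

```python
import logging
from typing import Optional

logger = logging.getLogger("stage_ontology_sketch")  # hypothetical logger name

# Pairing described above: human stages use HsapDv, mouse stages use MmusDv.
STAGE_ONTOLOGY = {"homo_sapiens": "HsapDv", "mus_musculus": "MmusDv"}


def stage_ontology_for(organism: Optional[str] = None) -> str:
    """Pick the developmental stage ontology prefix for an organism."""
    if organism is None:
        # Mirror the documented default: assume human, but say so loudly.
        logger.warning("No organism given; assuming homo_sapiens for stages.")
        organism = "homo_sapiens"
    # Assumption of this sketch: accept "Homo sapiens" as well as "homo_sapiens".
    key = organism.strip().lower().replace(" ", "_")
    if key not in STAGE_ONTOLOGY:
        raise ValueError(f"Unsupported organism: {organism!r}")
    return STAGE_ONTOLOGY[key]


print(stage_ontology_for("mus_musculus"))
print(stage_ontology_for())
```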

Function Reference

enhance(query_filter, categories=None, organism=None, census_version="latest")

Parameter       Type          Description
query_filter    str           Your original query string
categories      list or None  Categories to expand. Default: auto-detect from query. Options: "cell_type", "tissue", "tissue_general", "disease", "development_stage"
organism        str           "homo_sapiens" or "mus_musculus". Required for Census filtering.
census_version  str or None   Census version for filtering. Default: "latest". Set to None to disable.

Returns: Enhanced query string with expanded terms.

How It Works

  1. Parse: Identifies terms in your query that can be expanded
  2. Expand: Queries Ubergraph for all subclasses and part-of relationships
  3. Filter: Keeps only terms present in CELLxGENE Census (unless disabled)
  4. Rewrite: Returns your query with expanded term lists
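The four steps above can be sketched end to end with a toy in-memory hierarchy standing in for Ubergraph and a toy term set standing in for the Census. Everything here (`enhance_sketch`, `expand`, `TOY_CHILDREN`, `TOY_CENSUS_TERMS`, the clause regex) is hypothetical and only mirrors the described flow; the real library queries Ubergraph and the Census rather than hard-coded dictionaries.

```python
import re

# Toy stand-in for Ubergraph: term -> direct subtypes or parts (illustrative).
TOY_CHILDREN = {
    "kidney": ["renal cortex", "nephron", "kidney blood vessel"],
}
# Toy stand-in for the terms actually present in a Census release.
TOY_CENSUS_TERMS = {"kidney", "renal cortex", "nephron"}

EXPANDABLE = {"cell_type", "tissue", "tissue_general", "disease", "development_stage"}
CLAUSE_PATTERN = re.compile(r"(\w+)\s+in\s+\[([^\]]*)\]")


def expand(term, children):
    """Return the term plus all of its descendants in the toy hierarchy."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(children.get(current, []))
    return seen


def enhance_sketch(query, children, census_terms=None):
    """Parse, expand, optionally filter, and rewrite each `column in [...]` clause."""

    def rewrite(match):
        column = match.group(1)
        if column not in EXPANDABLE:
            return match.group(0)  # leave other columns untouched
        terms = re.findall(r"'([^']*)'", match.group(2))  # 1. parse
        expanded = set()
        for term in terms:
            expanded.update(expand(term, children))  # 2. expand
        if census_terms is not None:
            expanded &= set(census_terms)  # 3. filter
        quoted = ", ".join(f"'{t}'" for t in sorted(expanded))
        return f"{column} in [{quoted}]"  # 4. rewrite

    return CLAUSE_PATTERN.sub(rewrite, query)


query = "sex == 'female' and tissue in ['kidney']"
result = enhance_sketch(query, TOY_CHILDREN, TOY_CENSUS_TERMS)
print(result)
```

Passing `census_terms=None` skips the filter step in this sketch, which parallels the pure ontology expansion obtained with census_version=None.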

Acknowledgments
