A Python library that enhances biological queries by using ontologies to expand terms (cell types, tissues, etc.) to include their subtypes and parts, so that data retrieval is comprehensive.


cxg-query-enhancer

The cellxgene-census library supports access to arbitrary slices of the CELLxGENE corpus. You select a slice with filters that include cell type, tissue, developmental stage and disease.

If you query cellxgene_census for "T cells in lung", you get 71,000 cells. This might look like a reasonable result, but it misses 630,000 cells annotated with terms for T-cell subtypes or for parts of the lung. When you filter for "macrophage," you don't automatically get "alveolar macrophage" or "Kupffer cell." Filter for "kidney" and you miss "renal cortex" and "nephron." The data is there, annotated with precise ontology terms, but simple queries can't reach it.

cxg-query-enhancer fixes this. Wrap your query in enhance() and the library automatically expands it to include all subtypes and parts. It finds these using the Ubergraph knowledge graph, which is built from biomedical ontologies.
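Expanding a term to "all subtypes and parts" is, in graph terms, a transitive walk downward along is_a (subclass) and part_of edges. The following is a minimal sketch of that idea. It is not the library's implementation. The `IS_A` and `PART_OF` dictionaries are a hypothetical, hand-written ontology fragment. The `expand` function is likewise illustrative only. A knowledge graph like Ubergraph can serve these inferred relationships directly.

```python
from collections import deque

# Hypothetical ontology fragment (child -> parents), for illustration only.
# Real relationships would come from Cell Ontology / Uberon via Ubergraph.
IS_A = {
    "alveolar macrophage": ["macrophage"],
    "Kupffer cell": ["macrophage"],
    "microglial cell": ["macrophage"],
}
PART_OF = {
    "renal cortex": ["kidney"],
    "nephron": ["kidney"],
    "renal glomerulus": ["nephron"],
}


def invert(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn child -> parents edges into parent -> children edges."""
    children: dict[str, list[str]] = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return children


# Walk "downward": from a term to its subtypes (is_a) and its parts (part_of).
DOWNWARD = [invert(IS_A), invert(PART_OF)]


def expand(term: str) -> set[str]:
    """Return the term plus every term reachable through is_a or part_of, transitively."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for graph in DOWNWARD:
            for child in graph.get(current, []):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
    return seen


print(sorted(expand("kidney")))  # ['kidney', 'nephron', 'renal cortex', 'renal glomerulus']
```

The sketch reaches "renal glomerulus" only through "nephron", which shows why the walk must be transitive. The walk follows both edge types at every step, so subtypes of parts are also included. That is one plausible design choice, not necessarily the library's.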

Quick Example

from cxg_query_enhancer import enhance

# Your normal query—now enhanced
obs_value_filter = enhance(
    "cell_type in ['T cell'] and tissue in ['lung']",
    organism="homo_sapiens"
)
# Expands filter to include 76 T-cell type terms and 15 lung part terms used in annotation in the CxG corpus
# If used in a cellxgene_census query, returns ~700,000 cells instead of ~71,000

The enhance() function expands "T cell" to include all its subtypes (CD4+, CD8+, regulatory T cells, etc.) and "lung" to include its anatomical parts—then filters against terms actually present in CELLxGENE Census.

Complete Working Example

This example runs in under a minute and demonstrates the core value—subtypes you'd otherwise miss:

import cellxgene_census
from cxg_query_enhancer import enhance

with cellxgene_census.open_soma(census_version="latest") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        var_value_filter="feature_id in ['ENSG00000161798', 'ENSG00000188229']",
        obs_value_filter=enhance(
            "sex == 'female' and cell_type in ['medium spiny neuron']",
            organism="Homo sapiens",
        ),
        obs_column_names=[
            "assay",
            "cell_type",
            "tissue",
            "tissue_general",
            "suspension_type",
            "disease",
        ],
    )

print(adata.obs)

Output: ~5,400 cells across three cell types—the parent term plus both subtypes:

assay	cell_type	tissue	disease
10x 3' v3	indirect pathway medium spiny neuron	caudate nucleus	normal
10x 3' v3	direct pathway medium spiny neuron	caudate nucleus	normal
10x 3' v3	medium spiny neuron	cerebral cortex	normal

Without enhance(), a query for just "medium spiny neuron" misses the pathway-specific subtypes entirely.

What It Expands

Category	Example	Expands To Include
Cell types	`macrophage`	alveolar macrophage, Kupffer cell, microglial cell...
Tissues	`kidney`	renal cortex, nephron, kidney blood vessel...
Diseases	`diabetes mellitus`	type 1 diabetes, type 2 diabetes...
Dev stages	`adult`	25-year-old, 40-year-old...

Supported ontologies:

Cell Ontology (CL) for cell types
Uberon for anatomy
MONDO for diseases
HsapDv / MmusDv for developmental stages

Installation

pip install cxg-query-enhancer

Requires Python 3.10 or 3.11.

Usage

Basic: Wrap Your Existing Query

import cellxgene_census
from cxg_query_enhancer import enhance

with cellxgene_census.open_soma(census_version="latest") as census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        obs_value_filter=enhance(
            "sex == 'female' and cell_type in ['medium spiny neuron']",
            organism="Homo sapiens",
        ),
    )

Flexible Input

The library accepts terms as:

Labels: 'neuron', 'kidney'
Ontology IDs: 'CL:0000540', 'UBERON:0002113'
Synonyms
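Because the library accepts labels, ontology IDs and synonyms, every input form presumably has to be resolved to one canonical ontology ID before expansion. The sketch below illustrates that normalization step using small lookup tables. The tables are hypothetical; real data would come from the ontologies themselves. The function name `normalize_term` is illustrative and is not part of the library's API.

```python
import re

# Hypothetical lookup tables (lowercase keys); real data would come from CL / Uberon.
LABEL_TO_ID = {"neuron": "CL:0000540", "kidney": "UBERON:0002113"}
SYNONYM_TO_ID = {"nerve cell": "CL:0000540"}

# An ontology ID (CURIE) looks like PREFIX:digits, e.g. CL:0000540 or HsapDv:0000087.
CURIE = re.compile(r"[A-Za-z]+:\d+")


def normalize_term(term: str) -> str:
    """Map a label, ontology ID, or synonym to a canonical ontology ID."""
    term = term.strip()
    if CURIE.fullmatch(term):
        return term  # already an ontology ID
    key = term.lower()
    for table in (LABEL_TO_ID, SYNONYM_TO_ID):
        if key in table:
            return table[key]
    raise KeyError(f"unrecognized term: {term!r}")


print(normalize_term("nerve cell"))  # CL:0000540
```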

Control Census Filtering

By default, expanded terms are filtered against the latest CELLxGENE Census (only terms actually in the data are included).

# Use a specific Census version for reproducibility
enhance(query, organism="homo_sapiens", census_version="2024-12-01")

# Disable Census filtering (pure ontology expansion)
enhance(query, census_version=None)
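Conceptually, Census filtering is an order-preserving intersection. From the terms returned by ontology expansion, it keeps only those that actually occur in the chosen Census release. When `census_version` is `None`, the step is skipped. The sketch below assumes the set of present terms has already been fetched. The `filter_to_census` name and the `census_terms` argument are hypothetical. This README does not describe how the library retrieves the terms for a given version.

```python
from typing import Optional


def filter_to_census(
    expanded: list[str],
    census_terms: set[str],
    census_version: Optional[str] = "latest",
) -> list[str]:
    """Keep only expanded terms that occur in the Census, preserving their order.

    census_terms is assumed to hold the terms present in the release named by
    census_version; census_version=None mirrors pure ontology expansion (no filtering).
    """
    if census_version is None:
        return list(expanded)
    return [term for term in expanded if term in census_terms]


expanded = ["T cell", "regulatory T cell", "gamma-delta T cell"]
present = {"T cell", "regulatory T cell"}  # hypothetical contents of one release

print(filter_to_census(expanded, present))  # ['T cell', 'regulatory T cell']
print(filter_to_census(expanded, present, census_version=None))  # all three terms
```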

Multiple Categories

query = """
    cell_type in ['medium spiny neuron']
    and tissue in ['kidney']
    and disease in ['diabetes mellitus']
"""

enhanced = enhance(query, organism="homo_sapiens")

# Expands all three categories simultaneously
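When `categories` is left at its default, the library auto-detects from the query which categories to expand. One way to picture that step is a regular expression that finds every `field in [...]` clause and keeps only the expandable fields. This is a sketch under that assumption. The pattern and the `detect_clauses` name are illustrative, and a regex is not a full parser for the Census filter grammar.

```python
import ast
import re

# Categories listed in the function reference as expandable.
EXPANDABLE = {"cell_type", "tissue", "tissue_general", "disease", "development_stage"}

# Matches e.g.  cell_type in ['T cell', 'B cell']  (term names must not contain "]").
CLAUSE = re.compile(r"(\w+)\s+in\s+(\[[^\]]*\])")


def detect_clauses(query: str) -> dict[str, list[str]]:
    """Return {category: terms} for each expandable `field in [...]` clause."""
    found: dict[str, list[str]] = {}
    for field, list_literal in CLAUSE.findall(query):
        if field in EXPANDABLE:
            # ast.literal_eval parses the quoted list without executing any code.
            found.setdefault(field, []).extend(ast.literal_eval(list_literal))
    return found


query = """
    cell_type in ['medium spiny neuron']
    and tissue in ['kidney']
    and disease in ['diabetes mellitus']
"""
print(detect_clauses(query))
# {'cell_type': ['medium spiny neuron'], 'tissue': ['kidney'], 'disease': ['diabetes mellitus']}
```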

A Note on Organism and Development Stage

The organism parameter is critical when querying developmental stages for non-human data.

Why: Human and mouse use different stage ontologies (HsapDv vs MmusDv). A query for "adult" in human expands to "25-year-old human stage," "40-year-old human stage," etc. The same query in mouse expands to "8-week-old stage," "6-month-old stage," and so on.

The default: If you don't specify organism, the library assumes homo_sapiens and logs a warning when expanding developmental stages. The warning keeps the mismatch from going unnoticed. However, if you're querying mouse data, you'll still get the wrong stages unless you specify the organism:

# Critical for non-human developmental stage queries
enhance(
    "development_stage in ['adult'] and cell_type in ['neuron']",
    organism="mus_musculus"  # Without this, you get human stages
)

For cell types and tissues, the organism parameter is used for Census filtering (ensuring expanded terms exist in your target species), but the ontology expansion itself is species-agnostic.
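The organism-dependent behavior described above amounts to choosing a stage ontology per organism, with a logged fallback to human. The sketch below shows this as a simple dictionary dispatch. The names `STAGE_ONTOLOGY` and `stage_ontology_for` are hypothetical and are not part of the library's API. The sketch also assumes that the "Homo sapiens" spelling used by cellxgene_census should be accepted; this README does not confirm that the library normalizes that spelling.

```python
import logging
from typing import Optional

logger = logging.getLogger("stage_ontology_sketch")

# Stage ontology per organism, as described above.
STAGE_ONTOLOGY = {"homo_sapiens": "HsapDv", "mus_musculus": "MmusDv"}


def stage_ontology_for(organism: Optional[str]) -> str:
    """Choose the developmental-stage ontology, defaulting to human with a warning."""
    if organism is None:
        logger.warning("No organism given; assuming homo_sapiens for developmental stages")
        organism = "homo_sapiens"
    # Assumption: also accept the "Homo sapiens" spelling used by cellxgene_census.
    key = organism.strip().lower().replace(" ", "_")
    if key not in STAGE_ONTOLOGY:
        raise ValueError(f"unsupported organism: {organism!r}")
    return STAGE_ONTOLOGY[key]


print(stage_ontology_for("mus_musculus"))  # MmusDv
print(stage_ontology_for(None))  # HsapDv (the warning goes to stderr)
```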

Function Reference

`enhance(query_filter, categories=None, organism=None, census_version="latest")`

Parameter	Type	Description
`query_filter`	str	Your original query string
`categories`	list or None	Categories to expand. Default: auto-detect from query. Options: `"cell_type"`, `"tissue"`, `"tissue_general"`, `"disease"`, `"development_stage"`
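`organism`	str or None	`"homo_sapiens"` or `"mus_musculus"`. Required for Census filtering. Default: `None`, treated as `"homo_sapiens"` (with a logged warning when expanding developmental stages).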
`census_version`	str or None	Census version for filtering. Default: `"latest"`. Set to `None` to disable.

Returns: Enhanced query string with expanded terms.

How It Works

Parse: Identifies terms in your query that can be expanded
Expand: Queries Ubergraph for all subclasses and part-of relationships
Filter: Keeps only terms present in CELLxGENE Census (unless disabled)
Rewrite: Returns your query with expanded term lists
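The final Rewrite step can be sketched as a substitution. Each expandable `field in [...]` clause keeps its field name, but its term list is replaced by the expanded, filtered list. Everything else in the query, such as `sex == 'female'`, is left untouched. In this sketch, the `expansions` mapping is hypothetical precomputed output of the Expand and Filter steps, with subtypes taken from the medium spiny neuron example above. The `rewrite` name is illustrative, and the exact string format the library emits may differ.

```python
import ast
import re

# Matches e.g.  cell_type in ['T cell']  (term names must not contain "]").
CLAUSE = re.compile(r"(\w+)\s+in\s+(\[[^\]]*\])")


def rewrite(query: str, expansions: dict[str, dict[str, list[str]]]) -> str:
    """Replace each term list with its expansion; expansions[field][term] -> terms."""

    def substitute(match: re.Match) -> str:
        field, list_literal = match.group(1), match.group(2)
        if field not in expansions:
            return match.group(0)  # not an expandable category: keep as written
        terms: list[str] = []
        for term in ast.literal_eval(list_literal):
            for expanded in expansions[field].get(term, [term]):
                if expanded not in terms:  # de-duplicate, keep first-seen order
                    terms.append(expanded)
        return f"{field} in {terms!r}"

    return CLAUSE.sub(substitute, query)


expansions = {
    "cell_type": {
        "medium spiny neuron": [
            "medium spiny neuron",
            "direct pathway medium spiny neuron",
            "indirect pathway medium spiny neuron",
        ]
    }
}
print(rewrite("sex == 'female' and cell_type in ['medium spiny neuron']", expansions))
```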

Acknowledgments

Ubergraph for the ontology knowledge graph
CELLxGENE Census for single-cell reference data
Built by the Cellular Semantics team at the Wellcome Sanger Institute


Release history

0.2.3 (Feb 10, 2026)
0.2.1 (Feb 9, 2026), this version
0.2.0 (Jun 5, 2025)
0.1.0 (May 28, 2025)

Download files


Source Distribution

cxg_query_enhancer-0.2.1.tar.gz (14.1 kB)

Uploaded Feb 9, 2026 Source

Built Distribution


cxg_query_enhancer-0.2.1-py3-none-any.whl (13.0 kB)

Uploaded Feb 9, 2026 Python 3

File details

Details for the file cxg_query_enhancer-0.2.1.tar.gz.

File metadata

Download URL: cxg_query_enhancer-0.2.1.tar.gz
Upload date: Feb 9, 2026
Size: 14.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cxg_query_enhancer-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`34fb692de7f05aceefd4c8a33e03b552ca89a92e65063cca7e37012decf0c4be`
MD5	`4860bd900035f714c450c67df69542f3`
BLAKE2b-256	`bd52c5def522b6a6c9fdf92d35ade26776c607f117b6bf3d38e6643b6b7a3e35`


File details

Details for the file cxg_query_enhancer-0.2.1-py3-none-any.whl.

File metadata

Download URL: cxg_query_enhancer-0.2.1-py3-none-any.whl
Upload date: Feb 9, 2026
Size: 13.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cxg_query_enhancer-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba3155bc090cb5ddb4058f14a45d6462ec6abb02df54d6cd246caec63fef16d2`
MD5	`0e8c71fed1f247d35de1434a088307fa`
BLAKE2b-256	`896b2002a577c2073b6a5b8656ff7427a67b9be908976c3267d7c20ce2f351ca`


cxg-query-enhancer 0.2.1
