Map MapMyCells ABA taxonomy IDs to Cell Ontology (CL) terms

MapMyCells2CL

Annotate MapMyCells output with Cell Ontology (CL) terms.

MapMyCells assigns cells to Allen Brain Atlas (ABA) taxonomy nodes (e.g. CS20230722_SUBC_053). This library maps those IDs to Cell Ontology (CL) or Provisional Cell Ontology (PCL) terms and selects the most specific CL term by information-content (IC) ranking, producing annotations ready for use under the CELLxGENE schema.

Quick start

pip install mapmycells2cl

# Annotate a MapMyCells CSV
mmc2cl annotate results.csv
# → results_annotated.csv

# Annotate an h5ad file (CxG-compliant obs columns)
mmc2cl annotate-h5ad results.csv cells.h5ad
# → cells_annotated.h5ad

Installation

pip install mapmycells2cl
# or
uv add mapmycells2cl

Installing the package places two equivalent commands in your PATH: mmc2cl (short form) and mapmycells2cl.

From source (development)

Requires uv.

git clone https://github.com/Cellular-Semantics/MapMyCells2CL.git
cd MapMyCells2CL
uv sync

# Run via uv (no venv activation needed)
uv run mmc2cl annotate results.csv

# Or activate the venv once and use the command directly
source .venv/bin/activate
mmc2cl annotate results.csv

CLI reference

`annotate`

Annotate a MapMyCells CSV or JSON output file with CL terms.

mmc2cl annotate INPUT_FILE [OPTIONS]

Option	Description
`-o, --output PATH`	Output file path. Defaults to `<input>_annotated.<ext>`
`--mapping PATH`	Path to a custom mapping JSON (default: bundled `mapping.json`)

Examples:

# Annotate CSV
mmc2cl annotate results.csv

# Annotate JSON
mmc2cl annotate results.json

# Specify output path
mmc2cl annotate results.csv -o /data/annotated.csv
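
The default output name described in the options table can be reproduced with `pathlib`. This is a minimal sketch of the `<input>_annotated.<ext>` rule as documented, not the package's own code; the function name `default_output_path` is hypothetical.

```python
from pathlib import Path


def default_output_path(input_path: Path) -> Path:
    """Return <input>_annotated.<ext> in the same directory as the input."""
    return input_path.with_name(f"{input_path.stem}_annotated{input_path.suffix}")


print(default_output_path(Path("results.csv")))
print(default_output_path(Path("/data/run1/results.json")))
```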

CSV output columns are added after each {level}_label column, using the CAP/HCA double-dash convention:

Column	Content	When
`{level}--cell_type_ontology_term_id`	Most specific CL CURIE (IC-ranked)	Always
`{level}--cell_type`	Label for the above	Always
`{level}--cell_type_pcl_ontology_term_id`	PCL exact match CURIE	PCL exact only
`{level}--cell_type_pcl`	PCL exact label	PCL exact only
`{level}--cell_type_cl_broad_ontology_term_ids`	All CL broad CURIEs, `|`-joined	PCL exact only

Example (subclass level, PCL exact match):

subclass_label                              → CS20230722_SUBC_053
subclass--cell_type_ontology_term_id        → CL:4023017
subclass--cell_type                         → sst GABAergic cortical interneuron
subclass--cell_type_pcl_ontology_term_id    → PCL:0110113
subclass--cell_type_pcl                     → Sst Gaba sst GABAergic cortical interneuron (Mmus)
subclass--cell_type_cl_broad_ontology_term_ids → CL:4023017|CL:4023069
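
The column placement above can be illustrated with the standard `csv` module. This is a rough sketch under stated assumptions: `TOY_MAPPING` is a hypothetical stand-in for the bundled mapping, only the two "Always" columns are written, and the toy input omits the comment header that real MapMyCells CSVs may carry.

```python
import csv
import io

# Hypothetical stand-in for the bundled mapping: ABA ID -> (best CL CURIE, label).
TOY_MAPPING = {
    "CS20230722_SUBC_053": ("CL:4023017", "sst GABAergic cortical interneuron"),
}


def annotate_rows(reader: csv.DictReader) -> tuple[list[str], list[dict[str, str]]]:
    """Insert {level}--cell_type* columns directly after each {level}_label column."""
    fieldnames: list[str] = []
    levels: list[str] = []
    for name in reader.fieldnames or []:
        fieldnames.append(name)
        if name.endswith("_label"):
            level = name[: -len("_label")]
            levels.append(level)
            fieldnames += [f"{level}--cell_type_ontology_term_id", f"{level}--cell_type"]
    rows = []
    for row in reader:
        for level in levels:
            # Unknown IDs fall back to empty strings.
            cl_id, cl_label = TOY_MAPPING.get(row[f"{level}_label"], ("", ""))
            row[f"{level}--cell_type_ontology_term_id"] = cl_id
            row[f"{level}--cell_type"] = cl_label
        rows.append(row)
    return fieldnames, rows


source = io.StringIO(
    "cell_id,subclass_label,subclass_name\n"
    "c1,CS20230722_SUBC_053,053 Sst Gaba\n"
)
fieldnames, rows = annotate_rows(csv.DictReader(source))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```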

JSON output: cell_type_ontology_term_id and cell_type are added to each level's assignment dict. For PCL exact matches, cell_type_pcl_ontology_term_id, cell_type_pcl, and cell_type_cl_broad_ontology_term_ids are added as well.

`annotate-h5ad`

Annotate an AnnData h5ad file with CL terms from a MapMyCells CSV. Adds CL columns directly to adata.obs, including the unprefixed cell_type_ontology_term_id / cell_type pair required by the CELLxGENE schema.

mmc2cl annotate-h5ad MMC_CSV H5AD_IN [OPTIONS]

Option	Description
`-o, --output PATH`	Output h5ad path. Defaults to `<input>_annotated.h5ad`
`--cxg-level TEXT`	Taxonomy level used for unprefixed CxG columns (default: `cluster`)
`--mapping PATH`	Path to a custom mapping JSON

Examples:

# Annotate h5ad — output written to cells_annotated.h5ad
mmc2cl annotate-h5ad results.csv cells.h5ad

# Use supertype level for the CxG cell_type columns
mmc2cl annotate-h5ad results.csv cells.h5ad --cxg-level supertype

obs columns added:

Column	Content
`cell_type_ontology_term_id`	IC-best CL CURIE from `--cxg-level` (CxG required)
`cell_type`	Label for the above (CxG required)
`{level}--cell_type_ontology_term_id`	Per-level IC-best CL CURIE
`{level}--cell_type`	Per-level label
`{level}--cell_type_pcl_ontology_term_id`	PCL CURIE (PCL exact only)
`{level}--cell_type_pcl`	PCL label (PCL exact only)
`{level}--cell_type_cl_broad_ontology_term_ids`	`|`-joined broad CL CURIEs (PCL exact only)

Cells present in the h5ad file but absent from the MapMyCells CSV receive empty strings in these columns.
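
The alignment rule can be sketched with plain Python containers so it runs without AnnData installed. Everything here is illustrative: `build_obs_columns` is a hypothetical helper, `obs_names` stands in for `adata.obs_names`, and the annotation dict stands in for rows of the annotated CSV.

```python
def build_obs_columns(
    obs_names: list[str],
    annotations: dict[str, dict[str, str]],
    cxg_level: str = "cluster",
) -> dict[str, list[str]]:
    """Align per-cell annotations to obs order, using "" for unmatched cells."""
    columns = sorted({col for ann in annotations.values() for col in ann})
    obs = {
        col: [annotations.get(cell, {}).get(col, "") for cell in obs_names]
        for col in columns
    }
    # The unprefixed CxG pair is copied from the level chosen via --cxg-level.
    for suffix in ("cell_type_ontology_term_id", "cell_type"):
        obs[suffix] = list(obs.get(f"{cxg_level}--{suffix}", [""] * len(obs_names)))
    return obs


# Hypothetical inputs: cell_2 is in the h5ad but missing from the CSV.
obs_names = ["cell_1", "cell_2"]
annotations = {
    "cell_1": {
        "cluster--cell_type_ontology_term_id": "CL:4023017",
        "cluster--cell_type": "sst GABAergic cortical interneuron",
    },
}
obs = build_obs_columns(obs_names, annotations)
print(obs["cell_type_ontology_term_id"])
```

With a real AnnData object, each list could then be assigned to `adata.obs[column]`, since the values are already in obs order.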

`update-mappings`

Download the latest pcl.owl and regenerate the bundled mapping.json. Pass --cl-owl to include IC-ranked best-CL data (strongly recommended).

mmc2cl update-mappings [OPTIONS]

Option	Description
`--owl PATH`	Use a local `pcl.owl` instead of downloading
`--cl-owl PATH`	Path to base `cl.owl` for IC computation. Downloads if omitted
`--output PATH`	Output path (default: bundled `src/mapmycells2cl/data/mapping.json`)

Examples:

# Download latest pcl.owl and regenerate (no IC)
mmc2cl update-mappings

# With IC ranking (recommended) — requires cl.owl (~63 MB)
mmc2cl update-mappings --cl-owl cl.owl

# Use locally cached files
mmc2cl update-mappings --owl pcl.owl --cl-owl cl.owl

Note: cl.owl is large (~63 MB). The PURL http://purl.obolibrary.org/obo/cl.owl redirects to GitHub; download it manually if needed and pass the path with --cl-owl.

Python API

`CellTypeMapper`

from mapmycells2cl import CellTypeMapper

mapper = CellTypeMapper()              # bundled mapping
print(mapper.mapping_version)          # e.g. "2025-07-07"
print(mapper.has_ic)                   # True when mapping includes IC data

Single lookup

result = mapper.lookup("CS20230722_SUBC_313")

result.found            # True
result.exact_id         # "CL:4300353"
result.exact_label      # "Purkinje cell (Mmus)"
result.ontology         # "CL"
result.broad            # [] — already CL, no broad match needed
result.best_cl_id       # "CL:4300353" — IC-ranked most specific CL term
result.best_cl_label    # "Purkinje cell (Mmus)"
result.best_cl_ic       # IC score (higher = more specific)
result.mapping_version  # "2025-07-07"

result = mapper.lookup("CS20230722_SUBC_053")

result.exact_id     # "PCL:0110113"
result.ontology     # "PCL"
result.best_cl_id   # "CL:4023017"  — IC-ranked best CL broad match
result.broad        # [BroadMatch(id="CL:4023017", ...), BroadMatch(id="CL:4023069", ...)]

for b in result.broad:
    print(b.id, b.label, b.via)

result = mapper.lookup("CS20230722_UNKNOWN_999")
result.found        # False
result.best_cl_id   # ""

Batch lookup

results = mapper.lookup_many([
    "CS20230722_SUBC_313",
    "CS20230722_SUBC_053",
    "CS20230722_CLUS_0768",
])
# Returns List[MatchResult] in the same order
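
A batch result is easy to summarise by ontology. The sketch below relies only on the `found` and `ontology` attributes shown above and uses `SimpleNamespace` stand-ins so it runs without the package; `tally_by_ontology` is a hypothetical helper, not part of the library.

```python
from collections import Counter
from types import SimpleNamespace


def tally_by_ontology(results) -> Counter:
    """Count results per ontology, grouping unmapped IDs under 'unmapped'."""
    return Counter(r.ontology if r.found else "unmapped" for r in results)


# Stand-ins mimicking MatchResult attributes; with the package installed,
# pass the list returned by mapper.lookup_many(...) instead.
fake_results = [
    SimpleNamespace(found=True, ontology="CL"),
    SimpleNamespace(found=True, ontology="PCL"),
    SimpleNamespace(found=False, ontology=""),
]
tally = tally_by_ontology(fake_results)
print(tally)
```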

Annotator (programmatic use)

from pathlib import Path
from mapmycells2cl import CellTypeMapper
from mapmycells2cl.annotator import annotate_csv, annotate_json, annotate_h5ad

mapper = CellTypeMapper()

# CSV / JSON
annotate_csv(Path("results.csv"), Path("results_annotated.csv"), mapper)
annotate_json(Path("results.json"), Path("results_annotated.json"), mapper)

# h5ad — CxG-compliant obs columns
annotate_h5ad(
    Path("results.csv"),
    Path("cells.h5ad"),
    Path("cells_annotated.h5ad"),
    mapper,
    cxg_level="cluster",   # level used for unprefixed cell_type columns
)

How it works

Exact matches

Extracted from owl:equivalentClass axioms in pcl.owl:

CL/PCL_class ≡ CL_0000000 ∧ (RO_0015001 hasValue <ABA_individual>)

Every ABA taxonomy ID maps to either a CL term (direct Cell Ontology entry) or a PCL term (Provisional Cell Ontology — finer-grained types not yet promoted to CL).
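
To make the axiom pattern concrete, the sketch below pulls one exact match out of a tiny RDF/XML fragment using the standard library. It assumes the axiom is serialised as an `owl:intersectionOf` with an `owl:Restriction`, which may differ from the real pcl.owl layout; the individual's IRI under `example.org` is a placeholder, and `exact_matches` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBO = "http://purl.obolibrary.org/obo/"

# Toy fragment; the hasValue IRI is a placeholder, not the real ABA IRI.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="{OBO}PCL_0110113">
    <owl:equivalentClass>
      <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
          <rdf:Description rdf:about="{OBO}CL_0000000"/>
          <owl:Restriction>
            <owl:onProperty rdf:resource="{OBO}RO_0015001"/>
            <owl:hasValue rdf:resource="http://example.org/aba/CS20230722_SUBC_053"/>
          </owl:Restriction>
        </owl:intersectionOf>
      </owl:Class>
    </owl:equivalentClass>
  </owl:Class>
</rdf:RDF>"""


def exact_matches(rdf_xml: str) -> dict[str, str]:
    """Map ABA taxonomy ID -> CL/PCL CURIE from equivalentClass axioms."""
    root = ET.fromstring(rdf_xml)
    matches: dict[str, str] = {}
    for cls in root.findall(f"{{{OWL}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        if about is None:
            continue  # anonymous class, nothing to map
        for restriction in cls.iterfind(f"{{{OWL}}}equivalentClass//{{{OWL}}}Restriction"):
            prop = restriction.find(f"{{{OWL}}}onProperty")
            value = restriction.find(f"{{{OWL}}}hasValue")
            if prop is None or value is None:
                continue
            if prop.get(f"{{{RDF}}}resource") != f"{OBO}RO_0015001":
                continue
            # Last IRI segment is taken as the ABA taxonomy ID.
            aba_id = value.get(f"{{{RDF}}}resource", "").rsplit("/", 1)[-1]
            matches[aba_id] = about.removeprefix(OBO).replace("_", ":", 1)
    return matches


print(exact_matches(SNIPPET))
```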

Broad matches

For PCL exact matches, the library walks rdfs:subClassOf edges upward until CL terms are reached. Because the hierarchy is a DAG (not a tree), a single PCL term may yield multiple CL broad matches (polyhierarchy).
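
The upward walk can be sketched as a breadth-first traversal over a child-to-parents map. The edges below are a toy graph: `PCL:9999001` is a hypothetical intermediate term, and stopping each path at the first CL term reached is an assumption about the behaviour rather than a statement of the library's implementation.

```python
from collections import deque

# Toy rdfs:subClassOf edges (child -> parents). PCL:9999001 is hypothetical.
PARENTS = {
    "PCL:0110113": ["PCL:9999001", "CL:4023069"],
    "PCL:9999001": ["CL:4023017"],
    "CL:4023017": ["CL:0000000"],
    "CL:4023069": ["CL:0000000"],
}


def broad_cl_matches(term: str, parents: dict[str, list[str]]) -> list[str]:
    """Walk upward from term, stopping each path at the first CL term reached."""
    found: list[str] = []
    seen = {term}
    queue = deque(parents.get(term, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a DAG can reach the same node via several paths
        seen.add(node)
        if node.startswith("CL:"):
            found.append(node)  # do not climb past a CL term
        else:
            queue.extend(parents.get(node, []))
    return sorted(found)


print(broad_cl_matches("PCL:0110113", PARENTS))
```

Two distinct paths lead to two CL terms here, which is the polyhierarchy case that the IC ranking then resolves.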

IC-ranked best CL term

When multiple CL broad matches exist, the most specific is selected using structure-based Information Content computed over the base CL hierarchy (no PCL):

IC(c) = -log2(|distinct leaf descendants of c| / |total CL leaves|)

Higher IC = more specific. This is pre-computed at update-mappings time and stored in mapping.json, so there is no runtime CL dependency.
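
A worked example of the formula on a toy hierarchy may help. The term names are hypothetical, and the sketch assumes that a leaf counts as its own single leaf descendant; it computes `log2(total / n)`, which is the same quantity as `-log2(n / total)`.

```python
import math

# Toy hierarchy (parent -> children); "a2" has two parents, so it is a DAG.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["a2", "b1"],
}


def leaf_descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """Return the distinct leaves under term (a leaf yields itself)."""
    kids = children.get(term, [])
    if not kids:
        return {term}
    leaves: set[str] = set()
    for kid in kids:
        leaves |= leaf_descendants(kid, children)
    return leaves


def information_content(term: str, children: dict[str, list[str]], root: str = "root") -> float:
    """IC(c) = -log2(|leaves under c| / |all leaves|), written as log2(total / n)."""
    total = len(leaf_descendants(root, children))
    return math.log2(total / len(leaf_descendants(term, children)))


for term in ("root", "A", "a1"):
    print(term, round(information_content(term, CHILDREN), 3))

# Choosing the most specific candidate means taking the highest IC.
best = max(["root", "A"], key=lambda t: information_content(t, CHILDREN))
print(best)
```

With three leaves in total, the root scores 0, `A` (two leaves) scores about 0.585, and a single leaf scores about 1.585, so the more specific term always ranks higher.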

Coverage (CCN20230722 taxonomy)

Level	→ CL	→ PCL	Total
CLAS (class)	3	24	27
SUBC (subclass)	15	230	245
SUPT (supertype)	32	983	1,015
CLUS (cluster)	80	5,234	5,314
Total	130	6,471	6,601

Data sources

pcl.owl — Provisional Cell Ontology; primary mapping source
cl.owl — Base Cell Ontology (no imports); used for IC computation

Both large OWL files are excluded from the repo. The bundled mapping.json is versioned with the PCL release date and includes all pre-computed IC scores.

Development

uv sync --dev

uv run mypy src/                          # type check
uv run ruff check --fix src/ tests/       # lint
uv run ruff format src/ tests/            # format

uv run pytest -m unit --cov              # unit tests (fast, no external deps)
uv run pytest -m integration             # integration tests (requires test_resources/)

CI runs mypy, ruff, and unit tests on every PR via GitHub Actions.

Known gaps

Basal Ganglia ABA mappings are absent from CL — fix planned for a future CL release.
oaklib integration deferred (not yet needed for current use cases).

Project details

This version

0.1.0

Apr 13, 2026

Download files

Source Distribution

mapmycells2cl-0.1.0.tar.gz (169.4 kB)

Uploaded Apr 13, 2026 Source

Built Distribution

mapmycells2cl-0.1.0-py3-none-any.whl (183.3 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file mapmycells2cl-0.1.0.tar.gz.

File metadata

Download URL: mapmycells2cl-0.1.0.tar.gz
Upload date: Apr 13, 2026
Size: 169.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mapmycells2cl-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3a4fd8d753e856534ff48ad5423851e1cd38889fc3593683224703ce72bb4641`
MD5	`7351659b097406db631152911b6fb82f`
BLAKE2b-256	`263fecc91b44163838e75e49c58b6ce3833956446856051668d6b5ea8f589b3e`

File details

Details for the file mapmycells2cl-0.1.0-py3-none-any.whl.

File metadata

Download URL: mapmycells2cl-0.1.0-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 183.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mapmycells2cl-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d371ff84ab89fd941e9cf5bd489cb34c018098703ea83808cd7ad7e9bb34070a`
MD5	`8f5df044c3268af9921053c497cf6518`
BLAKE2b-256	`501c2ed1c5b3b7fe7d4eae98ecf01b89aeccc813b51a4e4798ae676f98c312b2`
