Xener

This is the public version, containing only the necessary code.

A cross-species single-cell cell type annotation tool that uses a knowledge graph.

Installation

# Install from a local copy of the source
pip install .
# or install the released package from PyPI
pip install xener

Quick Start

With a YAML config file

from xener import Xener

annor = Xener()
cluster2celltype, _, debug_params = annor.run_from_yaml('config.yaml')

config.yaml example:

cluster_key: leiden
model_species:
- Brassica_rapa
non_model_fasta: Arabidopsis_thaliana.fasta
non_model_h5ad: ERP132245.h5ad
organ: leaf
outdir: output/ERP132245

Programmatic API

from xener import Xener

annor = Xener()
cluster2celltype, _, debug_params = annor(
    non_model_h5ad='ERP132245.h5ad',
    cluster_key='leiden',
    outdir='output/ERP132245',
    non_model_fasta='Arabidopsis_thaliana.fasta',
    model_species=['Brassica_rapa'],
    organ='leaf',
)

Defaults for marker_weight_method, mode, decay_factor, multihomolo, top_num, etc. come from the default config and can be overridden as keyword arguments.

The third return value, debug_params, is a dict recording the parameters actually used in each key step. It is also saved as debug_params.yaml in the output directory so that a run can be reproduced.

Step-by-step

The __call__ API above is the simplest way to run the full pipeline. If you need fine-grained control, you can call each step individually:

from pathlib import Path

from xener import Xener
import scanpy as sc

annor = Xener()
adata = sc.read('ERP132245.h5ad')
cluster_key = 'leiden'
non_model_fasta = 'Arabidopsis_thaliana.fasta'
model_species = ['Brassica_rapa']
organ = 'leaf'
# A Path rather than a plain str, so that outdir / 'annotation' works below
outdir = Path('output/ERP132245')

marker_gene = annor.get_markers(adata, cluster_key)

marker_weight, debug_gw = annor.get_gene_weight(marker_gene)

gene_homolo_weight, debug_map = annor.mapping(marker_weight, non_model_fasta, model_species, outdir)

topk_markers, debug_topk = annor.get_topk_gene(gene_homolo_weight, top_num=30)
# Only the top 30 genes will be retained for the subsequent steps.

cluster2celltype, _, celltype_weight, debug_ann = annor.cell_annotation(
    topk_markers, outdir / 'annotation', organ)

# Collect and save debug_params for reproducibility
import yaml
debug_params = {}
debug_params['get_gene_weight'] = debug_gw
debug_params['mapping'] = debug_map
debug_params['get_topk_gene'] = debug_topk
debug_params['cell_annotation'] = debug_ann
with open(outdir / 'debug_params.yaml', 'w') as f:
    yaml.dump(debug_params, f, default_flow_style=False)

Each step function except get_markers returns its result followed by a debug dict of the parameters actually used internally (cell_annotation returns several results, with the debug dict last). Unlike __call__, step-by-step mode requires you to collect and save these dicts explicitly.

Output directory

outdir/
├── marker_gene.zip
├── marker_weight.zip
├── blastp_{species}.zip            # one per model species
├── gene_homolo_weight.zip
├── topk_markers.zip
├── celltype_weight.zip
├── debug_params.yaml               # actual parameters used in each step
├── config.yaml                     # from run_from_yaml only
└── annotation/
    ├── cluster_{id}_gene2celltype.xml  # The annotation path of this cluster
    └── ...
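
As a quick sanity check after a run, the layout above can be turned into a list of expected files. This is a minimal sketch using only the standard library: the helper names expected_outputs and missing_outputs are hypothetical and not part of Xener, and the file names are taken from the tree above. The per-cluster files under annotation/ are left out because their number depends on the data.

```python
from pathlib import Path


def expected_outputs(outdir, model_species, from_yaml=False):
    """List the files that the documented layout leads us to expect."""
    outdir = Path(outdir)
    names = [
        "marker_gene.zip",
        "marker_weight.zip",
        "gene_homolo_weight.zip",
        "topk_markers.zip",
        "celltype_weight.zip",
        "debug_params.yaml",
    ]
    # One blastp archive per model species.
    names += [f"blastp_{species}.zip" for species in model_species]
    if from_yaml:
        # config.yaml is only written by run_from_yaml.
        names.append("config.yaml")
    return [outdir / name for name in names]


def missing_outputs(outdir, model_species, from_yaml=False):
    """Return the expected files that do not exist on disk."""
    return [
        path
        for path in expected_outputs(outdir, model_species, from_yaml)
        if not path.is_file()
    ]
```

For the example run, missing_outputs('output/ERP132245', ['Brassica_rapa']) returns an empty list once every listed file has been written.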

debug_params.yaml

This file records the actual parameter values used in each key step of the pipeline, making results reproducible:

cell_annotation:
  decay_factor: 0.7
  mode: path
  organ: leaf
  threshold: null
get_gene_weight:
  marker_weight_method: prod
get_topk_gene:
  multihomolo: true
  top_num: 30
mapping:
  bitscore: 200
  evalue: 0.05
  homolo_weight_key: pident
  model_species:
  - Oryza_sativa
  pident: 60

Sub-cluster refinement

cluster_id = 0
candidate_celltype = ['type1', 'type2']
# Only values that appear in celltype_weight[celltype_weight['cluster'] == cluster_id]['celltype'].unique() are supported.
key_added = 'xener_refine'
moranI_threshold = 0.5
# moranI_threshold is used for gene screening; effective values lie in [-1, 1].
# The closer to 1, the stricter the screening. If an invalid value is given, the screening step is skipped.

geneCount, diffgeneCount, annotation = annor.refine_single_cluster(
    adata, topk_markers, cluster_key, cluster_id, candidate_celltype,
    key_added, organ, moranI_threshold)
# The results can be found in the returned annotation[key_added] DataFrame.
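
Because an invalid moranI_threshold skips the screening step rather than raising, it can be worth checking the inputs before calling refine_single_cluster. The sketch below shows one way to do that. Both helpers are hypothetical and not part of Xener, and "effective" follows our reading of the [-1, 1] rule in the comments above; Xener's own validity check may differ in detail.

```python
from numbers import Real


def moran_threshold_is_effective(threshold):
    """True if threshold is a real number in the documented range [-1, 1]."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, Real):
        return False
    # NaN fails both comparisons, so it is reported as not effective.
    return -1 <= threshold <= 1


def unsupported_candidates(candidate_celltype, allowed_celltypes):
    """Return the candidates that are not among the allowed cell types."""
    allowed = set(allowed_celltypes)
    return [c for c in candidate_celltype if c not in allowed]
```

Here allowed_celltypes would be the result of celltype_weight[celltype_weight['cluster'] == cluster_id]['celltype'].unique(), as in the comment above.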

Links

Homepage: https://xenor.dcs.cloud/

PyPI: https://pypi.org/project/xener

GitHub: https://github.com/liushuai6bgi/Xener
