AnnData-native deep learning baselines for single-cell data.

scDLKit


Train, evaluate, compare, and visualize baseline deep-learning models for single-cell data without writing PyTorch from scratch.

Choose an entrypoint based on your goal:

  • stable embeddings and baseline models: TaskRunner
  • experimental labeled annotation adaptation: adapt_annotation(...)
  • lower-level control and custom models: Trainer plus adapters

Quick Start

Start here if you want the shortest path from AnnData to a learned embedding and, for reconstruction-capable models, predicted or reconstructed gene-expression values:

  1. load an AnnData
  2. fit a baseline model
  3. get the learned embedding
  4. optionally get predicted or reconstructed gene-expression values
  5. continue in Scanpy

import scanpy as sc
from scdlkit import TaskRunner

adata = sc.datasets.pbmc3k_processed()

runner = TaskRunner(
    model="vae",
    task="representation",
    label_key="louvain",
    device="auto",
    epochs=20,
    batch_size=128,
    model_kwargs={"kl_weight": 1e-3},
)

runner.fit(adata)

# Cell embedding for downstream Scanpy analysis.
adata.obsm["X_scdlkit_vae"] = runner.encode(adata)

# Predicted / reconstructed gene expression for reconstruction-capable models.
predicted_expression = runner.reconstruct(adata)

Then keep the normal Scanpy path:

sc.pp.neighbors(adata, use_rep="X_scdlkit_vae")
sc.tl.umap(adata)
sc.pl.umap(adata, color="louvain")

Notes:

  • runner.encode(...) returns the latent embedding.
  • runner.reconstruct(...) returns reconstructed gene-expression values for reconstruction-capable models such as autoencoder, vae, denoising_autoencoder, and transformer_ae.
  • runner.predict(...) remains backward compatible, but reconstruct(...) is the clearer public path for reconstructed expression.
  • Classification models return class predictions instead of reconstructed expression.
  • Frozen scGPT in the experimental foundation path exposes embeddings only, not reconstructed expression.
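The shape contract behind encode(...) and reconstruct(...) can be illustrated without scDLKit at all; the toy linear encoder/decoder below is purely for intuition (none of these arrays or weights come from the package):

```python
import numpy as np

# Toy illustration (not scDLKit code): the shape contract that
# encode(...) and reconstruct(...) follow for reconstruction-capable models.
rng = np.random.default_rng(0)
n_cells, n_genes, latent_dim = 100, 50, 8

X = rng.normal(size=(n_cells, n_genes))         # expression matrix (cells x genes)
W_enc = rng.normal(size=(n_genes, latent_dim))  # stand-in encoder weights
W_dec = rng.normal(size=(latent_dim, n_genes))  # stand-in decoder weights

embedding = X @ W_enc               # analogue of runner.encode(adata)
reconstruction = embedding @ W_dec  # analogue of runner.reconstruct(adata)

print(embedding.shape)       # (100, 8)  -> cells x latent, goes into adata.obsm
print(reconstruction.shape)  # (100, 50) -> same shape as adata.X
```

Whatever the model, the embedding is cells-by-latent-dimensions and the reconstruction matches the input matrix shape, which is why the embedding slots directly into adata.obsm.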

What you get from this quickstart:

  • a learned embedding in adata.obsm
  • reconstructed gene-expression values when the model supports them
  • training metrics and saved reports
  • a direct continuation path into Scanpy

Fine-Tuning Quickstart

If your main goal is cell-type annotation on a labeled human AnnData, the experimental scGPT wrapper path is now also a first-class quickstart:

from scdlkit import adapt_annotation

runner = adapt_annotation(
    adata,
    label_key="cell_type",
    output_dir="artifacts/scgpt_annotation",
)

runner.annotate_adata(adata, obs_key="scgpt_label", embedding_key="X_scgpt_best")
runner.save("artifacts/scgpt_annotation/best_model")

This path is designed for researchers who want:

  • a low-code fine-tuning workflow
  • frozen versus tuned strategy comparison
  • predictions written back into adata.obs
  • latent embeddings written back into adata.obsm
  • a saved runner they can reload later

This fine-tuning path is still experimental and intentionally narrow:

  • human scRNA-seq only
  • official scGPT whole-human checkpoint only
  • annotation only
  • default quickstart comparison is frozen_probe plus head
  • LoRA remains opt-in through strategies=("frozen_probe", "head", "lora")
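As a rough mental model of the strategy comparison, the sweep fits each strategy and keeps the one with the best validation score; the sketch below is hypothetical (the score values and dictionary layout are invented, not the runner's real report format):

```python
# Hypothetical sketch (not the scDLKit API): how a strategy sweep like
# adapt_annotation's frozen_probe / head / lora comparison picks a winner.
results = {
    "frozen_probe": {"val_accuracy": 0.87},  # linear probe on frozen embeddings
    "head": {"val_accuracy": 0.91},          # tuned classification head
    "lora": {"val_accuracy": 0.93},          # low-rank adapters in the backbone
}

# Keep the strategy with the highest validation accuracy.
best_strategy = max(results, key=lambda name: results[name]["val_accuracy"])
print(best_strategy)  # lora
```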

Start Here

  • Documentation site: https://uddamvathanak.github.io/scDLKit/
  • Primary notebook tutorial: examples/train_vae_pbmc.ipynb
  • Install path for tutorials: python -m pip install "scdlkit[tutorials]"
  • Experimental foundation path: python -m pip install "scdlkit[foundation,tutorials]"
  • CPU and GPU use the same notebook path through device="auto"
  • Core learning path: quickstart -> downstream Scanpy -> comparison -> reconstruction sanity check
  • Researcher shortcut for labeled data: quickstart -> experimental scGPT cell-type annotation -> experimental dataset-specific annotation
  • Secondary notebooks: examples/compare_models_pbmc.ipynb, examples/classification_demo.ipynb
  • Downstream Scanpy notebook: examples/downstream_scanpy_after_scdlkit.ipynb
  • Reconstruction notebook: examples/reconstruction_sanity_pbmc.ipynb
  • Custom model notebook: examples/custom_model_extension.ipynb
  • Experimental foundation notebook: examples/scgpt_pbmc_embeddings.ipynb
  • Experimental annotation fine-tuning notebook: examples/scgpt_cell_type_annotation.ipynb
  • Experimental dataset-specific wrapper notebook: examples/scgpt_dataset_specific_annotation.ipynb
  • API routing page: docs/api/index.md
  • Synthetic smoke examples: examples/first_run_synthetic.ipynb, examples/first_run_synthetic.py

Why scDLKit

  • AnnData-native workflow for single-cell users.
  • Baseline-first model zoo: AE, VAE, DAE, Transformer AE, and MLP classification.
  • Built-in training, evaluation, comparison, and plotting.
  • Reproducible reports and notebooks for portfolio-ready demonstrations.
  • Built-in benchmark gates on small Scanpy datasets before tutorial defaults change.
  • Gene-expression-focused scope while the core toolkit stabilizes.
  • Experimental frozen scGPT embeddings for human PBMC workflows.
  • Experimental scGPT annotation fine-tuning with head-only and LoRA strategies.

Supported platforms

  • Linux: supported
  • macOS: supported
  • Windows: supported

Installation

Primary tutorial install path:

python -m pip install "scdlkit[tutorials]"

Windows note: if you install into a deeply nested virtual environment path, Jupyter dependencies can hit Windows path-length limits. Use a short environment path such as C:\venvs\scdlkit, or enable Windows Long Paths if needed.

Optional extras (plain python -m pip install scdlkit installs the core package without extras):

python -m pip install "scdlkit[scanpy]"
python -m pip install "scdlkit[notebook]"
python -m pip install "scdlkit[foundation]"
python -m pip install "scdlkit[dev,docs]"

For GPU users, install the matching PyTorch build first using the official PyTorch install selector, then install scdlkit[tutorials]. The same notebook examples run on CPU or GPU with device="auto".

Scanpy Quickstart

This is the primary tutorial example. The notebook runs a quickstart profile by default and exposes a full profile in its first config cell:

  • quickstart: CPU-friendly, docs-friendly, reproducible
  • full: longer run for stronger qualitative separation

For the PBMC quickstart, use a light VAE KL term so the latent UMAP preserves broad cell-type structure instead of collapsing into a uniform blob. A healthy result should show broad cell-type groups as visibly separated regions rather than a single mixed cloud.
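For context, the kl_weight passed in the quickstart plays the role of the weight β on the KL term of the standard VAE objective (this is the generic β-weighted VAE loss, not scDLKit-specific notation):

```latex
\mathcal{L}(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\bigl[-\log p_\theta(x \mid z)\bigr]}_{\text{reconstruction}}
  + \beta \, \underbrace{D_{\mathrm{KL}}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr)}_{\text{regularization}}
```

With a small β such as 1e-3, the reconstruction term dominates the objective, which is why the latent space keeps broad cell-type structure instead of collapsing toward the prior.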

Notebook-First Examples

Most researchers should start with the Scanpy PBMC quickstart:

python -m pip install "scdlkit[tutorials]"
jupyter notebook examples/train_vae_pbmc.ipynb

This notebook:

  • loads PBMC data through Scanpy
  • trains a VAE baseline with scDLKit
  • writes the latent representation into adata.obsm
  • continues with Scanpy neighbors and UMAP
  • points to the downstream Scanpy and reconstruction tutorials for the next interpretation steps
  • explains the quickstart versus full tutorial profiles
  • works on CPU or GPU through device="auto"

Additional Scanpy-first notebooks:

  • examples/downstream_scanpy_after_scdlkit.ipynb: take the scDLKit embedding through Leiden clustering, marker ranking, dotplots, and coarse annotation
  • examples/compare_models_pbmc.ipynb: compare PCA, autoencoder, vae, and transformer_ae
  • examples/reconstruction_sanity_pbmc.ipynb: inspect reconstructed gene-expression outputs with a dedicated reconstruction baseline
  • examples/classification_demo.ipynb: run the mlp_classifier baseline and inspect a confusion matrix
  • examples/custom_model_extension.ipynb: wrap a raw PyTorch autoencoder and train it through Trainer
  • examples/scgpt_pbmc_embeddings.ipynb: run the experimental frozen whole-human scGPT embedding workflow and return to Scanpy through adata.obsm
  • examples/scgpt_cell_type_annotation.ipynb: compare PCA + logistic regression, frozen scGPT, head-only tuning, and LoRA tuning for labeled PBMC annotation
  • examples/scgpt_dataset_specific_annotation.ipynb: use the new wrapper-first adapt_annotation(...) flow on a second labeled PBMC dataset and save the best fitted runner
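The core check in the reconstruction sanity notebook can be sketched independently of scDLKit: compare observed and reconstructed expression gene by gene. The helper below is illustrative (per_gene_correlation is not a package function):

```python
import numpy as np

def per_gene_correlation(X, X_hat):
    """Pearson correlation between each gene's observed and reconstructed values."""
    Xc = X - X.mean(axis=0)
    Hc = X_hat - X_hat.mean(axis=0)
    num = (Xc * Hc).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Hc ** 2).sum(axis=0))
    return num / np.where(den == 0, 1.0, den)  # guard against zero-variance genes

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                 # toy "observed" expression
X_hat = X + 0.1 * rng.normal(size=X.shape)     # toy near-perfect "reconstruction"

corr = per_gene_correlation(X, X_hat)
print(corr.shape)  # (30,) -- one correlation per gene
```

A healthy reconstruction shows most per-gene correlations close to 1; genes with low correlation are the ones the model fails to capture.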

The synthetic notebook and script are still available, but they are now the smoke-test path rather than the primary researcher onboarding flow:

python -m pip install "scdlkit[notebook]"
jupyter notebook examples/first_run_synthetic.ipynb

python examples/first_run_synthetic.py

These write small reproducible artifacts to artifacts/first_run_notebook/ and artifacts/first_run/.

Optional contributor Conda environment

Conda is kept for contributors and demos. It is not the primary public install path.

From the repo root:

conda env create -f environment.yml
conda activate scdlkit

Core APIs

High-level:

from scdlkit import TaskRunner

Lower-level:

from scdlkit import Trainer, create_model, prepare_data

Custom-model adapters:

from scdlkit.adapters import wrap_classification_module, wrap_reconstruction_module

Custom wrapped models are supported through Trainer first. TaskRunner remains the built-in high-level path for bundled scDLKit models.
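The adapter idea itself is simple enough to sketch in plain Python; the class names below are hypothetical and only illustrate the pattern of giving an arbitrary model the uniform interface a trainer expects (the real wrap_* helpers target PyTorch modules):

```python
# Illustrative adapter pattern (names are hypothetical, not scDLKit's API):
# wrap any object exposing forward(x) so a generic trainer can call one
# uniform method without knowing the model's internals.
class ReconstructionAdapter:
    def __init__(self, module):
        self.module = module

    def training_step(self, batch):
        output = self.module.forward(batch)
        # Mean squared error between input and reconstruction.
        return sum((o - b) ** 2 for o, b in zip(output, batch)) / len(batch)

class IdentityModel:
    """Trivial stand-in 'model' that reconstructs its input exactly."""
    def forward(self, x):
        return list(x)

adapter = ReconstructionAdapter(IdentityModel())
loss = adapter.training_step([1.0, 2.0, 3.0])
print(loss)  # 0.0 -- perfect reconstruction
```

The trainer only ever sees the adapter's interface, so any model that can be wrapped this way plugs into the same training loop.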

Experimental foundation helpers:

from scdlkit.foundation import load_scgpt_model, prepare_scgpt_data

Experimental scGPT annotation tuning:

from scdlkit.foundation import (
    load_scgpt_annotation_model,
    prepare_scgpt_data,
    split_scgpt_data,
)

Experimental wrapper-first adaptation:

from scdlkit import adapt_annotation

runner = adapt_annotation(
    adata,
    label_key="cell_type",
    output_dir="artifacts/scgpt_annotation",
)
runner.annotate_adata(adata)
runner.save("artifacts/scgpt_annotation/best_model")

Comparison:

from scdlkit import compare_models

benchmark = compare_models(
    adata,
    models=["autoencoder", "vae", "transformer_ae"],
    task="representation",
    shared_kwargs={"epochs": 10, "label_key": "cell_type"},
    output_dir="artifacts/compare",
)

Supported models

  • autoencoder
  • vae
  • denoising_autoencoder
  • transformer_ae
  • mlp_classifier

Supported tasks

  • representation
  • reconstruction
  • classification

Current scope

  • Gene-expression baselines for AnnData workflows
  • Scanpy-first tutorial and downstream embedding usage
  • Built-in deep-learning baselines plus classical comparison context in notebooks
  • Adapter-based custom PyTorch model support through Trainer
  • Experimental scGPT frozen embedding support for human PBMC workflows
  • Experimental scGPT annotation fine-tuning for labeled human PBMC workflows through Trainer
  • Experimental wrapper-first scGPT dataset adaptation for users who want a simpler compare-predict-save loop

Broader foundation-model support, full-backbone fine-tuning, spatial omics, and multimodal workflows remain future work until the gene-expression toolkit's quality gates are consistently stable.

Documentation

Project documentation is published as a Sphinx-based scientific docs site at https://uddamvathanak.github.io/scDLKit/.

GitHub Pages setup

The docs workflow expects GitHub Pages to be enabled once at the repository level.

  1. Open Settings -> Pages for this repo: https://github.com/uddamvathanak/scDLKit/settings/pages
  2. Under Build and deployment, set Source to GitHub Actions.
  3. Save the setting.
  4. Re-run the docs workflow.

Without that one-time setting, GitHub returns a 404 when actions/configure-pages or actions/deploy-pages tries to access the Pages site.
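For orientation, the deploy side of such a docs workflow typically has the shape sketched below; this is a generic GitHub Actions Pages deployment, not a copy of this repo's workflow files, and the artifact path is an assumption:

```yaml
# Sketch of a Pages deploy job; requires Settings -> Pages -> Source = GitHub Actions.
permissions:
  pages: write      # allow the Pages deployment
  id-token: write   # allow OIDC verification by deploy-pages

jobs:
  deploy-docs:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      - uses: actions/upload-pages-artifact@v3
        with:
          path: docs/_build/html  # assumed Sphinx output directory
      - uses: actions/deploy-pages@v4
```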

Optional automatic Pages enablement

If you want the workflow to bootstrap Pages automatically instead of doing the one-time manual setup:

  1. Create a repository secret named PAGES_ENABLEMENT_TOKEN.
  2. Use a Personal Access Token with repo scope or Pages write permission.
  3. Re-run the docs workflow.

Release flow

  • Stage to TestPyPI first with release-testpypi.yml.
  • Publish the final release from a v* tag with release.yml.
  • Use trusted publishing instead of long-lived PyPI API tokens.
  • See RELEASING.md for the full checklist.

Examples

  • examples/train_vae_pbmc.ipynb is the primary Scanpy-first notebook tutorial.
  • examples/compare_models_pbmc.ipynb compares autoencoder, vae, and transformer_ae on PBMC data.
  • examples/classification_demo.ipynb covers the mlp_classifier workflow and confusion-matrix reporting.
  • examples/first_run_synthetic.ipynb is the secondary smoke-test notebook with minimal setup.
  • examples/first_run_synthetic.py is the secondary smoke-test script.

Roadmap

Immediate roadmap targets:

  • keep the built-in TaskRunner story stable for bundled baselines
  • preserve adapter-first custom-model support through Trainer
  • keep the experimental scGPT frozen-embedding and annotation-tuning paths narrow and inspectable
  • expand experimental adaptation workflows cautiously without broadening the toolkit too early

Released so far:

v0.1

  • Expanded core workflow with training, evaluation, reporting, and plotting.
  • Staged TestPyPI and PyPI publishing.
  • Cross-platform smoke validation and reproducible notebooks.

Later:

  • broader foundation-model fine-tuning beyond annotation once the experimental scGPT path is stable
  • spatial baselines only after the gene-expression toolkit is stable

Citation

If you use scDLKit, cite the software entry in CITATION.cff.
