
Interpretable Autoencoder with Ordinary Differential Equations for single-cell omics analysis

Project description

iAODE: Interpretable Accessibility ODE VAE for scATAC-seq


iAODE (interpretable Accessibility ODE VAE) is a lightweight deep learning framework purpose-built for single‑cell ATAC‑seq (scATAC‑seq) data. It couples a variational autoencoder (VAE) with a Neural ODE and an interpretable bottleneck to jointly achieve:

  1. Robust modeling of sparse accessibility count profiles (NB / ZINB likelihoods)
  2. Continuous trajectory inference (Neural ODE pseudotime + velocity)
  3. Interpretable latent factors (biologically aligned bottleneck)
  4. Scalable preprocessing (TF‑IDF + highly variable peak selection) aligned with Signac / SnapATAC2 best practices

The design targets the unique characteristics of scATAC data: extreme sparsity, library size variability, heterogeneous peak accessibility kinetics, and dynamic regulatory trajectories.


Core Architecture

Raw Peak Counts  --(TF-IDF)-->  Normalized Matrix  --(HVP Selection)-->  Peak Subset
       |
       v
Encoder (MLP / Residual / Transformer) --> q(z|x)
       |                 \
       |                  +--> Bottleneck (i_dim) -> Interpretable factors
       v
   Neural ODE f(z,t) -> z_ode(t)  (pseudotime + dynamics)
       |                \
       |                 +--> Consistency Loss (q_z vs z_ode)
       v
   Decoder (NB / ZINB / MSE) -> Reconstruction x_hat

Key Components:

  • TF‑IDF Normalization: Stabilizes cell‑wise peak depth and emphasizes cell‑type‑specific accessibility
  • Highly Variable Peaks (HVP): Variance / VMR / deviance‑based selection
  • NB/ZINB Likelihoods: Models over‑dispersion & zero inflation
  • Neural ODE: Smooth accessibility progression with pseudotime & velocity
  • Interpretable Bottleneck: Linear compression preserving biologically decodable axes
  • Multi‑Objective Regularization: β‑VAE KL, β‑TC, DIP, InfoVAE MMD, ODE consistency
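
The Neural ODE component integrates a learned dynamics function f(z, t) to produce z_ode(t). For intuition, here is a minimal fixed-step integration sketch in plain NumPy (forward Euler with a toy dynamics function; the package uses a proper Neural ODE solver, and `euler_integrate` is a hypothetical helper, not part of the iaode API):

```python
import numpy as np

def euler_integrate(f, z0, t_grid):
    """Integrate dz/dt = f(z, t) with forward Euler (conceptual sketch;
    iAODE uses an adaptive Neural ODE solver, not this fixed-step scheme)."""
    z = np.asarray(z0, dtype=float)
    traj = [z.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        z = z + (t1 - t0) * f(z, t0)   # one Euler step
        traj.append(z.copy())
    return np.stack(traj)

# Toy example: linear decay dynamics f(z, t) = -z
traj = euler_integrate(lambda z, t: -z, [1.0, 2.0], np.linspace(0, 1, 11))
```

In the model, f(z, t) is a small neural network, so the integrated path z_ode(t) yields both a pseudotime coordinate and a velocity field f(z, t) at each cell.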

Loss Function:

Loss = recon + i_recon + ODE_consistency + β·KL + dip·DIP + tc·TC + info·MMD
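
The composite objective above is a weighted sum of precomputed terms; the sketch below is an illustrative stand-in (`total_loss` and the `terms` dict are hypothetical names, not the package's internal code):

```python
def total_loss(terms, beta=1.0, dip_w=0.0, tc_w=0.0, info_w=0.0):
    """Combine iAODE loss terms as in the formula above (illustrative).

    terms: dict with precomputed scalar loss values for
    'recon', 'i_recon', 'ode_consistency', 'kl', 'dip', 'tc', 'mmd'.
    """
    return (terms['recon'] + terms['i_recon'] + terms['ode_consistency']
            + beta * terms['kl'] + dip_w * terms['dip']
            + tc_w * terms['tc'] + info_w * terms['mmd'])
```

Setting a weight to zero (the default for dip/tc/info, matching the `iaode.agent` defaults below) disables that regularizer.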

Installation

From PyPI (Recommended)

pip install iaode

From Source

git clone https://github.com/PeterPonyu/iAODE.git
cd iAODE
pip install -e .

Requirements

  • Python ≥ 3.9
  • PyTorch ≥ 1.10.0
  • AnnData ≥ 0.8.0
  • Scanpy ≥ 1.8.0
  • scvi-tools ≥ 0.16.0

See requirements.txt for complete dependencies.


Quick Start

Basic scATAC-seq Workflow

import scanpy as sc
import iaode
from iaode.utils import tfidf_normalization, select_highly_variable_peaks
from iaode.annotation import load_10x_h5_data

# 1. Download and load scATAC-seq data (auto-cached)
h5_file, gtf_file = iaode.datasets.mouse_brain_5k_atacseq()
print(f"Data downloaded to: {h5_file.parent}")

# Load peak count matrix
adata = load_10x_h5_data(str(h5_file))
adata.layers['counts'] = adata.X.copy()
print(f"Loaded: {adata.n_obs} cells × {adata.n_vars} peaks")

# 2. TF-IDF normalization (Signac/SnapATAC2 best practice)
tfidf_normalization(
    adata,
    scale_factor=1e4,
    log_tf=False,
    log_idf=True,
    inplace=True
)

# 3. Highly variable peak (HVP) selection
select_highly_variable_peaks(
    adata,
    n_top_peaks=20000,
    method='signac',
    min_accessibility=0.01,
    max_accessibility=0.95,
    inplace=True
)

# Subset to HVPs
hvp_mask = adata.var['highly_variable']
adata = adata[:, hvp_mask].copy()
print(f"Retained {adata.n_vars} highly variable peaks")

# 4. Train iAODE model
model = iaode.agent(
    adata,
    layer='counts',
    latent_dim=32,         # Higher for scATAC complexity
    hidden_dim=512,        # Deeper for regulatory patterns
    encoder_type='mlp',
    loss_mode='zinb',      # Best for sparse scATAC data
    use_ode=False
)

model.fit(epochs=400, patience=25, val_every=10)

# 5. Extract latent representation
latent = model.get_latent()
adata.obsm['X_iaode'] = latent

# 6. Visualize with UMAP
sc.pp.neighbors(adata, use_rep='X_iaode')
sc.tl.umap(adata)

# Color by QC metrics
sc.pl.umap(adata, color=['n_genes_by_counts', 'total_counts'])

scATAC-seq with Trajectory Inference

import scanpy as sc
import iaode
from iaode.utils import tfidf_normalization, select_highly_variable_peaks
from iaode.annotation import load_10x_h5_data

# 1. Load and preprocess scATAC-seq data
h5_file, gtf_file = iaode.datasets.mouse_brain_5k_atacseq()
adata = load_10x_h5_data(str(h5_file))
adata.layers['counts'] = adata.X.copy()

# TF-IDF normalization
tfidf_normalization(adata, scale_factor=1e4, log_tf=False, log_idf=True, inplace=True)

# Select highly variable peaks
select_highly_variable_peaks(adata, n_top_peaks=20000, method='signac', inplace=True)
adata = adata[:, adata.var['highly_variable']].copy()

# 2. Train iAODE with Neural ODE for trajectory inference
model = iaode.agent(
    adata,
    layer='counts',
    use_ode=True,          # Enable Neural ODE dynamics
    i_dim=16,              # Interpretable bottleneck dimension
    latent_dim=32,
    hidden_dim=512,
    encoder_type='mlp',
    loss_mode='zinb'
)

model.fit(epochs=400, patience=25, val_every=10)

# 3. Extract trajectory-related representations
latent = model.get_latent()           # Latent space (z)
iembed = model.get_iembed()           # Interpretable regulatory factors
pseudotime = model.get_pseudotime()   # ODE time parameter
velocity = model.get_velocity()       # Latent velocity field

# Store in AnnData
adata.obsm['X_iaode'] = latent
adata.obsm['X_iembed'] = iembed
adata.obs['pseudotime'] = pseudotime
adata.obsm['velocity'] = velocity

# 4. Visualize trajectory
sc.pp.neighbors(adata, use_rep='X_iaode')
sc.tl.umap(adata)

# Color UMAP by pseudotime to reveal developmental trajectory
sc.pl.umap(adata, color='pseudotime', cmap='viridis')

# Visualize velocity field (requires UMAP coordinates)
E_grid, V_grid = model.get_vfres(
    adata,
    zs_key='X_iaode',
    E_key='X_umap',
    stream=True,
    density=1.5
)

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8, 6))
sc.pl.umap(adata, color='pseudotime', ax=ax, show=False)
ax.streamplot(E_grid[0], E_grid[1], V_grid[0], V_grid[1], 
              color='gray', density=1.5, linewidth=0.5, arrowsize=1)
plt.tight_layout()
plt.show()

scATAC-seq Peak Annotation Pipeline

Automatic data download: iAODE downloads and caches datasets on first use:

import iaode

# Downloads mouse brain 5k scATAC-seq + GENCODE vM25 GTF
# Files cached in ~/.iaode/data/ and reused on subsequent runs
h5_file, gtf_file = iaode.datasets.mouse_brain_5k_atacseq()

# Run complete annotation pipeline
adata = iaode.annotation_pipeline(
    h5_file=str(h5_file),
    gtf_file=str(gtf_file),
    promoter_upstream=2000,  # TSS upstream region
    promoter_downstream=500,  # TSS downstream region
    apply_tfidf=True,        # TF-IDF normalization
    select_hvp=True,         # Select highly variable peaks
    n_top_peaks=20000        # Number of HVPs to retain
)

# AnnData now contains:
# - adata.var['peak_type']: promoter/exonic/intronic/intergenic
# - adata.var['gene_name']: Associated gene names
# - adata.var['distance_to_tss']: Distance to nearest TSS
# - adata.var['highly_variable']: HVP selection mask
# - Preprocessed with TF-IDF and HVP selection

Available datasets:

# Mouse brain 5k scATAC-seq
h5, gtf = iaode.datasets.mouse_brain_5k_atacseq()

# Human PBMC 5k scATAC-seq
h5, gtf = iaode.datasets.human_pbmc_5k_atacseq()

# Cache management
iaode.datasets.list_cached_files()  # Show cached files
iaode.datasets.clear_cache()        # Clear all cached data

Examples

Comprehensive examples are available in the examples/ directory. See examples/README.md for detailed documentation.

Example Scripts

Script                    Purpose                                           Best For
scATAC_quickstart.py      Complete scATAC pipeline (TF‑IDF → HVP → iAODE)   Real scATAC-seq analysis
atacseq_annotation.py     Peak-to-gene annotation + QC plots                Understanding peak annotation
trajectory_ode.py         Neural ODE pseudotime & velocity                  Trajectory inference
evaluation_metrics.py     DR quality + clustering metrics                   Model evaluation
model_evaluation.py       Benchmark vs scVI models                          Comparative analysis
basic_usage.py            Simple scRNA-seq workflow                         Getting started
trajectory_inference.py   Trajectory with vector field                      Velocity visualization

Running Examples

cd examples

# Complete scATAC-seq workflow with auto-download
python atacseq_annotation.py

# Or use your own data
python scATAC_quickstart.py \
    --h5 filtered_peak_bc_matrix.h5 \
    --gtf gencode.vM25.annotation.gtf.gz \
    --use-ode --latent-dim 32 --i-dim 16

# Trajectory inference with synthetic data
python trajectory_ode.py

# Comprehensive benchmarking
python model_evaluation.py

All examples save outputs to examples/outputs/ with visualizations and processed data. See examples/README.md for detailed usage instructions and output descriptions.


API Reference

iaode.agent - Main Model Interface

Initialization:

model = iaode.agent(
    adata,                    # AnnData object
    layer='counts',           # Data layer to use
    latent_dim=10,            # Latent space dimension
    hidden_dim=128,           # Hidden layer dimension
    i_dim=None,               # Interpretable bottleneck dim (required if use_ode=True)
    use_ode=False,            # Enable Neural ODE
    loss_mode='nb',           # Loss function: 'mse', 'nb', 'zinb'
    encoder_type='mlp',       # Encoder: 'mlp', 'residual_mlp', 'transformer', 'linear'
    lr=1e-4,                  # Learning rate
    batch_size=128,           # Batch size
    beta=1.0,                 # KL divergence weight
    recon=1.0,                # Reconstruction loss weight
    tc=0.0,                   # Total correlation weight
    dip=0.0,                  # DIP weight
    info=0.0                  # InfoVAE MMD weight
)

Training:

model.fit(
    epochs=100,               # Maximum epochs
    patience=20,              # Early stopping patience
    val_every=5,              # Validation frequency
    early_stop=True           # Enable early stopping
)

Representation Extraction:

# Basic representations
latent = model.get_latent()              # Latent space (n_cells, latent_dim)
iembed = model.get_iembed()              # Interpretable factors (n_cells, i_dim)

# Trajectory-specific (requires use_ode=True)
pseudotime = model.get_pseudotime()      # ODE time parameter (n_cells,)
velocity = model.get_velocity()          # Latent velocity (n_cells, latent_dim)

# Vector field for visualization (requires UMAP in adata.obsm['X_umap'])
E_grid, V_grid = model.get_vfres(
    adata,
    zs_key='X_iaode',        # Latent representation key
    E_key='X_umap',          # Embedding key for visualization
    stream=True,             # Return streamplot-compatible format
    density=1.5              # Grid density
)

Evaluation Metrics:

# Training metrics
metrics = model.get_resource_metrics()
# Returns: {'train_time': float, 'actual_epochs': int, 'peak_memory_gb': float}

iaode.annotation_pipeline - scATAC-seq Preprocessing

Complete annotation and preprocessing pipeline:

adata = iaode.annotation_pipeline(
    h5_file,                    # Path to 10X H5 file
    gtf_file,                   # Path to GTF annotation
    promoter_upstream=2000,     # TSS upstream extension (bp)
    promoter_downstream=500,    # TSS downstream extension (bp)
    apply_tfidf=True,           # Apply TF-IDF normalization
    select_hvp=True,            # Select highly variable peaks
    n_top_peaks=20000,          # Number of HVPs to retain
    hvp_method='signac',        # HVP method: 'signac', 'snapatac2', 'deviance'
    min_accessibility=0.01,     # Min peak accessibility fraction
    max_accessibility=0.95      # Max peak accessibility fraction
)

Returns AnnData with:

  • adata.var['peak_type']: Peak annotation (promoter/exonic/intronic/intergenic)
  • adata.var['gene_name']: Associated gene names
  • adata.var['distance_to_tss']: Distance to nearest TSS
  • adata.var['highly_variable']: HVP selection mask
  • Preprocessed and normalized counts
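
As a sketch of how `peak_type` could be assigned from the promoter window and gene overlap (`classify_peak` is a hypothetical helper; the actual pipeline computes interval overlaps against the GTF annotation):

```python
def classify_peak(distance_to_tss, in_exon, in_intron,
                  upstream=2000, downstream=500):
    """Illustrative peak_type assignment.

    Assumes distance_to_tss is signed (negative = upstream of the TSS),
    mirroring the promoter_upstream/promoter_downstream window above.
    """
    if -upstream <= distance_to_tss <= downstream:
        return 'promoter'
    if in_exon:
        return 'exonic'
    if in_intron:
        return 'intronic'
    return 'intergenic'
```

Promoter assignment takes precedence over exonic/intronic overlap, so a peak at a TSS inside a gene body is still labeled 'promoter'.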

Evaluation Functions

Dimensionality Reduction Quality

from iaode import evaluate_dimensionality_reduction

metrics = evaluate_dimensionality_reduction(
    X_high,                  # High-dimensional data (n_cells, n_features)
    X_low,                   # Low-dimensional embedding (n_cells, n_latent)
    k=10,                    # Number of neighbors
    verbose=True
)

# Returns:
# - distance_correlation: Global structure preservation (Spearman ρ)
# - Q_local: Local neighborhood quality
# - Q_global: Global structure quality
# - K_max: Local-global transition point

Latent Space Quality

from iaode import evaluate_single_cell_latent_space

metrics = evaluate_single_cell_latent_space(
    latent_space,            # Latent representation (n_cells, n_latent)
    data_type='trajectory',  # 'trajectory' or 'steady_state'
    verbose=True
)

# Returns:
# - manifold_dimensionality: Dimensional efficiency (0-1)
# - spectral_decay_rate: Eigenvalue concentration
# - participation_ratio: Dimensional balance
# - anisotropy_score: Directionality strength
# - trajectory_directionality: Dominant axis strength
# - noise_resilience: Signal-to-noise ratio
# - overall_quality: Aggregate score
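
For intuition, the participation ratio reported above has a standard eigenvalue definition, sketched here (the package's exact normalization may differ):

```python
import numpy as np

def participation_ratio(Z):
    """Participation ratio of a latent matrix Z (cells x dims):
    PR = (sum λ_i)^2 / sum(λ_i^2) over covariance eigenvalues.
    PR ≈ number of dimensions carrying appreciable variance."""
    lam = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()
```

An isotropic latent space of dimension d gives PR ≈ d; a single dominant axis gives PR ≈ 1.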

Model Benchmarking

from iaode import DataSplitter, train_scvi_models, evaluate_scvi_models

# Create consistent train/val/test splits
splitter = DataSplitter(
    n_samples=adata.n_obs,
    test_size=0.15,
    val_size=0.15,
    random_state=42
)

# Train scVI family models
scvi_results = train_scvi_models(
    adata,
    splitter,
    n_latent=10,
    n_epochs=400,
    batch_size=128
)

# Evaluate all models
scvi_metrics = evaluate_scvi_models(
    scvi_results,
    adata,
    splitter.test_idx
)

Preprocessing Utilities

from iaode.utils import tfidf_normalization, select_highly_variable_peaks

# TF-IDF normalization (Signac/SnapATAC2 style)
tfidf_normalization(
    adata,
    scale_factor=1e4,
    log_tf=False,
    log_idf=True,
    inplace=True
)

# Highly variable peak selection
select_highly_variable_peaks(
    adata,
    n_top_peaks=20000,
    method='signac',         # or 'snapatac2', 'deviance'
    min_accessibility=0.01,
    max_accessibility=0.95,
    inplace=True
)
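
Conceptually, the TF-IDF transform scales each cell to a common depth (TF) and up-weights peaks that are open in few cells (IDF). A dense NumPy sketch under those assumptions (`tfidf` is illustrative; real scATAC matrices are sparse, and the package's `tfidf_normalization` offers more options such as `log_tf`):

```python
import numpy as np

def tfidf(X, scale_factor=1e4, log_idf=True):
    """Minimal TF-IDF for a cells-by-peaks count matrix (dense sketch)."""
    X = np.asarray(X, dtype=float)
    # TF: depth-normalize each cell, then rescale
    tf = scale_factor * X / np.maximum(X.sum(axis=1, keepdims=True), 1)
    # IDF: rarer peaks (open in fewer cells) get larger weights
    df = (X > 0).sum(axis=0)
    idf = X.shape[0] / np.maximum(df, 1)
    if log_idf:
        idf = np.log(1 + idf)
    return tf * idf
```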

Advanced Usage

Custom Encoder Architecture

# Transformer encoder for large-scale data
model = iaode.agent(
    adata,
    encoder_type='transformer',
    encoder_num_layers=4,
    encoder_n_heads=8,
    encoder_d_model=256,
    hidden_dim=512,
    latent_dim=32
)

Multi-Objective Regularization

# Fine-tune regularization weights
model = iaode.agent(
    adata,
    recon=1.0,      # Reconstruction loss
    beta=1.0,       # KL divergence
    tc=0.5,         # Total correlation (disentanglement)
    dip=0.1,        # DIP (dimension-wise independence)
    info=0.05       # InfoVAE MMD (distribution matching)
)

Custom Training Loop

# Manual training with simple early stopping on validation loss
best_val, wait, patience = float('inf'), 0, 20
for epoch in range(100):
    train_loss = model.train_epoch()

    if epoch % 5 == 0:
        val_loss, val_score = model.validate()
        print(f"Epoch {epoch}: Val Loss={val_loss:.4f}")
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 5  # epochs since last improvement
            if wait >= patience:
                model.load_best_model()
                break

Citation

If you use iAODE in your research, please cite:

@software{iaode2025,
    author = {Zeyu Fu},
    title = {iAODE: Interpretable Accessibility ODE VAE for Single-Cell Chromatin Dynamics},
    year = {2025},
    publisher = {GitHub},
    url = {https://github.com/PeterPonyu/iAODE}
}

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

For major changes, please open an issue first to discuss proposed changes.


License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

Built upon ideas from:

  • scVI-tools: scVI, PEAKVI, POISSONVI architectures
  • Signac and SnapATAC2: scATAC-seq best practices
  • Neural ODE literature: Continuous latent dynamics

The PyTorch, AnnData, and Scanpy ecosystems provide the foundation.


Contact

Primary contact: Zeyu Fu <fuzeyu99@126.com>

Changelog

v0.2.0 (2025-11-20)

Documentation & Usability

  • Comprehensive documentation overhaul (README, examples/README, QUICKSTART)
  • Separated scRNA-seq and scATAC-seq workflows with clear examples
  • Added automatic dataset download and caching system
  • Created examples/data/download_data.sh helper script
  • Verified all API usage examples against actual implementation

Code Quality

  • Cleaned example scripts: removed unused imports, improved error handling
  • Fixed all cross-references between examples and core modules
  • Enhanced inline documentation and error messages
  • Verified method signatures: get_velocity(), get_vfres(), get_pseudotime(), etc.

Package Improvements

  • Complete scATAC-seq preprocessing pipeline with TF-IDF and HVP selection
  • Improved peak annotation with distance-to-TSS computation
  • Enhanced evaluation metrics (DRE + LSE frameworks)
  • Better handling of sparse data and zero inflation

v0.1.2 (2025-11-19)

Metadata & Contact

  • Updated author name to "Zeyu Fu"
  • Set primary contact email to fuzeyu99@126.com
  • Fixed BibTeX citation formatting

v0.1.1 (2025-11-19)

Bug Fixes

  • Added missing requests dependency for scvi-tools compatibility
  • Dropped Python 3.8 support (requires Python ≥3.9)
  • Fixed CI test imports and matrix configuration

v0.1.0 (2025-11-19)

Initial Release

  • VAE with Neural ODE support for trajectory inference
  • Complete scATAC-seq peak annotation pipeline
  • Comprehensive evaluation metrics
  • Benchmark framework vs scVI models
  • Multiple encoder types (MLP, Residual, Transformer, Linear)
  • Multiple loss modes (MSE, NB, ZINB)

Project details


Download files

Download the file for your platform.

Source Distribution

iaode-0.2.1.tar.gz (57.4 kB)

Built Distribution


iaode-0.2.1-py3-none-any.whl (55.5 kB)

File details

Details for the file iaode-0.2.1.tar.gz.

File metadata

  • Download URL: iaode-0.2.1.tar.gz
  • Size: 57.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for iaode-0.2.1.tar.gz:

  • SHA256: dae64100b1b2740915d6ab48322f050702065520a730fdf8875fcbc8064b370c
  • MD5: 26d4a924c7842551d63f995180b1d298
  • BLAKE2b-256: df199d92ff86c018a692b953c8944a8dffba1bb8cbb445fcce9fbab68a4dc735

File details

Details for the file iaode-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: iaode-0.2.1-py3-none-any.whl
  • Size: 55.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for iaode-0.2.1-py3-none-any.whl:

  • SHA256: f1474c8082a5a3c0c898e2cba0811bc7678438dcffbcf93429bc55717a6ce1ca
  • MD5: deefa01fd95c11624cbfc78c3a7bd61a
  • BLAKE2b-256: 30f3592a1f91f992e7ec1c2603feefdc14cf5fbf8e75dce25ca5899ca6ec383f
