Cell type Identification using Transcription factor Analysis and Chromatin accessibility

cellitac

A pipeline for processing single-cell ATAC + RNA multiome data and classifying cell types with machine learning.

What It Does

Stage	Steps	Tools
Preprocessing	RNA QC → normalization → cell-type annotation	Seurat + SingleR (R via rpy2)
Preprocessing	ATAC QC → TF-IDF → LSI	Signac (R via rpy2)
Preprocessing	RNA + ATAC integration → ML-ready CSVs	Pure Python
ML	Imbalance analysis → SMOTE → feature selection	scikit-learn, imbalanced-learn
ML	RF + XGBoost + SVM training & evaluation	scikit-learn, xgboost
ML	19 plots + JSON report + XLSX	matplotlib, seaborn, networkx
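The ML stage balances classes with SMOTE before training. To show what that step does, here is a dependency-free sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not cellitac's code. The function name `smote_sketch` and the toy data are hypothetical, and the neighbour search is brute force.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote_sketch(minority: Sequence[Sequence[float]], n_new: int,
                 k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic samples from one minority class, SMOTE-style.

    Simplified illustration: brute-force Euclidean neighbour search.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy example: three rare "cells" described by two features each
rare_cells = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_cells = smote_sketch(rare_cells, n_new=4, k=2)
```

Since imbalanced-learn is listed as a dependency, real code would more likely call `imblearn.over_sampling.SMOTE(...).fit_resample(X, y)`, which also handles multiple classes and sampling ratios.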

Requirements

Before installing cellitac, you need:

Linux or macOS (Ubuntu 20.04+ recommended)
Python 3.9, 3.10, or 3.11 (not 3.12 or higher)
Conda or Miniconda
~5 GB free disk space
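Because only Python 3.9 to 3.11 are supported, a quick check inside the activated environment can catch a wrong interpreter before anything is installed. This is a minimal sketch. The helper name `python_supported` is ours and is not part of cellitac.

```python
import sys
from typing import Tuple

# Supported (major, minor) versions, taken from the Requirements list
SUPPORTED = ((3, 9), (3, 10), (3, 11))


def python_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) version is one cellitac lists as supported."""
    return tuple(version[:2]) in SUPPORTED


if not python_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is not supported; "
          "recreate the environment with python=3.11")
```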

Installation

Step 1 — Create a Conda environment

conda create -n cellitac python=3.11 -y
conda activate cellitac

Step 2 — Install R and core R libraries via conda

conda install -c conda-forge r-base=4.3.1 -y

conda install -c conda-forge -c bioconda \
  r-matrix r-hdf5r rpy2 \
  bioconductor-summarizedexperiment \
  bioconductor-singlecellexperiment \
  bioconductor-genomicranges \
  bioconductor-delayedarray \
  bioconductor-biocsingular \
  bioconductor-biocneighbors \
  bioconductor-genomicalignments \
  bioconductor-genomicfeatures \
  bioconductor-rtracklayer -y

Step 3 — Install remaining R packages (takes 10–30 min)

Rscript -e "install.packages('BiocManager', repos='https://cran.r-project.org')"

Rscript -e "BiocManager::install(c(
  'Seurat', 'Signac', 'SingleR', 'celldex',
  'EnsDb.Hsapiens.v75', 'biovizBase', 'data.table'
), ask=FALSE)"
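To confirm that Step 3 finished, you can ask R whether each package loads. The sketch below builds an `Rscript` call around base R's `requireNamespace()` and returns the names R reports as missing. It assumes the `Rscript` from the cellitac environment is on PATH. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence

# Packages installed in Step 3
R_PACKAGES = ["Seurat", "Signac", "SingleR", "celldex",
              "EnsDb.Hsapiens.v75", "biovizBase", "data.table"]


def r_check_command(packages: Sequence[str]) -> List[str]:
    """Build an Rscript command that prints each package that fails to load."""
    quoted = ", ".join(f"'{p}'" for p in packages)
    expr = (f"pkgs <- c({quoted}); "
            "for (p in pkgs) if (!requireNamespace(p, quietly = TRUE)) cat(p, '\\n')")
    return ["Rscript", "-e", expr]


def missing_r_packages(packages: Sequence[str] = R_PACKAGES) -> List[str]:
    """Run the check and return the packages R could not load."""
    if shutil.which("Rscript") is None:
        raise FileNotFoundError("Rscript not found; is the cellitac environment active?")
    result = subprocess.run(r_check_command(packages), capture_output=True,
                            text=True, check=True)
    return result.stdout.split()
```

An empty list from `missing_r_packages()` means every package loaded.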

Step 4 — Install cellitac

pip install cellitac

Step 5 — Verify installation

cellitac --help

If you see the help message, you are ready to go ✅
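For a broader check, you can run the following with the cellitac environment's Python. It confirms that the three console scripts, `Rscript`, and the rpy2 bridge are all reachable. This is a minimal sketch: the command and module names come from this README, and the helper names are hypothetical.

```python
import importlib.util
import shutil
from typing import List, Sequence

EXECUTABLES = ["cellitac", "cellitac-preprocess", "cellitac-model", "Rscript"]
MODULES = ["cellitac", "rpy2"]


def missing_executables(names: Sequence[str] = EXECUTABLES) -> List[str]:
    """Return the command names that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


def missing_modules(names: Sequence[str] = MODULES) -> List[str]:
    """Return the top-level Python modules that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


problems = missing_executables() + missing_modules()
print("All good" if not problems else f"Missing: {', '.join(problems)}")
```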

Quick Start

Download test data (PBMC 3k cells, ~560 MB)

mkdir -p ~/data && cd ~/data

wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/pbmc_granulocyte_sorted_3k_filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz
wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz.tbi
wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/pbmc_granulocyte_sorted_3k_atac_peaks.bed
wget https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/pbmc_granulocyte_sorted_3k_per_barcode_metrics.csv
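If `wget` is unavailable (macOS does not ship it by default), you can fetch the same five files with Python's standard library. In this minimal sketch, the base URL and file suffixes are copied from the commands above. Files that already exist are skipped, and each download is written to a temporary `.part` name first so that an interrupted transfer is not mistaken for a finished file. The function names are hypothetical.

```python
import os
import urllib.request
from typing import List

BASE = "https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_granulocyte_sorted_3k/"
PREFIX = "pbmc_granulocyte_sorted_3k_"
SUFFIXES = [
    "filtered_feature_bc_matrix.h5",
    "atac_fragments.tsv.gz",
    "atac_fragments.tsv.gz.tbi",
    "atac_peaks.bed",
    "per_barcode_metrics.csv",
]


def dataset_urls(base: str = BASE, prefix: str = PREFIX) -> List[str]:
    """Return the download URL for each file in the dataset."""
    return [base + prefix + s for s in SUFFIXES]


def download_all(dest: str = "~/data", base: str = BASE, prefix: str = PREFIX) -> None:
    """Download every file into dest, skipping files that already exist."""
    dest = os.path.expanduser(dest)
    os.makedirs(dest, exist_ok=True)
    for url in dataset_urls(base, prefix):
        target = os.path.join(dest, url.rsplit("/", 1)[-1])
        if os.path.exists(target):
            print(f"skip  {target}")
            continue
        print(f"fetch {url}")
        tmp = target + ".part"
        urllib.request.urlretrieve(url, tmp)  # write to a temporary name first
        os.replace(tmp, target)               # rename only once the download completed
```

The 10k dataset uses the same five suffixes. Pass its base URL (`.../cell-arc/1.0.0/pbmc_unsorted_10k/`) and prefix (`pbmc_unsorted_10k_`) to `download_all`.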

Run the pipeline

conda activate cellitac
cellitac --input ~/data --output ~/results

Full Dataset (PBMC 10k)

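Keep these files in their own directory so they do not mix with the 3k files. Then point --input at that directory (e.g. cellitac --input ~/data_10k --output ~/results_10k):

mkdir -p ~/data_10k && cd ~/data_10k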

wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_unsorted_10k/pbmc_unsorted_10k_filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_unsorted_10k/pbmc_unsorted_10k_per_barcode_metrics.csv
wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_unsorted_10k/pbmc_unsorted_10k_atac_fragments.tsv.gz
wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_unsorted_10k/pbmc_unsorted_10k_atac_fragments.tsv.gz.tbi
wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_unsorted_10k/pbmc_unsorted_10k_atac_peaks.bed

Note: cellitac auto-detects file names — your files do not need to follow the 10x naming convention.

Usage

Command Line

# Full pipeline (preprocessing + ML)
cellitac --input ~/data --output my_results

# Preprocessing only
cellitac-preprocess --input ~/data --output my_results

# ML only (if preprocessing already done)
cellitac-model --data my_results/python_ready_data --output my_results/ml

Python API

import os
from cellitac import run_full_pipeline, run_preprocessing, run_model

# Python does not expand "~" inside plain strings, so expand it explicitly
data_dir = os.path.expanduser("~/data")

# Full pipeline (preprocessing + ML)
run_full_pipeline(input_dir=data_dir, output_dir="my_results")

# Preprocessing only
run_preprocessing(input_dir=data_dir, output_dir_python="python_ready_data")

# ML only (uses the CSVs written by preprocessing)
run_model(data_dir="python_ready_data", output_dir="ml_results")

# Use the ML class directly
from cellitac.mainModel import scATACMLPipeline
pipeline = scATACMLPipeline(data_dir="python_ready_data", output_dir="ml_results")
pipeline.run_complete_pipeline()
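Preprocessing is the slow, R-backed stage. When you are iterating on the ML step, it can help to skip preprocessing if its CSVs already exist. This minimal sketch makes two assumptions: that `run_preprocessing` writes `*.csv` files into the directory passed as `output_dir_python`, and that both functions accept the keyword arguments shown above. The wrapper name `run_resumable` is hypothetical. The stage functions are passed in as arguments, so you can try the wrapper without R.

```python
import glob
import os
from typing import Callable


def run_resumable(input_dir: str, data_dir: str, ml_dir: str,
                  preprocess: Callable[..., object],
                  model: Callable[..., object]) -> bool:
    """Run preprocessing only if data_dir has no CSVs yet, then run the ML stage.

    Returns True if preprocessing was executed, False if it was skipped.
    """
    input_dir = os.path.expanduser(input_dir)
    ran_preprocessing = False
    if not glob.glob(os.path.join(data_dir, "*.csv")):
        preprocess(input_dir=input_dir, output_dir_python=data_dir)
        ran_preprocessing = True
    model(data_dir=data_dir, output_dir=ml_dir)
    return ran_preprocessing


# With cellitac installed:
# from cellitac import run_preprocessing, run_model
# run_resumable("~/data", "python_ready_data", "ml_results",
#               run_preprocessing, run_model)
```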

Input Files

File	Extension	Required
Feature-barcode matrix	`.h5`	✅ Yes
ATAC fragments	`.tsv.gz`	✅ Yes
Fragments index	`.tsv.gz.tbi`	✅ Yes
Peaks BED file	`.bed`	✅ Yes
Per-barcode QC metrics	`.csv`	⭕ Optional
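Because cellitac auto-detects its inputs, the extension column above matters more than the 10x file names. The sketch below shows one way this kind of detection can work; it illustrates the idea and is not cellitac's actual logic. Two details need care. First, `.tsv.gz` must be matched so that it does not also claim the `.tsv.gz.tbi` index. Second, a directory holding two datasets is ambiguous. The role names and `detect_inputs` are hypothetical.

```python
import os
from typing import Dict, Optional

# role -> (extension, required?), following the Input Files table
ROLES = {
    "matrix": (".h5", True),
    "fragments": (".tsv.gz", True),
    "fragments_index": (".tsv.gz.tbi", True),
    "peaks": (".bed", True),
    "metrics": (".csv", False),
}


def detect_inputs(directory: str) -> Dict[str, Optional[str]]:
    """Map each input role to exactly one file in directory, by extension."""
    directory = os.path.expanduser(directory)
    names = sorted(os.listdir(directory))
    found: Dict[str, Optional[str]] = {}
    for role, (ext, required) in ROLES.items():
        # "x.tsv.gz.tbi".endswith(".tsv.gz") is False, so the index is not double-counted
        matches = [n for n in names if n.endswith(ext)]
        if len(matches) > 1:
            raise ValueError(f"ambiguous {role}: {matches}")
        if not matches and required:
            raise FileNotFoundError(f"no *{ext} file for {role} in {directory}")
        found[role] = os.path.join(directory, matches[0]) if matches else None
    return found
```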

Output Files

File	Description
`ml_pipeline_report.json`	Full JSON report
`model_performance_summary.csv`	Accuracy / F1 / AUC per model
`detailed_model_results.xlsx`	Per-class metrics, CV results
`model_performance_comparison.png`	Bar chart comparison
`confusion_matrices.png`	Confusion matrices
`class_distribution_analysis.png`	Cell type distribution
`class_balancing_comparison.png`	Before/after SMOTE
`feature_importance.png`	RF + XGBoost top 20 features
`simple_feature_heatmap.png`	Feature importance heatmap
`overfitting_analysis.png`	CV train vs validation
`learning_curves.png`	Learning curves per model
`performance_radar.png`	Radar chart
`feature_distributions.png`	Violin plots
`class_separation_pca.png`	PCA scatter
`basic_tf_network.png`	Feature–cell-type network
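`model_performance_summary.csv` holds accuracy, F1 and AUC for each model. To pick the strongest model programmatically, a few lines of the `csv` module are enough. The column names used here (`Model`, `F1`) are assumptions, so check the header of your own file and adjust. The helper name `best_model` is hypothetical.

```python
import csv
from typing import Tuple


def best_model(path: str, metric: str = "F1", name_col: str = "Model") -> Tuple[str, float]:
    """Return (model name, score) for the row with the highest value of metric.

    Column names are assumptions; inspect the CSV header and adjust.
    """
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    if not rows:
        raise ValueError(f"{path} has no data rows")
    if metric not in rows[0] or name_col not in rows[0]:
        raise KeyError(f"expected columns {name_col!r} and {metric!r}, got {list(rows[0])}")
    top = max(rows, key=lambda r: float(r[metric]))
    return top[name_col], float(top[metric])
```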

Package Structure

cellitac/
├── src/cellitac/
│   ├── __init__.py          # Public API
│   ├── config.py            # Parameters (paths, QC thresholds, ML hyperparams)
│   ├── pipeline.py          # run_preprocessing, run_model, run_full_pipeline
│   ├── preprocessing.py     # R preprocessing via rpy2
│   ├── mainModel.py         # scATACMLPipeline class (19-step ML pipeline)
│   ├── cli.py               # cellitac / cellitac-preprocess / cellitac-model
│   └── rscripts/
│       ├── team1_rna.R      # Seurat + SingleR
│       └── team2_atac.R     # Signac
├── tests/
│   └── test_model.py
├── pyproject.toml
└── README.md
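`rscripts/team2_atac.R` runs the Signac ATAC steps (TF-IDF → LSI). For readers new to TF-IDF on peak counts, here is a pure-Python sketch of the log(TF × IDF) normalization used by Signac's `RunTFIDF` default (method 1, scale factor 10,000), as we understand it from the Signac documentation. It is illustrative only. It works on a small dense list-of-lists peak × cell matrix, whereas Signac operates on sparse matrices. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def tfidf_log(counts: Sequence[Sequence[float]], scale: float = 1e4) -> List[List[float]]:
    """log1p(TF * IDF * scale) for a peaks x cells count matrix.

    TF  = count / total counts in that cell (column sum)
    IDF = number of cells / total counts for that peak (row sum)
    """
    n_peaks, n_cells = len(counts), len(counts[0])
    cell_totals = [sum(counts[p][c] for p in range(n_peaks)) for c in range(n_cells)]
    peak_totals = [sum(row) for row in counts]
    out: List[List[float]] = []
    for p in range(n_peaks):
        idf = n_cells / peak_totals[p] if peak_totals[p] else 0.0
        row = []
        for c in range(n_cells):
            tf = counts[p][c] / cell_totals[c] if cell_totals[c] else 0.0
            row.append(math.log1p(tf * idf * scale))
        out.append(row)
    return out
```

Zero counts stay at zero after `log1p`, which is why the R implementation can keep the matrix sparse. LSI then runs a truncated SVD on this normalized matrix.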

Troubleshooting

Problem	Solution
`conda activate cellitac` not working	Run `conda init` then restart terminal
R packages fail to install	Install the conda packages (Step 2) before running BiocManager (Step 3)
`hdf5r` error	Run `conda install -c conda-forge hdf5 r-hdf5r -y`
`peak_region_fragments not found`	Expected for some datasets; the pipeline continues automatically
`slot` deprecated error	Make sure you have the latest cellitac version: `pip install --upgrade cellitac`

Tests

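Run from the root of the project source tree, where the tests/ directory lives. Quote the extra so that shells such as zsh do not treat the brackets as a glob:

pip install "cellitac[dev]"
pytest tests/ -v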

Contributors

1. Rana H. Abu-Zeid <ranahamed2111@gmail.com>
2. Syrus Semawule <semawulesyrus@gmail.com>
3. Emmanuel Aroma <emmatitusaroma@gmail.com>
4. Toheeb Jumah <jumahtoheeb@gmail.com>
5. Derek Reiman, Ph.D. <dreiman@ttic.edu>
6. Olaitan I. Awe, Ph.D. <laitanawe@gmail.com>

License

MIT

Release history

Version	Release date
1.0.6	Apr 3, 2026
1.0.5	Mar 27, 2026
1.0.4 (this version)	Mar 1, 2026
1.0.3	Mar 1, 2026
1.0.2	Feb 28, 2026
1.0.1	Feb 21, 2026
1.0.0	Feb 21, 2026

Download files

Source Distribution

cellitac-1.0.4.tar.gz (32.5 kB)

Uploaded Mar 1, 2026 Source

Built Distribution

cellitac-1.0.4-py3-none-any.whl (31.6 kB)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file cellitac-1.0.4.tar.gz.

File metadata

Download URL: cellitac-1.0.4.tar.gz
Upload date: Mar 1, 2026
Size: 32.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for cellitac-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`2abc0de03e3728ecf2eb62a86baf93d672d8b79f155bc835cd115ae4695b1bf2`
MD5	`8fb39b9a26af8604048a09bff4540c19`
BLAKE2b-256	`b26fe8ec892e8b99b9ed74fb1c62510f3a8f2e71fa27b3f0b444866c8f1742ff`

File details

Details for the file cellitac-1.0.4-py3-none-any.whl.

File metadata

Download URL: cellitac-1.0.4-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 31.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for cellitac-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c16a1da8bb29c559d3f76d72e491e8d4f94055957db212b29f01605bf1a1dd0b`
MD5	`f8bb0875bd98b89a697f67312cae4c77`
BLAKE2b-256	`10661dc33f982bae78efcc7f2e761f981a9e0b82508b52f2d0e28d26548f0d29`
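To check that a downloaded file matches the published digest, hash it locally and compare. This is a minimal sketch using `hashlib`. The expected value is the SHA256 listed above for `cellitac-1.0.4.tar.gz`, and the helper names are hypothetical. pip can also enforce this check itself in hash-checking mode (`--require-hashes` with a `--hash=sha256:...` entry in a requirements file).

```python
import hashlib

EXPECTED_SDIST_SHA256 = "2abc0de03e3728ecf2eb62a86baf93d672d8b79f155bc835cd115ae4695b1bf2"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


# Example: matches("cellitac-1.0.4.tar.gz", EXPECTED_SDIST_SHA256)
```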
