Cell type identification using Transcription factor Analysis and Chromatin accessibility
cellitac
Single-Cell ATAC + RNA Multiome Processing & ML Classification Pipeline
What It Does
| Stage | Steps | Tools |
|---|---|---|
| Preprocessing | RNA QC → normalization → cell-type annotation | Seurat + SingleR (R via rpy2) |
| Preprocessing | ATAC QC → TF-IDF → LSI | Signac (R via rpy2) |
| Preprocessing | RNA + ATAC integration → ML-ready CSVs | Pure Python |
| ML | Imbalance analysis → SMOTE → feature selection | scikit-learn, imbalanced-learn |
| ML | RF + XGBoost + SVM training & evaluation | scikit-learn, xgboost |
| ML | 19 plots + JSON report + XLSX | matplotlib, seaborn, networkx |
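To make the ATAC row above more concrete, here is a minimal pure-Python sketch of a log-scaled TF-IDF transform on a tiny peaks-by-cells count matrix. It follows the formulation commonly used for scATAC data (term frequency per cell, inverse document frequency per peak, scale factor 10,000). Whether the Signac script in cellitac uses exactly these settings is an assumption, and the function name `tfidf` and the toy matrix are illustrative only.

```python
import math

def tfidf(counts, scale_factor=1e4):
    """Log-scaled TF-IDF for a peaks x cells count matrix (list of rows)."""
    n_peaks = len(counts)
    n_cells = len(counts[0])
    # Total counts per cell (column sums) and per peak (row sums).
    cell_totals = [sum(counts[p][c] for p in range(n_peaks)) for c in range(n_cells)]
    peak_totals = [sum(row) for row in counts]
    result = []
    for p, row in enumerate(counts):
        # IDF: peaks with fewer total counts get a larger weight.
        idf = n_cells / peak_totals[p] if peak_totals[p] else 0.0
        out_row = []
        for c, value in enumerate(row):
            # TF: share of this cell's counts that fall in this peak.
            tf = value / cell_totals[c] if cell_totals[c] else 0.0
            out_row.append(math.log1p(tf * idf * scale_factor))
        result.append(out_row)
    return result

# Toy data: 3 peaks (rows) x 3 cells (columns).
toy_counts = [
    [2, 0, 1],
    [0, 3, 0],
    [1, 1, 1],
]
weighted = tfidf(toy_counts)
```

Zero counts stay at zero after the transform, so the weighted matrix keeps the sparsity of the input. In the real pipeline this step is followed by LSI on the weighted matrix.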
Install R packages (run once, from a shell)
```bash
Rscript -e "
install.packages('BiocManager', repos = 'https://cloud.r-project.org')
BiocManager::install(c(
  'Seurat', 'Signac', 'SingleR', 'celldex',
  'SingleCellExperiment', 'GenomicRanges',
  'EnsDb.Hsapiens.v75', 'biovizBase', 'hdf5r'
))
"
```
Install Python package
```bash
pip install -e ".[dev]"
```
### Option B – PyPI
```bash
pip install cellitac
# R must be installed separately
```
### Option C – Docker (recommended for full reproducibility)
```bash
docker build -t cellitac:1.0.0 -f docker/Dockerfile .
docker run --rm \
  -v /your/data:/data \
  -v "$(pwd)/results:/results" \
  cellitac:1.0.0 \
  --input /data --output /results
```
Data Download
https://www.10xgenomics.com/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-10-k-1-standard-1-0-0
Required files (place in your `--input` directory):
- `pbmc_unsorted_10k_filtered_feature_bc_matrix.h5`
- `pbmc_unsorted_10k_per_barcode_metrics.csv`
- `pbmc_unsorted_10k_atac_fragments.tsv.gz`
- `pbmc_unsorted_10k_atac_fragments.tsv.gz.tbi`
- `pbmc_unsorted_10k_atac_peaks.bed`
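Before starting a long preprocessing run, it can help to confirm that all five files are in place. The sketch below is not part of cellitac; `missing_input_files` is a hypothetical helper that only checks for the file names listed above.

```python
from pathlib import Path

# File names taken from the list of required inputs above.
REQUIRED_FILES = [
    "pbmc_unsorted_10k_filtered_feature_bc_matrix.h5",
    "pbmc_unsorted_10k_per_barcode_metrics.csv",
    "pbmc_unsorted_10k_atac_fragments.tsv.gz",
    "pbmc_unsorted_10k_atac_fragments.tsv.gz.tbi",
    "pbmc_unsorted_10k_atac_peaks.bed",
]

def missing_input_files(input_dir):
    """Return the required file names that are absent from input_dir."""
    base = Path(input_dir).expanduser()
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_input_files("~/singlecell/ATAC")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```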
Usage
Command Line
```bash
# Full pipeline (preprocessing + ML)
cellitac --input ~/singlecell/ATAC --output my_results

# Preprocessing only (generates python_ready_data/)
cellitac-preprocess --input ~/singlecell/ATAC --output my_results

# ML only (if you already have python_ready_data/)
cellitac-model --data my_results/python_ready_data --output my_results/ml
```
Python API
```python
from cellitac import run_full_pipeline, run_preprocessing, run_model

# Full pipeline
run_full_pipeline(input_dir="~/singlecell/ATAC", output_dir="my_results")

# Preprocessing only
run_preprocessing(input_dir="~/singlecell/ATAC", output_dir_python="python_ready_data")

# ML only
run_model(data_dir="python_ready_data", output_dir="ml_results")

# Use the ML class directly for more control
from cellitac.mainModel import scATACMLPipeline

pipeline = scATACMLPipeline(data_dir="python_ready_data", output_dir="ml_results")
pipeline.run_complete_pipeline()
```
Environment Variables
```bash
export SCATAC_INPUT_DIR=~/singlecell/ATAC
export SCATAC_OUT_ML=ml_results
cellitac
```
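The following sketch illustrates one way such variables could be resolved in Python. It is not the package's actual logic: the precedence order (explicit argument, then environment variable, then default), the default values, and the helper name `resolve_dirs` are all assumptions made for illustration. Only the variable names `SCATAC_INPUT_DIR` and `SCATAC_OUT_ML` come from the example above.

```python
import os

def resolve_dirs(input_dir=None, output_dir=None, environ=None):
    """Pick directories from explicit arguments, then environment, then defaults."""
    env = os.environ if environ is None else environ
    # Hypothetical fallbacks; the package's real defaults live in config.py.
    resolved_input = input_dir or env.get("SCATAC_INPUT_DIR") or "."
    resolved_output = output_dir or env.get("SCATAC_OUT_ML") or "ml_results"
    # Expand a leading "~" so the paths work outside a shell as well.
    return os.path.expanduser(resolved_input), os.path.expanduser(resolved_output)
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.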
Output Files
ml_results/
| File | Description |
|---|---|
| `ml_pipeline_report.json` | Full JSON report |
| `model_performance_summary.csv` | Accuracy/F1/AUC per model |
| `detailed_model_results.xlsx` | Per-class metrics, CV results |
| `model_performance_comparison.png` | Bar chart comparison |
| `confusion_matrices.png` | Confusion matrices |
| `class_distribution_analysis.png` | Cell type distribution |
| `class_balancing_comparison.png` | Before/after SMOTE |
| `feature_importance.png` | RF + XGBoost top 20 features |
| `simple_feature_heatmap.png` | Feature importance heatmap |
| `overfitting_analysis.png` | CV train vs validation |
| `learning_curves.png` | Learning curves per model |
| `performance_radar.png` | Radar chart |
| `feature_distributions.png` | Violin plots |
| `class_separation_pca.png` | PCA scatter |
| `basic_tf_network.png` | Feature–cell-type network |
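The summary CSV can be inspected with the standard library alone. The sketch below assumes nothing about the file beyond its name; the column names are not documented here, so any column passed to `best_row` (for example a hypothetical `F1` column) must be checked against the actual header first. Both helper names are illustrative.

```python
import csv
from pathlib import Path

def load_performance_summary(output_dir):
    """Read model_performance_summary.csv into a list of row dictionaries."""
    path = Path(output_dir) / "model_performance_summary.csv"
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))

def best_row(rows, metric):
    """Return the row with the highest numeric value in the given column."""
    return max(rows, key=lambda row: float(row[metric]))
```

A typical use would be `rows = load_performance_summary("ml_results")`, followed by printing `rows[0].keys()` to see which metric columns are available.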
Package Structure
```
cellitac/
├── src/cellitac/
│   ├── __init__.py        # Public API
│   ├── _version.py
│   ├── config.py          # All parameters (paths, QC thresholds, ML hyperparams)
│   ├── pipeline.py        # run_preprocessing, run_model, run_full_pipeline
│   ├── preprocessing.py   # R preprocessing via rpy2
│   ├── mainModel.py       # scATACMLPipeline class (19-step ML pipeline)
│   ├── cli.py             # cellitac / cellitac-preprocess / cellitac-model
│   └── rscripts/
│       ├── team1_rna.R    # Exact Seurat + SingleR code
│       └── team2_atac.R   # Exact Signac code
├── tests/
│   └── test_model.py
├── pyproject.toml
└── README.md
```
Tests
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
License
MIT
File details
Details for the file cellitac-1.0.1.tar.gz.
File metadata
- Download URL: cellitac-1.0.1.tar.gz
- Upload date:
- Size: 30.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f42631dcea7620116223fca63a82fe33b08548574dc4d20d9cf6ab73798fa98 |
| MD5 | 649f663e3b31ea648c005a61eddfeb6d |
| BLAKE2b-256 | b6d83884746d236e183564ff26fd00dbb9f67e239e5ba3f9cee59f550c3d8caa |
File details
Details for the file cellitac-1.0.1-py3-none-any.whl.
File metadata
- Download URL: cellitac-1.0.1-py3-none-any.whl
- Upload date:
- Size: 30.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2845506775d2515af899d4a7e5987748dd8ee9770f3a5c3e438d0d69a490548a |
| MD5 | c09f090f6396e0aa2120ed11b0c0b106 |
| BLAKE2b-256 | ff73ba747326663bbb702971c6e896cf91a2637e770eb7e30068de0c08f2033a |
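A downloaded file can be checked against the published SHA256 digest with Python's `hashlib`. This is a minimal sketch; the helper names are illustrative, and the constant below is the source-distribution digest listed for `cellitac-1.0.1.tar.gz`.

```python
import hashlib

# SHA256 digest published for cellitac-1.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "5f42631dcea7620116223fca63a82fe33b08548574dc4d20d9cf6ab73798fa98"

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("cellitac-1.0.1.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an intact download of that file.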