
A package for transcriptional regulatory network analysis


genecircuitry

A Python package for transcriptional regulatory network analysis.

Installation

genecircuitry requires Python >=3.9, <3.11. Most dependencies are available on conda-forge and bioconda. Two optional analysis engines, CellOracle and hotspotsc, are available only via pip. With a plain conda setup they must be installed in a separate step after the conda environment is created; the Pixi environment and the Docker image include them automatically.
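Because the supported interpreter range is narrow, it can help to check it before building an environment. A minimal sketch; `python_is_supported` is a hypothetical helper, not part of genecircuitry:

```python
import sys

# Supported range stated in this README: Python >=3.9, <3.11
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 11)


def python_is_supported(version_info=None):
    """Return True if (major, minor) lies in the half-open range [3.9, 3.11)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


# Report on the interpreter running this script
status = "supported" if python_is_supported() else "unsupported"
print(f"Python {sys.version.split()[0]}: {status}")
```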


Option 1 — Pixi (recommended)

Pixi manages conda and pip dependencies together in a single reproducible environment. It is the easiest and cleanest way to get a fully working installation.

# Install pixi (one-time, see https://prefix.dev/docs/pixi/installation)
curl -fsSL https://pixi.sh/install.sh | bash

# Clone the repository
git clone https://github.com/samuelecancellieri/genecircuitry.git
cd genecircuitry

# Create the environment and install all dependencies (conda + pip) in one step
pixi install

# Run the pixi task named "run" (equivalent to genecircuitry --help)
pixi run run

# Run the pixi task named "genecircuitry" (runs the genecircuitry test pipeline)
pixi run genecircuitry

# Or drop into an interactive shell
pixi shell

Developer environment (adds pytest, black, flake8, mypy):

pixi install -e dev
pixi run -e dev test

Option 2 — Conda

Install genecircuitry and its conda-available dependencies from bioconda and conda-forge, then install the pip-only dependencies manually.

# 1. Create a fresh environment (Python 3.9 is recommended)
conda create -n genecircuitry python=3.9
conda activate genecircuitry

# 2. Install genecircuitry and all conda-available dependencies
conda install -c conda-forge -c bioconda genecircuitry

# 3. Install the pip-only optional analysis engines
#    (CellOracle for GRN inference, hotspotsc for gene modules)
pip install celloracle==0.18.0 hotspotsc==1.1.3

Skip step 3 if you only need preprocessing/QC and do not require GRN inference or gene module analysis.
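To confirm whether step 3 took effect, you can probe for the two engines without actually importing them. This sketch assumes the import names are `celloracle` (for CellOracle) and `hotspot` (for the hotspotsc distribution); `available_engines` is a hypothetical helper, not a genecircuitry function:

```python
import importlib.util

# Assumed import names of the pip-only engines, mapped to a readable label
OPTIONAL_ENGINES = {
    "celloracle": "CellOracle (GRN inference)",
    "hotspot": "hotspotsc (gene modules)",
}


def available_engines(module_names=OPTIONAL_ENGINES):
    """Map each top-level module name to True if it is importable (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


for name, found in available_engines().items():
    print(f"{OPTIONAL_ENGINES[name]}: {'installed' if found else 'missing'}")
```

If either engine reports "missing", the corresponding pipeline steps can be skipped with `--skip-celloracle` or `--skip-hotspot`.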


Option 3 — pip / venv

# Clone the repository
git clone https://github.com/samuelecancellieri/genecircuitry.git
cd genecircuitry

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package with all optional dependencies
pip install -e ".[grn,hotspot]"

# Or install core only (no CellOracle / hotspotsc)
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

Option 4 — Docker

A pre-built image is available that ships all dependencies (including CellOracle and hotspotsc) and works out of the box:

# Pull and run (bind-mount your data and output directories)
docker run --rm \
    -v /path/to/your/data:/data \
    -v /path/to/output:/output \
    zanathos/genecircuitry:latest \
    --input /data/your_data.h5ad --output /output

# Check available options
docker run --rm zanathos/genecircuitry:latest --help

Build the image locally from source:

git clone https://github.com/samuelecancellieri/genecircuitry.git
cd genecircuitry
docker build -t genecircuitry .
docker run --rm genecircuitry --help

Quick Start

Complete Analysis Pipeline

Run the full analysis pipeline from preprocessing through CellOracle and Hotspot:

# Run with example dataset (default)
python run_complete_analysis.py

# Or use the script directly
python examples/complete_pipeline.py

# Run with your own data
python run_complete_analysis.py --input your_data.h5ad --output results

# Skip specific analyses
python run_complete_analysis.py --skip-celloracle  # Skip GRN inference
python run_complete_analysis.py --skip-hotspot      # Skip module identification

# Custom parameters
python run_complete_analysis.py --seed 123 --n-jobs 16 --min-genes 300

# See all options
python run_complete_analysis.py --help

Modular Execution & Parallel Processing

The pipeline supports modular execution (running only selected steps and resuming from checkpoints) and parallel processing of stratified subsets:

# Run only specific steps
python examples/complete_pipeline.py \
    --input data.h5ad \
    --output results \
    --steps load preprocessing clustering

# Stratified analysis in parallel (multiple cell types/clusters)
python examples/complete_pipeline.py \
    --input data.h5ad \
    --output results \
    --cluster-key-stratification celltype \
    --parallel \
    --n-jobs 4

# Resume from checkpoints
python examples/complete_pipeline.py \
    --input data.h5ad \
    --output results \
    --steps celloracle hotspot  # Skips preprocessing if checkpoint exists

Available step names: load, preprocessing, stratification, clustering, celloracle, hotspot, grn_analysis, summary
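The README does not document how step selection and checkpoint resumption are implemented. The sketch below is one plausible model under stated assumptions: selected steps always run in the canonical order listed above, and a finished step leaves a hypothetical `<step>.done` marker file in a checkpoint directory. `plan_steps` and `completed_steps` are illustrative names, not the pipeline's API:

```python
from pathlib import Path

# Step names as listed in this README, in pipeline order
PIPELINE_STEPS = [
    "load", "preprocessing", "stratification", "clustering",
    "celloracle", "hotspot", "grn_analysis", "summary",
]


def completed_steps(checkpoint_dir):
    """Steps that have a (hypothetical) <checkpoint_dir>/<step>.done marker file."""
    directory = Path(checkpoint_dir)
    return {step for step in PIPELINE_STEPS if (directory / f"{step}.done").exists()}


def plan_steps(requested, completed=frozenset()):
    """Return requested steps in pipeline order, dropping those already completed."""
    unknown = set(requested) - set(PIPELINE_STEPS)
    if unknown:
        raise ValueError(f"Unknown step(s): {sorted(unknown)}")
    return [step for step in PIPELINE_STEPS if step in requested and step not in completed]


# Order on the command line does not matter; pipeline order is enforced
print(plan_steps(["hotspot", "celloracle"], completed_steps("results/checkpoints")))
```

Validating names up front means a typo such as `--steps hotpsot` fails immediately rather than silently running nothing.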

Parallel benefits:

  • Process multiple stratifications simultaneously
  • Speedup that grows with the number of workers, up to the number of stratifications and available CPU cores
  • Automatic checkpoint integration
  • See Controller Guide for details
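Stratified parallel processing amounts to running the same analysis independently on each subset (for example, each value of the `celltype` column) and gathering the results. A minimal sketch with `concurrent.futures`; `analyze_stratum` is a hypothetical placeholder for the per-subset work, and genecircuitry's controller is not necessarily built this way:

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_stratum(name, cells):
    """Hypothetical per-stratum work; here it just counts the cells in the group."""
    return name, len(cells)


def run_stratified(cells_by_group, n_jobs=4):
    """Run analyze_stratum on every group concurrently and return {group: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [
            pool.submit(analyze_stratum, name, cells)
            for name, cells in cells_by_group.items()
        ]
        for future in futures:
            name, result = future.result()  # re-raises any worker exception
            results[name] = result
    return results


print(run_stratified({"T cell": ["c1", "c2"], "B cell": ["c3"]}, n_jobs=2))
```

Threads keep the sketch portable; for CPU-bound Python work a `ProcessPoolExecutor` (or joblib) is the usual choice. Either way the speedup is capped by the number of strata: with three cell types, more than three workers cannot help.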

The complete pipeline includes:

  1. Data Loading - Load h5ad/h5 files or use example dataset
  2. Quality Control - Cell and gene filtering with QC metrics
  3. Preprocessing - Normalization, HVG selection, PCA, clustering
  4. CellOracle GRN Inference - Gene regulatory network prediction
  5. Hotspot Module Analysis - Gene modules built from genes whose expression is autocorrelated across a cell-neighborhood graph (spatial or transcriptional)
  6. Summary Report - Comprehensive analysis summary with output files
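Hotspot scores each gene by how strongly its expression is autocorrelated across a neighborhood graph of cells, then groups co-varying high-scoring genes into modules. Hotspot uses its own local-autocorrelation statistic; the sketch below swaps in the classic Moran's I on a binary neighbor graph, purely to illustrate what "autocorrelated" means:

```python
def morans_i(values, neighbors):
    """Moran's I of per-cell `values` over a graph {cell: iterable of neighbor cells}.

    Binary weights: w_ij = 1 if j is listed as a neighbor of i, else 0.
    Values near +1 mean neighboring cells have similar expression; values
    near -1/(n-1) (about 0 for many cells) mean no autocorrelation.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denominator = sum(d * d for d in dev)
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())
    if denominator == 0 or total_weight == 0:
        raise ValueError("Moran's I is undefined for constant values or an empty graph")
    numerator = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / total_weight) * (numerator / denominator)


# Toy graph: two groups of three mutually connected cells
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(morans_i([1, 1, 1, 0, 0, 0], graph))  # gene marks one group: strongly positive
print(morans_i([1, 0, 1, 0, 1, 0], graph))  # gene ignores the groups: negative
```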

Output structure:

output/
├── preprocessed_adata.h5ad        # Preprocessed dataset
├── analysis_summary.txt           # Analysis report
├── celloracle/
│   ├── oracle_object.celloracle.oracle
│   └── grn_links.celloracle.links
└── hotspot/
    ├── autocorrelation_results.csv
    ├── significant_genes.csv
    └── gene_modules.csv
figures/
├── qc/                            # QC plots
└── grn_analysis/                  # GRN visualizations
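After a run, a quick way to confirm that everything was produced is to compare the output directory against the tree above. A sketch; the relative paths come from the tree, while `missing_outputs` and its `skipped` argument (mirroring `--skip-celloracle` / `--skip-hotspot`) are hypothetical. The `figures/` directory is left out because its location relative to the output directory is not specified:

```python
from pathlib import Path

# Relative paths taken from the output tree above
EXPECTED_OUTPUTS = [
    "preprocessed_adata.h5ad",
    "analysis_summary.txt",
    "celloracle/oracle_object.celloracle.oracle",
    "celloracle/grn_links.celloracle.links",
    "hotspot/autocorrelation_results.csv",
    "hotspot/significant_genes.csv",
    "hotspot/gene_modules.csv",
]


def missing_outputs(output_dir, skipped=()):
    """List expected files absent from output_dir, ignoring subfolders of skipped analyses."""
    root = Path(output_dir)
    return [
        rel for rel in EXPECTED_OUTPUTS
        if rel.split("/")[0] not in skipped and not (root / rel).exists()
    ]


# e.g. after running with --skip-hotspot
print(missing_outputs("output", skipped=("hotspot",)))
```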

Usage

Configuration and Reproducibility

Set random seed and configure default parameters:

from genecircuitry import set_random_seed, config

# Set random seed for reproducibility
set_random_seed(42)

# View all configuration parameters
config.print_config()

# Update specific parameters
config.update_config(
    QC_MIN_GENES=300,
    QC_MIN_COUNTS=1000,
    PLOT_DPI=600
)
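The README does not show how `set_random_seed` or `config` are implemented. As an illustration of the centralized-configuration pattern only, here is a stdlib sketch: `set_seed_sketch` seeds Python's `random` module (a real helper would typically also seed NumPy and any other libraries in use), and `AnalysisConfig` is a hypothetical stand-in whose QC defaults are the ones quoted in the Quality Control section. The field names `QC_PCT_MT_MAX` and `QC_MIN_CELLS` are invented for the sketch:

```python
import os
import random
from dataclasses import dataclass, fields


def set_seed_sketch(seed):
    """Seed the stdlib RNG; a real helper would usually seed NumPy etc. as well."""
    random.seed(seed)
    # Only affects child interpreters launched afterwards, not the current one.
    os.environ["PYTHONHASHSEED"] = str(seed)


@dataclass
class AnalysisConfig:
    # Defaults mirror the QC defaults quoted in the Quality Control section
    QC_MIN_GENES: int = 200
    QC_MIN_COUNTS: int = 500
    QC_PCT_MT_MAX: float = 20.0  # hypothetical field name
    QC_MIN_CELLS: int = 10       # hypothetical field name

    def update_config(self, **overrides):
        """Apply overrides, refusing names that are not config fields."""
        valid = {f.name for f in fields(self)}
        unknown = set(overrides) - valid
        if unknown:
            raise KeyError(f"Unknown config parameter(s): {sorted(unknown)}")
        for name, value in overrides.items():
            setattr(self, name, value)


set_seed_sketch(42)
cfg = AnalysisConfig()
cfg.update_config(QC_MIN_GENES=300, QC_MIN_COUNTS=1000)
print(cfg)
```

Rejecting unknown keys means a typo such as `QC_MIN_GENE=300` fails loudly instead of silently leaving the default in place.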

Quality Control

Perform comprehensive quality control on single-cell RNA-seq data:

import scanpy as sc
from genecircuitry import config
from genecircuitry.preprocessing import perform_qc, plot_qc_violin, plot_qc_scatter

# Load your data
adata = sc.read_h5ad('your_data.h5ad')

# Perform QC - uses config defaults automatically
adata_qc = perform_qc(adata)
# With the default config, this is equivalent to: min_genes=200, min_counts=500, pct_counts_mt_max=20.0, min_cells=10

# Or override specific parameters
adata_qc = perform_qc(
    adata,
    min_genes=300,      # Override
    min_counts=1000     # Override
    # Other params use config defaults
)
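The four QC thresholds can be read as simple per-cell and per-gene rules. A stdlib-only sketch on a dense cells x genes list of lists, assuming mitochondrial genes are identified by an `MT-` name prefix (case-insensitive) and that gene filtering happens after cell filtering; `perform_qc` may differ in both respects and works on an AnnData object instead. `qc_filter` is a hypothetical name:

```python
def qc_filter(counts, gene_names, min_genes=200, min_counts=500,
              pct_counts_mt_max=20.0, min_cells=10, mt_prefix="MT-"):
    """Return (kept_cell_indices, kept_gene_indices) for a dense cells x genes matrix."""
    is_mt = [name.upper().startswith(mt_prefix.upper()) for name in gene_names]

    # Cell filter: enough detected genes, enough counts, not too mitochondrial
    kept_cells = []
    for i, row in enumerate(counts):
        total = sum(row)
        n_genes = sum(1 for x in row if x > 0)
        mt_total = sum(x for x, mt in zip(row, is_mt) if mt)
        pct_mt = 100.0 * mt_total / total if total > 0 else 0.0
        if n_genes >= min_genes and total >= min_counts and pct_mt <= pct_counts_mt_max:
            kept_cells.append(i)

    # Gene filter: detected in at least min_cells of the surviving cells
    kept_genes = [
        j for j in range(len(gene_names))
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


# Toy data with thresholds scaled down to fit four cells
genes = ["GAPDH", "ACTB", "MT-CO1", "CD3E"]
counts = [[5, 5, 0, 0], [1, 0, 9, 0], [3, 0, 0, 0], [2, 4, 1, 3]]
print(qc_filter(counts, genes, min_genes=2, min_counts=5, min_cells=2))
```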

Complete Workflow Example

import scanpy as sc
from genecircuitry import set_random_seed, config
from genecircuitry.preprocessing import perform_qc, perform_grn_pre_processing

# 1. Set up reproducibility
set_random_seed(42)

# 2. Optionally customize config
config.update_config(QC_MIN_GENES=300, QC_MIN_COUNTS=1000)

# 3. Load and process data
adata = sc.read_h5ad('your_data.h5ad')
adata = perform_qc(adata)  # Uses config defaults

# 4. Normalize
sc.pp.normalize_total(adata, target_sum=config.NORMALIZE_TARGET_SUM)
sc.pp.log1p(adata)

# 5. GRN preprocessing
adata = perform_grn_pre_processing(adata)  # Uses config defaults
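Step 4 rescales every cell to the same total count and then applies a natural-log transform, which is what `sc.pp.normalize_total` followed by `sc.pp.log1p` does with default settings. A stdlib sketch of that arithmetic; the `target_sum=1e4` default is an assumption, since the value of `config.NORMALIZE_TARGET_SUM` is not stated, and `normalize_total_log1p` is a hypothetical name:

```python
import math


def normalize_total_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to sum to target_sum, then apply natural log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with zero counts stay all-zero instead of dividing by zero
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(x * scale) for x in row])
    return normalized


print(normalize_total_log1p([[2, 5, 13], [0, 0, 0]]))
```

Because `expm1` inverts `log1p`, summing `math.expm1` over a normalized row recovers `target_sum`, which is a handy sanity check.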

CellOracle Integration

Perform gene regulatory network inference using CellOracle:

from genecircuitry.celloracle_processing import (
    create_oracle_object,
    run_PCA,
    run_KNN,
    run_links
)

# Note: Requires CellOracle installation
# pip install celloracle

# Create Oracle object
oracle = create_oracle_object(
    adata=adata,
    cluster_column_name='leiden',
    embedding_name='X_umap',
    raw_count_layer='raw_counts'
)

# Perform PCA and KNN imputation
oracle = run_PCA(oracle)
run_KNN(oracle, n_comps=50)

# Infer regulatory links
links = run_links(
    oracle,
    cluster_column_name='leiden',
    p_cutoff=0.001
)

# Save results
oracle.to_hdf5('oracle_object.celloracle.oracle')
links.to_hdf5('grn_links.celloracle.links')
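Each inferred TF-target link carries a regression coefficient and a p-value, and `p_cutoff` controls which links are retained. CellOracle's own `Links.filter_links` method applies a similar p-value plus coefficient-magnitude filter. The stdlib sketch below only illustrates the idea; the `Link` record and `filter_links` function are hypothetical and the gene names are toy data:

```python
from typing import List, NamedTuple, Optional


class Link(NamedTuple):
    source: str  # transcription factor
    target: str  # target gene
    coef: float  # regression coefficient (sign suggests activation or repression)
    p: float     # p-value for the edge


def filter_links(links: List[Link], p_cutoff: float = 0.001,
                 top_n: Optional[int] = None) -> List[Link]:
    """Keep links with p < p_cutoff, strongest |coef| first, optionally only the top_n."""
    kept = [link for link in links if link.p < p_cutoff]
    kept.sort(key=lambda link: abs(link.coef), reverse=True)
    return kept if top_n is None else kept[:top_n]


toy = [
    Link("GATA1", "KLF1", 0.8, 1e-5),
    Link("SPI1", "CEBPA", -0.5, 1e-4),
    Link("GATA1", "HBB", 0.1, 0.05),
    Link("TAL1", "GYPA", 0.3, 5e-4),
]
for link in filter_links(toy, p_cutoff=0.001, top_n=2):
    print(link.source, "->", link.target, link.coef)
```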

Running Examples

# Activate the environment (venv shown; use conda activate genecircuitry or pixi shell for the other options)
source venv/bin/activate

# Run QC example
python examples/example_qc.py

# Run configuration example
python examples/config_example.py

# Run CellOracle workflow (requires celloracle)
python examples/celloracle_workflow.py

# Run quick demo
python examples/quick_demo.py

# Test config integration
python examples/test_config_integration.py

Features

  • Configuration Management: Centralized configuration for reproducibility
    • Global random seed setting
    • Default parameters for all analysis steps
    • Easy parameter updates
    • Configuration profiles for different analysis types
  • Quality Control: Comprehensive QC with multiple visualization options
    • Cell filtering based on gene count, total counts, and mitochondrial percentage
    • Automated QC metrics calculation
    • Before/after filtering comparison plots
    • Violin and scatter plots for detailed inspection
  • Data Preprocessing: Complete preprocessing pipeline
    • Normalization and scaling for single-cell RNA-seq data
    • Highly variable genes selection
    • PCA and dimensionality reduction
    • Neighborhood graph construction
  • CellOracle Integration: Gene regulatory network inference
    • Oracle object creation with raw or normalized counts
    • Automated PCA component selection
    • KNN imputation for noise reduction
    • Regulatory link inference with statistical filtering
    • Network visualization and quality metrics
  • Gene Regulatory Network Analysis: Network construction and analysis tools
    • TF-target gene relationship inference
    • Network topology analysis
    • Cluster-specific GRN construction
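KNN imputation, listed under CellOracle Integration above, reduces dropout noise by replacing each cell's profile with an average over similar cells. CellOracle's own routine uses a balanced KNN graph on PCA coordinates; the sketch below substitutes plain k-nearest-neighbor averaging with Euclidean distance, and `knn_impute` is a hypothetical name:

```python
import math


def knn_impute(embedding, expression, k=2):
    """Replace each cell's expression by the mean over itself and its k nearest neighbors.

    embedding:  per-cell coordinates, e.g. the first PCA components
    expression: per-cell lists of gene values (same cell order as embedding)
    """
    n_cells = len(embedding)
    smoothed = []
    for i in range(n_cells):
        # Rank all other cells by Euclidean distance to cell i in the embedding
        others = sorted(
            (j for j in range(n_cells) if j != i),
            key=lambda j: math.dist(embedding[i], embedding[j]),
        )
        pool = [i] + others[:k]
        smoothed.append([
            sum(expression[j][g] for j in pool) / len(pool)
            for g in range(len(expression[i]))
        ])
    return smoothed


# Two well-separated pairs of cells, one gene each
print(knn_impute([[0, 0], [0, 1], [10, 10], [10, 11]], [[1.0], [3.0], [10.0], [20.0]], k=1))
```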

Documentation

Development

Running Tests

# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_config.py -v
pytest tests/test_preprocessing.py -v
pytest tests/test_celloracle.py -v

# Run with coverage
pytest tests/ --cov=genecircuitry --cov-report=html

Code Quality

# Format code
black genecircuitry/

# Check linting
flake8 genecircuitry/

# Type checking
mypy genecircuitry/

Testing Notes

  • CellOracle tests use mocking when CellOracle is not installed
  • Some tests are skipped if CellOracle is not available
  • Use pytest -v for verbose output
  • Tests cover all major functionality with both unit and integration tests
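Both testing strategies above (skipping when CellOracle is absent, and mocking it) can be expressed with the standard library alone. A sketch with illustrative test names; the project's actual tests in tests/test_celloracle.py may be structured differently:

```python
import importlib.util
import sys
import unittest
from unittest import mock

HAS_CELLORACLE = importlib.util.find_spec("celloracle") is not None


class TestWithRealCellOracle(unittest.TestCase):
    @unittest.skipUnless(HAS_CELLORACLE, "CellOracle is not installed")
    def test_import(self):
        import celloracle  # only runs when the real package is available
        self.assertEqual(celloracle.__name__, "celloracle")


class TestWithMockedCellOracle(unittest.TestCase):
    def test_mocked_module(self):
        fake = mock.MagicMock(name="celloracle")
        # Provide a stand-in module only for the duration of this block
        with mock.patch.dict(sys.modules, {"celloracle": fake}):
            import celloracle
            celloracle.Oracle()
        fake.Oracle.assert_called_once_with()
```

Saved as a `test_*.py` file, this runs under either `python -m unittest` or `pytest`, since pytest also collects `unittest.TestCase` classes.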

License

MIT License

Authors

Samuele Cancellieri



Download files


Source Distribution

genecircuitry-0.1.9.tar.gz (159.8 kB)

Built Distribution

genecircuitry-0.1.9-py3-none-any.whl (96.2 kB)

File details

Details for the file genecircuitry-0.1.9.tar.gz.

File metadata

  • Download URL: genecircuitry-0.1.9.tar.gz
  • Upload date:
  • Size: 159.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for genecircuitry-0.1.9.tar.gz
Algorithm     Hash digest
SHA256        de4326a13722df1dd547aae074b2ef451529b7a20fd220e323d68640efce6b6f
MD5           851dcc84a0abe45910a8a42ea3bf28bf
BLAKE2b-256   1b3fafe819fc96156ce86916f44c2f1e65f8c489520ba9d391b9432fe4987a26


Provenance

The following attestation bundles were made for genecircuitry-0.1.9.tar.gz:

Publisher: publish.yml on samuelecancellieri/GeneCircuitry


File details

Details for the file genecircuitry-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: genecircuitry-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 96.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for genecircuitry-0.1.9-py3-none-any.whl
Algorithm     Hash digest
SHA256        26d79f12b8a3e37b17bc9e69ecaa0cebb6df63a5d8d211cc11b5b480b9bee4fe
MD5           ebaec3525b447cfa929501b85c648d35
BLAKE2b-256   9cd7f17448e7b18d06e11a550700c81d8b4588856c507bccd86f3e8cba0d6ca4


Provenance

The following attestation bundles were made for genecircuitry-0.1.9-py3-none-any.whl:

Publisher: publish.yml on samuelecancellieri/GeneCircuitry

