Skip to main content

Immuno-Oncology Biological Research tools in Python

Project description

PyPI Conda Docker

IOBRpy

IOBRpy is a command-line toolkit for bulk RNA-seq tumor microenvironment (TME) analysis. It wires together FASTQ QC, quantification (Salmon or STAR), matrix assembly, signature scoring, immune deconvolution, clustering, and ligand–receptor scoring.

IOBRpy logo


Documentation

A complete documentation for IOBRpy can be found at https://iobr.github.io/IOBRpy/.


Agent Bootstrap

If you want coding agents to discover iobrpy-cli without repeating it in every prompt, install the packaged agent integrations once:

# Codex: install the bundled global skill plus MCP registration
iobrpy-cli agent install --client codex

# Claude Code: install the managed global memory plus MCP registration
iobrpy-cli agent install --client claude-code

# Configure every supported client in one pass
iobrpy-cli agent install --client all

# Inspect what is already installed without changing anything
iobrpy-cli agent status

By default these commands print a short human-readable summary. Add --json when you want the full machine-readable payload for automation or another agent.

For path-driven agent work, start with:

iobrpy-cli map --path /path/to/data --json

This stage map tells the agent whether the directory is still raw FASTQ input, partially processed, or already ready for downstream TPM/TME analysis, so it can ask whether you want to continue, rerun the current stage, or rerun the full pipeline. The JSON output also includes a scenario card and roadmap position summary so an agent can explain, in plain language, what has already been done and what the next sensible choices are.

What gets installed:

  • Codex: a global iobrpy-fastpath skill under ~/.codex/skills/ plus an MCP server entry in ~/.codex/config.toml.
  • Claude Code: a managed global memory import in ~/.claude/CLAUDE.md, the managed memory file at ~/.claude/iobrpy/CLAUDE.md, and a user-scoped MCP server via claude mcp add ....

The registered server launches iobrpy-cli-mcp through the current Python environment, so agents can call native iobrpy workflows as tools instead of guessing from source files.


Installation

Quick install

# Method 1 : PyPI
pip install iobrpy

# Method 2 : Conda (bioconda via conda-forge + bioconda)
conda install -c conda-forge -c bioconda iobrpy=0.1.8

# Method 3 : Docker
docker pull hhn123123/iobrpy:latest

PyPI

Show full PyPI steps
# Creating a virtual environment is recommended
conda create -n iobrpy python=3.11 -y
conda activate iobrpy
# Update pip
python -m pip install --upgrade pip
# Install iobrpy
pip install iobrpy
# Install fastp, salmon, STAR and MultiQC
# Recommended: use mamba for faster solves (if available)
mamba install -y -c conda-forge -c bioconda \
  fastp \
  salmon \
  star \
  trust4

# If you don't have mamba, use conda instead
conda install -y -c conda-forge -c bioconda \
  fastp \
  salmon \
  star \
  trust4

Conda

Prerequisite (Conda): Please install Miniconda or Anaconda first. We recommend Miniconda.

Show full Conda steps
# Creating a virtual environment is recommended
conda create -n iobrpy python=3.11 -y
conda activate iobrpy
# Install iobrpy 0.1.8 (from bioconda via conda-forge + bioconda)
# Recommended: use mamba for faster solves (if available)
mamba install -y -c conda-forge -c bioconda iobrpy=0.1.8

# If you don't have mamba, use conda instead
conda install -y -c conda-forge -c bioconda iobrpy=0.1.8

Docker

Docker Hub website: Docker Hub

Show Docker pull
# Option 1: Pull the latest image from Docker Hub
docker pull hhn123123/iobrpy:latest
# Option 2: Offline install (from GitHub Release)
# 1) Download iobrpy.tar.gz from https://github.com/IOBR/IOBRpy/releases/tag/v1.0.0
# 2) Change to the directory where the archive is saved and load the image
cd /path/to/iobrpy.tar.gz
docker load -i iobrpy.tar.gz

Features

End-to-End Pipeline Runner

  • runall — A single command that wires the full Salmon or STAR pipeline end-to-end and writes the standardized layout: The pipeline creates the following directories, in order: 01-qc/, 02-salmon/ or 02-star/, 03-tpm/, 04-signatures/, 05-tme/, and 06-LR_cal/.

All-in-one TME profiling

  • tme_profile - A single command that inputs a TPM (genes×samples) matrix, performs signature scoring, runs six immune deconvolution methods, merges their outputs, and computes ligand–receptor scores, using the functions calculate_sig_score, cibersort, IPS, estimate, mcpcounter, quantiseq, epic, and LR_cal.

Preprocessing

  • fastq_qc — Parallel FASTQ QC/trimming via fastp, with per-sample HTML/JSON and an optional MultiQC summary report under 01-qc/multiqc_report/. Resume-friendly and prints output paths first.

Salmon submodule (quantification, merge, and TPM)

  • batch_salmon — Batch salmon quant on paired-end FASTQs; safe R1/R2 inference; per-sample quant.sf; progress and preflight checks (salmon version, index meta).
  • merge_salmon — Recursively collect per-sample quant.sf and produce two matrices: TPM and NumReads.
  • prepare_salmon — Clean up Salmon outputs into a TPM matrix; strip version suffixes; keep symbol/ENSG/ENST identifiers.

STAR submodule (alignment, counts, and TPM)

  • batch_star_count — Batch STAR alignment with --quantMode GeneCounts, sorted BAM + _ReadsPerGene.out.tab; resume-friendly summary.
  • merge_star_count — Merge multiple _ReadsPerGene.out.tab into one wide count matrix.
  • count2tpm — Convert counts to TPM (supports Ensembl/Entrez/Symbol/MGI; optional effective length CSV).

Expression Annotation & Mouse to Human Mapping & log2(x+1) (Optional)

  • anno_eset — Harmonize/annotate an expression matrix (choose symbol/probe columns; deduplicate; aggregation method).
  • mouse2human_eset — Convert mouse gene symbols to human gene symbols. Supports two modes: matrix mode (rows = genes) or table mode (input contains a symbol column).
  • log2_eset — Apply log2(x+1) to a genes × samples expression matrix.

Pathway / signature scoring

  • calculate_sig_score — Sample‑level signature scores via pca, zscore, ssgsea, or integration. Supports the following signature groups (space‑ or comma‑separated), or all to merge them:
    • go_bp, go_cc, go_mf
    • signature_collection, signature_tme, signature_sc, signature_tumor, signature_metabolism
    • kegg, hallmark, reactome

Immune deconvolution and scoring

  • cibersort — CIBERSORT wrapper/implementation with permutations, quantile normalization, absolute mode.
  • quantiseq — quanTIseq deconvolution with lsei or robust norms (hampel, huber, bisquare); tumor‑gene filtering; mRNA scaling.
  • epic — EPIC cell fractions using TRef/BRef references.
  • estimate — ESTIMATE immune/stromal/tumor purity scores.
  • mcpcounter — MCPcounter infiltration scores.
  • IPS — Immunophenoscore (AZ/SC/CP/EC + total).
  • deside — Deep learning–based deconvolution (requires pre‑downloaded model; supports pathway‑masked mode via KEGG/Reactome GMTs).

Clustering / decomposition

  • tme_cluster — k‑means with automatic k via KL index (Hartigan–Wong), feature selection and standardization.
  • nmf — NMF‑based clustering (auto‑selects k; excludes k=2) with PCA plot and top features.

Ligand–receptor

  • LR_cal — Ligand–receptor interaction scoring using cancer‑type specific networks.

Input Requirements

  • FASTQ layout: paired-end by default. Filenames end with *_1.fastq.gz / *_2.fastq.gz (configurable via --suffix1).
  • Expression matrix orientation: genes × samples by default.
  • Output file delimiters: automatically inferred from the file extension; .csv and .tsv/.txt are recommended.

Command‑line usage

From FASTQ to TME - runall

How runall passes options

runall defines a small set of top-level options (e.g., --mode/--outdir/--fastq/--threads/--batch_size). Any unrecognized options are forwarded to the corresponding sub-steps. This keeps runall flexible as sub-commands evolve.

Below are two fully wired workflows handled by iobrpy runall.

Salmon mode

iobrpy runall \
  --mode salmon \
  --outdir "/path/to/outdir" \
  --fastq "/path/to/fastq" \
  --threads 8 \
  --batch_size 1 \
  --index "/path/to/salmon/index" \
  --project MyProj

STAR mode

iobrpy runall \
  --mode star \
  --outdir "/path/to/outdir" \
  --fastq "/path/to/fastq" \
  --threads 8 \
  --batch_size 1 \
  --index "/path/to/star/index" \
  --project MyProj

Option legend for the runall examples

Common options

Flag Purpose
--mode {salmon / star} Select backend (Salmon quant vs. STAR align+count)
--outdir <DIR> Root output directory (creates the standardized layout)
--fastq <DIR> Raw FASTQ dir
--index <DIR> Salmon : path to Salmon index; STAR : path to STAR index
--project <STR> Prefix for merged outputs
--threads <INT> / --batch_size <INT> Global concurrency/batching

Expected layout

# Salmon mode:
/path/to/outdir
|-- 01-qc
|   |-- <sample>_1.fastq.gz
|   |-- <sample>_2.fastq.gz
|   |-- <sample>_fastp.html
|   |-- <sample>_fastp.json
|   |-- <sample>.task.complete
|   `-- multiqc_report
|       `-- multiqc_fastp_report.html
|-- 02-salmon
|   |-- <sample>
|   |   `-- quant.sf
|   |-- MyProj_salmon_count.tsv.gz
|   `-- MyProj_salmon_tpm.tsv.gz
|-- 03-tpm
|   |-- prepare_salmon.csv
|   `-- tpm_matrix.csv
|-- 04-signatures
|   `-- calculate_sig_score.csv
|-- 05-tme
|   |-- cibersort_results.csv
|   |-- epic_results.csv
|   |-- quantiseq_results.csv
|   |-- IPS_results.csv
|   |-- estimate_results.csv
|   |-- mcpcounter_results.csv
|   `-- deconvo_merged.csv
`-- 06-LR_cal
    `-- lr_cal.csv
# STAR mode:
/path/to/outdir
|-- 01-qc
|   |-- <sample>_1.fastq.gz
|   |-- <sample>_2.fastq.gz
|   |-- <sample>_fastp.html
|   |-- <sample>_fastp.json
|   |-- <sample>.task.complete
|   `-- multiqc_report
|       `-- multiqc_fastp_report.html
|-- 02-star
|   |-- <sample>/
|   |-- <sample>__STARgenome/
|   |-- <sample>__STARpass1/
|   |-- <sample>_STARtmp/
|   |-- <sample>_Aligned.sortedByCoord.out.bam
|   |-- <sample>_Log.final.out
|   |-- <sample>_Log.out
|   |-- <sample>_Log.progress.out
|   |-- <sample>_ReadsPerGene.out.tab
|   |-- <sample>_SJ.out.tab
|   |-- <sample>.task.complete
|   |-- .batch_star_count.done
|   |-- .merge_star_count.done
|   `-- MyProj.STAR.count.tsv.gz
|-- 03-tpm
|   |-- count2tpm.csv
|   `-- tpm_matrix.csv
|-- 04-signatures
|   `-- calculate_sig_score.csv
|-- 05-tme
|   |-- cibersort_results.csv
|   |-- epic_results.csv
|   |-- quantiseq_results.csv
|   |-- IPS_results.csv
|   |-- estimate_results.csv
|   |-- mcpcounter_results.csv
|   `-- deconvo_merged.csv
`-- 06-LR_cal
    `-- lr_cal.csv

Output Reference

Standard layout (produced by iobrpy runall)

  • 01-qc/ — fastp outputs; a resume flag .fastq_qc.done is written when the step completes.
  • 02-salmon/ or 02-star/ — quantification/alignment + merged matrices; resume flags like .batch_salmon.done, .merge_salmon.done, or .merge_star_count.done.
  • 03-tpm/ — unified TPM matrix tpm_matrix.csv. For Salmon mode it comes from prepare_salmon; for STAR mode it comes from count2tpm.
  • 04-signatures/ — signature scoring results (file: calculate_sig_score.csv).
  • 05-tme/ — deconvolution outputs from multiple methods + deconvo_merged.csv.
  • 06-LR_cal/ — ligand–receptor results lr_cal.csv.

Salmon mode (02-salmon/)

  • Per-sample Salmon folders containing quant.sf (from batch_salmon). A .batch_salmon.done flag is written after completion.
  • Merged matrices (from merge_salmon):
    • <PROJECT>_salmon_tpm.tsv[.gz]
    • <PROJECT>_salmon_count.tsv[.gz]
      A .merge_salmon.done flag is written after completion.
  • 03-tpm/prepare_salmon.csv — cleaned genes × samples TPM matrix produced by prepare_salmon (default --return_feature symbol unless overridden).
  • 03-tpm/tpm_matrix.csvlog2(x+1) matrix produced by log2_eset from prepare_salmon.csv.

STAR mode (02-star/)

  • Per-sample STAR outputs (BAM, logs, *_ReadsPerGene.out.tab, etc.).
  • Merged counts (from merge_star_count):
    • <PROJECT>.STAR.count.tsv.gz . A .merge_star_count.done flag is written after completion.
  • 03-tpm/count2tpm.csv — TPM matrix produced by count2tpm from the merged STAR ReadPerGene/count matrix.
  • 03-tpm/tpm_matrix.csvlog2(x+1) matrix produced by log2_eset from count2tpm.csv.

Signatures (04-signatures/)

  • calculate_sig_score.csv — per-sample pathway/signature scores. Columns correspond to the selected signature set and method (integration, pca, zscore, or ssgsea).

Deconvolution (05-tme/)

Each method writes a single table named <method>_results.csv:

  • cibersort_results.csv — columns suffixed with _CIBERSORT. Note whether --perm and --QN were used.
  • quantiseq_results.csv — quanTIseq fractions. Document the chosen --method {lsei|hampel|huber|bisquare} and flags like --arrays, --tumor, --scale_mrna, --signame.
  • epic_results.csv — EPIC fractions; record the reference profile used (--reference {TRef|BRef|both}).
  • estimate_results.csv — ESTIMATE immune/stromal/purity scores; columns suffixed _estimate.
  • mcpcounter_results.csv — MCPcounter scores; columns suffixed _MCPcounter.
  • IPS_results.csv — IPS sub-scores and total score.

Merged table

  • deconvo_merged.csv — produced by runall after all deconvolution methods finish; normalizes the sample index to a column named ID and outer-joins by sample ID across methods.

Ligand–receptor (06-LR_cal/)

  • lr_cal.csv — ligand–receptor scoring table from LR_cal. Record the --data_type {count|tpm} and the --id_type you used.

Contact / Support

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iobrpy-0.1.8.tar.gz (52.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

iobrpy-0.1.8-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (58.1 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

File details

Details for the file iobrpy-0.1.8.tar.gz.

File metadata

  • Download URL: iobrpy-0.1.8.tar.gz
  • Upload date:
  • Size: 52.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for iobrpy-0.1.8.tar.gz
Algorithm Hash digest
SHA256 60046891042a276d7bc6426c608920db44be9e1dc1df8f5fd4f3f28faca51104
MD5 62e9a1f7d5890d468df4778e97adfa23
BLAKE2b-256 f386b02fb35bb804b2eb420d94e1f6059d914819bd1f70fd4c97191d89d72f46

See more details on using hashes here.

File details

Details for the file iobrpy-0.1.8-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for iobrpy-0.1.8-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 0c6a80ea9dec0e88c61960cc4f9f8d39c5844a0e709976285f098ba86b1047c3
MD5 28fadc925af9f96b4a8637048dc1814d
BLAKE2b-256 a3bb5df939425ff55b938d392798307c644d50113e7bfdf19b955130a9f08177

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page