Python pipeline for NanoString nCounter data analysis

Project description

ncountr

A Python pipeline for analyzing NanoString nCounter gene expression data, from raw instrument files to differential expression, pathway scoring, and publication-ready figures.

Eleven R packages exist for nCounter data, but until now there was no Python option. ncountr fills that gap.

The problem ncountr solves

The NanoString nCounter is a mid-throughput gene expression platform used in translational research, clinical trials, and biomarker studies. Unlike RNA-seq, it directly counts individual mRNA molecules for a pre-selected panel of genes (typically 100–800), without amplification or library preparation. This makes the data simpler to work with, but it still requires platform-specific normalization and quality control before you can trust the results.

Each nCounter run produces one .RCC file per sample — a proprietary text format containing raw molecule counts for your target genes, along with built-in controls:

Positive controls (synthetic RNA spikes at known concentrations) to verify assay linearity
Negative controls (no-template probes) to measure background noise
Housekeeping genes (stably expressed reference genes) to correct for RNA input differences
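
To make the file layout concrete, here is a minimal sketch of reading one RCC file's text. It assumes the commonly seen layout of tag-delimited sections with comma-separated rows and a Code_Summary table of CodeClass, Name, Accession, Count. The sample text and the parse_rcc_text helper are made up for illustration and are not ncountr's own parser.

```python
from collections import defaultdict

# Made-up RCC content for illustration (accessions are placeholders).
SAMPLE_RCC = """\
<Header>
FileVersion,1.7
</Header>
<Lane_Attributes>
FovCount,280
FovCounted,275
</Lane_Attributes>
<Code_Summary>
CodeClass,Name,Accession,Count
Positive,POS_A(128),acc1,25000
Negative,NEG_A(0),acc2,5
Housekeeping,GAPDH,acc3,12000
Endogenous,STAT1,acc4,850
</Code_Summary>
"""


def parse_rcc_text(text):
    """Split RCC text into attribute sections and per-class probe counts."""
    attributes = defaultdict(dict)  # section name -> {key: value}
    counts = defaultdict(dict)      # code class -> {probe name: count}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("</"):          # closing tag ends the section
            section = None
        elif line.startswith("<"):         # opening tag starts a section
            section = line.strip("<>")
        elif section == "Code_Summary":
            fields = line.split(",")
            if fields[0] == "CodeClass":   # skip the table header row
                continue
            code_class, name, _accession, count = fields[:4]
            counts[code_class][name] = int(count)
        elif section is not None:
            key, _, value = line.partition(",")
            attributes[section][key] = value
    return dict(attributes), dict(counts)


attributes, counts = parse_rcc_text(SAMPLE_RCC)
print(counts["Endogenous"])           # target gene counts
print(attributes["Lane_Attributes"])  # imaging attributes used later for QC
```

Grouping counts by code class keeps the three control types separate from the endogenous target genes, which is what the QC and normalization steps rely on.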

Most researchers handle these steps with NanoString's nSolver software (Windows GUI) or the NanoStringNorm R package. ncountr provides a Python alternative that runs from the command line or as a library, performs all standard QC and normalization steps, and adds differential expression, gene set scoring, and optional cross-platform validation, all driven by a single reproducible config file.

Installation

pip install ncountr

# With cross-platform validation support (adds scanpy/anndata).
# The quotes keep shells such as zsh from expanding the square brackets:
pip install "ncountr[crossplatform]"

Requires Python 3.9+.

What's new

to_anndata() export — convert nCounter data to AnnData for scanpy/scverse integration
ncountr fetch-geo GSE275334 — download RCC files directly from NCBI GEO
sample_id_from="filename" — extract sample IDs from filenames when internal RCC IDs are inconsistent
Documentation — Sphinx docs with API reference and CLI docs
5 validated vignettes — reproduced published results from GEO datasets (see below)

Validated on real-world data

ncountr has been tested against 5 published nCounter datasets totaling 1,458 samples:

Dataset	Panel	Samples	Key result
GSE275334	Immune Exhaustion (773 genes)	47	Long COVID / ME/CFS 3-group design
GSE140901	PanCancer Immune (730 genes)	24	ICI responder vs non-responder
GSE117751	Human Immunology (579 genes)	42	AIR vs RP vs Control (also in NanoTube tutorials)
GSE268012	Human Metabolism (748 genes)	24	IFN factorial: 232 DE genes for IFNβ
GSE74821	PAM50 Custom (50 genes)	1,321	Stress test: 1,321 samples parsed in 3s

Typical workflow

A standard nCounter analysis follows four steps. ncountr handles all of them, either through a single config-driven pipeline or as individual commands.

1. Parse the raw data

nCounter instruments write one .RCC file per sample. ncountr reads a directory of these files and organizes them into a structured experiment object:

ncountr parse --rcc-dir /path/to/RCC/ --id-pattern '(\d+)'

The --id-pattern regex extracts sample IDs from filenames (e.g., Sample_01_Lung.RCC becomes sample 01).
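
The pattern can be tried out in plain Python before running the parser. This sketch assumes the first match of the single capture group is used (the behaviour of re.search); extract_sample_id is a hypothetical helper, not part of ncountr.

```python
import re
from pathlib import Path


def extract_sample_id(filename, pattern=r"(\d+)"):
    """Return the first capture-group match in the file name, or None."""
    match = re.search(pattern, Path(filename).name)
    return match.group(1) if match else None


print(extract_sample_id("Sample_01_Lung.RCC"))
# A leading date also matches (\d+), so a more specific pattern is safer:
print(extract_sample_id("20240101_run1_Sample_01.RCC"))
print(extract_sample_id("20240101_run1_Sample_01.RCC", r"Sample_(\d+)"))
```

If file names contain other digit runs (dates, cartridge numbers), anchoring the pattern on a fixed prefix avoids picking up the wrong number.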

2. Quality control

Before trusting any results, you need to verify that the assay worked:

ncountr qc --counts raw_counts.csv

This checks four things per sample and produces a 4-panel summary figure:

FOV ratio — Did the scanner image enough fields of view? (threshold: >75%)
Positive control linearity — Do the spike-in controls track their expected concentrations? (R² > 0.95)
Negative background — How much non-specific signal is present?
Housekeeping stability — Are the reference genes consistent across samples?

Samples failing these checks may need to be excluded or flagged.
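
The first two checks can be sketched in a few lines. The thresholds come from the list above. The spike-in concentrations, the log2 transform with a pseudocount of 1, and the function names are assumptions for illustration, so ncountr's own calculation may differ in detail.

```python
import math

# Nominal spike-in concentrations (fM) for the six standard positive controls.
# Assumed here for illustration; check the documentation for your panel.
POS_CONCENTRATIONS_FM = {
    "POS_A": 128, "POS_B": 32, "POS_C": 8,
    "POS_D": 2, "POS_E": 0.5, "POS_F": 0.125,
}


def fov_ratio(fov_counted, fov_count):
    """Fraction of requested fields of view that were successfully imaged."""
    return fov_counted / fov_count


def positive_control_r2(pos_counts):
    """Squared Pearson correlation of log2 counts vs log2 concentrations."""
    names = sorted(set(POS_CONCENTRATIONS_FM) & set(pos_counts))
    x = [math.log2(POS_CONCENTRATIONS_FM[n]) for n in names]
    y = [math.log2(pos_counts[n] + 1) for n in names]  # +1 guards against log2(0)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def qc_flags(fov, r2, fov_threshold=0.75, r2_threshold=0.95):
    """Pass/fail flags using the thresholds quoted in the list above."""
    return {"fov_ok": fov > fov_threshold, "linearity_ok": r2 > r2_threshold}


# Made-up counts that scale with concentration.
pos_counts = {"POS_A": 25600, "POS_B": 6400, "POS_C": 1600,
              "POS_D": 400, "POS_E": 100, "POS_F": 25}
flags = qc_flags(fov_ratio(275, 280), positive_control_r2(pos_counts))
print(flags)
```

Working in log space matters because the spike-ins span three orders of magnitude. On the raw scale the highest concentration would dominate the fit.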

3. Normalize

Raw nCounter counts vary between samples due to differences in RNA input, hybridization efficiency, and imaging. Normalization corrects for these technical factors in two stages:

Positive control normalization — scales each sample so the synthetic spike-ins match their expected ratios, correcting for assay-level variation
Housekeeping normalization — further adjusts for RNA input differences using stably expressed reference genes

ncountr normalize --counts raw_counts.csv --method pos_hk

Three methods are available: pos_only, pos_hk (recommended default), and pos_hk_bg (adds background subtraction based on negative controls).
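
As a rough illustration of the two-stage idea behind pos_hk, the sketch below scales each sample by the ratio of the across-sample mean to that sample's geometric mean of control counts, first for positive controls and then for housekeeping genes. Geometric-mean scaling is a common convention for nCounter data. It is assumed here and not taken from ncountr's source, and all names and counts are made up.

```python
from statistics import geometric_mean, mean


def scaling_factors(control_counts):
    """control_counts: {sample: [counts]} -> {sample: multiplicative factor}.

    geometric_mean requires strictly positive counts.
    """
    geo = {s: geometric_mean(c) for s, c in control_counts.items()}
    target = mean(list(geo.values()))
    return {s: target / g for s, g in geo.items()}


def normalize_pos_hk(raw, pos, hk):
    """raw: {sample: {gene: count}}; pos and hk: {sample: [control counts]}."""
    pos_f = scaling_factors(pos)
    # Housekeeping factors are computed after the positive-control scaling.
    hk_scaled = {s: [c * pos_f[s] for c in counts] for s, counts in hk.items()}
    hk_f = scaling_factors(hk_scaled)
    return {
        s: {g: c * pos_f[s] * hk_f[s] for g, c in genes.items()}
        for s, genes in raw.items()
    }


# Sample B has twice the signal of sample A for purely technical reasons.
pos = {"A": [1000, 250, 60], "B": [2000, 500, 120]}
hk = {"A": [500, 800], "B": [1000, 1600]}
raw = {"A": {"STAT1": 100}, "B": {"STAT1": 200}}
normalized = normalize_pos_hk(raw, pos, hk)
print(normalized)
```

After scaling, the two samples report the same STAT1 level, which is the intended outcome when the raw difference is entirely technical.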

4. Differential expression and downstream analysis

With normalized counts, you can compare groups:

ncountr de --counts normalized.csv --groups treated:S1,S2,S3 control:S4,S5,S6

This runs a per-gene statistical test (Mann-Whitney U by default, or t-test), applies FDR correction, and generates a volcano plot. Genes of interest — such as interferon pathway genes — can be highlighted on the plot for quick visual assessment.
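
The FDR step can be illustrated on its own. This is a minimal sketch of the Benjamini-Hochberg adjustment (the procedure statsmodels names fdr_bh), applied to made-up p-values that would in practice come from the per-gene test, for example scipy.stats.mannwhitneyu. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = {"STAT1": 0.01, "MX1": 0.04, "ISG15": 0.03, "CD3E": 0.20}
q_values = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))
print(q_values)
```

With only a few hundred genes on a panel the correction is less severe than in RNA-seq, but it still matters: here MX1 moves from p = 0.04 to an adjusted value above 0.05.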

Running the full pipeline

Rather than running each step separately, you can define everything in a single YAML config:

# Generate a starter config
ncountr init > my_config.yaml

# Edit it with your sample info, then run
ncountr run my_config.yaml

A minimal config looks like this:

input:
  rcc_dirs:
    - /path/to/RCC/files
  file_pattern: '*.RCC'
  sample_id_pattern: '(\d+)'

output:
  directory: ./results
  figure_format: png
  figure_dpi: 200

samples:
  metadata:
    'S1': { group: treated }
    'S2': { group: treated }
    'S3': { group: control }
    'S4': { group: control }
  group_column: group
  comparison: [treated, control]

normalization:
  method: pos_hk

de:
  test: mannwhitneyu
  correction: fdr_bh

gene_sets:
  IFN_JAKSTAT: builtin

This will parse your RCC files, run QC, normalize, test for differential expression between treated and control, score each sample for IFN/JAK-STAT pathway activity, and save all results and figures to ./results/.
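
To see how the samples block relates to a two-group comparison, the sketch below derives the sample lists for each group from a dictionary shaped like the loaded YAML. How ncountr resolves groups internally is not shown in this README; samples_by_group is a hypothetical helper.

```python
def samples_by_group(metadata, group_column, comparison):
    """Collect sample IDs for each group named in the comparison."""
    groups = {name: [] for name in comparison}
    for sample_id, fields in metadata.items():
        label = fields.get(group_column)
        if label in groups:
            groups[label].append(sample_id)
    empty = [name for name, members in groups.items() if not members]
    if empty:
        raise ValueError(f"No samples found for group(s): {', '.join(empty)}")
    return groups


# Mirrors the 'samples' section of the minimal config above.
samples_cfg = {
    "metadata": {
        "S1": {"group": "treated"},
        "S2": {"group": "treated"},
        "S3": {"group": "control"},
        "S4": {"group": "control"},
    },
    "group_column": "group",
    "comparison": ["treated", "control"],
}

groups = samples_by_group(
    samples_cfg["metadata"], samples_cfg["group_column"], samples_cfg["comparison"]
)
print(groups)
```

Failing early on an empty group catches the common mistake of a group label in comparison that does not match any label in metadata.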

Python API

For integration into notebooks or custom scripts:

import ncountr

# Load data
experiment = ncountr.read_rcc("/path/to/RCC/", sample_id_pattern=r"(\d+)")

# QC and normalize
ncountr.qc(experiment)
ncountr.normalize(experiment, method="pos_hk")

# Differential expression
de_results = ncountr.de(
    experiment,
    group_a=["S1", "S2", "S3"],
    group_b=["S4", "S5", "S6"],
)

# Gene set scoring
scores = ncountr.score_gene_set(experiment, gene_set="IFN_JAKSTAT")

# Access built-in gene sets
ncountr.list_gene_sets()          # ['IFN_JAKSTAT']
ncountr.get_gene_set("IFN_JAKSTAT")  # list of 48 genes
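
This README does not describe how a gene set score is computed. One common approach, shown here purely as an assumption, is to average per-gene z-scores of log2 expression across the genes in the set. The function name and the counts are made up, and ncountr's score_gene_set may use a different method.

```python
import math
from statistics import mean, pstdev


def mean_zscore(expr, gene_set):
    """expr: {sample: {gene: normalized count}} -> {sample: mean z-score}."""
    samples = list(expr)
    # Use only genes measured in every sample.
    genes = [g for g in gene_set if all(g in expr[s] for s in samples)]
    z_by_sample = {s: [] for s in samples}
    for gene in genes:
        logged = [math.log2(expr[s][gene] + 1) for s in samples]
        mu, sd = mean(logged), pstdev(logged)
        if sd == 0:
            continue  # a constant gene carries no information
        for s, value in zip(samples, logged):
            z_by_sample[s].append((value - mu) / sd)
    return {s: mean(z) if z else 0.0 for s, z in z_by_sample.items()}


expr = {
    "S1": {"MX1": 800, "ISG15": 1200},
    "S2": {"MX1": 900, "ISG15": 1000},
    "S3": {"MX1": 100, "ISG15": 150},
    "S4": {"MX1": 120, "ISG15": 130},
}
scores = mean_zscore(expr, ["MX1", "ISG15", "NOT_ON_PANEL"])
print(scores)
```

Z-scoring each gene first keeps highly expressed genes from dominating the score, at the cost of making scores relative to the samples in the experiment.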

Plotting

from ncountr.plotting import plot_qc, plot_volcano, plot_pathway_scores

# 4-panel QC summary
plot_qc(experiment, output="qc_summary.png")

# Volcano with IFN pathway genes highlighted
plot_volcano(
    de_results,
    highlight_genes=ncountr.get_gene_set("IFN_JAKSTAT"),
    highlight_label="IFN/JAK-STAT genes",
    highlight_color="gold",
    output="volcano.png",
)

# Pathway scores by group
plot_pathway_scores(scores, output="ifn_scores.png")

Download data from GEO

ncountr fetch-geo GSE275334 -o data/

Or from Python:

from ncountr.io.geo import fetch_geo
rcc_dir = fetch_geo("GSE275334", output_dir="data/")

Export to AnnData (scverse integration)

adata = ncountr.to_anndata(experiment)
# adata.X = normalized counts (samples x genes)
# adata.layers["raw"] = raw counts
# adata.obs = sample metadata + QC results + lane info
# adata.var["housekeeping"] = housekeeping gene flag

This makes the data available to scanpy, squidpy, decoupler, and other scverse tools for downstream analysis.

Cross-platform validation

When you have expression data from another platform (e.g., single-cell RNA-seq) on the same samples, ncountr can assess how well the two platforms agree. This is useful for confirming that nCounter results are not artifacts of the technology.

Enable it in your config:

cross_platform:
  enabled: true
  external_data:
    path: /path/to/adata.h5ad     # scanpy h5ad, or CSV/TSV
    format: h5ad
    pseudobulk_group_by: Sample   # aggregate single cells by sample
  sample_mapping:
    'S1': 'S1'                    # nCounter ID → external ID
    'S2': 'S2'
  negative_control_samples: ['697']  # non-target species controls

This adds:

Per-sample correlation — Spearman/Pearson r between platforms for shared genes
DE concordance — what fraction of differentially expressed genes change in the same direction on both platforms
Cell composition proxy — how well nCounter marker gene expression tracks cell type proportions from the other platform
Cross-reactivity assessment — identifies genes with non-specific binding using negative control samples
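
The first two metrics are simple enough to sketch without extra dependencies. The code below computes a Spearman correlation over shared genes for one sample and the fraction of shared genes whose log fold changes have the same sign. The function names and all values are made up for illustration and do not reflect ncountr's internals.

```python
def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_shared(expr_a, expr_b):
    """Spearman correlation over genes present on both platforms."""
    shared = sorted(set(expr_a) & set(expr_b))
    ranks_a = average_ranks([expr_a[g] for g in shared])
    ranks_b = average_ranks([expr_b[g] for g in shared])
    return pearson(ranks_a, ranks_b)


def direction_concordance(lfc_a, lfc_b):
    """Fraction of shared genes with log fold changes of the same sign."""
    shared = [g for g in lfc_a if g in lfc_b]
    agree = sum(1 for g in shared if lfc_a[g] * lfc_b[g] > 0)
    return agree / len(shared)


ncounter = {"STAT1": 850, "MX1": 400, "CD3E": 120, "CD14": 60}
pseudobulk = {"STAT1": 30.5, "MX1": 12.1, "CD3E": 14.0, "CD14": 1.2, "ACTB": 90.0}
rho = spearman_shared(ncounter, pseudobulk)

lfc_ncounter = {"STAT1": 1.8, "MX1": 2.1, "CD3E": -0.4}
lfc_external = {"STAT1": 1.1, "MX1": 0.9, "CD3E": 0.2, "GAPDH": 0.0}
concordance = direction_concordance(lfc_ncounter, lfc_external)
print(rho, concordance)
```

Rank-based correlation is a natural fit here because the two platforms report expression on different scales (molecule counts versus aggregated sequencing reads).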

Install the optional dependencies: pip install "ncountr[crossplatform]"

Built-in gene sets

Name	Genes	Description
`IFN_JAKSTAT`	48	Interferon and JAK-STAT signaling pathway (MX1, IFIT1-3, ISG15, STAT1/2, IRF1/7, OAS1-3, CXCL9-11, GBP1-5, JAK1/2, ...)

Cell type markers are also available for composition estimation:

Cell type	Markers
T cells	CD3D, CD3E, CD3G
CD4 T	CD4
CD8 T	CD8A, CD8B
Monocytes	CD14, FCGR3A, CD68
B cells	MS4A1, CD19, CD79A
NK cells	GNLY, NKG7, KLRD1

Custom gene sets can be defined directly in the config or passed as Python lists.

Configuration reference

Full list of config options:

input:
  rcc_dirs: []              # directories containing .RCC files
  file_pattern: '*.RCC'     # glob pattern for RCC files
  sample_id_pattern: '(\d+)'  # regex with one capture group for sample ID

output:
  directory: ./results
  figure_format: png        # png, pdf, svg
  figure_dpi: 200

samples:
  metadata:                 # per-sample key-value pairs
    'S1': { group: treated, batch: A }
  group_column: group       # metadata field used for comparisons
  comparison: [treated, control]  # [group_a, group_b] for DE
  exclude: []               # sample IDs to skip
  negative_control_samples: []  # for cross-reactivity (e.g., NSG controls)

qc:
  fov_ratio_threshold: 0.75
  positive_control_r2_threshold: 0.95

normalization:
  method: pos_hk            # pos_only | pos_hk | pos_hk_bg

de:
  test: mannwhitneyu        # mannwhitneyu | ttest
  correction: fdr_bh        # FDR method from statsmodels

volcano:
  highlight_genes: IFN_JAKSTAT  # builtin name or list of genes
  highlight_label: 'IFN/JAK-STAT genes'
  highlight_color: gold

gene_sets:
  IFN_JAKSTAT: builtin
  my_custom_set: [GENE1, GENE2, GENE3]

cross_platform:             # optional, requires scanpy
  enabled: false
  external_data:
    path: /path/to/data
    format: h5ad            # h5ad | csv | tsv
    pseudobulk_group_by: Sample
  sample_mapping: {}

License

MIT

Project details

Maintainers

zw3672

Release history

0.2.0 (this version): Jul 7, 2026
0.1.0: Mar 24, 2026

Download files

Source Distribution

ncountr-0.2.0.tar.gz (69.1 kB view details)

Uploaded Jul 7, 2026 Source

Built Distribution

ncountr-0.2.0-py3-none-any.whl (56.1 kB view details)

Uploaded Jul 7, 2026 Python 3

File details

Details for the file ncountr-0.2.0.tar.gz.

File metadata

Download URL: ncountr-0.2.0.tar.gz
Upload date: Jul 7, 2026
Size: 69.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ncountr-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`9062bef1d369081970fc822b5a86b3df1f4557d1aded89ff5eda78720fe1d831`
MD5	`65208891c7eb9402400530d61ae77285`
BLAKE2b-256	`49d105abf23a74c8ff7163a71cf25426432186bce4f724861aa283ac7f0905a8`

Provenance

The following attestation bundles were made for ncountr-0.2.0.tar.gz:

Publisher: publish.yml on princello/ncountr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ncountr-0.2.0.tar.gz
- Subject digest: 9062bef1d369081970fc822b5a86b3df1f4557d1aded89ff5eda78720fe1d831
- Sigstore transparency entry: 2105652351
- Sigstore integration time: Jul 7, 2026
Source repository:
- Permalink: princello/ncountr@d05edda0b70a0bd540fc92ca09ab905b367de7d5
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/princello
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d05edda0b70a0bd540fc92ca09ab905b367de7d5
- Trigger Event: release

File details

Details for the file ncountr-0.2.0-py3-none-any.whl.

File metadata

Download URL: ncountr-0.2.0-py3-none-any.whl
Upload date: Jul 7, 2026
Size: 56.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ncountr-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b18e765ea57e6f2a7b9e5331061d19d60e698cccc2546c695d5941ecb178ea1`
MD5	`1cc045c8cca5150ef66a81f71559ae4a`
BLAKE2b-256	`aa0cdee7a8515581b127babf86445d4ea90f83e57c5a30ab7ebf97f9362a8d4b`

Provenance

The following attestation bundles were made for ncountr-0.2.0-py3-none-any.whl:

Publisher: publish.yml on princello/ncountr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ncountr-0.2.0-py3-none-any.whl
- Subject digest: 3b18e765ea57e6f2a7b9e5331061d19d60e698cccc2546c695d5941ecb178ea1
- Sigstore transparency entry: 2105652403
- Sigstore integration time: Jul 7, 2026
Source repository:
- Permalink: princello/ncountr@d05edda0b70a0bd540fc92ca09ab905b367de7d5
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/princello
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d05edda0b70a0bd540fc92ca09ab905b367de7d5
- Trigger Event: release
