API and CLI to discover, download, and assemble recount3 resources.

Project description

recount3 is a typed Python library and command-line tool for the recount3 data repository, a uniformly processed collection of RNA-seq studies spanning tens of thousands of human and mouse samples from SRA, GTEx, and TCGA. It discovers, downloads, and assembles recount3 resources into analysis-ready objects. These resources include gene, exon, and junction count matrices, sample metadata, genome annotations, and BigWig coverage files.

The package provides two interfaces:

  • A Python library that assembles count matrices, sample metadata, and genomic coordinates into BiocPy SummarizedExperiment and RangedSummarizedExperiment objects, with recount3-compatible scaling and normalization utilities (approximate read counts, AUC- and mapped-reads-based scaling, and TPM).

  • A command-line tool (recount3) that implements a discover -> manifest -> materialize workflow for scripts and pipelines. It emits JSONL/TSV manifests and materializes resources to a directory or a .zip archive with parallel downloads.
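The scaling utilities listed above rest on simple per-sample arithmetic. The NumPy sketch below illustrates that arithmetic on a toy genes-by-samples matrix. The function names are hypothetical and are not recount3's API. The 40-million target size and the use of column sums as a stand-in for each sample's total coverage (AUC) are also assumptions made for illustration.

```python
import numpy as np

# Hypothetical helpers illustrating the arithmetic; not the recount3 API.


def approximate_read_counts(coverage: np.ndarray, avg_read_length: np.ndarray) -> np.ndarray:
    """Convert summed base-pair coverage to approximate read counts."""
    # Broadcasting divides column j (sample j) by avg_read_length[j].
    return np.round(coverage / avg_read_length)


def auc_scale(coverage: np.ndarray, auc: np.ndarray, target_size: float = 4e7) -> np.ndarray:
    """Rescale each sample so that its total coverage (AUC) equals target_size."""
    return coverage * (target_size / auc)


def tpm(read_counts: np.ndarray, feature_lengths_bp: np.ndarray) -> np.ndarray:
    """Transcripts per million: length-normalize, then scale each sample to sum to 1e6."""
    # Reads per kilobase for each feature (rows) in each sample (columns).
    rpk = read_counts / (feature_lengths_bp[:, None] / 1_000)
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6


# Two genes by two samples of toy coverage.
coverage = np.array([[1000.0, 2000.0], [3000.0, 2000.0]])
reads = approximate_read_counts(coverage, np.array([100.0, 50.0]))  # [[10, 40], [30, 40]]
scaled = auc_scale(coverage, coverage.sum(axis=0))  # each column now sums to 4e7
expression = tpm(reads, np.array([1000.0, 2000.0]))  # each column now sums to 1e6
```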

Installation

The core package requires Python 3.10 or newer and depends on NumPy, pandas, and SciPy. Two optional extras enable additional features:

python3 -m pip install recount3                   # core
python3 -m pip install "recount3[biocpy]"         # + SummarizedExperiment builders
python3 -m pip install "recount3[bigwig]"         # + BigWig coverage access
python3 -m pip install "recount3[biocpy,bigwig]"  # everything

  • biocpy (biocframe, genomicranges, summarizedexperiment) is required for create_rse and every helper that returns or operates on a BiocPy object.

  • bigwig (pyBigWig) is required only for BigWig coverage access.

On Windows, replace python3 with py -3 in these commands (for example, py -3 -m pip install recount3). Upgrade an existing installation with python3 -m pip install --upgrade recount3.

Note: The optional extras have platform constraints. The bigwig extra can be difficult or impossible to install on Windows and macOS. The biocpy extra can be difficult or impossible to install on Windows. The core package and the command-line workflow do not depend on either extra and work on all supported platforms.

The three-layer API

recount3 exposes the same workflow at three levels of abstraction, so a single project can be assembled in one call while multi-project or custom workflows retain full control:

  • High level. create_rse() builds one project into a RangedSummarizedExperiment and performs discovery, downloading, metadata merging, and range assembly in a single call.

  • Mid level. R3ResourceBundle is a filterable container of resources for combining multiple projects, selecting subsets, and stacking matrices.

  • Low level. R3Resource represents a single file and manages its URL, cache entry, and parser.
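The mid-level idea of a filterable resource container can be illustrated with a toy model. In this sketch, ToyResource and ToyBundle are hypothetical classes, not recount3's R3Resource or R3ResourceBundle. The file names and the "metadata" resource type are invented. The sketch only shows keyword filtering that returns a new container, which is what makes calls like bundle.filter(...).stack_count_matrices(...) chainable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResource:
    """Hypothetical stand-in for one remote file plus its descriptive fields."""

    url: str
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class ToyBundle:
    """Hypothetical filterable container; not recount3's R3ResourceBundle."""

    resources: list[ToyResource]

    def filter(self, **criteria: str) -> "ToyBundle":
        # Keep a resource only if every requested field matches exactly.
        kept = [
            r
            for r in self.resources
            if all(r.fields.get(key) == value for key, value in criteria.items())
        ]
        # Returning a new bundle leaves the original untouched and allows chaining.
        return ToyBundle(kept)


bundle = ToyBundle(
    [
        ToyResource("proj.gene.gz", {"resource_type": "count_files_gene_or_exon", "genomic_unit": "gene"}),
        ToyResource("proj.exon.gz", {"resource_type": "count_files_gene_or_exon", "genomic_unit": "exon"}),
        ToyResource("proj.meta.gz", {"resource_type": "metadata"}),
    ]
)
genes = bundle.filter(resource_type="count_files_gene_or_exon", genomic_unit="gene")
print([r.url for r in genes.resources])  # ['proj.gene.gz']
```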

See the Tutorial for a complete walkthrough.

Quickstart

Python API

Assemble a project into a RangedSummarizedExperiment (requires the recount3[biocpy] extra):

>>> from recount3 import create_rse
>>>
>>> rse = create_rse(
...     project="SRP009615",
...     organism="human",
...     annotation_label="gencode_v26",
... )
>>> rse.shape
(63856, 12)

For multi-project or custom workflows, use the bundle layer to filter resources and stack matrices directly:

>>> from recount3 import R3ResourceBundle
>>>
>>> bundle = R3ResourceBundle.discover(
...     organism="human",
...     data_source="sra",
...     project="SRP009615",
... )
>>> print(f"Found {len(bundle.resources)} resources.")
Found 10 resources.
>>>
>>> gene_counts = bundle.filter(
...     resource_type="count_files_gene_or_exon",
...     genomic_unit="gene",
... ).stack_count_matrices(compat="feature")
>>> gene_counts.shape
(63856, 12)
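Conceptually, stacking per-project gene matrices means aligning them on feature IDs and concatenating their sample columns. The pandas sketch below assumes that compat="feature" means every matrix must share the same feature set. That interpretation, the stack_on_features name, and the toy IDs are assumptions; they do not describe the library's implementation.

```python
import pandas as pd


def stack_on_features(matrices: list[pd.DataFrame]) -> pd.DataFrame:
    """Column-stack features-by-samples matrices that share one feature set."""
    if not matrices:
        raise ValueError("need at least one matrix")
    reference = matrices[0].index
    for m in matrices[1:]:
        # Require identical feature IDs; row order is allowed to differ.
        if set(m.index) != set(reference) or len(m.index) != len(reference):
            raise ValueError("feature sets differ; refusing to stack")
    # Reorder every matrix to the first one's row order, then append samples.
    stacked = pd.concat([m.loc[reference] for m in matrices], axis=1)
    if stacked.columns.duplicated().any():
        raise ValueError("duplicate sample IDs across matrices")
    return stacked


a = pd.DataFrame({"S1": [1, 2]}, index=["ENSG_A", "ENSG_B"])
b = pd.DataFrame({"S2": [6, 5]}, index=["ENSG_B", "ENSG_A"])
print(stack_on_features([a, b]))
```

Rejecting mismatched feature sets outright, instead of taking an inner or outer join, avoids silently dropping genes or introducing missing values.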

Command-line tool

Discover resources, write a JSONL manifest, and download in parallel:

# Search for gene-level count files and write a manifest.
recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP009615 \
    --format=jsonl > manifest.jsonl

# Materialize all resources from the manifest (8 parallel jobs).
recount3 download --from=manifest.jsonl --dest=./downloads --jobs=8

Because both subcommands operate on JSONL via standard streams, search and download compose into a single pipeline without an intermediate file:

recount3 search annotations \
    organism=human genomic_unit=gene annotation_extension=G026 \
    --format=jsonl | \
recount3 download --from=- --dest=./annotations
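Because the manifests are JSONL (one JSON object per line), they can be inspected with the standard library before anything is downloaded. The sketch below reads records from any iterable of lines. It makes no assumption about field names and simply counts the top-level keys it finds. The helper names and the record keys in the test data are hypothetical. Passing sys.stdin as the line source mirrors the --from=- convention used above.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-blank line of a JSONL stream."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, such as a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON") from exc


def key_counts(records: Iterable[dict]) -> Counter:
    """Count how often each top-level key appears across records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

For example, with open("manifest.jsonl", encoding="utf-8") as fh: print(key_counts(read_jsonl(fh))). Alternatively, pipe search output into a script that calls key_counts(read_jsonl(sys.stdin)).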

The bundle subcommands assemble analysis-ready outputs without a Python session. Supported outputs are a stacked count matrix (TSV, gzip-compressed TSV, or Parquet) and a pickled SummarizedExperiment or RangedSummarizedExperiment:

recount3 bundle rse --from=manifest.jsonl --genomic-unit=gene --out=rse.pkl

Note: Read the full documentation on Pages for the complete API reference, the CLI guide, and worked examples.

Data mirrors

The recount3 data repository publishes the same relative file layout on several interchangeable public mirrors. The recount3 package targets this layout rather than any single host, so switching mirrors requires only a change to the base URL, which can be set through the RECOUNT3_URL environment variable, the --base-url CLI flag, or the base_url field of recount3.config.Config:

Mirror                          Base URL
Duffel load balancer (default)  http://duffel.rail.bio/recount3/
AWS Open Data                   https://recount-opendata.s3.amazonaws.com/recount3/release/
JHU IDIES (Dataverse)           https://data.idies.jhu.edu/recount3/data/

Dependencies

Core (installed automatically):

numpy>=2.0
pandas>=2.2
scipy>=1.13

Optional: BiocPy integration (recount3[biocpy]):

biocframe>=0.7
genomicranges>=0.8
summarizedexperiment>=0.6

Optional: BigWig support (recount3[bigwig]):

pybigwig>=0.3.18
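To confirm that an environment meets the minimum versions listed above, you can query the installed versions with importlib.metadata from the standard library. The sketch below is a hypothetical helper and is not part of recount3. It uses a simplified numeric comparison; packaging.version.Version implements the full PEP 440 version grammar and is the more robust choice.

```python
from importlib import metadata

# Minimum versions as listed in this section.
MINIMUMS = {
    "numpy": "2.0",
    "pandas": "2.2",
    "scipy": "1.13",
    "biocframe": "0.7",
    "genomicranges": "0.8",
    "summarizedexperiment": "0.6",
    "pybigwig": "0.3.18",
}


def numeric_prefix(version: str) -> tuple[int, ...]:
    """Parse the leading dotted integers, e.g. '2.2.3rc1' -> (2, 2, 3)."""
    parts: list[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at the first pre-release or local-version suffix
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare numeric prefixes after zero-padding so '2' equals '2.0'."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_dependencies(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Map each distribution name to 'ok', 'too old (x)', or 'missing'."""
    report: dict[str, str] = {}
    for dist, minimum in minimums.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "missing"
            continue
        report[dist] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report
```

Optional extras that are not installed show up as "missing", which is expected on a core-only installation.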

Questions, Feature Requests, and Bug Reports

Please submit questions, feature requests, and bug reports on Issues.

License

This package is distributed under The Clear BSD License with Extra Clause.


Download files


Source Distribution

recount3-1.1.0.tar.gz (172.0 kB)

Uploaded Source

Built Distribution


recount3-1.1.0-py3-none-any.whl (109.5 kB)

Uploaded Python 3

File details

Details for the file recount3-1.1.0.tar.gz.

File metadata

  • Download URL: recount3-1.1.0.tar.gz
  • Size: 172.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for recount3-1.1.0.tar.gz
Algorithm Hash digest
SHA256 f35f5599b4f5477f23ac773574450e3ab852f2f7dc8ac09d8ee9bb85986d281c
MD5 3b18fb21a51cd1cf6131895d2b3e9db3
BLAKE2b-256 7bddca5df645a80542b8c6b622ef6f43b511ebba149249b13682327eb8437190


File details

Details for the file recount3-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: recount3-1.1.0-py3-none-any.whl
  • Size: 109.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for recount3-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ea47ffe34a4e1174ada825dd922af2e5c8ad00066538a2478bd9d069bd00f4a8
MD5 cc284e219a1ce50b74d9ec8c542fc298
BLAKE2b-256 eaf1c475e252548ec803243b3a452f6534702944c8a8c140e3348c81f1581d17

