Mitochondrial ribosome profiling analysis pipeline.

MitoRiboPy

MitoRiboPy is a Python package for mitochondrial ribosome profiling (mt-Ribo-seq) analysis. Starting in v0.3.0 it spans the full pipeline from raw FASTQ through translation-efficiency integration with paired RNA-seq:

mitoribopy align — FASTQ → BAM → BED6 + per-sample read counts (cutadapt + bowtie2 + umi_tools + pysam)
mitoribopy rpf — BED/BAM → offsets, translation-profile, codon usage, coverage plots
mitoribopy rnaseq — DE table (DESeq2 / Xtail / Anota2Seq) + rpf outputs → TE and ΔTE tables + plots, SHA256 reference-consistency gate
mitoribopy all — end-to-end orchestrator with a shared config file and a composed run_manifest.json

Highlights

Subcommand CLI (align / rpf / rnaseq / all) with shared --config, --dry-run, --threads, --log-level
Config files in JSON, YAML, or TOML (auto-detected by path suffix)
Kit-aware FASTQ trimming: truseq_smallrna, nebnext_smallrna, nebnext_ultra_umi, qiaseq_mirna, or explicit --adapter
Adapter auto-detection (--adapter-detection auto, default): scans the head of the FASTQ against every known preset and either picks the matching one or (in strict mode) hard-fails on mismatch — catches the silent failure where a wrong --kit-preset drops ~99% of reads as "too long"
Strand-aware mt-transcriptome alignment (--library-strandedness forward by default) so ND5 / ND6 antisense overlap is resolved by construction on Path A (transcriptome reference)
Deduplication safe by default: --dedup-strategy auto picks UMI-aware when UMIs are present and skips otherwise; mark-duplicates is behind a long confirmation flag because coordinate-only dedup destroys codon-occupancy signal on low-complexity mt-Ribo-seq libraries
BAM input to rpf via pysam (no samtools / bedtools PATH dependency)
SHA256 reference-consistency gate on rnaseq: Ribo-seq and RNA-seq sides must be aligned to the identical transcript reference; mismatches are a hard fail
Strain presets (-s h / -s y / -s vm / -s ym / -s custom): human + yeast ship a built-in annotation; vm / ym / custom pick up the matching codon table but require user-supplied --annotation_file and an explicit -rpf range
Footprint-class defaults (--footprint_class monosome|disome|custom): monosome uses the canonical 28-34 nt (vertebrate) / 37-41 nt (yeast) RPF window; disome widens to 60-90 nt / 65-95 nt for collided-ribosome studies
End-specific 5'/3' offset selection, P-site vs A-site workflows, bicistronic ATP8/ATP6 and ND4L/ND4 handling
Custom organism support via --annotation_file, --codon_tables_file, --codon_table_name, --start_codons
Persistent per-run logging in <output>/mitoribopy.log
Consistent terminal + file progress reporting for align and rpf
Provenance: every stage writes a run_settings.json; mitoribopy all composes them into run_manifest.json

Installation

From the repository root:

python -m pip install -e .

For development and tests:

python -m pip install -e ".[dev]"

Then confirm the CLI is available:

mitoribopy --help

If you prefer not to install the package yet:

PYTHONPATH=src python -m mitoribopy --help

Quick Start

Starting a new project (zero to working config)

# 1. Conda env with cutadapt / bowtie2 / umi_tools / samtools / pysam.
conda env create -f docs/environment/environment.yml
conda activate mitoribopy

# 2. Drop a working YAML template next to your data and fill in the paths.
mitoribopy all --print-config-template > pipeline_config.yaml

# 3. Inspect a stage's flag list without running it.
mitoribopy all --show-stage-help align
mitoribopy all --show-stage-help rpf

# 4. Dry-run to see the resolved argv per stage, then actually run.
mitoribopy all --config pipeline_config.yaml --output results/ --dry-run
mitoribopy all --config pipeline_config.yaml --output results/

Strain presets

`-s`	Organism / codon table	Ships annotation?	Ships `-rpf` default?
`h`	Human mt (`vertebrate_mitochondrial`)	yes	yes (28-34 nt monosome)
`y`	Yeast mt (`yeast_mitochondrial`)	yes	yes (37-41 nt monosome)
`vm`	Any vertebrate mt (`vertebrate_mitochondrial`)	no	no — pass `--annotation_file` + `-rpf`
`ym`	Any fungus with yeast-mito code (`yeast_mitochondrial`)	no	no — pass `--annotation_file` + `-rpf`
`custom`	Fully user-specified	no	no — also requires `--codon_tables_file` or `--codon_table_name`

Pair -s with --footprint_class:

`--footprint_class`	RPF window default	`--unfiltered_read_length_range` default	Use for
`monosome` (default)	h/vm: 28-34, y/ym: 37-41	15-50	Standard single-ribosome footprints
`disome`	h/vm: 60-90, y/ym: 65-95	40-110	Collided-ribosome studies (e.g. eIF5A depletion, stalling)
`custom`	user must pass `-rpf`	unchanged	Any non-standard footprint class

An explicit -rpf MIN MAX or --unfiltered_read_length_range MIN MAX always wins over the footprint-class default.
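
The precedence rule above can be sketched as a small lookup. This is an illustration only: `resolve_rpf_window` is a hypothetical helper, not MitoRiboPy's internal function, and the windows are copied from the footprint-class table. One assumption to note: the strain table asks vm / ym users to pass -rpf explicitly, while the footprint-class table lists windows for them; the sketch follows the footprint-class table.

```python
# Windows copied from the footprint-class table; the helper itself is hypothetical.
RPF_DEFAULTS = {
    "monosome": {"h": (28, 34), "vm": (28, 34), "y": (37, 41), "ym": (37, 41)},
    "disome": {"h": (60, 90), "vm": (60, 90), "y": (65, 95), "ym": (65, 95)},
}


def resolve_rpf_window(strain, footprint_class="monosome", explicit=None):
    """Return (min, max) read length; an explicit range always wins."""
    if explicit is not None:
        low, high = explicit
        if low > high:
            raise ValueError("-rpf MIN must not exceed MAX")
        return (low, high)
    try:
        return RPF_DEFAULTS[footprint_class][strain]
    except KeyError:
        # Covers --footprint_class custom and -s custom: no default exists.
        raise ValueError(
            f"no default window for strain={strain!r}, "
            f"footprint_class={footprint_class!r}; pass -rpf MIN MAX"
        ) from None


print(resolve_rpf_window("h"))                              # class default
print(resolve_rpf_window("h", "disome", explicit=(29, 34))) # explicit wins
```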

`mitoribopy rpf` — BED/BAM through the analysis pipeline

mitoribopy rpf \
  -s h \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 29 34 \
  --output <results_dir>

Plain mitoribopy <flags> still works in v0.3.x but routes to rpf with a deprecation warning. Use the explicit subcommand form.

`mitoribopy align` — FASTQ → BAM + BED

mitoribopy align \
  --kit-preset nebnext_smallrna \
  --library-strandedness forward \
  --fastq-dir <fastqs_dir> \
  --contam-index <bowtie2_rRNA_index_prefix> \
  --mt-index <bowtie2_mt_transcriptome_index_prefix> \
  --output <align_results_dir>

Use --kit-preset custom --adapter <SEQ> when your library isn't one of the built-in presets. External tools (cutadapt, bowtie2, umi_tools) must be on $PATH; see docs/environment/environment.yml for a ready-made bioconda env.

`mitoribopy rnaseq` — DE table + rpf → TE / ΔTE

mitoribopy rnaseq \
  --de-table <deseq2_or_xtail_or_anota2seq_output.tsv> \
  --gene-id-convention hgnc \
  --ribo-dir <rpf_results_dir> \
  --reference-gtf <shared_reference.fa> \
  --condition-map <samples_to_conditions.tsv> \
  --condition-a control --condition-b knockdown \
  --output <rnaseq_results_dir>

--gene-id-convention is required (no default). The reference-consistency gate will hard-fail unless the hash of --reference-gtf matches the hash the prior rpf run recorded.
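
To see what such a gate amounts to, here is a minimal sketch of a SHA256 comparison using only the standard library. The function names are hypothetical, and how MitoRiboPy stores the recorded hash (which key, in which file) is not shown here; the sketch simply takes the recorded digest as an argument.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hash an iterable of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large references are not loaded whole."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def assert_same_reference(actual_hash, recorded_hash):
    """Hard-fail when the two sides were not aligned to the same reference."""
    if actual_hash.lower() != recorded_hash.lower():
        raise ValueError(
            f"reference mismatch: got {actual_hash}, recorded {recorded_hash}"
        )
```

Comparing digests rather than file paths means a renamed or relocated copy of the same reference still passes, while an edited file with the same name does not.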

`mitoribopy all` — end-to-end orchestrator

mitoribopy all --config pipeline_config.yaml --output <run_root>

Here pipeline_config.yaml has align:, rpf:, and optional rnaseq: sections; each section's keys correspond to that subcommand's CLI flag names. See docs/tutorials/01_end_to_end_fastq.md for a worked example.
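
Because the config format is auto-detected by path suffix (JSON, YAML, or TOML), a loader can be as simple as a suffix dispatch. The sketch below is an assumption about how such detection might look, not MitoRiboPy's own loader; `parse_config`, `load_config`, and `stage_sections` are hypothetical names, the YAML branch assumes PyYAML is installed, and the TOML branch assumes Python 3.11+ for `tomllib`.

```python
import json
from pathlib import Path

STAGES = ("align", "rpf", "rnaseq")


def parse_config(text, suffix):
    """Parse config text with the parser implied by the file suffix."""
    suffix = suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix == ".toml":
        import tomllib  # standard library from Python 3.11
        return tomllib.loads(text)
    if suffix in {".yaml", ".yml"}:
        import yaml  # PyYAML, imported lazily so JSON/TOML work without it
        return yaml.safe_load(text)
    raise ValueError(f"unsupported config suffix: {suffix!r}")


def load_config(path):
    """Read a config file and dispatch on its suffix."""
    path = Path(path)
    return parse_config(path.read_text(encoding="utf-8"), path.suffix)


def stage_sections(config):
    """List which of the align / rpf / rnaseq sections are present."""
    return [stage for stage in STAGES if stage in config]
```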

Useful details for mitoribopy all:

mitoribopy all --help shows only orchestrator-level flags. For full stage help, use:
- mitoribopy all --show-stage-help align
- mitoribopy all --show-stage-help rpf
- mitoribopy all --show-stage-help rnaseq
When align and rpf both run, all auto-wires:
- rpf.directory -> <run_root>/align/bed
- rpf.read_counts_file -> <run_root>/align/read_counts.tsv
When rpf and rnaseq both run, all auto-wires:
- rnaseq.ribo_dir -> <run_root>/rpf
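
The auto-wiring rules listed above reduce to a few path joins. A minimal sketch, assuming the stage names and paths exactly as listed; `auto_wire` is a hypothetical helper, not part of the package API:

```python
from pathlib import Path


def auto_wire(run_root, stages):
    """Return the settings that `all` would fill in for the given stages."""
    root = Path(run_root)
    stages = set(stages)
    wired = {}
    if {"align", "rpf"} <= stages:
        wired["rpf.directory"] = root / "align" / "bed"
        wired["rpf.read_counts_file"] = root / "align" / "read_counts.tsv"
    if {"rpf", "rnaseq"} <= stages:
        wired["rnaseq.ribo_dir"] = root / "rpf"
    return wired


for key, value in auto_wire("results", ["align", "rpf", "rnaseq"]).items():
    print(f"{key} -> {value}")
```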

Logs and progress

mitoribopy align writes <output>/mitoribopy.log and emits per-sample stage updates for trim, contaminant filtering, mt alignment, MAPQ filtering, deduplication, and BED export.
mitoribopy rpf writes <output>/mitoribopy.log and emits numbered pipeline-step progress plus downstream plotting/profile progress.
The same status lines are written to both the terminal and the log file.

Built-In References

MitoRiboPy ships with packaged reference data for:

Human mitochondrial translation using the vertebrate_mitochondrial codon table
Yeast mitochondrial translation using the yeast_mitochondrial codon table

Built-in annotation tables are stored as CSV and built-in codon tables are stored as JSON under src/mitoribopy/data.

For bicistronic transcript regions:

Titles stay consistent as ATP8/ATP6 and ND4L/ND4
The default sequence baselines are ATP6 and ND4
You can switch them with --atp8_atp6_baseline ATP8|ATP6 and --nd4l_nd4_baseline ND4L|ND4

Legacy FASTA/BED identifiers such as ATP86 and ND4L4 are still recognized through built-in aliases.

Custom Organisms

Custom organisms are supported through:

--annotation_file
--codon_tables_file
--codon_table_name
--start_codons

For --strain custom, provide an explicit RPF range as well:

mitoribopy rpf \
  -s custom \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 28 34 \
  --annotation_file examples/custom_reference/annotation_template.csv \
  --codon_tables_file examples/custom_reference/codon_tables_template.json \
  --codon_table_name custom_example \
  --start_codons ATG GTG \
  --output <results_dir>

Example templates are included under examples/custom_reference/ (annotation_template.csv and codon_tables_template.json).

CLI Parameters

Required parameters

-f, --fasta: reference FASTA

Usually required for a normal run

These are not all technically mandatory in the parser, but they are the recommended minimum for a reproducible run:

-s, --strain
--directory
-rpf <min> <max>
--output

Additional required parameters for `--strain custom`

--annotation_file
--codon_tables_file or --codon_table_name
-rpf <min> <max>

Common optional parameters

--align start|stop
--offset_type 5|3
--offset_site p|a
--offset_pick_reference p_site|selected_site
--min_5_offset, --max_5_offset
--min_3_offset, --max_3_offset
--offset_mask_nt
--read_counts_file
--read_counts_sample_col
--read_counts_reference_col
--read_counts_reads_col
--unfiltered_read_length_range <min> <max>
--rpm_norm_mode total|mt_mrna
--plot_format png|pdf|svg
-m, --merge_density
--structure_density
--cor_plot
--use_rna_seq

Example Usage

Human or yeast with default-style analysis

mitoribopy rpf \
  -s h \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 29 34 \
  --align stop \
  --offset_type 5 \
  --offset_site p \
  --offset_pick_reference p_site \
  --offset_mask_nt 5 \
  --min_5_offset 10 \
  --max_5_offset 22 \
  --min_3_offset 10 \
  --max_3_offset 22 \
  --plot_format svg \
  --output <results_dir> \
  -m

Run with read-count normalization

mitoribopy rpf \
  -s h \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 29 34 \
  --read_counts_file <read_counts.csv> \
  --read_counts_sample_col sample \
  --read_counts_reads_col reads \
  --read_counts_reference_col reference \
  --rpm_norm_mode mt_mrna \
  --mrna_ref_patterns mt_genome \
  --output <results_dir>

Inspect broader read-length QC ranges

mitoribopy rpf \
  -s h \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 29 34 \
  --unfiltered_read_length_range 15 60 \
  --output <results_dir>

This keeps the filtered analysis range at 29-34 nt while broadening the unfiltered QC tables and heatmaps so longer footprints remain visible.

Run optional downstream modules

mitoribopy rpf \
  -s h \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 29 34 \
  --structure_density \
  --cor_plot \
  --base_sample <sample_name> \
  --output <results_dir>

Run a custom organism

mitoribopy rpf \
  -s custom \
  -f <reference.fa> \
  --directory <ribo_bed_dir> \
  -rpf 28 34 \
  --annotation_file <annotation.csv> \
  --codon_tables_file <codon_tables.json> \
  --codon_table_name <table_name> \
  --start_codons ATG GTG \
  --output <results_dir>

Input Files

BED

Expected columns:

chrom
start
end

Additional BED columns are tolerated. Coordinates are treated as standard 0-based, end-exclusive intervals.
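
With 0-based, end-exclusive coordinates, a read's length is simply end - start. The sketch below shows that arithmetic and a length filter in the spirit of -rpf MIN MAX; `filter_bed_by_length` is a hypothetical helper and the sequence names in the example are made up.

```python
def filter_bed_by_length(bed_lines, rpf_min, rpf_max):
    """Keep BED records whose length (end - start) falls in [rpf_min, rpf_max]."""
    kept = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end, *_extra = line.split("\t")  # extra columns tolerated
        start, end = int(start), int(end)
        length = end - start  # end-exclusive, so no +1
        if rpf_min <= length <= rpf_max:
            kept.append((chrom, start, end, length))
    return kept


example = [
    "tx_a\t0\t30\tread1\t0\t+",  # 30 nt, BED6
    "tx_a\t10\t60",              # 50 nt, BED3
    "tx_b\t5\t34",               # 29 nt, BED3
]
print(filter_bed_by_length(example, 29, 34))
```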

FASTA

FASTA headers should match the annotation sequence_name or one of its sequence_aliases.

Annotation CSV

Required columns:

transcript
l_tr
l_utr5
l_utr3

Optional columns:

l_cds
sequence_name
sequence_aliases
display_name

Meaning:

transcript is the logical CDS name used in frame and codon outputs
sequence_name is the FASTA/BED sequence ID that the row maps onto
sequence_aliases contains alternate FASTA/BED names separated by semicolons
display_name controls plot titles and grouped transcript labels

If l_cds is omitted, it is computed as l_tr - l_utr5 - l_utr3.
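
A minimal sketch of reading such an annotation with the standard csv module, filling in l_cds when it is absent and splitting sequence_aliases on semicolons. `load_annotation` is a hypothetical helper and the row values in the example are invented, not taken from the packaged annotation.

```python
import csv
import io

REQUIRED = ("transcript", "l_tr", "l_utr5", "l_utr3")


def load_annotation(csv_text):
    """Parse annotation rows; compute l_cds = l_tr - l_utr5 - l_utr3 if missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"annotation is missing required columns: {missing}")
    rows = []
    for row in reader:
        l_tr, l_utr5, l_utr3 = (int(row[k]) for k in ("l_tr", "l_utr5", "l_utr3"))
        raw_cds = (row.get("l_cds") or "").strip()
        row["l_cds"] = int(raw_cds) if raw_cds else l_tr - l_utr5 - l_utr3
        aliases = row.get("sequence_aliases") or ""
        row["sequence_aliases"] = [a.strip() for a in aliases.split(";") if a.strip()]
        rows.append(row)
    return rows


text = "transcript,l_tr,l_utr5,l_utr3,sequence_aliases\nGENE1,100,10,6,old_a;old_b\n"
print(load_annotation(text)[0]["l_cds"])
```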

Codon-Table JSON

Two formats are supported:

One flat 64-codon mapping
A dictionary of named 64-codon mappings

When multiple named tables are present, choose one with --codon_table_name.
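
The two layouts can be told apart by looking at the values: a flat table maps codons to strings, while a named collection maps names to nested tables. The sketch below assumes each codon maps to a single amino-acid string, which this README does not spell out; `select_codon_table` is a hypothetical helper, not the package's loader.

```python
import json


def select_codon_table(json_text, name=None):
    """Return one 64-codon mapping from either supported JSON layout."""
    data = json.loads(json_text)
    if all(isinstance(value, str) for value in data.values()):
        table = data  # flat layout: codon -> amino acid
    else:
        if name is None:
            if len(data) != 1:
                raise ValueError("several named tables found; choose one by name")
            name = next(iter(data))  # a single named table is unambiguous
        table = data[name]
    if len(table) != 64:
        raise ValueError(f"expected 64 codons, found {len(table)}")
    return table
```

Passing `name` plays the role of --codon_table_name when the file holds more than one table.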

Read-Count Table

.csv, .tsv, and .txt are supported. Column matching is flexible and case-insensitive, with fallback to positional matching:

first column: sample
second column: reference
third column: read count
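
One way to picture name-based matching with a positional fallback is the sketch below. It is an assumption about the behavior, not the package's matching code: `resolve_columns` is hypothetical, and the default names sample / reference / reads are taken from the example command earlier in this README.

```python
def resolve_columns(header, sample="sample", reference="reference", reads="reads"):
    """Map each role to a column index: case-insensitive name match first,
    then fall back to position (0 = sample, 1 = reference, 2 = reads)."""
    lowered = [name.strip().lower() for name in header]
    wanted = [("sample", sample), ("reference", reference), ("reads", reads)]
    resolved = {}
    for position, (role, column_name) in enumerate(wanted):
        try:
            resolved[role] = lowered.index(column_name.lower())
        except ValueError:
            resolved[role] = position  # positional fallback
    return resolved


print(resolve_columns(["Reads", "SAMPLE", "Reference"]))
print(resolve_columns(["col_a", "col_b", "col_c"]))
```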

Output Overview

Typical output structure:

<output>/
  mitoribopy.log
  plots_and_csv/
  <sample>/
    footprint_density/
    translating_frame/
    codon_usage/
    debug_csv/
  coverage_profile_plots/
  structure_density/      # if --structure_density
  codon_correlation/      # if --cor_plot
  rna_seq_results/        # if --use_rna_seq

Key outputs include:

offset enrichment CSVs and plots
selected offset tables by read length
footprint-density CSVs for P-site, A-site, and E-site
frame-usage summaries
transcript-level and total codon-usage summaries
RPM and raw coverage-profile plots
CDS-aware codon-binned coverage plots (*_codon/, 3 nt combined per codon)
optional structure-density exports from footprint-density tables
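
As an illustration of codon binning, the sketch below sums per-nucleotide coverage in groups of three across the CDS. It assumes "combined" means summed and that the CDS starts right after the 5' UTR; `bin_by_codon` is a hypothetical helper, not the plotting code itself.

```python
def bin_by_codon(nt_coverage, l_utr5, l_cds):
    """Sum per-nucleotide coverage into one value per codon of the CDS."""
    cds = nt_coverage[l_utr5:l_utr5 + l_cds]
    n_codons = len(cds) // 3  # ignore a trailing partial codon, if any
    return [sum(cds[3 * i:3 * i + 3]) for i in range(n_codons)]


# Two UTR positions, a 6 nt CDS (two codons), one trailing position.
print(bin_by_codon([9, 9, 1, 2, 3, 4, 5, 6, 7], l_utr5=2, l_cds=6))
```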

Important Runtime Notes

--offset_type 5|3: downstream site placement from the read 5' or 3' end
--offset_site p|a: whether reported offsets represent P-site or A-site positions
--offset_pick_reference p_site|selected_site: how the best offset is chosen
--min_5_offset, --max_5_offset, --min_3_offset, --max_3_offset: recommended end-specific selection bounds
--offset_mask_nt: mask near-anchor bins from enrichment summaries and plots
--rpm_norm_mode total|mt_mrna: read-count normalization mode
--structure_density: export log2 and scaled density values from footprint-density tables

For the full interface, run:

mitoribopy --help

Troubleshooting

~99% of reads disappear at the trim step: post_trim is a tiny fraction of total_reads in read_counts.tsv. The named --kit-preset has the wrong 3' adapter for your library. mitoribopy align runs adapter detection by default (--adapter-detection auto); the [ADAPTER] INFO line in the log tells you which preset the data actually looks like. Re-run with that preset, or add --adapter-detection strict to fail fast on the mismatch instead of silently continuing.
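
A quick way to spot this failure is to compute post_trim / total_reads per sample. The sketch below assumes read_counts.tsv is tab-separated with a header that includes sample, total_reads, and post_trim columns; only the last two names appear in this README, so the sample column name, the 5% threshold, and `low_trim_retention` itself are assumptions for illustration.

```python
import csv
import io


def low_trim_retention(tsv_text, threshold=0.05):
    """Return {sample: fraction kept} for samples below the retention threshold."""
    flagged = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        total = int(row["total_reads"])
        kept = int(row["post_trim"])
        fraction = kept / total if total else 0.0
        if fraction < threshold:
            flagged[row["sample"]] = fraction
    return flagged


table = "sample\ttotal_reads\tpost_trim\nlib_a\t1000\t10\nlib_b\t1000\t900\n"
print(low_trim_retention(table))
```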

Filtered BED is empty → "no data remained after BED filtering". Either your RPF range does not cover the actual read-length distribution (open the per-sample *_read_length_distribution.svg; the shaded band shows the currently selected window) or every mapped read has been filtered earlier by MAPQ / contaminant subtraction. Widen -rpf MIN MAX, try --footprint_class disome if you are studying collided ribosomes, or lower --mapq.

Offset selection produced no rows → p_site_offsets_*.csv is empty. The --min_5_offset / --max_5_offset window (default 10-22 nt) did not overlap the enrichment peak. Re-open the offset_enrichment_heatmap_*.svg and widen the window explicitly.
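
To see why a window that misses the peak yields no rows, consider this simplified sketch: it picks the most enriched offset inside the window and returns nothing when no counted offset falls there. It is not MitoRiboPy's actual selection algorithm; `pick_offset` and the example counts are hypothetical, and only the 10-22 nt default window comes from the text above.

```python
def pick_offset(enrichment, min_offset=10, max_offset=22):
    """Return the offset with the highest count inside the window, or None."""
    window = {
        offset: count
        for offset, count in enrichment.items()
        if min_offset <= offset <= max_offset and count > 0
    }
    if not window:
        return None  # the peak lies outside the window: nothing to select
    return max(window, key=window.get)


print(pick_offset({8: 100, 12: 40, 15: 70}))         # peak at 8 is ignored
print(pick_offset({5: 100, 30: 20}))                  # window misses everything
print(pick_offset({5: 100, 30: 20}, min_offset=3))    # widened window
```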

RPM is 0 for every sample. Either --read_counts_file was not passed, or the file does not contain entries for the sample name(s) the pipeline inferred from the BED filenames. The [QC] WARNING: No total read-count entry found for sample(s): ... log line lists the samples that missed the lookup. Add a matching row to the counts file and re-run.
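
For orientation, reads per million is count × 1,000,000 / total reads for the sample, so a missing total leaves nothing to divide by. The sketch below mirrors the symptom described above by reporting 0 for samples without a total and listing them; `rpm_by_sample` is a hypothetical helper and the sample names are made up.

```python
def rpm_by_sample(counts, totals):
    """Compute reads per million per sample; collect samples with no total."""
    rpm = {}
    missing = []
    for sample, count in counts.items():
        total = totals.get(sample)
        if not total:
            missing.append(sample)  # no lookup hit: nothing to normalize by
            rpm[sample] = 0.0
        else:
            rpm[sample] = count * 1_000_000 / total
    return rpm, missing


values, missing = rpm_by_sample({"s1": 50, "s2": 5}, {"s1": 1_000_000})
print(values, missing)
```

If the missing list is non-empty, compare its names against the sample column of the counts file; a mismatch between BED filenames and table entries is the usual cause.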

--show-stage-help output is too dense. It is the full argparse --help for that stage. Pair it with mitoribopy all --print-config-template to get a pre-populated YAML and only override the keys you care about.

Reference-consistency gate failure in rnaseq. The reference you passed via --reference-gtf does not hash-match the one the prior rpf stage recorded in its run_settings.json. Re-align both sides against the identical transcript set.

Development

Run the test suite with:

PYTHONPATH=src pytest

This repository also includes package migration notes and release materials under docs/README.md.

License

MIT. See LICENSE.



