
Metagene Profiling Analysis and Visualization


Metagene



A Python package for metagene analysis of genomic sites. It maps each site onto gene regions (5'UTR, CDS, 3'UTR), normalizes its position along the transcript, and plots the resulting distribution as a publication-ready metagene profile.

Installation

Install metagene using pip:

pip install metagene

Or using uv:

uv add metagene

Quick Start

Command Line Interface

Basic metagene analysis using a built-in reference:

# Using built-in human genome reference (GRCh38)
metagene -i sites.tsv.gz -r GRCh38 --with-header -m 1,2,3 -w 5 \
         -o output.tsv -s scores.tsv -p plot.png

Using a custom GTF file:

# Using custom GTF annotation
metagene -i sites.bed -g custom.gtf.gz -m 1,2,3 -w 5 \
         -o output.tsv -s scores.tsv -p plot.png

Python API

import polars as pl
from metagene import (
    load_sites, load_reference, load_gtf, map_to_transcripts,
    normalize_positions, simple_metagene_plot,
)

# Load your genomic sites
sites_df = load_sites("sites.tsv.gz", with_header=True, meta_col_index=[0, 1, 2])

# Load reference genome annotation
reference = load_reference("GRCh38")  # or load_gtf("custom.gtf.gz")

# Perform metagene analysis
annotated_df = map_to_transcripts(sites_df, reference)
final_df, gene_splits = normalize_positions(annotated_df, strategy="median")

# Generate plot
simple_metagene_plot(final_df, gene_splits, "metagene_plot.png")

print(f"Analyzed {len(final_df)} sites")
print(f"Gene splits - 5'UTR: {gene_splits[0]:.3f}, CDS: {gene_splits[1]:.3f}, 3'UTR: {gene_splits[2]:.3f}")

Input Formats

TSV Format (Tab-separated values)

ref	pos	strand	score	pvalue
chr1	1000000	+	0.85	0.001
chr1	2000000	-	0.72	0.005

BED Format

chr1	999999	1000000	score1	0.85	+
chr1	1999999	2000000	score2	0.72	-
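The two examples describe the same sites: BED intervals are 0-based and half-open, so a single-base site at 1-based position 1000000 is written with start 999999 and end 1000000. If you want to convert such BED lines into the ref/pos/strand layout of the TSV example yourself, a minimal standard-library sketch might look like this. It assumes one single-base site per line, a numeric value in column 5, and the strand in column 6, as in the example above; `bed_to_sites` is a hypothetical helper, not part of metagene.

```python
import csv
import io


def bed_to_sites(lines):
    """Hypothetical helper: convert single-base BED records to (ref, pos, strand, value).

    BED start is 0-based and end is exclusive, so a one-base interval
    [start, end) with end == start + 1 covers the 1-based position `end`.
    """
    sites = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and track/browser header lines
        ref, start, end = row[0], int(row[1]), int(row[2])
        if end - start != 1:
            raise ValueError(f"expected a single-base interval, got {start}-{end}")
        value = float(row[4])  # column 5 in the example above
        strand = row[5]        # column 6 in the example above
        sites.append((ref, end, strand, value))  # 1-based position == BED end
    return sites


bed_text = (
    "chr1\t999999\t1000000\tscore1\t0.85\t+\n"
    "chr1\t1999999\t2000000\tscore2\t0.72\t-\n"
)
for site in bed_to_sites(io.StringIO(bed_text)):
    print(site)
```

Running it prints the same chromosome, 1-based position, and strand as the first three columns of the TSV example.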

Column Specification

  • Use -m/--meta-columns to specify the coordinate columns (1-based column numbers, comma-separated)
  • Use -w/--weight-columns to specify one or more score/weight columns (comma-separated, e.g. -w 5,6)
  • Use --with-header if the first line of your file is a header
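On the command line, -m/-w take 1-based column numbers, while the Python examples pass meta_col_index=[0, 1, 2] for what appear to be the same three columns. That suggests the Python argument is 0-based, although this is an inference from the examples rather than something stated here. The sketch below shows that conversion and how -m 1,2,3 -w 4 would pick ref/pos/strand and score from the TSV example. `parse_column_list` and `read_sites` are hypothetical illustrations, not metagene functions.

```python
import csv
import io


def parse_column_list(spec):
    """Turn a 1-based CLI-style list such as "1,2,3" into 0-based indices."""
    indices = [int(part) - 1 for part in spec.split(",") if part.strip()]
    if any(i < 0 for i in indices):
        raise ValueError(f"column numbers are 1-based; got {spec!r}")
    return indices


def read_sites(handle, meta_spec, weight_spec, with_header=False):
    """Hypothetical reader: yield (meta_fields, weights) for each data row."""
    meta_idx = parse_column_list(meta_spec)
    weight_idx = parse_column_list(weight_spec)
    reader = csv.reader(handle, delimiter="\t")
    if with_header:
        next(reader)  # drop the header line, as --with-header does
    for row in reader:
        meta = [row[i] for i in meta_idx]
        weights = [float(row[i]) for i in weight_idx]
        yield meta, weights


tsv_text = (
    "ref\tpos\tstrand\tscore\tpvalue\n"
    "chr1\t1000000\t+\t0.85\t0.001\n"
    "chr1\t2000000\t-\t0.72\t0.005\n"
)
print(parse_column_list("1,2,3"))  # [0, 1, 2]
for meta, weights in read_sites(io.StringIO(tsv_text), "1,2,3", "4", with_header=True):
    print(meta, weights)
```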

Built-in References

Metagene includes pre-processed gene annotations for major model organisms:

Species            Assembly       Reference name(s) for -r
Human              GRCh38/hg38    GRCh38, hg38
Human              GRCh37/hg19    GRCh37, hg19
Mouse              GRCm39/mm39    GRCm39, mm39
Mouse              GRCm38/mm10    GRCm38, mm10
Mouse              mm9/NCBIM37    mm9, NCBIM37
Arabidopsis        TAIR10         TAIR10
Rice               IRGSP-1.0      IRGSP-1.0
Model organisms    Various        dm6, ce11, WBcel235, sacCer3, etc.

Managing References

List all available references:

metagene --list

This lists all 23+ available references, grouped by species, for example:

Human:
  GRCh37          - Human genome GRCh37 (Ensembl release 75)
  GRCh38          - Human genome GRCh38 (Ensembl release 110)
  hg19            - Human genome hg19 (UCSC 2021)
  hg38            - Human genome hg38 (UCSC 2022)

Mouse:
  GRCm38          - Mouse genome GRCm38 (Ensembl release 102)
  GRCm39          - Mouse genome GRCm39 (Ensembl release 110)
  mm10            - Mouse genome mm10 (UCSC 2021)
  mm39            - Mouse genome mm39 (UCSC 2024)
  mm9             - Mouse genome mm9 (UCSC 2020)

... and more

Download a specific reference:

metagene --download GRCh38

Download all references (requires about 10 GB of disk space):

metagene --download all

CLI Examples

Basic Analysis

# Analyze sites with built-in human reference
metagene -i sites.tsv.gz -r GRCh38 --with-header \
         -m 1,2,3 -w 5 -o output.tsv -p plot.png

Note: References are automatically downloaded on first use.

Advanced Options

# Full analysis with custom parameters
metagene -i sites.bed -r GRCh38 \
         -m 1,2,3 -w 5,6 -n "score1,score2" \
         --bins 200 --region all \
         --score-transform log2 --normalize \
         -o annotated.tsv -s statistics.tsv -p metagene.pdf
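The --bins 200 option suggests that normalized positions are aggregated into a fixed number of bins before plotting. How metagene itself bins, transforms, and normalizes scores is not documented here, so the following is only an illustration of one common approach: a weighted histogram over [0, 1], optionally rescaled so the bins sum to 1. `binned_profile` and its parameters are hypothetical; the sketch does not model --score-transform and is not the package's --normalize implementation.

```python
def binned_profile(positions, weights, bins=200, normalize=False):
    """Hypothetical sketch: weighted histogram of metagene positions in [0, 1]."""
    if len(positions) != len(weights):
        raise ValueError("positions and weights must have the same length")
    counts = [0.0] * bins
    for pos, w in zip(positions, weights):
        if not 0.0 <= pos <= 1.0:
            continue  # ignore sites outside the normalized gene body
        # Position 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(pos * bins), bins - 1)
        counts[idx] += w
    if normalize:
        total = sum(counts)
        if total != 0:
            counts = [c / total for c in counts]
    return counts


profile = binned_profile([0.05, 0.12, 0.5, 0.51, 1.0], [1, 1, 2, 2, 4], bins=4)
print(profile)  # [2.0, 0.0, 4.0, 4.0]
```

Rescaling to a total of 1 makes profiles from datasets of different sizes comparable on one plot, at the cost of hiding absolute site counts.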

Custom GTF Reference

# Use your own GTF annotation
metagene -i sites.tsv.gz -g annotation.gtf.gz --with-header \
         -m 1,2,3 -w 4 -o output.tsv -p plot.png
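A GTF file stores exons and CDS segments as separate 1-based, inclusive records linked by a transcript_id attribute. The sketch below shows the kind of information a custom annotation has to provide for metagene analysis: exon coordinates plus the CDS extent per transcript. It is a minimal illustration using only the standard library, not metagene's own loader; `read_gtf_transcripts` is a hypothetical name, and it ignores genes, isoform selection, and malformed attributes.

```python
import gzip
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def read_gtf_transcripts(path):
    """Hypothetical sketch: collect exons and the CDS extent for each transcript."""
    opener = gzip.open if path.endswith(".gz") else open
    transcripts = defaultdict(lambda: {"chrom": None, "strand": None, "exons": [], "cds": None})
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header / comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue
            chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields[:9]
            if feature not in ("exon", "CDS"):
                continue
            tx_id = dict(ATTR_RE.findall(attrs)).get("transcript_id")
            if tx_id is None:
                continue
            tx = transcripts[tx_id]
            tx["chrom"], tx["strand"] = chrom, strand
            start, end = int(start), int(end)  # GTF coordinates are 1-based, inclusive
            if feature == "exon":
                tx["exons"].append((start, end))
            else:
                # GTF2.2 CDS records exclude the stop codon, so neither does this extent.
                lo, hi = tx["cds"] or (start, end)
                tx["cds"] = (min(lo, start), max(hi, end))
    return dict(transcripts)
```

Transcripts whose "cds" stays None are non-coding and have no 5'UTR/CDS/3'UTR split.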

API Reference

Core Functions

  • load_sites(file, with_header=False, meta_col_index=[0,1,2]) - Load genomic sites
  • load_reference(name) - Load built-in reference genome
  • load_gtf(file) - Load custom GTF annotation
  • map_to_transcripts(sites, reference) - Annotate sites with gene information
  • normalize_positions(annotated_sites, strategy="median") - Normalize to relative positions; returns the normalized sites and the gene region splits
  • simple_metagene_plot(data, gene_splits, output_file) - Generate metagene plot
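map_to_transcripts is described only as annotating sites with gene information. In metagene analysis that step usually means placing each site on a spliced transcript and deciding whether it falls in the 5'UTR, CDS, or 3'UTR, taking strand into account. The sketch below illustrates this for a single transcript. `Transcript` and `locate` are hypothetical names, and this is not necessarily how the package implements it (there is no interval indexing or isoform selection, for example).

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical minimal transcript model; coordinates are 1-based and inclusive."""
    chrom: str
    strand: str                   # "+" or "-"
    exons: list[tuple[int, int]]  # genomic (start, end) pairs
    cds_start: int                # lowest genomic coordinate of the CDS (must be exonic)
    cds_end: int                  # highest genomic coordinate of the CDS (must be exonic)

    def ordered_exons(self):
        # Transcript (5'->3') order: ascending on "+", descending on "-".
        return sorted(self.exons, reverse=(self.strand == "-"))

    def length(self):
        return sum(end - start + 1 for start, end in self.exons)

    def to_tx(self, g):
        """0-based offset of genomic position g along the spliced transcript, or None."""
        offset = 0
        for start, end in self.ordered_exons():
            if start <= g <= end:
                return offset + (g - start if self.strand == "+" else end - g)
            offset += end - start + 1
        return None


def locate(tx, chrom, pos, strand):
    """Return (region, offset_in_region, region_length), or None if the site is not exonic."""
    if chrom != tx.chrom or strand != tx.strand:
        return None
    t = tx.to_tx(pos)
    if t is None:
        return None
    # The CDS 5' end is the lowest coordinate on "+" and the highest on "-".
    first, last = (tx.cds_start, tx.cds_end) if tx.strand == "+" else (tx.cds_end, tx.cds_start)
    c5, c3 = tx.to_tx(first), tx.to_tx(last)
    if t < c5:
        return ("5'UTR", t, c5)
    if t <= c3:
        return ("CDS", t - c5, c3 - c5 + 1)
    return ("3'UTR", t - c3 - 1, tx.length() - c3 - 1)


plus = Transcript("chr1", "+", [(100, 199), (300, 399)], cds_start=150, cds_end=349)
print(locate(plus, "chr1", 120, "+"))  # ("5'UTR", 20, 50)
print(locate(plus, "chr1", 320, "+"))  # ('CDS', 70, 100)
print(locate(plus, "chr1", 250, "+"))  # None (intronic)
```

Measuring offsets along the spliced transcript rather than the genome keeps introns from stretching the profile.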

Analysis Workflow

from metagene import (
    load_sites, load_reference, map_to_transcripts,
    normalize_positions, simple_metagene_plot,
)

# 1. Load data
sites = load_sites("input.tsv", with_header=True, meta_col_index=[0, 1, 2])
reference = load_reference("GRCh38")

# 2. Annotate and normalize
annotated = map_to_transcripts(sites, reference)
normalized, splits = normalize_positions(annotated)

# 3. Visualize
simple_metagene_plot(normalized, splits, "output.png")

Demo

Metagene Profile

The plot shows the distribution of genomic sites across normalized gene regions:

  • 5'UTR (0.0 - first split): 5' untranslated region
  • CDS (first split - second split): Coding sequence
  • 3'UTR (second split - 1.0): 3' untranslated region
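One common way to build such an axis, and a plausible reading of strategy="median" (the package source has not been checked here), is to give each region a share of [0, 1] proportional to its median length across transcripts and to place each site linearly within its region's share; the split points are then the cumulative shares. `median_splits` and `normalize` are hypothetical helpers illustrating that idea, and they assume each site already carries its region, its offset within the region, and that region's length. Note that the Python API example above prints three gene_splits values, so the package's representation may differ from the two boundaries used here.

```python
from statistics import median

REGIONS = ("5'UTR", "CDS", "3'UTR")


def median_splits(region_lengths):
    """region_lengths: one (utr5_len, cds_len, utr3_len) tuple per transcript.

    Returns the two split points (end of 5'UTR, end of CDS) on a 0-1 axis,
    with each region's share proportional to its median length.
    """
    meds = [median(lengths[i] for lengths in region_lengths) for i in range(3)]
    total = sum(meds)
    return meds[0] / total, (meds[0] + meds[1]) / total


def normalize(region, offset, region_length, splits):
    """Map a site's offset within its region to a position on the 0-1 metagene axis."""
    bounds = (0.0, splits[0], splits[1], 1.0)
    i = REGIONS.index(region)
    lo, hi = bounds[i], bounds[i + 1]
    # Use the centre of the base so the fraction stays strictly inside (0, 1).
    frac = (offset + 0.5) / region_length
    return lo + frac * (hi - lo)


lengths = [(50, 100, 50), (100, 300, 100), (150, 200, 150)]
splits = median_splits(lengths)
print(splits)  # (0.25, 0.75)
print(normalize("CDS", 99, 200, splits))
```

Medians rather than means keep a few unusually long UTRs from dominating the axis.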


Download files


Source Distribution

metagene-0.0.1.tar.gz (1.4 MB)

Uploaded Source

Built Distribution


metagene-0.0.1-py3-none-any.whl (26.5 kB)

Uploaded Python 3

File details

Details for the file metagene-0.0.1.tar.gz.

File metadata

  • Download URL: metagene-0.0.1.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for metagene-0.0.1.tar.gz
Algorithm Hash digest
SHA256 f5f477b057d800a0ed79aae8ae7f3f3d202a792c09c99308b8e191bb5815c11b
MD5 5a5a009786f98bfac82a132310d1d692
BLAKE2b-256 426ecf26626ba79a39b1092bfbc964cb439f5cc7035755abde50c122095c3bb1
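To check a downloaded file against the SHA256 digest above, the standard-library hashlib module is enough. The sketch reads the file in chunks so large archives need not fit in memory; `sha256_of` and `verify` are illustrative helpers, not part of metagene.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify("metagene-0.0.1.tar.gz", "f5f477b057d800a0ed79aae8ae7f3f3d202a792c09c99308b8e191bb5815c11b") should return True for an intact download. pip can enforce the same check at install time in hash-checking mode, using a requirements line such as `metagene==0.0.1 --hash=sha256:<digest>` with `pip install --require-hashes -r requirements.txt` (hashes are then required for every dependency as well).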


Provenance

The following attestation bundles were made for metagene-0.0.1.tar.gz:

Publisher: publish.yml on y9c/metagene


File details

Details for the file metagene-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: metagene-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for metagene-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5d9bb885210f4ed8ac3826f251744ef6964983166b8e39243fd797dae0b8121f
MD5 c281b5ae9be6c61f1a25f5bb49eabcd9
BLAKE2b-256 0adddb055e902cf943d2a141d42d9ccf6cd5247a4b0ecc49ccd7f297f8cbb0e5


Provenance

The following attestation bundles were made for metagene-0.0.1-py3-none-any.whl:

Publisher: publish.yml on y9c/metagene

