Metagene Profiling Analysis and Visualization
Metagene
This tool analyzes metagene profiles, the distribution of genomic features relative to gene regions (5'UTR, CDS, 3'UTR), and creates publication-ready metagene profile plots.
Installation
Install metagene using pip:
pip install metagene
Minimum Python version: 3.12
Quick Start
Command Line Interface
Basic metagene analysis using a built-in reference:
# Using built-in human genome reference (GRCh38)
metagene -i sites.tsv.gz -r GRCh38 --with-header -m 1,2,3 -w 5 \
-o output.tsv -s scores.tsv -p plot.png
Using a custom GTF file:
# Using custom GTF annotation
metagene -i sites.bed -g custom.gtf.gz -m 1,2,3 -w 5 \
-o output.tsv -s scores.tsv -p plot.png
Python API
from metagene import (
    load_sites, load_reference, map_to_transcripts,
    normalize_positions, plot_profile
)
# Load your genomic sites
sites_df = load_sites("sites.tsv.gz", with_header=True, meta_col_index=[0, 1, 2])
# Load reference genome annotation
reference_df = load_reference("GRCh38") # or load_gtf("custom.gtf.gz")
# Perform metagene analysis
annotated_df = map_to_transcripts(sites_df, reference_df)
gene_bins, gene_stats, gene_splits = normalize_positions(
    annotated_df, split_strategy="median", bin_number=100
)
# Generate plot
plot_profile(gene_bins, gene_splits, "metagene_plot.png")
print(f"Analyzed {gene_bins['count'].sum()} sites")
print(f"Gene splits - 5'UTR: {gene_splits[0]:.3f}, CDS: {gene_splits[1]:.3f}, 3'UTR: {gene_splits[2]:.3f}")
print(f"Gene statistics - 5'UTR: {gene_stats['5UTR']}, CDS: {gene_stats['CDS']}, 3'UTR: {gene_stats['3UTR']}")
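To make the `bin_number` argument more concrete, here is a minimal, library-independent sketch of how normalized positions in the range 0 to 1 could be counted into equal-width bins. The function name `bin_positions` and the sample values are hypothetical and are not part of the metagene API; the package's actual binning may differ.

```python
from collections import Counter


def bin_positions(positions, bin_number=100):
    """Count normalized positions (0.0 to 1.0) in equal-width bins.

    Hypothetical helper for illustration; not part of metagene.
    """
    counts = Counter()
    for pos in positions:
        if not 0.0 <= pos <= 1.0:
            raise ValueError(f"position {pos} is outside [0, 1]")
        # A position of exactly 1.0 is placed in the last bin.
        index = min(int(pos * bin_number), bin_number - 1)
        counts[index] += 1
    # Return a dense list so that empty bins appear as zeros.
    return [counts.get(i, 0) for i in range(bin_number)]


# Made-up normalized positions along a transcript.
example_positions = [0.05, 0.051, 0.5, 0.99, 1.0]
example_bins = bin_positions(example_positions, bin_number=10)
print(example_bins)
```

A dense list keeps empty bins visible, which matters when the counts are later drawn as a continuous profile.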
Input Formats
TSV Format
ref pos strand score pvalue
chr1 1000000 + 0.85 0.001
chr1 2000000 - 0.72 0.005
BED Format
chr1 999999 1000000 score1 0.85 +
chr1 1999999 2000000 score2 0.72 -
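The TSV and BED examples above appear to describe the same two sites: the TSV `pos` column reads as a 1-based position, while BED uses 0-based, half-open intervals, so site 1000000 becomes `999999 1000000`. A minimal sketch of that conversion follows; both helpers are hypothetical and are not metagene functions.

```python
def site_to_bed_interval(pos):
    """Convert a 1-based site position to a 0-based, half-open BED interval."""
    if pos < 1:
        raise ValueError("1-based positions start at 1")
    return pos - 1, pos


def bed_interval_to_site(start, end):
    """Convert a single-base BED interval back to a 1-based position."""
    if end - start != 1:
        raise ValueError("expected a single-base interval")
    return end


print(site_to_bed_interval(1000000))
print(bed_interval_to_site(1999999, 2000000))
```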
Column Specification
- Use `-m/--meta-columns` to specify coordinate columns (1-based indexing)
- Use `-w/--weight-columns` to specify score/weight columns
- Use `-H/--with-header` if your file has a header line
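The CLI takes 1-based column indices (`-m 1,2,3`), while the Quick Start passes `meta_col_index=[0, 1, 2]` to `load_sites`, which suggests that the Python API is 0-based. Assuming that reading is correct, a small hypothetical helper (not part of metagene) can translate a CLI-style specification into API-style indices:

```python
def cli_columns_to_zero_based(spec):
    """Turn a 1-based, comma-separated column spec such as "1,2,3" into 0-based indices."""
    indices = []
    for field in spec.split(","):
        column = int(field.strip())
        if column < 1:
            raise ValueError(f"column indices are 1-based, got {column}")
        indices.append(column - 1)
    return indices


print(cli_columns_to_zero_based("1,2,3"))
```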
Built-in References
Metagene includes pre-processed gene annotations for major model organisms:
| Species | Assembly | Reference |
|---|---|---|
| Human | GRCh38/hg38 | GRCh38, hg38 |
| Human | GRCh37/hg19 | GRCh37, hg19 |
| Mouse | GRCm39/mm39 | GRCm39, mm39 |
| Mouse | GRCm38/mm10 | GRCm38, mm10 |
| Mouse | mm9/NCBIM37 | mm9, NCBIM37 |
| Arabidopsis | TAIR10 | TAIR10 |
| Rice | IRGSP-1.0 | IRGSP-1.0 |
| Model Organisms | Various | dm6, ce11, WBcel235, sacCer3, etc. |
Managing References
List all available references:
metagene --list
This will show all 23+ available references organized by species:
Human:
GRCh37 - Human genome GRCh37 (Ensembl release 75)
GRCh38 - Human genome GRCh38 (Ensembl release 110)
hg19 - Human genome hg19 (UCSC 2021)
hg38 - Human genome hg38 (UCSC 2022)
Mouse:
GRCm38 - Mouse genome GRCm38 (Ensembl release 102)
GRCm39 - Mouse genome GRCm39 (Ensembl release 110)
mm10 - Mouse genome mm10 (UCSC 2021)
mm39 - Mouse genome mm39 (UCSC 2024)
mm9 - Mouse genome mm9 (UCSC 2020)
... and more
Download a specific reference:
metagene --download GRCh38
Download all references (requires ~10GB disk space):
metagene --download all
CLI Options
Usage: metagene [OPTIONS]
Run metagene analysis on genomic sites.
Options:
--version Show the version and exit.
-i, --input PATH Input file path (BED, GTF, TSV or CSV, etc.)
-o, --output PATH Output file path (TSV, CSV)
-s, --output-score PATH Output file for binned score statistics
-p, --output-figure PATH Output file for metagene plot
-r, --reference TEXT Built-in reference genome to use (e.g.,
GRCh38, GRCm39)
-g, --gtf PATH GTF/GFF file path for custom reference
--region Region to analyze (default: all)
-b, --bins INTEGER Number of bins for analysis (default: 100)
-H, --with-header Input file has header line
-S, --separator TEXT Separator for input file (default: tab)
-m, --meta-columns TEXT Input column indices (1-based) for genomic
coordinates. The columns should contain
Chromosome,Start,End,Strand or
Chromosome,Site,Strand
-w, --weight-columns TEXT Input column indices (1-based) for
weight/score values
-n, --weight-names TEXT Names for weight columns
--score-transform
Transform to apply to scores (default: none)
--normalize Normalize scores by transcript length
--list List all available built-in references and
exit
--download TEXT Download a specific reference (e.g., GRCh38)
or 'all' for all references
-h, --help Show this message and exit.
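When the CLI is driven from a script, the options above can be assembled into an argument list. The sketch below only builds that list, using flags taken from the help text. `build_metagene_command` is a hypothetical helper, and it assumes that `-r` and `-g` are alternatives, as the Quick Start examples suggest. Running the result assumes the `metagene` executable is on your `PATH`.

```python
import shlex


def build_metagene_command(input_path, output_path, reference=None, gtf=None,
                           meta_columns=(1, 2, 3), weight_columns=(),
                           with_header=False, bins=100):
    """Assemble a metagene command line from documented options (hypothetical helper)."""
    # Assumption: exactly one annotation source is given, built-in or custom GTF.
    if (reference is None) == (gtf is None):
        raise ValueError("provide exactly one of reference or gtf")
    command = ["metagene", "-i", input_path, "-o", output_path]
    command += ["-r", reference] if reference else ["-g", gtf]
    command += ["-m", ",".join(str(c) for c in meta_columns)]
    if weight_columns:
        command += ["-w", ",".join(str(c) for c in weight_columns)]
    if with_header:
        command.append("--with-header")
    command += ["-b", str(bins)]
    return command


command = build_metagene_command(
    "sites.tsv.gz", "output.tsv", reference="GRCh38",
    weight_columns=(5,), with_header=True,
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.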
API Reference (Core Functions)
- `load_sites(file, with_header=False, meta_col_index=[0,1,2])` - Load genomic sites
- `load_reference(name)` - Load built-in reference genome
- `load_gtf(file)` - Load custom GTF annotation
- `map_to_transcripts(sites, reference)` - Annotate sites with gene information
- `normalize_positions(annotated_sites, strategy="median")` - Normalize to relative positions
- `plot_profile(data, gene_splits, output_file)` - Generate metagene plot
Demo
The plot shows the distribution of genomic sites across normalized gene regions:
- 5'UTR (0.0 - first split): 5' untranslated region
- CDS (first split - second split): Coding sequence
- 3'UTR (second split - 1.0): 3' untranslated region
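Given the two split points, a normalized position can be assigned to a region as sketched below. The helper name and the split values (0.1 and 0.6) are made up for illustration. Real split points would be derived from the `gene_splits` returned by `normalize_positions`, and how metagene treats a position that falls exactly on a boundary is not stated here, so the boundary rule in this sketch is an assumption.

```python
from bisect import bisect_right

REGIONS = ("5'UTR", "CDS", "3'UTR")


def region_of(position, first_split, second_split):
    """Return the region containing a normalized position in [0, 1].

    Hypothetical helper; boundary positions are assigned to the downstream region.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be within [0, 1]")
    if not 0.0 <= first_split <= second_split <= 1.0:
        raise ValueError("splits must satisfy 0 <= first <= second <= 1")
    # bisect_right counts how many split points lie at or before the position.
    index = bisect_right((first_split, second_split), position)
    return REGIONS[index]


for position in (0.05, 0.3, 0.9):
    print(position, region_of(position, 0.1, 0.6))
```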
TODO:
- How to process 100k sites on the human genome in less than 10 s?
- The core function should be moved into variant