Generate modbed track files for visualization on WashU Epigenome Browser
modbedtools
Requires Python >= 3.6
A Python command-line tool to generate modbed files for visualization on the WashU Epigenome Browser.
This tool has 2 modules/subcommands:
- parse MM/ML tags from BAM files generated by 3rd generation sequencing platforms such as Oxford Nanopore and PacBio devices, using the pysam package.
- add background canonical base positions given the modified bases.
installation
Install from PyPI (see the modbedtools project page; the version number may differ):
$ pip install modbedtools
Collecting modbedtools
Downloading modbedtools-0.1.0-py3-none-any.whl (6.8 kB)
Requirement already satisfied: pysam in /opt/apps/python3/lib/python3.7/site-packages (from modbedtools) (0.19.1)
Installing collected packages: modbedtools
Successfully installed modbedtools-0.1.0
modbed format
chr11 5173273 5195306 read_id score + -110,-266,-1459,-1780,-1840,-1842,-1848,-1865,-1928,-1936,... -396,-1543,-3222,-4195,-4319,-4692,-5352,-5366,-5523,-5838,...
chr11 5174507 5194585 read_id score + 223,605,607,613,630,693,701,936,1761,3369,... 307,544,1280,2017,2859,2994,3116,3249,3790,3935,...
chr11 5174543 5196481 read_id score + 187,271,508,570,576,593,901,1729,2826,3216,... 568,656,664,1985,2961,3083,3703,4115,4286,4882,...
Each row in this BED-based format is one long read. The columns are:
- chromosome
- start position of this read
- end position of this read
- read name, id, or any other label to tag this read
- score (number), which can be used to sort the reads from top to bottom when viewing in the Browser; use 0 if no sorting is needed
- strand (+ or - for mapping direction)
- methylated/modified base positions, relative to start; a dot (.) can be used if there are no modified bases
- unmethylated/unmodified/canonical base positions, relative to start; a dot (.) can be used if there are no unmodified bases
All positions are 0 based.
All 8 columns must be provided. The 4th column can hold read identifiers or chrom:start-end. The 5th column is the score, which is used to sort reads vertically in the view region.
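The column layout above can be read back with a few lines of Python. This is only a minimal sketch, not part of modbedtools: the names `ModbedRecord` and `parse_modbed_line` are hypothetical, and it assumes the 8 columns are whitespace-separated with no spaces inside the read name. Positions are kept as the relative values stored in the file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModbedRecord:
    """One long read from a modbed file (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    name: str
    score: float
    strand: str
    modified: List[int]    # positions relative to start, as stored
    unmodified: List[int]  # positions relative to start, as stored


def _parse_positions(field: str) -> List[int]:
    # A single dot means "no positions in this column".
    if field == ".":
        return []
    return [int(p) for p in field.split(",") if p]


def parse_modbed_line(line: str) -> ModbedRecord:
    # Assumption: columns are separated by whitespace (tabs in practice).
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    chrom, start, end, name, score, strand, mod, unmod = fields
    return ModbedRecord(
        chrom, int(start), int(end), name, float(score), strand,
        _parse_positions(mod), _parse_positions(unmod),
    )


record = parse_modbed_line("chr11\t5174507\t5194585\tread_1\t0\t+\t223,605,607\t.")
print(record.modified, record.unmodified)
```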
commands
$ modbedtools -h
usage: modbedtools [-h] [--version] {bam2mod,addbg} ...
Python command line tool to generate modbed files for visualization on WashU Epigenome Browser.
optional arguments:
-h, --help show this help message and exit
--version, -v show program's version number and exit
subcommands:
valid subcommands
{bam2mod,addbg} additional help
bam2mod convert bam to modbed
addbg add backgroud bases given modified bases and reference sequence
(files for testing can be found in the test folder in this repository)
bam2mod
Convert BAM files with MM/ML tags to modbed format.
$ modbedtools bam2mod -h
usage: modbedtools bam2mod [-h] [-b [{C,A,c,a}]] [-g] [-c CUTOFF] [-o OUTPUT] bamfile
positional arguments:
bamfile bam file with MM/ML tags
optional arguments:
-h, --help show this help message and exit
-b [{C,A,c,a}], --base [{C,A,c,a}]
modification base, case in-sensitive, C/c are same. (default: C)
-g, --cpg output for both C/G bases in CpG, only applys when base is C
-c CUTOFF, --cutoff CUTOFF
methylation cutoff, >= cutoff as methylated. default: 0.5
-o OUTPUT, --output OUTPUT
output file name, a suffix .modbed will be added. default: output
examples:
modbedtools bam2mod hifi-test.bam -o hifi
modbedtools bam2mod remora-test.bam -o remora
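To illustrate what the `-c/--cutoff` option means, here is a minimal sketch of splitting per-base calls into modified and unmodified positions. It is not the tool's actual code. The function name `split_by_cutoff` and the `(position, ML value)` input pairs are hypothetical, and scaling the 0-255 ML value by dividing by 255 is an assumption; modbedtools may convert it differently.

```python
def split_by_cutoff(calls, cutoff=0.5):
    """Split (position, ml_value) pairs into modified and unmodified lists.

    ml_value is the 8-bit integer (0-255) stored in the ML tag.
    Assumption: probability = ml_value / 255.
    """
    modified, unmodified = [], []
    for position, ml_value in calls:
        probability = ml_value / 255
        # Same rule as the help text: >= cutoff counts as methylated.
        if probability >= cutoff:
            modified.append(position)
        else:
            unmodified.append(position)
    return modified, unmodified


modified, unmodified = split_by_cutoff([(10, 255), (20, 0), (30, 127)])
print(modified, unmodified)
```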
addbg
For data that only provides methylated bases, addbg takes a reference genome FASTA sequence and adds the unmethylated bases from the genome sequence as background. This assumes that all other occurrences of the specified base in the genome are unmethylated/unmodified.
The input file should be in BED format; the last 2 columns hold the comma-separated relative base positions with modifications (0 based).
example input:
chr11 5193360 5212743 {middle columns can be anything or none} 21,273,296,307,440,461,475,688,689,694,863...
The example data below is adapted from one of the Fiber-seq datasets from the John Stamatoyannopoulos lab.
modbedtools addbg -b A GSM4411218_tracks_m6A_DS75167.dm6.bed.gz dm6.fa.gz -o GSM4411218_tracks_m6A_DS75167
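The idea behind addbg can be sketched as follows: every occurrence of the chosen base in the reference sequence under the read that is not listed as modified is treated as unmodified background. This is only an illustration under stated assumptions, not the tool's implementation. The function name `add_background` and its `bases` parameter are hypothetical, and strand handling is left to the caller.

```python
def add_background(ref_seq, modified, bases="A"):
    """Return relative positions of candidate bases that are not modified.

    ref_seq  : reference sequence covering the read (start to end)
    modified : 0-based positions relative to the read start
    bases    : reference letters treated as candidate sites
               (assumption: pass "AT" to cover both strands)
    """
    candidates = set(bases.upper())
    modified_set = set(modified)
    return [
        i for i, nucleotide in enumerate(ref_seq.upper())
        if nucleotide in candidates and i not in modified_set
    ]


# A occurs at 1, 4 and 6; position 1 is modified, so 4 and 6 are background.
print(add_background("GATTACA", [1], bases="A"))
```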
misc scripts
Convert NanoMethPhase example data to modbed format.
python3 ../misc/parse_nanomethphase.py NA19240_chr21_39000000-40000000.bam NA19240_chr21_39000000-40000000_MethylationCalls.tsv
If you need support for other methylation callers, please submit an issue.
track formatting
bgzip and tabix are used to compress and index the modbed files generated in the previous steps.
example:
bgzip hifi.modbed
tabix -p bed hifi.modbed.gz
The .gz and .gz.tbi files can then be placed on any web server for hosting, and the URL to the .gz file can be used for visualization in the WashU Epigenome Browser.
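tabix expects its input to be sorted by chromosome and start position. If a modbed file is not already in that order, it could be sorted before running bgzip. The following is a minimal sketch with a hypothetical helper, `sort_modbed_lines`; it assumes whitespace-separated columns and a file small enough to hold in memory.

```python
def sort_modbed_lines(lines):
    """Return non-empty modbed lines sorted by chromosome, then start."""
    records = [line.rstrip("\n") for line in lines if line.strip()]

    def sort_key(line):
        fields = line.split()
        # Column 1 is the chromosome, column 2 the numeric start position.
        return fields[0], int(fields[1])

    return sorted(records, key=sort_key)


unsorted_lines = [
    "chr11\t5174543\t5196481\tread_c\t0\t+\t187,271\t568,656\n",
    "chr11\t5173273\t5195306\tread_a\t0\t+\t110,266\t396\n",
]
for line in sort_modbed_lines(unsorted_lines):
    print(line)
```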
visualization
In this step-by-step tutorial, we will use hifi-test.modbed.gz.
First, go to the Browser by navigating your web browser to https://epigenomegateway.wustl.edu/browser/ and click hg38 for the genome.
In the test data, we will check the methylation signal over the KDM2A gene. Use the gene search function, type in KDM2A, and choose the first hit in refGene:
Go to Tracks menu, click Remote Tracks:
Choose modbed from the track type dropdown list, paste the URL above:
This is the default view after you submit the modbed file. Each row represents a long read, and each bar on a read shows the methylation level; a gray bar indicates that there is a cytosine base but it is unmethylated. Hovering over a bar shows a tooltip.
Zoom in 5-fold multiple times to see the methylation status at base pair resolution. Each filled circle means methylated and each empty circle means unmethylated; an orange circle above the line means it is on the + strand, and a blue circle means it is on the - strand.
Zoom out multiple times from the default view to clearly see the m6A methylation profile over each read:
Zoom out further and the signals from all reads are summarized into one bar plot; the gray line indicates read density and the bar height indicates methylation level:
At each view, right click the track to change the view to a heatmap style like in IGV:
PacBio data
PacBio CpG methylation calls on circular consensus sequencing (CCS) reads represent the predicted methylation status of the CpG site as a unit. Usually, we plot the CCS methylation prediction on both the C and G bases at each CpG site by enabling the -g option:
modbedtools bam2mod hifi-test.bam -o hifi -g
See the screenshots below for PacBio data visualized at base pair level; the top is without the -g option and the bottom is with the -g option: