
Specific Methylation Analysis and Report Tool for BS-Seq data


Time-stamp: <2015-03-29 11:28:37 Hongbo Liu>

Introduction

DNA methylation plays important roles in regulating cell development and differentiation. The mechanisms of DNA methylation and demethylation are common to all tissues and cell types, yet different cell types with the same genome have different methylomes. Recently, high-throughput sequencing combined with bisulfite treatment (Bisulfite-Seq) has been used to generate genome-wide DNA methylomes from a wide range of human tissues and cell types. To characterize the genomic regions that consist of contiguous CpGs with similar methylation specificity, we developed the Specific Methylation Analysis and Report Tool (SMART), which is based on quantified methylation specificity, Euclidean distance and similarity entropy. SMART aims to segment the genome into different functional regions according to the methylation pattern across multiple cell types. First, a continuous scan is carried out to obtain primary segments composed of CpG sites with high methylation similarity across all cell types. Then, primary segments that lie in close proximity and share a similar methylation pattern are merged into different types of segments: high-specificity segments (HighSpe), low-specificity segments (LowSpe) and segments with almost no cell-type specificity (NoSpe). Finally, cell-type-specific methylation marks are identified to facilitate epigenetic analysis.
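The ideas named above (an entropy-based specificity score, Euclidean distance between CpG methylation profiles, and a continuous scan) can be illustrated with a minimal Python 3 sketch. The formula, the function names and the threshold max_distance below are assumptions chosen for illustration; they are not taken from the SMART source code and may differ from what SMART actually computes.

```python
import math


def methylation_specificity(levels):
    """Return a 0..1 score for one CpG across cell types.

    Assumed formula (illustrative only): 1 - H / log2(n), where H is the
    Shannon entropy of the methylation levels normalised to sum to 1.
    Uniform levels give 0; methylation in a single cell type gives 1.
    """
    n = len(levels)
    total = sum(levels)
    if n < 2 or total == 0:
        return 0.0
    probs = [x / total for x in levels]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log2(n)


def euclidean_distance(a, b):
    """Euclidean distance between two methylation profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scan_segments(profiles, max_distance=0.2):
    """Group consecutive CpGs into (first_index, last_index) segments.

    A new segment starts whenever two neighbouring CpGs differ by more
    than max_distance (a hypothetical threshold, not a SMART default).
    """
    segments = []
    start = 0
    for i in range(1, len(profiles)):
        if euclidean_distance(profiles[i - 1], profiles[i]) > max_distance:
            segments.append((start, i - 1))
            start = i
    if profiles:
        segments.append((start, len(profiles) - 1))
    return segments
```

Each profile here is the list of methylation levels of one CpG across the cell types, in genomic order.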

Install

Please check the file ‘INSTALL’ in the distribution.

Usage of SMART

usage:

SMART MethyDir CytosineDir [-h] [-n PROJECTNAME] [-o OUTPUTFOLDER] [-v]

positional arguments

MethyDir

The path (such as /liuhb/BSSeq/) of the folder containing the methylation data files in wig.gz format (such as H1.wig.gz). REQUIRED.

CytosineDir

The path (such as /liuhb/CLoc_hg19/) of the folder containing the cytosine location files for all chromosomes in txt.gz format (such as chr1.txt.gz). REQUIRED.
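Before a long run it can help to confirm that both folders contain files with the expected names. The following Python 3 sketch is a standalone helper, not part of SMART; the function name list_inputs and the idea of deriving a sample name from the file name (H1.wig.gz gives H1) are assumptions made for illustration.

```python
import glob
import os


def list_inputs(methy_dir, cytosine_dir):
    """Return (sample_names, cytosine_files) found in the two input folders.

    Raises ValueError if either folder lacks files with the expected
    suffix (*.wig.gz for methylation data, chr*.txt.gz for cytosines).
    """
    wig_files = sorted(glob.glob(os.path.join(methy_dir, "*.wig.gz")))
    loc_files = sorted(glob.glob(os.path.join(cytosine_dir, "chr*.txt.gz")))
    if not wig_files:
        raise ValueError("no *.wig.gz files found in " + methy_dir)
    if not loc_files:
        raise ValueError("no chr*.txt.gz files found in " + cytosine_dir)
    # Assumed naming convention: the sample name is the file name
    # without the .wig.gz suffix.
    samples = [os.path.basename(p)[: -len(".wig.gz")] for p in wig_files]
    return samples, loc_files
```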

optional arguments

-h, --help

show this help message and exit

-n PROJECTNAME

Project name, which will be used to generate output file names. DEFAULT: “SMART”

-o OUTPUTFOLDER

If specified, all output files will be written to that directory. DEFAULT: a directory named with the project name and the current time (such as SMART20140801132559) in the current working directory.

-v, --version

show program’s version number and exit
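For readers who want to script around SMART, the interface described above can be mirrored with Python's argparse. This is a sketch of an equivalent parser in Python 3, not SMART's own code; the function names and the version string are placeholders, and the timestamp format is inferred from the example folder name SMART20140801132559.

```python
import argparse
import time


def build_parser():
    """Build a parser that mirrors the documented SMART usage line."""
    parser = argparse.ArgumentParser(prog="SMART")
    parser.add_argument("MethyDir", help="folder with *.wig.gz methylation files")
    parser.add_argument("CytosineDir", help="folder with chr*.txt.gz cytosine files")
    parser.add_argument("-n", dest="PROJECTNAME", default="SMART",
                        help="project name used in output file names")
    parser.add_argument("-o", dest="OUTPUTFOLDER", default=None,
                        help="folder for all output files")
    # Placeholder version text; the real program reports its own version.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (sketch)")
    return parser


def default_output_folder(project_name, now=None):
    """Return the project name followed by a YYYYMMDDhhmmss timestamp."""
    stamp = time.strftime("%Y%m%d%H%M%S", now or time.localtime())
    return project_name + stamp
```

When -o is not given, a wrapper could call default_output_folder(args.PROJECTNAME) to reproduce the documented default naming.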

Example

Example data

The example data can be found in the directory Example under the installation directory of SMART. Note that the location of the installation directory may differ between operating systems. The cytosines and their methylation levels in 50 kb regions from chr3 and chr6 were extracted for testing SMART. Users can test SMART with the following commands.

Example command

For Linux:

The main program SMART may be in /usr/local/bin/, and the example data may be in ../python2.7/dist-packages/SMART/Example. The following command may serve as a reference for testing SMART:

SMART /usr/local/lib/python2.7/dist-packages/SMART/Example/BSSeq_fortest/ /usr/local/lib/python2.7/dist-packages/SMART/Example/CLoc_hg19_fortest/ -n Test -o /usr/local/lib/python2.7/dist-packages/SMART/Example/Example_Results/

For Windows:

The main program SMART may be in ..\Python27\Scripts\, and the example data may be in ..\Python27\Lib\site-packages\SMART\Example. The following commands may serve as a reference for testing SMART:

cd ..\Python27\Scripts\
python SMART ..\Python27\Lib\site-packages\SMART\Example\BSSeq_fortest\ ..\Python27\Lib\site-packages\SMART\Example\CLoc_hg19_fortest\ -n Test -o ..\Python27\Lib\site-packages\SMART\Example\Example_Results\
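The same test run can also be launched from a Python 3 script, which avoids shell quoting differences between Linux and Windows. The helper names below are hypothetical, and the sketch assumes that an executable named SMART is on the PATH; on Windows the command may need to be adapted to call python with the script path, as in the command above.

```python
import shutil
import subprocess


def build_smart_command(methy_dir, cytosine_dir, project_name=None,
                        output_folder=None, executable="SMART"):
    """Return the argument list for one SMART run, following the usage line."""
    cmd = [executable, methy_dir, cytosine_dir]
    if project_name:
        cmd += ["-n", project_name]
    if output_folder:
        cmd += ["-o", output_folder]
    return cmd


def run_smart(methy_dir, cytosine_dir, project_name=None, output_folder=None):
    """Run SMART and raise if it is missing or exits with an error."""
    cmd = build_smart_command(methy_dir, cytosine_dir, project_name, output_folder)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("SMART executable not found on PATH")
    return subprocess.run(cmd, check=True)
```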

Output Files

  1. The folder SplitedMethy is an output directory that stores the split methylation data. The methylation data are stored in one sub-folder per chromosome, and each sub-folder includes the methylation data for all samples.

  2. The folder MethylationSpecificity is an output directory that stores the methylation levels and specificity of each cytosine that is common to all samples. These files are organized by chromosome. In this folder, MethylationSpecificity.wig.gz contains the methylation specificity of all common cytosines; this file can be uploaded to the UCSC browser for visualization.

  3. The folder MethylationSegment includes three sub-folders: GenomeSegment, GenomeSegmentMethy, and MergedGenomeSegment. GenomeSegment stores all small segments identified by SMART in each chromosome. GenomeSegmentMethy stores the methylation levels of each small segment across all samples, which may be useful for further local analysis. MergedGenomeSegment stores the larger segments obtained by merging the small segments in each chromosome. The final results are generated from these merged segments.

  4. The folder FinalResults includes the results most likely to interest users. It contains six files.

    - The first file, 1SmallSegmentBed.txt.gz, stores all small segments in bed format and can be uploaded to the UCSC browser for visualization.

    - The second file, 2MergedSegmentBed.txt.gz, stores all merged segments in bed format and can be uploaded to the UCSC browser for visualization.

    - The third file, 3MergedSegment.txt, stores all merged segments in txt format, which is useful for further local analysis.

    - The fourth file, 4MergedSegmentwithmethylation.txt, stores the methylation levels of all merged segments across all samples, which is useful for further local analysis.

    - The fifth file, 5MergedHighLowSpeSegmentwithspecificity.txt, stores the methylation specificity and the t-test p values for each merged HighSpe/LowSpe segment, which is useful for further analysis of the cell-type specificity of each HighSpe/LowSpe segment. A positive p value indicates that the segment is hypermethylated in the corresponding cell type, while a negative p value indicates that it is hypomethylated in the corresponding cell type.

    - The sixth file, 6CellTypeSpecificMethymarkPvalue.txt, is a reformatted version of the fifth file. It retains only the HighSpe/LowSpe segments that show significant hypo- or hypermethylation in some cell types. This file is useful for selecting and analyzing cell-type-specific methylation marks, including HypoMarks and HyperMarks.
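For local analysis of these results, the gzipped bed files and the signed p values can be handled with a few lines of Python 3. The sketch below is not part of SMART. It assumes only the standard bed convention that the first three whitespace-separated columns are chromosome, start and end; the layout of any further columns is not described here, so they are returned unparsed. The significance cutoff alpha=0.05 is a hypothetical choice, not a SMART default.

```python
import gzip


def read_bed_gz(path):
    """Yield (chrom, start, end, extra_fields) from a gzipped bed file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and UCSC header or comment lines.
            if not line or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split()
            yield fields[0], int(fields[1]), int(fields[2]), fields[3:]


def interpret_signed_p(signed_p, alpha=0.05):
    """Return (direction, is_significant) for a signed p value.

    A positive sign means hypermethylated and a negative sign means
    hypomethylated; the magnitude is compared with alpha.
    """
    if signed_p == 0:
        return "unknown", False  # the sign carries no direction here
    direction = "hyper" if signed_p > 0 else "hypo"
    return direction, abs(signed_p) < alpha
```

For example, list(read_bed_gz("1SmallSegmentBed.txt.gz")) would load all small segments into memory, provided the file follows the bed convention assumed above.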

Contact

For any help, you are welcome to write to Hongbo Liu (hongbo919@gmail.com).
