
Specific Methylation Analysis and Report Tool for BS-Seq data


========================
README for SMART (1.4.2)
========================
Time-stamp: <2016-12-03 15:44:02 Hongbo Liu>

Introduction
============

It is known that DNA methylation plays important roles in the regulation
of cell development and differentiation. The mechanisms that maintain
methylated and unmethylated DNA are common to all tissues and cell types;
however, cell types that share the same genome have different methylomes.
Recently, high-throughput sequencing combined with bisulfite treatment
(bisulfite sequencing, BS-Seq) has been used to generate genome-wide DNA
methylomes for a wide range of human tissues and cell types. To
characterize genomic regions that consist of continuous CpGs with similar
methylation specificity, we developed the Specific Methylation Analysis
and Report Tool (SMART), which uses quantified methylation specificity,
Euclidean distance and similarity entropy to identify and characterize
sets of genome segments comprising continuous CpGs with similar
methylation specificities. For a given set of multiple methylomes
profiled using BS-Seq, the methylation specificity of each CpG is
quantified with an entropy-based procedure, and the Euclidean distance
and similarity entropy are computed for each pair of neighboring CpGs.
Subsequently, continuous scanning based on these quantified parameters
segments the genome into primary segments comprising CpG sites with high
methylation similarity across all cell types. Primary segments that lie
close together and share similar methylation patterns are then merged
into larger segments of different types: high-specificity (HighSpe),
low-specificity (LowSpe) and almost no cell-specificity (NoSpe)
segments. Finally, HighSpe/LowSpe segments with specific hypo- or
hypermethylation in a minority of cell types, namely cell-type-specific
hypomethylation marks (HypoMarks) and cell-type-specific
hypermethylation marks (HyperMarks), are identified using a statistical
test (t-test). To facilitate the mining of methylation marks (MethyMarks)
across cell types and species, all algorithms used in this procedure are
integrated into SMART, which is also available at
http://fame.edbc.org/smart.
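
The per-CpG and neighbor-pair quantities described above can be illustrated with a short Python 3 sketch. It is not SMART's implementation: it assumes methylation specificity is one minus the normalized Shannon entropy of a CpG's relative methylation levels across samples, uses a plain Euclidean distance between the methylation profiles of neighboring CpGs, and replaces SMART's scanning rules (which also use similarity entropy and specificity) with a single hypothetical distance cutoff, ``max_distance``.

```python
import math
from typing import List, Sequence


def methylation_specificity(levels: Sequence[float]) -> float:
    """Specificity of one CpG across samples (assumed entropy-based form).

    p_i = m_i / sum(m), H = -sum(p_i * log2(p_i)), specificity = 1 - H / log2(N):
    0.0 means equal methylation in every sample, 1.0 means methylation in one sample only.
    """
    n = len(levels)
    if n < 2:
        raise ValueError("need methylation levels from at least two samples")
    total = sum(levels)
    if total == 0:
        return 0.0  # assumption: a CpG unmethylated everywhere is treated as non-specific
    entropy = 0.0
    for m in levels:
        if m > 0:  # 0 * log2(0) is taken as 0
            p = m / total
            entropy -= p * math.log2(p)
    return 1.0 - entropy / math.log2(n)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the methylation profiles of two CpGs."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same samples")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scan_segments(profiles: List[Sequence[float]], max_distance: float) -> List[List[int]]:
    """Greedy left-to-right scan over CpGs ordered by genomic position.

    A CpG joins the current segment when its profile lies within max_distance
    (a hypothetical cutoff) of the previous CpG; otherwise it starts a new segment.
    Returns the CpG indices of each segment.
    """
    segments: List[List[int]] = []
    for i, profile in enumerate(profiles):
        if segments and euclidean_distance(profiles[i - 1], profile) <= max_distance:
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments
```

For example, ``scan_segments([[0.9, 0.9, 0.1], [0.85, 0.9, 0.15], [0.1, 0.1, 0.9]], 0.2)`` keeps the first two CpGs together and puts the third in its own segment, returning ``[[0, 1], [2]]``.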

Install
=======
Install the dependencies and SMART with pip::

    pip install numpy
    pip install scipy
    pip install SMART-BS-Seq

More information can be found in the file ``INSTALL`` in the distribution.

Usage of SMART
==============

:usage: SMART MethyDir CytosineDir [-h] [-n PROJECTNAME] [-o OUTPUTFOLDER] [-v]



positional arguments
-----------------------
MethyDir
```````````````
The directory (such as /liuhb/BSSeq/) containing the methylation data files in wig.gz format (such as H1.wig.gz). REQUIRED.
Example data can be found in the Example folder in the distribution and online at http://fame.edbc.org/smart/download.html

CytosineDir
``````````````````
The directory (such as /liuhb/CLoc_hg19/) containing the cytosine location files for all chromosomes in txt.gz format (such as chr1.txt.gz). REQUIRED.
Example data can be found in the Example folder in the distribution and online at http://fame.edbc.org/smart/download.html
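
Before a long run, it can be worth confirming that both folders contain files with the naming shown above. The helper below is a hypothetical pre-flight check written in Python 3, not part of SMART; it assumes only the documented patterns (``*.wig.gz`` per sample, ``chr*.txt.gz`` per chromosome) and treats the file-name stem as the sample or chromosome name.

```python
import os
from typing import Dict, List


def list_inputs(methy_dir: str, cytosine_dir: str) -> Dict[str, List[str]]:
    """Hypothetical pre-flight check for SMART's two positional arguments.

    Returns the sample names (H1.wig.gz -> H1) and chromosome names
    (chr1.txt.gz -> chr1) found, and raises if either folder has no matching files.
    """
    samples = sorted(
        name[: -len(".wig.gz")]
        for name in os.listdir(methy_dir)
        if name.endswith(".wig.gz")
    )
    chromosomes = sorted(
        name[: -len(".txt.gz")]
        for name in os.listdir(cytosine_dir)
        if name.startswith("chr") and name.endswith(".txt.gz")
    )
    if not samples:
        raise FileNotFoundError(f"no *.wig.gz files in {methy_dir}")
    if not chromosomes:
        raise FileNotFoundError(f"no chr*.txt.gz files in {cytosine_dir}")
    return {"samples": samples, "chromosomes": chromosomes}
```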

optional arguments
----------------------
-h, --help
``````````````````
show this help message and exit

-n PROJECTNAME
`````````````````````````````
Project name, which will be used to generate output file names. DEFAULT: "SMART"

-o OUTPUTFOLDER
````````````````````````````````
If specified, all output files are written to this directory. DEFAULT: a directory in the current working directory named after the project name and the current time (such as SMART20140801132559).

-v, --version
```````````````````
show program's version number and exit
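
The options above map directly onto a command line. The Python 3 sketch below assembles one for use with ``subprocess``; ``build_smart_argv`` and ``default_output_folder`` are hypothetical helpers, and the timestamp format (``%Y%m%d%H%M%S``) is inferred from the documented example name SMART20140801132559 rather than taken from SMART's source.

```python
from datetime import datetime
from typing import List, Optional


def default_output_folder(project_name: str = "SMART", now: Optional[datetime] = None) -> str:
    """Project name followed by a YYYYmmddHHMMSS timestamp (format inferred from the example)."""
    now = now or datetime.now()
    return project_name + now.strftime("%Y%m%d%H%M%S")


def build_smart_argv(
    script: str,
    methy_dir: str,
    cytosine_dir: str,
    project_name: Optional[str] = None,
    output_folder: Optional[str] = None,
    python: str = "python",
) -> List[str]:
    """Assemble the documented command line; optional flags are added only when given."""
    argv = [python, script, methy_dir, cytosine_dir]  # positional arguments come first
    if project_name is not None:
        argv += ["-n", project_name]
    if output_folder is not None:
        argv += ["-o", output_folder]
    return argv
```

The resulting list can be passed to ``subprocess.run(argv, check=True)``. ``default_output_folder`` only approximates the name SMART chooses when ``-o`` is omitted.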

Example
==============

Example data
---------------

The example data can be found in the directory Example under the SMART installation directory. Note that the location of the SMART installation directory differs between operating systems. The example data contain the cytosines, and their methylation levels, in 50 kb regions of chr3 and chr6, extracted for testing SMART. Use the following commands to test SMART.

Example command
---------------------
:For Linux:

Run ``SMART.py`` (which may be in ``../python2.7/dist-packages/SMART/``) directly with python; the example data may be in ``../python2.7/dist-packages/SMART/Example``. The following command may be useful for testing SMART::

    python /usr/local/lib/python2.7/dist-packages/SMART/SMART.py /usr/local/lib/python2.7/dist-packages/SMART/Example/BSSeq_fortest/ /usr/local/lib/python2.7/dist-packages/SMART/Example/CLoc_hg19_fortest/ -n Test -o /usr/local/lib/python2.7/dist-packages/SMART/Example/Example_Results/

If you get the message "Output directory (/usr/local/lib/python2.7/dist-packages/SMART/Example/Example_Results/) could not be created", add ``sudo`` before ``python``::

    sudo python /usr/local/lib/python2.7/dist-packages/SMART/SMART.py /usr/local/lib/python2.7/dist-packages/SMART/Example/BSSeq_fortest/ /usr/local/lib/python2.7/dist-packages/SMART/Example/CLoc_hg19_fortest/ -n Test -o /usr/local/lib/python2.7/dist-packages/SMART/Example/Example_Results/

:For Windows:

Run ``SMART.py`` (which may be in ``..\Python27\Lib\site-packages\SMART\``) directly with python; the example data may be in ``..\Python27\Lib\site-packages\SMART\Example``. The following command may be useful for testing SMART::

    python ..\Python27\Lib\site-packages\SMART\SMART.py ..\Python27\Lib\site-packages\SMART\Example\BSSeq_fortest\ ..\Python27\Lib\site-packages\SMART\Example\CLoc_hg19_fortest\ -n Test -o ..\Python27\Lib\site-packages\SMART\Example\Example_Results\


Output Files
==============
1. Folder ``SplitedMethy`` is an output directory that stores the split
   methylation data. The data are stored in separate chromosome
   sub-folders; each sub-folder contains the methylation data for all
   samples.
2. Folder ``MethylationSpecificity`` is an output directory that stores,
   per chromosome, the methylation levels and specificity of each cytosine
   (C) common to all samples. In this folder,
   ``MethylationSpecificity.wig.gz`` contains the methylation specificity
   of all common Cs; this file can be uploaded to the UCSC Genome Browser
   for visualization.
3. Folder ``MethylationSegment`` includes three sub-folders:
   ``GenomeSegment``, ``GenomeSegmentMethy`` and ``MergedGenomeSegment``.
   The sub-folder ``GenomeSegment`` stores all small segments identified
   by SMART in each chromosome. The sub-folder ``GenomeSegmentMethy``
   stores the methylation levels of each small segment across all samples,
   which may be useful for further local analysis. The sub-folder
   ``MergedGenomeSegment`` stores the larger segments merged from the small
   segments in each chromosome. The final results are generated from these
   merged segments.
4. Folder ``FinalResults`` includes all results likely to be of interest
   to users. This folder contains six files:

- The first file, ``1SmallSegmentBed.txt.gz``, stores all small segments in BED format and can be uploaded to the UCSC Genome Browser for visualization.

- The second file, ``2MergedSegmentBed.txt.gz``, stores all merged segments in BED format and can be uploaded to the UCSC Genome Browser for visualization.

- The third file, ``3MergedSegment.txt``, stores all merged segments in plain-text format, which is useful for further local analysis.

- The fourth file, ``4MergedSegmentwithmethylation.txt``, stores the methylation levels of all merged segments across all samples, which is useful for further local analysis.

- The fifth file, ``5MergedHighLowSpeSegmentwithspecificity.txt``, stores the methylation specificity and the t-test p values for each merged HighSpe/LowSpe segment, which is useful for further analysis of the cell-type specificity of each HighSpe/LowSpe segment. A positive p value indicates that the segment is hypermethylated in the corresponding cell type, while a negative p value indicates that it is hypomethylated in that cell type.

- The sixth file, ``6CellTypeSpecificMethymarkPvalue.txt``, is a reformatted version of the fifth file that retains only the HighSpe/LowSpe segments showing significant hypo- or hypermethylation in some cell types. It is useful for selecting and analyzing cell-type-specific methylation marks, including HypoMarks and HyperMarks.
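
For downstream work on the final results, a small Python 3 sketch follows. ``segment_lengths_by_chrom`` reads a gzipped BED file such as ``2MergedSegmentBed.txt.gz`` using only the three standard BED columns (chromosome, 0-based start, end). ``classify_signed_p`` applies the sign convention described for the fifth file, assuming the magnitude of each value is the t-test p value; the 0.05 cutoff is an illustrative assumption, not SMART's threshold. The column layout of the fifth and sixth files is not described here, so parsing them is left out; both function names are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Optional


def segment_lengths_by_chrom(bed_gz_path: str) -> Dict[str, int]:
    """Total segment length (bp) per chromosome in a gzipped BED file.

    Only the first three BED columns are used (chrom, 0-based start, end);
    UCSC 'track'/'browser' header lines, '#' comments and blank lines are skipped.
    """
    totals: Dict[str, int] = defaultdict(int)
    with gzip.open(bed_gz_path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            chrom, start, end = line.split()[:3]
            totals[chrom] += int(end) - int(start)
    return dict(totals)


def classify_signed_p(signed_p: float, alpha: float = 0.05) -> Optional[str]:
    """Apply the documented sign convention to one signed p value.

    Assumes |signed_p| is the t-test p value: positive -> 'hyper',
    negative -> 'hypo'; returns None when not significant at alpha
    (an illustrative cutoff) or when the sign is undefined (0).
    """
    if 0 < signed_p <= alpha:
        return "hyper"
    if -alpha <= signed_p < 0:
        return "hypo"
    return None
```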

Other useful links
==================
:Predefined C locations in various species and other resources: http://fame.edbc.org/smart/
:QDMR: http://bioinfo.hrbmu.edu.cn/qdmr/
:UCSC Genome browser: http://genome.ucsc.edu/

Citation
=========
Please cite the following paper if you use SMART software or related information:

Liu H, Liu X, Zhang S, Lv J, Li S, Shang S, Jia S, Wei Y, Wang F, Su J et al: Systematic identification and annotation of human methylation marks based on bisulfite sequencing methylomes reveals distinct roles of cell-type-specific hypomethylation in regulation of cell identity genes. Nucleic Acids Research 2016, 44(1):75-94.

Contact
==================
:For any help: you are welcome to write to Hongbo Liu (hongbo919@gmail.com).
