A GPU-based Tool to Map Bisulfite-Treated Reads
PLOS ONE Manuscript
GPU-BSM (GPU-BiSulfite reads Mapping) is a GPU-based tool devised to map bisulfite-treated reads. It supports directional and non-directional libraries generated from both whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS). GPU-BSM adopts an unbiased strategy that reduces the complexity of the sequences involved by converting cytosines to thymines. The sequences, now represented with a simplified 3-letter nucleotide alphabet, are then aligned using the SOAP3-dp short-read mapping tool.
GPU-BSM creates two sequences from the original forward genomic strand. The first sequence is obtained by converting cytosines to thymines, whereas the second sequence is obtained by converting guanines to adenines. For RRBS libraries, these sequences are generated from a simplified reference genome that takes into account only those genomic fragments compatible with the sequencing experiment. Directional and non-directional libraries are treated differently.
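The two reference conversions described above can be illustrated with a minimal Python sketch. The function names `c_to_t` and `g_to_a` and the toy sequence are hypothetical and chosen only for illustration; they are not taken from the GPU-BSM source code.

```python
def c_to_t(seq):
    """Convert every cytosine to thymine (3-letter alphabet: A, G, T)."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """Convert every guanine to adenine (3-letter alphabet: A, C, T)."""
    return seq.upper().replace("G", "A")


# Toy forward genomic strand (illustrative only).
forward = "ACGTTCGGA"
first_sequence = c_to_t(forward)   # "ATGTTTGGA"
second_sequence = g_to_a(forward)  # "ACATTCAAA"
```

A real reference genome would be read chromosome by chromosome from a FASTA file before conversion; that step is omitted here.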
To map reads from a directional library, GPU-BSM performs two different mappings using SOAP3-dp. The first mapping is obtained by converting cytosines to thymines in the reads and then aligning them to the first sequence; the second is obtained by converting guanines to adenines in the reverse complement of the reads and then aligning them to the second sequence.
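As a rough illustration of the two directional mappings, the sketch below builds the two converted queries for a single read. The helper names and the dictionary keys are hypothetical; the actual alignment is performed by SOAP3-dp and is not shown.

```python
# Complement table for the 4-letter alphabet (N is kept as N).
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def directional_queries(read):
    """Build the two converted queries used for a directional library.

    Keys name the converted reference each query is aligned to
    (hypothetical labels, not GPU-BSM identifiers).
    """
    read = read.upper()
    return {
        # C->T in the read, aligned to the C->T converted reference.
        "first_sequence": read.replace("C", "T"),
        # G->A in the reverse complement, aligned to the G->A converted reference.
        "second_sequence": reverse_complement(read).replace("G", "A"),
    }


queries = directional_queries("ACGTC")
# queries["first_sequence"]  is "ATGTT"
# queries["second_sequence"] is "AACAT"
```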
To map reads from a non-directional library, GPU-BSM performs four different mappings. In addition to the mappings performed for a directional library, GPU-BSM uses SOAP3-dp to map the reverse complement of the reads with cytosines converted to thymines to the first sequence, and the reverse complement of the reads with guanines converted to adenines to the second sequence.
GPU-BSM then analyzes the mapped reads, detecting and removing ambiguous reads and false positives. A read is considered ambiguous if i) a best match exists for at least two of the two (four) alignments performed for a directional (non-directional) library, or ii) at least two best hits exist for a single alignment. GPU-BSM calculates the number of mismatches of the mapped reads using the 4-letter nucleotide alphabet. Due to the bisulfite treatment, a thymine in a read can be aligned to a cytosine in the reference sequence. Similarly, an adenine in the reverse complement of a read can be aligned to a guanine in the reference sequence.
GPU-BSM works on CUDA-enabled GPU cards. It has been tested on two families of NVIDIA GPU cards: the Fermi-based GTX 480, and the Kepler-based K10 and K20c.
GPU-BSM automatically detects the number of GPUs installed in your computer and runs the two (four) alignments for directional (non-directional) libraries in parallel. On machines equipped with a single GPU card, GPU-BSM performs the alignments sequentially.
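The filtering and mismatch-counting steps can be sketched as follows. This is a simplified illustration, not the GPU-BSM implementation: it assumes ungapped alignments of equal length, the function names are hypothetical, and only the thymine-over-cytosine case is tolerated by default (other tolerated pairs can be passed explicitly).

```python
def is_ambiguous(best_hits_per_alignment):
    """Apply the two ambiguity criteria to one read.

    best_hits_per_alignment holds, for each of the two or four
    alignments, the number of best hits found for the read.
    """
    alignments_with_hit = sum(1 for n in best_hits_per_alignment if n > 0)
    several_alignments = alignments_with_hit >= 2                      # criterion i
    several_hits = any(n >= 2 for n in best_hits_per_alignment)        # criterion ii
    return several_alignments or several_hits


def count_mismatches(read, reference, tolerated=(("C", "T"),)):
    """Count mismatches in the 4-letter alphabet.

    tolerated lists (reference base, read base) pairs that are explained
    by bisulfite conversion and therefore not counted as mismatches.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference segment must have equal length")
    mismatches = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base != read_base and (ref_base, read_base) not in tolerated:
            mismatches += 1
    return mismatches


# Reference C under read T is tolerated; reference A under read G is not.
n_mismatches = count_mismatches("ATGTCG", "ACGTCA")  # 1
```

A read would then be kept only if `is_ambiguous` is false and its mismatch count is within the threshold chosen by the user.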
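One way to picture this scheduling is to group the alignments into batches no larger than the number of available GPUs. The sketch below is an assumption made for illustration: the function name, the alignment labels, and the behavior for GPU counts between one and the number of alignments are not taken from GPU-BSM.

```python
def plan_batches(alignments, n_gpus):
    """Group alignments into batches that can run in parallel.

    Each batch contains at most n_gpus alignments, one per GPU.
    With a single GPU every batch holds one alignment, so the
    alignments run sequentially.
    """
    if n_gpus < 1:
        raise ValueError("at least one CUDA-enabled GPU is required")
    return [alignments[i:i + n_gpus] for i in range(0, len(alignments), n_gpus)]


# Hypothetical labels for the two alignments of a directional library.
directional = ["C2T_read_vs_first", "G2A_revcomp_vs_second"]

single_gpu = plan_batches(directional, 1)  # two batches, run one after the other
dual_gpu = plan_batches(directional, 2)    # one batch, both run in parallel
```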
GPU-BSM works on Linux-based systems with a custom installation of Python (release >= 2.7.3) and a CUDA-enabled GPU card with compute capability (cc) 2.0 or higher. GPU-BSM also requires SOAP3-dp to be installed. SOAP3-dp is currently also available for CUDA 5.5, the latest release.
To install CUDA refer to the installation instructions available at http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/ .
SOAP3-dp can be downloaded from http://www.cs.hku.hk/2bwt-tools/soap3-dp/ .
Run the following commands to extract the different programs:
% gunzip soap3-dp-<<release>>.tar.gz
% tar -xvf soap3-dp-<<release>>.tar
GPU-BSM has been tested with SOAP3-dp rel. 2.3.172.
To install GPU-BSM, run the following command:
% sudo easy_install GPU-BSM
Synthetic libraries used to test GPU-BSM can be downloaded from ftp://fileshare.itb.cnr.it/GPUBSM/ .
Manual for this release is available at http://www.itb.cnr.it/documents/11811/167605/Manual.pdf .
20-10-2014: Bugs Fixed