HELEN is an RNN-based assembly polisher. It works paired with MarginPolish.
H.E.L.E.N.
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)
A preprint describing the methods, with an overview of the suggested de novo assembly pipeline, is now available:
Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit
Overview
HELEN uses a Recurrent-Neural-Network (RNN) based Multi-Task Learning (MTL) model that can predict a base and a run-length for each genomic position using the weights generated by MarginPolish.
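Predicting a base and a run-length per position can be pictured as a run-length encoding of the sequence, where each homopolymer run collapses to one base plus a count. The sketch below only illustrates that representation. `rle_encode` is a hypothetical helper and is not part of HELEN or MarginPolish.

```bash
#!/usr/bin/env bash
# Illustration only: collapse each homopolymer run into "<base><run-length>".
# rle_encode is a hypothetical helper, not a HELEN or MarginPolish command.
rle_encode() {
    local rest="$1" out="" prev="" count=0 base
    while [ -n "$rest" ]; do
        base="${rest%"${rest#?}"}"   # first character of the remaining sequence
        rest="${rest#?}"             # drop that character
        if [ "$base" = "$prev" ]; then
            count=$((count + 1))     # same base: extend the current run
        else
            # new base: emit the finished run (if any) and start a new one
            if [ -n "$prev" ]; then out="$out$prev$count "; fi
            prev="$base"
            count=1
        fi
    done
    # emit the final run
    if [ -n "$prev" ]; then out="$out$prev$count"; fi
    echo "$out"
}

rle_encode "AAACCGTTTT"   # prints: A3 C2 G1 T4
```

In this encoding, an error in a homopolymer length changes only a count and leaves the base sequence unchanged.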
© 2020 Kishwar Shafin, Trevor Pesout, Benedict Paten.
Computational Genomics Lab (CGL), University of California, Santa Cruz.
Why MarginPolish-HELEN?
- MarginPolish-HELEN outperforms other graph-based and Neural-Network based polishing pipelines.
- Simple installation steps.
- HELEN can use multiple GPUs at the same time.
- Highly optimized pipeline that is faster than any other available polishing tool.
- We have sequenced-assembled-polished 11 samples to ensure robustness, runtime-consistency and cost-efficiency.
- We tested GPU usage on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to ensure scalability.
- Open source (MIT License).
Installation
MarginPolish-HELEN is supported on Ubuntu 16.10/18.04 or any other Linux-based system.
Install prerequisites
sudo apt-get -y install git cmake make gcc g++ autoconf bzip2 lzma-dev zlib1g-dev \
libcurl4-openssl-dev libpthread-stubs0-dev libbz2-dev liblzma-dev libhdf5-dev \
python3-pip python3-virtualenv virtualenv
Method 1: Install MarginPolish-HELEN from GitHub
git clone https://github.com/kishwarshafin/helen.git
cd helen
make install
. ./venv/bin/activate
marginPolish --version
helen --version
helen --help
marginPolish --help
Each time you want to use it, activate the virtualenv:
source <path/to/helen/venv/bin/activate>
Method 2: Install using PyPI
python3 -m pip install helen --user
echo 'export PATH="$(python3 -m site --user-base)/bin":"$(python3 -m site --user-site)/bin":$PATH' >> ~/.bashrc
source ~/.bashrc
marginPolish --version
helen --version
helen --help
marginPolish --help
Usage
MarginPolish requires a draft assembly and a mapping of reads to the draft assembly. We recommend using Shasta as the initial assembler and MiniMap2 for the mapping.
Step 1: Generate an initial assembly
Generate an assembly using one of the ONT assemblers:
Step 2: Create an alignment between the reads and the Shasta assembly
We recommend using MiniMap2 to generate the mapping between the reads and the assembly.
# we recommend using FASTQ as marginPolish uses quality values
# This command can run MiniMap2 with 32 threads, you can change the number as you like.
minimap2 -ax map-ont -t 32 shasta_assembly.fa reads.fq | samtools sort -@ 32 | samtools view -hb -F 0x104 > reads_2_assembly.bam
samtools index -@32 reads_2_assembly.bam
# the -F 0x104 flag removes unaligned and secondary sequences
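The `0x104` mask is the sum of two SAM FLAG bits: `0x4` (read unmapped) and `0x100` (secondary alignment). The sketch below decodes a mask into the names of a few standard FLAG bits, which helps when checking what a filter keeps or drops. `describe_filter_mask` is a hypothetical helper, not a samtools command, and it covers only a subset of the FLAG bits.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools): name the SAM FLAG bits set in a
# filter mask such as the 0x104 passed to `samtools view -F`.
describe_filter_mask() {
    local mask names
    mask=$(( $1 ))   # accepts decimal or 0x-prefixed hexadecimal
    names=""
    if [ $(( mask & 0x4 )) -ne 0 ];   then names="$names unmapped"; fi
    if [ $(( mask & 0x100 )) -ne 0 ]; then names="$names secondary"; fi
    if [ $(( mask & 0x200 )) -ne 0 ]; then names="$names qc_fail"; fi
    if [ $(( mask & 0x400 )) -ne 0 ]; then names="$names duplicate"; fi
    if [ $(( mask & 0x800 )) -ne 0 ]; then names="$names supplementary"; fi
    echo "${names# }"   # strip the leading space
}

describe_filter_mask 0x104   # prints: unmapped secondary
```

Because `0x800` is not part of `0x104`, supplementary alignments pass through this filter.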
Step 3: Generate images using MarginPolish
You can generate images using MarginPolish by running:
marginPolish reads_2_assembly.bam \
shasta_assembly.fa \
</path/to/model_name.json> \
-t <number_of_threads> \
-o <path/to/marginpolish_images> \
-f
You can get the params.json to pass as </path/to/model_name.json> from path/to/marginpolish/params/.
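MarginPolish runs can be long, so it can help to confirm that every input exists before starting. The sketch below is a hypothetical pre-flight check and not part of MarginPolish. It assumes the BAM index sits next to the BAM as `<bam>.bai`, which is where `samtools index` writes it by default. The file names in the example call are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MarginPolish): report any input
# file that is missing or empty before a long Step 3 run.
check_marginpolish_inputs() {
    local bam="$1" fasta="$2" params="$3" missing=0 f
    # Assumption: the index is <bam>.bai, the default output of `samtools index`.
    for f in "$bam" "$bam.bai" "$fasta" "$params"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call; replace the placeholder names with your own files.
if check_marginpolish_inputs reads_2_assembly.bam shasta_assembly.fa params.json; then
    echo "inputs look complete"
else
    echo "fix the inputs listed above before running marginPolish"
fi
```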
Step 4: Run HELEN
Download Model
helen download_models \
--output_dir <path/to/helen_models/>
Run HELEN
helen polish \
--image_dir </path/to/marginpolish_images/> \
--model_path </path/to/model.pkl> \
--batch_size 256 \
--num_workers 4 \
--threads <num_of_threads> \
--output_dir </path/to/output_dir> \
--output_prefix <output_filename.fa> \
--gpu
If you are using CPUs, remove the --gpu argument.
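When the same script runs on both GPU and CPU machines, the `--gpu` argument can be added conditionally. The sketch below uses the presence of `nvidia-smi` on the PATH as a rough heuristic for an NVIDIA GPU. That heuristic is an assumption: it does not guarantee that HELEN can actually use the device. `gpu_flag` is a hypothetical helper, and the command is printed instead of executed so the placeholders stay visible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "--gpu" when a probe command is on the PATH.
# The default probe is nvidia-smi; finding it is only a heuristic for a usable GPU.
gpu_flag() {
    local probe="${1:-nvidia-smi}"
    if command -v "$probe" >/dev/null 2>&1; then
        echo "--gpu"
    fi
}

# Print the command instead of running it; fill in the placeholders first.
echo "helen polish" \
    "--image_dir </path/to/marginpolish_images/>" \
    "--model_path </path/to/model.pkl>" \
    "--batch_size 256 --num_workers 4" \
    "--threads <num_of_threads>" \
    "--output_dir </path/to/output_dir>" \
    "--output_prefix <output_filename.fa>" \
    "$(gpu_flag)"
```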
Help
Please open a GitHub issue if you face any difficulties.
Acknowledgement
We are thankful to Sergey Koren and Karen Miga for their help with CHM13 data and evaluation.
We downloaded our data from the Telomere-to-telomere consortium to evaluate our pipeline against CHM13.
We acknowledge the work of the developers of these packages:
Fun Fact
The name "HELEN" is inspired by the A.I. created by Tony Stark in the Marvel Comics (Earth-616). HELEN was created to control the city Tony was building, named "Troy", making the A.I. "HELEN of Troy".
READ MORE: HELEN
File details
Details for the file helen-0.0.9.tar.gz.
File metadata
- Download URL: helen-0.0.9.tar.gz
- Upload date:
- Size: 1.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 002379a31b98c9cb2ef85646331dea20ab6a51d2d1686b1565e5e5a847e28684
MD5 | 34cc261d6cca40466bfd732769a96822
BLAKE2b-256 | 7cf988acd9d387bae0be1d5462739b8f721144c8204066a80d56e0ee5494060f
File details
Details for the file helen-0.0.9-cp36-cp36m-macosx_10_9_x86_64.whl.
File metadata
- Download URL: helen-0.0.9-cp36-cp36m-macosx_10_9_x86_64.whl
- Upload date:
- Size: 525.1 kB
- Tags: CPython 3.6m, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ba1302cd45e75fba4648a9b32f2b3ea3ceb6564b752f53e5c9951dfb1c671c1
MD5 | 993ff6b78f7f713d7ca6e6b51bb861e6
BLAKE2b-256 | 394ee56acccb35e5776445c00406a2e49a050421fa0a1aaffaf71dca4e81c88e