
RNN-based assembly polisher HELEN. It works in tandem with MarginPolish.

Project description

H.E.L.E.N.

H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)



A pre-print of a paper describing the methods, along with an overview of a suggested de novo assembly pipeline, is now available:

Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit


Overview

HELEN uses a recurrent neural network (RNN)-based multi-task learning (MTL) model that predicts a base and a run-length for each genomic position, using the weights generated by MarginPolish.
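HELEN's name refers to homopolymer encoding: a run of identical bases is represented as one base plus a run-length, which is why the model predicts both for each position. The sketch below illustrates only that encoding, assuming a made-up text form in which each run is written as BASE followed by COUNT (for example A3). rle_encode and rle_decode are hypothetical helpers, not part of HELEN, which operates on MarginPolish images rather than strings.

```shell
#!/bin/sh
# Illustration only: a hypothetical "BASE+COUNT" text form of homopolymer
# (run-length) encoding, e.g. AAACGG <-> "A3 C1 G2". Not HELEN's code.

# rle_encode SEQUENCE: collapse each run of identical bases into BASE+COUNT.
rle_encode() {
  printf '%s\n' "$1" | awk '{
    out = ""; prev = ""; run = 0
    for (i = 1; i <= length($0); i++) {
      ch = substr($0, i, 1)
      if (ch == prev) {
        run++
      } else {
        # a new run starts: flush the previous one (if any)
        if (prev != "") out = out (out == "" ? "" : " ") prev run
        prev = ch; run = 1
      }
    }
    if (prev != "") out = out (out == "" ? "" : " ") prev run
    print out
  }'
}

# rle_decode "A3 C1 G2": expand each BASE+COUNT token into COUNT copies of BASE.
rle_decode() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      base = substr($i, 1, 1)
      count = substr($i, 2) + 0
      for (j = 0; j < count; j++) out = out base
    }
    print out
  }'
}
```

Decoding predicted (base, run-length) pairs in this way is what turns run-length-space predictions back into an ordinary sequence.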

© 2020 Kishwar Shafin, Trevor Pesout, Benedict Paten.
Computational Genomics Lab (CGL), University of California, Santa Cruz.

Why MarginPolish-HELEN?

  • MarginPolish-HELEN outperforms other graph-based and neural-network-based polishing pipelines.
  • Simple installation steps.
  • HELEN can use multiple GPUs at the same time.
  • Highly optimized pipeline that is faster than any other available polishing tool.
  • We have sequenced, assembled, and polished 11 samples to ensure robustness, runtime consistency, and cost-efficiency.
  • We tested GPU usage on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to ensure scalability.
  • Open source (MIT License).

Installation

MarginPolish-HELEN is supported on Ubuntu 16.10/18.04 or any other Linux-based system.

Install prerequisites
sudo apt-get -y install git cmake make gcc g++ autoconf bzip2 lzma-dev zlib1g-dev \
libcurl4-openssl-dev libpthread-stubs0-dev libbz2-dev liblzma-dev libhdf5-dev \
python3-pip python3-virtualenv virtualenv
Method 1: Install MarginPolish-HELEN from GitHub
git clone https://github.com/kishwarshafin/helen.git
cd helen
make install
. ./venv/bin/activate

marginPolish --version
helen --version
helen --help
marginPolish --help
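After installation, a quick way to confirm that the commands used in this guide are on your PATH is a small check like the one below. This is a minimal sketch: check_tools is a hypothetical helper, not part of MarginPolish-HELEN, and minimap2 and samtools appear in the example only because the Usage steps call them.

```shell
#!/bin/sh
# check_tools is a hypothetical helper (not part of MarginPolish-HELEN).
# It prints each command that cannot be found on PATH to stderr and
# returns 1 if any are missing, 0 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (run inside the activated virtualenv):
#   check_tools marginPolish helen minimap2 samtools && echo "all tools found"
```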

Each time you want to use it, activate the virtualenv:

source <path/to/helen/venv/bin/activate>
Method 2: Install using PyPI
python3 -m pip install helen --user

python3 -m marginpolish --help
python3 -m helen --help
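
# add the user-level bin directory to PATH
echo 'export PATH="$(python3 -m site --user-base)/bin":$PATH' >> ~/.bashrc
source ~/.bashrc

# update pip and helen ("install update" would try to install a package named "update")
python3 -m pip install --upgrade pip --user
python3 -m pip install --upgrade helen --user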
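Appending the export line to ~/.bashrc more than once (for example, on every reinstall) adds the same directory to PATH repeatedly. A minimal sketch of an idempotent alternative follows; path_prepend_once is a hypothetical helper name, and it would need to be defined in ~/.bashrc before it is called there.

```shell
#!/bin/sh
# path_prepend_once DIR: hypothetical helper that prepends DIR to PATH
# only if DIR is not already one of PATH's entries.
path_prepend_once() {
  local dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;                     # already present: leave PATH unchanged
    *) PATH="$dir${PATH:+:$PATH}" ;;   # prepend; no stray colon if PATH was empty
  esac
  export PATH
}

# In ~/.bashrc, after defining the function:
#   path_prepend_once "$(python3 -m site --user-base)/bin"
```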

Usage

MarginPolish requires a draft assembly and a mapping of reads to the draft assembly. We recommend using Shasta as the initial assembler and MiniMap2 for the mapping.

Step 1: Generate an initial assembly

Generate an assembly using one of the ONT assemblers; we recommend Shasta.

Step 2: Create an alignment between the reads and the Shasta assembly

We recommend using MiniMap2 to generate the mapping between the reads and the assembly.

# We recommend using FASTQ, as marginPolish uses base quality values.
# This command runs MiniMap2 with 32 threads; change the number to suit your machine.
minimap2 -ax map-ont -t 32 shasta_assembly.fa reads.fq | samtools sort -@ 32 | samtools view -hb -F 0x104 > reads_2_assembly.bam
samtools index -@32 reads_2_assembly.bam

# The -F 0x104 flag removes unmapped reads and secondary alignments.
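SAM FLAG values are bit fields, and samtools view -F drops any record that has at least one of the given bits set. So 0x104 is simply the bitwise OR of 0x4 (segment unmapped) and 0x100 (secondary alignment). The sketch below makes that arithmetic explicit; is_filtered is a hypothetical helper. Note that supplementary alignments (0x800) are not excluded by this mask.

```shell
#!/bin/sh
# Individual SAM FLAG bits (from the SAM specification).
FLAG_UNMAPPED=$((0x4))         # 0x4: segment unmapped
FLAG_SECONDARY=$((0x100))      # 0x100: secondary alignment
FLAG_SUPPLEMENTARY=$((0x800))  # 0x800: supplementary alignment (not in the mask below)

# samtools view -F removes records that have ANY of these bits set.
exclude_mask=$((FLAG_UNMAPPED | FLAG_SECONDARY))
printf '0x%X\n' "$exclude_mask"   # prints 0x104

# is_filtered FLAG: hypothetical helper; succeeds (returns 0) if a record
# with this FLAG value would be removed by -F 0x104.
is_filtered() {
  [ $(( $1 & exclude_mask )) -ne 0 ]
}
```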

Step 3: Generate images using MarginPolish

You can generate images using MarginPolish by running:

marginPolish reads_2_assembly.bam \
shasta_assembly.fa \
</path/to/model_name.json> \
-t <number_of_threads> \
-o <path/to/marginpolish_images> \
-f

You can find the parameter file (model_name.json above) in path/to/marginpolish/params/.
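Because marginPolish runs can be long, it may be worth checking the inputs first. The sketch below assumes the BAM index written by samtools index in Step 2 sits next to the BAM as <bam>.bai and that the parameter file ends in .json; check_marginpolish_inputs is a hypothetical helper, and the file names in the example are placeholders.

```shell
#!/bin/sh
# check_marginpolish_inputs BAM ASSEMBLY PARAMS_JSON: hypothetical helper.
# Reports every problem it finds on stderr; returns 1 if any input is bad.
check_marginpolish_inputs() {
  local bam="$1" assembly="$2" params="$3" status=0
  if [ ! -s "$bam" ]; then
    echo "BAM missing or empty: $bam" >&2; status=1
  fi
  if [ ! -s "$bam.bai" ]; then
    echo "BAM index missing: $bam.bai (run samtools index)" >&2; status=1
  fi
  if [ ! -s "$assembly" ]; then
    echo "assembly missing or empty: $assembly" >&2; status=1
  fi
  case "$params" in
    *.json) [ -s "$params" ] || { echo "params file missing or empty: $params" >&2; status=1; } ;;
    *) echo "params file is not a .json file: $params" >&2; status=1 ;;
  esac
  return "$status"
}

# Example:
#   ls path/to/marginpolish/params/*.json   # list the available parameter files
#   check_marginpolish_inputs reads_2_assembly.bam shasta_assembly.fa </path/to/model_name.json>
```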

Step 4: Run HELEN

Download Model
helen download_models \
--output_dir <path/to/helen_models/>
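The polish step takes a single model file via --model_path. Assuming the downloaded models are .pkl files (as the --model_path placeholder suggests), the sketch below prints the model path when exactly one is present and otherwise lists the candidates so you can choose. find_helen_model is a hypothetical helper, not a HELEN subcommand.

```shell
#!/bin/sh
# find_helen_model DIR: hypothetical helper. Prints the path of the only
# .pkl file in DIR; returns 1 (listing candidates) if there are 0 or >1.
find_helen_model() {
  local dir="$1" found="" count=0 f
  for f in "$dir"/*.pkl; do
    [ -f "$f" ] || continue   # skips the unexpanded pattern when nothing matches
    found="$f"
    count=$((count + 1))
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one .pkl model in $dir, found $count" >&2
    if [ "$count" -gt 1 ]; then ls -1 "$dir"/*.pkl >&2; fi
    return 1
  fi
  printf '%s\n' "$found"
}

# Example:
#   model=$(find_helen_model <path/to/helen_models/>) && echo "using $model"
```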
Run HELEN
helen polish \
--image_dir </path/to/marginpolish_images/> \
--model_path </path/to/model.pkl> \
--batch_size 256 \
--num_workers 4 \
--threads <num_of_threads> \
--output_dir </path/to/output_dir> \
--output_prefix <output_filename.fa> \
--gpu

If you are running on CPUs only, remove the --gpu argument.
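When the same script runs on both GPU and CPU machines, the --gpu argument can be added conditionally. This sketch assumes that nvidia-smi -L prints one line starting with "GPU" per visible NVIDIA GPU; helen_gpu_flag and the GPU_PROBE override are hypothetical names introduced here (the override exists only to make the function testable).

```shell
#!/bin/sh
# helen_gpu_flag: hypothetical helper. Prints "--gpu" if the probe command
# (nvidia-smi by default, or $GPU_PROBE) exists and lists at least one GPU;
# prints nothing otherwise.
helen_gpu_flag() {
  local probe="${GPU_PROBE:-nvidia-smi}"
  if command -v "$probe" >/dev/null 2>&1 && "$probe" -L 2>/dev/null | grep -q '^GPU'; then
    printf '%s\n' '--gpu'
  fi
}

# Example: the unquoted substitution expands to nothing on CPU-only machines.
#   helen polish --image_dir </path/to/marginpolish_images/> ... $(helen_gpu_flag)
```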

Help

Please open a GitHub issue if you face any difficulties.

Acknowledgement

We are thankful to Sergey Koren and Karen Miga for their help with the CHM13 data and evaluation.

We downloaded our data from the Telomere-to-Telomere (T2T) consortium to evaluate our pipeline against CHM13.

We acknowledge the work of the developers of these packages:

Fun Fact


The name "HELEN" is inspired by the A.I. created by Tony Stark in Marvel Comics (Earth-616). HELEN was created to control the city Tony was building, named "Troy", making the A.I. "HELEN of Troy".

READ MORE: HELEN




Download files


Source Distribution

helen-0.0.13.tar.gz (1.9 MB)

Uploaded Source

Built Distributions

helen-0.0.13-py3.6-macosx-10.9-x86_64.egg (601.1 kB)

Uploaded Source

helen-0.0.13-cp36-cp36m-macosx_10_9_x86_64.whl (525.1 kB)

Uploaded CPython 3.6m macOS 10.9+ x86-64

File details

Details for the file helen-0.0.13.tar.gz.

File metadata

  • Download URL: helen-0.0.13.tar.gz
  • Upload date:
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.2

File hashes

Hashes for helen-0.0.13.tar.gz
Algorithm Hash digest
SHA256 bf6f0de2aa998d774ffeef12b26bf0880526ee44cfdee9f6cb9cca1e30f503c4
MD5 55fc8f501602b57971b10d19cd5d894f
BLAKE2b-256 3e3d6793b545265629fef9c9565080d59a17ebfedd24b330d7e007632f946d75

See more details on using hashes here.
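To check a downloaded file against the SHA256 digests listed here, the sketch below compares digests using sha256sum (GNU coreutils), falling back to shasum -a 256 where sha256sum is unavailable, as on stock macOS. verify_sha256 is a hypothetical helper.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX: hypothetical helper. Returns 0 if the
# file's SHA256 digest equals EXPECTED_HEX (lowercase hex), 1 otherwise.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  [ "$actual" = "$expected" ]
}

# Example:
#   verify_sha256 helen-0.0.13.tar.gz \
#     bf6f0de2aa998d774ffeef12b26bf0880526ee44cfdee9f6cb9cca1e30f503c4 && echo OK
```

pip can also enforce digests itself in hash-checking mode: list a requirement such as helen==0.0.13 with --hash=sha256:<digest> in a requirements file and install it with --require-hashes.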

File details

Details for the file helen-0.0.13-py3.6-macosx-10.9-x86_64.egg.

File metadata

  • Download URL: helen-0.0.13-py3.6-macosx-10.9-x86_64.egg
  • Upload date:
  • Size: 601.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.2

File hashes

Hashes for helen-0.0.13-py3.6-macosx-10.9-x86_64.egg
Algorithm Hash digest
SHA256 0184a7d645e9dc42ba96b8a95dd4a6d5f7a28deb4863430413e02ff3267b5cdc
MD5 427761b46d13a731bd5e69b2664df84a
BLAKE2b-256 f5c5722fb51751d641cf66a6d6c6d6edc2cc0761a1fe9401196dc5bb34fbf66e

See more details on using hashes here.

File details

Details for the file helen-0.0.13-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: helen-0.0.13-cp36-cp36m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 525.1 kB
  • Tags: CPython 3.6m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.2

File hashes

Hashes for helen-0.0.13-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e6b43597bc4f64008e8b9c97483c074aad2fffe326e5b410a65b4afdc5fbdcee
MD5 1ee59d4d4d810ca74bd7bb7e4d912e7f
BLAKE2b-256 12a5d49da6b7b66f9a4fe7e3236f87b7d41b74c597b3c961c1c3b76f6aade545

See more details on using hashes here.
