Internal BFSSI package for assembling prokaryotic genomes from short reads

ProkaryoteAssembly

Two simple scripts to assemble prokaryotic genomes using paired-end reads.

Pipeline Overview

  1. QC on reads with bbduk.sh (adapter trimming/quality filtering)
  2. Error-correction of reads with tadpole.sh
  3. Assembly of reads with skesa
  4. Alignment of error-corrected reads against draft assembly with bbmap.sh
  5. Polishing of assembly with pilon
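
The five steps above can be sketched as an ordered list of command lines. This is only an illustration of how the stages chain together: the specific flags, the intermediate file names, the `adapters` reference file, and the samtools sort/index steps between alignment and polishing are all assumptions, not the calls the package actually makes.

```python
from pathlib import Path


def pipeline_commands(r1, r2, out_dir, memory="8g", adapters="adapters.fa"):
    """Return the pipeline stages as (label, argv) pairs, in run order.

    All flags, file names and the `adapters` reference are illustrative
    assumptions, not the calls ProkaryoteAssembly actually makes.
    """
    out = Path(out_dir)
    sample = Path(r1).name.split("_R1")[0]  # assumes "_R1" marks forward reads

    # Hypothetical names for intermediate and final files
    trim1 = out / f"{sample}_R1.trim.fastq.gz"
    trim2 = out / f"{sample}_R2.trim.fastq.gz"
    corr1 = out / f"{sample}_R1.corr.fastq.gz"
    corr2 = out / f"{sample}_R2.corr.fastq.gz"
    draft = out / f"{sample}.contigs.fa"
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"

    return [
        # 1. Adapter trimming and quality filtering
        ("qc", ["bbduk.sh", f"in={r1}", f"in2={r2}",
                f"out={trim1}", f"out2={trim2}", f"ref={adapters}",
                "ktrim=r", "k=23", "mink=11", "hdist=1",
                "qtrim=rl", "trimq=10"]),
        # 2. Error correction of the trimmed reads
        ("error_correction", ["tadpole.sh", f"in={trim1}", f"in2={trim2}",
                              f"out={corr1}", f"out2={corr2}",
                              "mode=correct"]),
        # 3. Draft assembly
        ("assembly", ["skesa", "--fastq", f"{corr1},{corr2}",
                      "--contigs_out", str(draft)]),
        # 4. Align corrected reads to the draft assembly
        ("alignment", ["bbmap.sh", f"ref={draft}", f"in={corr1}",
                       f"in2={corr2}", f"out={sam}", "nodisk"]),
        # Assumed glue: pilon expects a sorted, indexed BAM
        ("sort", ["samtools", "sort", "-o", str(bam), str(sam)]),
        ("index", ["samtools", "index", str(bam)]),
        # 5. Polish the draft with the aligned reads
        ("polishing", ["pilon", f"-Xmx{memory}", "--genome", str(draft),
                       "--frags", str(bam), "--outdir", str(out),
                       "--output", f"{sample}.polished"]),
    ]


if __name__ == "__main__":
    for label, argv in pipeline_commands("s1_R1.fastq.gz", "s1_R2.fastq.gz", "out"):
        print(f"[{label}] {' '.join(argv)}")
```

Printing the commands rather than running them makes it easy to inspect the order of the stages before any tool is executed.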

Installation

pip install ProkaryoteAssembly

Usage

The first script, prokaryote_assemble.py, operates on a single sample at a time.

Usage: prokaryote_assemble.py [OPTIONS]

Options:
  -1, --fwd_reads PATH  Path to forward reads (R1) (gzipped FASTQ).
                        [required]
  -2, --rev_reads PATH  Path to reverse reads (R2) (gzipped FASTQ).
                        [required]
  -o, --out_dir PATH    Root directory to store all output files.  [required]
  -m, --memory TEXT     Amount of memory to allocate to job. e.g. "8g".
                        Defaults to 8g.
  --cleanup             Specify this flag to delete all intermediary files
                        except the resulting FASTA assembly.
  --version             Specify this flag to print the version and exit.
  --help                Show this message and exit.
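
When the script has to be driven from other Python code, the options listed above map directly onto an argument list. The sketch below is a minimal example under the assumption that `prokaryote_assemble.py` is on PATH after `pip install`; the helper names `build_assemble_cmd` and `run_assembly` are hypothetical and not part of the package.

```python
import subprocess


def build_assemble_cmd(fwd_reads, rev_reads, out_dir, memory="8g", cleanup=False):
    """Build the argv for one prokaryote_assemble.py run from the documented options."""
    cmd = [
        "prokaryote_assemble.py",
        "-1", str(fwd_reads),   # forward reads (R1), gzipped FASTQ
        "-2", str(rev_reads),   # reverse reads (R2), gzipped FASTQ
        "-o", str(out_dir),     # root output directory
        "-m", memory,           # e.g. "8g"
    ]
    if cleanup:
        cmd.append("--cleanup")  # keep only the final FASTA assembly
    return cmd


def run_assembly(**kwargs):
    """Run the script; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_assemble_cmd(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so nothing needs to be installed.
    print(" ".join(build_assemble_cmd(
        "sample_R1.fastq.gz", "sample_R2.fastq.gz", "assemblies", memory="16g"
    )))
```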

The second script, prokaryote_assemble_dir.py, will detect all *.fastq.gz files in a directory and run the assembly pipeline on each sample it can pair.

Usage: prokaryote_assemble_dir.py [OPTIONS]

Options:
  -i, --input_dir PATH  Directory containing all *.fastq.gz files to
                        assemble. NOTE: Files must be gzipped in order to be
                        detected.  [required]
  -o, --out_dir PATH    Root directory to store all output files.  [required]
  -f, --fwd_id TEXT     Pattern to detect forward reads. Defaults to "_R1".
  -r, --rev_id TEXT     Pattern to detect reverse reads. Defaults to "_R2".
  -m, --memory TEXT     Memory to allocate to pilon call. Defaults to 8g (i.e.
                        pilon -Xmx8g). May need to provide a large amount of
                        memory for large read sets/assemblies.
  --cleanup             Specify this flag to delete all intermediary files
                        except the resulting FASTA assembly.
  --version             Specify this flag to print the version and exit.
  --help                Show this message and exit.
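
To make the role of `--fwd_id` and `--rev_id` concrete, here is one way read pairing by pattern could work. This is a sketch of the idea only; the function `pair_reads` is hypothetical, and the package's own pairing logic may differ (for example in how it handles ambiguous names).

```python
from pathlib import Path


def pair_reads(filenames, fwd_id="_R1", rev_id="_R2"):
    """Group *.fastq.gz names into {sample: (fwd, rev)}; unpaired files are skipped."""
    fwd, rev = {}, {}
    for name in filenames:
        base = Path(name).name
        if not base.endswith(".fastq.gz"):
            continue  # only gzipped FASTQ files are considered
        if fwd_id in base:
            # Sample ID is taken as everything before the forward pattern
            fwd[base.split(fwd_id)[0]] = name
        elif rev_id in base:
            rev[base.split(rev_id)[0]] = name
    # Keep only samples that have both a forward and a reverse file
    return {s: (fwd[s], rev[s]) for s in sorted(fwd) if s in rev}


if __name__ == "__main__":
    files = ["A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz", "C_R1.fastq"]
    for sample, (r1, r2) in pair_reads(files).items():
        print(sample, r1, r2)
```

In this sketch, sample `B` is dropped because it has no reverse file, and `C_R1.fastq` is ignored because it is not gzipped.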

Python (3.6) Dependencies

  • click

External Dependencies

NOTE: All external dependencies must be available via PATH.

Versions confirmed to work are given in parentheses.

  • skesa (SKESA v.2.1-SVN_551987:557549M)
  • BBMap (BBMap version 38.22); provides bbduk.sh, tadpole.sh and bbmap.sh
  • samtools (samtools 1.8 using htslib 1.8)
  • pilon (Pilon version 1.22)

NOTE: Installing pilon via conda is strongly recommended, e.g. https://bioconda.github.io/recipes/pilon/README.html

conda install pilon
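
Since every external dependency must be reachable via PATH, a quick pre-flight check can save a failed run. The sketch below uses `shutil.which` from the standard library; the executable names in `REQUIRED_TOOLS` are assumed from the pipeline overview and dependency list, and `missing_tools` is a hypothetical helper, not part of the package.

```python
import shutil

# Executable names assumed from the pipeline overview and dependency list
REQUIRED_TOOLS = ["skesa", "bbduk.sh", "tadpole.sh", "bbmap.sh", "samtools", "pilon"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external dependencies found.")
```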
