nPhase is a command-line, ploidy agnostic phasing pipeline and algorithm that phases samples of any ploidy by aligning long and short read data to a reference sequence.

nPhase

Ploidy agnostic phasing pipeline and algorithm

nPhase is a ploidy agnostic tool written in Python. It predicts the haplotypes of a sample sequenced with both long and short reads by aligning the reads to a reference. It should work with any ploidy.

Quick-start

If you have bioconda you can install nPhase by running the following commands in your terminal:

conda create -n polyploidPhasing python=3.8
conda activate polyploidPhasing
conda install -c oakheart nphase

Then you can phase your data with the following command:

nphase pipeline --sampleName Individual_1 --reference /path/to/Individual_referenceGenome.fasta \
                --R1 /path/to/Individual_1_shortReads_R1.fastq.gz --R2 /path/to/Individual_1_shortReads_R2.fastq.gz \
                --longReads /path/to/Individual_1_longReads.fastq.gz --longReadPlatform [ont|pacbio] \
                --output /path/to/outputFolder
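
If you launch nPhase from a script, it can help to check the inputs before starting a long run. The following is a minimal sketch, not part of nPhase: the function name build_pipeline_command is hypothetical, and only the flags shown in the command above are used.

```python
from pathlib import Path

# The two platform values accepted by --longReadPlatform.
PLATFORMS = ("ont", "pacbio")


def build_pipeline_command(sample, reference, output, long_reads, platform, r1, r2):
    """Return the argument list for `nphase pipeline` after basic input checks.

    Hypothetical helper: it only assembles the documented flags.
    """
    if platform not in PLATFORMS:
        raise ValueError(f"--longReadPlatform must be one of {PLATFORMS}")
    # Fail early if any input file is missing.
    for path in (reference, long_reads, r1, r2):
        if not Path(path).is_file():
            raise FileNotFoundError(str(path))
    return [
        "nphase", "pipeline",
        "--sampleName", sample,
        "--reference", str(reference),
        "--R1", str(r1),
        "--R2", str(r2),
        "--longReads", str(long_reads),
        "--longReadPlatform", platform,
        "--output", str(output),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual file names.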

Installation

With bioconda

Install bioconda and set the correct channels by following the first two steps here: https://bioconda.github.io/user/install.html

Then you can create a new environment and install nPhase with the following commands:

conda create -n polyploidPhasing python=3.8
conda activate polyploidPhasing
conda install -c oakheart nphase

Without bioconda

Pre-requisites

You will have to install the following software before nPhase:

*Currently, installing samtools v1.10 or higher causes an error because of a bug in the way ngmlr handles MAPQ scores.
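
To check whether the samtools on your PATH is affected by the note above, you could parse the first line of `samtools --version`. This is a rough sketch with hypothetical function names, assuming the output starts with a line such as "samtools 1.9". Versions are compared as integer tuples because comparing "1.9" and "1.10" as strings or floats gives the wrong order.

```python
import re
import subprocess


def parse_samtools_version(version_output):
    """Extract (major, minor) from the output of `samtools --version`."""
    match = re.search(r"samtools\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a samtools version string")
    return int(match.group(1)), int(match.group(2))


def is_supported(version):
    """True for versions below 1.10, the range the note above describes as safe."""
    # Tuple comparison: (1, 9) < (1, 10), unlike the strings "1.9" and "1.10".
    return version < (1, 10)


def installed_samtools_version():
    """Ask the samtools binary on PATH for its version."""
    result = subprocess.run(
        ["samtools", "--version"], capture_output=True, text=True, check=True
    )
    return parse_samtools_version(result.stdout)
```

Calling is_supported(installed_samtools_version()) before installing nPhase gives an early warning, but it does not replace testing on your own system.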

Installation via PyPI

You can now install nPhase via

pip install -U nPhase

Usage

There are two ways to run nPhase:

nphase pipeline will run the entire pipeline from start to finish and requires the following inputs:

nphase pipeline --sampleName SAMPLE_NAME --reference REFERENCE --output OUTPUT_FOLDER --longReads LONG_READ_FILE
                --longReadPlatform {ont,pacbio} --R1 SHORT_READ_FILE_R1 --R2 SHORT_READ_FILE_R2

Optional parameters are described further down.

nphase algorithm will only run the phasing algorithm. It requires inputs generated by nphase pipeline. This is useful if you want to test different parameters on your dataset after having generated the pre-processed files once. Here are the inputs required by nphase algorithm:

nphase algorithm --sampleName SAMPLE_NAME --reference REFERENCE --output OUTPUT_FOLDER --longReads LONG_READ_FILE
                 --contextDepth CONTEXT_DEPTHS_FILE --processedLongReads VALIDATED_SNP_ASSIGNMENTS_FILE

Optional parameters are described further down.
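
The two extra files needed by nphase algorithm can be located from the output folder and sample name of an earlier pipeline run. The sketch below assumes the folder layout implied by the example paths in the Parameters help text (OUTPUT/SAMPLE/Overlaps/... and OUTPUT/SAMPLE/VariantCalls/longReads/...). That layout is an inference from those examples, and the function names are hypothetical.

```python
from pathlib import Path


def algorithm_inputs(output_folder, sample):
    """Guess where `nphase pipeline` left the files `nphase algorithm` needs.

    Assumption: the layout matches the example paths in the help text.
    """
    base = Path(output_folder) / sample
    context_depth = base / "Overlaps" / f"{sample}.contextDepths.tsv"
    processed = (
        base / "VariantCalls" / "longReads"
        / f"{sample}.hetPositions.SNPxLongReads.validated.tsv"
    )
    return context_depth, processed


def build_algorithm_command(sample, reference, output_folder, long_reads, extra=()):
    """Return the argument list for `nphase algorithm`; `extra` holds optional flags."""
    context_depth, processed = algorithm_inputs(output_folder, sample)
    return [
        "nphase", "algorithm",
        "--sampleName", sample,
        "--reference", str(reference),
        "--output", str(output_folder),
        "--longReads", str(long_reads),
        "--contextDepth", str(context_depth),
        "--processedLongReads", str(processed),
        *extra,
    ]
```

Check that both files exist before launching, since the guessed paths may not match your installation.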

Parameters

nphase pipeline [-h] [--threads [THREADS]] [--maxID [MAXID]] [--minOvl [MINOVL]] [--minSim [MINSIM]] [--minLen [MINLEN]] --sampleName SAMPLE_NAME
                       --reference REFERENCE --output OUTPUT_FOLDER --longReads LONG_READ_FILE --longReadPlatform {ont,pacbio} --R1 SHORT_READ_FILE_R1 --R2
                       SHORT_READ_FILE_R2
or
nphase algorithm [-h] [--threads [THREADS]] [--maxID [MAXID]] [--minOvl [MINOVL]] [--minSim [MINSIM]] [--minLen [MINLEN]] --sampleName SAMPLE_NAME
                        --reference REFERENCE --output OUTPUT_FOLDER --longReads LONG_READ_FILE --contextDepth CONTEXT_DEPTHS_FILE --processedLongReads
                        VALIDATED_SNP_ASSIGNMENTS_FILE

positional arguments:
    pipeline            Run the entire nPhase pipeline on your sample
    algorithm           Only run the nPhase algorithm. NOTE: This will require files generated by running the pipeline mode

arguments always required:
  --sampleName STRAINNAME
                        Name of your sample, ex: "Individual_1"
  --reference REFERENCE
                        Path to fasta file of reference genome to align to, ex: /home/reference/Individual_reference.fasta
  --output OUTPUTFOLDER
                        Path to output folder, ex: /home/phased/
  --longReads LONGREADFILE
                        Path to long read FastQ file, ex: /home/longReads/Individual_1.fastq.gz

additional arguments required by nphase pipeline:
  --longReadPlatform {ont,pacbio}
                        Long read platform, must be 'ont' or 'pacbio'
  --R1 SHORTREADFILE_R1
                        Path to paired end short read FastQ file #1, ex: /home/shortReads/Individual_1_R1.fastq.gz
  --R2 SHORTREADFILE_R2
                        Path to paired end short read FastQ file #2, ex: /home/shortReads/Individual_1_R2.fastq.gz

additional arguments required by nphase algorithm:
  --contextDepth CONTEXTDEPTHSFILE
                        Path to context depths file, ex: /home/phased/Individual_1/Overlaps/Individual_1.contextDepths.tsv
  --processedLongReads VALIDATEDSNPASSIGNMENTSFILE
                        Path to validated long read SNPs, ex:
                        /home/phased/Individual_1/VariantCalls/longReads/Individual_1.hetPositions.SNPxLongReads.validated.tsv

optional arguments:
  -h, --help            show this help message and exit
  --threads [THREADS]   Number of threads to use on some steps, default 8
  --maxID [MAXID]       MaxID parameter, determines how different two clusters must be to prevent them from merging. Default 0.05
  --minOvl [MINOVL]     minOvl parameter, determines the minimal percentage of overlap required to allow a merge between two clusters that have fewer than
                        100 heterozygous SNPs in common. Default 0.1
  --minSim [MINSIM]     minSim parameter, determines the minimal percentage of similarity required to allow a merge between two clusters. Default 0.01
  --minLen [MINLEN]     minLen parameter, any cluster based on fewer than N reads will not be output. Default 0
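
Because nphase algorithm reuses the pre-processed files, trying several parameter combinations amounts to generating one set of optional flags per run. A minimal sketch, with hypothetical helper names and only the flags listed above:

```python
from itertools import product


def parameter_grid(max_ids, min_ovls, min_sims):
    """Yield one list of optional flags per parameter combination."""
    for max_id, min_ovl, min_sim in product(max_ids, min_ovls, min_sims):
        yield [
            "--maxID", str(max_id),
            "--minOvl", str(min_ovl),
            "--minSim", str(min_sim),
        ]


def run_label(flags):
    """Build a label such as 'maxID0.05_minOvl0.1_minSim0.01' from a flag list."""
    # Flags alternate name, value, name, value, ...
    pairs = zip(flags[0::2], flags[1::2])
    return "_".join(f"{name.lstrip('-')}{value}" for name, value in pairs)
```

Each flag list can be appended to an nphase algorithm command. This README does not say whether repeated runs overwrite earlier results, so giving each run its own --output folder (for example one named with run_label) may be the safer choice.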

Paper

nPhase: An accurate and contiguous phasing method for polyploids

Media

Online lightning talk I gave about nPhase for an Oxford Nanopore event [5:44]: link

Misc

With the default parameters, the current recommendations are at least 20X coverage per haplotype (so 3*20=60X for a triploid) and a heterozygosity level of at least 0.4% (an average of 1 heterozygous SNP every 250 bp).
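
The arithmetic behind these recommendations can be written out as a quick check. This is only an illustration; the function names are hypothetical, and the inputs (total coverage, number of heterozygous SNPs, genome length) must come from your own data.

```python
def recommended_total_coverage(ploidy, per_haplotype=20):
    """Total long read coverage suggested for a given ploidy (20X per haplotype)."""
    return ploidy * per_haplotype


def heterozygosity(het_snps, genome_length_bp):
    """Fraction of positions that are heterozygous SNPs."""
    return het_snps / genome_length_bp


def meets_recommendations(total_coverage, ploidy, het_snps, genome_length_bp):
    """True if both the coverage and the heterozygosity recommendations are met."""
    enough_coverage = total_coverage >= recommended_total_coverage(ploidy)
    # 0.4% heterozygosity is 1 heterozygous SNP every 250 bp on average.
    enough_het = heterozygosity(het_snps, genome_length_bp) >= 0.004
    return enough_coverage and enough_het
```

For example, recommended_total_coverage(3) gives 60 for a triploid, matching the 3*20=60X figure above.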

nPhase is currently untested on PacBio data, so if you have a PacBio dataset (with a known ground truth) please contact me (raise an issue on GitHub or email me), especially if you run into errors.

If you have a hybrid sample that has an acquired genomic copy which is genetically distant from the rest of the genome, nPhase will struggle to predict accurate results as the "distant" haplotype will make the other haplotypes look incredibly similar to each other. The solution is to separate the long reads based on their genetic distance to a reference genome. More details in this PDF file.

Contact me

email: oabousaada@unistra.fr
discord: Peaceful#6956

Release history

Version	Release date
1.2.1	Dec 14, 2023
1.2.0	Apr 21, 2022
1.1.10	Mar 22, 2022
1.1.9	Mar 1, 2022
1.1.8	Feb 5, 2022
1.1.7	Jan 24, 2022
1.1.6	Jan 20, 2022
1.1.5	Jan 20, 2022
1.1.4	Jan 19, 2022
1.1.3	Mar 22, 2021
1.1.2	Feb 23, 2021
1.1.1	Feb 23, 2021
1.1.0	Feb 23, 2021
1.0.14	Oct 28, 2020
1.0.13 (this version)	Oct 27, 2020
1.0.12	Oct 5, 2020
1.0.11	Aug 7, 2020
1.0.10	Jul 31, 2020
1.0.9	Jul 30, 2020
1.0.8	Jul 29, 2020
1.0.7	Jul 29, 2020
1.0.6	Jul 23, 2020
1.0.5 (yanked)	Jul 22, 2020
1.0.4 (yanked)	Jul 22, 2020
1.0.3 (yanked)	Jul 22, 2020
1.0.2 (yanked)	Jul 22, 2020
1.0.1 (yanked)	Jul 22, 2020
1.0.0 (yanked)	Jul 22, 2020

Download files

Source Distribution

nPhase-1.0.13.tar.gz (24.2 kB)

Uploaded Oct 27, 2020 Source

Built Distribution

nPhase-1.0.13-py3-none-any.whl (34.4 kB)

Uploaded Oct 27, 2020 Python 3

File details

Details for the file nPhase-1.0.13.tar.gz.

File metadata

Download URL: nPhase-1.0.13.tar.gz
Upload date: Oct 27, 2020
Size: 24.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200917 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.5

File hashes

Hashes for nPhase-1.0.13.tar.gz
Algorithm	Hash digest
SHA256	`fa9bf50bda79cd387a84500983bf38b600b34fdfe3c7d55737bdea81f050ef0e`
MD5	`501ab2fbb6267123d1fbbbbccd020704`
BLAKE2b-256	`a8926eb34d7f8019a3dde40183975b07b75f1b205bbb68480066e29335909044`

File details

Details for the file nPhase-1.0.13-py3-none-any.whl.

File metadata

Download URL: nPhase-1.0.13-py3-none-any.whl
Upload date: Oct 27, 2020
Size: 34.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200917 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.5

File hashes

Hashes for nPhase-1.0.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a141dbdeb4aaa5b172e8dbf08098a30ea406be37a195a4de354b2b1763035cee`
MD5	`3ab2de07a311038706d440daccf5b5b0`
BLAKE2b-256	`0210090708023c92ae36cdedd30ddd11ad6dd42a35ae80c96ce50803c1312b22`
