Assemble contigs into a chromosome-scale pseudo-assembly using alignments to a reference sequence.
GPatch
Starting with alignments of contigs to a reference genome, produce a chromosome-scale pseudoassembly by patching gaps between mapped contigs with sequences from the reference.
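The patching idea described above can be illustrated with a toy sketch. This is not GPatch's actual implementation: the function name `patch_chromosome`, the `(ref_start, ref_end, contig_sequence)` tuple layout, and the choice to keep reference sequence before the first contig and after the last one are all assumptions made for illustration. Contig bases are written in lowercase only so they stand out in the result.

```python
from typing import List, Tuple

# Hypothetical placement record: (ref_start, ref_end, contig_sequence),
# using 0-based, half-open reference coordinates.
Placement = Tuple[int, int, str]


def patch_chromosome(reference: str, placements: List[Placement]) -> str:
    """Join placed contigs, filling the gaps between them with reference sequence."""
    pieces = []
    cursor = 0  # next reference position not yet represented in the output
    for ref_start, ref_end, contig_seq in sorted(placements):
        if ref_start < cursor:
            raise ValueError("contig placements must not overlap")
        pieces.append(reference[cursor:ref_start])  # patch copied from the reference
        pieces.append(contig_seq)  # the contig replaces the span it aligned to
        cursor = ref_end
    pieces.append(reference[cursor:])  # reference sequence after the last contig
    return "".join(pieces)


reference = "AAAAACCCCCGGGGGTTTTT"
placements = [(2, 6, "acgta"), (12, 15, "tt")]
patched = patch_chromosome(reference, placements)
print(patched)  # AAacgtaCCCCGGttTTTTT
```

With an empty placement list the function returns the reference sequence as-is, which mirrors the documented behavior that reference chromosomes with no mapped contigs are printed unchanged.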
Dependencies
- Python >= v3.7
- samtools (https://github.com/samtools/samtools)
- biopython (https://biopython.org/)
- pysam (https://github.com/pysam-developers/pysam)
- minimap2 (https://github.com/lh3/minimap2)
We recommend aligning contigs with minimap2, passing the -a option to generate SAM output.
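As a minimal sketch of that recommendation, the helper below builds a minimap2 command line that requests SAM output. The helper name `minimap2_sam_command` and the file names are placeholders, not part of GPatch. `-a` is the option named above; `-o` is minimap2's option for writing output to a file instead of standard output.

```python
import shlex
from typing import List


def minimap2_sam_command(reference_fasta: str, contigs_fasta: str, out_sam: str) -> List[str]:
    """Build a minimap2 command that aligns contigs to a reference and writes SAM."""
    # -a requests SAM output; -o sends it to a file rather than stdout.
    return ["minimap2", "-a", "-o", out_sam, reference_fasta, contigs_fasta]


cmd = minimap2_sam_command("reference.fa", "contigs.fa", "contigs.sam")
print(" ".join(shlex.quote(arg) for arg in cmd))
# minimap2 -a -o contigs.sam reference.fa contigs.fa

# To actually run the alignment (requires minimap2 on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Additional minimap2 settings, such as a preset suited to your contigs, are left out here because the right choice depends on the data.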
Installation
We recommend installing with conda, into a new environment:
conda create -n GPatch -c conda-forge -c bioconda biopython pysam minimap2 samtools GPatch
Install with pip:
pip install GPatch
Installing from the GitHub repository is not recommended. If you must, follow the steps below:
- git clone https://github.com/adadiehl/GPatch
- cd GPatch/
- python3 -m pip install -e .
Usage
usage: GPatch [-h] -q SAM/BAM -r FASTA [-x BED] [-b FILENAME] [-m N]
[-d N] [-f FLOAT] [-e FLOAT]
Starting with alignments of contigs to a reference genome, produce a chromosome-scale pseudoassembly by patching gaps between mapped contigs with sequences from the reference. Reference chromosomes with no mapped contigs are printed to output unchanged.
Required Arguments
| Argument | Description |
|---|---|
| -q SAM/BAM, --query_bam SAM/BAM | Path to SAM/BAM file containing non-overlapping contig mappings to the reference genome. |
| -r FASTA, --reference_fasta FASTA | Path to reference genome fasta. |
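For scripted pipelines, a GPatch call might be assembled as in the sketch below. The helper `gpatch_command` and the file names are hypothetical. Only flags shown in the usage line are used: `-q` and `-r` are required, and `-m` is the optional minimum mapping quality.

```python
from typing import List, Optional


def gpatch_command(
    query_bam: str,
    reference_fasta: str,
    min_qual_score: Optional[int] = None,
) -> List[str]:
    """Build a GPatch command line from the documented flags."""
    cmd = ["GPatch", "-q", query_bam, "-r", reference_fasta]
    if min_qual_score is not None:
        # -m is optional; omitting it leaves GPatch's own default in effect.
        cmd += ["-m", str(min_qual_score)]
    return cmd


cmd_default = gpatch_command("contigs.sam", "reference.fa")
cmd_strict = gpatch_command("contigs.sam", "reference.fa", min_qual_score=40)
print(cmd_default)
print(cmd_strict)

# To run it (requires GPatch to be installed):
#   import subprocess
#   subprocess.run(cmd_default, check=True)
```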
Optional Arguments
| Argument | Description |
|---|---|
| -h, --help | Show this help message and exit. |
| -x STR, --prefix STR | Prefix to add to output file names. Default=None |
| -b FILENAME, --store_final_bam FILENAME | Store the final set of primary contig alignments to the given file name. Default: Do not store the final BAM. |
| -m N, --min_qual_score N | Minimum mapping quality score to retain an alignment. Default=30 |
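To make the effect of a mapping-quality threshold concrete, here is a rough sketch that filters SAM text lines. It is not GPatch's code. It assumes that "minimum" means MAPQ greater than or equal to the threshold, and that unmapped, secondary, and supplementary records are dropped. The function name `filter_sam_lines` and the example records are made up for illustration.

```python
from typing import Iterable, Iterator

# SAM FLAG bits defined by the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def filter_sam_lines(lines: Iterable[str], min_qual_score: int = 30) -> Iterator[str]:
    """Yield header lines plus primary alignments with MAPQ >= min_qual_score."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # assumption: keep primary alignments only
        if mapq >= min_qual_score:
            yield line


# Made-up records: one header, one good primary, one low-MAPQ, one secondary.
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "ctg1\t0\tchr1\t101\t60\t50M\t*\t0\t0\t*\t*",
    "ctg2\t16\tchr1\t301\t12\t50M\t*\t0\t0\t*\t*",
    "ctg1\t256\tchr1\t601\t60\t50M\t*\t0\t0\t*\t*",
]
kept = list(filter_sam_lines(sam_lines))
print(len(kept))  # 2
```

Only the header and the first ctg1 record survive: ctg2 falls below the default threshold of 30, and the second ctg1 record is flagged as secondary.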
Output
GPatch produces three output files:
| File | Description |
|---|---|
| patched.fasta | The final patched genome. |
| contigs.bed | Location of contigs in the coordinate frame of the patched genome. |
| patches.bed | Location of patches in the coordinate frame of the reference genome. |
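The two BED outputs can be summarized with a few lines of Python. The sketch below assumes standard BED conventions: tab-separated fields, with chromosome, 0-based start, and exclusive end in the first three columns. Any further columns GPatch may write are ignored, since their layout is not described here. The example records are invented.

```python
from typing import Iterable, List, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def parse_bed(lines: Iterable[str]) -> List[BedInterval]:
    """Parse the first three columns of each BED record."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and browser/track lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append(BedInterval(chrom, int(start), int(end)))
    return intervals


def total_length(intervals: Iterable[BedInterval]) -> int:
    """Sum interval lengths; with half-open coordinates this is end - start."""
    return sum(iv.end - iv.start for iv in intervals)


# Invented records standing in for the contents of a patches.bed file.
example_lines = [
    "chr1\t0\t1500",
    "chr1\t42000\t42750",
    "chr2\t100\t200",
]
patches = parse_bed(example_lines)
print(len(patches), total_length(patches))  # 3 2350
```

For a real run, pass an open file handle instead of the example list, for example `parse_bed(open("patches.bed"))`, adjusting the file name if a prefix was set with -x.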
Citing GPatch
Please use the following citation if you use this software in your work:
CITATION_HERE