Assemble contigs into a chromosome-scale pseudo-assembly using alignments to a reference sequence.

GPatch

Starting with alignments of contigs to a reference genome, produce a chromosome-scale pseudoassembly by patching gaps between mapped contigs with sequences from the reference.
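The idea can be illustrated with a toy sketch. This is not GPatch's implementation: the function name, the tuple layout, and the choice to fill chromosome ends from the reference are all assumptions made for illustration. It assumes contig placements are sorted, non-overlapping, and given in half-open reference coordinates.

```python
def patch_chromosome(reference, placements):
    """Toy illustration of reference-guided patching (hypothetical helper).

    reference:  reference chromosome sequence as a string.
    placements: list of (ref_start, ref_end, contig_seq) tuples in half-open
                reference coordinates, sorted and non-overlapping.
    Returns contig sequence where contigs map and reference sequence elsewhere.
    """
    pieces = []
    cursor = 0  # first reference position not yet covered
    for ref_start, ref_end, contig_seq in placements:
        pieces.append(reference[cursor:ref_start])  # patch taken from the reference
        pieces.append(contig_seq)                   # contig replaces the mapped span
        cursor = ref_end
    pieces.append(reference[cursor:])  # reference sequence after the last contig
    return "".join(pieces)


reference = "AAAAACCCCCGGGGGTTTTT"
# Lowercase contigs make the patched (uppercase) stretches easy to spot.
placements = [(2, 6, "acgt"), (12, 16, "ttgca")]
patched = patch_chromosome(reference, placements)
print(patched)  # AAacgtCCCCGGttgcaTTTT
```

With an empty placement list the function returns the reference unchanged, which mirrors how unpatched chromosomes are passed through to the output by default.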

Dependencies

We recommend minimap2 for alignment, run with the -a option so that it generates SAM output.
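As a minimal sketch, the alignment command could be assembled in Python as follows. The file names are placeholders and the helper name is hypothetical; only the -a flag comes from the recommendation above, and any further minimap2 options (such as a preset) are left to the user.

```python
import shlex


def minimap2_command(reference_fasta, contigs_fasta):
    """Build an argv list that aligns contigs to the reference with SAM output."""
    # -a makes minimap2 emit SAM instead of its default PAF format.
    return ["minimap2", "-a", reference_fasta, contigs_fasta]


cmd = minimap2_command("reference.fa", "contigs.fa")
# minimap2 writes alignments to standard output, so redirect to a file.
print(shlex.join(cmd) + " > contigs.sam")
```

To run it from Python rather than a shell, the same list can be passed to `subprocess.run(cmd, stdout=out_handle, check=True)` with `out_handle` opened on the desired SAM file.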

Installation

We recommend installing with conda, into a new environment:

conda create -n GPatch -c conda-forge -c bioconda Bio pysam minimap2 samtools GPatch

Install with pip:

pip install GPatch

Installation from the GitHub repository is not recommended. However, if you must, follow the steps below:

  1. git clone https://github.com/adadiehl/GPatch
  2. cd GPatch/
  3. python3 -m pip install -e .

Usage

usage: GPatch.py [-h] -q SAM/BAM -r FASTA [-x STR] [-b FILENAME] [-m N]
                 [-w PATH] [-d] [-t]

Starting with alignments of contigs to a reference genome, produce a chromosome-scale pseudoassembly by patching gaps between mapped contigs with sequences from the reference. Reference chromosomes with no mapped contigs are printed to output unchanged.
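For scripted pipelines, a call can be assembled from the flags documented in this section. The sketch below is only an illustration: the helper name, file names, and prefix value are hypothetical, and it assumes the program is invoked as `GPatch.py`, as in the usage line.

```python
def gpatch_command(query_bam, reference_fasta, prefix=None,
                   store_final_bam=None, min_qual_score=None,
                   whitelist=None, drop_missing=False, no_trim=False):
    """Build an argv list for GPatch.py from the documented options."""
    # -q and -r are the two required arguments.
    cmd = ["GPatch.py", "-q", query_bam, "-r", reference_fasta]
    if prefix is not None:
        cmd += ["-x", prefix]
    if store_final_bam is not None:
        cmd += ["-b", store_final_bam]
    if min_qual_score is not None:
        cmd += ["-m", str(min_qual_score)]
    if whitelist is not None:
        cmd += ["-w", whitelist]
    if drop_missing:
        cmd.append("-d")
    if no_trim:
        cmd.append("-t")
    return cmd


cmd = gpatch_command("contigs.sam", "reference.fa", prefix="sample1.",
                     min_qual_score=20, drop_missing=True)
print(" ".join(cmd))
# GPatch.py -q contigs.sam -r reference.fa -x sample1. -m 20 -d
```

Options left at `None` or `False` are omitted from the command, so GPatch falls back to its own defaults.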

Required Arguments

| Argument | Description |
|---|---|
| -q SAM/BAM, --query_bam SAM/BAM | Path to SAM/BAM file containing non-overlapping contig mappings to the reference genome. |
| -r FASTA, --reference_fasta FASTA | Path to reference genome fasta. |

Optional Arguments

| Argument | Description |
|---|---|
| -h, --help | Show this help message and exit. |
| -x STR, --prefix STR | Prefix to add to output file names. Default=None |
| -b FILENAME, --store_final_bam FILENAME | Store the final set of primary contig alignments to the given file name. Default: Do not store the final BAM. |
| -m N, --min_qual_score N | Minimum mapping quality score to retain an alignment. Default=30 |
| -w PATH, --whitelist PATH | Path to a BED file of whitelist regions, i.e., the inverse of blacklist regions. Supplying this excludes alignments that fall entirely within blacklist regions. Default=None |
| -d, --drop_missing | Omit unpatched reference chromosome records from the output if no contigs map to them. Default: Unpatched chromosomes are printed to output unchanged. |
| -t, --no_trim | Do not trim the 5-prime end of contigs whose mappings overlap the previously-placed contig. Default: Overlapping contig sequence will be trimmed at the previous 3-prime contig breakpoint. |
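The effect of --min_qual_score and --whitelist can be sketched as a simple filter. This is an assumption-laden illustration, not GPatch's code: the `Alignment` tuple and function names are hypothetical, "minimum" is read as inclusive (MAPQ >= N), and an alignment is treated as lying entirely within blacklist regions exactly when it overlaps no whitelist region.

```python
from collections import namedtuple

# Hypothetical record: half-open reference coordinates plus mapping quality.
Alignment = namedtuple("Alignment", "contig chrom start end mapq")


def overlaps_any(aln, regions):
    """True if the alignment shares at least one base with a region.

    regions: dict mapping chromosome name to a list of half-open
             (start, end) intervals, as read from a BED file.
    """
    return any(aln.start < r_end and r_start < aln.end
               for r_start, r_end in regions.get(aln.chrom, []))


def filter_alignments(alignments, min_qual_score=30, whitelist=None):
    """Keep alignments that pass the MAPQ cutoff and touch the whitelist."""
    kept = []
    for aln in alignments:
        if aln.mapq < min_qual_score:
            continue  # below the minimum mapping quality
        if whitelist is not None and not overlaps_any(aln, whitelist):
            continue  # no whitelist overlap: entirely within blacklist regions
        kept.append(aln)
    return kept


alignments = [
    Alignment("ctg1", "chr1", 0, 1000, 60),
    Alignment("ctg2", "chr1", 1500, 2500, 10),  # low MAPQ
    Alignment("ctg3", "chr1", 5000, 6000, 60),  # entirely outside the whitelist
    Alignment("ctg4", "chr1", 2900, 3100, 30),  # partly inside the whitelist
]
whitelist = {"chr1": [(0, 3000)]}
kept = filter_alignments(alignments, whitelist=whitelist)
print([a.contig for a in kept])  # ['ctg1', 'ctg4']
```

Note that `ctg4` is retained even though it extends past the whitelist region, because only alignments with no whitelist overlap at all are dropped.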

Output

GPatch produces three output files:

| File | Description |
|---|---|
| patched.fasta | The final patched genome. |
| contigs.bed | Location of contigs in the coordinate frame of the patched genome. |
| patches.bed | Location of patches in the coordinate frame of the reference genome. |
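The two BED outputs can be summarized with a few lines of Python. The sketch below relies only on the three standard BED columns (chromosome, start, end); the columns GPatch writes beyond those are not described here, so they are passed through untouched. The example rows and helper names are made up for illustration.

```python
def parse_bed(lines):
    """Yield (chrom, start, end, extra_fields) from BED-formatted lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]


def total_span(records):
    """Sum of interval lengths; BED intervals are zero-based and half-open."""
    return sum(end - start for _, start, end, _ in records)


# Invented rows standing in for the contents of contigs.bed.
example = ["chr1\t0\t1000\tctg1\n", "chr1\t1500\t4000\tctg2\n"]
records = list(parse_bed(example))
print(total_span(records))  # 3500
```

To read a real file, pass an open handle instead of the list, for example `parse_bed(open("contigs.bed"))`, adjusting the name if a prefix was supplied with -x.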

Citing GPatch

Please use the following citation if you use this software in your work:

Adam Diehl, Alan Boyle. Fast and Accurate Draft Genome Patching with GPatch. bioRxiv 2025.05.22.655567; doi: https://doi.org/10.1101/2025.05.22.655567
