Skip to main content

primer design for haplotype genotyping using indel information

Project description

primervcf package

PyPI License

Installation:

The package worked with python version >=3.8. Only tested in linux x64 system.

python package:

  • primer3-py>=0.6.1
  • pyssw >=0.1.7
  • primerdiffer >=0.1.4
  • biopython>=1.7.8
  • PyVCF3>=1.0.3

other program in your $PATH:

  • ncbi-blast

example code to install the packages under python3 with pip and conda

pip install primervcf # will also install all other python dependencies 
conda install -c bioconda blast # install ncbi blast, which is not included in pip installation

Walkthrough

Design primers based on the VCF file provided, parser the deletion, and force primers to overalp the deletion

  • Check all long indel regions (>=indel_cutoff, default is 10) to design primers.
  • Check local region to make sure the indel is not a local tamdem repeat (using ssw alignment).
  • Use the whole genome (--genome1) as db to make a specificity check, to ensure the primer can only amplify one region.
  • The dis-similarity between genome and the strain could be individual/strain/population level (<=1%).

Use C.briggsae QR25 strain as example. The fasta files and the vcf can be downloaded separately from cb5.fa and from QR25.vcf.

Quick example

vcf2del.py -f QR25.vcf -o QR25_del.bed
primerdesign_vcf.py -g cb5.fa -b QR25_del.bed --prefix QR25

Explanations:

usage: vcf2del.py [-h] [-d WKDIR] [-f FILE] [-o OUT] [-l LENGTH]
optional arguments:
  -h, --help            show this help message and exit
  -d WKDIR, --wkdir WKDIR
                        The dir path contain the file, if not given, use the current dir
  -f FILE, --file FILE  the vcf file used as input
  -o OUT, --out OUT     bed file output
  -l LENGTH, --length LENGTH
                        the min length for a deletion to be used, default is 10
# run example
vcf2del.py -f QR25.vcf -o QR25_del.bed

# check the result in file "QR25_del.bed"
head QR25_del.bed
ChrI    144730  144744
ChrI    423135  423167
ChrI    461207  461222
ChrI    465639  465661
ChrI    492790  492800

# run primerdesign_vcf.py to output primer
usage: primerdesign_vcf.py [-h] [-d WKDIR] [-g GENOME] [-b BEDFILE] [--alignlen ALIGNLEN] [--free3len FREE3LEN]
                           [-n PRIMERNUMBER] [--debug DEBUG] [--prefix PREFIX] [--primer3config PRIMER3CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  -d WKDIR, --wkdir WKDIR
                        The dir path contain the file, if not given, use the current dir
  -g GENOME, --genome GENOME
                        the fasta file used to design primer and check specificity
  -b BEDFILE, --bedfile BEDFILE
                        the bed file containing the deletion region interval
  --alignlen ALIGNLEN   the cutoff of primer min align length as a right hit, default is 16
  --free3len FREE3LEN   the cutoff of primer 3' align length as a right hit, default is 2
  -n PRIMERNUMBER, --primernumber PRIMERNUMBER
                        the primer designed for each region, default is 5, do not have much impact for primer design
  --debug DEBUG         open debug mode or not, default is no
  --prefix PREFIX       prefix of output file, default is primers
  --primer3config PRIMER3CONFIG
                        the config file for the primer3

# run example, chose only the first 10 deletion to design primers
head QR_25_del.bed > QR25_delhead.bed
primerdesign_vcf.py -g cb5.fa -b QR25_delhead.bed --prefix QR25

Control parameters: design primers with given parameters

The default primer design parameter is described in general_setting.py:

primer3_general_settings =  {
        'PRIMER_OPT_SIZE': 20,
        'PRIMER_PICK_LEFT_PRIMER':1,
        'PRIMER_PICK_RIGHT_PRIMER':1,
        'PRIMER_PICK_INTERNAL_OLIGO': 0,
        'PRIMER_MIN_SIZE': 18,
        'PRIMER_MAX_SIZE': 23,
        'PRIMER_OPT_TM': 57.0,
        'PRIMER_MIN_TM': 46.0,
        'PRIMER_MAX_TM': 63.0,
        'PRIMER_MIN_GC': 20.0,
        'PRIMER_MAX_GC': 80.0,
        'PRIMER_PRODUCT_SIZE_RANGE': [[250, 650]],
        'PRIMER_NUM_RETURN':10,
        'PRIMER_MIN_THREE_PRIME_DISTANCE':10,
        'PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS':0
}

The new parameter can be supplemnted as a file with the terms which need to be changed, for example, create a file with name of config.txt

# content of config.txt
# only changed the product size, the others are the same the default
{'PRIMER_PRODUCT_SIZE_RANGE': [[100, 200]]}

Other scripts which might be useful if you want to start from fastq mapping

usage: fq2vcf.py [-h] [-d WKDIR] [-f FILE] [-g GENOME] [-p PREFIX] [-c CORE]

optional arguments:
  -h, --help            show this help message and exit
  -d WKDIR, --wkdir WKDIR
                        The dir path contain the file, if not given, use the current dir
  -f FILE, --file FILE  the fastq file used as input
  -g GENOME, --genome GENOME
                        the genome file used for mapping
  -p PREFIX, --prefix PREFIX
                        the outvcf filename prefix, default is using the file prefix of fastq
  -c CORE, --core CORE  the core used for bwa mem mapping and samtools sort, default is 16

# this script require the bwa mem in your $PATH
# which can be installed by "conda install -c bioconda bwa"
# example, the SRR1793006 reads are from QR25 strain, can be downloaded from NCBI SRA database
fq2vcf.py -f SRR1793006_1.fastq,SRR1793006_2.fastq -g cb5.fa -c 32

Citation:

Please kindly cite our paper in bioinformatics if you find primerdiffer or primervcf useful in your work: Link to publication.

Ren, X., Shao, Y., Zhang, Y., Ni, Y., Bi, Y., & Li, R. (2023). Primerdiffer: a python command line module for large-scale primer design in haplotype genotyping. Bioinformatics, btad188.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

primervcf-0.1.5-py3-none-any.whl (17.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page