Generate minimum gRNA set for multiple non-reference genomes

Project description

MINORg

Minimum Non-Reference gRNA finder

  • Finds the minimum gRNA set required to target multiple alignable genes in multiple non-reference genomes
  • Available as both command line application and Python package

Preprint: https://www.biorxiv.org/content/10.1101/2022.03.10.481891

Availability

Some dependencies are not available for Windows. Windows users should use a Linux emulator to run MINORg.

Installation

A version of MINORg is available at test.pypi.org. You may follow this guide to install MINORg and its dependencies.

Requirements

Links

Important

Please refer to slides/PDF in the 'Links' section for execution details for the version on the workstation (accessible only to lab members and guests with accounts).

Overview of steps

  1. Identify candidate targets in non-reference genome
    1. Extract user-specified reference gene(s) from a reference genome (.fasta) using GFF3 annotation (.gff)
      • Sequence(s) include introns
      • Optional: User may specify a protein domain (using CDD PSSM-ID) to restrict search
      1. CDS-only regions of user-specified reference gene(s) from a reference genome (.fasta) will be extracted and translated using GFF3 annotation
      2. RPS-BLAST protein sequence(s) against the domain database and identify domain range(s)
      3. Extract user-specified reference gene(s) from a reference genome (.fasta) using GFF3 annotation (.gff) and restricted to the corresponding genomic coordinates of the domains
    2. BLASTn reference gene(s) against non-reference genome(s) (.fasta)
    3. Filter hits by minimum % identity (optional)
    4. Merge overlapping hits within specified distance of each other (to accommodate introns/insertions)
    5. Filter merged hits for minimum length and % identity into target sequences
    6. Filter target sequences for those with best alignment to target gene(s) (optional)
      • Ensures that genes that are similar but not part of the set of user-specified target gene(s) will not be targeted
  2. Identify candidate gRNA in non-reference targets
    1. Restricted by user-specified PAM and gRNA length
  3. Screen candidate gRNA
    1. Eliminate candidate gRNA with off-target hits
      1. Mask targets in non-reference genome(s) (.fasta)
        • Only regions the length of targets with 100% identity to targets will be masked
        • All non-reference genomes provided will be screened simultaneously so all candidate gRNA that pass this screening test should not have off-targets in any of the non-reference genomes provided
        • User may also provide sequences to check against
      2. BLASTn candidate gRNA against masked non-reference and reference genome(s)
        • Optional: Screen reference genome also
      3. Eliminate candidate gRNA that have hits outside masked regions in non-reference genome(s) and that fail the maximum match/gaps criteria
    2. Eliminate candidate gRNA that do not align within the CDS of reference genes
      1. Extract CDS-only regions of user-specified reference gene(s) from a reference genome (.fasta) using GFF3 annotation (.gff)
        • If the user specified a domain, the range will be restricted accordingly
      2. Align non-reference target sequences (output of step 1.5) with reference sequences from steps 1.1 (or 1.1.3 if domain is specified) and 2.1
      3. For all candidate gRNA, check their position in the alignment (based on where in each non-reference target they originate) and eliminate any gRNA that do not align within AT LEAST ONE reference gene's desired feature
        • Users may change the desired feature (default is CDS)
  4. Find minimum gRNA set that covers all target sequences
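Steps 1.4, 2 and 4 above can be illustrated with a minimal Python sketch. This is not MINORg's own code: the function names (merge_hits, find_grna, greedy_min_set) are hypothetical, the gRNA search covers only the forward strand with an SpCas9-style NGG PAM, and the set selection uses a greedy set-cover heuristic, which gives a small set but not necessarily the true minimum.

```python
import re


def merge_hits(hits, buffer=100):
    """Merge (start, end) hits that overlap or lie within `buffer` bp of each other."""
    merged = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] <= buffer:
            # Close enough to the previous hit: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_grna(seq, length=20):
    """Return forward-strand candidates: `length` bases followed directly by NGG."""
    # The lookahead lets overlapping candidates be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})[ACGT]GG)")
    return {m.group(1) for m in pattern.finditer(seq.upper())}


def greedy_min_set(coverage, targets):
    """Greedy set cover: repeatedly pick the gRNA that covers the most uncovered targets.

    `coverage` maps each gRNA to the set of target IDs it can cut.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(coverage), key=lambda g: len(coverage[g] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError(f"no gRNA covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen
```

For example, greedy_min_set({"g1": {"t1", "t2"}, "g2": {"t2", "t3"}, "g3": {"t3"}}, {"t1", "t2", "t3"}) selects g1 and then g2, so two gRNA cover all three targets.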

Inputs

  • Step 1
    • Data:
      • Reference genome (--ref xxx.fasta)
      • Reference GFF3 annotation (--gff xxx.gff)
      • Non-reference sequences/genome (--nonref xxx.fasta)
    • Parameters:
      • Gene IDs (--gene)
        • Used with:
          • Accession/individual (-i) OR
          • Query fasta file (-q xxx.fasta)
      • Target sequences (--target xxx.fasta)
    • Optional parameters:
      • Minimum hit % identity (--minid 85 (%))
      • Minimum candidate target length (--minlen 0 (bp))
      • Maximum merge buffer (--buffer 100 (bp))
    • Optional for domain restriction:
      • PSSM-ID (--domain) and rpsblast+ database (--db)
  • Step 2
    • Parameters:
      • PAM (--pam SpCas9)
      • gRNA length (--length 20 (bp))
  • Step 3
    • Optional parameters:
      • Minimum off-target gaps (--ot-gap 0)
      • Minimum off-target mismatch (--ot-mismatch 1 (bp))
    • Optional data:
      • Background sequences (--background xxx.fasta)
  • Step 4
    • Optional parameters:
      • Number of sets to output (--set 1)
      • Manually approve each gRNA set (--manual)
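The inputs above can be gathered into a single argument list before launching a run. The sketch below only assembles and prints a command line; it does not execute anything. The flags and the values shown with them are taken from the list above, but the executable name "minorg", the file names, the GENE_ID placeholder, the treatment of --manual as a valueless switch, and the assumption that these flags can all be combined in one invocation are illustrative guesses rather than documented behaviour.

```python
import shlex


def build_command(executable, options):
    """Turn a {flag: value} mapping into an argument list.

    True  -> bare switch (e.g. --manual)
    None or False -> flag is omitted
    anything else -> flag followed by its value as a string
    """
    args = [executable]
    for flag, value in options.items():
        if value is True:
            args.append(flag)
        elif value is not None and value is not False:
            args.extend([flag, str(value)])
    return args


# Hypothetical file names and gene ID; numeric values are those shown in the list above.
options = {
    "--ref": "ref.fasta",
    "--gff": "ref.gff",
    "--nonref": "nonref.fasta",
    "--gene": "GENE_ID",
    "--minid": 85,
    "--minlen": 0,
    "--buffer": 100,
    "--pam": "SpCas9",
    "--length": 20,
    "--ot-gap": 0,
    "--ot-mismatch": 1,
    "--background": None,  # optional data, omitted here
    "--set": 1,
    "--manual": False,  # assumed to be a switch, left off here
}

# "minorg" as the executable name is an assumption.
print(shlex.join(build_command("minorg", options)))
```

Keeping the options in a mapping makes it easy to omit the optional inputs (set them to None) and to log the exact command used for each run.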

Download files

Source Distribution

minorg-0.2.3.4a0.tar.gz (182.3 kB)

Built Distribution

minorg-0.2.3.4a0-py3-none-any.whl (194.8 kB)
