Generate minimum gRNA set for multiple non-reference genomes
MINORg
Minimum Non-Reference gRNA finder
- Finds the minimum gRNA set required to target multiple alignable genes in multiple non-reference genomes
- Available as both a command-line application and a Python package
Preprint: https://www.biorxiv.org/content/10.1101/2022.03.10.481891
Availability
Some dependencies are not available for Windows. Windows users should use a Linux emulator to run MINORg.
Installation
A version of MINORg is available at test.pypi.org. You may follow this guide to install MINORg and its dependencies.
Requirements
Links
- Tutorial, example, and documentation: https://rlrq.github.io/MINORg
- Detailed overview of steps in the programme: https://tinyurl.com/sr84ae9e (Google slides) (not up to date)
- Flowchart to select & use appropriate input parameters: https://tinyurl.com/jyke76b8 (PDF) (not up to date)
Important
Please refer to slides/PDF in the 'Links' section for execution details for the version on the workstation (accessible only to lab members and guests with accounts).
Overview of steps
- Identify candidate targets in non-reference genome
  - Extract user-specified reference gene(s) from a reference genome (.fasta) using GFF3 annotation (.gff)
    - Sequence(s) include introns
    - Optional: User may specify a protein domain (using CDD PSSM-ID) to restrict search
      - CDS-only regions of user-specified reference gene(s) are extracted from a reference genome (.fasta) using GFF3 annotation (.gff) and translated
      - RPS-BLAST protein sequence(s) against domain database and identify domain range(s)
      - Extract user-specified reference gene(s) from a reference genome (.fasta) using GFF3 annotation (.gff), restricted to the genomic coordinates corresponding to the domains
  - BLASTn reference gene(s) against non-reference genome(s) (.fasta)
  - Filter hits by minimum % identity (optional)
  - Merge hits that overlap or lie within a specified distance of each other (to accommodate introns/insertions)
  - Filter merged hits by minimum length and % identity to obtain target sequences
  - Filter target sequences for those with best alignment to target gene(s) (optional)
    - Ensures that genes that are similar to, but not part of, the set of user-specified target gene(s) are not targeted
- Identify candidate gRNA in non-reference targets
  - Restricted by user-specified PAM and gRNA length
- Screen candidate gRNA
  - Eliminate candidate gRNA with off-target hits
    - Mask targets in non-reference genome(s) (.fasta)
      - Only regions that are the same length as a target and 100% identical to it are masked
      - All non-reference genomes provided are screened simultaneously, so candidate gRNA that pass this screen should not have off-targets in any of the non-reference genomes provided
      - User may also provide sequences to check against
    - BLASTn candidate gRNA against masked non-reference and reference genome(s)
      - Optional: Screen reference genome also
    - Eliminate candidate gRNA that have hits outside masked regions in non-reference genome(s) and fail the maximum match/gaps criteria
  - Eliminate candidate gRNA that do not align within the CDS of reference genes
    - Extract CDS-only regions of user-specified reference gene(s) from a reference genome (.fasta) using GFF3 annotation (.gff)
      - If the user specified a domain, the range will be restricted accordingly
    - Align non-reference target sequences (output of step 1.5) with reference sequences from steps 1.1 (or 1.1.3 if domain is specified) and 2.1
    - For all candidate gRNA, check their position in the alignment (based on where in each non-reference target they originate) and eliminate any gRNA that do not align within AT LEAST ONE reference gene's desired feature
      - Users may change the desired feature (default is CDS)
- Find minimum gRNA set that covers all target sequences
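The candidate-identification and minimum-set steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MINORg's actual implementation: the function names `find_candidates` and `greedy_min_set` are hypothetical, only the forward strand is scanned, the PAM is assumed to be NGG immediately 3' of the gRNA, off-target screening is skipped, and a greedy set-cover heuristic stands in for whatever method MINORg uses to find the minimum set.

```python
import re


def find_candidates(targets, length=20, pam_regex="[ACGT]GG"):
    """Map each candidate gRNA sequence to the set of target IDs containing it.

    Assumptions: forward strand only, PAM directly 3' of the gRNA.
    """
    # A lookahead lets overlapping candidates be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}}){pam_regex})")
    coverage = {}
    for target_id, seq in targets.items():
        for match in pattern.finditer(seq.upper()):
            coverage.setdefault(match.group(1), set()).add(target_id)
    return coverage


def greedy_min_set(coverage, target_ids):
    """Greedy set cover: repeatedly pick the gRNA covering the most uncovered targets."""
    uncovered = set(target_ids)
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(coverage),
                   key=lambda g: len(coverage[g] & uncovered),
                   default=None)
        if best is None or not coverage[best] & uncovered:
            raise ValueError(f"no candidate gRNA covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Toy target sequences with a shortened gRNA length of 4 for readability.
targets = {"a": "ACGTAGGTT", "b": "CCACGTTGGA", "c": "TTTTCGG"}
coverage = find_candidates(targets, length=4)
gRNA_set = greedy_min_set(coverage, targets)
print(gRNA_set)
```

A greedy cover is not guaranteed to be the true minimum, but it is simple and usually close; it is used here only to make the idea of "a set of gRNA that covers all targets" concrete.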
Inputs
- Step 1
  - Data:
    - Reference genome (--ref xxx.fasta)
    - Reference GFF3 annotation (--gff xxx.gff)
    - Non-reference sequences/genome (--nonref xxx.fasta)
  - Parameters:
    - Gene IDs (--gene)
      - Used with:
        - Accession/individual (-i) OR
        - Query fasta file (-q xxx.fasta)
    - Target sequences (--target xxx.fasta)
  - Optional parameters:
    - Minimum hit % identity (--minid 85 (%))
    - Minimum candidate target length (--minlen 0 (bp))
    - Maximum merge buffer (--buffer 100 (bp))
  - Optional for domain restriction:
    - PSSM-ID (--domain) and rpsblast+ database (--db)
- Step 2
  - Parameters:
    - PAM (--pam SpCas9)
    - gRNA length (--length 20 (bp))
- Step 3
  - Optional parameters:
    - Minimum off-target gaps (--ot-gap 0)
    - Minimum off-target mismatch (--ot-mismatch 1 (bp))
  - Optional data:
    - Background sequences (--background xxx.fasta)
- Step 4
  - Optional parameters:
    - Number of sets to output (--set 1)
    - Manually approve each gRNA set (--manual)
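To show how the inputs above might fit together, here is a small Python sketch that assembles an argument list from flags named in the list. Several things are assumptions rather than documented behaviour: the executable name `minorg`, the helper `build_minorg_command`, the placeholder file names and gene ID, and the particular combination of flags. Consult the tutorial and documentation in the 'Links' section for the actual invocation.

```python
import shlex


def build_minorg_command(ref, gff, gene, query, pam="SpCas9", length=20,
                         minid=85, n_sets=1, executable="minorg"):
    """Return an argument list built only from flags named in the Inputs list."""
    return [
        executable,           # assumption: the command name is not given in this section
        "--ref", ref,         # reference genome (.fasta)
        "--gff", gff,         # reference GFF3 annotation
        "--gene", gene,       # gene ID, used here together with a query file
        "-q", query,          # query fasta file
        "--minid", str(minid),
        "--pam", pam,
        "--length", str(length),
        "--set", str(n_sets),
    ]


# Placeholder inputs; replace with real paths and a real gene ID.
cmd = build_minorg_command("ref.fasta", "ref.gff", "GENE_ID", "query.fasta")
print(shlex.join(cmd))
```

Building the command as a list keeps each flag next to its value and avoids shell-quoting mistakes if the list is later passed to `subprocess.run`.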