Prime editor gRNA design tool
Easy-Prime: an optimized prime editor gRNA design tool based on gradient boosting trees
Easy-Prime provides optimized pegRNA and ngRNA combinations for efficient prime editing (PE) design.
Summary
PE design involves carefully choosing a standard sgRNA, an RT template that contains the desired edits, a PBS that primes the RT reaction, and an ngRNA that nicks the non-edited strand. Usually thousands of combinations are available for a single desired edit, so selecting the candidate most likely to be highly efficient from such a large pool is overwhelming.
Easy-Prime applies a machine learning model (i.e., XGBoost) that learned important PE design features from public PE amplicon sequencing data to help researchers select the best candidate.
Installation
The easiest way to install Easy-Prime is via conda.
conda create -n genome_editing -c liyc1989 easy_prime
source activate genome_editing
easy_prime -h
Usage
git clone https://github.com/YichaoOU/easy_prime
cd easy_prime/test
easy_prime -h
easy_prime --version
## Please update the genome_fasta in config.yaml
easy_prime -c config.yaml -f test.vcf
## Will output results to a folder
Easy-Prime also provides a dash application.
git clone https://github.com/YichaoOU/easy_prime
cd easy_prime/dash_app
python main.py
Input
A vcf file containing at least 5 columns. See test/test.vcf for examples.
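As an illustration of the input, the sketch below formats one edit as a tab-separated line with five leading columns. It assumes the standard VCF order (CHROM, POS, ID, REF, ALT) and a `#CHROM` header line; the helper name `format_vcf` and the record values are hypothetical, and test/test.vcf remains the authoritative example of what Easy-Prime accepts.

```python
def format_vcf(records):
    """Return VCF-like text with the five leading columns (assumed standard order)."""
    lines = ["#CHROM\tPOS\tID\tREF\tALT"]
    for chrom, pos, variant_id, ref, alt in records:
        lines.append(f"{chrom}\t{pos}\t{variant_id}\t{ref}\t{alt}")
    return "\n".join(lines) + "\n"

# Hypothetical edit: the coordinate, ID, and alleles are made up for illustration.
vcf_text = format_vcf([("chr1", 1000, "my_edit_1", "A", "G")])
print(vcf_text, end="")
```

Saving `vcf_text` to a file would give an input to pass with `-f`, provided its layout matches test/test.vcf.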
Searching parameters for PE design
Default values are shown in the following yaml file.
genome_fasta: /path/to/genome.fa
scaffold: GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC
debug: 0
n_jobs: 4
min_PBS_length: 8
max_PBS_length: 17
min_RTT_length: 10
max_RTT_length: 25
min_distance_RTT5: 3
max_ngRNA_distance: 100
max_target_to_sgRNA: 10
sgRNA_length: 20
offset: -3
PAM: NGG
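To give a feel for how these bounds shape the search space, the sketch below counts the PBS and RTT length pairs implied by the defaults. It assumes both bounds are inclusive and ignores every other filter (for example min_distance_RTT5), so it is an upper bound for one sgRNA and not a description of how Easy-Prime enumerates candidates. The helper name `length_pairs` is hypothetical.

```python
from itertools import product

# Defaults copied from the configuration listing above.
config = {
    "min_PBS_length": 8, "max_PBS_length": 17,
    "min_RTT_length": 10, "max_RTT_length": 25,
}

def length_pairs(cfg):
    """List every (PBS length, RTT length) pair, assuming inclusive bounds."""
    pbs = range(cfg["min_PBS_length"], cfg["max_PBS_length"] + 1)
    rtt = range(cfg["min_RTT_length"], cfg["max_RTT_length"] + 1)
    return list(product(pbs, rtt))

pairs = length_pairs(config)
print(len(pairs))  # 10 PBS lengths x 16 RTT lengths = 160
```

Multiplying this by the number of usable sgRNAs and ngRNAs near the edit shows how a single edit can reach thousands of combinations.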
Output
The output folder contains:
- topX_pegRNAs.csv
- rawX_pegRNAs.csv.gz
- X_p_pegRNAs.csv.gz
- summary.csv
The top candidates are provided in topX_pegRNAs.csv. This is a rawX format file.
rawX format
X denotes the input to the machine learning model, and rawX is the file before machine learning featurization. A rawX file contains 11 columns plus 1 optional column. The first 5 columns come from the input vcf file: sample_ID, chr, pos, ref, alt. Each sample_ID ends with _candidate_xxx, which identifies the N-th combination. The next 6 columns describe one component and its genomic coordinates: type, seq, chr, start, end, strand, where type is sgRNA, PBS, RTT, or ngRNA. Because every PE design needs all 4 components, each unique sample_ID has 4 rows, one per component sequence. The optional 12th column is the predicted efficiency; in other words, the Y for machine learning.
Both topX_pegRNAs.csv and rawX_pegRNAs.csv.gz use this format.
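A minimal sketch of reading this layout groups the four component rows under their sample_ID. It assumes comma-separated rows without a header line, in the column order described above; the name `comp_chr` for the second chr column, the helper `group_components`, and all sequences and coordinates in the sample are hypothetical placeholders, not real Easy-Prime output.

```python
import csv
import io
from collections import defaultdict

# Assumed column order; "comp_chr" is a made-up name for the second chr column.
COLUMNS = ["sample_ID", "chr", "pos", "ref", "alt",
           "type", "seq", "comp_chr", "start", "end", "strand"]

# Made-up rows for one candidate (placeholder sequences and coordinates).
raw = """\
edit1_candidate_0,chr1,1000,A,G,sgRNA,ACGTACGTACGTACGTACGT,chr1,980,1000,+
edit1_candidate_0,chr1,1000,A,G,PBS,ACGTACGTACGTA,chr1,984,997,+
edit1_candidate_0,chr1,1000,A,G,RTT,ACGTACGTACGT,chr1,997,1009,+
edit1_candidate_0,chr1,1000,A,G,ngRNA,TGCATGCATGCATGCATGCA,chr1,1040,1060,-
"""

def group_components(handle):
    """Map each sample_ID to a {type: seq} dict built from its rows."""
    designs = defaultdict(dict)
    for row in csv.reader(handle):
        record = dict(zip(COLUMNS, row))  # an optional 12th column is ignored
        designs[record["sample_ID"]][record["type"]] = record["seq"]
    return dict(designs)

designs = group_components(io.StringIO(raw))
print(sorted(designs["edit1_candidate_0"]))
```

For the gzipped rawX file, the same function should work on a handle from `gzip.open(path, "rt")`; if the real files carry a header row, it would need to be skipped first.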
X format
X format is the numeric representation of rawX. X_p format appends the predicted efficiency to the last column of X.
Main results
The main results, that is, the top candidates, are provided in topX_pegRNAs.csv.
PE design visualization
Users can visualize the predicted combinations using:
easy_prime_vis -f topX_pegRNAs.csv -s /path/to/genome_fasta.fa
This will output PDF files to a result directory.