Prime editor gRNA design tool

Easy-Prime: an optimized prime editor gRNA design tool based on gradient boosting trees

Easy-Prime provides optimized pegRNA and ngRNA combinations for efficient prime editing (PE) design.

Summary

PE design involves carefully choosing a standard sgRNA, an RT template (RTT) that contains the desired edits, a primer binding site (PBS) that primes the RT reaction, and an ngRNA that nicks the non-edited strand. Usually thousands of combinations are available for a single desired edit, so selecting the candidate most likely to be highly efficient from this large pool is overwhelming.
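To illustrate how quickly the search space grows, the sketch below enumerates every pairing of candidate sgRNAs, PBS lengths, RTT lengths, and ngRNAs using the default length ranges listed under "Searching parameters for PE design" (PBS 8 to 17 nt, RTT 10 to 25 nt). The counts of 3 sgRNAs and 8 ngRNAs are hypothetical, and the sketch skips the filtering a real design tool applies (for example, the RTT must cover the edit), so it only approximates the size of the space Easy-Prime has to score.

```python
from itertools import product

# Default length ranges from the Easy-Prime config (inclusive bounds).
MIN_PBS_LENGTH, MAX_PBS_LENGTH = 8, 17
MIN_RTT_LENGTH, MAX_RTT_LENGTH = 10, 25


def enumerate_designs(sgRNAs, ngRNAs):
    """Yield every (sgRNA, PBS length, RTT length, ngRNA) combination.

    No biological filtering is applied; this only shows how fast the
    combinations multiply.
    """
    pbs_lengths = range(MIN_PBS_LENGTH, MAX_PBS_LENGTH + 1)  # 10 lengths
    rtt_lengths = range(MIN_RTT_LENGTH, MAX_RTT_LENGTH + 1)  # 16 lengths
    yield from product(sgRNAs, pbs_lengths, rtt_lengths, ngRNAs)


# Hypothetical candidate spacers around one edit site.
sgRNAs = [f"sgRNA_{i}" for i in range(3)]
ngRNAs = [f"ngRNA_{i}" for i in range(8)]

n_designs = sum(1 for _ in enumerate_designs(sgRNAs, ngRNAs))
print(n_designs)  # 3 * 10 * 16 * 8 = 3840
```

Even with only a handful of spacers per site, the product of the length ranges alone reaches the thousands.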

Easy-Prime applies a machine learning model (XGBoost, a gradient boosting tree method) that learns important PE design features from multiple published PE datasets to help researchers select the best candidate.

Installation

The easiest way to install Easy-Prime is via conda (version >=4.9).


conda create -n genome_editing -c cheng_lab easy_prime

source activate genome_editing

easy_prime -h

easy_prime_vis -h

See https://easy-prime.readthedocs.io/en/latest/content/Installation.html for step-by-step installation screenshots.

Usage


## Make sure you have installed Easy-Prime before running the commands below

git clone https://github.com/YichaoOU/easy_prime

cd easy_prime/test

easy_prime -h

easy_prime --version

## Please update genome_fasta in config.yaml to point to your genome FASTA file; otherwise an error may occur!

easy_prime -c config.yaml -f test.vcf

## Will output results to a folder

Easy-Prime also provides a Dash application.

Please install Dash before running the Dash application.


git clone https://github.com/YichaoOU/easy_prime

cd easy_prime/dash_app

python application.py

Easy-Prime on AWS

Please use this URL: http://easy-prime.cc/

Tutorial

Input

VCF input example

VCF headers are ignored. Only the first 5 columns of the VCF file are used: chr, pos, name/id, ref, alt.

## comment line, will be ignored
chr9	110184636	FIG5G_HEK293T_HEK3_6XHIS	G	GCACCATCATCACCATCAT
chr1	185056772	FIG5E_U2OS_RNF2_1CG	G	C
chr1	173878832	rs5878	T	C
chr11	22647331	FIG3C_FANCF_7AC_PE3B	T	G
chr19	10244324	EDFIG5B_DNMT1_dPAM	G	T
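A minimal sketch of the column handling described above, in plain Python: lines starting with # are skipped and only the first 5 tab-separated fields are kept. The Variant type and read_vcf_variants function are illustrative names, not part of Easy-Prime.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    name: str
    ref: str
    alt: str


def read_vcf_variants(lines: Iterable[str]) -> List[Variant]:
    """Parse VCF lines, keeping only chr, pos, name/id, ref, alt."""
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank, header, or comment line
        chrom, pos, name, ref, alt = line.split("\t")[:5]
        variants.append(Variant(chrom, int(pos), name, ref, alt))
    return variants


example = (
    "## comment line, will be ignored\n"
    "chr1\t185056772\tFIG5E_U2OS_RNF2_1CG\tG\tC\n"
    "chr1\t173878832\trs5878\tT\tC\n"
)
variants = read_vcf_variants(example.splitlines())
print(variants[1])
```

Any extra columns (QUAL, FILTER, INFO, and so on) are simply sliced away, matching the rule that only the first 5 columns matter.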

FASTA input example

To specify the reference and alternative alleles, provide two FASTA sequences: the keyword _ref in a sequence name marks the reference allele, and _alt marks the target mutation (for example, rs2251964_ref and rs2251964_alt).

>rs2251964_ref
GTTACCAAAGCAAATGACATCTTGTGAAAGGGGAGGTCTGAAAAAAAAAAACAAGTGGGTGGGTTTTTTCAAAGTAGGCCACCGGGCCTGAGATGACCAGAATTCAAATTAGGATGACAGTGTAGTAGGGGAAGCAACCAGAATCGGACCT
>rs2251964_alt
GTTACCAAAGCAAATGACATCTTGTGAAAGGGGAGGTCTGAAAAAAAAAAACAAGTGGGTGGGTTTTTTCAAAGTAGGCCACCGGGCCTGAGATAACCAGAATTCAAATTAGGATGACAGTGTAGTAGGGGAAGCAACCAGAATCGGACCT

The PrimeDesign format input is only supported in the Easy-Prime web server.

Parameters

Genome: only hg19 is supported for now.

Results

The web output contains two parts:

pegRNA table

In this result table, each predicted sgRNA/ngRNA/RTT/PBS configuration occupies 4 rows that share the same variant ID and predicted efficiency.

Sequence visualization

By default, the top prediction will be shown automatically.

Input

A VCF file containing at least 5 columns. See test/test.vcf for examples.

Searching parameters for PE design

Default values are shown in the following YAML configuration:

genome_fasta: /path/to/genome.fa
scaffold: GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC
debug: 0
n_jobs: 4
min_PBS_length: 8
max_PBS_length: 17
min_RTT_length: 10
max_RTT_length: 25
min_distance_RTT5: 3
max_ngRNA_distance: 100
max_target_to_sgRNA: 10
sgRNA_length: 20
offset: -3
PAM: NGG
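The sgRNA_length, PAM, and offset settings can be read as a protospacer search. The sketch below scans both strands for NGG PAMs, takes the 20-nt spacer immediately 5' of each PAM, and reports the nick site. It assumes that offset counts from the first PAM base (so -3 gives the SpCas9 cut 3 nt upstream of the PAM) and expands only N in the PAM; both are assumptions about how Easy-Prime uses these values, and find_spacers is a hypothetical name.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spacers(
    seq: str, sgRNA_length: int = 20, pam: str = "NGG", offset: int = -3
) -> List[Tuple[str, str, int]]:
    """Return (strand, spacer, nick) for every PAM hit on both strands.

    nick is a 0-based boundary on the forward strand: the cut falls
    between seq[nick - 1] and seq[nick].
    """
    pattern = "".join("[ACGT]" if base == "N" else base for base in pam)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # The lookahead reports overlapping PAMs (e.g. both hits in "AGGG").
        for match in re.finditer(f"(?=({pattern}))", s):
            pam_start = match.start()
            if pam_start < sgRNA_length:
                continue  # not enough sequence for a full-length spacer
            spacer = s[pam_start - sgRNA_length : pam_start]
            nick = pam_start + offset  # boundary in this strand's coordinates
            if strand == "-":
                nick = len(s) - nick  # map the boundary back to the + strand
            hits.append((strand, spacer, nick))
    return hits


target = "ACGTACGTACGTACGTACGT" + "AGG" + "TTTT"
print(find_spacers(target))  # [('+', 'ACGTACGTACGTACGTACGT', 17)]
```

Reporting every nick in forward-strand coordinates makes it straightforward to compare sgRNA and ngRNA positions, for example against max_ngRNA_distance.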

Output

The output folder contains:

topX_pegRNAs.csv
rawX_pegRNAs.csv.gz
X_p_pegRNAs.csv.gz
summary.csv

The top candidates are provided in topX_pegRNAs.csv. This is a rawX format file.

rawX format

X refers to the input of the machine learning model; rawX is the file before machine learning featurization. Specifically, rawX contains 11 + 1 columns. The first 5 columns come from the input VCF file: sample_ID, chr, pos, ref, alt, where sample_ID ends with _candidate_xxx and xxx is the index of the candidate combination. The next 6 columns describe one design component: type, seq, chr, start, end, strand, where type is sgRNA, PBS, RTT, or ngRNA. Because every PE design requires all 4 components, each unique sample_ID spans 4 rows, one per component. The 12th column, which is optional, is the predicted efficiency, i.e., the Y for machine learning.

Both topX_pegRNAs.csv and rawX_pegRNAs.csv.gz use this format.
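As a sketch of reading this format, the function below groups rawX rows by sample_ID, checks that each design has all 4 components, and ranks designs by the optional 12th column. It uses column positions rather than header names and assumes headerless, comma-separated rows; if your file starts with a header row, skip it first (for example with next(reader)). group_rawx is a hypothetical name and the toy rows are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

REQUIRED_TYPES = {"sgRNA", "PBS", "RTT", "ngRNA"}


def group_rawx(rows: Iterable[List[str]]) -> Dict[str, dict]:
    """Collapse 4-row rawX blocks into {sample_ID: {"parts": {...}, "score": ...}}.

    Column indices: 0 sample_ID, 1-4 chr/pos/ref/alt, 5 type, 6 seq,
    7-10 chr/start/end/strand, 11 optional predicted efficiency.
    """
    designs = defaultdict(lambda: {"parts": {}, "score": None})
    for row in rows:
        design = designs[row[0]]
        design["parts"][row[5]] = row[6]
        if len(row) > 11 and row[11]:
            design["score"] = float(row[11])
    for sample_id, design in designs.items():
        missing = REQUIRED_TYPES - set(design["parts"])
        if missing:
            raise ValueError(f"{sample_id} lacks {sorted(missing)}")
    return dict(designs)


toy_rawx = """\
var1_candidate_1,chr1,100,G,C,sgRNA,ACGTACGTACGTACGTACGT,chr1,80,99,+,0.42
var1_candidate_1,chr1,100,G,C,PBS,ACGTACGTAC,chr1,86,95,+,0.42
var1_candidate_1,chr1,100,G,C,RTT,GTACGTACGTACGT,chr1,96,109,+,0.42
var1_candidate_1,chr1,100,G,C,ngRNA,TTGCATGCATGCATGCATGC,chr1,150,169,-,0.42
var1_candidate_2,chr1,100,G,C,sgRNA,ACGTACGTACGTACGTACGT,chr1,80,99,+,0.57
var1_candidate_2,chr1,100,G,C,PBS,ACGTACGTACGTA,chr1,83,95,+,0.57
var1_candidate_2,chr1,100,G,C,RTT,GTACGTACGTACGT,chr1,96,109,+,0.57
var1_candidate_2,chr1,100,G,C,ngRNA,TTGCATGCATGCATGCATGC,chr1,150,169,-,0.57
"""
designs = group_rawx(csv.reader(io.StringIO(toy_rawx)))
ranked = sorted(designs, key=lambda sid: designs[sid]["score"], reverse=True)
print(ranked[0])  # var1_candidate_2
```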

X format

X format is the numeric representation of rawX. X_p format appends the predicted efficiency to the last column of X.
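To make the relationship between rawX, X, and X_p concrete, the sketch below turns one design's sequences into a numeric row and appends a predicted efficiency. The four features (PBS and RTT length and GC content) and the score value are hypothetical placeholders chosen for illustration; they are not Easy-Prime's actual feature set.

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def featurize(parts: Dict[str, str]) -> List[float]:
    """Hypothetical X row for one design (not Easy-Prime's real features)."""
    return [
        float(len(parts["PBS"])),
        float(len(parts["RTT"])),
        gc_fraction(parts["PBS"]),
        gc_fraction(parts["RTT"]),
    ]


def to_x_p(x_row: List[float], predicted_efficiency: float) -> List[float]:
    """X_p layout: the X row with the predicted efficiency as the last column."""
    return x_row + [predicted_efficiency]


design = {"PBS": "GGCCAATT", "RTT": "GGGAAAAAAAAA"}
x_row = featurize(design)
print(to_x_p(x_row, 0.61))  # [8.0, 12.0, 0.5, 0.25, 0.61]
```

Keeping the prediction as the final column means an X_p row can be split back into features and score with a simple slice (row[:-1], row[-1]).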

Main results

The main results, i.e., the top candidates, are provided in topX_pegRNAs.csv.

PE design visualization

Users can visualize the predicted combinations using:

easy_prime_vis -f topX_pegRNAs.csv -s /path/to/genome_fasta.fa

This writes PDF files to a results directory.

Release history

1.2 (this version): Jun 10, 2021
1.1.8: May 24, 2021
1.1.3: Jul 10, 2020
1.1.2: Jul 10, 2020
1.1.1: Jul 10, 2020
1.1: Jul 9, 2020

Download files

Source Distribution

easy_prime-1.2-2.tar.gz (1.1 MB), uploaded Jun 10, 2021

Built Distribution

easy_prime-1.2-2-py2.py3-none-any.whl (1.2 MB), uploaded Jun 10, 2021 (Python 2, Python 3)

File details

Details for the file easy_prime-1.2-2.tar.gz.

File metadata

Download URL: easy_prime-1.2-2.tar.gz
Upload date: Jun 10, 2021
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.0

File hashes

Hashes for easy_prime-1.2-2.tar.gz
Algorithm	Hash digest
SHA256	`f15c3076c5c9956eb1e4b379002a87f98bde85d2d20d110179d47115af31b244`
MD5	`1c79892bc7af3c294a830096c6c920d9`
BLAKE2b-256	`924ba6a4c079b2988c1bf2d1c617b6369e0ad080c6a207b535d1ecee815fa8b9`

File details

Details for the file easy_prime-1.2-2-py2.py3-none-any.whl.

File metadata

Download URL: easy_prime-1.2-2-py2.py3-none-any.whl
Upload date: Jun 10, 2021
Size: 1.2 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.0

File hashes

Hashes for easy_prime-1.2-2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`65670f0932a922225d9db74ebdbe4477a3800cf3e0805d67bafa2946587833ce`
MD5	`61526368a69af49726b21ee9def7ca71`
BLAKE2b-256	`7a14b5a7d76940eb31e5e5d89e197928c74e485d28cf7dd69bdcc715478eab11`
