Prime editor gRNA design tool

Easy-Prime: an optimized prime editor gRNA design tool based on gradient boosting trees

Easy-Prime provides optimized pegRNA and ngRNA combinations for efficient Prime editing design.

Summary

PE design involves carefully choosing a standard sgRNA, an RT template that contains the desired edits, a PBS that primes the RT reaction, and an ngRNA that nicks the non-edited strand. Usually thousands of combinations are available for a single desired edit, so selecting the candidate most likely to be highly efficient is overwhelming.

Easy-Prime applies a machine learning model (i.e., XGBoost) that learned important PE design features from public PE amplicon sequencing data to help researchers select the best candidate.

Installation

The easiest way to install Easy-Prime is via conda.


conda create -n genome_editing -c conda-forge -c bioconda -c anaconda -c liyc1989 easy_prime

source activate genome_editing

python -m pip install dna_features_viewer

easy_prime -h

Usage


git clone https://github.com/YichaoOU/easy_prime

cd easy_prime/test

easy_prime -h

easy_prime --version

easy_prime -c config.yaml -f test.vcf

## Will output results to a folder

Easy-Prime also provides a Dash application.


git clone https://github.com/YichaoOU/easy_prime

cd easy_prime/dash_app

python main.py

Input

A VCF file containing at least 5 columns. See test/test.vcf for examples.
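
As a rough illustration of the five-column requirement, the sketch below reads the first five tab-separated fields of each record. It assumes the standard VCF column order (CHROM, POS, ID, REF, ALT). The helper `read_variants` is hypothetical and not part of Easy-Prime, and the record shown is a made-up placeholder rather than a line from test/test.vcf.

```python
import io


def read_variants(handle):
    """Yield (chrom, pos, variant_id, ref, alt) from VCF-like text.

    Assumes the standard VCF column order; header lines starting
    with '#' and empty lines are skipped.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(
                f"expected at least 5 columns, got {len(fields)}: {line!r}"
            )
        chrom, pos, variant_id, ref, alt = fields[:5]
        yield chrom, int(pos), variant_id, ref, alt


# Placeholder record for illustration only (not from test/test.vcf).
example = io.StringIO("#CHROM\tPOS\tID\tREF\tALT\nchr1\t1000\tmy_variant\tA\tG\n")
variants = list(read_variants(example))
print(variants)
```

Checking a file this way before running easy_prime may help catch truncated or space-delimited records early; test/test.vcf remains the authoritative example of what the tool accepts.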

Searching parameters for PE design

Default values are shown in the following YAML file.

genome_fasta: /path/to/genome.fa
scaffold: GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC
debug: 0
n_jobs: 4
min_PBS_length: 8
max_PBS_length: 17
min_RTT_length: 10
max_RTT_length: 25
min_distance_RTT5: 3
max_ngRNA_distance: 100
max_target_to_sgRNA: 10
sgRNA_length: 20
offset: -3
PAM: NGG
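
To give a feel for how these length bounds shape the search space, the sketch below counts the possible (PBS length, RTT length) pairs for one sgRNA. It assumes both ranges are inclusive and that every integer length in a range is tried; whether Easy-Prime enumerates candidates exactly this way is not stated here, so treat the result as an upper-bound estimate. The function name `count_length_pairs` is hypothetical.

```python
# Default length bounds copied from the configuration above.
config = {
    "min_PBS_length": 8,
    "max_PBS_length": 17,
    "min_RTT_length": 10,
    "max_RTT_length": 25,
}


def count_length_pairs(cfg):
    """Count (PBS length, RTT length) pairs, assuming inclusive integer ranges."""
    n_pbs = cfg["max_PBS_length"] - cfg["min_PBS_length"] + 1
    n_rtt = cfg["max_RTT_length"] - cfg["min_RTT_length"] + 1
    return n_pbs * n_rtt


pairs_per_sgrna = count_length_pairs(config)
print(pairs_per_sgrna)  # 10 PBS lengths x 16 RTT lengths = 160
```

Multiplying this figure by the number of candidate sgRNAs and ngRNAs near the edit suggests how the search space can reach the thousands of combinations mentioned in the Summary.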

Output

The output folder contains:

  • topX_pegRNAs.csv
  • rawX_pegRNAs.csv.gz
  • X_p_pegRNAs.csv.gz
  • summary.csv

The top candidates are provided in topX_pegRNAs.csv. This is a rawX format file.

rawX format

X denotes the input to the machine learning model; rawX is the data before machine learning featurization. A rawX file contains 11 + 1 columns. The first 5 columns come from the input VCF file: sample_ID, chr, pos, ref, alt. Each sample_ID ends with _candidate_xxx, which indicates the N-th combination. The next 6 columns give the sequence and genomic coordinates of one component: type, seq, chr, start, end, strand, where type is one of sgRNA, PBS, RTT, or ngRNA. Because every PE design needs all 4 components, each unique sample_ID has 4 rows, one per component. The optional 12th column is the predicted efficiency, in other words the Y for machine learning.

Both topX_pegRNAs.csv and rawX_pegRNAs.csv.gz use this format.
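
The four-rows-per-design layout can be regrouped into one record per candidate. The following is a minimal sketch using only the standard library. The column names follow the description above, but the exact header written by Easy-Prime is an assumption (the second chr is renamed `component_chr` here to avoid a clash), and the rows are made-up placeholders, not real designs. A real file may start with a header line that would need to be skipped.

```python
import csv
import io
from collections import defaultdict

# Column names follow the rawX description; the actual header used by
# Easy-Prime is an assumption, and "component_chr" is a hypothetical name.
COLUMNS = [
    "sample_ID", "chr", "pos", "ref", "alt",
    "type", "seq", "component_chr", "start", "end", "strand",
    "predicted_efficiency",
]


def group_designs(handle):
    """Group rawX rows by sample_ID into {sample_ID: {type: row}}."""
    designs = defaultdict(dict)
    for values in csv.reader(handle):
        row = dict(zip(COLUMNS, values))
        designs[row["sample_ID"]][row["type"]] = row
    return dict(designs)


# Made-up placeholder rows (no header line); sequences are not real designs.
toy = io.StringIO(
    "v1_candidate_0,chr1,1000,A,G,sgRNA,ACGT,chr1,990,1010,+,0.5\n"
    "v1_candidate_0,chr1,1000,A,G,PBS,ACGT,chr1,980,990,+,0.5\n"
    "v1_candidate_0,chr1,1000,A,G,RTT,ACGT,chr1,1000,1012,+,0.5\n"
    "v1_candidate_0,chr1,1000,A,G,ngRNA,ACGT,chr1,1040,1060,-,0.5\n"
)
designs = group_designs(toy)
for sample_id, parts in designs.items():
    # A complete PE design needs all four component types.
    missing = {"sgRNA", "PBS", "RTT", "ngRNA"} - set(parts)
    print(sample_id, "complete" if not missing else f"missing {sorted(missing)}")
```

For rawX_pegRNAs.csv.gz, the same function should work on a handle opened with `gzip.open(path, "rt", newline="")`.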

X format

X format is the numeric representation of rawX. X_p format appends the predicted efficiency to the last column of X.

Main results

The main results, i.e. the top candidates, are provided in topX_pegRNAs.csv.

PE design visualization

Users can visualize the predicted combinations using:

easy_prime_vis -f topX_pegRNAs.csv -s /path/to/genome_fasta.fa

This writes PDF files to a result directory.
