Python library to design sgRNA oligos
sgrna_designer
Python library to design sgRNAs for CRISPR tiling screens
The primary function of this package is design_sgrna_tiling_library, which takes a list of
Ensembl transcript IDs and a region of interest (e.g. three_prime_UTR) and returns all sgRNAs
tiling those transcript regions.
Install
pip install git+https://github.com/gpp-rnd/sgrna_designer.git#egg=sgrna_designer
An example
In this example we'll design sgRNAs tiling the 3' UTRs of PDL1 (CD274) and BRAF.
Note: pandas must also be installed to run this tutorial.
from sgrna_designer.design import design_sgrna_tiling_library
target_transcripts = ['ENST00000381577', 'ENST00000644969'] # [PDL1, BRAF]
Note that the design function is agnostic to CRISPR enzyme and PAM preferences, so you must specify the following parameters in a design run:
- region_parent: broad region you are trying to target (e.g. UTR)
- region: more specific region you are trying to target (e.g. three_prime_UTR)
- expand_3prime: amount to expand region in 3' direction
- expand_5prime: amount to expand region in 5' direction
- context_len: length of context sequence
- pam_start: position of PAM start relative to the context sequence
- pam_len: length of PAM
- sgrna_start: position of sgRNA relative to context sequence
- sgrna_len: length of sgRNA sequence
- pams: PAMs to target
- sg_positions: positions within the sgRNA to annotate and target (e.g. [4,8] for nucleotides 4 and 8 of the sgRNA for a base editing window)
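To make the coordinate parameters concrete, here is a minimal sketch of how a context window might be sliced into a PAM and an sgRNA. It is not the library's implementation: `tile_forward_strand` is a hypothetical helper, it scans only the forward strand of a toy sequence, and it assumes `pam_start` and `sgrna_start` are 0-based offsets into the context sequence, with negative values counted from the end as in Python slicing. That assumption is consistent with the first two rows of the tutorial output, from which the toy sequence is taken.

```python
def tile_forward_strand(sequence, pams, context_len=30, pam_start=-6,
                        pam_len=3, sgrna_start=4, sgrna_len=20):
    """Hypothetical helper: list every forward-strand window whose PAM is allowed."""
    designs = []
    # Normalise a negative PAM offset so it counts from the end of the window
    pam_begin = pam_start % context_len
    for i in range(len(sequence) - context_len + 1):
        context = sequence[i:i + context_len]
        pam = context[pam_begin:pam_begin + pam_len]
        if pam in pams:
            designs.append({
                'context_sequence': context,
                'pam_sequence': pam,
                'sgrna_sequence': context[sgrna_start:sgrna_start + sgrna_len],
            })
    return designs


# Toy 31-nt sequence spanning the first two context sequences of the tutorial output
toy_region = 'CATTGGAACTTCTGATCTTCAAGCAGGGATT'
designs = tile_forward_strand(toy_region, pams=['AGG', 'CGG', 'TGG', 'GGG'])
for design in designs:
    print(design['pam_sequence'], design['sgrna_sequence'])
# AGG GGAACTTCTGATCTTCAAGC
# GGG GAACTTCTGATCTTCAAGCA
```

A real design run also has to consider the reverse strand and the genomic coordinates of each guide, which this sketch leaves out.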
sgrna_designs = design_sgrna_tiling_library(target_transcripts, region_parent='UTR',
                                            region='three_prime_UTR', expand_3prime=30,
                                            expand_5prime=30, context_len=30, pam_start=-6,
                                            pam_len=3, sgrna_start=4, sgrna_len=20,
                                            pams=['AGG', 'CGG', 'TGG', 'GGG'],
                                            sg_positions=[4, 8], flag_seqs=['TTTT', 'CGTCTC', 'GAGACG'],
                                            flag_seqs_start=['TCTC', 'AGACG'], flag_seqs_end=['GAGAC'])
sgrna_designs
| | context_sequence | pam_sequence | sgrna_sequence | sgrna_global_start | sgrna_global_4 | sgrna_global_8 | sgrna_strand | object_type | transcript_strand | transcript_id | chromosome | region_id | region_start | region_end |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CATTGGAACTTCTGATCTTCAAGCAGGGAT | AGG | GGAACTTCTGATCTTCAAGC | 5467872 | 5467875 | 5467879 | 1 | three_prime_UTR | 1 | ENST00000381577 | 9 | ENST00000381577 | 5467863 | 5470554 |
1 | ATTGGAACTTCTGATCTTCAAGCAGGGATT | GGG | GAACTTCTGATCTTCAAGCA | 5467873 | 5467876 | 5467880 | 1 | three_prime_UTR | 1 | ENST00000381577 | 9 | ENST00000381577 | 5467863 | 5470554 |
2 | CTTCAAGCAGGGATTCTCAACCTGTGGTTT | TGG | AAGCAGGGATTCTCAACCTG | 5467888 | 5467891 | 5467895 | 1 | three_prime_UTR | 1 | ENST00000381577 | 9 | ENST00000381577 | 5467863 | 5470554 |
3 | GCAGGGATTCTCAACCTGTGGTTTAGGGGT | AGG | GGATTCTCAACCTGTGGTTT | 5467894 | 5467897 | 5467901 | 1 | three_prime_UTR | 1 | ENST00000381577 | 9 | ENST00000381577 | 5467863 | 5470554 |
4 | CAGGGATTCTCAACCTGTGGTTTAGGGGTT | GGG | GATTCTCAACCTGTGGTTTA | 5467895 | 5467898 | 5467902 | 1 | three_prime_UTR | 1 | ENST00000381577 | 9 | ENST00000381577 | 5467863 | 5470554 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
845 | GCTCAGGTCCCTTCATTTGTACTTTGGAGT | TGG | AGGTCCCTTCATTTGTACTT | 140719570 | 140719567 | 140719563 | -1 | three_prime_UTR | -1 | ENST00000644969 | 7 | ENST00000644969 | 140719337 | 140726493 |
846 | TATAACAGAAAATATTGTTCAGTTTGGATA | TGG | ACAGAAAATATTGTTCAGTT | 140719522 | 140719519 | 140719515 | -1 | three_prime_UTR | -1 | ENST00000644969 | 7 | ENST00000644969 | 140719337 | 140726493 |
847 | ATTGTTCAGTTTGGATAGAAAGCATGGAGA | TGG | TTCAGTTTGGATAGAAAGCA | 140719509 | 140719506 | 140719502 | -1 | three_prime_UTR | -1 | ENST00000644969 | 7 | ENST00000644969 | 140719337 | 140726493 |
848 | TATTTAAAAACTGTATTATATAAAAGGCAA | AGG | TAAAAACTGTATTATATAAA | 140719426 | 140719423 | 140719419 | -1 | three_prime_UTR | -1 | ENST00000644969 | 7 | ENST00000644969 | 140719337 | 140726493 |
849 | CTGCTATAATAAAGATTGACTGCATGGAGA | TGG | TATAATAAAGATTGACTGCA | 140719360 | 140719357 | 140719353 | -1 | three_prime_UTR | -1 | ENST00000644969 | 7 | ENST00000644969 | 140719337 | 140726493 |
850 rows × 14 columns
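The sgrna_global_4 and sgrna_global_8 columns can be related to sgrna_global_start with a little arithmetic. The sketch below is inferred from the rows shown above rather than taken from the library's source: it assumes sg_positions are 1-based positions within the sgRNA and that genomic coordinates decrease along the guide when sgrna_strand is -1. `sgrna_global_position` is a hypothetical helper name.

```python
def sgrna_global_position(sgrna_global_start, sgrna_strand, position):
    """Hypothetical helper: genomic coordinate of a 1-based position in the sgRNA."""
    # Step (position - 1) bases from the guide's first nucleotide, moving up
    # the chromosome on the + strand and down on the - strand.
    return sgrna_global_start + sgrna_strand * (position - 1)


# Row 0 (PDL1, + strand)
print(sgrna_global_position(5467872, 1, 4))     # 5467875
print(sgrna_global_position(5467872, 1, 8))     # 5467879
# Row 845 (BRAF, - strand)
print(sgrna_global_position(140719570, -1, 4))  # 140719567
print(sgrna_global_position(140719570, -1, 8))  # 140719563
```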
Hashes for sgrna_designer-0.0.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | b10eb39e2fb7d3a5c16d86baed81c76d428ba5c64fa6611a58c2107beecaf5db |
MD5 | 21e6e243ba2a6a920a15f30fd6e234bb |
BLAKE2b-256 | 205b81f7a607fe5bef721aab0b6d937a49f8cd85c4c6dc8548bab0fc90058758 |