PolyGEN engine for designing multiplexed RNA arrays and Golden Gate oligos.
PolyGEN
Design automation of polycistronic tRNA-based genes containing custom RNAs for assembly in Type IIS restriction enzyme-driven Golden Gate experiments. The backbone of PolyGEN is based on iBioCAD by HamediRad et al. (2019), whose code is openly available.
The code takes an array of custom RNAs as input and computes the finished polycistronic tRNA-gRNA gene (PTG), together with the oligonucleotides needed to produce all parts from a plasmid containing a gRNA-tRNA template. Currently, the produced PTGs can include sgRNAs, pegRNAs, crRNAs, tigRNAs and other small RNAs. By default, PolyGEN uses the following parameters, all of which can be changed:
- primer melting temperature between 55 and 65 °C if possible
- digestion by BsaI
- 'tgcc' and 'gttt' as restriction overlaps with the plasmid
- no additional restriction sites in the plasmid
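The BsaI defaults can be illustrated with a short Python sketch. BsaI recognizes GGTCTC and cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt 5′ overhang such as tgcc or gttt. The helper names (`count_bsai_sites`, `bsai_tailed_primer`) are hypothetical and not part of the polygen-engine API, and the single padding base between site and overhang is an assumption.

```python
# Hypothetical helpers (not the polygen-engine API) illustrating the BsaI defaults.
BSAI_SITE = "GGTCTC"  # BsaI recognition site; cuts GGTCTC(N1)/(N5)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_bsai_sites(seq: str) -> int:
    """Count BsaI sites on both strands (GGTCTC on top, GAGACC on the bottom)."""
    seq = seq.upper()
    site_rc = revcomp(BSAI_SITE)
    k = len(BSAI_SITE)
    return sum(
        1
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] in (BSAI_SITE, site_rc)
    )


def bsai_tailed_primer(overhang: str, annealing: str, pad: str = "A") -> str:
    """5' tail = BsaI site + 1 nt + 4-nt overhang, followed by the annealing region.

    After digestion the overhang remains as a 4-nt 5' single-stranded end.
    The padding base 'A' is an arbitrary assumption.
    """
    if len(overhang) != 4 or len(pad) != 1:
        raise ValueError("BsaI needs a 1-nt spacer and a 4-nt overhang")
    return BSAI_SITE + pad.upper() + overhang.upper() + annealing.upper()
```

A tailed primer should contain exactly one site; any further hit inside an insert would conflict with the "no additional restriction sites" default.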
To calculate the primer melting temperatures, PolyGEN uses the same method and parameters as Benchling: SantaLucia (1998).
Prime editing PBS design
For pegRNA design, PolyGEN tests PBS lengths from 8 to 17 nt and chooses the PBS with melting temperature closest to 30 °C. The GenBank output annotates the selected PBS length instead of assuming a fixed 13 nt PBS.
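The PBS search can be sketched as a scan over lengths 8 to 17 that keeps the candidate whose Tm is closest to 30 °C. This sketch assumes a 20-nt SpCas9 protospacer (PAM excluded) whose nick lies 3 nt upstream of the PAM, so the longest possible PBS is 17 nt. `choose_pbs` is a hypothetical name, and any Tm function (for example a nearest-neighbor implementation) can be passed in.

```python
from typing import Callable, Iterable, Tuple

_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def choose_pbs(protospacer: str,
               tm: Callable[[str], float],
               target_tm: float = 30.0,
               lengths: Iterable[int] = range(8, 18)) -> Tuple[str, int, float]:
    """Return (PBS sequence, PBS length, PBS Tm) with Tm closest to target_tm.

    Assumes a 5'->3' SpCas9 protospacer without PAM; the nick sits 3 nt
    upstream of the PAM and the PBS is the reverse complement of the
    sequence immediately 5' of the nick.
    """
    nick = len(protospacer) - 3
    best = None
    for n in lengths:
        if n > nick:
            continue  # not enough protospacer sequence for this PBS length
        pbs = revcomp(protospacer[nick - n:nick])
        t = tm(pbs)
        # With ascending lengths, strict '<' keeps the shorter PBS on a tie.
        if best is None or abs(t - target_tm) < abs(best[2] - target_tm):
            best = (pbs, n, t)
    if best is None:
        raise ValueError("protospacer too short for the requested PBS lengths")
    return best
```

The selected length is what a GenBank export would then annotate, rather than a fixed 13 nt.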
TasA/TasH tigRNA design
PolyGEN includes a Tas guide design page for TIGR-Tas guide RNAs. It currently supports TasA and TasH mature tigRNA architectures.
- TasA targets are represented as two 9 bp programmable spacer regions.
- TasH multiplexing targets are represented as two 8 bp programmable spacer regions. The first 8 bp become spacer A and the reverse complement of the second 8 bp becomes spacer B.
- The mature tigRNA is assembled as edge repeat 5 prime, spacer A, loop repeat, spacer B and edge repeat 3 prime.
- The default TasA scaffold is AACCG, spacer A, AGTAACCCC, spacer B, AGTG. In an array, each unit's AGTG right edge is followed directly by the next unit's AACCG left edge.
- For adjacent TasA units, the AGTG-AACCG junction is treated as a shared edge boundary. Oligo design verifies that downstream fragment forward oligos carry the complete AGTGAACCG junction across the fragment boundary.
- GenBank exports annotate downstream TasA units with an inherited guide context spanning the upstream AGTG right edge plus the current AACCG-left-edge unit.
- Exact target windows are 18 bp for TasA and 16 bp for TasH.
- Exact target windows can be separated with new lines, spaces, commas, semicolons or vertical bars.
- Scaffold fields can be edited for locus-specific tigRNA scaffolds. DNA or RNA bases are accepted, but the final mature tigRNA must be 36 nt for TasA, and the multiplexing unit must be 46 nt for TasH.
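The rules above can be sketched as three small helpers: splitting pasted target windows on the accepted separators, assembling a default TasA tigRNA (5 + 9 + 9 + 9 + 4 = 36 nt), and splitting a 16-bp TasH window into its two spacers. The function names are hypothetical. TasA spacers are passed in directly because the text does not say how an 18-bp TasA window maps onto the two spacers.

```python
import re
from typing import List, Tuple

# Default TasA scaffold pieces (DNA letters), as listed above.
TASA_EDGE_5 = "AACCG"
TASA_LOOP = "AGTAACCCC"
TASA_EDGE_3 = "AGTG"

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def parse_targets(text: str) -> List[str]:
    """Split target windows on newlines, spaces, commas, semicolons or '|'."""
    return [t.upper() for t in re.split(r"[\s,;|]+", text) if t]


def tasa_tigrna(spacer_a: str, spacer_b: str) -> str:
    """Mature TasA tigRNA with the default scaffold (36 nt)."""
    if len(spacer_a) != 9 or len(spacer_b) != 9:
        raise ValueError("TasA spacers are 9 bp each")
    return TASA_EDGE_5 + spacer_a.upper() + TASA_LOOP + spacer_b.upper() + TASA_EDGE_3


def split_tash_target(target: str) -> Tuple[str, str]:
    """16-bp TasH window -> (spacer A = first 8 bp, spacer B = revcomp of last 8 bp)."""
    target = target.upper()
    if len(target) != 16:
        raise ValueError("TasH target windows are 16 bp")
    return target[:8], revcomp(target[8:])
```

Concatenating two `tasa_tigrna` units places AGTG directly before AACCG, which yields the AGTGAACCG junction that the oligo check looks for.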
Designed tigRNAs can be selected and sent directly to the PTG page using the tigRNA architecture. Only checked rows are transferred; if no rows are checked, the full generated set is transferred. In this architecture, sequences are provided as mature tigRNAs:
tigRNA;sequence0|tigRNA;sequence1|tigRNA;sequence2
The tigRNA architecture does not add tRNAs and does not add a Cpf1 direct repeat. It treats each input as a mature tigRNA unit, then designs oligo-extension fragments for scarless Golden Gate assembly of multiplex tigRNA arrays.
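Reading the tigRNA input format can be sketched as follows. This is a hypothetical parser, not PolyGEN's input handling: each `tigRNA;<sequence>` entry becomes one mature unit with nothing added around it, and U is converted to T for oligo design (an assumption).

```python
from typing import List, Optional


def parse_tigrna_input(text: str, expected_len: Optional[int] = None) -> List[str]:
    """Parse 'tigRNA;seq|tigRNA;seq|...' into mature tigRNA sequences (DNA letters).

    Each entry is kept as one mature unit: no tRNA and no Cpf1 direct repeat
    is added. RNA input (U) is converted to DNA (T).
    """
    units = []
    for i, entry in enumerate(text.strip().split("|")):
        kind, sep, seq = entry.strip().partition(";")
        if not sep or kind.strip() != "tigRNA":
            raise ValueError(f"entry {i}: expected 'tigRNA;<sequence>', got {entry!r}")
        seq = seq.strip().upper().replace("U", "T")
        if expected_len is not None and len(seq) != expected_len:
            raise ValueError(f"entry {i}: {len(seq)} nt, expected {expected_len}")
        units.append(seq)
    return units
```

Passing `expected_len=36` would enforce the TasA mature length at parse time.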
For tigRNA oligo design, PolyGEN chooses internal Golden Gate overhangs from spacer A or spacer B, so fixed edge and loop repeat regions are not used as variable assembly overhang sources. Spacer-A splits are accepted only when enough left-side sequence remains for the oligo-extension fragment. Each fragment is produced from a forward/reverse oligo pair with an overlapping region that can be filled in by polymerase. The reported oligo melting temperature for tigRNA fragments is calculated only from that shared overlap, not from the full oligo sequence.
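The overlap-only Tm can be illustrated by locating the region where the 3′ ends of a forward/reverse oligo pair anneal and passing only that region to a Tm function. This is a sketch under assumptions: both oligos are written 5′→3′, the longest exact overlap of at least 6 nt is taken, and the helper names are hypothetical.

```python
from typing import Callable, Optional, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def fill_in_overlap(fwd: str, rev: str, min_overlap: int = 6) -> Optional[Tuple[str, str]]:
    """Return (overlap, filled-in top strand), or None if the 3' ends do not pair.

    rev lies on the bottom strand, so its reverse complement is compared
    with the 3' end of fwd; the longest exact overlap wins.
    """
    min_overlap = max(1, min_overlap)
    top_from_rev = revcomp(rev)
    for k in range(min(len(fwd), len(top_from_rev)), min_overlap - 1, -1):
        if fwd[-k:] == top_from_rev[:k]:
            # Polymerase extends both 3' ends, giving fwd + rest of the top strand.
            return fwd[-k:], fwd + top_from_rev[k:]
    return None


def overlap_tm(fwd: str, rev: str, tm: Callable[[str], float]) -> float:
    """Tm of the shared overlap only, not of either full-length oligo."""
    hit = fill_in_overlap(fwd, rev)
    if hit is None:
        raise ValueError("oligos do not share an overlap")
    return tm(hit[0])
```

Scoring the overlap rather than the whole oligo matters because only the overlap has to stay paired before polymerase fill-in.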
TasA and TasH overhang selection is Tm-aware. PolyGEN scores valid spacer-A and spacer-B splits by the predicted fill-in overlap Tm of the final oligo pair and prefers combinations in which every tigRNA fragment reaches at least 45 °C. If no valid combination reaches that floor, the best available design is kept and a warning is written to the oligo page and to the CSV, raw JSON and GenBank outputs.
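A max-min rule is one simple way to express this preference. It is an assumed scoring function, since the exact score PolyGEN uses is not stated: each candidate split combination is rated by its weakest fragment, so any combination that clears 45 °C automatically beats every combination that does not. Candidate labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

TM_FLOOR = 45.0  # deg C, from the text above


def pick_split(candidates: Dict[str, List[float]],
               floor: float = TM_FLOOR) -> Tuple[str, Optional[str]]:
    """Pick the combination whose weakest fragment overlap Tm is highest.

    candidates maps a (hypothetical) combination label to the fill-in
    overlap Tm of each tigRNA fragment.
    """
    if not candidates:
        raise ValueError("no valid split combinations")
    best = max(candidates, key=lambda name: min(candidates[name]))
    weakest = min(candidates[best])
    if weakest >= floor:
        return best, None
    warning = (f"Lowest fragment overlap Tm is {weakest:.1f} °C, below the "
               f"{floor:.0f} °C floor; best available design kept.")
    return best, warning
```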
If no complete optimal overhang set can be found, PolyGEN enters rescue overhang mode. In rescue mode, it still chooses overhangs that appear in curated PolyGEN overhang tables, but it flags that the exact selected collection was not found as a validated optimal set. The warning is shown on the oligo page and written into the CSV, raw JSON and GenBank outputs.
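The distinction between an optimal set and rescue mode can be sketched as two lookups. The table contents and the third "failed" outcome are assumptions for illustration: the selected collection is first checked against validated sets, and only then is each overhang checked individually against the curated tables.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple


def classify_overhangs(selected: Iterable[str],
                       validated_sets: Set[FrozenSet[str]],
                       curated: Set[str]) -> Tuple[str, Optional[str]]:
    """Return ('optimal' | 'rescue' | 'failed', warning or None)."""
    chosen = frozenset(o.upper() for o in selected)
    if chosen in validated_sets:
        return "optimal", None
    missing = sorted(chosen - curated)
    if not missing:
        # Every overhang is individually curated, but not as this exact set.
        return "rescue", ("Rescue overhang mode: all overhangs appear in curated "
                          "tables, but this exact set is not a validated optimal set.")
    return "failed", "Overhangs not in curated tables: " + ", ".join(missing)
```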
The Reuse PTG border oligos option is applied only to PTG/Cas9 tRNA-processed assemblies. These reusable border oligos are specific to the selected restriction enzyme and 4 bp border linkers. CA and tigRNA assemblies do not use the tRNA architecture, so PolyGEN always designs new target-specific border oligos for those modes.
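The reuse rule can be pictured as a cache keyed by enzyme and the two 4 bp linkers that is consulted only for PTG assemblies. In this sketch the `design` callable and the "PTG" architecture label are hypothetical stand-ins; real CA and tigRNA border design also depends on the targets, which the stand-in does not model.

```python
from typing import Callable, Dict, Tuple

Oligos = Tuple[str, str]
# Hypothetical cache keyed by (enzyme, left 4-bp linker, right 4-bp linker).
_reusable_borders: Dict[Tuple[str, str, str], Oligos] = {}


def border_oligos(architecture: str, enzyme: str, left: str, right: str,
                  reuse: bool, design: Callable[[str, str, str], Oligos]) -> Oligos:
    """Return border oligos, reusing cached ones only for PTG/Cas9 assemblies."""
    if architecture != "PTG":
        # CA and tigRNA arrays lack the tRNA architecture: always design new borders.
        return design(enzyme, left, right)
    key = (enzyme, left.upper(), right.upper())
    if reuse and key in _reusable_borders:
        return _reusable_borders[key]
    oligos = design(enzyme, left, right)
    _reusable_borders[key] = oligos
    return oligos
```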
The GenBank output annotates each mature tigRNA with edge repeat 5 prime, spacer A, loop repeat, spacer B and edge repeat 3 prime features.
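With the default 36-nt TasA layout, the subfeature coordinates follow directly from the part lengths. The sketch below computes 1-based inclusive GenBank coordinates and formats one feature in the standard feature-table layout (key at column 6, location and qualifiers at column 22). The function names and the `misc_RNA` key choice are assumptions, not PolyGEN's exporter.

```python
from typing import Iterator, List, Tuple

# (feature label, length) for the default 36-nt TasA tigRNA described above.
TASA_PARTS: List[Tuple[str, int]] = [
    ("edge repeat 5 prime", 5),
    ("spacer A", 9),
    ("loop repeat", 9),
    ("spacer B", 9),
    ("edge repeat 3 prime", 4),
]


def tigrna_subfeatures(unit_start: int,
                       parts: List[Tuple[str, int]] = TASA_PARTS
                       ) -> Iterator[Tuple[str, int, int]]:
    """Yield (label, start, end) in 1-based, inclusive GenBank coordinates."""
    pos = unit_start
    for label, length in parts:
        yield label, pos, pos + length - 1
        pos += length


def feature_lines(key: str, start: int, end: int, label: str) -> str:
    """Format one feature: key at column 6, location and qualifier at column 22."""
    return f"     {key:<16}{start}..{end}\n" + " " * 21 + f'/label="{label}"'
```

For a second tigRNA starting at position 37, `tigrna_subfeatures(37)` shifts every subfeature by 36.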
GenBank annotations
PolyGEN annotates generated GenBank files with readable labels and notes for each array unit, spacer, scaffold, tRNA, direct repeat, tigRNA subfeature, pegRNA PBS/RT-template segment and primer-binding site. These labels preserve the link between each RNA part and the sequence it targets after the array has been assembled.
For TasH multiplexing units, GenBank exports also mark the processing cuts inside the identical edge repeats and annotate the predicted mature tigRNA retained after edge-repeat maturation.
For TasH arrays, adjacent units share the identical edge repeat at each junction. PolyGEN keeps the first full unit and removes the duplicated leading edge repeat from each subsequent TasH unit during array assembly, while annotating the shared repeat as part of both neighboring units.
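The shared-edge merge can be sketched as follows, with placeholder sequences because the TasH edge repeat is not listed here. The first unit is kept whole, each later unit drops its leading copy of the edge repeat, and the returned spans overlap by one edge length so the shared repeat can be annotated as part of both neighbors. Function names are hypothetical.

```python
from typing import List, Tuple


def assemble_shared_edge_array(units: List[str], edge: str) -> str:
    """Join units that start and end with the same edge repeat, sharing each junction."""
    for i, u in enumerate(units):
        if not (u.startswith(edge) and u.endswith(edge)):
            raise ValueError(f"unit {i} is not flanked by the edge repeat")
    if not units:
        return ""
    return units[0] + "".join(u[len(edge):] for u in units[1:])


def unit_spans(units: List[str], edge: str) -> List[Tuple[int, int]]:
    """0-based half-open span of each unit in the merged array.

    Neighboring spans overlap by len(edge): the shared repeat belongs to both.
    """
    spans, start = [], 0
    for u in units:
        spans.append((start, start + len(u)))
        start += len(u) - len(edge)
    return spans
```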
TasH oligo design uses the same shared-edge model: downstream fragment forward oligos include the shared edge-repeat overlap from the previous unit, and the fragment table notes when a leading edge repeat was shared rather than duplicated.
Setup Linux/Mac
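First, install Docker and Docker Compose (for example, on Debian/Ubuntu run sudo apt-get install docker.io docker-compose; on macOS, install Docker Desktop).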
In a terminal, run the following steps:
- Clone this repository:
  git clone https://git.hhu.de/urquizag/polygen
- Navigate into the cloned repository:
  cd polygen
- Start PolyGEN (requires the Docker daemon or Docker Desktop to be running):
  docker-compose up
In the browser, open localhost:5000
PyPI engine package
The reusable Python engine is published separately as polygen-engine:
pip install polygen-engine
For local development from this repository:
python3.9 -m venv .venv
. .venv/bin/activate
pip install -e .
polygen-engine --version
The engine package exposes the design API, including PTGbldr, scarless_gg,
tas_guide_design, TAS_SYSTEMS and the TasA/TasH tigRNA helpers. The Flask
web app remains in polygen_scripts and is deployed with Docker/Gunicorn.
Cloud Run / container deployment
The Docker image runs PolyGEN with Gunicorn and listens on the PORT
environment variable, which is required by Cloud Run and similar container
hosts. Locally, Docker maps container port 8080 to host port 5000:
docker build -t polygen:test .
docker run --rm -p 5000:8080 -e PORT=8080 polygen:test
For a resource-constrained Cloud Run demo deployment, run the command from the
repository root and check that the Dockerfile is listed before deploying. If
gcloud reports "Building using Buildpacks", it did not find the Dockerfile;
rerun the command from the directory that contains it.
ls Dockerfile
gcloud run deploy polygen-v1-demo --source . --region=europe-west3 --allow-unauthenticated --min-instances=0 --max-instances=1 --concurrency=10 --cpu=1 --memory=512Mi --timeout=120 --set-env-vars SECRET_KEY=<new_secret>
For a public test service, set a real SECRET_KEY environment variable in the
Cloud Run service settings. The app keeps generated results in per-browser
server-side session state, so run a single Gunicorn worker and keep Cloud Run
to one instance for the first demo if downloads must survive between requests.
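Gunicorn can read these settings from a Python config file. The file below is a hypothetical sketch, since the repository's actual Gunicorn invocation is not shown here. `bind`, `workers`, `threads` and `timeout` are standard Gunicorn settings, and the values simply mirror the deployment advice above.

```python
# gunicorn.conf.py: hypothetical sketch, not the repository's own configuration.
import os

# Cloud Run injects PORT; fall back to 8080 to match the local docker run example.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# One worker process, because results are kept in per-browser server-side
# session state that a second worker process may not be able to see.
workers = 1

# Threads let the single worker serve several requests at once (Gunicorn
# switches the sync worker to gthread when threads > 1); 10 mirrors --concurrency=10.
threads = 10

# Match the --timeout=120 passed to gcloud run deploy.
timeout = 120
```

Gunicorn would pick this up with `-c gunicorn.conf.py`, or automatically when the file sits in the working directory under that name.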
Setup Windows
Activate Windows Subsystem for Linux (WSL2) by
- open Control Panel
- open Turn Windows features on or off
- check features Virtual Machine Platform and Windows Subsystem for Linux
- confirm with OK
- restart the PC
- download the WSL2 Linux kernel update package (wsl_update_x64.msi) from Microsoft
- Execute the wsl_update_x64.msi file
- open a command prompt
- run wsl --set-default-version 2
In the BIOS, enable hardware virtualization (Intel VT-x or AMD-V). Next, install Docker Desktop and clone this git repository. With Docker Desktop running, open a command prompt, navigate into the cloned repository using cd <location> and execute docker-compose up.
In the browser, open localhost:5000
Common problems
If the installation fails because of problems with the Docker snap package, run:
- sudo rm -rf /etc/docker
- sudo snap refresh
File details
Details for the file polygen_engine-0.2.0.tar.gz.
File metadata
- Download URL: polygen_engine-0.2.0.tar.gz
- Upload date:
- Size: 42.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 613a8841cf47f221d805177987a0ad3661efc5c6e87a443d3316b04548332c89 |
| MD5 | 2ac0e731cf4e566b1224a757216ca795 |
| BLAKE2b-256 | 067d6a2da253cd74cc0bacf3f7bc8f213ff525f17d42175b4b7a5f100006423a |
File details
Details for the file polygen_engine-0.2.0-py3-none-any.whl.
File metadata
- Download URL: polygen_engine-0.2.0-py3-none-any.whl
- Upload date:
- Size: 49.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47511668a595d20ae28c6605eda6d78770385e383bbb368d5f6f9573a2dc564e |
| MD5 | a90cdcc2052206c91567d89f2a097849 |
| BLAKE2b-256 | c72050201cf67debb3b318674ac06627be4893136db338823e640c8a462fa1ac |