Oligopool Calculator - Automated design and analysis of oligopool libraries
Version: 2024.11.03
Oligopool Calculator is a suite of algorithms for automated design and analysis of oligopool libraries.
It enables the scalable design of universal primer sets and error-correctable barcodes, the splitting of long constructs into multiple oligos, and the rapid packing and counting of barcoded reads -- all on a regular 8-core desktop computer.
We have used Oligopool Calculator in multiple projects to build libraries of tens of thousands of promoters, ribozymes, and mRNA stability elements, illustrating the use of a flexible grammar to add multiple barcodes and cut sites, avoid excluded sequences, and optimize experimental constraints. These libraries were later characterized using highly efficient barcode counting provided by Oligopool Calculator.
Oligopool Calculator facilitates the creative design and application of massively parallel reporter assays by automating and simplifying the whole process. It has been benchmarked on simulated libraries containing millions of defined variants and on the analysis of billions of reads.
Design and analysis of oligopool variants using Oligopool Calculator. (a) In Design Mode, Oligopool Calculator can be used to generate optimized barcodes, primers, spacers, motifs and split longer oligos into shorter padded fragments for downstream synthesis and assembly. (b) Once the library is assembled and cloned, barcoded amplicon sequencing data can be processed via Analysis Mode for characterization. Analysis Mode proceeds by first indexing one or more sets of barcodes, packing the reads, and then producing count matrices either using acount (association counting) or xcount (combinatorial counting).
Installation
Oligopool Calculator is a Python 3.10+ exclusive library.
On Linux, MacOS and Windows Subsystem for Linux you can install Oligopool Calculator from PyPI, where it is published as the oligopool package
$ pip install oligopool
or install it directly from GitHub.
$ pip install git+https://github.com/ayaanhossain/oligopool.git
Both approaches should install all dependencies automatically.
Note: The GitHub version will always be updated with all recent fixes. The PyPI version should be more stable.
If you are on Windows or simply prefer to, Oligopool Calculator can also be used via docker (see our notes).
Verifying Installation
Successful installation will look like this.
$ python
Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import oligopool as op
>>> op.__version__
'2024.10.24'
>>>
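The same check can be scripted, for instance at the top of a pipeline. The following is a minimal sketch using only the standard library. The helper name check_environment is hypothetical and not part of Oligopool Calculator; it relies only on the two facts stated above, namely that Python 3.10+ is required and that the package is importable as oligopool.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the Installation section.
MIN_PYTHON = (3, 10)


def check_environment(package="oligopool", min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of the library.
    """
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            "Python %d.%d+ required, found %d.%d"
            % (tuple(min_python) + tuple(sys.version_info[:2]))
        )
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        problems.append("package %r is not installed" % package)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

An empty list means both conditions hold; otherwise each entry names what is missing, which is usually enough to decide between upgrading Python and re-running pip.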
Getting Started
Oligopool Calculator is carefully designed, easy to use, and stupid fast.
You can import the library and use its various functions either in a script or interactively inside a jupyter environment. Use help(...) to read the docs as necessary and follow along.
There are examples of a design parser and an analysis pipeline inside the examples directory.
A notebook demonstrating Oligopool Calculator in action is provided there as well.
$ python
Python 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import oligopool as op
>>> help(op)
...
oligopool v2024.10.24
by ah
Automated design and analysis of oligopool libraries.
The various modules in Oligopool Calculator can be used
interactively in a jupyter notebook, or be used to define
scripts for design and analysis pipelines on the cloud.
Oligopool Calculator offers two modes of operation
- Design Mode for designing oligopool libraries, and
- Analysis Mode for analyzing oligopool datasets.
Design Mode workflow
1. Initialize a pandas DataFrame with core library elements.
a. The DataFrame must contain a unique 'ID' column serving as primary key.
b. All other columns in the DataFrame must be DNA sequences.
2. Define any optional background sequences via the background module.
3. Add necessary oligopool elements with constraints via element modules.
4. Optionally, split long oligos and pad them via assembly modules.
5. Perform additional maneuvers and finalize library via auxiliary modules.
Background module available
- background
Element modules available
- primer
- barcode
- motif
- spacer
Assembly modules available
- split
- pad
Auxiliary modules available
- merge
- revcomp
- lenstat
- final
Design Mode example sketch
>>> import pandas as pd
>>> import oligopool as op
>>>
>>> # Read initial library
>>> init_df = pd.read_csv('initial_library.csv')
>>>
>>> # Add oligo elements one by one
>>> primer_df, stats = op.primer(input_data=init_df, ...)
>>> barcode_df, stats = op.barcode(input_data=primer_df, ...)
...
>>> # Check length statistics as needed
>>> length_stats = op.lenstat(input_data=further_along_df)
...
>>>
>>> # Split and pad longer oligos if needed
>>> split_df, stats = op.split(input_data=even_further_along_df, ...)
>>> first_pad_df, stats = op.pad(input_data=split_df, ...)
>>> second_pad_df, stats = op.pad(input_data=split_df, ...)
...
>>>
>>> # Finalize the library
>>> final_df, stats = op.final(input_data=ready_to_go_df, ...)
...
Analysis Mode workflow
1. Index one or more CSVs containing the barcode information.
2. Pack all NGS FastQ files, optionally merging them if required.
3. Use acount for association counting of variants and barcodes.
4. If multiple barcode combinations are to be counted use xcount.
5. Combine count DataFrames and perform stats and ML as necessary.
Indexing module available
- index
Packing module available
- pack
Counting modules available
- acount
- xcount
Analysis Mode example sketch
>>> import pandas as pd
>>> import oligopool as op
>>>
>>> # Read annotated library
>>> bc1_df = pd.read_csv('barcode_1.csv')
>>> bc2_df = pd.read_csv('barcode_2.csv')
>>> av1_df = pd.read_csv('associate_1.csv')
...
>>>
>>> # Index barcodes and any associates
>>> bc1_index_stats = op.index(barcode_data=bc1_df, barcode_column='BC1', ...)
>>> bc2_index_stats = op.index(barcode_data=bc2_df, barcode_column='BC2', ...)
...
>>>
>>> # Pack experiment FastQ files
>>> sam1_pack_stats = op.pack(r1_file='sample_1_R1.fq.gz', ...)
>>> sam2_pack_stats = op.pack(r1_file='sample_2_R1.fq.gz', ...)
...
>>>
>>> # Compute and write barcode combination count matrix
>>> xcount_df, stats = op.xcount(index_files=['bc1_index', 'bc2_index'],
... pack_file='sample_1_pack', ...)
...
You can learn more about each module using help.
>>> import oligopool as op
>>> help(op)
>>> help(op.primer)
>>> help(op.barcode)
...
>>> help(op.xcount)
For advanced uses, the following classes are also available.
- vectorDB
- Scry
...
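Step 1 of the Design Mode workflow above places two requirements on the initial DataFrame: a unique 'ID' column, and DNA sequences in every other column. A quick pre-flight check before calling any element module can be sketched as follows. This is only an illustration using the standard library. validate_library_rows is a hypothetical helper, and restricting sequences to the letters A, T, G and C is an assumption made here; the library's own validation may accept more or enforce additional rules.

```python
import csv
import io

DNA_ALPHABET = set("ATGC")  # assumption: plain, unambiguous DNA only


def validate_library_rows(rows):
    """Check rows (dicts keyed by column name) against the two stated rules.

    Returns a list of problem descriptions; an empty list means the rows pass.
    Hypothetical helper, not part of Oligopool Calculator.
    """
    problems = []
    seen_ids = set()
    for row_no, row in enumerate(rows, start=1):
        if "ID" not in row:
            problems.append("row %d: missing 'ID' column" % row_no)
            continue
        if row["ID"] in seen_ids:
            problems.append("row %d: duplicate ID %r" % (row_no, row["ID"]))
        seen_ids.add(row["ID"])
        for column, value in row.items():
            if column == "ID":
                continue
            # Empty, non-string, or non-ATGC values all fail the DNA rule.
            if (not isinstance(value, str) or not value
                    or set(value.upper()) - DNA_ALPHABET):
                problems.append(
                    "row %d: column %r is not a DNA sequence" % (row_no, column)
                )
    return problems


# Stand-in for 'initial_library.csv'; the contents are made up for the demo.
demo_csv = io.StringIO(
    "ID,Promoter\n"
    "v1,ATGCATGC\n"
    "v2,GGCCTTAA\n"
    "v2,ATGNNNGC\n"
)
report = validate_library_rows(csv.DictReader(demo_csv))
print(report)
```

If the library is already loaded with pandas, the same rows can be obtained with init_df.to_dict('records') and passed to the helper unchanged.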
License
Oligopool Calculator (c) 2024 Ayaan Hossain.
Oligopool Calculator is open-source software under the GPL-3.0 License.
See LICENSE file for more details.