Oligopool Calculator - Automated design and analysis of oligopool libraries

Oligopool Calculator

Version: 2024.11.03

Oligopool Calculator is a suite of algorithms for automated design and analysis of oligopool libraries.

It enables the scalable design of universal primer sets, error-correctable barcodes, the splitting of long constructs into multiple oligos, and the rapid packing and counting of barcoded reads -- all on a regular 8-core desktop computer.
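To make "error-correctable" concrete: if every pair of barcodes in a set differs in at least d positions, then up to floor((d - 1) / 2) substitution errors per barcode can be corrected unambiguously. The sketch below illustrates that property in plain Python. It is not Oligopool Calculator's own algorithm, and the helper names `hamming`, `min_pairwise_distance`, and `decode` as well as the toy sequences are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError('sequences must have equal length')
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def decode(read_barcode, barcodes, max_errors):
    """Return the unique barcode within max_errors substitutions, else None."""
    hits = [bc for bc in barcodes if hamming(read_barcode, bc) <= max_errors]
    return hits[0] if len(hits) == 1 else None

# Toy barcode set (illustrative sequences, not designed by the library)
barcodes = ['AAAAAA', 'CCCAAA', 'AAACCC', 'GGGGGG']
d = min_pairwise_distance(barcodes)
correctable = (d - 1) // 2

# A read with one substitution still maps back to its source barcode
recovered = decode('AAAAAT', barcodes, correctable)
```

Here `decode` returns `None` when no barcode or more than one barcode is within range, so ambiguous reads are dropped rather than guessed.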

We have used Oligopool Calculator in multiple projects to build libraries of tens of thousands of promoters, ribozymes, and mRNA stability elements, illustrating the use of a flexible grammar to add multiple barcodes and cut sites, avoid excluded sequences, and optimize experimental constraints. These libraries were later characterized using the highly efficient barcode counting provided by Oligopool Calculator.

Oligopool Calculator facilitates the creative design and application of massively parallel reporter assays by automating and simplifying the whole process. It has been benchmarked on simulated libraries containing millions of defined variants and on the analysis of billions of reads.

Oligopool Calculator Workflow

Design and analysis of oligopool variants using Oligopool Calculator. (a) In Design Mode, Oligopool Calculator can be used to generate optimized barcodes, primers, spacers, motifs and split longer oligos into shorter padded fragments for downstream synthesis and assembly. (b) Once the library is assembled and cloned, barcoded amplicon sequencing data can be processed via Analysis Mode for characterization. Analysis Mode proceeds by first indexing one or more sets of barcodes, packing the reads, and then producing count matrices either using acount (association counting) or xcount (combinatorial counting).
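As a rough mental model of the counting step in (b), the toy sketch below tallies how often each pair of barcodes co-occurs in a read, using exact substring matching. It is only an illustration in plain Python: the function `count_combinations`, the barcode names, and the sequences are hypothetical, and it neither calls nor reproduces the library's index, pack, or xcount modules.

```python
from collections import Counter

def count_combinations(reads, barcode_set_1, barcode_set_2):
    """Count reads containing exactly one barcode from each set (exact match)."""
    counts = Counter()
    for read in reads:
        hits_1 = [name for name, seq in barcode_set_1.items() if seq in read]
        hits_2 = [name for name, seq in barcode_set_2.items() if seq in read]
        # Skip reads that are missing a barcode or match ambiguously
        if len(hits_1) == 1 and len(hits_2) == 1:
            counts[(hits_1[0], hits_2[0])] += 1
    return counts

# Hypothetical barcodes and reads, for illustration only
bc1 = {'bc1_a': 'ACGTAC', 'bc1_b': 'TTGCAA'}
bc2 = {'bc2_a': 'GGATCC', 'bc2_b': 'CATGCA'}
reads = [
    'NNACGTACNNNNGGATCCNN',  # bc1_a + bc2_a
    'NNACGTACNNNNGGATCCNN',  # bc1_a + bc2_a
    'NNTTGCAANNNNCATGCANN',  # bc1_b + bc2_b
    'NNTTGCAANNNNNNNNNNNN',  # bc1_b only, ignored
]
counts = count_combinations(reads, bc1, bc2)
```

The resulting `Counter` maps each (barcode 1, barcode 2) pair to its read count, which is the shape of information a combination count matrix holds.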

Installation

Oligopool Calculator requires Python 3.10 or newer.

On Linux, macOS, and Windows Subsystem for Linux, you can install Oligopool Calculator from PyPI, where it is published as the oligopool package

$ pip install oligopool

or install it directly from GitHub.

$ pip install git+https://github.com/ayaanhossain/oligopool.git

Both approaches should install all dependencies automatically.

Note: The GitHub version always includes the most recent fixes. The PyPI version should be more stable.

If you are on Windows, or simply prefer to, you can also use Oligopool Calculator via Docker (see our notes).

Verifying Installation

A successful installation will look like this.

$ python
Python 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:20:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import oligopool as op
>>> op.__version__
'2024.10.24'
>>>
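
The same check can be scripted without opening an interpreter. This sketch relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the string it reports is the distribution version recorded at install time, which may be formatted differently from `op.__version__`.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

found = installed_version('oligopool')
if found is None:
    print('oligopool is not installed in this environment')
else:
    print(f'oligopool {found} is installed')
```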

Getting Started

Oligopool Calculator is carefully designed, easy to use, and stupid fast.

You can import the library and use its functions either in a script or interactively inside a Jupyter environment. Use help(...) to read the docs as necessary and follow along.

There are examples of a design parser and an analysis pipeline inside the examples directory.

A notebook demonstrating Oligopool Calculator in action is provided there as well.

$ python
Python 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import oligopool as op
>>> help(op)
...
    oligopool v2024.10.24
    by ah

    Automated design and analysis of oligopool libraries.

    The various modules in Oligopool Calculator can be used
    interactively in a jupyter notebook, or be used to define
    scripts for design and analysis pipelines on the cloud.

    Oligopool Calculator offers two modes of operation
        -   Design Mode for designing oligopool libraries, and
        - Analysis Mode for analyzing oligopool datasets.

    Design Mode workflow

        1. Initialize a pandas DataFrame with core library elements.
            a. The DataFrame must contain a unique 'ID' column serving as primary key.
            b. All other columns in the DataFrame must be DNA sequences.
        2. Define any optional background sequences via the background module.
        3. Add necessary oligopool elements with constraints via element modules.
        4. Optionally, split long oligos and pad them via assembly modules.
        5. Perform additional maneuvers and finalize library via auxiliary modules.

        Background module available
            - background

        Element modules available
            - primer
            - barcode
            - motif
            - spacer

        Assembly modules available
            - split
            - pad

        Auxiliary modules available
            - merge
            - revcomp
            - lenstat
            - final

        Design Mode example sketch

            >>> import pandas as pd
            >>> import oligopool as op
            >>>
            >>> # Read initial library
            >>> init_df = pd.read_csv('initial_library.csv')
            >>>
            >>> # Add oligo elements one by one
            >>> primer_df,  stats = op.primer(input_data=init_df, ...)
            >>> barcode_df, stats = op.barcode(input_data=primer_df, ...)
            ...
            >>> # Check length statistics as needed
            >>> length_stats = op.lenstat(input_data=further_along_df)
            ...
            >>>
            >>> # Split and pad longer oligos if needed
            >>> split_df, stats = op.split(input_data=even_further_along_df, ...)
            >>> first_pad_df,  stats = op.pad(input_data=split_df, ...)
            >>> second_pad_df, stats = op.pad(input_data=split_df, ...)
            ...
            >>>
            >>> # Finalize the library
            >>> final_df, stats = op.final(input_data=ready_to_go_df, ...)
            ...

    Analysis Mode workflow

        1. Index one or more CSVs containing the barcode information.
        2. Pack all NGS FastQ files, optionally merging them if required.
        3. Use acount for association counting of variants and barcodes.
        4. If multiple barcode combinations are to be counted use xcount.
        5. Combine count DataFrames and perform stats and ML as necessary.

        Indexing module available
            - index

        Packing module available
            - pack

        Counting modules available
            - acount
            - xcount

        Analysis Mode example sketch

            >>> import pandas as pd
            >>> import oligopool as op
            >>>
            >>> # Read annotated library
            >>> bc1_df = pd.read_csv('barcode_1.csv')
            >>> bc2_df = pd.read_csv('barcode_2.csv')
            >>> av1_df = pd.read_csv('associate_1.csv')
            ...
            >>>
            >>> # Index barcodes and any associates
            >>> bc1_index_stats = op.index(barcode_data=bc1_df, barcode_column='BC1', ...)
            >>> bc2_index_stats = op.index(barcode_data=bc2_df, barcode_column='BC2', ...)
            ...
            >>>
            >>> # Pack experiment FastQ files
            >>> sam1_pack_stats = op.pack(r1_file='sample_1_R1.fq.gz', ...)
            >>> sam2_pack_stats = op.pack(r1_file='sample_2_R1.fq.gz', ...)
            ...
            >>>
            >>> # Compute and write barcode combination count matrix
            >>> xcount_df, stats = op.xcount(index_files=['bc1_index', 'bc2_index'],
            ...                              pack_file='sample_1_pack', ...)
            ...

    You can learn more about each module using help.
        >>> import oligopool as op
        >>> help(op)
        >>> help(op.primer)
        >>> help(op.barcode)
        ...
        >>> help(op.xcount)

    For advanced uses, the following classes are also available.
        - vectorDB
        - Scry
...
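
Step 1 of the Design Mode workflow above places two requirements on the input table: a unique 'ID' column, and DNA sequences in every other column. A minimal pre-flight check of those two rules might look like the sketch below. It works on plain records, such as those returned by pandas `DataFrame.to_dict('records')`, and assumes that a DNA sequence means a non-empty string over A, T, G, and C; the library may accept a broader alphabet. The helper `check_library_records` and the example column name `Promoter` are hypothetical.

```python
def check_library_records(records):
    """Return a list of problems found in the input records (empty if none)."""
    problems = []
    seen_ids = set()
    for row_number, row in enumerate(records, start=1):
        if 'ID' not in row:
            problems.append(f'row {row_number}: missing ID')
            continue
        if row['ID'] in seen_ids:
            problems.append(f"row {row_number}: duplicate ID {row['ID']!r}")
        seen_ids.add(row['ID'])
        for column, value in row.items():
            if column == 'ID':
                continue
            # Assumption: DNA means a non-empty string of A, T, G, C
            is_dna = (isinstance(value, str) and value
                      and not set(value.upper()) - set('ATGC'))
            if not is_dna:
                problems.append(
                    f'row {row_number}: column {column!r} is not a DNA sequence')
    return problems

# Hypothetical input: the third row reuses an ID and has a malformed sequence
records = [
    {'ID': 'v1', 'Promoter': 'TTGACAGCTAGC'},
    {'ID': 'v2', 'Promoter': 'TTGACATATAAT'},
    {'ID': 'v2', 'Promoter': 'TTGACA-XYZ'},
]
problems = check_library_records(records)
for problem in problems:
    print(problem)
```

Collecting all problems in a list, rather than raising on the first one, lets a whole library file be fixed in a single pass.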

License

Oligopool Calculator (c) 2024 Ayaan Hossain.

Oligopool Calculator is open-source software released under the GPL-3.0 License.

See LICENSE file for more details.
