
Design orthogonal DNA sequences

Project description

seqwalk

seqwalk is a package for designing orthogonal DNA sequence libraries. It can efficiently generate libraries of sequences that satisfy sequence symmetry minimization constraints (i.e. minimizing longest common substrings). seqwalk additionally includes off-the-shelf orthogonal sequence libraries, as well as some tools for analyzing orthogonal sequence libraries. A code-free, interactive version of seqwalk can be found here.

For more details, see the paper.

Installation

$ pip install seqwalk

Usage

Designing a set of barcodes with minimal sequence symmetry

If you want a certain number of barcodes with maximum orthogonality, you can use the max_orthogonality function from the design module. You must specify the number of desired sequences (N) and the length of the desired sequences (L), in that order. Optionally, you can prevent reverse-complementary sequences (RCfree), set GC content limits (GClims), choose the allowed alphabet (alphabet), and specify patterns to prevent. By default, reverse-complementary sequences are allowed, there are no GC content constraints, a 3-letter code (A/C/T, no G) is used, and any run of four identical bases (4N, e.g. AAAA) is prevented.
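To make the optional constraints concrete, here is a minimal pure-Python sketch of a per-sequence filter. It is not part of seqwalk: the function name meets_constraints and its parameters are hypothetical, GC limits are assumed to be inclusive counts of G/C bases, and the default prevented patterns assume that "4N" means a run of four identical bases.

```python
# Hypothetical per-sequence filter illustrating the constraints described above.
# Not seqwalk's API; the defaults mirror the documented defaults under our assumptions.

HOMOPOLYMER_RUNS = ("AAAA", "CCCC", "GGGG", "TTTT")  # assumed meaning of "4N"


def meets_constraints(seq, alphabet="ACT", gc_lims=None,
                      prevented_patterns=HOMOPOLYMER_RUNS):
    """Return True if seq uses only `alphabet`, has a G/C count within
    gc_lims (inclusive, if given), and contains no prevented pattern."""
    if any(base not in alphabet for base in seq):
        return False
    if gc_lims is not None:
        low, high = gc_lims
        gc_count = sum(base in "GC" for base in seq)
        if not low <= gc_count <= high:
            return False
    return not any(pattern in seq for pattern in prevented_patterns)


print(meets_constraints("ACTTACT"))                                  # True
print(meets_constraints("ACTTTTC"))                                  # False: TTTT run
print(meets_constraints("ACGTAC", alphabet="ACGT", gc_lims=(2, 3)))  # True: 3 G/C bases
```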

For example, to design 100 barcodes of length 25 using a 4-letter alphabet, with reverse complements prevented and between 10 and 15 G/C bases per sequence, you can use the following code:

from seqwalk import design

library = design.max_orthogonality(100, 25, alphabet="ACGT", RCfree=True, GClims=(10, 15))

This generates a library of at least the specified size with the strongest possible sequence symmetry constraint, i.e. the smallest achievable k.
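One way to quantify the sequence symmetry constraint of a finished library is to find the smallest k at which no length-k substring occurs more than once, which is one more than the length of the longest repeated substring. The sketch below is a brute-force illustration with a hypothetical function name, not seqwalk code, and it ignores reverse complements.

```python
def smallest_ssm_k(library):
    """Smallest k such that every length-k substring occurs at most once
    across all sequences in a non-empty `library` (reverse complements ignored)."""
    max_len = max(len(seq) for seq in library)
    for k in range(1, max_len + 1):
        seen = set()
        repeated = False
        for seq in library:
            for i in range(len(seq) - k + 1):
                word = seq[i:i + k]
                if word in seen:
                    repeated = True
                    break
                seen.add(word)
            if repeated:
                break
        if not repeated:
            # If SSM holds for k it also holds for every larger k,
            # so the first k that works is the smallest.
            return k
    # Only reached if two library members are identical longest sequences.
    return max_len + 1


print(smallest_ssm_k(["ACTAC", "TTCAA"]))  # 3
print(smallest_ssm_k(["ACTAC", "CTACT"]))  # 5
```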

Designing a set of orthogonal barcodes with maximum size

If you have an orthogonality constraint in mind, you can use the max_size function from the design module to generate a maximally sized library for a given sequence symmetry minimization (SSM) value k: in the resulting library, no substring of length k appears more than once. Pass the sequence length L followed by k.
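The SSM condition itself is easy to check directly. The following minimal sketch (hypothetical function names, not seqwalk's API) counts every length-k substring across a library. The optional rc_free flag is our assumption about what preventing reverse complements means: a k-mer and its reverse complement are treated as the same word, and k-mers that equal their own reverse complement are rejected.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Watson-Crick reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def satisfies_ssm(library, k, rc_free=False):
    """True if no length-k substring occurs more than once in `library`.

    With rc_free=True (an assumption, not seqwalk's documented semantics),
    a k-mer and its reverse complement count as the same word, and
    self-complementary k-mers are rejected outright.
    """
    counts = Counter()
    for seq in library:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if rc_free:
                rc = reverse_complement(word)
                if word == rc:
                    return False
                word = min(word, rc)  # canonical representative of the pair
            counts[word] += 1
    return all(n == 1 for n in counts.values())


print(satisfies_ssm(["ACTAC", "TTCAA"], 3))              # True
print(satisfies_ssm(["AACC", "GGTT"], 4))                # True
print(satisfies_ssm(["AACC", "GGTT"], 4, rc_free=True))  # False: GGTT is the RC of AACC
```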

For example, to design barcodes of length 25 that satisfy SSM for k=12, using a 4-letter alphabet and with no reverse-complement or GC content constraints, you can use the following code:

from seqwalk import design

library = design.max_size(25, 12, alphabet="ACGT")
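A simple counting argument bounds the size of any such library. Under SSM with parameter k, each sequence of length L contains L - k + 1 length-k substrings, all of which must be distinct across the library, and an alphabet of size a has only a^k distinct k-mers, so the library size N satisfies N * (L - k + 1) <= a^k. The helper below (a hypothetical name, not part of seqwalk) evaluates this bound; it ignores reverse-complement and prevented-pattern constraints, which can only lower the achievable size.

```python
def ssm_size_upper_bound(L, k, alphabet_size=4):
    """Pigeonhole upper bound on the number of length-L sequences whose
    length-k substrings are all distinct (no RC or pattern constraints)."""
    if not 1 <= k <= L:
        raise ValueError("need 1 <= k <= L")
    kmers_per_sequence = L - k + 1
    return alphabet_size ** k // kmers_per_sequence


# Parameters from the max_size example: L=25, k=12, 4-letter alphabet.
print(ssm_size_upper_bound(25, 12))  # 1198372
```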

Importing "off-the-shelf" experimentally characterized libraries

The io module can import libraries that have previously been characterized experimentally, using code of the following form:

from seqwalk import io

PERprimers = io.load_library("kishi2018")

We provide the following libraries, accessible with the identifier tag.

identifier | # of seqs | seq length | original use case | ref
kishi2018 | 50 | 9 nt | PER primers | Kishi et al., 2018

If you have an orthogonal library you would like to add, please submit a PR!

Quality control using pairwise comparisons

Once you have a library in the form of a list of sequences, you can use the analysis module to perform additional quality control. For example, we provide a function to compute pairwise Hamming distances.

from seqwalk import analysis

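# seqs: a list of equal-length sequences (illustrative values here)
seqs = ["ACTTCA", "CATCTA", "TCAACT"]
h_crosstalk = analysis.hamming_matrix(seqs)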
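For reference, a pairwise Hamming distance matrix records, for each pair of sequences, the number of positions at which they differ. The pure-Python sketch below illustrates the computation; it is not seqwalk's implementation, and analysis.hamming_matrix may return a different type (for example an array rather than nested lists). The min_pairwise_distance helper is a hypothetical addition for finding the closest pair.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances (zeros on the diagonal)."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


def min_pairwise_distance(seqs):
    """Smallest Hamming distance over all pairs of list entries."""
    return min(hamming(a, b) for a, b in combinations(seqs, 2))


seqs = ["ACT", "ACA", "TTT"]
print(hamming_matrix(seqs))         # [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(min_pairwise_distance(seqs))  # 1
```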

Because sequence symmetry minimization does not explicitly guarantee low off-target hybridization strength, analysis.py also includes a simple function that uses NUPACK to identify "bad" sequences. However, it is commented out to avoid making NUPACK a dependency of the package, which is problematic because of NUPACK's licensing.

License

seqwalk is licensed under the terms of the MIT license.

Credits

seqwalk was created with cookiecutter and the py-pkgs-cookiecutter template.


Download files


Source Distribution

seqwalk-0.3.2.tar.gz (9.7 kB)

Uploaded Source

Built Distribution

seqwalk-0.3.2-py3-none-any.whl (9.4 kB)

Uploaded Python 3

File details

Details for the file seqwalk-0.3.2.tar.gz.

File metadata

  • Download URL: seqwalk-0.3.2.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.12 Darwin/23.5.0

File hashes

Hashes for seqwalk-0.3.2.tar.gz
Algorithm Hash digest
SHA256 cd415203f77f87258c8d0d80b60de7fc3f487f218bcd49cb5c40cbc506c98d72
MD5 dfc98a7d8759f0be27cf4d912cf04843
BLAKE2b-256 c5366b5c9b5c9b0152b30c94d3938e1a9f4f0a143017de349228293ed3b56161


File details

Details for the file seqwalk-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: seqwalk-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.12 Darwin/23.5.0

File hashes

Hashes for seqwalk-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 07ad19250d4b32372ebc9c923c8afe3c806c8559e9794b4498e93a68e72c7358
MD5 fd9c56ded648cc2c421fc1850627e3e7
BLAKE2b-256 ba9a85b785ccfda323a341f8632cf8adf1dad54db63c453ef04b9d9621f11fd4

