A package to automatically generate oligonucleotide library pool sequences

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

oligocompose

Jamie Heather | CCR @ MGH | 2024

The aim of this repo is to facilitate the production of large oligo pool library sequences, particularly potential MHC peptide epitope libraries, from shorter substrings.

Information on the sequences to be generated should be supplied in a tab-separated file.
- The first column must contain a name or identifier (which needn't be unique).
- All remaining columns contain the sequences to combine in the output oligos. Each sequence may be followed by optional comma-delimited parameters in square brackets, which provide further information about that sequence.
- Different lines may have different numbers of columns; the input file needn't be a 'proper' whole-grid tab-separated file.
- All lines must be left aligned, i.e. with no empty columns separating sequences.
- Comment lines that will not contribute to the output can also be included by beginning the line with a # character.
- This file is referred to via the -in or --in_file flags, e.g.:

oligocompose -in my_input_file.tsv
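The file rules above can be illustrated with a minimal parsing sketch. This is not the package's own parser: `parse_line` and `FIELD_RE` are hypothetical names, and the handling of bracketed parameters is an assumption based only on the description above.

```python
import re

# Hypothetical pattern: a sequence, optionally followed by [param1,param2,...]
FIELD_RE = re.compile(r"^(?P<seq>[^\[\]]+?)(?:\[(?P<params>[^\]]*)\])?$")

def parse_line(line):
    """Return (name, [(sequence, [params]), ...]), or None for comment/blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    # First column is the name; every later column is a sequence field
    name, *fields = line.split("\t")
    parsed = []
    for field in fields:
        match = FIELD_RE.match(field.strip())
        if match is None:
            # Empty columns are rejected, since lines must be left aligned
            raise ValueError(f"Unparseable field: {field!r}")
        params = match.group("params")
        param_list = [p.strip() for p in params.split(",")] if params else []
        parsed.append((match.group("seq"), param_list))
    return name, parsed

print(parse_line("oligo1\tTTT\taaa[aa]\tCCC"))
print(parse_line("# a comment line"))
```

Because each line is parsed on its own, lines with different numbers of columns need no special handling.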

Sequences can be provided in one of several ways, in any combination within a line:
- Explicitly as a simple DNA sequence, or as an amino acid sequence, which will be reverse-translated using the codon most commonly used to encode each residue in humans.
  - While the script attempts to automatically infer the sequence type (defaulting to nucleotides), it's preferable to explicitly label each sequence type in the square bracket parameter fields with [nt] or [aa].
  - E.g. TTT aaa[nt] CCC produces 'TTTAAACCC', while TTT aaa[aa] CCC produces 'TTTGCCGCCGCCCCC', with the codons for three alanines encoded between the TTT and CCC sequences.
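The difference between the [nt] and [aa] labels can be sketched as follows. The codon table here is deliberately partial and purely illustrative (alanine as GCC matches the example above; methionine and tryptophan each have only one codon), and `to_nt` and `compose` are hypothetical helper names rather than the package's API.

```python
# Partial, illustrative codon table; the package's own table covers all
# residues and may differ from this sketch.
COMMON_CODON = {"A": "GCC", "M": "ATG", "W": "TGG"}

def to_nt(seq, seq_type="nt"):
    """Return an upper-case nucleotide string for one input field."""
    if seq_type == "nt":
        return seq.upper()
    # Reverse-translate each residue with its single chosen codon
    return "".join(COMMON_CODON[residue] for residue in seq.upper())

def compose(fields):
    """Join (sequence, type) fields into a single oligo."""
    return "".join(to_nt(seq, seq_type) for seq, seq_type in fields)

print(compose([("TTT", "nt"), ("aaa", "nt"), ("CCC", "nt")]))  # TTTAAACCC
print(compose([("TTT", "nt"), ("aaa", "aa"), ("CCC", "nt")]))  # TTTGCCGCCGCCCCC
```

The string aaa is valid both as DNA and as three alanines, which is why explicit labels are safer than inference.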
Individual sequences that are to be used frequently (e.g. conserved adapter sequences) can be included in a reference 'fixed sequence' file, which can be specified using the -f or --fixed_file flag.
- This file should follow the same rules as the input oligo tsv file, except that it must contain only two columns: name and reference sequence.
- Those sequences can then be referred to in the input oligo file by using those names, bracketed with ampersands (e.g. &name_of_sequence&).

oligocompose -in my_input_file.tsv -f fixed_sequences.tsv
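As a rough sketch of how ampersand-bracketed names could be resolved against a fixed sequence file, assuming any bracketed parameters have already been stripped from the field. `load_fixed`, `resolve`, and the name `adapter_5p` are hypothetical and not taken from the package.

```python
import re

def load_fixed(lines):
    """Build a name -> sequence lookup from two-column tab-separated lines."""
    fixed = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        name, seq = line.rstrip("\n").split("\t")[:2]
        fixed[name] = seq
    return fixed

def resolve(field, fixed):
    """Swap an &name& reference for its sequence; pass other fields through."""
    match = re.fullmatch(r"&(.+)&", field)
    if match is None:
        return field
    return fixed[match.group(1)]

fixed = load_fixed(["# name\tsequence", "adapter_5p\tACGTACGT"])
print(resolve("&adapter_5p&", fixed))  # ACGTACGT
print(resolve("TTT", fixed))           # TTT
```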

For variable sequences (e.g. different sequences from a pool), instead of explicitly providing each individual DNA string, users can provide a path to a file containing a list of sequences, bracketed with dollar signs (e.g. $path_to/some_file.tsv$).
- This file must be formatted the same as the fixed sequence file, i.e. a two column tsv of name\tsequence.
- In this situation an output sequence will be produced for every sequence in the additional file. E.g. when some_file contains the sequences AAA, CCC, GGG, the line TT $some_file$ TTT would produce TTAAATTT, TTCCCTTT, and TTGGGTTT.
- Note that this file cannot contain links to other files, only explicit sequences or references to them.
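The one-output-per-pool-sequence behaviour amounts to a Cartesian product over the options available for each field. A minimal sketch follows, in which a dictionary stands in for reading the referenced file. `combine` and `pools` are hypothetical names, and this is not necessarily how the package implements it.

```python
from itertools import product

def combine(fields, pools):
    """Expand $name$ fields into every pool member and join each combination."""
    options = []
    for field in fields:
        if len(field) > 2 and field.startswith("$") and field.endswith("$"):
            options.append(pools[field[1:-1]])  # every sequence in the pool
        else:
            options.append([field])  # a literal sequence has a single option
    return ["".join(parts) for parts in product(*options)]

# Stand-in for the sequences listed in some_file
pools = {"some_file": ["AAA", "CCC", "GGG"]}
print(combine(["TT", "$some_file$", "TTT"], pools))
# ['TTAAATTT', 'TTCCCTTT', 'TTGGGTTT']
```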
Degenerate DNA bases (using IUPAC codes) can also be used.
- E.g. TT NNN AA would produce 64 oligos, with each combination of all four nucleotides at each of the three degenerate positions.
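Degenerate expansion can be sketched with the standard IUPAC nucleotide codes and a Cartesian product over each position. `expand_degenerate` is a hypothetical name, and the output order here is simply whatever `itertools.product` yields, which may not match the package's ordering.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(seq):
    """Return every concrete sequence matching a degenerate DNA string."""
    choices = (IUPAC[base] for base in seq.upper())
    return ["".join(bases) for bases in product(*choices)]

oligos = expand_degenerate("TTNNNAA")
print(len(oligos))  # 64
print(oligos[0])    # TTAAAAA
```

The number of outputs is the product of the option counts at each position, so it grows quickly: ten N positions already give 4^10 = 1,048,576 oligos.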
Instead of just inserting whole sequences (provided either directly or via a file), all sub-sequences of a specified length can be individually combined.
- This is particularly useful for generating oligos tiled along the length of a gene or protein sequence.
- After giving a sequence, identifier, or path, the length of the subsequences to take can be specified as an integer in square brackets.
- Note that this behaviour defaults to a sliding window step of 1 for amino acids and 3 for nucleotides, assuming that it's been provided with an in-frame ORF and that codons are to be maintained. This can be altered by providing any integer to the script via the -sl / --step_length flag.
- E.g. ACDEFGHIKLMNPQRSTVWY[10,aa] would produce nucleotides encoding every possible 10-mer across this theoretical protein sequence (ACDEFGHIKL, CDEFGHIKLM, DEFGHIKLMN, ... MNPQRSTVWY)
  - Running the same input with -sl 5 included in the command would instead produce 10-mers every 5 residues apart (just ACDEFGHIKL, GHIKLMNPQR, and MNPQRSTVWY).
- Alternatively, if running on nucleotide sequences, coding sequences can be maintained by providing subsequence lengths divisible by three.
  - E.g. atgaccatgattacgccaagcttgcatgcctgcagg[30,nt] (encoding the 12-mer MTMITPSLHACR peptide) would produce oligos encoding every tiled 10-mer peptide.
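The tiling described above is a fixed-length sliding window. A minimal sketch that reproduces the examples given, assuming only complete windows are emitted. `tile` is a hypothetical name.

```python
def tile(seq, length, step):
    """Return every full-length window of seq, starting every `step` positions."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]

protein = "ACDEFGHIKLMNPQRSTVWY"
print(len(tile(protein, 10, 1)))  # 11 windows, ACDEFGHIKL ... MNPQRSTVWY
print(tile(protein, 10, 5))       # ['ACDEFGHIKL', 'GHIKLMNPQR', 'MNPQRSTVWY']

# Nucleotide input: a 30 nt window stepped by 3 keeps the windows in frame
orf = "atgaccatgattacgccaagcttgcatgcctgcagg"
print(len(tile(orf, 30, 3)))      # 3 windows, one per tiled 10-mer peptide
```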
Ordinarily the script produces upper case output (e.g. ttt a[aa] ccc = TTTGCCCCC), but some options have been provided to more easily show the joins of the different input sequence regions.
- The -cf / --case_flip flag makes sequential sequence fields alternate their case (e.g. TTTgccCCC)
- The -fc / --first_case flag can be used in conjunction with this to explicitly set the starting case (e.g. -cf -fc lower = tttGCCccc)
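The alternating-case output can be sketched as below, operating on fields that have already been converted to nucleotides. `join_with_case_flip` is a hypothetical name, and the upper-case default for the first field is inferred from the TTTgccCCC example above.

```python
def join_with_case_flip(parts, first_case="upper"):
    """Join nucleotide fields, alternating case from one field to the next."""
    use_upper = first_case == "upper"
    out = []
    for part in parts:
        out.append(part.upper() if use_upper else part.lower())
        use_upper = not use_upper  # flip for the next field
    return "".join(out)

parts = ["TTT", "GCC", "CCC"]
print(join_with_case_flip(parts))           # TTTgccCCC
print(join_with_case_flip(parts, "lower"))  # tttGCCccc
```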
There are also options to pad the 3' end of oligos to ensure a minimum length (due to the requirements of certain oligo synthesis platforms).
- This can be specified with the -l / --oligo_len flag, which will add random nucleotides to the 3' end of any oligos shorter than this.
- Alternatively, if further post-processing is to take place, additionally providing the -n / --n_pad flag will pad the 3' ends with 'N' characters instead of random nucleotides.
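The two padding modes can be sketched as follows. `pad_3prime` and its parameters are hypothetical and simply mirror the flag descriptions above. Uniform sampling from A, C, G and T is an assumption about what "random" means here.

```python
import random

def pad_3prime(oligo, min_len, n_pad=False, rng=random):
    """Pad the 3' end of an oligo up to min_len; longer oligos are unchanged."""
    shortfall = max(0, min_len - len(oligo))
    if n_pad:
        return oligo + "N" * shortfall
    # Assumption: each padding base is drawn uniformly from the four nucleotides
    return oligo + "".join(rng.choice("ACGT") for _ in range(shortfall))

print(pad_3prime("ACGT", 8, n_pad=True))  # ACGTNNNN
print(pad_3prime("ACGTACGT", 4))          # ACGTACGT (already long enough)
print(len(pad_3prime("ACGT", 10)))        # 10
```

Passing a seeded `random.Random` instance as `rng` makes the random padding reproducible between runs.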

Release history

This version: 0.2.1 (Nov 26, 2024)
