🔴🟢🔵⚫️ monte barcode
Generating sets of random DNA sequences optimized for use in high-throughput sequencing.
Installation
The easy way
Install the pre-compiled version from PyPI:
pip install monte-barcode
From source
Clone the repository, then cd into it. Then run:
pip install -e .
Usage
monte barcode provides command-line utilities to generate completely random or peptide-encoding barcodes that conform to custom constraints, such as minimum edit distance within the set, GC content, and color balance for Illumina chemistry.
Barcode sets and individual barcodes are deterministically given an adjective-noun mnemonic (generated by nemony) for easy reference.
Each utility gives a lot of commentary to stderr, but the barcodes go to stdout by default so they can be piped.
Command line
Generate random barcodes of a particular length.
$ monte barcode --length 6 -n 5
Generating barcodes with the following parameters:
...
Requested barcodes with length 6, and 4096 possible combinations.
> Tried 16 barcodes, rejected 11, accepted 5; rejection rate is 0.69
Rejection reasons:
gc_content: 0.62
homopolymer: 0.25
restriction_sites: 0.06
mighty_orchid:l6-n5-d3:x0:fresh_prague TGAGGT
mighty_orchid:l6-n5-d3:x1:flexible_forest AGTTCG
mighty_orchid:l6-n5-d3:x2:fun_baby GACATC
mighty_orchid:l6-n5-d3:x3:woolly_podium TGTCCT
mighty_orchid:l6-n5-d3:x4:strong_factor GAACCA
Wrote barcode set called mighty_orchid, with minimum Hamming distance 3 and maximum Hamming distance 6.
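Each line on stdout appears to pair a colon-separated identifier with the barcode sequence. The following is a minimal sketch of splitting such a line in Python. It assumes that whitespace separates the two columns and that the identifier fields are the set name, a parameter summary, an index, and the barcode mnemonic. The names `BarcodeRecord` and `parse_barcode_line` are hypothetical and not part of monte barcode.

```python
from typing import NamedTuple


class BarcodeRecord(NamedTuple):
    set_name: str    # e.g. "mighty_orchid"
    parameters: str  # e.g. "l6-n5-d3"
    index: int       # e.g. 0, parsed from "x0"
    mnemonic: str    # e.g. "fresh_prague"
    sequence: str    # e.g. "TGAGGT"


def parse_barcode_line(line: str) -> BarcodeRecord:
    """Split one output line into its parts (assumed layout, see above)."""
    identifier, sequence = line.split()
    set_name, parameters, index, mnemonic = identifier.split(":")
    # The index field looks like "x0", "x1", ...; drop the leading "x".
    return BarcodeRecord(set_name, parameters, int(index.lstrip("x")), mnemonic, sequence)


record = parse_barcode_line("mighty_orchid:l6-n5-d3:x0:fresh_prague TGAGGT")
print(record.mnemonic, record.sequence)  # fresh_prague TGAGGT
```

Because the sequence is the second whitespace-separated column, this matches the `--field 2` setting used when piping into the other utilities.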
Or generate barcodes that encode a peptide.
$ monte barcode --amino-acid HELP -n 5
Generating barcodes with the following parameters:
...
Using amino acid sequence HELP with length 12 and 96 possible combinations.
> Tried 7 barcodes, rejected 2, accepted 5; rejection rate is 0.29
Rejection reasons:
gc_content: 0.14
homopolymer: 0.14
basic_hamlet:l12-n5-d2:x0:volatile_lesson CATGAGCTGCCT
basic_hamlet:l12-n5-d2:x1:pricy_scuba CACGAACTGCCT
basic_hamlet:l12-n5-d2:x2:good_race CACGAATTGCCA
basic_hamlet:l12-n5-d2:x3:demanding_bruno CATGAATTACCG
basic_hamlet:l12-n5-d2:x4:pawky_plaster CATGAGTTACCT
Wrote barcode set called basic_hamlet, with minimum Hamming distance 2 and maximum Hamming distance 4.
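The figure of 96 possible combinations follows from the degeneracy of the standard genetic code: H and E have 2 codons each, L has 6, and P has 4, so 2 × 2 × 6 × 4 = 96. A small standalone sketch that enumerates them is shown here. It does not use monte barcode itself, and the `CODONS` table and `encodings` helper are illustrative names only.

```python
from itertools import product

# Standard genetic code, restricted to the residues used in this example.
CODONS = {
    "H": ["CAT", "CAC"],
    "E": ["GAA", "GAG"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
}


def encodings(peptide):
    """Yield every DNA sequence that translates to the given peptide."""
    for codons in product(*(CODONS[residue] for residue in peptide)):
        yield "".join(codons)


all_help = list(encodings("HELP"))
print(len(all_help))               # 96
print("CATGAGCTGCCT" in all_help)  # True
```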
Insist on a minimum edit distance.
$ monte barcode --length 6 -n 10 -d 3
Generating barcodes with the following parameters:
...
Requested barcodes with length 6, and 4096 possible combinations.
> Tried 39 barcodes, rejected 29, accepted 10; rejection rate is 0.74
Rejection reasons:
gc_content: 0.67
distance: 0.13
homopolymer: 0.05
scenic_blast:l6-n10-d3:x0:acidic_turtle TGTGTG
scenic_blast:l6-n10-d3:x1:rowdy_grace ACCATC
scenic_blast:l6-n10-d3:x2:rich_export CGTTAG
scenic_blast:l6-n10-d3:x3:unique_break GGAATC
scenic_blast:l6-n10-d3:x4:careful_fuji GCAAGT
scenic_blast:l6-n10-d3:x5:whimsical_derby CGGAAT
scenic_blast:l6-n10-d3:x6:pricy_aloha TTCTCC
scenic_blast:l6-n10-d3:x7:zestful_ricardo AGAGCT
scenic_blast:l6-n10-d3:x8:terse_cobra AAGTCC
scenic_blast:l6-n10-d3:x9:zany_chamber TTACGG
Wrote barcode set called scenic_blast, with minimum Hamming distance 3 and maximum Hamming distance 6.
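The commentary above suggests that candidates are drawn at random and rejected when they fail a check. A minimal sketch of that kind of rejection sampling follows. It is not monte barcode's implementation: the function names are hypothetical, only GC content (using the documented default bounds of 0.4 to 0.6) and Hamming distance are checked, and only the first failing reason is recorded for each candidate.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def sample_barcodes(n, length, min_distance, max_tries=10_000, seed=0):
    """Draw random sequences until n pass the checks or max_tries is reached."""
    rng = random.Random(seed)
    accepted, reasons, tries = [], Counter(), 0
    while len(accepted) < n and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not 0.4 <= gc_fraction(candidate) <= 0.6:
            reasons["gc_content"] += 1
        elif any(hamming(candidate, other) < min_distance for other in accepted):
            reasons["distance"] += 1
        else:
            accepted.append(candidate)
    return accepted, tries, reasons


accepted, tries, reasons = sample_barcodes(n=10, length=6, min_distance=3)
print(f"Tried {tries}, accepted {len(accepted)}, rejected {sum(reasons.values())}")
```

For length 6, only sequences with exactly three G or C bases fall inside the 0.4 to 0.6 window, which is 20 of the 64 possible GC patterns. That is in line with the high gc_content rejection rates reported in the examples.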
Or insist on ideal color balance for Illumina chemistry.
$ monte barcode --length 6 -n 10 -d 3 --color
Generating barcodes with the following parameters:
...
Requested barcodes with length 6, and 4096 possible combinations.
> Tried 151 barcodes, rejected 141, accepted 10; rejection rate is 0.93
Rejection reasons:
gc_content: 0.65
homopolymer: 0.21
color_balance: 0.72
distance: 0.17
palindrome: 0.02
bright_cliff:l6-n10-d3:x0:ultimate_spray AGCGAT
bright_cliff:l6-n10-d3:x1:bulky_drama AGTTGC
bright_cliff:l6-n10-d3:x2:tropical_pinball TTCACG
bright_cliff:l6-n10-d3:x3:unique_info GTACGT
bright_cliff:l6-n10-d3:x4:chilly_sahara CCTCTT
bright_cliff:l6-n10-d3:x5:novel_wisdom GACCTA
bright_cliff:l6-n10-d3:x6:oceanic_plume AGACTG
bright_cliff:l6-n10-d3:x7:wanted_jessica TCTCGA
bright_cliff:l6-n10-d3:x8:incise_radical TCTGTC
bright_cliff:l6-n10-d3:x9:rebel_option TAGGAC
Wrote barcode set called bright_cliff, with minimum Hamming distance 3 and maximum Hamming distance 6.
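For background, on four-channel Illumina instruments A and C are detected in one (red) channel and G and T in the other (green), so a pool of barcodes is usually considered balanced when every sequencing cycle has signal in both channels. The sketch below checks that simple rule. It is an assumption for illustration: the exact criterion applied by `--color` is not described here and may differ, for example to account for two-channel chemistry. The function name `is_color_balanced` is hypothetical.

```python
RED = set("AC")    # bases read in the red channel (four-channel chemistry)
GREEN = set("GT")  # bases read in the green channel


def is_color_balanced(barcodes):
    """True if every cycle (position) has at least one base in each channel.

    Assumes all barcodes have the same length.
    """
    for column in zip(*barcodes):
        bases = set(column)
        if not (bases & RED and bases & GREEN):
            return False
    return True


print(is_color_balanced(["ACAC", "GTGT"]))  # True
print(is_color_balanced(["ACAC", "CACA"]))  # False: no green signal at any cycle
```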
You can also check and filter previously generated sets.
$ monte barcode --length 6 -n 10 -d 3 2> /dev/null | monte check --color --field 2
Checking barcodes with the following parameters:
...
> Tried 10 barcodes, rejected 6, accepted 4; rejection rate is 0.60
Rejection reasons:
color_balance: 0.60
Could only generate 4 barcodes, but 10 were requested. You might need to try different settings.
thorough_adam:l6-n4-d4:x0:savvy_ruby TCCTGA
thorough_adam:l6-n4-d4:x1:elfin_rufus AGCTTC
thorough_adam:l6-n4-d4:x2:damaged_atlas AAGGCA
thorough_adam:l6-n4-d4:x3:faded_elite GCACTA
Wrote barcode set called thorough_adam, with minimum Hamming distance 4 and maximum Hamming distance 5.
And try to sort by ideal color balance for Illumina chemistry, which is useful if you want to use subsets of a set.
$ monte barcode --length 6 -n 15 -d 1 2> /dev/null | monte sort --field 2
Sorting barcodes with the following parameters:
...
round_mono:l6-n15-d2:x0:shady_soda AGTCCT
round_mono:l6-n15-d2:x1:vogue_cosmos TGAGTC
round_mono:l6-n15-d2:x2:upbeat_baboon AACGGA
round_mono:l6-n15-d2:x3:sweet_octavia CATCCT
round_mono:l6-n15-d2:x4:clean_copper CCTTAG
round_mono:l6-n15-d2:x5:fabulous_partner TCCTAG
round_mono:l6-n15-d2:x6:defiant_charlie GAACGA
round_mono:l6-n15-d2:x7:misty_miguel GCATGA
round_mono:l6-n15-d2:x8:urgent_rodeo ACTGTG
round_mono:l6-n15-d2:x9:injured_news GAAGGT
round_mono:l6-n15-d2:x10:clear_public TGAGAG
round_mono:l6-n15-d2:x11:seemly_satire GATTGG
round_mono:l6-n15-d2:x12:exemplary_robert TTCAGC
round_mono:l6-n15-d2:x13:nuclear_choice CATCAC
round_mono:l6-n15-d2:x14:discreet_shake GCATTG
Wrote barcode set called round_mono, with minimum Hamming distance 2 and maximum Hamming distance 6.
Details
usage: monte barcode [-h] --number NUMBER [--length LENGTH] [--rejection-rate REJECTION_RATE]
[--amino-acid AMINO_ACID] [--distance DISTANCE] [--homopolymer HOMOPOLYMER]
[--levenshtein] [--color] [--gc_min GC_MIN] [--gc_max GC_MAX]
[--output OUTPUT]
options:
-h, --help show this help message and exit
--number NUMBER, -n NUMBER
Number of barcodes to generate. Required.
--length LENGTH, -l LENGTH
Barcode length. Default: 12
--rejection-rate REJECTION_RATE, -r REJECTION_RATE
Rate of rejection before aborting. Default: 0.85
--amino-acid AMINO_ACID, -a AMINO_ACID
Generate barcodes encoding this amino acid sequence. Default: do not use.
--distance DISTANCE, -d DISTANCE
Minimum distance between barcodes. Default: 1
--homopolymer HOMOPOLYMER, -p HOMOPOLYMER
Maximum homopolymer length. Default: 3
--levenshtein, -e Use Levenshtein distance. Otherwise using Hamming distance. Default: False
--color, -c Check optimal Illumina color balance. Default: False
--gc_min GC_MIN, -g GC_MIN
Minimum GC content. Default: 0.4
--gc_max GC_MAX, -j GC_MAX
Maximum GC content. Default: 0.6
--output OUTPUT, -o OUTPUT
Output file. Default: STDOUT
usage: monte check [-h] [--distance DISTANCE] [--homopolymer HOMOPOLYMER] [--levenshtein]
[--color] [--gc_min GC_MIN] [--gc_max GC_MAX] [--field FIELD] [--output OUTPUT]
[input]
positional arguments:
input Input file. Default: STDIN.
options:
-h, --help show this help message and exit
--distance DISTANCE, -d DISTANCE
Minimum distance between barcodes. Default: 1
--homopolymer HOMOPOLYMER, -p HOMOPOLYMER
Maximum homopolymer length. Default: 3
--levenshtein, -e Use Levenshtein distance. Otherwise using Hamming distance. Default: False
--color, -c Check optimal Illumina color balance. Default: False
--gc_min GC_MIN, -g GC_MIN
Minimum GC content. Default: 0.4
--gc_max GC_MAX, -j GC_MAX
Maximum GC content. Default: 0.6
--field FIELD, -f FIELD
Column number for barcode sequences. Default: 1
--output OUTPUT, -o OUTPUT
Output file. Default: STDOUT
usage: monte sort [-h] [--field FIELD] [--output OUTPUT] [input]
positional arguments:
input Input file. Default: STDIN.
options:
-h, --help show this help message and exit
--field FIELD, -f FIELD
Column number for barcode sequences. Default: 1
--output OUTPUT, -o OUTPUT
Output file. Default: STDOUT
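To make the GC content and homopolymer options concrete, here is a small standalone sketch of the two measurements with the documented defaults. It assumes that the GC bounds are inclusive and that a maximum homopolymer length of 3 means runs of four or more identical bases are rejected. The helper names are hypothetical and not part of monte barcode.

```python
from itertools import groupby


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def passes_defaults(seq, gc_min=0.4, gc_max=0.6, homopolymer=3):
    """Apply the default GC window and homopolymer limit from the help text."""
    return (gc_min <= gc_content(seq) <= gc_max
            and longest_homopolymer(seq) <= homopolymer)


print(passes_defaults("TGAGGT"))  # True: GC content 0.5, longest run 2
print(passes_defaults("AAAATT"))  # False: GC content 0.0, and a run of 4
```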
Python API
monte-barcode can be imported into Python to generate and check barcodes in your own programs.
import montebarcode as mb
Generate random DNA sequences.
>>> for bc in mb.infinite_barcodes(length=20, check_used=False):
... print(bc)
... break
...
ATCAGTCGTCACACTAGTTA
Or peptide-encoding sequences.
>>> list(mb.codon_barcodes("L", ordered=True))
['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']
You can check the minimum and maximum distances among a set.
>>> mb.minmax_distance(['AAA', 'AAA'])
(0, 0)
>>> mb.minmax_distance(['AAA', 'TCG', 'AAT'])
(1, 3)
>>> mb.minmax_distance(['AAA', 'TCG', 'AAAT'], use_levenshtein=False)
(0, 3)
>>> mb.minmax_distance(['AAA', 'TCG', 'AAAT'])
(1, 4)
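The difference between the two results comes from how sequences of unequal length are compared. The standalone sketch below reproduces both numbers with plain Python. It assumes that the Hamming comparison ignores trailing bases beyond the shorter sequence, which is consistent with the (0, 3) result above. The function names are illustrative and are not the library's internals.

```python
from itertools import combinations


def hamming(a, b):
    """Mismatches over the shared prefix; extra trailing bases are ignored."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def minmax(barcodes, distance):
    """Smallest and largest pairwise distance within a set."""
    distances = [distance(a, b) for a, b in combinations(barcodes, 2)]
    return min(distances), max(distances)


print(minmax(["AAA", "TCG", "AAAT"], hamming))      # (0, 3)
print(minmax(["AAA", "TCG", "AAAT"], levenshtein))  # (1, 4)
```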
And get usage of each base at each position.
>>> mb.base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[0]['A']
0.25
>>> mb.base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[1]['G']
0
>>> mb.base_usage(['AAA', 'TTT', 'GCT', 'CCA'])[2]['A']
0.5
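As a rough illustration of what these numbers mean, per-position base usage can be computed by counting the bases in each column of the set. This is a sketch under the assumption that all barcodes have equal length. The name `position_usage` is hypothetical, and the sketch returns floats throughout rather than matching the library's exact return types.

```python
from collections import Counter


def position_usage(barcodes):
    """Return one {base: fraction} dict per position."""
    usage = []
    for column in zip(*barcodes):
        counts = Counter(column)
        usage.append({base: counts[base] / len(column) for base in "ACGT"})
    return usage


usage = position_usage(["AAA", "TTT", "GCT", "CCA"])
print(usage[0]["A"], usage[1]["G"], usage[2]["A"])  # 0.25 0.0 0.5
```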
You can see whether adding a barcode to a set would throw off the Illumina color balance.
>>> mb.IlluminaColorBalance()('AAAT', ['TCGC', 'ACAG', 'TGGC', 'ATCG'])
True
>>> mb.IlluminaColorBalance()('AAAT', ['TCGC', 'CCAG', 'TGGC', 'ATCG'])
False
And run a suite of checks against a set of barcodes (or an infinite stream), retrieving the failure reasons, the number of tries, and the conforming barcode set.
>>> checks = [mb.Homopolymer(), mb.Palindrome()]
>>> mb.make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], n=4, checks=checks, quiet=True)
(Counter({'homopolymer': 1, 'palindrome': 1}), 4, ['ATCGCG', 'GCCGAT'])
>>> mb.make_checks(['AAAAT', 'CCCGGG', 'ATCGCG', 'GCCGAT'], n=1, checks=checks, quiet=True)
(Counter({'homopolymer': 1, 'palindrome': 1}), 3, ['ATCGCG'])
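The logic behind these results can be sketched in plain Python. The version below assumes that a palindrome is a sequence equal to its own reverse complement and that a homopolymer failure is a run longer than three bases. Both assumptions are consistent with the outputs above but are not confirmed as the library's definitions, and `run_checks` is a hypothetical stand-in, not `mb.make_checks` itself.

```python
from collections import Counter
from itertools import groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == seq[::-1].translate(COMPLEMENT)


def has_long_homopolymer(seq, max_run=3):
    """True if any run of one repeated base is longer than max_run."""
    return any(len(list(run)) > max_run for _, run in groupby(seq))


def run_checks(candidates, n, checks):
    """Consume candidates until n pass; return (failures, tries, accepted)."""
    failures, accepted, tries = Counter(), [], 0
    for candidate in candidates:
        if len(accepted) >= n:
            break
        tries += 1
        failed = [name for name, check in checks.items() if check(candidate)]
        failures.update(failed)
        if not failed:
            accepted.append(candidate)
    return failures, tries, accepted


checks = {"homopolymer": has_long_homopolymer, "palindrome": is_palindrome}
barcodes = ["AAAAT", "CCCGGG", "ATCGCG", "GCCGAT"]
print(run_checks(barcodes, n=4, checks=checks))
print(run_checks(barcodes, n=1, checks=checks))
```

With `n=1` the loop stops as soon as one barcode has been accepted, which is why only three candidates are tried in the second call.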
Documentation
Full API documentation is here.