Basic utilities for working with nucleotide sequence strings.
🧬 streq
Python utilities for working with nucleotide sequence strings.
Installation
The easy way
Install the pre-compiled version from PyPI:
pip install streq
From source
Clone the repository, cd into it, then run:
pip install -e .
Usage
streq provides Python utility functions for working with nucleotide sequences. Sequences may be upper or lower case, and case is preserved through transformations.
Transformations
Reverse complement.
>>> import streq as sq
>>>
>>> sq.reverse_complement('ATCG')
'CGAT'
Convert between RNA and DNA alphabets.
>>> sq.to_rna('ATCG')
'AUCG'
>>> sq.to_dna('AUCG')
'ATCG'
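As a rough illustration of what the alphabet conversion involves, here is a minimal sketch built on str.translate. The names to_rna_sketch and to_dna_sketch are hypothetical and are not part of streq; the library's own implementation may differ.

```python
# Hypothetical helpers for illustration; not streq's actual implementation.
_TO_RNA = str.maketrans("Tt", "Uu")
_TO_DNA = str.maketrans("Uu", "Tt")


def to_rna_sketch(seq: str) -> str:
    """Swap T for U, keeping the case of every character."""
    return seq.translate(_TO_RNA)


def to_dna_sketch(seq: str) -> str:
    """Swap U for T, keeping the case of every character."""
    return seq.translate(_TO_DNA)


print(to_rna_sketch("ATCG"))
print(to_dna_sketch("AUCg"))
```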
Slice circular sequences such as plasmids or bacterial genomes.
>>> sq.Circular('ATCG')[-1:3]
'GATC'
>>> sq.reverse_complement(sq.Circular('ATCG'))[-1:3]
'CGAT'
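For readers curious how wrap-around slicing can work, the sketch below indexes the sequence modulo its length. The function circular_slice is a hypothetical stand-in, written as a plain function rather than a class, and it assumes stop is greater than start; it is not how sq.Circular is necessarily implemented.

```python
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Return seq[start:stop], treating seq as a closed loop.

    Hypothetical helper: indices wrap around with modulo arithmetic,
    so a negative start reaches back across the origin.
    """
    n = len(seq)
    if n == 0 or stop <= start:
        return ""
    return "".join(seq[(start + i) % n] for i in range(stop - start))


print(circular_slice("ATCG", -1, 3))
```

With these assumptions, the slice from -1 to 3 starts at the last base and runs through the first three, matching the 'GATC' result of the first example.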
Case is preserved through transformations.
>>> sq.reverse_complement(sq.Circular('ATCg'))
'cGAT'
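One simple way to get case-preserving behaviour is a translation table that maps upper-case bases to upper-case complements and lower-case bases to lower-case complements. The sketch below is an assumption about how this could be done, not streq's code; reverse_complement_sketch is a hypothetical name, and it assumes a DNA alphabet on output (U is complemented to A).

```python
# Hypothetical table: each base maps to its complement in the same case.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement_sketch(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ATCG"))
print(reverse_complement_sketch("ATCg"))
```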
Calculations
Get GC and pyrimidine content.
>>> sq.gc_content('AGGG')
0.75
>>> sq.pyrimidine_content('AUGGG')
0.2
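Both values are fractions of the sequence length. A minimal sketch, assuming case-insensitive counting and treating C, T and U as pyrimidines, could look like the following. The helper names are hypothetical, and returning 0.0 for an empty sequence is an assumption.

```python
def fraction_of(seq: str, letters: str) -> float:
    """Fraction of positions in seq whose base is one of letters."""
    if not seq:
        return 0.0  # assumption: empty input yields 0.0 rather than an error
    wanted = set(letters.upper())
    return sum(base in wanted for base in seq.upper()) / len(seq)


def gc_content_sketch(seq: str) -> float:
    return fraction_of(seq, "GC")


def pyrimidine_content_sketch(seq: str) -> float:
    return fraction_of(seq, "CTU")


print(gc_content_sketch("AGGG"))
print(pyrimidine_content_sketch("AUGGG"))
```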
Get autocorrelation (a rough indicator of secondary structure).
>>> sq.correlation('AACC')
0.0
>>> sq.correlation('AAATTT')
2.3
>>> sq.correlation('AAATTCT')
1.3047619047619046
>>> sq.correlation('AAACTTT')
1.9238095238095236
Wobble base-pairing can be taken into account.
>>> sq.correlation('GGGTTT')
0.0
>>> sq.correlation('GGGTTT', wobble=True)
2.3
>>> sq.correlation('GGGUUU', wobble=True)
2.3
Provide a second sequence to get correlation between sequences.
>>> sq.correlation('AAA', 'TTT')
0.0
>>> sq.correlation('AAA', 'AAA')
3.0
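The numbers above are consistent with one particular reading of the score: slide the first sequence along the second one lag by lag, and at each lag add the fraction of overlapping positions that match. When no second sequence is given, the second sequence is taken to be the reverse complement of the first. This reading is inferred from the example outputs only, not from streq's source, so treat the sketch below as an assumption. It does not cover the wobble option, and correlation_sketch is a hypothetical name.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def correlation_sketch(seq: str, other: Optional[str] = None) -> float:
    """Sum, over lags, of the matching fraction in the overlap.

    Assumed formula, inferred from the documented examples.
    """
    a = seq.upper().replace("U", "T")
    if other is None:
        # Self-comparison: compare against the reverse complement.
        b = a.translate(_COMPLEMENT)[::-1]
    else:
        b = other.upper().replace("U", "T")

    total = 0.0
    for lag in range(len(a)):
        pairs = list(zip(a[lag:], b))  # overlap at this lag
        if not pairs:
            break
        total += sum(x == y for x, y in pairs) / len(pairs)
    return total


print(round(correlation_sketch("AAATTT"), 6))
print(round(correlation_sketch("AAA", "AAA"), 6))
```

For 'AAATTT', which is its own reverse complement, the lags contribute 6/6, 4/5 and 2/4, and the remaining lags contribute nothing, which gives 2.3.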
Distances
Calculate Levenshtein (insert, delete, mutate) distance.
>>> sq.levenshtein('AAATTT', 'AAATTT')
0
>>> sq.levenshtein('AAATTT', 'ACTTT')
2
>>> sq.levenshtein('AAAG', 'TCGA')
4
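Levenshtein distance is usually computed with dynamic programming over prefixes of the two strings. The sketch below is the textbook two-row version, shown only to make the insert, delete and mutate operations concrete; levenshtein_sketch is a hypothetical name, and streq may use a different implementation.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # match or mutate
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))
```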
Calculate Hamming (mismatch) distance.
>>> sq.hamming('AAA', 'ATA')
1
>>> sq.hamming('AAA', 'ATT')
2
>>> sq.hamming('AAA', 'TTT')
3
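Hamming distance simply counts the positions at which two equal-length sequences differ. In the minimal sketch below, hamming_sketch is a hypothetical name, and raising ValueError for unequal lengths is an assumption; how sq.hamming handles that case is not stated here.

```python
def hamming_sketch(a: str, b: str) -> int:
    """Count positions where a and b differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are treated as an error.
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_sketch("AAA", "ATT"))
```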
Search
Search sequences using IUPAC symbols and iterate through the results.
>>> for (start, end), match in sq.find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start}:{end}: {match}")
...
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
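An IUPAC search can be approximated by expanding each symbol into a regular-expression character class. The sketch below uses the standard IUPAC nucleotide codes with Python's re module. The name find_iupac_sketch is hypothetical, and the sketch assumes non-overlapping, case-insensitive matching on the forward strand only, which may not match streq's behaviour in every case.

```python
import re
from typing import Iterator, Tuple

# Standard IUPAC nucleotide codes, expressed over the DNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_iupac_sketch(pattern: str, seq: str) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((start, end), match) for each non-overlapping hit."""
    regex = re.compile(
        "".join(f"[{IUPAC[symbol]}]" for symbol in pattern.upper()),
        re.IGNORECASE,
    )
    for hit in regex.finditer(seq):
        yield (hit.start(), hit.end()), hit.group()


for (start, end), match in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(f"Found ARY at {start}:{end}: {match}")
```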
Find common Type IIS restriction sites:
>>> sq.which_re_sites('AAAGAAG')
()
>>> sq.which_re_sites('AAAGAAGAC')
('BbsI',)
>>> sq.which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')
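A lookup like this can be sketched as a substring test against a table of recognition sequences. The table below lists four widely used Type IIS enzymes; which enzymes streq actually covers is not stated here, so the table is an illustrative assumption. Checking the reverse strand as well is also an assumption, and which_re_sites_sketch is a hypothetical name.

```python
# Illustrative subset of Type IIS recognition sequences (5' to 3').
SITES = {
    "BbsI": "GAAGAC",
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "PaqCI": "CACCTGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def which_re_sites_sketch(seq: str) -> tuple:
    """Names of enzymes whose site occurs on either strand of seq."""
    forward = seq.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    return tuple(
        name for name, site in SITES.items()
        if site in forward or site in reverse
    )


print(which_re_sites_sketch("AAAGAAGACACCTGC"))
```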
Documentation
Check the API here.