🧬 streq

Python utilities for working with nucleotide sequence strings.

Installation

The easy way

Install the latest release from PyPI:

pip install streq

From source

Clone the repository, cd into it, then run:

pip install -e .

Usage

streq provides Python utility functions for working with nucleotide sequences. Sequences can be upper or lower case, and case is preserved through transformations.

Transformations

Get the reverse complement.

>>> import streq as sq
>>>
>>> sq.reverse_complement('ATCG')
'CGAT'
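For readers curious how a case-preserving reverse complement can work, here is a minimal sketch in plain Python. It is not streq's actual implementation; `reverse_complement_sketch` is a hypothetical name, and the choice to complement into the DNA alphabet (so U maps to A, and A maps to T) is an assumption.

```python
# Hypothetical stand-in for sq.reverse_complement; not streq's own code.
# Assumption: output uses the DNA alphabet, so A -> T and U -> A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement_sketch(seq: str) -> str:
    """Complement each base, then reverse; case is kept per character."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ATCG"))  # CGAT
print(reverse_complement_sketch("ATCg"))  # cGAT
```

Using a translation table keeps upper- and lower-case bases separate, which is one simple way to preserve case.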

Convert between RNA and DNA alphabets.

>>> sq.to_rna('ATCG')
'AUCG'
>>> sq.to_dna('AUCG')
'ATCG'
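Alphabet conversion amounts to swapping T and U. The sketch below shows one way to do that with the standard library; the function names are hypothetical and this is not necessarily how streq does it.

```python
# Hypothetical stand-ins for sq.to_rna and sq.to_dna; not streq's own code.
_TO_RNA = str.maketrans("Tt", "Uu")
_TO_DNA = str.maketrans("Uu", "Tt")


def to_rna_sketch(seq: str) -> str:
    """Replace T with U, keeping the case of each base."""
    return seq.translate(_TO_RNA)


def to_dna_sketch(seq: str) -> str:
    """Replace U with T, keeping the case of each base."""
    return seq.translate(_TO_DNA)


print(to_rna_sketch("ATCG"))  # AUCG
print(to_dna_sketch("AUCG"))  # ATCG
```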

Slice circular sequences such as plasmids or bacterial genomes.

>>> sq.Circular('ATCG')[-1:3]
'GATC'
>>> sq.reverse_complement(sq.Circular('ATCG'))[-1:3]
'CGAT'
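Circular slicing can be pictured as reading around the sequence with indices taken modulo its length. The following is a rough sketch of that idea only; `circular_slice` is a hypothetical helper, not part of streq, and it assumes that `stop - start` gives the number of bases to read.

```python
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Read stop - start bases from seq, wrapping around the ends.

    Hypothetical helper, not streq's Circular class.
    Assumption: a negative start counts back from the origin.
    """
    n = len(seq)
    if n == 0:
        return ""
    length = max(stop - start, 0)
    # Index modulo n so positions past either end wrap around.
    return "".join(seq[(start + i) % n] for i in range(length))


print(circular_slice("ATCG", -1, 3))  # GATC
print(circular_slice("ATCG", 2, 6))   # CGAT
```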

Case is preserved through the transformations.

>>> sq.reverse_complement(sq.Circular('ATCg'))
'cGAT'

Calculations

Get GC and pyrimidine content.

>>> sq.gc_content('AGGG')
0.75
>>> sq.pyrimidine_content('AUGGG')
0.2
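Both quantities are the fraction of bases that belong to a given set. A minimal sketch, assuming case-insensitive counting and a result of 0.0 for an empty sequence (both assumptions, not documented streq behaviour); `fraction_of` is a hypothetical helper.

```python
def fraction_of(seq: str, symbols: str) -> float:
    """Fraction of bases in seq that appear in symbols (case-insensitive).

    Hypothetical helper, not part of streq.
    """
    if not seq:
        return 0.0  # assumption: empty input gives 0.0 rather than an error
    wanted = set(symbols.upper())
    return sum(base in wanted for base in seq.upper()) / len(seq)


print(fraction_of("AGGG", "GC"))    # 0.75
print(fraction_of("AUGGG", "CTU"))  # 0.2
```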

Get the autocorrelation (a rough indicator of secondary structure).

>>> sq.correlation('AACC')
0.0
>>> sq.correlation('AAATTT')
2.3
>>> sq.correlation('AAATTCT')
1.3047619047619046
>>> sq.correlation('AAACTTT')
1.9238095238095236

Wobble base-pairing can be taken into account.

>>> sq.correlation('GGGTTT')
0.0
>>> sq.correlation('GGGTTT', wobble=True)
2.3
>>> sq.correlation('GGGUUU', wobble=True)
2.3

Provide a second sequence to get correlation between sequences.

>>> sq.correlation('AAA', 'TTT')
0.0
>>> sq.correlation('AAA', 'AAA')
3.0

Distances

Calculate Levenshtein (insert, delete, mutate) distance.

>>> sq.levenshtein('AAATTT', 'AAATTT')
0
>>> sq.levenshtein('AAATTT', 'ACTTT')
2
>>> sq.levenshtein('AAAG', 'TCGA')
4
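For reference, Levenshtein distance is usually computed with dynamic programming over prefixes of the two sequences. The sketch below is a textbook two-row version; `levenshtein_sketch` is a hypothetical name and this is not necessarily how streq implements it.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and mutations turning a into b.

    Hypothetical stand-in for sq.levenshtein; textbook algorithm.
    """
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete base_a
                current[j - 1] + 1,                    # insert base_b
                previous[j - 1] + (base_a != base_b),  # match or mutate
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))  # 2
print(levenshtein_sketch("AAAG", "TCGA"))     # 4
```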

Calculate Hamming (mismatch) distance.

>>> sq.hamming('AAA', 'ATA')
1
>>> sq.hamming('AAA', 'ATT')
2
>>> sq.hamming('AAA', 'TTT')
3
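Hamming distance counts the positions at which two equal-length sequences differ. A minimal sketch follows; raising `ValueError` on unequal lengths is an assumption here, since the behaviour of streq in that case is not shown above, and `hamming_sketch` is a hypothetical name.

```python
def hamming_sketch(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences.

    Hypothetical stand-in for sq.hamming.
    """
    if len(a) != len(b):
        # Assumption: unequal lengths are treated as an error.
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_sketch("AAA", "ATA"))  # 1
print(hamming_sketch("AAA", "TTT"))  # 3
```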

Search

Search sequences using IUPAC symbols and iterate through the results.

>>> for (start, end), match in sq.find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start}:{end}: {match}")
... 
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
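One common way to search with IUPAC symbols is to expand each symbol into a regular-expression character class. The sketch below illustrates that approach with the standard `re` module; it is not streq's implementation, `find_iupac_sketch` is a hypothetical name, and it assumes a DNA alphabet and non-overlapping matches.

```python
import re

# Standard IUPAC nucleotide codes, expanded over the DNA alphabet.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_iupac_sketch(pattern: str, seq: str):
    """Yield ((start, end), match) for each non-overlapping hit in seq.

    Hypothetical stand-in for sq.find_iupac.
    """
    regex = re.compile(
        "".join(f"[{IUPAC_DNA[symbol]}]" for symbol in pattern.upper()),
        re.IGNORECASE,
    )
    for hit in regex.finditer(seq):
        yield (hit.start(), hit.end()), hit.group()


for (start, end), match in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(f"Found ARY at {start}:{end}: {match}")
```

With the input above, the sketch reports the same four matches as the streq example.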

Find common Type IIS restriction sites:

>>> sq.which_re_sites('AAAGAAG')
()
>>> sq.which_re_sites('AAAGAAGAC')
('BbsI',)
>>> sq.which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')
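The lookup can be thought of as a substring test against a table of recognition sequences. A rough sketch, using the published recognition sites for BbsI (GAAGAC) and PaqCI (CACCTGC); the table, the function name, and the forward-strand-only search are illustrative assumptions, not streq's actual data or logic.

```python
# Illustrative table only; streq's own list of enzymes may differ.
TYPE_IIS_SITES = {
    "BbsI": "GAAGAC",
    "PaqCI": "CACCTGC",
}


def which_re_sites_sketch(seq: str) -> tuple:
    """Names of enzymes whose recognition site occurs in seq.

    Hypothetical stand-in for sq.which_re_sites.
    Assumption: only the given (forward) strand is searched.
    """
    upper = seq.upper()
    return tuple(name for name, site in TYPE_IIS_SITES.items() if site in upper)


print(which_re_sites_sketch("AAAGAAG"))          # ()
print(which_re_sites_sketch("AAAGAAGAC"))        # ('BbsI',)
print(which_re_sites_sketch("AAAGAAGACACCTGC"))  # ('BbsI', 'PaqCI')
```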

Documentation

Check the API here.
