Basic utilities for working with nucleotide sequence strings.

🧬 streq

Python utilities for working with nucleotide sequence strings.

Installation

The easy way

Install the packaged release from PyPI:

pip install streq

From source

Clone the repository, cd into it, and run:

pip install -e .

Usage

streq provides utility functions for working with nucleotide sequences in Python. Sequences may be upper or lower case, and case is preserved through transformations.

Transformations

Reverse complement.

>>> import streq as sq
>>>
>>> sq.reverse_complement('ATCG')
'CGAT'

Convert between RNA and DNA alphabets.

>>> sq.to_rna('ATCG')
'AUCG'
>>> sq.to_dna('AUCG')
'ATCG'

Slice circular sequences such as plasmids or bacterial genomes.

>>> sq.Circular('ATCG')[-1:3]
'GATC'
>>> sq.reverse_complement(sq.Circular('ATCG'))[-1:3]
'CGAT'
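
To make the wrap-around behaviour concrete, here is a minimal plain-Python sketch of circular slicing. It is not streq's `Circular` implementation; `circular_slice` is a hypothetical helper that only assumes indices wrap modulo the sequence length.

```python
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Slice seq as if its two ends were joined (hypothetical helper)."""
    n = len(seq)
    if n == 0:
        return ""
    # Walk from start to stop and wrap every index around the origin.
    return "".join(seq[i % n] for i in range(start, stop))


print(circular_slice("ATCG", -1, 3))  # GATC
```

Under this assumption, a slice that starts before the origin picks up bases from the end of the sequence first, which is what the first `Circular` example above shows.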

Cases are preserved throughout the transformations.

>>> sq.reverse_complement(sq.Circular('ATCg'))
'cGAT'
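
One way to get case-preserving transformations in plain Python is a translation table that maps each case separately. The sketch below illustrates the idea only; the names `reverse_complement_sketch` and `to_rna_sketch` are hypothetical, and the tables cover just the unambiguous DNA letters, which is an assumption rather than streq's behaviour.

```python
# Translation tables list upper and lower case separately,
# so the case of every base survives the transformation.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_DNA_TO_RNA = str.maketrans("Tt", "Uu")


def reverse_complement_sketch(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def to_rna_sketch(seq: str) -> str:
    """Swap T for U, leaving every other character untouched."""
    return seq.translate(_DNA_TO_RNA)


print(reverse_complement_sketch("ATCg"))  # cGAT
print(to_rna_sketch("ATCG"))              # AUCG
```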

Calculations

Get GC and pyrimidine content.

>>> sq.gc_content('AGGG')
0.75
>>> sq.pyrimidine_content('AUGGG')
0.2
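
Both values are simple fractions: the number of bases in a chosen set divided by the sequence length. A rough sketch of that arithmetic follows; `fraction_of` is a hypothetical helper, and returning 0.0 for an empty sequence is an assumption made here, not documented streq behaviour.

```python
def fraction_of(seq: str, letters: str) -> float:
    """Fraction of bases in seq that belong to letters (case-insensitive)."""
    if not seq:
        return 0.0  # assumption: treat an empty sequence as zero content
    wanted = set(letters.upper())
    return sum(base.upper() in wanted for base in seq) / len(seq)


print(fraction_of("AGGG", "GC"))    # 0.75
print(fraction_of("AUGGG", "CTU"))  # 0.2
```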

Get autocorrelation (rough indicator for secondary structure).

>>> sq.correlation('AACC')
0.0
>>> sq.correlation('AAATTT')
2.3
>>> sq.correlation('AAATTCT')
1.3047619047619046
>>> sq.correlation('AAACTTT')
1.9238095238095236

Provide a second sequence to get correlation between sequences.

>>> sq.correlation('AAA', 'TTT')
0.0
>>> sq.correlation('AAA', 'AAA')
3.0

Distances

Calculate Levenshtein (insert, delete, mutate) distance.

>>> sq.levenshtein('AAATTT', 'AAATTT')
0
>>> sq.levenshtein('AAATTT', 'ACTTT')
2
>>> sq.levenshtein('AAAG', 'TCGA')
4
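
For reference, Levenshtein distance can be computed with the standard dynamic-programming recurrence. The sketch below is a textbook two-row version written for illustration; it is not taken from streq, and `levenshtein_sketch` is a hypothetical name.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete base_a
                current[j - 1] + 1,                    # insert base_b
                previous[j - 1] + (base_a != base_b),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))  # 2
```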

Calculate Hamming (mismatch) distance.

>>> sq.hamming('AAA', 'ATA')
1
>>> sq.hamming('AAA', 'ATT')
2
>>> sq.hamming('AAA', 'TTT')
3
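
Hamming distance counts the positions at which two equal-length sequences differ. A minimal sketch, assuming that unequal lengths should raise an error (how streq handles that case is not stated here); `hamming_sketch` is a hypothetical name.

```python
def hamming_sketch(a: str, b: str) -> int:
    """Number of positions where a and b differ."""
    if len(a) != len(b):
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_sketch("AAA", "ATT"))  # 2
```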

Search

Search sequences using IUPAC symbols and iterate through the results.

>>> for (start, end), match in sq.find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start}:{end}: {match}")
... 
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
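
An IUPAC search can be approximated by expanding each symbol into a regular-expression character class. The sketch below uses only the standard `re` module and is not streq's implementation; `IUPAC_CLASSES`, `iupac_to_regex` and `find_iupac_sketch` are hypothetical names, and matches are non-overlapping because that is how `re.finditer` works.

```python
import re

# Standard IUPAC nucleotide codes expanded to the DNA bases they stand for.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern: str) -> str:
    """Turn an IUPAC pattern such as 'ARY' into '[A][AG][CT]'."""
    return "".join(f"[{IUPAC_CLASSES[symbol]}]" for symbol in pattern.upper())


def find_iupac_sketch(pattern: str, seq: str):
    """Yield ((start, end), match) for each non-overlapping hit in seq."""
    regex = re.compile(iupac_to_regex(pattern), re.IGNORECASE)
    for hit in regex.finditer(seq):
        yield (hit.start(), hit.end()), hit.group()


for (start, end), match in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(f"Found ARY at {start}:{end}: {match}")
```

On the example sequence above this prints the same four hits, at 0:3, 3:6, 6:9 and 12:15.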

Find common Type IIS restriction sites:

>>> sq.which_re_sites('AAAGAAG')
()
>>> sq.which_re_sites('AAAGAAGAC')
('BbsI',)
>>> sq.which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')
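
Conceptually this is a lookup of known recognition sequences in the input. The sketch below is an illustration only: the table `TYPE_IIS_SITES` is a small hand-picked set of well-known Type IIS recognition sites, not streq's list, and the hypothetical `which_re_sites_sketch` checks the given strand only, which is an assumption.

```python
# Recognition sequences (5'->3') for a few common Type IIS enzymes.
TYPE_IIS_SITES = {
    "BbsI": "GAAGAC",
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "PaqCI": "CACCTGC",
}


def which_re_sites_sketch(seq: str) -> tuple:
    """Names of enzymes whose recognition site occurs in seq (forward strand only)."""
    upper = seq.upper()
    return tuple(name for name, site in TYPE_IIS_SITES.items() if site in upper)


print(which_re_sites_sketch("AAAGAAGACACCTGC"))  # ('BbsI', 'PaqCI')
```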

Documentation

See the API documentation for the full list of functions.
