Rapidly identify repeats in a DNA sequence

These details have not been verified by PyPI

Environment
- Console
- MacOS X
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- POSIX :: Linux
Programming Language
- C
- Python :: 3.12
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Repeat Finder - Finding Repeats in DNA sequences

Repeatfinder is a standalone program to quickly find repeats in DNA sequences. You might find that it is remarkably similar to an essential module in PhiSpy.

How to find repeats

Using the command line

You can use the pydna_repeatfinder command to find repeats in a FASTA sequence.

By default, pydna_repeatfinder just prints the repeats in a simple tab-separated format that is easy to read and includes the DNA sequence.

For example:

$ pydna_repeatfinder -f tests/test_long.fasta
random_sequence Number:1        Len1:52 Len2:52 92      144     1946    1998    TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT    TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT
random_sequence Number:2        Len1:52 Len2:52 92      144     197     145     TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT    CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTG
random_sequence Number:3        Len1:53 Len2:53 145     198     1998    1945    CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTGA   ATCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT

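There are two long repeats here. The first, at positions 92-144, is repeated in the same orientation (a direct repeat) at positions 1946-1998. The other hits are inverted repeats: the second copy lies on the reverse strand, so its coordinates run from high to low (for example, 197 to 145).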
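If you want to post-process the default output, each line can be split on tabs. The following is a minimal sketch, not part of pydna_repeatfinder: the `RepeatHit` class and `parse_repeat_line` function are hypothetical names, the column order is read off the example above, and the rule that an inverted repeat has a second start greater than its second end is an assumption drawn from that example.

```python
from typing import NamedTuple


class RepeatHit(NamedTuple):
    """One line of the default tab-separated output (hypothetical helper)."""
    seq_id: str
    number: int
    len1: int
    len2: int
    first_start: int
    first_end: int
    second_start: int
    second_end: int
    first_seq: str
    second_seq: str

    @property
    def orientation(self) -> str:
        # Assumption from the example output: the second copy of an
        # inverted repeat is reported with start > end.
        return "inverted" if self.second_start > self.second_end else "direct"


def parse_repeat_line(line: str) -> RepeatHit:
    """Split one output line on tabs and convert the numeric columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(fields)}")
    return RepeatHit(
        seq_id=fields[0],
        # Columns such as "Number:1" and "Len1:52" carry a "label:value" pair.
        number=int(fields[1].split(":", 1)[1]),
        len1=int(fields[2].split(":", 1)[1]),
        len2=int(fields[3].split(":", 1)[1]),
        first_start=int(fields[4]),
        first_end=int(fields[5]),
        second_start=int(fields[6]),
        second_end=int(fields[7]),
        first_seq=fields[8],
        second_seq=fields[9],
    )
```

You could feed it the lines of a saved output file one at a time and filter on `hit.orientation`.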

We can also output the results formatted so you can paste them directly into a GenBank file. This is perhaps the easiest way to visualise the repeats.

$ pydna_repeatfinder -f tests/test_long.fasta -o genbank
     repeat_region   join(92..144,1946..1998)
                     /note="direct repeat number 1 of length 53"
                     /rpt_type="direct"
     repeat_region   join(92..144,complement(197..145))
                     /note="inverted repeat number 2 of length 53"
                     /rpt_type="inverted"
     repeat_region   join(145..198,complement(1998..1945))
                     /note="inverted repeat number 3 of length 54"
                     /rpt_type="inverted"
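To show how such a feature block is laid out, here is a minimal sketch that builds the same text from a dictionary of coordinates. It is an illustration only, not the tool's own implementation: `genbank_repeat_feature` is a hypothetical name, the dictionary keys follow the ones listed under "In your own code", and both the inclusive length (end - start + 1) and the start > end test for inverted repeats are assumptions inferred from the example output above.

```python
def genbank_repeat_feature(rpt: dict) -> str:
    """Format one repeat as a GenBank repeat_region feature (hypothetical helper)."""

    def span(start: int, end: int) -> str:
        # Assumption: a copy on the reverse strand is reported with start > end.
        if start > end:
            return f"complement({start}..{end})"
        return f"{start}..{end}"

    inverted = rpt["second_start"] > rpt["second_end"]
    kind = "inverted" if inverted else "direct"
    number = rpt["repeat_number"]
    # Inclusive length of the first copy, matching the /note values shown above.
    length = abs(rpt["first_end"] - rpt["first_start"]) + 1
    first = span(rpt["first_start"], rpt["first_end"])
    second = span(rpt["second_start"], rpt["second_end"])
    location = f"join({first},{second})"

    # GenBank feature keys start in column 6; locations and qualifiers in column 22.
    pad = " " * 21
    return "\n".join([
        f"     {'repeat_region':<16}{location}",
        f'{pad}/note="{kind} repeat number {number} of length {length}"',
        f'{pad}/rpt_type="{kind}"',
    ])
```

Calling it with the coordinates of repeat number 2 reproduces the second feature in the example output.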

In your own code

You can import the pydna_repeatfinder module and use it in your own code:

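from PyRepeatFinder import find_repeats

# dna_seq, gap_len and min_len must be defined by your own code
r = find_repeats(dna_seq, gap_len, min_len, 0)
for rpt in r:
    # rpt is a dictionary with keys:
    # repeat_number
    # first_start
    # first_end
    # second_start
    # second_end
    print(rpt["repeat_number"], rpt["first_start"], rpt["first_end"],
          rpt["second_start"], rpt["second_end"], sep="\t")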
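The snippet above expects the DNA sequence as a string. One way to get it is to read a FASTA file yourself. The following is a minimal sketch using only the standard library; `read_fasta` is a hypothetical helper, not part of the package, and upper-casing the sequence is a choice made here rather than a documented requirement.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a {sequence_id: sequence} dictionary."""
    sequences: Dict[str, str] = {}
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                sequences[seq_id] = "".join(chunks)
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""  # ID is the first word of the header
            chunks = []
        elif seq_id is not None:
            chunks.append(line.upper())
    if seq_id is not None:
        sequences[seq_id] = "".join(chunks)
    return sequences


# Example use with a file on disk (the path is the test file mentioned earlier):
# with open("tests/test_long.fasta") as handle:
#     for seq_id, dna_seq in read_fasta(handle).items():
#         ...  # pass dna_seq to find_repeats as shown above
```

Each value in the returned dictionary can then be passed as the first argument in the call shown above.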

We are happy to add more output formats. Please post a GitHub issue and tag it as an enhancement.

Installing pydna_repeatfinder

You should install it using bioconda:

mamba create -n pydna_repeatfinder -c bioconda pydna_repeatfinder
mamba activate pydna_repeatfinder

Citing pydna_repeatfinder

Please see the citation file.

Release history

- 0.2.9: Jun 11, 2024 (this version)
- 0.2.8: Jun 3, 2024
- 0.2.7: Jun 3, 2024
- 0.2.6: Jun 3, 2024
- 0.2.5: Jun 2, 2024
- 0.2.4: Jun 2, 2024
- 0.2.3: May 31, 2024
- 0.2.1: Apr 21, 2024
- 0.1.0: Apr 20, 2024

Download files

Source Distribution

pydnarepeatfinder-0.2.9.tar.gz (21.3 kB)

Uploaded Jun 11, 2024 Source

Hashes for pydnarepeatfinder-0.2.9.tar.gz
Algorithm	Hash digest
SHA256	`2f88be25392e2fbb3f0d524b527725b4c9b2578a1ece0949a8dd0f72d1b137a6`
MD5	`027768ae8c69c9dfae5fb79414e49081`
BLAKE2b-256	`137b83211ef14464cde76e065043e9003d5b905bd95bc1076446219aa284b41c`