Rapidly identify repeats in a DNA sequence
Repeat Finder - Finding Repeats in DNA sequences
Repeatfinder is a standalone program that quickly finds repeats in DNA sequences. You might find that it is remarkably similar to an essential module in PhiSpy.
How to find repeats
Using the command line
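You can use the pydna_repeatfinder command to find repeats in a FASTA sequence.
By default, pydna_repeatfinder prints the repeats in a simple, easy-to-read tab-separated format that includes the DNA sequence of both copies. Each line gives the sequence ID, the repeat number, the lengths of the two copies (Len1, Len2), the start and end of the first copy, the start and end of the second copy, and the two sequences.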
For example:
$ pydna_repeatfinder -f tests/test_long.fasta
random_sequence Number:1 Len1:52 Len2:52 92 144 1946 1998 TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT
random_sequence Number:2 Len1:52 Len2:52 92 144 197 145 TCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTG
random_sequence Number:3 Len1:53 Len2:53 145 198 1998 1945 CACCGTATACTCCATACTTCCGCGATACCCTTAGGCTAACCTACGACAACTGA ATCAGTTGTCGTAGGTTAGCCTAAGGGTATCGCGGAAGTATGGAGTATACGGT
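The output lists three repeats. The first, from 92-144, is repeated in the same orientation (a direct repeat) at positions 1946-1998. Repeats 2 and 3 are inverted repeats: the second copy is listed from high to low coordinates (197-145 and 1998-1945).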
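If you want to post-process the default output in Python, the sketch below parses one line into a small record. It is a minimal sketch, not part of pydna_repeatfinder: `Repeat`, `parse_repeat_line` and `_field_value` are hypothetical helpers, and the field order is assumed from the example above (sequence ID, `Number:`, `Len1:`, `Len2:`, four coordinates, two sequences). It also assumes, as the example suggests, that a second copy listed high-to-low marks an inverted repeat.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    # One line of the default output (field layout assumed from the example).
    seq_id: str
    number: int
    len1: int
    len2: int
    first_start: int
    first_end: int
    second_start: int
    second_end: int
    first_seq: str
    second_seq: str

    @property
    def orientation(self) -> str:
        # Assumption: a second copy listed high-to-low (e.g. 197 145) is inverted.
        return "inverted" if self.second_start > self.second_end else "direct"


def _field_value(field: str) -> int:
    # "Number:2" -> 2, "Len1:52" -> 52
    return int(field.split(":", 1)[1])


def parse_repeat_line(line: str) -> Repeat:
    """Parse one whitespace-separated line of the default output."""
    fields = line.split()  # works for tabs or spaces
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    s1, e1, s2, e2 = (int(x) for x in fields[4:8])
    return Repeat(
        seq_id=fields[0],
        number=_field_value(fields[1]),
        len1=_field_value(fields[2]),
        len2=_field_value(fields[3]),
        first_start=s1,
        first_end=e1,
        second_start=s2,
        second_end=e2,
        first_seq=fields[8],
        second_seq=fields[9],
    )
```

Passing the second line of the example output yields a `Repeat` whose `orientation` is `"inverted"`; the first line gives `"direct"`. To parse a whole run, feed each non-empty line of the program's output to `parse_repeat_line`.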
We can also output the results formatted so that you can paste them directly into a GenBank file. This is perhaps the easiest way to visualise the repeats.
$ pydna_repeatfinder -f tests/test_long.fasta -o genbank
repeat_region join(92..144,1946..1998)
/note="direct repeat number 1 of length 53"
/rpt_type="direct"
repeat_region join(92..144,complement(197..145))
/note="inverted repeat number 2 of length 53"
/rpt_type="inverted"
repeat_region join(145..198,complement(1998..1945))
/note="inverted repeat number 3 of length 54"
/rpt_type="inverted"
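If you need to build the same kind of feature yourself from coordinates you already have, the sketch below reproduces the layout of the example. `genbank_repeat_feature` is a hypothetical helper, not part of the package. It assumes, as in the example, that a second copy listed high-to-low is an inverted repeat and that lengths are inclusive (92..144 is 53 bases). It copies the coordinate order from the example as-is and uses the usual feature-table indentation (feature key at column 6, qualifiers at column 22), which the flattened example does not show.

```python
def genbank_repeat_feature(number: int, first_start: int, first_end: int,
                           second_start: int, second_end: int) -> str:
    """Format one repeat as a GenBank repeat_region feature (hypothetical helper)."""
    inverted = second_start > second_end  # assumption taken from the example output
    rpt_type = "inverted" if inverted else "direct"
    second = f"{second_start}..{second_end}"
    if inverted:
        second = f"complement({second})"
    # Inclusive length of the first copy: 92..144 -> 53
    length = first_end - first_start + 1
    location = f"join({first_start}..{first_end},{second})"
    # Key starts at column 6; location and qualifiers start at column 22.
    key_line = "     " + "repeat_region".ljust(16) + location
    indent = " " * 21
    return "\n".join([
        key_line,
        f'{indent}/note="{rpt_type} repeat number {number} of length {length}"',
        f'{indent}/rpt_type="{rpt_type}"',
    ])
```

For example, `genbank_repeat_feature(2, 92, 144, 197, 145)` produces the second feature shown above, with `join(92..144,complement(197..145))` and `/rpt_type="inverted"`.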
In your own code
The package also provides a Python module, PyRepeatFinder, that you can import and use in your own code:
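from PyRepeatFinder import find_repeats

# dna_seq (the DNA sequence), gap_len and min_len must be set before this call
r = find_repeats(dna_seq, gap_len, min_len, 0)
for rpt in r:
    # rpt is a dictionary with keys:
    # repeat_number, first_start, first_end, second_start, second_end
    print(rpt["repeat_number"], rpt["first_start"], rpt["first_end"],
          rpt["second_start"], rpt["second_end"])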
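The snippet above assumes `dna_seq` already holds the sequence. If you start from a FASTA file, as the command line tool does, a small reader like the one below can supply it. `parse_fasta` and `read_fasta` are hypothetical helpers, not part of the package; they use the first word of each `>` header as the record ID and join wrapped sequence lines. Biopython's `SeqIO.parse` is a more robust alternative.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first word of the '>' header) to its sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence lines may be wrapped
        # sequence lines before the first header are ignored
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def read_fasta(path) -> Dict[str, str]:
    """Read a FASTA file from disk using parse_fasta."""
    with Path(path).open() as handle:
        return parse_fasta(handle)
```

You could then loop over `read_fasta("tests/test_long.fasta").items()` and pass each sequence to `find_repeats`.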
We are happy to add more output formats: please post a GitHub issue and tag it as an enhancement.
Installing pydna_repeatfinder
We recommend installing it from Bioconda:
mamba create -n pydna_repeatfinder -c bioconda pydna_repeatfinder
mamba activate pydna_repeatfinder
Citing pydna_repeatfinder
Please see the citation file.
File details
Details for the file pydnarepeatfinder-0.2.9.tar.gz (source distribution).
File metadata
- Download URL: pydnarepeatfinder-0.2.9.tar.gz
- Upload date:
- Size: 21.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2f88be25392e2fbb3f0d524b527725b4c9b2578a1ece0949a8dd0f72d1b137a6
MD5 | 027768ae8c69c9dfae5fb79414e49081
BLAKE2b-256 | 137b83211ef14464cde76e065043e9003d5b905bd95bc1076446219aa284b41c