Extract and analyze satellite DNA from raw sequences.

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

extracTR

Introduction

extracTR is a tool for identifying and analyzing tandem repeats in genomic sequences. It accepts raw sequencing reads (FASTQ) or assembled genomes (FASTA) and uses a k-mer-based approach to detect repetitive patterns efficiently.
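
To make the k-mer idea concrete, here is a minimal Python sketch. It is an illustration only and not extracTR's actual implementation. It counts overlapping k-mers in a toy sequence; the sequence, the `count_kmers` helper, and the choice of k=4 are all assumptions made for this example. K-mers that fall inside the repeated array are seen several times, while k-mers that occur only in the flanks are seen once.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy input (hypothetical): a 4 bp monomer repeated 6 times between two
# non-repetitive flanks.
flank_left = "GATTACAGGCT"
flank_right = "TCGGATCCAAG"
toy_sequence = flank_left + "ACGT" * 6 + flank_right

counts = count_kmers(toy_sequence, k=4)

# Print k-mers from most to least frequent; repeat-derived k-mers rise to the top.
for kmer, n in counts.most_common(5):
    print(kmer, n)
```

On real data the same contrast is much stronger, because a satellite array can span thousands of copies while unique sequence is covered only at roughly the sequencing depth.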

Features

Efficient tandem repeat detection from raw sequencing data
Support for single-end and paired-end FASTQ files
Support for genome assemblies in FASTA format
Customizable parameters for fine-tuning repeat detection
Output in easy-to-analyze CSV format
Multi-threaded processing for improved performance

Requirements

Python 3.7 or later
Jellyfish 2.3.0 or later
Conda (for easy environment management)

Installation

We recommend installing extracTR in a separate Conda environment to manage dependencies effectively.

Create a new Conda environment:

conda create -n extractr_env python=3.9

Activate the environment:

conda activate extractr_env

Install Jellyfish:

conda install -c bioconda jellyfish

Install extracTR using pip:

pip install extracTR

To deactivate the environment when you're done:

conda deactivate
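
With the environment active, a quick check that both executables are reachable can save a failed run later. The sketch below is an assumption-based helper, not part of extracTR: it relies on the `jellyfish` and `extracTR` commands being installed on `PATH`, as the install and usage commands in this README suggest, and the `find_tools` name is hypothetical.

```python
import shutil


def find_tools(names=("jellyfish", "extracTR")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Report which tools are visible from the current environment.
for tool, path in find_tools().items():
    print(f"{tool}: {path if path else 'NOT FOUND'}")
```

If either tool is reported as not found, the most likely cause is that the Conda environment is not activated in the current shell.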

Usage

Before running extracTR, ensure that you have removed adapters from your sequencing reads and activated the Conda environment:

conda activate extractr_env

Basic usage:

For paired-end FASTQ files:

extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -c 30

For a single-end FASTQ file:

extracTR -1 reads.fastq -o output_prefix -c 30

For a genome assembly in FASTA format:

extracTR -f genome.fasta -o output_prefix -c 30

Advanced usage with custom parameters:

extracTR -1 reads_1.fastq -2 reads_2.fastq -o output_prefix -t 64 -c 30 -k 25

Options:

-1, --fastq1: Input file with forward DNA sequences in FASTQ format
-2, --fastq2: Input file with reverse DNA sequences in FASTQ format (optional; supply only for paired-end data)
-f, --fasta: Input genome assembly in FASTA format
-o, --output: Prefix for output files
-t, --threads: Number of threads to use (default: 32)
-c, --coverage: Coverage to use for indexing (required)
-k, --k: K-mer size to use for indexing (default: 23)
--lu: Coverage cutoff for k-mers (default: 100 * coverage)

Note: You must provide either FASTQ file(s) or a FASTA file as input.
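
When extracTR is called from a pipeline, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the flags documented above; the `build_command` and `default_lu_cutoff` helpers are hypothetical names, and treating FASTQ and FASTA inputs as mutually exclusive is an assumption based on the note above rather than confirmed behavior of the tool.

```python
def default_lu_cutoff(coverage):
    """Documented default for --lu: 100 * coverage."""
    return 100 * coverage


def build_command(output_prefix, coverage, fastq1=None, fastq2=None,
                  fasta=None, threads=None, k=None, lu=None):
    """Build an extracTR argument list from the documented options."""
    if fasta and (fastq1 or fastq2):
        # Assumption: the two input modes are not meant to be combined.
        raise ValueError("provide FASTQ file(s) or a FASTA file, not both")
    if not fasta and not fastq1:
        raise ValueError("provide -1/--fastq1 or -f/--fasta")

    cmd = ["extracTR"]
    if fasta:
        cmd += ["-f", fasta]
    else:
        cmd += ["-1", fastq1]
        if fastq2:
            cmd += ["-2", fastq2]
    cmd += ["-o", output_prefix, "-c", str(coverage)]

    # Optional flags are added only when set, so the tool's defaults apply.
    if threads is not None:
        cmd += ["-t", str(threads)]
    if k is not None:
        cmd += ["-k", str(k)]
    if lu is not None:
        cmd += ["--lu", str(lu)]
    return cmd


cmd = build_command("output_prefix", 30,
                    fastq1="reads_1.fastq", fastq2="reads_2.fastq",
                    threads=64, k=25)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the standard library; passing a list rather than a single string avoids shell quoting problems with unusual file names.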

Output

extracTR generates the following output files:

{output_prefix}.csv: Main output file containing detected tandem repeats
{output_prefix}.sdat: Intermediate file with k-mer frequency data
Additional files for detailed analysis and debugging
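
The main CSV can be inspected with Python's standard library. This README does not document the column layout, so the sketch below makes no assumption about column names: it assumes only a comma-delimited file with a header row, and the `load_table` helper is a hypothetical name.

```python
import csv


def load_table(csv_path):
    """Return (column_names, rows) from a CSV file that has a header row."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


# Usage (path follows the {output_prefix}.csv pattern described above):
# columns, rows = load_table("output_prefix.csv")
# print(columns)      # inspect the real column names first
# print(len(rows))    # number of records in the file
```

Checking the header first is the safer route, since any downstream filtering or sorting depends on column names that should be read from the file rather than guessed.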

Release history

0.2.20 (Dec 5, 2024), this version
0.2.19 (Dec 5, 2024)
0.2.18 (Dec 4, 2024)
0.2.17 (Nov 5, 2024)
0.2.16 (Nov 4, 2024)
0.2.15 (Nov 4, 2024)
0.2.14 (Oct 31, 2024)
0.2.13 (Oct 31, 2024)
0.2.12 (Oct 31, 2024)
0.2.11 (Sep 27, 2024)
0.2.10 (Sep 27, 2024)
0.2.9 (Sep 27, 2024)
0.2.8 (Sep 27, 2024)
0.2.7 (Sep 27, 2024)
0.2.6 (Sep 27, 2024)
0.2.5 (Sep 27, 2024)
0.2.4 (Sep 27, 2024)
0.2.3 (Sep 27, 2024)
0.2.2 (Sep 27, 2024)
0.2.1 (Sep 27, 2024)
0.2.0 (Sep 27, 2024)
0.1.0 (Sep 27, 2024)

Download files

Source Distribution

extracTR-0.2.20.tar.gz (15.7 kB)

Uploaded Dec 5, 2024 Source

File details

Details for the file extracTR-0.2.20.tar.gz.

File metadata

Download URL: extracTR-0.2.20.tar.gz
Upload date: Dec 5, 2024
Size: 15.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.19

File hashes

Hashes for extracTR-0.2.20.tar.gz
Algorithm	Hash digest
SHA256	`86d33014a9a4ef09f53413c1c341acbf50a9f27cd3128f38076aed1d5dd291e7`
MD5	`6f7f18196d2653ff3b02ddbf4418149f`
BLAKE2b-256	`f157fddd46329008879f65cf11d60d21de1c21a3ec80b886210fb14874491a68`
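
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only `hashlib` from the standard library; the helper names and the assumption that the archive sits in the current directory under its published file name are illustrative.

```python
import hashlib

# SHA256 digest published for extracTR-0.2.20.tar.gz (copied from the table above).
EXPECTED_SHA256 = "86d33014a9a4ef09f53413c1c341acbf50a9f27cd3128f38076aed1d5dd291e7"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.lower()


# Usage (assumes the archive was downloaded to the current directory):
# print(matches_expected("extracTR-0.2.20.tar.gz"))
```

A mismatch means the file differs from the one that was uploaded, for example because of a truncated download, and the archive should not be installed.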
