A Python package to design probes against overrepresented sequences in a FASTQ file.

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

PROBETHEUS

A Python package that detects overrepresented sequences in a FASTQ file and designs probes against them. It is intended for single-end sequencing data from immunoprecipitation experiments, Ribo-seq, and similar single-end experiments.

Installation

pip install probetheus

Features

Process single-end FASTQ files to find top represented sequences
Generate probes from top sequences with customizable lengths
Cluster sequences based on edit distance
Detect probe binding sites against reference sequences
Generate cumulative percentage plots with elbow point detection
Reanalyze results with custom elbow points
Subsample input files for faster analysis or testing
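
To illustrate what clustering by edit distance means, here is a minimal sketch. It is not the package's implementation: the function names `edit_distance` and `cluster_sequences` are hypothetical, and the greedy strategy (merge each sequence into the first more abundant representative within the distance limit) is an assumption chosen for brevity.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_sequences(counts, max_dist=1):
    """Greedy clustering (an assumed strategy): visit sequences from most to
    least abundant and merge each into the first representative within max_dist."""
    clusters = {}
    for seq, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        for rep in clusters:
            if edit_distance(seq, rep) <= max_dist:
                clusters[rep] += n
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[seq] = n
    return clusters


counts = {"ACGTACGT": 100, "ACGTACGA": 10, "TTTTGGGG": 50, "TTTTGGG": 5}
clusters = cluster_sequences(counts, max_dist=1)
print(clusters)  # {'ACGTACGT': 110, 'TTTTGGGG': 55}
```

Here `max_dist=1` mirrors the documented default of `--edit-distance`; the read counts are made up for the example.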

Usage

Processing FASTQ Files and Generating Probes

# Basic usage
probetheus process input.fastq.gz -o results.tsv

# With core sequence analysis
probetheus process input.fastq.gz -o results.tsv --find-core --core-length 25

# Process without length filtering
probetheus process input.fastq.gz -o results.tsv --skip-length-filter

# Check probe binding against reference
probetheus process input.fastq.gz -o results.tsv -r ref.fasta --max-binding-dist 2

# Process with subsampling (e.g., use 20% of reads)
probetheus process *.fastq.gz -o results.tsv --subsample 20
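
For readers who want a feel for what the processing step involves, the sketch below counts the most frequent sequences in a (possibly gzipped) FASTQ file, with optional random subsampling and length filtering. It is an illustration only, not the package's code: `read_sequences` and `top_sequences` are hypothetical names, the parameter defaults simply echo the documented CLI defaults, and per-read random subsampling is an assumption about how `--subsample` might behave.

```python
import gzip
import random
from collections import Counter
from itertools import islice


def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every record: index 1, then every 4th line.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def top_sequences(path, top_n=20, subsample=100,
                  min_length=20, max_length=50, seed=0):
    """Return the top_n most common sequences as (sequence, count) pairs.

    subsample is a percentage (1-100); each read is kept with that probability.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    counts = Counter()
    for seq in read_sequences(path):
        if rng.random() * 100 >= subsample:
            continue  # read not selected for the subsample
        if min_length <= len(seq) <= max_length:
            counts[seq] += 1
    return counts.most_common(top_n)
```

Calling `top_sequences("input.fastq.gz", subsample=20)` would roughly correspond to the subsampled run above, under the stated assumptions.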

Reanalyzing Results

After initial processing, you can reanalyze the results either with a different elbow point or with a specified number of probes:

# Reanalyze with a new elbow point
probetheus reanalyze --input results.tsv --elbow 5 --output-prefix new_results

# Reanalyze by specifying number of probes
probetheus reanalyze --input results.tsv --probes 10 --output-prefix new_results

This will create:

new_results_reanalyzed.tsv: New results file with selected sequences
new_results_reanalyzed_cumulative.png: Updated cumulative plot
new_results_reanalyzed_probes.fasta: New probe sequences
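
The idea behind the cumulative plot and its elbow point can be sketched as follows. This is not necessarily how the package detects the elbow: the method shown (pick the point with the largest vertical gap above the straight line joining the first and last points of the curve) is a common heuristic used here as an assumption, and the function names and read counts are hypothetical.

```python
def cumulative_percentages(counts):
    """Running share (in percent) of all reads covered by the top-k sequences."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running, curve = 0, []
    for c in ordered:
        running += c
        curve.append(100.0 * running / total)
    return curve


def elbow_point(curve):
    """Return k (1-based) where the curve rises farthest above the chord
    drawn between its first and last points."""
    n = len(curve)
    if n < 3:
        return n  # too few points for a meaningful elbow
    y0, y1 = curve[0], curve[-1]
    best_k, best_gap = 1, 0.0
    for i, y in enumerate(curve):
        chord = y0 + (y1 - y0) * i / (n - 1)
        gap = y - chord
        if gap > best_gap:
            best_k, best_gap = i + 1, gap
    return best_k


counts = [500, 300, 100, 40, 30, 20, 10]  # made-up read counts per sequence
curve = cumulative_percentages(counts)
k = elbow_point(curve)
print(k)  # 3
```

In this toy example the top three sequences already cover 90% of reads, so keeping more adds little. Passing a different value to `--elbow` overrides whatever point the automatic detection chose.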

Arguments

Process Command

--output, -o: Output table file
--min-length: Minimum sequence length (default: 20)
--max-length: Maximum sequence length (default: 50)
--top-n: Number of top sequences to use for probe generation (default: 20)
--probe-length: Length of generated probes (default: 25)
--min-probe-length: Minimum acceptable probe length (default: 20)
--edit-distance: Maximum edit distance for clustering (default: 1)
--find-core: Find core sequences
--core-length: Length for core sequence analysis (default: 25)
--min-core-occurrence: Minimum fraction of sequences a core must appear in (default: 0.1)
--reference, -r: Reference FASTA file to check probe binding
--max-binding-dist: Maximum edit distance allowed for probe binding (default: 2)
--subsample: Subsample percentage (1-100) of reads from each file
--reads: Number of reads to take from each file (overrides --subsample if provided)
--cpus: Number of CPU cores to use (default: 8, max: number of cores - 1)
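
To make the probe and binding options more concrete, the following sketch designs an antisense probe and scans a reference for sites it could bind. Several things here are assumptions rather than documented behaviour: that a probe is the reverse complement of the start of the target sequence, that targets shorter than the minimum probe length are skipped, and that mismatches are counted with a Hamming distance (substitutions only), which is a simplification of the edit distance that `--max-binding-dist` refers to. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_probe(target, probe_length=25, min_probe_length=20):
    """Antisense probe against the first probe_length bases of target,
    or None if the target is shorter than min_probe_length (assumed rule)."""
    if len(target) < min_probe_length:
        return None
    return reverse_complement(target[:probe_length])


def binding_sites(probe, reference, max_dist=2):
    """List (start, mismatches) for every reference window the probe could
    pair with, allowing at most max_dist substitutions (Hamming distance)."""
    site = reverse_complement(probe)  # the sequence the probe hybridises to
    hits = []
    for start in range(len(reference) - len(site) + 1):
        window = reference[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_dist:
            hits.append((start, mismatches))
    return hits
```

A probe with hits in a reference you want to keep (for example, an mRNA of interest) would be a candidate for exclusion; how the package reports such hits is not shown here.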

Reanalyze Command

--input, -i: Input results.tsv file from previous analysis
--elbow, -e: New elbow point (number of sequences to keep)
--probes, -p: Number of probes to generate (alternative to --elbow)
--output-prefix, -o: Prefix for output files (optional)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

0.2.3

Dec 20, 2024

0.2.2

Dec 20, 2024

This version

0.2.1

Dec 19, 2024

0.2.0

Dec 19, 2024

0.1.8

Dec 19, 2024

Download files

Source Distribution

probetheus-0.2.1.tar.gz (15.1 kB)

Uploaded Dec 19, 2024 Source

File details

Details for the file probetheus-0.2.1.tar.gz.

File metadata

Download URL: probetheus-0.2.1.tar.gz
Upload date: Dec 19, 2024
Size: 15.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for probetheus-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`39e13a85f7078eb67a46020737037ea4461d3074cf122fb0ce0515718ec0c00e`
MD5	`52c83c1748515356cf776773412fd91c`
BLAKE2b-256	`88dd7f99956600356faad867f3e0c166742ef2082a087bceef1a41a386741239`
