A Python package to design probes against overrepresented sequences in a FASTQ file.

PROBETHEUS

A Python package that detects overrepresented sequences in a FASTQ file and designs probes against them. It is intended for single-read sequencing data, such as immunoprecipitation and Ribo-seq experiments.

Installation

pip install probetheus

Features

  • Process single-end FASTQ files to find top represented sequences
  • Generate probes from top sequences with customizable lengths
  • Cluster sequences based on edit distance
  • Detect probe binding sites against reference sequences
  • Generate cumulative percentage plots with elbow point detection
  • Reanalyze results with custom elbow points
  • Subsample input files for faster analysis or testing
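As a rough illustration of the first feature, finding the most common sequences in a FASTQ file comes down to counting the sequence line of each four-line record. The sketch below is not taken from the probetheus source: the function name `top_sequences` is hypothetical, and the default values only mirror the documented `--top-n`, `--min-length`, and `--max-length` defaults.

```python
from collections import Counter
from itertools import islice


def top_sequences(fastq_lines, top_n=20, min_length=20, max_length=50):
    """Return the top_n most frequent read sequences as (sequence, count) pairs.

    Hypothetical helper for illustration; assumes plain 4-line FASTQ records.
    """
    counts = Counter()
    # In each 4-line record the sequence is the second line (index 1, step 4).
    for line in islice(fastq_lines, 1, None, 4):
        seq = line.strip()
        # Keep only reads whose length falls inside the allowed window.
        if min_length <= len(seq) <= max_length:
            counts[seq] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    seq_a = "ACGTACGTACGTACGTACGTACGT"
    seq_b = "TTTTTTTTTTTTTTTTTTTTTTTT"
    records = [
        "@r1", seq_a, "+", "I" * 24,
        "@r2", seq_b, "+", "I" * 24,
        "@r3", seq_a, "+", "I" * 24,
        "@r4", "ACGT", "+", "IIII",  # too short, filtered out
    ]
    for seq, count in top_sequences(records):
        print(seq, count)
```

For a gzip-compressed file, the same function could be fed the lines of `gzip.open(path, "rt")`.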

Usage

Processing FASTQ Files and Generating Probes

# Basic usage
probetheus process input.fastq.gz -o results.tsv

# With core sequence analysis
probetheus process input.fastq.gz -o results.tsv --find-core --core-length 25

# Process without length filtering
probetheus process input.fastq.gz -o results.tsv --skip_length_filter

# Check probe binding against reference
probetheus process input.fastq.gz -o results.tsv -r ref.fasta --max-binding-dist 2

# Process with subsampling (e.g., use 20% of reads)
probetheus process *.fastq.gz -o results.tsv --subsample 20
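To make the subsampling option more concrete, here is a minimal sketch of drawing a fixed percentage of reads at random. It is an assumption about the general technique, not the package's implementation: `subsample_records` and its `seed` parameter are hypothetical names, and records are represented as plain list items.

```python
import random


def subsample_records(records, percent, seed=0):
    """Return roughly `percent` percent of `records`, keeping their original order.

    Hypothetical helper for illustration only.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if not records:
        return []
    # Always keep at least one record so tiny inputs do not become empty.
    keep = max(1, round(len(records) * percent / 100))
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    chosen = sorted(rng.sample(range(len(records)), keep))
    return [records[i] for i in chosen]


if __name__ == "__main__":
    reads = [f"read_{i}" for i in range(10)]
    print(subsample_records(reads, 20))  # two of the ten reads
```

A fixed seed is a design choice here: it keeps repeated test runs comparable, at the cost of always drawing the same subset.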

Reanalyzing Results

After initial processing, you can reanalyze the results with a different elbow point:

# Reanalyze with a new elbow point
probetheus reanalyze --input results.tsv --elbow 5 --output-prefix new_results

This will create:

  • new_results_reanalyzed.tsv: New results file with selected sequences
  • new_results_reanalyzed_cumulative.png: Updated cumulative plot
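The idea behind the cumulative plot and its elbow point can be sketched as follows. This README does not say how probetheus detects the elbow, so the method below (the point lying farthest above the straight line from the first to the last value) is an assumed stand-in, and both function names are hypothetical.

```python
def cumulative_percentages(counts):
    """Cumulative share of reads (in percent) for counts sorted in descending order."""
    total = sum(counts)
    running = 0
    result = []
    for count in counts:
        running += count
        result.append(100.0 * running / total)
    return result


def find_elbow(cumulative):
    """Return how many sequences to keep, using a max-distance-from-chord heuristic.

    Assumed heuristic for illustration; not necessarily what probetheus does.
    """
    n = len(cumulative)
    if n < 3:
        return n
    first, last = cumulative[0], cumulative[-1]
    best_index, best_gap = 0, 0.0
    for i, value in enumerate(cumulative):
        # Height of the straight line between the first and last points at position i.
        chord = first + (last - first) * i / (n - 1)
        gap = value - chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index + 1  # convert 0-based index to a sequence count


if __name__ == "__main__":
    counts = [500, 300, 100, 40, 30, 20, 10]
    curve = cumulative_percentages(counts)
    elbow = find_elbow(curve)
    print(curve)
    print("keep the top", elbow, "sequences")
```

Choosing a different elbow by hand, as `reanalyze --elbow` allows, then amounts to keeping a different number of top sequences.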

Arguments

Process Command

  • --output, -o: Output table file
  • --min-length: Minimum sequence length (default: 20)
  • --max-length: Maximum sequence length (default: 50)
  • --top-n: Number of top sequences to use for probe generation (default: 20)
  • --probe-length: Length of generated probes (default: 25)
  • --min-probe-length: Minimum acceptable probe length (default: 20)
  • --edit-distance: Maximum edit distance for clustering (default: 1)
  • --find-core: Find core sequences
  • --core-length: Length for core sequence analysis (default: 25)
  • --min-core-occurrence: Minimum fraction of sequences a core must appear in (default: 0.1)
  • --reference, -r: Reference FASTA file to check probe binding
  • --max-binding-dist: Maximum edit distance allowed for probe binding (default: 2)
  • --subsample: Percentage of reads (1-100) to subsample from each file
  • --reads: Number of reads to take from each file (overrides --subsample if provided)
  • --cpus: Number of CPU cores to use (default: 8, max: number of cores - 1)
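To show what the edit-distance threshold means in practice, the sketch below computes the Levenshtein distance between two sequences and groups sequences greedily around the first member of each cluster. The greedy strategy and the names `edit_distance` and `cluster_sequences` are assumptions made for illustration; the clustering algorithm probetheus actually uses is not described here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def cluster_sequences(sequences, max_distance=1):
    """Greedy clustering: join the first cluster whose representative is close enough.

    Hypothetical helper; the representative is the first sequence in each cluster.
    """
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if edit_distance(seq, cluster[0]) <= max_distance:
                cluster.append(seq)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGTACG"]
    for cluster in cluster_sequences(seqs, max_distance=1):
        print(cluster)
```

Passing sequences in descending order of abundance makes the most common sequence the representative of its cluster, which is one reasonable convention for this kind of grouping.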

Reanalyze Command

  • --input, -i: Input results.tsv file from previous analysis
  • --elbow, -e: New elbow point (number of sequences to keep)
  • --output-prefix, -o: Prefix for output files (optional)

License

This project is licensed under the MIT License - see the LICENSE file for details.
