PWM regression optimizer for motif discovery in DNA sequences

pyprego

Python implementation of the prego R package, a PWM (position weight matrix) regression optimizer for motif discovery in DNA sequences.

Installation

pip install pyprego   # released version from PyPI
pip install -e .      # editable install, run from a source checkout

Optional dependencies:

pip install pymisha   # for genomic interval integration
pip install logomaker  # for sequence logo plots
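
Because both extras are optional, a script may want to check for them before calling the functions that need them. The following is a minimal sketch using only the standard library. It assumes the import names match the pip names (`pymisha`, `logomaker`), and the helper `has_module` is hypothetical, not part of pyprego.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


# Assumption: the import names are the same as the pip package names.
optional = {name: has_module(name) for name in ("pymisha", "logomaker")}
for name, available in optional.items():
    print(f"{name}: {'installed' if available else 'not installed'}")
```

A check like this lets a pipeline skip logo plotting or genomic interval extraction gracefully instead of failing at import time.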

Quick Start

Continuous regression (find motifs correlated with a response)

import pyprego

# sequences: list of equal-length DNA strings
# response: 1D or 2D numpy array (one row per sequence)
result = pyprego.regress_pwm(sequences, response)

# Result contains:
result.pssm       # PSSM DataFrame (pos, A, C, G, T)
result.spat       # Spatial model DataFrame (bin, spat_factor)
result.pred       # Predictions for each sequence
result.consensus  # Consensus motif string
result.r2         # R-squared per response dimension

# Predict on new sequences
new_scores = result.predict(new_sequences)
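
The snippet above leaves `sequences` and `response` undefined. As a rough sketch of inputs in the shape the comments describe (equal-length DNA strings and one response value per sequence), the following builds a synthetic data set with the standard library only. The planted motif, the sizes, and the noise level are all made-up values for illustration.

```python
import random

random.seed(0)

MOTIF = "GATAAG"  # hypothetical planted motif, for illustration only
SEQ_LEN = 50
N_SEQS = 200


def random_dna(length: int) -> str:
    """Uniformly random DNA string of the given length."""
    return "".join(random.choice("ACGT") for _ in range(length))


sequences = []
response = []
for i in range(N_SEQS):
    seq = random_dna(SEQ_LEN)
    has_motif = i % 2 == 0
    if has_motif:
        # Overwrite a random window with the motif; the length is unchanged.
        start = random.randrange(SEQ_LEN - len(MOTIF) + 1)
        seq = seq[:start] + MOTIF + seq[start + len(MOTIF):]
    sequences.append(seq)
    # The response is higher on average when the motif was planted.
    response.append(random.gauss(1.0 if has_motif else 0.0, 0.3))

# With pyprego and NumPy installed, these could then be passed as:
#   result = pyprego.regress_pwm(sequences, numpy.asarray(response))
```

On data like this, one would hope the recovered consensus resembles the planted motif, though that depends on the optimizer and is not guaranteed by this sketch.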

Binary classification (find motifs that discriminate two classes)

result = pyprego.regress_pwm(
    sequences, binary_response,  # 0/1 vector
    score_metric="ks"
)
result.ks    # KS test statistic
result.pred  # Predictions
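
For readers unfamiliar with the KS statistic, the sketch below computes the general two-sample Kolmogorov-Smirnov statistic between the scores of the two classes in plain Python. It illustrates the quantity only; how pyprego computes `result.ks` internally is not described here, and `ks_statistic` is a hypothetical helper.

```python
def ks_statistic(scores, labels):
    """Two-sample KS statistic: the largest gap between the empirical CDFs
    of the scores in class 1 and the scores in class 0."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")

    d_max = 0.0
    i = j = 0
    for t in sorted(set(pos) | set(neg)):
        # Advance each pointer past all values <= t.
        while i < len(pos) and pos[i] <= t:
            i += 1
        while j < len(neg) and neg[j] <= t:
            j += 1
        d_max = max(d_max, abs(i / len(pos) - j / len(neg)))
    return d_max


# Perfectly separated classes give 1.0; identical distributions give 0.0.
print(ks_statistic([0.9, 0.8, 0.7, 0.2, 0.1, 0.3], [1, 1, 1, 0, 0, 0]))  # 1.0
print(ks_statistic([0.1, 0.2, 0.1, 0.2], [1, 1, 0, 0]))                  # 0.0
```

A value near 1 means the motif score separates the two classes almost completely, while a value near 0 means the score distributions are indistinguishable.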

Multiple motifs

result = pyprego.regress_pwm(sequences, response, motif_num=3)
result.models      # List of individual motif models
result.multi_stats # Statistics for each motif
result.pred        # Combined predictions

PWM scoring with known motif

scores = pyprego.compute_pwm(sequences, pssm, spat=spat_model, bidirect=True)
local_scores = pyprego.compute_local_pwm(sequences, pssm)
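
To make the idea of PWM scoring concrete, here is a pure-Python sketch of one common formulation: each window gets the sum of per-position log probabilities, and the windows (optionally on both strands) are combined with log-sum-exp. This is an assumption about the general technique, not a description of what `compute_pwm` does internally; the spatial model, priors, and exact aggregation used by pyprego are not covered, and all function names below are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def window_log_likelihoods(seq, pssm):
    """Log-likelihood of the motif at every start position of `seq`.

    `pssm` is a list of dicts, one per motif position, mapping each of
    A/C/G/T to a strictly positive probability.
    """
    width = len(pssm)
    return [
        sum(math.log(pssm[k][seq[start + k]]) for k in range(width))
        for start in range(len(seq) - width + 1)
    ]


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def score_sequence(seq, pssm, bidirect=True):
    """Combine per-window log-likelihoods into one score per sequence."""
    lls = window_log_likelihoods(seq, pssm)
    if bidirect:
        # Also scan the reverse strand.
        lls += window_log_likelihoods(reverse_complement(seq), pssm)
    return log_sum_exp(lls)


# Toy 3-position PSSM that strongly prefers "GAT".
toy_pssm = [
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
]
with_motif = score_sequence("CCCGATCCC", toy_pssm)
without_motif = score_sequence("CCCCCCCCC", toy_pssm)
print(with_motif > without_motif)  # True
```

Scanning both strands matters because a binding site on the reverse strand appears as the reverse complement of the motif in the forward sequence.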

K-mer screening

kmers = pyprego.screen_kmers(sequences, response, kmer_len=8)
print(kmers.head())  # Top correlated k-mers
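
The screening idea can be sketched in a few lines of plain Python: count how often each k-mer occurs in each sequence, then rank k-mers by how strongly those counts correlate with the response. The choice of Pearson correlation on occurrence counts is an assumption made for this illustration, and `screen_kmers_sketch` is a hypothetical stand-in that returns a list rather than the table-like object printed above.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_kmers_sketch(sequences, response, kmer_len):
    """Rank k-mers by |Pearson r| between occurrence count and response."""
    counts = defaultdict(lambda: [0] * len(sequences))
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - kmer_len + 1):
            counts[seq[start:start + kmer_len]][idx] += 1
    ranked = [(kmer, pearson(vec, response)) for kmer, vec in counts.items()]
    return sorted(ranked, key=lambda kv: abs(kv[1]), reverse=True)


# "GATA" occurs only in the two high-response sequences.
toy_seqs = ["AAGATAAC", "CCGATACC", "CCCCCCCC", "AAAAAAAA"]
toy_resp = [1.0, 1.0, 0.0, 0.0]
top_kmer, top_r = screen_kmers_sketch(toy_seqs, toy_resp, kmer_len=4)[0]
print(top_kmer, round(top_r, 3))
```

A screen like this is cheap compared with full PWM regression, which is why it is useful as a first pass over candidate motifs.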

PSSM utilities

pyprego.pssm_cor(pssm1, pssm2)       # Correlation between PSSMs
pyprego.pssm_match(pssm, motif_db)   # Match against database
pyprego.bits_per_pos(pssm)            # Information content
pyprego.consensus_from_pssm(pssm)     # Consensus sequence
pyprego.pssm_rc(pssm)                 # Reverse complement
pyprego.pssm_trim(pssm)              # Trim low-info edges
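
Three of these utilities correspond to standard definitions that are easy to state in code. The sketch below uses a plain list of per-position dicts instead of a DataFrame and assumes a uniform background with no pseudocounts; pyprego's own functions may use different conventions, and the `_sketch` names are hypothetical.

```python
import math

BASES = "ACGT"


def bits_per_pos_sketch(pssm):
    """Information content per position: 2 minus Shannon entropy in bits."""
    out = []
    for row in pssm:
        entropy = -sum(p * math.log2(p) for p in row.values() if p > 0)
        out.append(2.0 - entropy)
    return out


def consensus_sketch(pssm):
    """Most probable base at each position."""
    return "".join(max(BASES, key=lambda b: row[b]) for row in pssm)


def pssm_rc_sketch(pssm):
    """Reverse the positions and swap A<->T and C<->G."""
    return [
        {"A": row["T"], "C": row["G"], "G": row["C"], "T": row["A"]}
        for row in reversed(pssm)
    ]


toy = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.125, "C": 0.5, "G": 0.25, "T": 0.125},
    {"A": 0.25, "C": 0.125, "G": 0.125, "T": 0.5},
]
print(bits_per_pos_sketch(toy))
print(consensus_sketch(toy), consensus_sketch(pssm_rc_sketch(toy)))
```

A position that always shows the same base carries 2 bits, a uniform position carries 0 bits, and trimming low-information edges amounts to dropping flanking positions whose value falls below some threshold.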

Model export/import

from pyprego.export import export_regression_model, load_regression_model

export_regression_model(result, "model.json")
loaded = load_regression_model("model.json")
new_scores = loaded.predict(new_sequences)

API Compatibility with R prego

pyprego provides Python counterparts for the following R prego functions:

| R function | Python function | Status |
| --- | --- | --- |
| regress_pwm() | pyprego.regress_pwm() | Complete |
| regress_multiple_motifs() | pyprego.regress_pwm(motif_num=N) | Complete |
| compute_pwm() | pyprego.compute_pwm() | Complete |
| compute_local_pwm() | pyprego.compute_local_pwm() | Complete |
| screen_kmers() | pyprego.screen_kmers() | Complete |
| generate_kmers() | pyprego.generate_kmers() | Complete |
| kmer_matrix() | pyprego.kmer_matrix() | Complete |
| pssm_cor() / pssm_diff() | pyprego.pssm_cor() / pyprego.pssm_diff() | Complete |
| pssm_match() | pyprego.pssm_match() | Complete |
| pssm_trim() / pssm_rc() | pyprego.pssm_trim() / pyprego.pssm_rc() | Complete |
| bits_per_pos() | pyprego.bits_per_pos() | Complete |
| create_motif_db() | pyprego.create_motif_db() | Complete |
| extract_pwm() | pyprego.motif_db.extract_pwm() | Complete |
| plot_pssm_logo() | pyprego.plot_pssm_logo() | Complete |
| intervals_to_seq() | pyprego.intervals_to_seq() | Complete (requires pymisha) |
| gextract_pwm() | pyprego.gextract_pwm() | Complete (requires pymisha) |

Testing

# Fast tests (~6 seconds)
pytest tests/ --ignore=tests/test_high_level.py --ignore=tests/test_regression.py --ignore=tests/test_integration.py

# Full suite (includes slow regression tests)
pytest tests/

Architecture

  • NumPy-based: All computation uses NumPy arrays, with no GPU or PyTorch dependency
  • pandas DataFrames: PSSMs and spatial models are stored as DataFrames that follow the R package's conventions
  • Optional pymisha: The genomic functions (intervals_to_seq, gextract_pwm) require pymisha and are available only when it is installed
  • GPU-ready design: Array-based interfaces are intended to allow a future swap to torch tensors

See DECISIONS.md for detailed architecture decisions.

License

MIT

Release files

pyprego 0.0.2 is available from PyPI as a source distribution and as prebuilt wheels:

  • pyprego-0.0.2.tar.gz (source, 1.9 MB)
  • pyprego-0.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (CPython 3.12, manylinux glibc 2.17+, x86-64, 2.1 MB)
  • pyprego-0.0.2-cp312-cp312-macosx_11_0_arm64.whl (CPython 3.12, macOS 11.0+, ARM64, 1.8 MB)
  • pyprego-0.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (CPython 3.11, manylinux glibc 2.17+, x86-64, 2.1 MB)
  • pyprego-0.0.2-cp311-cp311-macosx_11_0_arm64.whl (CPython 3.11, macOS 11.0+, ARM64, 1.8 MB)
  • pyprego-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (CPython 3.10, manylinux glibc 2.17+, x86-64, 2.1 MB)
