PWM regression optimizer for motif discovery in DNA sequences

pyprego

Python implementation of the prego R package — a PWM Regression Optimizer for motif discovery in DNA sequences.

Installation

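pip install pyprego   # released package from PyPI
pip install -e .      # editable install, run from a clone of the repository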

Optional dependencies:

pip install pymisha   # for genomic interval integration
pip install logomaker  # for sequence logo plots

Quick Start

Continuous regression (find motifs correlated with a response)

import pyprego

# sequences: list of equal-length DNA strings
# response: 1D or 2D numpy array (one row per sequence)
result = pyprego.regress_pwm(sequences, response)

# Result contains:
result.pssm       # PSSM DataFrame (pos, A, C, G, T)
result.spat       # Spatial model DataFrame (bin, spat_factor)
result.pred       # Predictions for each sequence
result.consensus  # Consensus motif string
result.r2         # R-squared per response dimension

# Predict on new sequences
new_scores = result.predict(new_sequences)
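
To try the call above without real data, a small synthetic dataset is enough. The following is a minimal sketch, not part of pyprego: the helper make_toy_dataset, the planted motif GATAAG, and the noise level are all assumptions chosen for illustration. It produces equal-length DNA strings and one response value per sequence, which is the input shape described above.

```python
import random


def make_toy_dataset(n_seqs=200, seq_len=50, motif="GATAAG", seed=0):
    """Return (sequences, response); `motif` is planted in about half the sequences.

    Hypothetical helper for illustration only, not a pyprego function.
    """
    rng = random.Random(seed)
    sequences, response = [], []
    for _ in range(n_seqs):
        seq = [rng.choice("ACGT") for _ in range(seq_len)]
        has_motif = rng.random() < 0.5
        if has_motif:
            # Overwrite a random window with the motif.
            start = rng.randrange(seq_len - len(motif) + 1)
            seq[start:start + len(motif)] = motif
        sequences.append("".join(seq))
        # Response is higher on average when the motif was planted, plus noise.
        response.append((1.0 if has_motif else 0.0) + rng.gauss(0.0, 0.1))
    return sequences, response


sequences, response = make_toy_dataset()
# `response` is a plain list here; convert it with numpy.asarray(response)
# before passing it on, since a numpy array is what the README describes.
```

Unplanted sequences can still contain the motif by chance, so the planted flag is only an approximate ground truth.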

Binary classification (find motifs that discriminate two classes)

result = pyprego.regress_pwm(
    sequences, binary_response,  # 0/1 vector
    score_metric="ks"
)
result.ks    # KS test statistic
result.pred  # Predictions
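
For intuition about what a KS-based score measures, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical score distributions of the two classes. The sketch below computes it in plain Python. It is an illustration under that textbook definition, and ks_statistic is a hypothetical name; pyprego's own implementation may differ in details.

```python
def ks_statistic(scores, labels):
    """Two-sample KS statistic: the maximum gap between the two empirical CDFs."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes need at least one sequence")
    i = j = 0
    max_gap = 0.0
    # Walk over every distinct score and compare the CDFs at that point.
    for value in sorted(set(pos) | set(neg)):
        while i < len(pos) and pos[i] <= value:
            i += 1
        while j < len(neg) and neg[j] <= value:
            j += 1
        max_gap = max(max_gap, abs(i / len(pos) - j / len(neg)))
    return max_gap


# Perfectly separated classes give the maximum value of 1.0.
separated = ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
# Identical score distributions give 0.0.
identical = ks_statistic([1, 2, 1, 2], [0, 0, 1, 1])
```

A value near 1 means the motif score separates the classes almost completely; a value near 0 means the score distributions overlap.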

Multiple motifs

result = pyprego.regress_pwm(sequences, response, motif_num=3)
result.models      # List of individual motif models
result.multi_stats # Statistics for each motif
result.pred        # Combined predictions

PWM scoring with known motif

scores = pyprego.compute_pwm(sequences, pssm, spat=spat_model, bidirect=True)
local_scores = pyprego.compute_local_pwm(sequences, pssm)
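
The idea behind these two calls can be sketched without the library. The code below assumes a common formulation: each window gets the sum of log probabilities under the PSSM (the local score), and the sequence score is the log-sum-exp of all window scores, optionally over both strands. This is a simplified illustration, not pyprego's implementation: it ignores the spatial model, represents the PSSM as a list of dicts instead of a DataFrame, and the names window_scores and total_score are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def window_scores(seq, pssm):
    """Log-likelihood of every window of len(pssm) along seq.

    Assumes all PSSM probabilities are > 0 (for example after adding a pseudocount).
    """
    width = len(pssm)
    return [
        sum(math.log(pssm[k][seq[start + k]]) for k in range(width))
        for start in range(len(seq) - width + 1)
    ]


def total_score(seq, pssm, bidirect=True):
    """Log-sum-exp of the window scores, optionally including the reverse strand."""
    scores = window_scores(seq, pssm)
    if bidirect:
        scores = scores + window_scores(reverse_complement(seq), pssm)
    if not scores:
        raise ValueError("sequence is shorter than the PSSM")
    peak = max(scores)  # subtract the peak for numerical stability
    return peak + math.log(sum(math.exp(s - peak) for s in scores))


# Toy two-position PSSM that prefers "AC".
toy_pssm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
per_window = window_scores("GGACGG", toy_pssm)
best_start = per_window.index(max(per_window))  # window starting at "AC"
```

Log-sum-exp rewards every good match in the sequence rather than only the single best window, which is why it is a common choice for aggregate binding scores.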

K-mer screening

kmers = pyprego.screen_kmers(sequences, response, kmer_len=8)
print(kmers.head())  # Top correlated k-mers
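
As a rough picture of what k-mer screening involves, the sketch below counts every k-mer in each sequence and ranks k-mers by the Pearson correlation between their per-sequence count and the response. It is an assumption-laden illustration in plain Python, with hypothetical names (pearson, screen_kmers_sketch) and a list of tuples as output; pyprego.screen_kmers may use different statistics and output columns.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_kmers_sketch(sequences, response, kmer_len=4):
    """Rank k-mers by |correlation| between per-sequence count and response."""
    counts = [
        Counter(seq[i:i + kmer_len] for i in range(len(seq) - kmer_len + 1))
        for seq in sequences
    ]
    kmers = sorted(set().union(*counts))
    ranked = [(kmer, pearson([c[kmer] for c in counts], response)) for kmer in kmers]
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


toy_sequences = ["AAGATA", "GATACC", "CCCCCC", "TTTTTT"]
toy_response = [1.0, 1.0, 0.0, 0.0]
ranked = screen_kmers_sketch(toy_sequences, toy_response)
top_kmer, top_cor = ranked[0]  # "GATA" occurs exactly in the two high-response sequences
```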

PSSM utilities

pyprego.pssm_cor(pssm1, pssm2)      # Correlation between PSSMs
pyprego.pssm_match(pssm, motif_db)  # Match against database
pyprego.bits_per_pos(pssm)          # Information content
pyprego.consensus_from_pssm(pssm)   # Consensus sequence
pyprego.pssm_rc(pssm)               # Reverse complement
pyprego.pssm_trim(pssm)             # Trim low-info edges
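
Several of these utilities correspond to short, standard computations. The sketch below shows textbook versions on a PSSM stored as a list of dicts: information content as 2 bits minus the Shannon entropy, consensus as the most probable base per position, reverse complement, and trimming of low-information edges. The function names, the min_bits threshold, and the data layout are assumptions for illustration; the pyprego functions operate on DataFrames and may apply different rules (for example, ambiguity codes in the consensus).

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def bits_per_pos_sketch(pssm):
    """Information content per position: 2 bits minus the Shannon entropy."""
    return [2.0 + sum(p * math.log2(p) for p in row.values() if p > 0) for row in pssm]


def consensus_sketch(pssm):
    """Most probable base at each position (ties go to the first base in ACGT order)."""
    return "".join(max(BASES, key=lambda b: row[b]) for row in pssm)


def pssm_rc_sketch(pssm):
    """Reverse the positions and swap each base with its complement."""
    return [{COMPLEMENT[b]: p for b, p in row.items()} for row in reversed(pssm)]


def pssm_trim_sketch(pssm, min_bits=0.1):
    """Drop leading and trailing positions with information below min_bits."""
    bits = bits_per_pos_sketch(pssm)
    keep = [i for i, b in enumerate(bits) if b >= min_bits]
    return pssm[keep[0]:keep[-1] + 1] if keep else []


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_pssm = [
    uniform,
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.5, "G": 0.25, "T": 0.25},
    uniform,
]
bits = bits_per_pos_sketch(toy_pssm)      # uninformative edges carry 0 bits
trimmed = pssm_trim_sketch(toy_pssm)      # keeps the two informative positions
forward = consensus_sketch(trimmed)
reverse = consensus_sketch(pssm_rc_sketch(trimmed))
```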

Model export/import

from pyprego.export import export_regression_model, load_regression_model

export_regression_model(result, "model.json")
loaded = load_regression_model("model.json")
new_scores = loaded.predict(new_sequences)

API Compatibility with R prego

Each R prego function listed below has a Python counterpart in pyprego:

| R function | Python function | Status |
|---|---|---|
| regress_pwm() | pyprego.regress_pwm() | Complete |
| regress_multiple_motifs() | pyprego.regress_pwm(motif_num=N) | Complete |
| compute_pwm() | pyprego.compute_pwm() | Complete |
| compute_local_pwm() | pyprego.compute_local_pwm() | Complete |
| screen_kmers() | pyprego.screen_kmers() | Complete |
| generate_kmers() | pyprego.generate_kmers() | Complete |
| kmer_matrix() | pyprego.kmer_matrix() | Complete |
| pssm_cor() / pssm_diff() | pyprego.pssm_cor() / pyprego.pssm_diff() | Complete |
| pssm_match() | pyprego.pssm_match() | Complete |
| pssm_trim() / pssm_rc() | pyprego.pssm_trim() / pyprego.pssm_rc() | Complete |
| bits_per_pos() | pyprego.bits_per_pos() | Complete |
| create_motif_db() | pyprego.create_motif_db() | Complete |
| extract_pwm() | pyprego.motif_db.extract_pwm() | Complete |
| plot_pssm_logo() | pyprego.plot_pssm_logo() | Complete |
| intervals_to_seq() | pyprego.intervals_to_seq() | Complete (requires pymisha) |
| gextract_pwm() | pyprego.gextract_pwm() | Complete (requires pymisha) |

Testing

# Fast tests (~6 seconds)
pytest tests/ --ignore=tests/test_high_level.py --ignore=tests/test_regression.py --ignore=tests/test_integration.py

# Full suite (includes slow regression tests)
pytest tests/

Architecture

  • NumPy-based: All computation uses NumPy arrays (no GPU/PyTorch dependency)
  • pandas DataFrames: PSSMs and spatial models use DataFrames matching R conventions
  • Optional pymisha: The genomic functions (intervals_to_seq, gextract_pwm) require pymisha and are available only when it is installed
  • GPU-ready design: Clean array interfaces leave room for a future swap to torch tensors

See DECISIONS.md for detailed architecture decisions.

License

MIT
