
PWM regression optimizer for motif discovery in DNA sequences

pyprego


Python implementation of the prego R package — a PWM Regression Optimizer for motif discovery in DNA sequences.

Installation

pip install pyprego   # released package from PyPI
pip install -e .      # or: editable install from a source checkout

Optional dependencies:

pip install pymisha   # for genomic interval integration
pip install logomaker  # for sequence logo plots

Quick Start

Continuous regression (find motifs correlated with a response)

import pyprego

# sequences: list of equal-length DNA strings
# response: 1D or 2D numpy array (one row per sequence)
result = pyprego.regress_pwm(sequences, response)

# Result contains:
result.pssm       # PSSM DataFrame (pos, A, C, G, T)
result.spat       # Spatial model DataFrame (bin, spat_factor)
result.pred       # Predictions for each sequence
result.consensus  # Consensus motif string
result.r2         # R-squared per response dimension

# Predict on new sequences
new_scores = result.predict(new_sequences)
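
To try the call above without real data, it can help to build a small synthetic input first. The following is a minimal sketch, not part of pyprego: the helper name make_toy_dataset, the planted motif GATAAG, and the noise level are all assumptions chosen for illustration. It only produces a list of equal-length DNA strings and a 1D NumPy response in the shapes the comments above describe.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_toy_dataset(n_seqs=200, seq_len=60, motif="GATAAG", rng=rng):
    """Hypothetical helper: random DNA with a motif planted in every other sequence."""
    bases = np.array(list("ACGT"))
    sequences = []
    response = np.zeros(n_seqs)
    for i in range(n_seqs):
        seq = rng.choice(bases, size=seq_len)
        if i % 2 == 0:
            # Plant the motif at a random position and mark the sequence as "high".
            start = rng.integers(0, seq_len - len(motif) + 1)
            seq[start:start + len(motif)] = list(motif)
            response[i] = 1.0
        sequences.append("".join(seq))
    # Add a little noise so the response is continuous rather than exactly 0/1.
    response += rng.normal(0.0, 0.1, size=n_seqs)
    return sequences, response


sequences, response = make_toy_dataset()
print(len(sequences), len(sequences[0]), response.shape)
```

The resulting sequences and response can then be passed to pyprego.regress_pwm as in the snippet above.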

Binary classification (find motifs that discriminate two classes)

result = pyprego.regress_pwm(
    sequences, binary_response,  # 0/1 vector
    score_metric="ks"
)
result.ks    # KS test statistic
result.pred  # Predictions
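
For readers unfamiliar with the KS statistic, the sketch below shows what a two-sample Kolmogorov-Smirnov statistic measures: the largest gap between the empirical score distributions of the two classes. It is a plain NumPy illustration; the function name ks_statistic is hypothetical, and whether pyprego computes result.ks in exactly this way is an assumption.

```python
import numpy as np


def ks_statistic(scores, labels):
    """Two-sample KS statistic: max distance between the classes' empirical CDFs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos = np.sort(scores[labels])
    neg = np.sort(scores[~labels])
    # Evaluate both empirical CDFs at every observed score.
    grid = np.concatenate([pos, neg])
    cdf_pos = np.searchsorted(pos, grid, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, grid, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


# Perfectly separated classes give the maximum value.
print(ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 1.0
# Interleaved classes give a smaller value.
print(ks_statistic([0.1, 0.9, 0.2, 0.8], [0, 0, 1, 1]))  # 0.5
```

A value near 1 means the motif scores separate the two classes almost completely; a value near 0 means the score distributions are nearly identical.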

Multiple motifs

result = pyprego.regress_pwm(sequences, response, motif_num=3)
result.models      # List of individual motif models
result.multi_stats # Statistics for each motif
result.pred        # Combined predictions
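
One way to picture "combined predictions" is a linear model over the per-motif scores. The sketch below fits such a model with ordinary least squares and reports R-squared. It is an illustration only: the function name combine_motif_scores is hypothetical, and how pyprego actually combines motifs internally is an assumption, not something this README states.

```python
import numpy as np


def combine_motif_scores(score_matrix, response):
    """Fit response ~ intercept + per-motif scores; return (predictions, R-squared)."""
    score_matrix = np.asarray(score_matrix, dtype=float)
    response = np.asarray(response, dtype=float)
    # Design matrix: a column of ones (intercept) plus one column per motif.
    X = np.column_stack([np.ones(len(response)), score_matrix])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    pred = X @ coef
    ss_res = np.sum((response - pred) ** 2)
    ss_tot = np.sum((response - response.mean()) ** 2)
    return pred, 1.0 - ss_res / ss_tot


# Toy data: three motif score columns, response driven by the first two.
rng = np.random.default_rng(1)
scores = rng.normal(size=(50, 3))
y = 2.0 * scores[:, 0] - scores[:, 1] + 0.5
pred, r2 = combine_motif_scores(scores, y)
print(round(r2, 6))
```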

PWM scoring with known motif

scores = pyprego.compute_pwm(sequences, pssm, spat=spat_model, bidirect=True)
local_scores = pyprego.compute_local_pwm(sequences, pssm)
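
To make the idea of a PWM score concrete, here is a small NumPy sketch of one common formulation: compute the log-likelihood of every window under the PWM, optionally also on the reverse-complement strand, and aggregate the windows with log-sum-exp. All names (window_log_likelihoods, toy_pwm_score) and the pseudocount value are assumptions for illustration; this is not pyprego's implementation, it ignores the spatial model, and it handles only the bases A, C, G and T.

```python
import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq):
    """Encode a DNA string as a (len, 4) one-hot matrix in A, C, G, T order."""
    return np.eye(4)[[BASE_INDEX[b] for b in seq]]


def window_log_likelihoods(seq, pwm, prior=0.01):
    """Log-likelihood of each window of the sequence under the PWM."""
    pwm = np.asarray(pwm, dtype=float) + prior  # pseudocount avoids log(0)
    log_pwm = np.log(pwm / pwm.sum(axis=1, keepdims=True))
    x = one_hot(seq)
    width = log_pwm.shape[0]
    return np.array([
        np.sum(x[i:i + width] * log_pwm)
        for i in range(len(seq) - width + 1)
    ])


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def toy_pwm_score(seq, pwm, bidirect=True):
    """Aggregate window log-likelihoods (both strands if bidirect) with log-sum-exp."""
    scores = window_log_likelihoods(seq, pwm)
    if bidirect:
        rc_scores = window_log_likelihoods(reverse_complement(seq), pwm)
        scores = np.concatenate([scores, rc_scores])
    return float(np.logaddexp.reduce(scores))


acg = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]  # a PWM for the motif "ACG"
print(toy_pwm_score("TTACGTT", acg), toy_pwm_score("TTTTTTT", acg))
```

The per-window values returned by window_log_likelihoods play the role of local scores, while toy_pwm_score yields a single number per sequence.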

K-mer screening

kmers = pyprego.screen_kmers(sequences, response, kmer_len=8)
print(kmers.head())  # Top correlated k-mers
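
Conceptually, k-mer screening counts how often each k-mer occurs in each sequence and ranks k-mers by how strongly those counts track the response. The sketch below does this with a Pearson correlation in NumPy. The name toy_screen_kmers and the return format are assumptions; the real screen_kmers may count, filter, and rank differently.

```python
from collections import Counter

import numpy as np


def toy_screen_kmers(sequences, response, kmer_len=3):
    """Return (kmer, correlation) pairs sorted by absolute correlation, strongest first."""
    response = np.asarray(response, dtype=float)
    # Count every overlapping k-mer in every sequence.
    counts = [
        Counter(s[i:i + kmer_len] for i in range(len(s) - kmer_len + 1))
        for s in sequences
    ]
    kmers = sorted(set().union(*counts))
    results = []
    for kmer in kmers:
        col = np.array([c[kmer] for c in counts], dtype=float)
        if col.std() == 0:
            continue  # a constant count column has no defined correlation
        results.append((kmer, float(np.corrcoef(col, response)[0, 1])))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


toy_seqs = ["AAAC", "AAAG", "CCCC", "GGGT"]
toy_resp = [1, 1, 0, 0]
for kmer, cor in toy_screen_kmers(toy_seqs, toy_resp)[:3]:
    print(kmer, round(cor, 3))
```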

PSSM utilities

pyprego.pssm_cor(pssm1, pssm2)      # Correlation between PSSMs
pyprego.pssm_match(pssm, motif_db)  # Match against database
pyprego.bits_per_pos(pssm)          # Information content
pyprego.consensus_from_pssm(pssm)   # Consensus sequence
pyprego.pssm_rc(pssm)               # Reverse complement
pyprego.pssm_trim(pssm)             # Trim low-info edges
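
As a rough guide to what three of these utilities compute, the sketch below works on a plain NumPy array of per-position probabilities in A, C, G, T column order (for a DataFrame with the columns listed earlier, pssm[["A", "C", "G", "T"]].to_numpy() would give such an array). The toy_ function names are hypothetical, the information content assumes a uniform background, and the consensus is a simple argmax, so results may differ from pyprego's own functions.

```python
import numpy as np

BASES = np.array(list("ACGT"))


def toy_bits_per_pos(probs):
    """Information content per position: 2 bits minus the Shannon entropy."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Treat 0 * log2(0) as 0, following the usual entropy convention.
        terms = np.where(p > 0, p * np.log2(p), 0.0)
    return 2.0 + terms.sum(axis=1)


def toy_consensus(probs):
    """Most probable base at each position (no ambiguity codes)."""
    return "".join(BASES[np.argmax(np.asarray(probs), axis=1)])


def toy_reverse_complement(probs):
    """Reverse the positions and swap A<->T, C<->G (column order A, C, G, T)."""
    return np.asarray(probs)[::-1, ::-1]


probs = np.array([
    [1.0, 0.0, 0.0, 0.0],      # always A: 2 bits
    [0.0, 1.0, 0.0, 0.0],      # always C: 2 bits
    [0.25, 0.25, 0.25, 0.25],  # uniform: 0 bits
])
print(toy_bits_per_pos(probs))
print(toy_consensus(probs[:2]), toy_consensus(toy_reverse_complement(probs)[1:]))
```

Positions with few bits, such as the uniform third row here, are the kind of low-information edges that trimming is meant to remove.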

Model export/import

from pyprego.export import export_regression_model, load_regression_model

export_regression_model(result, "model.json")
loaded = load_regression_model("model.json")
new_scores = loaded.predict(new_sequences)

API Compatibility with R prego

pyprego implements the same functions as the R package:

| R function | Python function | Status |
|---|---|---|
| regress_pwm() | pyprego.regress_pwm() | Complete |
| regress_multiple_motifs() | pyprego.regress_pwm(motif_num=N) | Complete |
| compute_pwm() | pyprego.compute_pwm() | Complete |
| compute_local_pwm() | pyprego.compute_local_pwm() | Complete |
| screen_kmers() | pyprego.screen_kmers() | Complete |
| generate_kmers() | pyprego.generate_kmers() | Complete |
| kmer_matrix() | pyprego.kmer_matrix() | Complete |
| pssm_cor() / pssm_diff() | pyprego.pssm_cor() / pyprego.pssm_diff() | Complete |
| pssm_match() | pyprego.pssm_match() | Complete |
| pssm_trim() / pssm_rc() | pyprego.pssm_trim() / pyprego.pssm_rc() | Complete |
| bits_per_pos() | pyprego.bits_per_pos() | Complete |
| create_motif_db() | pyprego.create_motif_db() | Complete |
| extract_pwm() | pyprego.motif_db.extract_pwm() | Complete |
| plot_pssm_logo() | pyprego.plot_pssm_logo() | Complete |
| intervals_to_seq() | pyprego.intervals_to_seq() | Complete (requires pymisha) |
| gextract_pwm() | pyprego.gextract_pwm() | Complete (requires pymisha) |

Testing

# Fast tests (~6 seconds)
pytest tests/ --ignore=tests/test_high_level.py --ignore=tests/test_regression.py --ignore=tests/test_integration.py

# Full suite (includes slow regression tests)
pytest tests/

Architecture

  • NumPy-based: All computation uses NumPy arrays (no GPU/PyTorch dependency)
  • pandas DataFrames: PSSMs and spatial models use DataFrames matching R conventions
  • Optional pymisha: Genomic functions work when pymisha is installed
  • GPU-ready design: Clean array interfaces leave room to swap in torch tensors in the future

License

MIT
