PWM regression optimizer for motif discovery in DNA sequences
pyprego
Python implementation of the prego R package — a PWM Regression Optimizer for motif discovery in DNA sequences.
Installation
pip install -e . # editable install, run from the repository root
Optional dependencies:
pip install pymisha # for genomic interval integration
pip install logomaker # for sequence logo plots
Quick Start
Continuous regression (find motifs correlated with a response)
import pyprego
# sequences: list of equal-length DNA strings
# response: 1D or 2D numpy array (one row per sequence)
result = pyprego.regress_pwm(sequences, response)
# Result contains:
result.pssm # PSSM DataFrame (pos, A, C, G, T)
result.spat # Spatial model DataFrame (bin, spat_factor)
result.pred # Predictions for each sequence
result.consensus # Consensus motif string
result.r2 # R-squared per response dimension
# Predict on new sequences
new_scores = result.predict(new_sequences)
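To try the call above without real data, a toy dataset can be built with the standard library alone. This is only a sketch of the expected input shapes: the planted motif `GATAAG`, the sequence length, the noise level, and the helper `random_dna` are all illustrative assumptions, not part of pyprego.

```python
import random

random.seed(0)

MOTIF = "GATAAG"   # hypothetical motif planted in half of the sequences
SEQ_LEN = 50       # all sequences share this length
N_SEQS = 200


def random_dna(length):
    """Return a uniformly random DNA string of the given length."""
    return "".join(random.choice("ACGT") for _ in range(length))


sequences, response = [], []
for _ in range(N_SEQS):
    seq = random_dna(SEQ_LEN)
    has_motif = random.random() < 0.5
    if has_motif:
        # Overwrite a random window with the motif, keeping the length fixed.
        start = random.randrange(SEQ_LEN - len(MOTIF) + 1)
        seq = seq[:start] + MOTIF + seq[start + len(MOTIF):]
    sequences.append(seq)
    # Response is high when the motif is present, plus a little Gaussian noise.
    response.append((1.0 if has_motif else 0.0) + random.gauss(0.0, 0.1))

# With pyprego and NumPy installed, these could then be passed on, e.g.:
# result = pyprego.regress_pwm(sequences, numpy.asarray(response))
```

The response here is a plain list with one value per sequence; converting it to a 1D NumPy array matches the input described in the comments above.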
Binary classification (find motifs that discriminate two classes)
result = pyprego.regress_pwm(
    sequences, binary_response,  # 0/1 vector
    score_metric="ks"
)
result.ks # KS test statistic
result.pred # Predictions
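For readers unfamiliar with the KS metric, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical score distributions of the two classes. The sketch below computes it with the standard library; `ks_statistic` is a hypothetical helper written for illustration and is not claimed to match how pyprego computes `result.ks` internally.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Two-sample KS statistic between scores of class 1 and class 0."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    d = 0.0
    # The maximum ECDF gap is always attained at one of the observed values.
    for x in pos + neg:
        cdf_pos = bisect_right(pos, x) / len(pos)
        cdf_neg = bisect_right(neg, x) / len(neg)
        d = max(d, abs(cdf_pos - cdf_neg))
    return d


scores = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
print(ks_statistic(scores, labels))
```

A value near 1 means the scores separate the classes almost perfectly; a value near 0 means the two score distributions are nearly indistinguishable.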
Multiple motifs
result = pyprego.regress_pwm(sequences, response, motif_num=3)
result.models # List of individual motif models
result.multi_stats # Statistics for each motif
result.pred # Combined predictions
PWM scoring with known motif
scores = pyprego.compute_pwm(sequences, pssm, spat=spat_model, bidirect=True)
local_scores = pyprego.compute_local_pwm(sequences, pssm)
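As a rough illustration of what scoring a sequence against a PSSM involves, the sketch below sums the motif likelihood over every window and, optionally, over both strands, then returns the log of that total. This is one common definition and an assumption here, not a statement of pyprego's exact formula; the spatial model is left out, the PSSM is a plain list of `[A, C, G, T]` probability rows assumed to be strictly positive, and `pwm_score` is a hypothetical name.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def window_log_prob(window, pssm):
    """Log-probability of one window under a position-independent PSSM."""
    return sum(math.log(pssm[i][BASES.index(b)]) for i, b in enumerate(window))


def pwm_score(seq, pssm, bidirect=True):
    """Log of the summed window likelihoods (log-sum-exp over positions)."""
    width = len(pssm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    strands = [seq]
    if bidirect:
        # Also scan the reverse-complement strand.
        strands.append(seq.translate(COMPLEMENT)[::-1])
    log_probs = [
        window_log_prob(strand[i:i + width], pssm)
        for strand in strands
        for i in range(len(strand) - width + 1)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_probs)
    return top + math.log(sum(math.exp(lp - top) for lp in log_probs))


# Two-position toy motif that prefers "AG".
pssm = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
print(pwm_score("TTAGTT", pssm), pwm_score("TTTTTT", pssm))
```

The per-window values collected in `log_probs` are the kind of position-level quantity a local scoring function would expose, while the final log-sum-exp collapses them into one score per sequence.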
K-mer screening
kmers = pyprego.screen_kmers(sequences, response, kmer_len=8)
print(kmers.head()) # Top correlated k-mers
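The idea behind k-mer screening can be sketched in a few lines: count each k-mer in every sequence, then rank k-mers by how strongly their counts correlate with the response. The version below is a simplified stand-in under stated assumptions (exact matches only, Pearson correlation, plain tuples instead of a DataFrame); `screen_kmers_sketch` and `pearson` are hypothetical helpers, not pyprego functions.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_kmers_sketch(sequences, response, kmer_len):
    """Rank k-mers by absolute correlation of their counts with the response."""
    counts = [
        Counter(s[i:i + kmer_len] for i in range(len(s) - kmer_len + 1))
        for s in sequences
    ]
    kmers = set().union(*counts)
    table = [(k, pearson([c[k] for c in counts], response)) for k in kmers]
    # Strongest correlation first; ties are broken alphabetically.
    return sorted(table, key=lambda kv: (-abs(kv[1]), kv[0]))


seqs = ["ACGTAC", "TTTTTT", "ACGTTT", "GGGGGG"]
resp = [1.0, 0.0, 1.0, 0.0]
top = screen_kmers_sketch(seqs, resp, kmer_len=3)
print(top[:3])
```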
PSSM utilities
pyprego.pssm_cor(pssm1, pssm2) # Correlation between PSSMs
pyprego.pssm_match(pssm, motif_db) # Match against database
pyprego.bits_per_pos(pssm) # Information content
pyprego.consensus_from_pssm(pssm) # Consensus sequence
pyprego.pssm_rc(pssm) # Reverse complement
pyprego.pssm_trim(pssm) # Trim low-info edges
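Three of these utilities correspond to short, standard computations, sketched below on a PSSM stored as a list of `[A, C, G, T]` probability rows. The helper names are hypothetical and the details are assumptions: information content is taken as 2 bits minus the Shannon entropy of each position, and the consensus picks the single most likely base, whereas the library may handle ambiguity differently (for example with IUPAC codes).

```python
import math

BASES = "ACGT"


def bits_per_pos_sketch(pssm):
    """Information content per position: 2 - Shannon entropy (in bits)."""
    bits = []
    for row in pssm:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        bits.append(2.0 - entropy)
    return bits


def consensus_sketch(pssm):
    """Most likely base at each position (first base wins on ties)."""
    return "".join(BASES[row.index(max(row))] for row in pssm)


def reverse_complement_sketch(pssm):
    """Reverse the positions and swap A<->T, C<->G.

    With columns ordered A, C, G, T, reversing each row performs the swap.
    """
    return [row[::-1] for row in reversed(pssm)]


info_demo = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 0.5, 0.5]]
motif = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]

print(bits_per_pos_sketch(info_demo))
print(consensus_sketch(motif))
print(consensus_sketch(reverse_complement_sketch(motif)))
```

A fully conserved position carries 2 bits and a uniform position carries 0, which is why trimming low-information edges tends to remove flanks that are close to uniform.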
Model export/import
from pyprego.export import export_regression_model, load_regression_model
export_regression_model(result, "model.json")
loaded = load_regression_model("model.json")
new_scores = loaded.predict(new_sequences)
API Compatibility with R prego
pyprego implements the same functions as the R package:
| R function | Python function | Status |
|---|---|---|
| regress_pwm() | pyprego.regress_pwm() | Complete |
| regress_multiple_motifs() | pyprego.regress_pwm(motif_num=N) | Complete |
| compute_pwm() | pyprego.compute_pwm() | Complete |
| compute_local_pwm() | pyprego.compute_local_pwm() | Complete |
| screen_kmers() | pyprego.screen_kmers() | Complete |
| generate_kmers() | pyprego.generate_kmers() | Complete |
| kmer_matrix() | pyprego.kmer_matrix() | Complete |
| pssm_cor() / pssm_diff() | pyprego.pssm_cor() / pyprego.pssm_diff() | Complete |
| pssm_match() | pyprego.pssm_match() | Complete |
| pssm_trim() / pssm_rc() | pyprego.pssm_trim() / pyprego.pssm_rc() | Complete |
| bits_per_pos() | pyprego.bits_per_pos() | Complete |
| create_motif_db() | pyprego.create_motif_db() | Complete |
| extract_pwm() | pyprego.motif_db.extract_pwm() | Complete |
| plot_pssm_logo() | pyprego.plot_pssm_logo() | Complete |
| intervals_to_seq() | pyprego.intervals_to_seq() | Complete (requires pymisha) |
| gextract_pwm() | pyprego.gextract_pwm() | Complete (requires pymisha) |
Testing
# Fast tests (~6 seconds)
pytest tests/ --ignore=tests/test_high_level.py --ignore=tests/test_regression.py --ignore=tests/test_integration.py
# Full suite (includes slow regression tests)
pytest tests/
Architecture
- NumPy-based: All computation uses NumPy arrays (no GPU/PyTorch dependency)
- pandas DataFrames: PSSMs and spatial models use DataFrames matching R conventions
- Optional pymisha: Genomic functions work when pymisha is installed
- GPU-ready design: Clean array interfaces allow future torch tensor swap
License
MIT