Evolutionary Walsh-Hadamard transform and compressed sensing for protein fitness landscapes
eWHT: Evolutionary Walsh-Hadamard Transform for Fitness Landscapes
ewht is a Python package for analyzing combinatorial fitness landscapes using the evolutionary Walsh-Hadamard transform (eWHT). It provides:
- Fast O(N log N) forward and inverse eWHT, where N = 2^L is the number of genotypes over L sites
- Evolutionary mutation probabilities ps from MSAs or ESM2-650M
- Data preprocessing helpers (genotype encoding, evolutionary subsampling)
- Compressed sensing with LASSO on eWHT/WHT bases
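To illustrate where the O(N log N) cost comes from, here is a minimal pure-Python sketch of the standard (unweighted) fast Walsh-Hadamard transform. It is not the package's eWHT, which additionally uses the mutation probabilities ps; the function names `fwht` and `ifwht` are hypothetical and chosen only for this example.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a length-2^L sequence."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    # log2(n) passes, each doing n/2 butterfly updates: O(n log n) in total.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def ifwht(coeffs):
    """Inverse transform: the same butterflies followed by division by n."""
    n = len(coeffs)
    return [c / n for c in fwht(coeffs)]


# Toy landscape over L = 2 sites (4 genotypes).
landscape = [1.0, 2.0, 3.0, 5.0]
coeffs = fwht(landscape)
recovered = ifwht(coeffs)
```

Applying `ifwht` to the output of `fwht` returns the original landscape, which is the round-trip property the package's transforms are described as preserving.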
Installation
ewht supports Python 3.9 and above. Install from PyPI:
pip install ewht
Optional extras:
pip install "ewht[esm]" # ESM2-650M ps estimation (requires torch + transformers)
Quickstart
The package contains an example CR6261-H1 dataset from the paper. Load it, estimate ps from MSA, compute the eWHT, and run compressed sensing. The full script can be found in example_ewht.py:
import ewht
# Load data and preprocess
raw = ewht.load_example()
print(raw.head())
mutant mutated_sequence fitness estimated_fitness
0 WT QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0
1 L104V QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0
2 A79V QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0
3 A79V;L104V QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0
4 S77G QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0
POSITIONS = [28, 30, 58, 59, 62, 74, 75, 76, 77, 79, 104]
MUTANTS = ["P", "R", "T", "K", "P", "D", "F", "A", "G", "V", "V"]
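WT_SEQUENCE = (
    "QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPEWMGGIIPIFGTANYAQKFQGRVTITADKSTSTAYMELSSLRSEDTAMYYCAKHMGYQLRETMDVWGQGTTVTVSS"
)
L = len(POSITIONS)
# Build the binary genotype column (arguments as listed under Core API;
# assumed here to return a DataFrame with a new "genotype" column)
df = ewht.genotypes_from_dataframe(raw, POSITIONS, WT_SEQUENCE, MUTANTS)
print(df.head())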
print(f"{df['genotype'].nunique()} unique genotypes, L={L}")
mutant mutated_sequence fitness estimated_fitness genotype
0 WT QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0 00000000000
1 L104V QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0 00000000001
2 A79V QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0 00000000010
3 A79V;L104V QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0 00000000011
4 S77G QVQLVESGAEVKKPGSSVKVSCKASGGTFSSYAISWVRQAPGQGPE... 7.0 0 00000000100
2048 unique genotypes, L=11
with example_msa() as msa_path:
    # Compute ps from MSA
    ps = get_ps(WT_SEQUENCE, POSITIONS, MUTANTS, msa=msa_path)
    plot_ps(ps, OUTPUT_DIR / "ps_from_msa.png")
    # Compute eWHT
    coeffs, center = efwht_from_dataframe(df, ps, basis="eWHT")
    plot_ewht_spectrum(coeffs, L, OUTPUT_DIR / "ewht_spectrum.png", max_order=MAX_ORDER)
    # Sample evolutionary sequences for compressed sensing
    train, test = sample_evolutionary_sequences(
        df,
        ps,
        msa=msa_path,
        positions=POSITIONS,
        wt_sequence=WT_SEQUENCE,
        mutants=MUTANTS,
        fraction=0.75,
        train_n=TRAIN_N,
        random_state=0,
    )
    print(f"train={len(train)}, test={len(test)}")
train=100, test=162
# Run compressed sensing experiment
result = run_cs_experiment(train, test, ps, basis="eWHT", center_by_ps=True, random_state=0)
print(f"best lambda: {result.best_lambda}")
print(f"train R²: {result.train_metrics['r2']:.4f}")
print(f"test R²: {result.test_metrics['r2']:.4f}")
best lambda: 0.005
train R²: 0.9662
test R²: 0.8282
print(f"Figures in {OUTPUT_DIR.resolve()}/")
Run the full example:
python example_ewht.py
Evolutionary mutation probabilities
get_ps estimates per-site mutation probabilities from an MSA or, if no MSA is given, from ESM2-650M (which requires the ewht[esm] extra).
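As a rough intuition for the MSA route, one simple estimator is the fraction of aligned sequences that carry the mutant residue at each position. The sketch below assumes that reading; it is not the implementation of get_ps, which may use sequence weighting, pseudocounts, or other corrections. The helper name `mutation_frequencies` and the toy alignment are hypothetical.

```python
def mutation_frequencies(msa_rows, positions, mutants):
    """Fraction of non-gap MSA rows carrying each mutant residue.

    positions are 1-based column indices into the aligned rows,
    matching the 1-based numbering used in mutant labels such as L104V.
    """
    ps = []
    for pos, mutant in zip(positions, mutants):
        # Collect the residues in this column, ignoring gap characters.
        column = [row[pos - 1] for row in msa_rows if row[pos - 1] != "-"]
        ps.append(column.count(mutant) / len(column) if column else 0.0)
    return ps


# Toy alignment with four sequences of length 4.
toy_msa = ["ACDE", "AGDE", "AGDF", "A-DF"]
ps = mutation_frequencies(toy_msa, positions=[2, 4], mutants=["G", "F"])
```

In this toy alignment, column 2 has three non-gap residues of which two are G, and column 4 has two F out of four, so the estimates are 2/3 and 1/2.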
eWHT spectrum
The forward transform decomposes the centered landscape into Walsh coefficients grouped by interaction order.
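Grouping by interaction order can be sketched as follows, assuming the usual Walsh-Hadamard indexing in which the set bits of a coefficient's index mark the sites that participate in that term. Whether ewht orders its coefficient vector this way is an assumption, and the helper names `order_of` and `power_by_order` are hypothetical.

```python
from collections import defaultdict


def order_of(index):
    """Interaction order: number of set bits in the coefficient index."""
    return bin(index).count("1")


def power_by_order(coeffs):
    """Sum of squared coefficients at each interaction order."""
    power = defaultdict(float)
    for index, value in enumerate(coeffs):
        power[order_of(index)] += value * value
    return dict(power)


# Toy spectrum for L = 2 sites: index 0 is the constant term,
# indices 1 and 2 are single-site terms, index 3 is the pairwise term.
toy_coeffs = [11.0, -3.0, -5.0, 1.0]
spectrum = power_by_order(toy_coeffs)
```

Summing squared coefficients per order gives a compact view of how much of the landscape is additive (order 1) versus epistatic (order 2 and above).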
Core API
| Function | Description |
|---|---|
| efwht_from_dataframe(df, ps) | Forward eWHT from a preprocessed DataFrame |
| efwht(y, ps) | Forward eWHT on a length-2^L landscape vector |
| iefwht(coeffs, ps) | Inverse eWHT (exact round-trip with matching norm) |
| get_ps(sequence, positions, mutants, msa=...) | Per-site mutation probabilities |
| genotypes_from_dataframe(df, positions, wt_sequence, mutants) | Build binary genotype column from sequences |
| sample_evolutionary_sequences(df, ps, ...) | Evolutionary subsampling with optional MSA mask |
| run_cs_experiment(train, test, ps) | Lasso compressed sensing with CV on train |
Genotype encodings
ewht accepts genotypes as:
- Binary strings: "00101" (0 = WT, 1 = mutant)
- Pseudoboolean strings: "1-1-11" (1 = WT, -1 = mutant)
For custom mappings, add a genotype column directly instead of using genotypes_from_dataframe.
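A custom genotype column could be built along these lines. The sketch assumes mutant labels in the format of the example dataset's mutant column ("WT", "S77G", "A79V;L104V") and genotype strings ordered like POSITIONS, as in the Quickstart output. The helper names `label_to_genotype` and `binary_to_pseudoboolean` are hypothetical and not part of ewht.

```python
POSITIONS = [28, 30, 58, 59, 62, 74, 75, 76, 77, 79, 104]


def label_to_genotype(label, positions):
    """Turn a label such as 'A79V;L104V' into a binary genotype string."""
    bits = ["0"] * len(positions)
    if label != "WT":
        for token in label.split(";"):
            # Token layout: wild-type residue, 1-based site number, mutant residue.
            site = int(token[1:-1])
            bits[positions.index(site)] = "1"
    return "".join(bits)


def binary_to_pseudoboolean(genotype):
    """Map a binary string to a list of +1 (WT) and -1 (mutant) values."""
    return [1 if bit == "0" else -1 for bit in genotype]


double_mutant = label_to_genotype("A79V;L104V", POSITIONS)
signs = binary_to_pseudoboolean("00101")
```

A column built this way can be assigned to the DataFrame directly, for example with a pandas `apply` over the mutant column.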
Optional dependencies
| Extra | Packages | Use case |
|---|---|---|
| (default) | numpy, pandas, scipy, scikit-learn | transforms, MSA-based ps, CS |
| ewht[esm] | torch, transformers | ps from ESM2-650M when no MSA is available |
Publishing to PyPI
From a clean checkout of the repository:
# Install build tools
pip install build twine
# Build sdist + wheel (includes bundled example_data/)
python -m build
# Upload to TestPyPI first (recommended)
twine upload --repository testpypi dist/*
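# Verify install (dependencies are resolved from the main index, since TestPyPI may not host them)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ ewht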
# Upload to PyPI
twine upload dist/*
Before the first upload:
- Create accounts on PyPI and TestPyPI.
- Configure an API token: ~/.pypirc or TWINE_USERNAME=__token__ / TWINE_PASSWORD=pypi-....
- Ensure the package name ewht is available on PyPI (or change name in pyproject.toml).
- Bump version in pyproject.toml and ewht/__init__.py for each release.
Development
pip install -e ".[esm]"
pytest tests/ -v -m "not slow"
python example_ewht.py