Encoder-only protein tagger (SDPA + RoPE + SwiGLU)
STōk: structure tokenizer
Encoder-only protein structure tokenizer using SDPA attention with RoPE and a SwiGLU MLP, managed via Hydra. The classifier can be tied to a frozen VQ codebook for per-residue structure tokens.
install
pip install stok
smoke test
The smoke test prints the config and the model parameter count, then runs a tiny forward pass:
stok smoke-test
Config overrides let you smoke-test a different model architecture, which is a quick way to confirm that the selected hyperparameters are compatible:
stok smoke-test model.encoder.d_model=512 model.encoder.n_heads=8 model.encoder.n_layers=6
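As a rough illustration of what "compatible" can mean here, the sketch below checks two constraints that are common to multi-head attention with RoPE: d_model must split evenly across the heads, and each head's dimension must be even so that rotary pairs line up. Whether stok enforces exactly these checks is an assumption, and check_encoder_dims is a hypothetical helper, not part of the package.

```python
def check_encoder_dims(d_model: int, n_heads: int) -> int:
    """Return the per-head dimension, or raise ValueError if the sizes clash.

    Hypothetical helper for illustration; not part of the stok API.
    """
    if d_model <= 0 or n_heads <= 0:
        raise ValueError("d_model and n_heads must be positive")
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    head_dim = d_model // n_heads
    # RoPE rotates features in pairs, so each head needs an even dimension.
    if head_dim % 2 != 0:
        raise ValueError(f"head_dim={head_dim} must be even for RoPE")
    return head_dim


# The override values from the command above: 512 / 8 heads.
print(check_encoder_dims(512, 8))  # 64
```

The smoke test remains the authoritative check, since it builds the real model and runs a forward pass.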
codebook presets and custom files
By default the model uses the built-in codebook preset base, which corresponds to the codebook used in the Large GCP-VQVAE model. Config overrides can be used to change the codebook.
- Use a different built-in preset (for example, the codebook used in the Lite GCP-VQVAE model variant):
stok smoke-test model.codebook.preset=lite
- Use a custom codebook file (overrides preset):
stok smoke-test model.codebook.path=/abs/path/to/codebook.pt
A custom codebook file must be a PyTorch tensor saved in .pt format with shape [C, d_code], where C is the codebook size and d_code is the codebook dimension. If d_code does not match the encoder model dimension, a linear projection is added to the classifier head automatically.
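A minimal sketch of producing a file in that format is shown below. The sizes and the random values are placeholders chosen for illustration; a real codebook would come from a trained VQ model, and the output file name is arbitrary.

```python
import os
import tempfile

import torch

# Illustrative sizes only; these are not the values of the built-in presets.
C, d_code = 512, 64

# Stand-in for real codebook vectors: one row per code, shape [C, d_code].
codebook = torch.randn(C, d_code)

# Save the bare tensor (not a state dict) to a .pt file.
path = os.path.join(tempfile.mkdtemp(), "codebook.pt")
torch.save(codebook, path)

# Reload and confirm the shape before pointing model.codebook.path at the file.
loaded = torch.load(path)
assert loaded.ndim == 2, "expected a 2-D tensor of shape [C, d_code]"
print(tuple(loaded.shape))
```

The resulting absolute path can then be passed as model.codebook.path=... on the command line.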
Configuration fields:
model:
  codebook:
    preset: "base"  # one of: "base", "lite" (default: base)
    path: null      # custom file path; when set, overrides preset
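The precedence between the two fields can be sketched as follows. This only mirrors the rule stated in the config comments (a non-null path wins over preset); resolve_codebook and its return strings are hypothetical and do not reflect how stok loads codebooks internally.

```python
from typing import Optional

# Preset names listed in the config comments above.
KNOWN_PRESETS = ("base", "lite")


def resolve_codebook(preset: str = "base", path: Optional[str] = None) -> str:
    """Describe which codebook source the two config fields select.

    Hypothetical helper for illustration; not part of the stok API.
    """
    if path is not None:
        # A custom file overrides whatever preset is configured.
        return f"file:{path}"
    if preset not in KNOWN_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {KNOWN_PRESETS}")
    return f"preset:{preset}"


print(resolve_codebook())                                    # preset:base
print(resolve_codebook(preset="lite"))                       # preset:lite
print(resolve_codebook(preset="lite", path="/tmp/cb.pt"))    # file:/tmp/cb.pt
```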
training
Single‑GPU (quick/dev):
stok train \
data.train=/abs/path/to/train.csv \
data.eval=/abs/path/to/eval.csv
Multi‑GPU with Accelerate (spawns one process per GPU):
accelerate launch -m stok.train \
data.train=/abs/path/to/train.csv \
data.eval=/abs/path/to/eval.csv
Notes:
- Verify your setup with:
  accelerate env
- If your default Accelerate config is not set to 8 processes, you can pass:
  accelerate launch --num_processes 8 -m stok.train ...
- DataLoader workers are per process. Tune data.num_workers to avoid oversubscription when using many GPUs.
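Because each process starts its own DataLoader workers, the total worker count is the number of processes multiplied by data.num_workers. The sketch below turns that into a rough starting value; the rule of reserving one core per process for the training loop is an assumed heuristic, and workers_per_process is a hypothetical helper, not something stok provides.

```python
import os
from typing import Optional


def workers_per_process(num_processes: int,
                        cpu_count: Optional[int] = None,
                        reserve: int = 1) -> int:
    """Suggest a per-process worker count that keeps total workers within the CPU budget.

    Hypothetical helper; the `reserve` cores per process are an assumed rule of thumb.
    """
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Split cores evenly across processes, then hold some back for the main loop.
    return max(0, cpus // num_processes - reserve)


# Example: 8 GPU processes on a 64-core machine.
print(workers_per_process(8, cpu_count=64))  # 7
```

The result would be passed as an override, for example data.num_workers=7, and adjusted from there based on observed data-loading throughput.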