
Encoder-only protein tagger (SDPA + RoPE + SwiGLU)

STōk: structure tokenizer

Encoder-only protein structure tokenizer built on scaled dot-product attention (SDPA) with RoPE and a SwiGLU MLP, configured via Hydra. The classifier can be tied to a frozen VQ codebook to predict per-residue structure tokens.

install

pip install stok

smoke test

The following smoke test command prints the config and the model parameter count, then runs a tiny forward pass:

stok smoke-test

Config overrides let you run the smoke test against a different model architecture, which is a quick way to confirm that the selected hyperparameters are compatible with each other.

stok smoke-test model.encoder.d_model=512 model.encoder.n_heads=8 model.encoder.n_layers=6
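
As an illustration of what "compatible" can mean here: multi-head attention generally requires d_model to be divisible by n_heads, and RoPE rotates features in pairs, so the per-head dimension is normally even. The sketch below checks both conditions before launching a run. The function check_encoder_dims is hypothetical and not part of stok; the checks stok itself performs may differ.

```python
def check_encoder_dims(d_model: int, n_heads: int) -> int:
    """Return the per-head dimension, or raise ValueError if the sizes clash.

    Hypothetical helper for illustration; not part of the stok package.
    """
    if d_model % n_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by n_heads={n_heads}"
        )
    head_dim = d_model // n_heads
    # RoPE rotates pairs of features, so each head needs an even width.
    if head_dim % 2 != 0:
        raise ValueError(f"head_dim={head_dim} must be even for RoPE")
    return head_dim


# The override values from the command above.
print(check_encoder_dims(512, 8))  # 64
```
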

codebook presets and custom files

By default the model uses the built-in codebook preset base, which corresponds to the codebook used in the Large GCP-VQVAE model. Config overrides can be used to change the codebook.

  • Use a different built-in preset (for example, the codebook used in the Lite GCP-VQVAE model variant):

    stok smoke-test model.codebook.preset=lite
    
  • Use a custom codebook file (overrides preset):

    stok smoke-test model.codebook.path=/abs/path/to/codebook.pt
    

A custom codebook file must contain a PyTorch tensor saved in .pt format with shape [C, d_code], where C is the codebook size and d_code is the codebook dimension. If d_code does not match the encoder model dimension, a linear projection is added to the classifier head automatically.
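
A minimal sketch of producing and checking such a file. The sizes are hypothetical, and a random tensor stands in for a real trained codebook. The last few lines show one plausible way a classifier could be tied to a frozen codebook through a projection when the dimensions differ; this is an assumption made for illustration, not stok's actual implementation.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical sizes; substitute the values of your own codebook and encoder.
C, d_code = 1024, 128
d_model = 512

# A random tensor stands in for a trained VQ codebook of shape [C, d_code].
codebook = torch.randn(C, d_code)
path = os.path.join(tempfile.mkdtemp(), "codebook.pt")
torch.save(codebook, path)

# Reload and validate the format before passing the path to stok.
loaded = torch.load(path, map_location="cpu")
if not (isinstance(loaded, torch.Tensor) and loaded.ndim == 2):
    raise ValueError("expected a 2-D tensor of shape [C, d_code]")
print(tuple(loaded.shape))  # (1024, 128)

# Assumed arrangement: project encoder states to d_code when needed, then
# score each residue against every (frozen) codebook entry.
frozen = loaded.detach()
proj = nn.Linear(d_model, d_code) if d_model != d_code else nn.Identity()

hidden = torch.randn(2, 10, d_model)  # [batch, residues, d_model]
logits = proj(hidden) @ frozen.T      # [batch, residues, C]
print(tuple(logits.shape))  # (2, 10, 1024)
```

The absolute path of the saved file can then be passed as model.codebook.path.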

Configuration fields:

model:
  codebook:
    preset: "base"   # one of: "base", "lite" (default: base)
    path: null       # custom file path; when set, overrides preset
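
The precedence between the two fields can be sketched as follows. The function resolve_codebook and its return format are hypothetical, written only to illustrate the rule that a non-null path overrides preset; stok's internal handling may differ.

```python
from typing import Optional

# Preset names listed in the configuration fields above.
VALID_PRESETS = ("base", "lite")


def resolve_codebook(preset: str = "base", path: Optional[str] = None) -> str:
    """Describe which codebook source would be used (illustrative only)."""
    # A custom file path, when set, takes priority over any preset.
    if path is not None:
        return f"file:{path}"
    if preset not in VALID_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {VALID_PRESETS}")
    return f"preset:{preset}"


print(resolve_codebook())                                   # preset:base
print(resolve_codebook(preset="lite"))                      # preset:lite
print(resolve_codebook(preset="lite", path="/abs/cb.pt"))   # file:/abs/cb.pt
```
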

training

Single‑GPU (quick/dev):

stok train \
  data.train=/abs/path/to/train.csv \
  data.eval=/abs/path/to/eval.csv

Multi‑GPU with Accelerate (spawns one process per GPU):

accelerate launch -m stok.train \
  data.train=/abs/path/to/train.csv \
  data.eval=/abs/path/to/eval.csv

Notes:

  • Verify your setup with:
    accelerate env
    
  • If your default Accelerate config is not set to the number of processes you want (8 in this example), pass it explicitly:
    accelerate launch --num_processes 8 -m stok.train ...
    
  • DataLoader workers are per process. Tune data.num_workers to avoid oversubscription when using many GPUs.
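
Because workers are created per process, the total number of worker processes is num_processes × data.num_workers. The sketch below picks a per-process value that stays within the machine's CPU count. The helper workers_per_process and the choice to reserve one core per training process are assumptions for illustration, not a stok recommendation.

```python
import os
from typing import Optional


def workers_per_process(
    num_processes: int,
    cpu_count: Optional[int] = None,
    reserve: int = 1,
) -> int:
    """Suggest a data.num_workers value that avoids CPU oversubscription.

    Hypothetical helper: splits the CPUs evenly across training processes
    and keeps `reserve` cores per process for the training loop itself.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    per_process = cpus // num_processes - reserve
    return max(per_process, 0)


# Example: 8 GPU processes on a 64-core machine.
n = workers_per_process(num_processes=8, cpu_count=64)
print(n)      # 7
print(n * 8)  # 56 worker processes in total
```

The result can be passed as a config override, for example data.num_workers=7.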
