Encoder-only protein tagger (SDPA + RoPE + SwiGLU)

STōk: structure tokenizer

Encoder-only protein structure tokenizer built on scaled dot-product attention (SDPA) with rotary position embeddings (RoPE) and a SwiGLU MLP, with configuration managed via Hydra. The classifier can be tied to a frozen VQ codebook to predict per-residue structure tokens.

install

pip install stok

smoke test

The following smoke test command prints the config and the model parameter count, then runs a tiny forward pass:

stok smoke-test

Config overrides let you run the smoke test against a different model architecture. This is a quick way to confirm that a chosen set of hyperparameters is compatible before starting a training run.

stok smoke-test model.encoder.d_model=512 model.encoder.n_heads=8 model.encoder.n_layers=6
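
As a rough illustration of what "compatible" usually means for this kind of encoder: multi-head attention splits d_model evenly across heads, and rotary embeddings rotate pairs of channels, so the per-head dimension is normally expected to be even. The sketch below parses Hydra-style key=value overrides and checks those two conditions. The helper names (parse_overrides, check_encoder_shape) are hypothetical and not part of stok, the parser is far simpler than Hydra's, and the checks stok actually performs may differ.

```python
def parse_overrides(args):
    """Turn ["a.b.c=1", ...] into a nested dict (simplified; not Hydra's parser)."""
    config = {}
    for arg in args:
        dotted_key, _, raw_value = arg.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        # Only plain non-negative integers are converted; everything else stays a string.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return config


def check_encoder_shape(d_model, n_heads):
    """Return a list of problems; an empty list means the shape looks consistent."""
    problems = []
    if d_model % n_heads != 0:
        problems.append(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    elif (d_model // n_heads) % 2 != 0:
        problems.append(
            f"head dimension {d_model // n_heads} is odd; RoPE rotates channel pairs"
        )
    return problems


overrides = [
    "model.encoder.d_model=512",
    "model.encoder.n_heads=8",
    "model.encoder.n_layers=6",
]
encoder = parse_overrides(overrides)["model"]["encoder"]
print(check_encoder_shape(encoder["d_model"], encoder["n_heads"]))  # prints []
```

With d_model=512 and n_heads=8 the head dimension is 64, so both checks pass. The smoke test remains the authoritative check, since it builds the real model.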

codebook presets and custom files

By default the model uses the built-in codebook preset base, which corresponds to the codebook used in the Large GCP-VQVAE model. Config overrides can be used to change the codebook.

  • Use a different built-in preset (for example, the codebook used in the Lite GCP-VQVAE model variant):

    stok smoke-test model.codebook.preset=lite
    
  • Use a custom codebook file (overrides preset):

    stok smoke-test model.codebook.path=/abs/path/to/codebook.pt
    

A custom codebook file must be a PyTorch tensor saved in .pt format with shape [C, d_code], where C is the codebook size and d_code is the codebook dimension. If d_code does not match the encoder model dimension (d_model), a linear projection is automatically added to the classifier head.
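
A minimal sketch of producing a file that meets these requirements, assuming PyTorch is installed. The sizes (C = 4096, d_code = 256), the random values, and the temporary file location are placeholders for illustration; a real codebook would come from a trained VQ-VAE. The TiedHead class is a hypothetical illustration of how a frozen codebook and an optional projection can form a classifier head. It is not stok's implementation.

```python
import os
import tempfile

import torch
import torch.nn as nn

C, d_code = 4096, 256  # placeholder codebook size and code dimension
codebook = torch.randn(C, d_code)  # stand-in for trained VQ codebook vectors

path = os.path.join(tempfile.mkdtemp(), "codebook.pt")
torch.save(codebook, path)  # save the bare tensor, not a state dict

loaded = torch.load(path)
assert loaded.ndim == 2, "expected shape [C, d_code]"


class TiedHead(nn.Module):
    """Hypothetical head: logits are dot products with frozen codebook rows."""

    def __init__(self, d_model, codebook):
        super().__init__()
        d_code = codebook.shape[1]
        # Project only when the encoder width differs from the code dimension.
        self.proj = nn.Identity() if d_model == d_code else nn.Linear(d_model, d_code)
        # A buffer is saved with the module but is not a trainable parameter.
        self.register_buffer("codebook", codebook)

    def forward(self, hidden):  # hidden: [batch, length, d_model]
        return self.proj(hidden) @ self.codebook.T  # [batch, length, C]


head = TiedHead(d_model=512, codebook=loaded)
logits = head(torch.randn(2, 10, 512))
print(logits.shape)  # torch.Size([2, 10, 4096])
```

Here d_model (512) differs from d_code (256), so the sketch inserts a linear projection before the dot product with the codebook. When the two dimensions match, the projection is skipped.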

Configuration fields:

model:
  codebook:
    preset: "base"   # one of: "base", "lite" (default: base)
    path: null       # custom file path; when set, overrides preset
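
The precedence between the two fields can be sketched as follows. resolve_codebook_source is a hypothetical helper written for illustration, not a function exported by stok; it only mirrors the rule stated above (a non-null path wins, otherwise the preset is used).

```python
KNOWN_PRESETS = ("base", "lite")  # preset names listed in the config comment


def resolve_codebook_source(preset="base", path=None):
    """Return ("path", value) or ("preset", value) following the documented precedence."""
    if path is not None:
        return ("path", path)  # a custom file overrides any preset
    if preset not in KNOWN_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {KNOWN_PRESETS}")
    return ("preset", preset)


print(resolve_codebook_source())               # ('preset', 'base')
print(resolve_codebook_source(preset="lite"))  # ('preset', 'lite')
print(resolve_codebook_source(preset="lite", path="/abs/path/to/codebook.pt"))
# ('path', '/abs/path/to/codebook.pt')
```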

training

Single‑GPU (quick/dev):

stok train \
  data.train=/abs/path/to/train.csv \
  data.eval=/abs/path/to/eval.csv

Multi‑GPU with Accelerate (spawns one process per GPU):

accelerate launch -m stok.train \
  data.train=/abs/path/to/train.csv \
  data.eval=/abs/path/to/eval.csv

Notes:

  • Verify your setup with:
    accelerate env
    
  • If your default Accelerate config does not already specify the number of processes you want (8 in this example), pass it explicitly:
    accelerate launch --num_processes 8 -m stok.train ...
    
  • DataLoader workers are per process. Tune data.num_workers to avoid oversubscription when using many GPUs.
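
Because each process creates its own DataLoader workers, the total number of worker processes is the number of training processes multiplied by data.num_workers. One rough way to pick a per-process value is to divide the available CPU cores among the training processes and reserve one core per process for the main training loop. This heuristic and the helper name suggest_num_workers are assumptions for illustration, not a recommendation from stok.

```python
import os


def suggest_num_workers(num_processes, cpu_count=None):
    """Split CPU cores across training processes, reserving one core per process."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    per_process = cpu_count // num_processes - 1  # keep one core for the main loop
    return max(per_process, 0)  # 0 means data is loaded in the main process


# Example: 8 training processes on a 64-core machine.
workers = suggest_num_workers(num_processes=8, cpu_count=64)
print(workers)      # 7
print(workers * 8)  # 56 worker processes in total
```

The result could then be passed on the command line as data.num_workers=7.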
