Encoder-only protein tagger (SDPA + RoPE + SwiGLU)

STōk: structure tokenizer

Encoder-only protein structure tokenizer using SDPA attention with RoPE and a SwiGLU MLP, with configuration managed via Hydra. The classifier can be tied to a frozen VQ codebook to predict per-residue structure tokens.

install

pip install stok

smoke test

The following smoke test command prints the config and the model parameter count, then runs a tiny forward pass:

stok smoke-test

Config overrides let you run the smoke test with a different model architecture, which is a quick way to check that the selected hyperparameters are compatible.

stok smoke-test model.encoder.d_model=512 model.encoder.n_heads=8 model.encoder.n_layers=6
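
As a rough illustration of what "compatible" usually means for this kind of encoder: multi-head attention generally needs d_model to split evenly across n_heads, and RoPE rotates features in pairs, so the per-head dimension is typically even. The sketch below checks those two conventional constraints. check_encoder_dims is a hypothetical helper, not part of stok, and the package may enforce different or additional rules.

```python
def check_encoder_dims(d_model: int, n_heads: int) -> int:
    """Return the per-head dimension, or raise ValueError if the sizes clash.

    Hypothetical helper for illustration only; not part of the stok API.
    """
    if d_model <= 0 or n_heads <= 0:
        raise ValueError("d_model and n_heads must be positive")
    # Each attention head gets an equal slice of the model dimension.
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    head_dim = d_model // n_heads
    # RoPE rotates features two at a time, so each head needs an even width.
    if head_dim % 2 != 0:
        raise ValueError(f"head_dim={head_dim} must be even for RoPE")
    return head_dim


# Values taken from the override example above.
print(check_encoder_dims(d_model=512, n_heads=8))
```

Running a check like this before launching the smoke test can catch an invalid combination (for example d_model=512 with n_heads=7) without building the model.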

codebook presets and custom files

By default the model uses the built-in codebook preset base, which corresponds to the codebook used in the Large GCP-VQVAE model. Config overrides can be used to change the codebook.

  • Use a different built-in preset (for example, the codebook used in the Lite GCP-VQVAE model variant):

    stok smoke-test model.codebook.preset=lite
    
  • Use a custom codebook file (overrides preset):

    stok smoke-test model.codebook.path=/abs/path/to/codebook.pt
    

A custom codebook file must be a PyTorch tensor of shape [C, d_code] saved in .pt format, where C is the codebook size and d_code is the codebook dimension. If d_code does not match the encoder model dimension, a linear projection is automatically added to the classifier head.
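
A minimal sketch of producing a file in that format, assuming PyTorch is installed. The sizes and random values are placeholders (a real codebook would come from a trained VQ model), and the projection at the end only illustrates one plausible way a tied classifier head could handle a dimension mismatch; it is an assumption, not stok's actual implementation.

```python
import os
import tempfile

import torch

C, d_code, d_model = 8, 4, 6  # small, made-up sizes for illustration

# Placeholder values; a real codebook would come from a trained VQ model.
codebook = torch.randn(C, d_code)

# Save the bare tensor (not a dict or state_dict) to a .pt file.
path = os.path.join(tempfile.mkdtemp(), "codebook.pt")
torch.save(codebook, path)

# Reload and confirm the shape is [C, d_code].
loaded = torch.load(path)
assert loaded.ndim == 2 and loaded.shape == (C, d_code)

# Assumed arrangement: project encoder features into codebook space,
# then score every residue against every codebook entry.
proj = torch.nn.Linear(d_model, d_code, bias=False)
hidden = torch.randn(2, 5, d_model)   # (batch, residues, d_model)
logits = proj(hidden) @ loaded.T      # (batch, residues, C)
print(tuple(logits.shape))
```

The resulting file can then be passed as an absolute path through the model.codebook.path override shown above.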

Configuration fields:

model:
  codebook:
    preset: "base"   # one of: "base", "lite" (default: base)
    path: null       # custom file path; when set, overrides preset
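
The precedence between the two fields can be sketched as follows. resolve_codebook_source and BUILTIN_PRESETS are hypothetical names used only to illustrate the rule stated in the config comments (a set path overrides the preset); how stok resolves these fields internally is not shown in this document.

```python
from typing import Optional

# Preset names listed in the config comment above.
BUILTIN_PRESETS = ("base", "lite")


def resolve_codebook_source(preset: str = "base", path: Optional[str] = None) -> str:
    """Describe which codebook would be used for a given config.

    Hypothetical helper for illustration only; not part of the stok API.
    """
    # A custom file path, when set, takes priority over any preset.
    if path is not None:
        return f"file:{path}"
    if preset not in BUILTIN_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {BUILTIN_PRESETS}")
    return f"preset:{preset}"


print(resolve_codebook_source())                                # default config
print(resolve_codebook_source(preset="lite"))                   # built-in preset
print(resolve_codebook_source("lite", "/abs/path/to/codebook.pt"))  # path wins
```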
