Encoder-only protein tagger (SDPA + RoPE + SwiGLU)

STōk: structure tokenizer

Encoder-only protein structure tokenizer that uses scaled dot-product attention (SDPA) with rotary position embeddings (RoPE) and a SwiGLU MLP, with configuration managed via Hydra. The classifier head can be tied to a frozen VQ codebook to predict per-residue structure tokens.
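As a minimal sketch of what tying a classifier to a frozen codebook can look like, the snippet below scores each residue's hidden state against every codebook vector and takes the best match as the structure token. The class name TiedCodebookHead, the tensor sizes, and the dot-product scoring are assumptions made for illustration; they are not taken from the stok source, and the real head may be implemented differently. The optional linear layer mirrors the README's note on custom codebooks, where a projection is added if the codebook dimension differs from the encoder dimension.

```python
import torch
import torch.nn as nn


class TiedCodebookHead(nn.Module):
    """Hypothetical classifier head tied to a frozen codebook (not stok's API)."""

    def __init__(self, codebook: torch.Tensor, d_model: int):
        super().__init__()
        d_code = codebook.shape[1]
        # A buffer follows the module across devices but is not a trainable
        # parameter, so the codebook stays frozen during training.
        self.register_buffer("codebook", codebook.detach().clone())
        # Project only when the encoder and codebook dimensions differ.
        self.proj = nn.Identity() if d_code == d_model else nn.Linear(d_model, d_code)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, length, d_model] -> logits: [batch, length, C]
        return self.proj(hidden) @ self.codebook.T


# Illustrative sizes: 16 codes of dimension 8, encoder dimension 12.
head = TiedCodebookHead(torch.randn(16, 8), d_model=12)
logits = head(torch.randn(2, 5, 12))
tokens = logits.argmax(dim=-1)  # one structure token id per residue
print(tuple(logits.shape), tuple(tokens.shape))
```

Because the codebook is stored as a buffer, only the projection (when present) contributes trainable weights to this head.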

install

pip install stok

smoke test

The following command prints the config and the model parameter count, then runs a tiny forward pass:

stok smoke-test

Config overrides let you run the smoke test with a different model architecture. This is a quick way to check that the selected hyperparameters are compatible with each other before committing to them.

stok smoke-test model.encoder.d_model=512 model.encoder.n_heads=8 model.encoder.n_layers=6
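As a rough illustration of what "compatible" can mean here, the sketch below checks two constraints that generally apply to multi-head attention with RoPE: d_model has to split evenly across the heads, and each head's dimension has to be even so that RoPE can rotate features in pairs. The function check_encoder_dims is hypothetical and not part of stok; the checks that stok itself performs may differ.

```python
def check_encoder_dims(d_model: int, n_heads: int) -> int:
    """Return the per-head dimension, or raise ValueError if the sizes clash.

    Hypothetical helper for illustration; not part of the stok package.
    """
    if d_model <= 0 or n_heads <= 0:
        raise ValueError("d_model and n_heads must be positive")
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    head_dim = d_model // n_heads
    if head_dim % 2 != 0:
        raise ValueError(f"head_dim={head_dim} must be even for RoPE")
    return head_dim


# Values taken from the override example above.
print(check_encoder_dims(512, 8))  # → 64
```

A pre-check like this only catches dimension mismatches; the smoke test remains the way to confirm that a full configuration actually builds and runs.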

codebook presets and custom files

By default the model uses the built-in codebook preset base, which corresponds to the codebook used in the Large GCP-VQVAE model. Config overrides can be used to change the codebook.

  • Use a different built-in preset (for example, the codebook used in the Lite GCP-VQVAE model variant):

    stok smoke-test model.codebook.preset=lite
    
  • Use a custom codebook file (overrides preset):

    stok smoke-test model.codebook.path=/abs/path/to/codebook.pt
    

A custom codebook file must contain a PyTorch tensor of shape [C, d_code] saved in .pt format, where C is the codebook size and d_code is the codebook dimension. If d_code does not match the encoder model dimension, a linear projection is automatically added to the classifier head.
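A file in this format can be produced with torch.save. The sketch below writes a stand-in tensor and reloads it to confirm the shape; the sizes, the random values, and the temporary path are placeholders chosen for illustration, and a real codebook would hold the learned code vectors instead.

```python
import os
import tempfile

import torch

# Illustrative sizes only; substitute the sizes of your own codebook.
num_codes, d_code = 512, 64
codebook = torch.randn(num_codes, d_code)  # random stand-in for real code vectors

path = os.path.join(tempfile.mkdtemp(), "codebook.pt")
torch.save(codebook, path)  # saves the bare tensor, not a dict or state_dict

# Reload and verify that the file holds a 2-D tensor of shape [C, d_code].
loaded = torch.load(path)
assert isinstance(loaded, torch.Tensor)
assert loaded.ndim == 2
print(tuple(loaded.shape))
```

The resulting absolute path is what would be passed as model.codebook.path.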

Configuration fields:

model:
  codebook:
    preset: "base"   # one of: "base", "lite" (default: base)
    path: null       # custom file path; when set, overrides preset
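The precedence between the two fields can be sketched in a few lines of Python. The helper resolve_codebook and its return format are hypothetical and only mirror the rule stated in the config comments (a set path wins over the preset); they are not stok's actual loading code.

```python
from typing import Optional, Tuple

BUILT_IN_PRESETS = ("base", "lite")  # preset names listed in the config comment


def resolve_codebook(preset: str = "base", path: Optional[str] = None) -> Tuple[str, str]:
    """Return ("path", file) when a custom file is set, else ("preset", name).

    Hypothetical helper for illustration; not part of the stok package.
    """
    if path is not None:
        # A custom file overrides whatever preset is configured.
        return ("path", path)
    if preset not in BUILT_IN_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {BUILT_IN_PRESETS}")
    return ("preset", preset)


print(resolve_codebook())
print(resolve_codebook(preset="lite"))
print(resolve_codebook(preset="lite", path="/abs/path/to/codebook.pt"))
```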
