Encoder-only protein tagger (SDPA + RoPE + SwiGLU)

STōk: structure tokenizer

STōk is an encoder-only protein structure tokenizer that uses SDPA attention with RoPE and a SwiGLU MLP, with configuration managed via Hydra. The classifier can be tied to a frozen VQ codebook for per-residue structure tokens.

install

pip install stok

smoke test

The following smoke test command prints the config and the model parameter count, then runs a tiny forward pass:

stok smoke-test

Config overrides let you run the smoke test against a different model architecture, which is a quick way to confirm that a set of hyperparameters is compatible before training:

stok smoke-test model.encoder.d_model=512 model.encoder.n_heads=8 model.encoder.n_layers=6
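As a rough illustration of what "compatible" can mean here, the sketch below checks two constraints that are typical for multi-head attention with RoPE: d_model must divide evenly across heads, and the per-head dimension must be even so that rotary pairs line up. The function name check_encoder_dims is hypothetical, and whether stok enforces exactly these checks is an assumption; the smoke test remains the authoritative check.

```python
def check_encoder_dims(d_model: int, n_heads: int) -> int:
    """Return the per-head dimension, or raise ValueError if the sizes clash.

    Hypothetical helper: these are common constraints for multi-head
    attention with RoPE, not checks taken from the stok source.
    """
    if d_model <= 0 or n_heads <= 0:
        raise ValueError("d_model and n_heads must be positive")
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    head_dim = d_model // n_heads
    if head_dim % 2 != 0:
        # RoPE rotates features in pairs, so each head needs an even width.
        raise ValueError(f"head_dim={head_dim} must be even for RoPE")
    return head_dim


# Values from the override example above: 512 / 8 gives 64 features per head.
print(check_encoder_dims(512, 8))
```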

codebook presets and custom files

By default the model uses the built-in codebook preset base, which corresponds to the codebook used in the Large GCP-VQVAE model. Config overrides can be used to change the codebook.

  • Use a different built-in preset (for example, the codebook used in the Lite GCP-VQVAE model variant):

    stok smoke-test model.codebook.preset=lite
    
  • Use a custom codebook file (overrides preset):

    stok smoke-test model.codebook.path=/abs/path/to/codebook.pt
    

A custom codebook file must contain a single PyTorch tensor saved in .pt format with shape [C, d_code], where C is the codebook size and d_code is the codebook dimension. If d_code does not match the encoder model dimension, a linear projection is automatically added to the classifier head.
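A minimal sketch of producing a file in that format, and of how a projection can bridge a dimension mismatch, might look like the following. The helper names, the tiny sizes, and the random values are placeholders. The direction of the projection (d_model to d_code) and the dot-product scoring are assumptions made for illustration, not a description of stok's classifier head.

```python
import os
import tempfile

import torch


def make_codebook_file(path: str, num_codes: int, d_code: int) -> None:
    """Save a [C, d_code] tensor to a .pt file (random placeholder values)."""
    codebook = torch.randn(num_codes, d_code)
    torch.save(codebook, path)


def codebook_logits(hidden, codebook, proj=None):
    """Score each residue embedding against every code (assumed dot product)."""
    if proj is not None:
        hidden = proj(hidden)      # [B, L, d_model] -> [B, L, d_code]
    return hidden @ codebook.T     # [B, L, d_code] @ [d_code, C] -> [B, L, C]


path = os.path.join(tempfile.mkdtemp(), "codebook.pt")
make_codebook_file(path, num_codes=16, d_code=8)

codebook = torch.load(path)        # a plain tensor of shape [C, d_code]
d_model = 32                       # hypothetical encoder width != d_code
proj = torch.nn.Linear(d_model, codebook.shape[1], bias=False)

hidden = torch.randn(2, 10, d_model)   # [batch, residues, d_model]
logits = codebook_logits(hidden, codebook, proj)
print(tuple(codebook.shape), tuple(logits.shape))
```

The resulting file path can then be passed as model.codebook.path, as in the override shown earlier.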

Configuration fields:

model:
  codebook:
    preset: "base"   # one of: "base", "lite" (default: base)
    path: null       # custom file path; when set, overrides preset
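The precedence between the two fields can be sketched in a few lines. This is an illustrative model of the documented behavior (path wins when set, otherwise the preset is used); the function name, the returned strings, and the validation step are hypothetical and not taken from the stok source.

```python
from typing import Optional

# Preset names listed in the configuration comment above.
BUILTIN_PRESETS = ("base", "lite")


def resolve_codebook_source(preset: str = "base", path: Optional[str] = None) -> str:
    """Return a label for the codebook that would be loaded (hypothetical helper)."""
    if path is not None:
        # A custom file overrides whatever preset is configured.
        return f"file:{path}"
    if preset not in BUILTIN_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {BUILTIN_PRESETS}")
    return f"preset:{preset}"


print(resolve_codebook_source())
print(resolve_codebook_source(preset="lite"))
print(resolve_codebook_source(preset="lite", path="/abs/path/to/codebook.pt"))
```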

training

Single‑GPU (quick/dev):

stok train \
  data.train=/abs/path/to/train.csv \
  data.eval=/abs/path/to/eval.csv

Multi‑GPU with Accelerate (spawns one process per GPU):

accelerate launch -m stok.train \
  data.train=/abs/path/to/train.csv \
  data.eval=/abs/path/to/eval.csv

Notes:

  • Verify your setup with:
    accelerate env
    
  • To override the process count in your default Accelerate config (for example, on an 8-GPU node), pass --num_processes explicitly:
    accelerate launch --num_processes 8 -m stok.train ...
    
  • DataLoader workers are per process. Tune data.num_workers to avoid oversubscription when using many GPUs.
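Because the worker count is multiplied by the number of training processes, a quick calculation helps when picking data.num_workers. The helper below is a hypothetical heuristic, not part of stok: it splits the CPU cores evenly across processes and reserves one core per process for the training loop itself.

```python
import os
from typing import Optional


def workers_per_process(num_processes: int, cpu_count: Optional[int] = None) -> int:
    """Suggest a per-process DataLoader worker count (heuristic, hypothetical)."""
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    cores_per_process = cpu_count // num_processes
    # Keep one core per process free for the main training loop.
    return max(0, cores_per_process - 1)


# Example: 8 GPUs on a 64-core machine.
n = workers_per_process(num_processes=8, cpu_count=64)
print(n, "workers per process,", n * 8, "worker processes in total")
```

The suggested value can be passed as an override, for example data.num_workers=7 for the case above. Treat it as a starting point and adjust based on observed CPU utilization.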
