Procustes

procustes truncates a protein around a ligand using cutoff-based residue selection.
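As a rough illustration of cutoff-based residue selection, the sketch below keeps every residue that has at least one atom within the cutoff of any ligand atom, or only residues whose alpha carbon qualifies when CA-only mode is on. The `Atom` tuple, the `select_residues` function, and the inclusive `<=` boundary are assumptions made for this example, not procustes internals.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    resid: int
    name: str                          # atom name, e.g. "CA"
    xyz: tuple[float, float, float]    # coordinates in angstrom


def select_residues(protein, ligand, cutoff=4.0, ca_only=False):
    """Return sorted (chain, resid) keys of residues near any ligand atom."""
    kept = set()
    for atom in protein:
        if ca_only and atom.name != "CA":
            continue  # CA-only mode ignores every other atom
        # Assumption: the boundary is inclusive (distance <= cutoff).
        if any(math.dist(atom.xyz, lig.xyz) <= cutoff for lig in ligand):
            kept.add((atom.chain, atom.resid))
    return sorted(kept)


protein = [
    Atom("A", 10, "CA", (0.0, 0.0, 0.0)),
    Atom("A", 10, "CB", (1.5, 0.0, 0.0)),
    Atom("A", 11, "CA", (6.0, 0.0, 0.0)),
    Atom("A", 11, "OG", (3.9, 0.0, 0.0)),
    Atom("A", 12, "CA", (12.0, 0.0, 0.0)),
]
ligand = [Atom("X", 1, "C1", (0.0, 0.0, 0.0))]

any_atom = select_residues(protein, ligand)               # residues 10 and 11
ca_atoms = select_residues(protein, ligand, ca_only=True)  # residue 10 only
```

Residue 11 shows the difference between the two modes: its side-chain atom lies inside the cutoff while its alpha carbon does not.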

CLI

procustes INPUT_STRUCTURE OUTPUT_DIR [options]

Required positional arguments:

  • INPUT_STRUCTURE: input .pdb or .cif containing protein + ligand
  • OUTPUT_DIR: base output directory where all outputs are written

Options:

  • --ligand: ligand residue name (default: LIG)
  • --cutoff: cutoff distance in angstrom (default: 4.0)
  • --ca: use only alpha-carbon distances (default: use any residue atom)
  • --fill-gaplength: internal removed gaps shorter than this value are restored to original residues; gaps at or above it are considered for alanine-based filling (default: 4)
  • --extra-residues: comma-separated extra protein residues to force-keep before gap logic (RESID for single-chain inputs, CHAIN:RESID for multi-chain; spaces/trailing commas are accepted)
  • --nofill: disable long-gap filling
  • --caps: add ACE/NME caps to all no-fill biopolymer chains (requires --nofill)
  • --fill-method: filling backend: pdbfixer or boltz (default: pdbfixer)
  • --fill-models-count: number of Boltz fill candidates per cutoff (default: 3, max: 20)
  • --aa-length: residue spacing used to estimate minimum bridge alanines from terminal CA distance (default: 4.0)
  • --boltz-cache: optional Boltz cache path
  • --boltz-diffusion-samples: diffusion samples passed to boltz predict (default: 1)
  • --boltz-devices: device count passed to boltz predict (default: 1)
  • --boltz-accelerator: accelerator passed to boltz predict: cpu, gpu, tpu (default: gpu)
  • --boltz-use-msa-server: pass --use_msa_server to boltz predict
  • --no-boltz-potentials: disable Boltz --use_potentials (enabled by default)
  • --boltz-template-threshold: template force threshold written in Boltz YAML (default: 0.1)
  • --color: colorized progress output mode: auto (default), always, never
  • --quiet: disable progress output
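The --extra-residues format (RESID or CHAIN:RESID, with tolerated spaces and trailing commas) can be sketched as a small parser. `parse_extra_residues` and its `multi_chain` flag are hypothetical names for this example; how procustes actually reports malformed tokens is not documented here.

```python
def parse_extra_residues(text, multi_chain=False):
    """Parse 'RESID' or 'CHAIN:RESID' tokens into (chain, resid) tuples."""
    parsed = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and stray spaces
        if ":" in token:
            chain, resid = token.split(":", 1)
            parsed.append((chain.strip(), int(resid)))
        elif multi_chain:
            # Assumption: a bare RESID is ambiguous when several chains exist.
            raise ValueError(f"multi-chain input needs CHAIN:RESID, got {token!r}")
        else:
            parsed.append((None, int(token)))
    return parsed


single = parse_extra_residues("12, 45 ,")
multi = parse_extra_residues("A:12, B:7,", multi_chain=True)
```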

Outputs are written directly under OUTPUT_DIR:

OUTPUT_DIR/
  _boltz/
    cutoff_<cutoff>_template.pdb
    cutoff_<cutoff>_<candidate>/
      <job>.yaml
      predictions/...
  a<cutoff>truncated.pdb
  b<cutoff>truncated.pdb
  ...
  <cutoff>truncated.pdb
  summary.json

OUTPUT_DIR is created if missing, but only when its parent directory already exists. If the parent path does not exist, procustes fails with an error.
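This behavior matches what `pathlib.Path.mkdir` does with `parents=False` and `exist_ok=True`. The sketch below reproduces it in isolation; `prepare_output_dir` is a hypothetical helper, and whether procustes uses pathlib internally is an assumption.

```python
import tempfile
from pathlib import Path


def prepare_output_dir(path):
    """Create `path` if missing; fail when its parent does not exist."""
    out = Path(path)
    # parents=False raises FileNotFoundError if the parent directory is absent;
    # exist_ok=True makes an already existing directory a no-op.
    out.mkdir(parents=False, exist_ok=True)
    return out


with tempfile.TemporaryDirectory() as tmp:
    created = prepare_output_dir(Path(tmp) / "run1").is_dir()
    try:
        prepare_output_dir(Path(tmp) / "missing" / "run2")
        parent_missing_failed = False
    except FileNotFoundError:
        parent_missing_failed = True
```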

summary.json is written once per run and includes run parameters (including extra_residues_requested) plus a cutoffs array (single entry) with residue counts, candidate scores, winning candidate metadata, and extra_residues_applied.

When --nofill is set, Boltz is skipped and only <cutoff>truncated.pdb is written.

When both --nofill and --caps are set, every resulting protein chain is capped with ACE and NME, chain IDs are reassigned deterministically starting at A, and small-molecule binder chain IDs are reassigned from X to avoid collisions.

If --nofill is set, custom fill arguments (--fill-models-count, --aa-length, or any --boltz-* option) raise an error.

If --caps is set without --nofill, procustes raises an error.

If --fill-method pdbfixer is selected, any --boltz-* options raise an error.
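The three error rules above can be expressed as a post-parse validation step. The sketch below uses argparse with only a subset of the options, and it sets `default=None` on the fill options so that "explicitly passed" can be told apart from "left at the default". The function names and the use of `ValueError` are assumptions for illustration; the real CLI may report these errors differently.

```python
import argparse


def build_parser():
    # Only the options relevant to the rules above; positionals are omitted.
    parser = argparse.ArgumentParser(prog="procustes")
    parser.add_argument("--nofill", action="store_true")
    parser.add_argument("--caps", action="store_true")
    parser.add_argument("--fill-method", choices=["pdbfixer", "boltz"], default="pdbfixer")
    # default=None distinguishes "not passed" from "passed with the default value".
    parser.add_argument("--fill-models-count", type=int, default=None)
    parser.add_argument("--aa-length", type=float, default=None)
    parser.add_argument("--boltz-cache", default=None)
    parser.add_argument("--boltz-devices", type=int, default=None)
    return parser


def validate(args):
    """Raise ValueError for the option combinations listed as errors."""
    boltz_given = [k for k, v in vars(args).items() if k.startswith("boltz_") and v is not None]
    fill_given = boltz_given + [
        k for k in ("fill_models_count", "aa_length") if getattr(args, k) is not None
    ]
    if args.caps and not args.nofill:
        raise ValueError("--caps requires --nofill")
    if args.nofill and fill_given:
        raise ValueError(f"--nofill conflicts with fill options: {fill_given}")
    if args.fill_method == "pdbfixer" and boltz_given:
        raise ValueError(f"--fill-method pdbfixer conflicts with: {boltz_given}")


def is_rejected(argv):
    """Return True when the argument list violates one of the rules."""
    try:
        validate(build_parser().parse_args(argv))
    except ValueError:
        return True
    return False
```

For example, `is_rejected(["--caps"])` is true, while `is_rejected(["--nofill", "--caps"])` is false.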

Final output normalization is always applied to <cutoff>truncated.pdb: ligand/small-molecule residues are written first, small-molecule chain IDs are assigned from X, and biopolymer chains are assigned from A to avoid chain-ID collisions.
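A minimal sketch of this chain-ID scheme follows. It assumes that small-molecule IDs continue X, Y, Z and that biopolymer IDs run from A up to W so the two ranges never overlap; the source only states the starting letters, so the continuation and the `assign_chain_ids` helper are hypothetical.

```python
import string


def assign_chain_ids(chains):
    """Map (old_id, kind) pairs to (old_id, kind, new_id), ligands first.

    kind is "ligand" or "polymer". Raises StopIteration if more than
    3 ligand chains or 23 polymer chains are supplied.
    """
    ligand_ids = iter("XYZ")                         # assumed continuation after X
    polymer_ids = iter(string.ascii_uppercase[:23])  # A..W, stops before X
    ligands = [(old, kind, next(ligand_ids)) for old, kind in chains if kind == "ligand"]
    polymers = [(old, kind, next(polymer_ids)) for old, kind in chains if kind == "polymer"]
    return ligands + polymers  # ligand records are written first


normalized = assign_chain_ids([("B", "polymer"), ("L", "ligand"), ("C", "polymer")])
```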

During CLI execution, procustes prints per-cutoff stage logs (residue selection, detected gap ranges/lengths, Boltz command invocation, candidate scores) plus final summaries with kept residues, alanine-filled residues, elapsed time, and output file path.
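The gap ranges and lengths reported in these logs follow from the --fill-gaplength rule: internal gaps shorter than the threshold are restored, and gaps at or above it become fill candidates. The sketch below illustrates that split for a single chain. The bridge-size formula is an assumption (n inserted residues span n + 1 CA-to-CA steps of --aa-length); the source says only that the minimum alanine count is estimated from the terminal CA distance.

```python
import math


def find_gaps(kept_resids):
    """Return (first_missing, last_missing) ranges between kept residues."""
    kept = sorted(kept_resids)
    return [(a + 1, b - 1) for a, b in zip(kept, kept[1:]) if b - a > 1]


def classify_gaps(kept_resids, fill_gaplength=4):
    """Split internal gaps into (restore, fill) lists by length."""
    restore, fill = [], []
    for start, end in find_gaps(kept_resids):
        length = end - start + 1
        # Shorter than the threshold: restore; at or above it: fill candidate.
        (restore if length < fill_gaplength else fill).append((start, end))
    return restore, fill


def min_bridge_alanines(ca_distance, aa_length=4.0):
    # Assumed formula: n inserted residues cover (n + 1) steps of aa_length.
    return max(0, math.ceil(ca_distance / aa_length) - 1)


kept = [10, 11, 14, 15, 20, 21]
restore, fill = classify_gaps(kept)   # gap 12-13 restored, gap 16-19 filled
bridge = min_bridge_alanines(10.0)    # ceil(10 / 4) - 1 = 2
```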

For Boltz fill runs (--fill-method boltz), each candidate YAML includes a templates entry pointing to OUTPUT_DIR/_boltz/cutoff_<cutoff>_template.pdb (protein after short-gap restoration), with chain_id, template_id, force: true, and threshold so Boltz can enforce template guidance while modeling alanine bridge regions.
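As an illustration, a templates entry with those fields could be assembled like this before being serialized to YAML. The `pdb` key for the template path and the default chain IDs are assumptions for this sketch; `template_entry` is a hypothetical helper, and the exact formatting of `<cutoff>` in the file name is not specified above.

```python
from pathlib import Path


def template_entry(output_dir, cutoff, chain_id="A", template_id="A", threshold=0.1):
    """Build one Boltz `templates` list item as a plain dict."""
    template = Path(output_dir) / "_boltz" / f"cutoff_{cutoff}_template.pdb"
    return {
        "pdb": template.as_posix(),   # key name is an assumption
        "chain_id": chain_id,         # chain in the Boltz job to guide
        "template_id": template_id,   # matching chain inside the template file
        "force": True,                # enforce the template during modeling
        "threshold": threshold,       # value of --boltz-template-threshold
    }


entry = template_entry("out", "4.0")
job_fragment = {"templates": [entry]}
```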

After each Boltz candidate model is generated, procustes aligns it to the cutoff template with MDAnalysis using only non-inserted residues (the original kept residues, excluding alanine bridge insertions), then grafts template coordinates for those non-gap residues before merging ligand atoms.

Integration reference workflow

The TYK2 end-to-end integration suite lives in tests/integration/test_tyk2_end_to_end.py and validates four compressed fixtures (ejm31, ejm42, jmc27, jmc28) by running the full CLI entrypoint in-process.

Reference artifacts are stored under tests/reference/<complex>/ as:

  • 9truncated.pdb (byte-for-byte comparison after stripping hydrogen records, to avoid OpenMM/PDBFixer hydrogen-placement nondeterminism)
  • summary.json (field-aware JSON comparison)
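The hydrogen-stripping step can be sketched as a filter over ATOM/HETATM records, using the element symbol in columns 77-78 of the PDB format. `strip_hydrogens` and `atom_line` are hypothetical helpers written for this example; the test suite's own implementation may differ, for instance in how it handles records with a blank element column.

```python
def strip_hydrogens(pdb_text):
    """Drop ATOM/HETATM records whose element column is H."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()  # PDB columns 77-78
            if element == "H":
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-width ATOM record (occupancy 1.00, B-factor 0.00)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
    )


# Two structures that differ only in where the hydrogen was placed.
run_a = "\n".join([
    atom_line(1, "N", "ALA", "A", 1, 0.0, 0.0, 0.0, "N"),
    atom_line(2, "H", "ALA", "A", 1, 0.9, 0.1, 0.0, "H"),
    "END",
])
run_b = "\n".join([
    atom_line(1, "N", "ALA", "A", 1, 0.0, 0.0, 0.0, "N"),
    atom_line(2, "H", "ALA", "A", 1, 0.8, 0.3, 0.1, "H"),
    "END",
])

same_heavy_atoms = strip_hydrogens(run_a) == strip_hydrogens(run_b)
```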

To regenerate these references intentionally (one-time baseline refresh), run:

uv run --extra dev python scripts/generate_tyk2_references.py

By default, integration test temporary directories are deleted. Set PROCUSTES_KEEP_ITEST_TMP=1 to retain them for debugging.

Development

Use the project dev environment with uv:

uv sync --extra dev

Run formatting and linting:

uv run --extra dev ruff format src tests scripts
uv run --extra dev ruff check src tests scripts

Run tests:

uv run --extra dev pytest -q

Run only the TYK2 integration tests:

uv run --extra dev pytest -q tests/integration/test_tyk2_end_to_end.py

PyPI Release

Tag-based releases use hatch-vcs dynamic versioning and upload wheel-only artifacts.

Prerequisites:

  • clean git working tree
  • local branch fully synced with upstream
  • ~/.pypirc configured for [pypi] credentials
  • git and uv available on PATH

Run:

python scripts/release_pypi.py X.Y.Z

The release script will:

  1. validate X.Y.Z format
  2. verify git cleanliness and upstream sync
  3. ensure tag does not already exist locally/remotely
  4. create annotated tag X.Y.Z
  5. build exactly one wheel into dist/ (no sdist)
  6. upload only that wheel via twine using ~/.pypirc pypi section
  7. push the release tag to origin
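Step 1 can be pictured as a strict pattern match on the version argument. The sketch below assumes X.Y.Z means three dot-separated non-negative integers without leading zeros and without a `v` prefix; the exact rule used by scripts/release_pypi.py is not stated above, and `is_valid_version` is a hypothetical name.

```python
import re

# Three numeric components; no leading zeros, no prefix or suffix (assumed rule).
VERSION_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_valid_version(version):
    """Return True when `version` has the plain X.Y.Z form."""
    return VERSION_RE.fullmatch(version) is not None


checks = {v: is_valid_version(v) for v in ["0.1.0", "v0.1.0", "1.2", "01.2.3"]}
```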
