CSP5: pip-installable CASCADE NMR predictor (13C + 1H baselines).

CSP5

CSP5 is a pip-installable package for CASCADE NMR chemical shift prediction. It provides:

  • batched 13C and 1H prediction
  • prediction from precomputed geometries (no re-embedding)
  • shift-matching utilities with three solvers: dp (default), scipy, and murty (k-best)

Bundled defaults:

  • 13C model: CSP5 base (13C) (model_id: csp5-base-13c)
  • 1H model: CSP5 base (1H) (model_id: csp5-base-1h)

Install

pip install CSP5

Prediction CLI

In interactive terminals, csp5 prints status lines to stderr before and after prediction. If a run is slow, it also prints a note that the first invocation can take longer while dependencies and model weights initialize, followed by periodic "still working" updates during long runs. Use --no-status to silence all of these messages.

From SMILES

csp5 --smiles "CCO" --nucleus 1H
csp5 --smiles-file smiles.txt --nucleus 13C --batch-size 64

From precomputed geometries (parquet structures dataset)

Input dataset requirements:

  • required columns: smiles, molblock
  • optional columns: conformer_rank, conformer_id, energy, energy_method
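
Before running a long prediction job, it can help to confirm that a dataset has the columns listed above. The following is a minimal sketch using only the standard library; check_structure_columns is a hypothetical helper, not part of CSP5, and it inspects column names only (it does not validate molblock contents). With pandas, the column names could be passed in as df.columns.

```python
# Hypothetical helper, not part of CSP5: checks column names only.
REQUIRED_COLUMNS = {"smiles", "molblock"}
OPTIONAL_COLUMNS = {"conformer_rank", "conformer_id", "energy", "energy_method"}


def check_structure_columns(columns):
    """Return (missing_required, present_optional, other) as sorted lists."""
    cols = set(columns)
    missing = sorted(REQUIRED_COLUMNS - cols)
    optional = sorted(OPTIONAL_COLUMNS & cols)
    other = sorted(cols - REQUIRED_COLUMNS - OPTIONAL_COLUMNS)
    return missing, optional, other


missing, optional, other = check_structure_columns(
    ["smiles", "molblock", "conformer_rank", "energy"]
)
if missing:
    raise SystemExit(f"dataset is missing required columns: {missing}")
print("optional columns present:", optional)
```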

Predict only rank-0 conformers:

csp5 \
  --structures-path /path/to/structures.parquet \
  --conformer-rank 0 \
  --nucleus 1H \
  --batch-size 64

Predict using all conformers in the dataset:

csp5 \
  --structures-path /path/to/structures.parquet \
  --use-all-conformers \
  --nucleus 13C

Prediction Python API

from csp5 import predict_smiles, predict_structures, predict_sdf

# Standard SMILES mode
res = predict_smiles(["CCO", "c1ccccc1"], nucleus="1H", batch_size=32)
print(res.predictions.head())

# Precomputed-geometry parquet mode
res2 = predict_structures(
    "/path/to/structures.parquet",
    nucleus="1H",
    conformer_rank=0,
    use_all_conformers=False,
)

# Precomputed-geometry SDF mode
res3 = predict_sdf("/path/to/embedded.sdf", nucleus="13C")

Matching CLI

csp5-match expects one shift per line in each file.
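
As an illustration of the one-shift-per-line format, the sketch below writes two such files and reads one back. write_shifts and read_shifts are hypothetical helpers, not CSP5 functions, and skipping blank lines on read is an assumption of this sketch rather than documented csp5-match behavior.

```python
from pathlib import Path
import tempfile


def write_shifts(path, shifts):
    """Write one chemical shift (ppm) per line."""
    Path(path).write_text("".join(f"{s}\n" for s in shifts), encoding="utf-8")


def read_shifts(path):
    """Read one shift per line, skipping blank lines (an assumption of this sketch)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [float(line) for line in lines if line.strip()]


# Write both input files into a scratch directory.
workdir = Path(tempfile.mkdtemp())
write_shifts(workdir / "predicted.txt", [7.35, 7.30, 1.25])
write_shifts(workdir / "experimental.txt", [7.34, 7.31, 1.20])
print(read_shifts(workdir / "predicted.txt"))
```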

Default fast path (dp)

csp5-match \
  --predicted-file predicted.txt \
  --experimental-file experimental.txt \
  --solver dp

SciPy Hungarian option

csp5-match \
  --predicted-file predicted.txt \
  --experimental-file experimental.txt \
  --solver scipy

Murty k-best option

csp5-match \
  --predicted-file predicted.txt \
  --experimental-file experimental.txt \
  --solver murty \
  --k-best-policy clip \
  --k-best 25 \
  --temperature 0.5 \
  --mae-delta-threshold 0.2

Matching Python API

from csp5 import match_shifts

pred = [7.35, 7.30, 1.25]
exp = [7.34, 7.31, 1.20]

# DP (default)
r1 = match_shifts(pred, exp, solver="dp")

# SciPy Hungarian
r2 = match_shifts(pred, exp, solver="scipy")

# Murty k-best
r3 = match_shifts(pred, exp, solver="murty", k_best=10, k_best_policy="clip")
print(r3.assignment_entropy, r3.num_competing_assignments)
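
To give a feel for what k-best analysis measures, here is a standalone sketch that does not call CSP5. It makes several assumptions: brute-force enumeration stands in for Murty's algorithm (so it is only practical for a handful of shifts), the cost of an assignment is taken to be its mean absolute error, and the entropy is one plausible definition (Shannon entropy of Boltzmann weights at a given temperature). None of this is confirmed to match how CSP5 computes assignment_entropy. The clip and strict branches mirror the policy names used above.

```python
import itertools
import math


def k_best_assignments(pred, exp, k_best, policy="clip"):
    """Return up to k_best (mae, permutation) pairs, lowest MAE first.

    Brute-force stand-in for a k-best solver; equal-length inputs only.
    """
    if len(pred) != len(exp) or not pred:
        raise ValueError("this sketch needs non-empty lists of equal length")
    n = len(pred)
    scored = []
    for perm in itertools.permutations(range(n)):
        mae = sum(abs(pred[i] - exp[j]) for i, j in enumerate(perm)) / n
        scored.append((mae, perm))
    scored.sort()
    if k_best > len(scored) and policy == "strict":
        raise ValueError(f"only {len(scored)} assignments exist")
    # With policy "clip", slicing simply returns everything available.
    return scored[:k_best]


def boltzmann_entropy(maes, temperature):
    """Shannon entropy (in nats) of weights proportional to exp(-mae / T)."""
    best = min(maes)
    weights = [math.exp(-(m - best) / temperature) for m in maes]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)


pred = [7.35, 7.30, 1.25]
exp = [7.34, 7.31, 1.20]
top = k_best_assignments(pred, exp, k_best=10, policy="clip")
print(len(top), top[0][1])
print(boltzmann_entropy([mae for mae, _ in top], temperature=0.5))
```

An entropy near zero means one assignment dominates; larger values mean several assignments have similar error and the match is ambiguous.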

Solver Notes

  • dp is the default and is intended for the standard 1D shift objective.
  • scipy uses Hungarian assignment on the full padded cost matrix.
  • murty is the k-best solver; use this when you need assignment ambiguity analysis.
  • For murty, k_best_policy="clip" (default) returns every feasible unique assignment when k_best exceeds the number that exist. Use k_best_policy="strict" to raise an error instead.
  • dp and scipy are top-1 only (k_best must be 1).
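
The reason a dynamic program suffices for a 1D objective can be sketched as follows. When the cost of pairing two shifts is their absolute difference, an optimal matching exists that never crosses in sorted order, so it is enough to sort both lists and decide, value by value, whether to pair or skip. The code below is an illustrative sketch, not CSP5's implementation; dp_match is a hypothetical name, and treating leftover values in the longer list as free is an assumption of this sketch.

```python
import math


def dp_match(pred, exp):
    """Min-cost matching of 1D shifts under the sum of |pred - exp|.

    Every value of the shorter list is paired with a distinct value of the
    longer list; leftover values of the longer list are unmatched at no cost.
    Returns (total_cost, [(pred_value, exp_value), ...]).
    """
    swapped = len(pred) > len(exp)
    a, b = (sorted(exp), sorted(pred)) if swapped else (sorted(pred), sorted(exp))
    m, n = len(a), len(b)
    # cost[i][j]: best cost of matching all of a[:i] into b[:j]
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0] = [0.0] * (n + 1)
    for i in range(1, m + 1):
        for j in range(i, n + 1):
            skip = cost[i][j - 1]  # leave b[j-1] unmatched
            take = cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1])
            cost[i][j] = min(skip, take)
    # Backtrack from the final cell to recover the chosen pairs.
    pairs, i, j = [], m, n
    while i > 0:
        if cost[i][j] == cost[i][j - 1]:
            j -= 1
        else:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
    pairs.reverse()
    if swapped:
        pairs = [(p, e) for e, p in pairs]
    return cost[m][n], pairs


total, pairs = dp_match([7.35, 7.30, 1.25], [7.34, 7.31, 1.20])
print(total, pairs)
```

This runs in time proportional to the product of the two list lengths after sorting, which is why a DP path can be much cheaper than a general assignment solver for this objective.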

Output Notes

  • Prediction failures are returned explicitly (failures) with reason tags.
  • Prediction output always includes nucleus, model_id, and model_name.
  • For structures-mode predictions, conformer metadata columns are propagated when available.

Release

Local macOS wheel build

From repo root:

cd deploy/CSP5
rm -rf dist build *.egg-info
MACOSX_DEPLOYMENT_TARGET=11.0 uvx --from build pyproject-build --wheel
uvx --from twine twine check dist/*
uvx --from twine twine upload --repository pypi --skip-existing dist/*.whl

MACOSX_DEPLOYMENT_TARGET=11.0 keeps wheel tags broadly compatible (for example, macosx_11_0_arm64) instead of pinning to the host macOS version.
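
To check which tags a built wheel actually carries, the filename can be split into its standard components. The sketch below assumes the usual wheel naming layout (name, version, optional build tag, Python tag, ABI tag, platform tag); parse_wheel_filename is a hypothetical helper, and the example filename is made up for illustration, with only the platform tag taken from the text above.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its components.

    Layout: {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        parts.insert(2, None)  # no optional build tag present
    if len(parts) != 6:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    keys = ("name", "version", "build", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


# Hypothetical filename for illustration only.
info = parse_wheel_filename("csp5-0.2.5-cp312-cp312-macosx_11_0_arm64.whl")
print(info["platform_tag"])

# In a macOS tag, the minimum OS version sits between "macosx" and the architecture.
_, major, minor, arch = info["platform_tag"].split("_", 3)
print(f"minimum macOS {major}.{minor} on {arch}")
```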

Cross-platform publishing (Linux + macOS)

Use the GitHub Actions workflow:

  • file: .github/workflows/release-csp5.yml
  • trigger:
    • push a tag like csp5-v0.2.5 (build + publish), or
    • run manually with publish=true
  • required repo secret: PYPI_API_TOKEN

The workflow builds:

  • Linux manylinux x86_64 wheels for Python 3.10, 3.11, 3.12, and 3.13
  • macOS arm64 wheels for Python 3.10, 3.11, 3.12, and 3.13
  • one source distribution (sdist)

Then it uploads all artifacts to PyPI in one step.
