Denpex training SDK for silent data corruption (SDC) detection and checkpoint validation.

Project description

denpex-sdk

Python helpers with no third-party dependencies for SDC detection (per-layer weight and gradient norms, NaN/Inf checks, optimizer state validation) and asynchronous checkpoint validation (manifest integrity and content checksums) for checkpoints stored on S3 or Lustre.

SDC detection

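from denpex_sdk import SdcHook, install_sdc_hooks

# `model` and `optimizer` are the objects from your existing training loop.
# Install hooks with upper bounds on the weight and gradient norms.
hook = install_sdc_hooks(model, SdcHook(max_weight_norm=1e4, max_grad_norm=1e4))

# Collect one report each for weights, gradients, and optimizer state.
weight_report = hook.inspect()
grad_report = hook.inspect_gradients()
opt_report = hook.inspect_optimizer(optimizer)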
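
To make the checks named above concrete, here is a minimal standard-library sketch of a per-layer NaN/Inf and norm-threshold check. It is an illustration only and is not how denpex-sdk is implemented: the helper names `l2_norm` and `check_layers`, the issue labels, and the representation of a layer as a flat list of floats are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def l2_norm(values: Sequence[float]) -> float:
    """Euclidean norm of a flat sequence of floats."""
    return math.sqrt(sum(v * v for v in values))


def check_layers(layers: Dict[str, Sequence[float]], max_norm: float) -> Dict[str, List[str]]:
    """Return {layer_name: [issue, ...]} for every layer that looks suspicious.

    Hypothetical helper for illustration; not part of denpex_sdk.
    """
    problems: Dict[str, List[str]] = {}
    for name, values in layers.items():
        issues: List[str] = []
        if any(math.isnan(v) for v in values):
            issues.append("nan")
        if any(math.isinf(v) for v in values):
            issues.append("inf")
        # Compare the norm only when every value is finite; otherwise the
        # norm itself would be NaN or Inf and add no information.
        if not issues and l2_norm(values) > max_norm:
            issues.append("norm_exceeded")
        if issues:
            problems[name] = issues
    return problems


# Toy "model": layer names mapped to flattened parameter values.
layers = {
    "encoder.weight": [0.1, -0.2, 0.3],
    "decoder.weight": [1.0, float("nan"), 2.0],
    "head.weight": [3e4, 4e4],
}
report = check_layers(layers, max_norm=1e4)
```

In this toy input the encoder layer passes, the decoder layer is flagged for a NaN, and the head layer is flagged because its norm exceeds the threshold.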

Checkpoint validation

from denpex_sdk import build_manifest, manifest_to_json, manifest_from_json, validate_checkpoint

# At save time: build a manifest for the checkpoint directory and store it as JSON.
manifest = build_manifest(["/checkpoints/step-1000/"], metadata={"job_id": "..."})
with open("/checkpoints/step-1000/manifest.json", "w") as handle:
    handle.write(manifest_to_json(manifest))

# At load time: read the stored manifest back.
with open("/checkpoints/step-1000/manifest.json") as handle:
    expected = manifest_from_json(handle.read())

# Validate the checkpoint against the stored manifest before using it.
result = validate_checkpoint(["/checkpoints/step-1000/"], expected_manifest=expected)
assert result.valid
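
As a rough picture of what manifest integrity plus content checksums can involve, the following standard-library sketch records a size and SHA-256 digest per file and later compares a directory against that record. It does not reproduce the denpex-sdk manifest format or its validation logic: `sha256_file`, `sketch_manifest`, `sketch_validate`, and the entry layout are hypothetical names and choices made for this example.

```python
import hashlib
import json
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sketch_manifest(root):
    """Map each file path (relative to root) to its size and SHA-256 digest."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            entries[rel] = {"size": os.path.getsize(full), "sha256": sha256_file(full)}
    return entries


def sketch_validate(root, expected):
    """Return a list of mismatches; an empty list means the directory matches."""
    actual = sketch_manifest(root)
    errors = []
    for rel in sorted(set(expected) | set(actual)):
        if rel not in actual:
            errors.append(f"missing: {rel}")
        elif rel not in expected:
            errors.append(f"unexpected: {rel}")
        elif actual[rel] != expected[rel]:
            errors.append(f"changed: {rel}")
    return errors


with tempfile.TemporaryDirectory() as root:
    shard = os.path.join(root, "shard-0.bin")
    with open(shard, "wb") as handle:
        handle.write(b"\x00" * 1024)

    # Round-trip through JSON, as a stored manifest would be.
    expected = json.loads(json.dumps(sketch_manifest(root)))
    clean_errors = sketch_validate(root, expected)

    # Flip one byte in place to simulate silent corruption of the shard.
    with open(shard, "r+b") as handle:
        handle.seek(10)
        handle.write(b"\x01")
    corrupt_errors = sketch_validate(root, expected)
```

The size check alone would miss the flipped byte, since the file length is unchanged; only the content checksum catches it.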

Download files

Source Distribution

denpex_sdk-0.1.0.tar.gz (22.6 kB)

Uploaded Source

Built Distribution

denpex_sdk-0.1.0-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file denpex_sdk-0.1.0.tar.gz.

File metadata

  • Download URL: denpex_sdk-0.1.0.tar.gz
  • Upload date:
  • Size: 22.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for denpex_sdk-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        b92b78d7b99fc8ac7cad2ab64bad270765d420a0fce6bf1266c637d0188eeea6
MD5           97d8d689e968b4ec6110433dc27a4ed1
BLAKE2b-256   bd3618c43e29bf56389dfefc747c62661884f621f9ff2a5fc24b0ff56f7db847

File details

Details for the file denpex_sdk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: denpex_sdk-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for denpex_sdk-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        2994f380507c8a6ff43fdb699f9d493617dbb365d73a38044df889d42837d9dd
MD5           818e1a74a773ab73df005bfaccc7daff
BLAKE2b-256   920325fc8395486d70745ec852d5d24f116a42b6edee08e802df1b21a7567f3c
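
The digests listed for each file can be checked against a downloaded copy before installing it. The sketch below shows one way to do that with only the standard library. `verify_sha256` is a hypothetical helper written for this example, and the demo runs against a temporary stand-in file rather than the real distribution.

```python
import hashlib
import hmac
import os
import tempfile


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA-256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())


# Demo on a temporary stand-in file; this is not the real package archive.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as handle:
        handle.write(b"hello")
    known_digest = hashlib.sha256(b"hello").hexdigest()
    matches = verify_sha256(path, known_digest)
    mismatches = verify_sha256(path, "0" * 64)
```

For a real download, pass the local path of the file and the SHA256 value listed for it above. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).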
