Denpex training SDK for SDC detection and checkpoint validation.
denpex-sdk
Python helpers with no third-party dependencies for silent data corruption (SDC) detection (per-layer weight and gradient norms, NaN/Inf checks, optimizer state validation) and asynchronous checkpoint validation (manifest integrity plus content checksums) for checkpoints stored on S3 or Lustre.
SDC detection
from denpex_sdk import SdcHook, install_sdc_hooks
hook = install_sdc_hooks(model, SdcHook(max_weight_norm=1e4, max_grad_norm=1e4))
weight_report = hook.inspect()
grad_report = hook.inspect_gradients()
opt_report = hook.inspect_optimizer(optimizer)
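To make the checks above concrete, here is a minimal standard-library sketch of what a per-layer norm and NaN/Inf inspection might compute. It is not the SDK's implementation: `layer_report` and `inspect_layers` are hypothetical names, layers are plain lists of floats rather than framework tensors, and the default threshold simply mirrors the `max_weight_norm=1e4` value used in the example.

```python
import math
from typing import Dict, Iterable, List


def layer_report(values: Iterable[float], max_norm: float) -> Dict[str, object]:
    """Summarise one layer: L2 norm of finite values, non-finite count, pass/fail.

    Hypothetical helper for illustration; not part of denpex_sdk.
    """
    vals = list(values)
    # Count NaN and +/-Inf entries; any of these marks the layer as suspect.
    non_finite = sum(1 for v in vals if not math.isfinite(v))
    # Compute the L2 norm over finite entries only, so the norm stays readable.
    norm = math.sqrt(sum(v * v for v in vals if math.isfinite(v)))
    return {
        "norm": norm,
        "non_finite": non_finite,
        "ok": non_finite == 0 and norm <= max_norm,
    }


def inspect_layers(
    layers: Dict[str, List[float]], max_norm: float = 1e4
) -> Dict[str, Dict[str, object]]:
    """Apply layer_report to every named layer."""
    return {name: layer_report(vals, max_norm) for name, vals in layers.items()}
```

A layer fails if it contains any non-finite value or if its norm exceeds the threshold; the same routine could be applied to weights, gradients, or optimizer state buffers.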
Checkpoint validation
from denpex_sdk import build_manifest, manifest_to_json, manifest_from_json, validate_checkpoint
manifest = build_manifest(["/checkpoints/step-1000/"], metadata={"job_id": "..."})
with open("/checkpoints/step-1000/manifest.json", "w") as handle:
    handle.write(manifest_to_json(manifest))
with open("/checkpoints/step-1000/manifest.json") as handle:
    expected = manifest_from_json(handle.read())
result = validate_checkpoint(["/checkpoints/step-1000/"], expected_manifest=expected)
assert result.valid
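The idea behind manifest integrity plus content checksums can be sketched with the standard library alone. The following is an illustration under stated assumptions, not the SDK's actual manifest format: `file_sha256`, `sketch_manifest`, `sketch_validate`, and `demo` are hypothetical names, the manifest is assumed to map relative paths to a size and SHA-256 digest, and `manifest.json` is skipped because the example above writes it into the checkpoint directory itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoint shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sketch_manifest(
    root: str, skip: Tuple[str, ...] = ("manifest.json",)
) -> Dict[str, Dict[str, object]]:
    """Map each file under root (relative path) to its size and SHA-256 digest."""
    base = Path(root)
    entries: Dict[str, Dict[str, object]] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.name not in skip:
            entries[path.relative_to(base).as_posix()] = {
                "size": path.stat().st_size,
                "sha256": file_sha256(path),
            }
    return entries


def sketch_validate(root: str, expected: Dict[str, Dict[str, object]]) -> List[str]:
    """Return a list of problems; an empty list means the checkpoint matches."""
    actual = sketch_manifest(root)
    problems: List[str] = []
    for name, meta in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != meta:
            problems.append(f"mismatch: {name}")
    problems.extend(f"unexpected: {name}" for name in actual if name not in expected)
    return problems


def demo() -> List[List[str]]:
    """Build a manifest, validate it, then corrupt a shard and validate again."""
    with tempfile.TemporaryDirectory() as root:
        shard = Path(root) / "shard-0.bin"
        shard.write_bytes(b"weights")
        # Round-trip through JSON, as a stored manifest would be.
        expected = json.loads(json.dumps(sketch_manifest(root)))
        (Path(root) / "manifest.json").write_text(json.dumps(expected))
        clean = sketch_validate(root, expected)
        shard.write_bytes(b"weightz")  # same size, different content
        corrupted = sketch_validate(root, expected)
    return [clean, corrupted]
```

Recording both size and digest lets a validator report truncation and bit-level corruption separately if desired; here they are folded into a single mismatch message for brevity.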
Download files
Source Distribution
denpex_sdk-0.1.0.tar.gz (22.6 kB)
Built Distribution
denpex_sdk-0.1.0-py3-none-any.whl (17.9 kB)
File details
Details for the file denpex_sdk-0.1.0.tar.gz.
File metadata
- Download URL: denpex_sdk-0.1.0.tar.gz
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b92b78d7b99fc8ac7cad2ab64bad270765d420a0fce6bf1266c637d0188eeea6 |
| MD5 | 97d8d689e968b4ec6110433dc27a4ed1 |
| BLAKE2b-256 | bd3618c43e29bf56389dfefc747c62661884f621f9ff2a5fc24b0ff56f7db847 |
File details
Details for the file denpex_sdk-0.1.0-py3-none-any.whl.
File metadata
- Download URL: denpex_sdk-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2994f380507c8a6ff43fdb699f9d493617dbb365d73a38044df889d42837d9dd |
| MD5 | 818e1a74a773ab73df005bfaccc7daff |
| BLAKE2b-256 | 920325fc8395486d70745ec852d5d24f116a42b6edee08e802df1b21a7567f3c |