
Integration testing, streaming utilities, and repetition detection for distributed LLM inference on DGX Spark clusters



dgxarley

Tooling for the DGX Arley K3s inference cluster — integration tests, streaming utilities, and CLI entry points for SGLang, Ollama, and OpenWebUI services.

Heureka! — Qwen3-235B-A22B MoE (AWQ 4-bit) running distributed inference across both DGX Sparks:

[Screenshot: 235B AWQ Heureka]

sglang-raw — Dual-panel SSE stream viewer

[Screenshot: sglang-raw, rendered response + raw JSON chunks]

Dual-panel Rich TUI for inspecting SGLang's OpenAI-compatible streaming API in real time. The top half renders the AI response as it arrives, while the bottom half displays the raw JSON chunks of the SSE stream, with fields such as object (chat.completion.chunk), choices, delta, finish_reason, and model. Useful for debugging streaming behaviour, verifying token delivery, and understanding the API's wire format.
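For readers new to that wire format, here is roughly what the bottom panel is looking at: OpenAI-compatible endpoints, including SGLang's /v1/chat/completions with stream enabled, send each chunk as an SSE `data:` line carrying one JSON object and end the stream with `data: [DONE]`. The following is a minimal stdlib sketch written for illustration, not sglang-raw's implementation; `iter_sse_chunks`, `render_content`, and `SAMPLE` are hypothetical names, and the sample chunks are hand-written.

```python
import json
from collections.abc import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE `data:` line, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; other fields (event:, id:) and ":" comments are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
            return
        yield json.loads(payload)


def render_content(chunks: Iterable[dict]) -> str:
    """Join every choices[].delta.content fragment, roughly what the top panel shows."""
    parts: list[str] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Two hand-written chunks in the shape an OpenAI-compatible server streams.
SAMPLE = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]

if __name__ == "__main__":
    chunks = list(iter_sse_chunks(SAMPLE))
    print(render_content(chunks))  # Hello
    print(chunks[-1]["choices"][0]["finish_reason"])  # stop
```

Against a live server the lines would come from the streaming HTTP response body, for example httpx's `Response.iter_lines()` inside `client.stream(...)`.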

sglang-raw — Think/text token classification

[Screenshot: sglang-raw, token table with think/text classification]

Token-level stream inspection with per-chunk breakdown in a structured table. Columns show the token type (think vs text), content, finish reason, and cumulative token count — visualizing how reasoning tokens (from <think>...</think> blocks) are separated from the actual output tokens. This view helps when tuning thinking budgets, verifying reasoning_parser behaviour, or diagnosing unexpected token classification.

What's included

CLI tools

Command          Description
sglang-raw       Interactive SSE stream viewer with dual-panel Rich display (interpreted output + raw JSON chunks)
sglang-test      Direct SGLang client with sequential and parallel load testing (live Rich TUI)
openwebui-test   OpenWebUI / LLM client with preset management and streaming
ollama-test      Tests for Ollama API health, models, embeddings, and chat completions
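To illustrate what sequential versus parallel load testing measures, here is a minimal sketch: the same batch of prompts is sent with one worker and then with several, recording per-request latency and total wall time. `send` is a hypothetical stand-in for whatever issues a single blocking request (for example an HTTP POST to SGLang's /v1/chat/completions); `load_test` and `fake_send` are illustrative names, and none of this reflects how sglang-test is implemented or which options it exposes.

```python
import time
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def load_test(send: Callable[[str], object], prompts: Sequence[str], workers: int = 1) -> dict:
    """Send every prompt through `send` using `workers` threads; workers=1 is sequential."""

    def timed(prompt: str) -> float:
        start = time.perf_counter()
        send(prompt)  # hypothetical: one blocking request/response round trip
        return time.perf_counter() - start

    start = time.perf_counter()
    # Threads suffice because each request spends most of its time waiting on I/O.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, prompts))
    wall = time.perf_counter() - start
    return {
        "workers": workers,
        "requests": len(latencies),
        "wall_s": wall,
        "mean_latency_s": mean(latencies),
        "req_per_s": len(latencies) / wall,
    }


def fake_send(prompt: str) -> str:
    """Stand-in backend that 'takes' 50 ms per request, so the sketch runs without a server."""
    time.sleep(0.05)
    return prompt.upper()


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(8)]
    for workers in (1, 8):
        print(load_test(fake_send, prompts, workers))
```

With a real inference server, the parallel run typically stays well ahead of the sequential one only while the server can batch the extra requests; once it saturates, per-request latency climbs instead.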

Libraries

Module                                           Description
dgxarley.integration.repetition_detector         Offline n-gram, sentence, and loop repetition analysis for completed LLM outputs
dgxarley.integration.streaming_repetition_guard  Real-time repetition detection for token streams with configurable thresholds
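The core idea behind offline n-gram repetition analysis fits in a few lines: slide a window of n words over the text, count each n-gram, and flag those that recur. This is an illustrative stand-in, not the repetition_detector implementation; its sentence and loop checks and its scoring are not reproduced, and `repeated_ngrams` / `repetition_ratio` are hypothetical names.

```python
from collections import Counter


def repeated_ngrams(text: str, n: int = 4, min_count: int = 2) -> list[tuple[str, int]]:
    """Return word n-grams occurring at least `min_count` times, most frequent first."""
    words = text.lower().split()
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(gram, count) for gram, count in grams.most_common() if count >= min_count]


def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-gram positions that repeat an earlier n-gram (0.0 means none)."""
    words = text.lower().split()
    total = len(words) - n + 1
    if total <= 0:
        return 0.0  # text too short to contain a single n-gram
    grams = Counter(tuple(words[i:i + n]) for i in range(total))
    return sum(count - 1 for count in grams.values()) / total


if __name__ == "__main__":
    sample = "this is a test and this is a test"
    print(repeated_ngrams(sample))             # [('this is a test', 2)]
    print(round(repetition_ratio(sample), 3))  # 0.167
```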

Installation

pip install dgxarley

Quick start

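from dgxarley.integration.repetition_detector import detect_repetition

llm_output = "..."  # the full text of a completed LLM response
report = detect_repetition(llm_output)
print(report.summary())
# e.g. [LOW] score=0.12 — N-Gram 'this is a test' x2

from dgxarley.integration.streaming_repetition_guard import RepetitionGuard

# llm_stream: an iterable of OpenAI-style streaming chunks,
# e.g. from client.chat.completions.create(..., stream=True)
guard = RepetitionGuard()
for chunk in llm_stream:
    if not chunk.choices:  # some chunks (e.g. a final usage chunk) carry no choices
        continue
    token = chunk.choices[0].delta.content or ""
    result = guard.feed(token)
    if result.should_stop:
        print(f"STOP: {result.reason}")
        break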
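RepetitionGuard's internals and default thresholds are not described here. As an illustration of one way a streaming guard can work, the hypothetical `TailLoopGuard` below keeps a bounded buffer of recent text and, after every token, checks whether the buffer ends with the same unit repeated several times back to back. It mirrors the `feed()` / `should_stop` / `reason` shape used in the quick start, but the class, its parameters, and its defaults are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GuardResult:
    should_stop: bool
    reason: str = ""


class TailLoopGuard:
    """Hypothetical guard: stop once the buffer ends with `max_repeats`
    back-to-back copies of a unit between `min_period` and `max_period` chars."""

    def __init__(self, min_period: int = 8, max_period: int = 200,
                 max_repeats: int = 4, window: int = 4000) -> None:
        if window < max_period * max_repeats:
            raise ValueError("window must hold max_repeats copies of the longest unit")
        self.min_period = min_period
        self.max_period = max_period
        self.max_repeats = max_repeats
        self.window = window
        self.buf = ""

    def feed(self, token: str) -> GuardResult:
        # Keep only the most recent `window` characters so memory stays bounded.
        self.buf = (self.buf + token)[-self.window:]
        for period in range(self.min_period, self.max_period + 1):
            needed = period * self.max_repeats
            if len(self.buf) < needed:
                break  # longer units cannot have repeated enough times yet
            unit = self.buf[-period:]
            if self.buf[-needed:] == unit * self.max_repeats:
                return GuardResult(True, f"{period}-char unit repeated {self.max_repeats}x: {unit!r}")
        return GuardResult(False)


if __name__ == "__main__":
    guard = TailLoopGuard(min_period=4, max_period=50, max_repeats=3, window=500)
    for token in ["intro: ", "loop!", "loop!", "loop!"]:
        result = guard.feed(token)
        print(repr(token), result.should_stop, result.reason)
```

Each call costs up to roughly max_repeats × max_period² / 2 character comparisons, which is acceptable for a sketch. It only catches exact back-to-back repeats; paraphrased loops need the kind of n-gram or sentence analysis that repetition_detector describes.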

Requirements

  • Python >= 3.14

Source & documentation

Full documentation, network architecture, and Ansible playbooks: GitHub

License

This project is licensed under the LGPL where applicable/possible — see LICENSE.md. Some files/parts may use other licenses: MIT | GPL | LGPL. Always check per‑file headers/comments.

Authors

  • Repo owner (primary author)
  • Additional attributions are noted inline in code comments

Acknowledgments

  • Inspirations and snippets are referenced in code comments where appropriate.

⚠️ Note

This is a development/experimental project. For production use, review security settings, customize configurations, and test thoroughly in your environment. Provided "as is" without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software. Use at your own risk.



Download files


Source Distribution

dgxarley-0.0.10.tar.gz (55.3 MB)

Built Distribution


dgxarley-0.0.10-py3-none-any.whl (67.9 kB)

File details

Details for the file dgxarley-0.0.10.tar.gz.

File metadata

  • Download URL: dgxarley-0.0.10.tar.gz
  • Upload date:
  • Size: 55.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.5 cpython/3.14.3 HTTPX/0.28.1

File hashes

Hashes for dgxarley-0.0.10.tar.gz
Algorithm    Hash digest
SHA256       dbad23ce2d6ab62d9cf46392a7af6b66d554ee62e3809487a9e6c6c080f9ebae
MD5          b7dc60875ca8d9f29bc09952c6d2e9d7
BLAKE2b-256  cf55c80a879b9181cc1dbc2b7a2c8ac3f1d9d9890f2921931b964d09852cf807


File details

Details for the file dgxarley-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: dgxarley-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 67.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.5 cpython/3.14.3 HTTPX/0.28.1

File hashes

Hashes for dgxarley-0.0.10-py3-none-any.whl
Algorithm    Hash digest
SHA256       7ee730b5444745bf311f607b7d20345e23dc0aee22b9ffdd850260c256843e1b
MD5          9f5fca9ea323e85edad1f3d2cb2b32c8
BLAKE2b-256  21c163c084471b26aba9c0ce7f5140ac5f7f143a234b8330141ab5cff166ed35

