Download and quantize the HyperNix PyTorch model to GGUF (fp32/fp16/Q8_0/Q6_K/Q4_K_M).
hypernix
End-to-end toolkit for the ray0rf1re/hyper-nix.1 family of PyTorch
language models.
hypernix started as a one-shot GGUF converter and has since grown into
a small but opinionated framework covering the whole lifecycle — from a
blank directory to a trained, quantized, uploaded HuggingFace snapshot.
Every subsystem is a plain Python module you can pick up and use in
isolation.
| Subsystem | What it does |
|---|---|
| hypernix.download | Pull snapshots from the Hub (short-name resolution, gated repos, offline cache). |
| hypernix.train | HyperNixConfig, HyperNixModel, init_from_scratch, expand_checkpoint, train. Non-HyperNix architectures route through AutoModelForCausalLM. |
| hypernix.old_oven | CodeOven, a ready-to-use wrapper around a snapshot: .complete(), .chat(), .fill(), .save_pt(). new_oven() spins up a fresh one from the ARCH_PRESETS registry. |
| hypernix.old_fridge | Memory housekeeping: freeze, unfreeze, parameter_stats, offload_to_cpu, chill_cache. |
| hypernix.mediocre_fridge | Judge-training dataset generation: synthesize_judge_corpus, collect_responses_from. |
| hypernix.new_fridge | Training-curve graphing: parse_training_log, plot_loss_curve, plot_score_distribution. Matplotlib is installed lazily. |
| hypernix.new_range / old_range / industrial_range | Labeling rubrics for mediocre_fridge.collect_responses_from: new_range is a zero-dependency first-fail rubric, old_range is a scored rubric with explainability, industrial_range is the LLM-as-judge wrapper. |
| hypernix.freezer | VRAM manager: OldFreezer (8-10 GB), NewFreezer (11 GB+), FlashFreezer (OOM-safe retry wrapper). Pascal (sm_61 / compute capability 6.1) helpers, plus 16 CPU presets (i7 7th-14th gen, Core Ultra Series 1 & 2) and 20 GPU presets (H100/H200, RTX A4500-A6000, RTX PRO Ada/Blackwell, 4070 Ti Super, 4080 Super, 1660 Ti, 2080/2080 Super/2080 Ti, 3080 Ti, 1080/1080 Ti). |
| hypernix.smoke_alarm | Training-step planner & monitor: RadsAlarm (constants, lightest), GasAlarm (CPU/GPU presets), ModernAlarm (warmup-measured), AutoAlarm (selector). Plus storage_warning and a mid-run check. |
| hypernix.convert | Safetensors → GGUF at fp32/fp16. Architecture-agnostic tensor naming. |
| hypernix.quantize | llama-quantize driver for Q8_0, Q6_K, Q4_K_M, Q5_K_M. |
| hypernix.upload | Push the produced artifacts back to a HuggingFace repo. |
Cross-platform: Linux, macOS, Windows. Python 3.10 – 3.13.
Install
From PyPI:
pip install "hypernix[llama-cpp]" # + bundled llama-cpp-python
pip install "hypernix[train]" # + transformers, accelerate
pip install hypernix # core only
Need a specific torch build? Install torch first; pip will reuse it rather than replace it:
# CUDA 11.8 — old drivers, Pascal GPUs (GTX 1080 et al.)
pip install --index-url https://download.pytorch.org/whl/cu118 torch
pip install hypernix
# CUDA 12.x — modern default
pip install --index-url https://download.pytorch.org/whl/cu124 torch
pip install hypernix
# CPU-only
pip install --index-url https://download.pytorch.org/whl/cpu torch
pip install hypernix
Sanity-check the environment:
hypernix doctor # report
hypernix doctor --fix # install missing runtime deps
Automatic dependency management can be disabled with
HYPERNIX_AUTO_INSTALL=0.
Quickstart
Chat with any supported model
hypernix chat --repo-id nix2.5 --message "hello"
hypernix chat --repo-id qwen3.5-4b --message "explain rotary embeddings"
hypernix chat --repo-id gemma-4-e4b --message "write a haiku"
Short names resolve via KNOWN_MODELS; see
Supported model families.
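The same models are reachable from Python through the oven API. A minimal sketch: preheat's arguments mirror the API tour below, while the single-prompt-string .chat() signature is an assumption.

from hypernix import old_oven

# Short names resolve through hypernix.KNOWN_MODELS; the first call downloads.
oven = old_oven.preheat(repo_id="nix2.5", device="cuda", dtype="float16")
print(oven.chat("hello"))  # assumed: .chat() accepts one prompt string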
Convert a snapshot to GGUF
# Default: fp32 + fp16
hypernix --repo-id ray0rf1re/hyper-nix.1 --output-dir ./out
# Opt in to k-quants (needs llama-quantize)
hypernix --repo-id ray0rf1re/hyper-nix.1 --output-dir ./out \
--quants fp32 fp16 q8_0 q6_k q4_k_m
Train HyperNix 1.5 (~92.1 M params) on a GTX 1080
python examples/train_hypernix_1_5_gtx1080.py \
--dataset corpus.txt \
--tokenizer-source ./hyper-nix-v1 \
--out-dir ./hypernix-1.5 \
--steps 2000 --batch-size 1 --context-length 1024
Auto-detects compute capability 6.x, forces fp16 (Pascal has no native
bf16), disables TF32 / SDPA / torch.compile, and wraps the training
loop in a FlashFreezer so OOMs pause-and-halve rather than crash. See
wiki/Pascal.md for the full Pascal playbook.
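Under the hood these guardrails are ordinary torch switches. A standalone sketch of the same checks in plain torch, independent of hypernix internals:

import torch

major, minor = torch.cuda.get_device_capability(0)
if major == 6:  # Pascal (sm_6x): no native bf16, no TF32
    dtype = torch.float16
    torch.backends.cuda.matmul.allow_tf32 = False  # TF32 off (no-op pre-Ampere)
    torch.backends.cudnn.allow_tf32 = False
    # Keep only the math SDPA kernel; fused flash kernels need newer GPUs.
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)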
Build a HyperNix 0.1.5 evaluator
python examples/train_hypernix_0_1_5_evaluator.py --out-dir ./eval
Synthesizes a judge-training corpus with mediocre_fridge, freezes
embeddings with old_fridge, trains via oven.train, reloads the result with
the other oven, and plots the loss curve with new_fridge. A self-contained
smoke test for every subsystem.
Python API tour
import hypernix
from hypernix import freezer, old_oven, old_fridge, mediocre_fridge, new_fridge
# 1) Auto-pick a VRAM strategy. On a GTX 1080 this returns OldFreezer(fp16);
# on a 3090 it returns NewFreezer(fp32 / bf16 on Ampere).
fz = freezer.flash_freezer(base=freezer.auto_freezer(), slow=True)
# 2) Preheat an oven from a short name (downloads on first call).
oven = old_oven.preheat(repo_id="nix2.5", device="cuda", dtype="float16")
# 3) Memory hygiene.
old_fridge.freeze(oven.model, patterns=("embed_tokens",))
print(old_fridge.parameter_stats(oven.model))
# 4) Training data.
dataset = mediocre_fridge.synthesize_judge_corpus(n=1024, out_path="judge.txt")
# 5) Train inside a FlashFreezer so OOMs don't blow up the run.
fz.guard(lambda: oven.train(dataset, "./trained", steps=500, batch_size=1))
# 6) Graph.
import pathlib
log = pathlib.Path("./trained/train.log").read_text()
new_fridge.plot_loss_curve(new_fridge.parse_training_log(log), "loss.png")
CLI reference
hypernix <subcommand> [options]
all download -> convert -> [quantize] (default)
download fetch a HuggingFace snapshot
convert produce fp32 / fp16 GGUF from a snapshot
quantize run llama-quantize on an fp16 / fp32 GGUF
verify read-validate a GGUF and print headers
info package + optional GGUF header summary
upload push files to a HuggingFace repo
doctor environment diagnostic (pass --fix to install deps)
fetch-llama-quantize pre-seed the llama-quantize cache
train init create a fresh HyperNix snapshot
train expand warm-start a bigger model from a smaller one
train run minimal causal-LM training loop
generate sample text from a local snapshot
oven code-generation wrapper (preheat + complete / fill)
chat interactive chat REPL against any supported model
Quant aliases accepted by --quants and hypernix quantize:
| Alias | llama.cpp enum |
|---|---|
| fp32, f32 | F32 |
| fp16, f16 | F16 |
| q8, q8_0 | Q8_0 |
| q6, q6_k | Q6_K |
| q4km, q4_k_m | Q4_K_M |
| q5km, q5_k_m | Q5_K_M |
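The alias table is a plain lookup. An illustrative sketch of the same normalization, with the mapping written out here rather than imported from hypernix:

QUANT_ALIASES = {
    "fp32": "F32", "f32": "F32",
    "fp16": "F16", "f16": "F16",
    "q8": "Q8_0", "q8_0": "Q8_0",
    "q6": "Q6_K", "q6_k": "Q6_K",
    "q4km": "Q4_K_M", "q4_k_m": "Q4_K_M",
    "q5km": "Q5_K_M", "q5_k_m": "Q5_K_M",
}

def normalize_quant(alias: str) -> str:
    return QUANT_ALIASES[alias.lower()]  # KeyError on unknown aliases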
Supported model families
Short names (CLI & Python)
Pass any of these to hypernix chat --repo-id, old_oven.preheat,
download_model, etc.
| Family | Short names |
|---|---|
| HyperNix | hyper-nix.1, hyper-nix, hypernix, nano-nano-v4, nano-mini-6.99-v2, nano-nano-927-v3 |
| Nix (ray0rf1re/nix collection) | nix, nix2.5, nix2.6-m, nix2.6-mm, nix-2.7a, nix2.7, nix2.6 |
| Llama 3.x | llama-3.1-8b, llama-3.1-8b-instruct, llama-3.2-1b, llama-3.2-3b, llama-3.3-70b-instruct |
| Qwen 2.5 / 3 / 3.5 / 3.6 | qwen2.5-*, qwen3-0.6b, qwen3-8b, qwen3.5-{0.8b,2b,4b,9b,27b,35b-a3b,122b-a10b,397b-a17b}, qwen3.6-35b-a3b |
| Gemma 2 / 3 / 4 | gemma-2-{2b,9b,27b}, gemma-3-{1b,4b}, gemma-4-{e2b,e4b,26b-a4b,31b} |
| Phi 3 / 3.5 / 4 | phi-3-mini, phi-3.5-mini, phi-4 |
| DeepSeek | deepseek-r1-distill-llama-8b, deepseek-r1-distill-qwen-7b, deepseek-v2-lite, deepseek-v3 |
| GLM 4 / 5 / 5.1 | glm-4-9b-chat, glm-4.1v, glm-5, glm-5.1, glm-5.1-fp8 |
| Mistral / Mixtral | mistral-7b-instruct, mixtral-8x7b-instruct |
| NVIDIA | nemotron-4-15b, llama-3.1-nemotron-70b-instruct, mistral-nemo-12b |
| OpenAI gpt-oss | gpt-oss-20b, gpt-oss-120b |
The full registry lives in hypernix.KNOWN_MODELS.
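To see what a short name expands to, inspect the registry directly. A sketch, assuming KNOWN_MODELS is a plain mapping from short name to Hub repo id:

import hypernix

print(sorted(hypernix.KNOWN_MODELS)[:5])         # first few short names
print(hypernix.KNOWN_MODELS.get("hyper-nix.1"))  # assumed: the full repo id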
ARCH_PRESETS (seeds for new_oven)
new_oven(arch="...", ...) spins a fresh, parametric model in the
shape of any of these families:
hypernix,llama,llama3,llama3.1,llama3.2,llama3.3,llama4qwen2,qwen2.5,qwen3,qwen3.5,qwen3.6gemma,gemma2,gemma3,gemma4mistral,phi3,phi4glm4,glm5,glm5.1deepseek,deepseek-r1,nemotron,gpt-oss/gptossnix,nix2
Presets are seeds for brand-new parametric models. Loading a
pretrained checkpoint for any of these families works without a matching
preset because non-HyperNix model_type values route through
transformers.AutoModelForCausalLM.
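For instance, spinning up an untrained Qwen-3-shaped model might look like the sketch below; only the arch argument is documented here, so the size keywords are hypothetical placeholders:

from hypernix import old_oven

# arch selects an ARCH_PRESETS family; hidden_size / n_layers are
# hypothetical keyword names used only to illustrate the parametric idea.
oven = old_oven.new_oven(arch="qwen3", hidden_size=512, n_layers=8)
print(oven.complete("def fibonacci(n):"))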
Examples
- examples/quickstart.py: 5-line Python API demo.
- examples/custom_arch.py: arbitrary-size HyperNix.
- examples/upload_to_hub.py: publish to the Hub.
- examples/train_hypernix_0_1_5_evaluator.py: tiny evaluator demo wiring ovens + all three fridges.
- examples/train_hypernix_1_5_gtx1080.py: production-shape 92.1 M model trained on an 8 GB Pascal card.
Wiki / deep dives
Topic-focused reference guides live in the wiki/ directory:
- wiki/Home.md: index
- wiki/Ovens.md: old_oven / new_oven reference
- wiki/Fridges.md: old_fridge / mediocre_fridge / new_fridge
- wiki/Ranges.md: new_range / old_range / industrial_range (labeling rubrics)
- wiki/Freezer.md: VRAM manager (OldFreezer / NewFreezer / FlashFreezer)
- wiki/Alarms.md: smoke alarms (Rads / Gas / Modern / Auto) + CPU / GPU preset tables
- wiki/Pascal.md: compute capability 6.1 / GTX 1080 playbook
- wiki/Architectures.md: ARCH_PRESETS and KNOWN_MODELS
- wiki/Training.md: scratch training, expansion, and fine-tuning flows
- wiki/CLI.md: full CLI cheat sheet
- wiki/Quantization.md: GGUF conversion + k-quant pipeline
How the GGUF pipeline works
1. huggingface_hub.snapshot_download pulls weights + tokenizer files.
2. The converter loads the state dict, infers dimensions from tensor shapes (so any HyperNix size works), and maps tensor names onto llama.cpp's canonical GGUF layout when a recognizable pattern matches (Llama, GPT-NeoX, GPT-2, nanoGPT). Unknown names round-trip verbatim.
3. llama-quantize consumes the fp16 GGUF to produce each k-quant.

The CLI emits exactly one fp16 intermediate and reuses it for every k-quant in the plan.
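The name mapping in step 2 is pattern-based. A minimal sketch of the idea for the Llama pattern, using llama.cpp's canonical GGUF names; the converter's real tables cover more patterns:

import re

# HF Llama-style names -> canonical GGUF names (illustrative subset).
LLAMA_PATTERNS = [
    (r"model\.embed_tokens\.weight", "token_embd.weight"),
    (r"model\.norm\.weight", "output_norm.weight"),
    (r"lm_head\.weight", "output.weight"),
    (r"model\.layers\.(\d+)\.self_attn\.q_proj\.weight", r"blk.\1.attn_q.weight"),
    (r"model\.layers\.(\d+)\.mlp\.gate_proj\.weight", r"blk.\1.ffn_gate.weight"),
]

def map_name(name: str) -> str:
    for pattern, target in LLAMA_PATTERNS:
        if re.fullmatch(pattern, name):
            return re.sub(pattern, target, name)
    return name  # unknown names round-trip verbatim

print(map_name("model.layers.3.self_attn.q_proj.weight"))  # blk.3.attn_q.weight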
Platform notes
- Linux: full support, every distro tested (Ubuntu, Debian, Arch, Fedora, openSUSE, Alpine, NixOS).
- macOS: Metal for inference; Homebrew for llama-quantize.
- Windows: native support; doctor accepts Windows; llama-quantize auto-downloads Windows binaries; use scoop / chocolatey for system deps.
- Pascal (GTX 1080 / 1080 Ti / Titan Xp): install torch from the CUDA 11.8 index first (see above). Use OldFreezer or auto_freezer(); pascal_safe_dtype() picks fp16; hypernix.freezer.pascal_mode_hints() returns the full Pascal cheat sheet.
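Putting the Pascal notes together in Python; the helper names come from the bullet above, while the zero-argument call signatures are assumptions:

from hypernix import freezer

fz = freezer.auto_freezer()          # expected to pick OldFreezer on 8 GB cards
dtype = freezer.pascal_safe_dtype()  # fp16 on sm_61
print(freezer.pascal_mode_hints())   # the full Pascal cheat sheet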
Build / release
pip install build twine
python -m build
twine check --strict dist/*
Release tags (vX.Y.Z) fire .github/workflows/release.yml which
publishes to PyPI via Trusted Publishing and attaches the wheel +
sdist + an examples-scripts tarball + SHA256SUMS to a GitHub
Release.
License
Apache-2.0.