
Download and quantize the HyperNix PyTorch model to GGUF (fp32/fp16/Q8_0/Q6_K/Q4_K_M).


hypernix

End-to-end toolkit for the HyperNix family of PyTorch language models — the chat-tuned ray0rf1re/hyper-Nix.2 (the current default) and the original ray0rf1re/hyper-nix.1 (still fully supported via the same preheat() / download_model() paths) — covering download, chat, fine-tuning, evaluation, quantization, and shipping.

| Subsystem | What it does |
| --- | --- |
| hypernix.download | Pull snapshots from the Hub (short-name resolution, gated repos, offline cache). |
| hypernix.train | HyperNixConfig, HyperNixModel, init_from_scratch, expand_checkpoint, train. Non-HyperNix archs route through AutoModelForCausalLM. |
| hypernix.old_oven | CodeOven — ready-to-use wrapper around a snapshot: .complete(), .chat(), .fill(), .save_pt(). new_oven() spins a fresh one from the ARCH_PRESETS registry. |
| hypernix.old_fridge | Memory housekeeping: freeze, unfreeze, parameter_stats, offload_to_cpu, chill_cache. |
| hypernix.mediocre_fridge | Judge-training dataset generation — synthesize_judge_corpus, collect_responses_from. |
| hypernix.new_fridge | Training-curve graphing — parse_training_log, plot_loss_curve, plot_score_distribution. Matplotlib is installed lazily. |
| hypernix.new_range / old_range / industrial_range | Labeling rubrics for mediocre_fridge.collect_responses_from: new_range is a zero-dep first-fail rubric, old_range is a scored rubric with explainability, industrial_range is the LLM-as-judge wrapper. |
| hypernix.freezer | VRAM manager: OldFreezer (8–10 GB), NewFreezer (11 GB+), FlashFreezer (OOM-safe retry wrapper). Pascal (sm_61 / compute capability 6.1) helpers + 16 CPU presets (i7 7th–14th gen, Core Ultra Series 1 & 2) + 20 GPU presets (H100/H200, RTX A4500–A6000, RTX PRO Ada/Blackwell, 4070 Ti Super, 4080 Super, 1660 Ti, 2080/2080 Super/2080 Ti, 3080 Ti, 1080/1080 Ti). |
| hypernix.smoke_alarm | Training-step planner & monitor. RadsAlarm (constants, lightest), GasAlarm (CPU/GPU presets), ModernAlarm (warmup-measured), AutoAlarm (selector). Plus storage_warning, a mid-run check. |
| hypernix.pans | 5-tier data preprocessing: FryingPan → SaucePan → Skillet → GrillPan → Wok. Pair with sink.Sink.pour to write the output to disk. |
| hypernix.microwave | 5-tier throwaway inference: defrost → low_zap → zap → high_zap → chat_zap, plus reheat for continuing a prior output. |
| hypernix.table | Dead-simple tabular viewer: from_training_log, from_judge_corpus, filter, select, show. |
| hypernix.sink | Append-only file sink with optional rotation + dedupe. |
| hypernix.instant_pot | brew(recipe) — one-shot end-to-end pipeline. Also available as hypernix brew recipe.json. |
| hypernix.coffee_maker | 3 tiers (drip / french-press / percolator) + cold_brew type for long checkpointed runs. |
| hypernix.espresso_maker | 4-tier evaluation: Ristretto / SingleShot / DoubleShot / Lungo — run a prompt battery, score it, return shots. |
| hypernix.blender | 4-tier multi-source mixing: HandBlender / PersonalBlender / CountertopBlender / HighPowerBlender. |
| hypernix.toaster | 4-tier per-line formatting: TwoSliceToaster / FourSliceToaster / ConveyorToaster / ToasterOven. |
| hypernix.food_processor | 4-tier bulk chunking: ChopBlade / SliceBlade / ShredBlade / PureeBlade. |
| hypernix.smoker | 4-tier training quality: UseableSmoker / GoodSmoker / CommercialSmoker / HighQualitySmoker. |
| hypernix.deep_fryer | 2-tier model-weight perturbation: LightFry (regulariser) / HeavyFry (severe, for bad-model negatives). In-place, reversible via snapshot. |
| hypernix.cake_pan | Hybrid CPU + GPU training guard with NaN/Inf detection, wall-time watchdog, memory-pressure offload, and pristine-state rollback via BakeOff. |
| hypernix.salt_shaker | 3-tier gentle data augmentation: FromTheBag / HandCrusher / PoshSaltDish. |
| hypernix.pepper_shaker | 3-tier sharp perturbations: SmallShaker (MLM-style mask) / Dish (typos) / TallHandmade (negation). |
| hypernix.pressure_cooker | Custom AdamW optimizer in 5 tiers: base PressureCooker + CPU (StovetopCooker, ElectricCooker) + GPU (InductionCooker, ProCooker) + universal_cooker selector. Grad accumulation, GradScaler integration, fused/foreach AdamW, optional CUDA-graph capture on Pro. |
| hypernix.lunchbox | Consistent-schema dataset packager. Lunchbox.for_eval() pre-loads the recommended eval-results columns; pack(path) / push_to_hub(repo_id) route through datasets.Dataset.from_list so the Parquet HuggingFace metadata stays coherent with the column set (fixes the CastError: column names don't match path in the Hub viewer). |
| hypernix.whisk | Checkpoint averaging — swa_average (uniform mean), ema (exponential), geometric_mean. Accepts state dicts or paths to .pt / .safetensors. whisk_to_snapshot writes a full HF-style snapshot in one call. See the sketch after this table. |
| hypernix.cutting_board | Train / val / test splitting. CuttingBoard (deterministic random) + StratifiedBoard (preserves class distribution on labelled records). Renormalises ratios; writes per-split files with .slice_to_files(). |
| hypernix.apron | RNG-state guard. The apron(seed=…) context manager snapshots Python random, NumPy (if installed), torch CPU, and every CUDA device's RNG, optionally seeds, and restores the originals on exit. |
| hypernix.recipe_book | Named-config registry. RecipeBook with add / get / save / load / cook(name, **overrides). cook dispatches by kind (instant_pot / cold_brew / espresso). Built-in recipes via RecipeBook.from_builtins(). |
| hypernix.cookbook | Chat-template registry. Built-in templates for chatml / hyper-nix.2 / llama3 / llama2 / alpaca / vicuna / plain. for_model(repo_id) picks the right one. Wired into CodeOven._format_chat so a hyper-Nix.2 snapshot Just Works. |
| hypernix.countertop | Multi-turn chat session. Countertop(oven, system=…) with say(user) / reset() / save(path) / load(path). Auto-trims long histories; optional bell= for streaming, optional flour= for chat-quality cleanup. |
| hypernix.menu | Named system-prompt registry: default / concise / code-helper / judge / creative / chef / hyper-nix. Pair with countertop(oven, persona="…") to pick a system prompt by name. |
| hypernix.bell | Streaming-token + done-notification primitive. Bell.iter_chat(oven, messages) yields tokens; stream_chat collects them and fires callbacks. stdout_bell() / file_bell(path) ship pre-wired. Also accepts a flour= for live logits processing. |
| hypernix.flour | Chat-quality logits processor — repetition penalty, frequency / presence penalty, no-repeat n-gram, bad-word suppression, role-leak suppression (cuts hallucinated user: follow-ups), decoded-text stop-sequence detection. Flour.smart_default(template="hyper-nix.2") applies all of them with one knob. This is the bit that makes hypernix's chat surface better than raw transformers for chatting. |
| hypernix.torch_compat | Portability shim (RMSNorm + SDPA) for running on old Intel Macs with torch 1.13. See wiki/macOS-legacy.md. |
| hypernix.convert | Safetensors → GGUF at fp32/fp16. Architecture-agnostic tensor naming. |
| hypernix.quantize | llama-quantize driver. v0.51.3 ships a 30-type QUANT_CATALOG (a QuantSpec dataclass per type with bits-per-weight, category, recommendation) covering floats (F32 / F16 / BF16), legacy quants (Q4_0 / Q4_1 / Q5_0 / Q5_1 / Q8_0), k-quants (Q2_K … Q6_K), and IQ-quants (IQ1_S … IQ4_XS). Helpers: quant_recommended(), quant_by_category("k"), quant_for_size(target_bytes, fp16_bytes), quant_estimate_size(name, fp16_bytes), quant_resolve_spec("q4km"). |
| hypernix.upload | Push the produced artifacts back to a HuggingFace repo. |
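
For intuition, whisk's swa_average boils down to an element-wise uniform mean over matching state-dict tensors. Below is a minimal sketch of that idea in plain torch; the uniform_average helper is illustrative, not hypernix's implementation (the real one also accepts .safetensors paths and ships ema / geometric_mean variants).

import torch

def uniform_average(paths):
    # Uniform (SWA-style) mean of several checkpoints. Assumes every file
    # holds a plain state dict with identical keys and shapes.
    avg, n = None, 0
    for p in paths:
        sd = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.to(torch.float32).clone() for k, v in sd.items()}
        else:
            for k, v in sd.items():
                avg[k] += v.to(torch.float32)
        n += 1
    return {k: v / n for k, v in avg.items()}

# averaged = uniform_average(["ckpt_1000.pt", "ckpt_2000.pt", "ckpt_3000.pt"])
# model.load_state_dict(averaged)  # cast back to your training dtype first if needed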

Cross-platform: Linux, macOS, Windows. Python 3.10 – 3.13.

Who this is actually for

Put bluntly, hypernix is shaped around one use case: a solo practitioner fine-tuning or building causal LMs on a consumer GPU, then publishing quantized GGUFs. The OldFreezer defaults, the Pascal helpers, the 8 – 10 GB tuning, the FlashFreezer OOM retry, the in-process llama-quantize fetch, the runtime auto-pip for fiddly deps — every design decision points at that workflow. If you're on an H100 cluster with a real trainer, you'd pick DeepSpeed or similar; if you're on a GTX 1080 / 2080 / 3060 trying to ship a QLoRA fine-tune to the Hub, this is extremely on-target.

It's also worth flagging what it's not: the train() loop is a smoke-tester, not a production trainer — it's explicitly single-device, no sharding, no mixed precision. Anything serious goes through a real framework. hypernix's value is in everything around that loop: the snapshot handling, the conversion pipeline, the memory plumbing, the labeling rubrics, the time budgeting, and the pressure_cooker optimizer for when you do want something with more opinion than stock AdamW.


Install

From PyPI:

pip install "hypernix[llama-cpp]"     # + bundled llama-cpp-python
pip install "hypernix[train]"         # + transformers, accelerate
pip install hypernix                  # core only

Need a specific torch build? Install torch first; pip will reuse it rather than replace it:

# CUDA 11.8 — old drivers, Pascal GPUs (GTX 1080 et al.)
pip install --index-url https://download.pytorch.org/whl/cu118 torch
pip install hypernix

# CUDA 12.x — modern default
pip install --index-url https://download.pytorch.org/whl/cu124 torch
pip install hypernix

# CPU-only
pip install --index-url https://download.pytorch.org/whl/cpu torch
pip install hypernix

# Old Intel Mac / torch 1.13 — the compat shim takes over.
pip install --index-url https://download.pytorch.org/whl/cpu 'torch==1.13.1'
pip install 'hypernix[legacy-torch]'

The main install_requires is torch>=1.13,<3 — 2.7+ is the recommended version (native nn.RMSNorm, fused SDPA), but 1.13+ works via hypernix.torch_compat. See wiki/macOS-legacy.md for the full story.

Sanity-check the environment:

hypernix doctor          # report
hypernix doctor --fix    # install missing runtime deps

Automatic dependency management can be disabled with HYPERNIX_AUTO_INSTALL=0.

Quickstart

Chat with any supported model

hypernix chat --repo-id nix2.5 --message "hello"
hypernix chat --repo-id qwen3.5-4b --message "explain rotary embeddings"
hypernix chat --repo-id gemma-4-e4b --message "write a haiku"

Short names resolve via KNOWN_MODELS; see Supported model families.

Convert a snapshot to GGUF

# Default: fp32 + fp16
hypernix --repo-id ray0rf1re/hyper-nix.1 --output-dir ./out

# Opt in to k-quants (needs llama-quantize)
hypernix --repo-id ray0rf1re/hyper-nix.1 --output-dir ./out \
    --quants fp32 fp16 q8_0 q6_k q4_k_m

Train HyperNix 1.5 (~92.1 M params) on a GTX 1080

python examples/train_hypernix_1_5_gtx1080.py \
    --dataset corpus.txt \
    --tokenizer-source ./hyper-nix-v1 \
    --out-dir ./hypernix-1.5 \
    --steps 2000 --batch-size 1 --context-length 1024

Auto-detects compute capability 6.x, forces fp16 (Pascal has no native bf16), disables TF32 / SDPA / torch.compile, and wraps the training loop in a FlashFreezer so OOMs pause-and-halve rather than crash. See wiki/Pascal.md for the full Pascal playbook.
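
The guard rails it sets up amount to a handful of stock torch calls. An illustrative sketch of the same logic (not the script's actual code):

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)  # (6, 1) on a GTX 1080
    if major == 6:                          # Pascal: no native bf16, no TF32
        dtype = torch.float16               # force fp16
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False
        torch.backends.cuda.enable_flash_sdp(False)          # fused SDPA off
        torch.backends.cuda.enable_mem_efficient_sdp(False)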

Build a HyperNix 0.1.5 evaluator

python examples/train_hypernix_0_1_5_evaluator.py --out-dir ./eval

Synthesizes a judge-training corpus with mediocre_fridge, freezes embeddings with old_fridge, trains via oven.train, reloads the trained snapshot into a fresh oven, and plots the loss curve with new_fridge. A self-contained smoke test for every subsystem.

Python API tour

import hypernix
from hypernix import freezer, old_oven, old_fridge, mediocre_fridge, new_fridge

# 1) Auto-pick a VRAM strategy.  On a GTX 1080 this returns OldFreezer(fp16);
#    on a 3090 it returns NewFreezer(fp32 / bf16 on Ampere).
fz = freezer.flash_freezer(base=freezer.auto_freezer(), slow=True)

# 2) Preheat an oven from a short name (downloads on first call).
oven = old_oven.preheat(repo_id="nix2.5", device="cuda", dtype="float16")

# 3) Memory hygiene.
old_fridge.freeze(oven.model, patterns=("embed_tokens",))
print(old_fridge.parameter_stats(oven.model))

# 4) Training data.
dataset = mediocre_fridge.synthesize_judge_corpus(n=1024, out_path="judge.txt")

# 5) Train inside a FlashFreezer so OOMs don't blow up the run.
fz.guard(lambda: oven.train(dataset, "./trained", steps=500, batch_size=1))

# 6) Graph.
import pathlib
log = pathlib.Path("./trained/train.log").read_text()
new_fridge.plot_loss_curve(new_fridge.parse_training_log(log), "loss.png")

CLI reference

hypernix <subcommand> [options]

  all                   download -> convert -> [quantize]   (default)
  download              fetch a HuggingFace snapshot
  convert               produce fp32 / fp16 GGUF from a snapshot
  quantize              run llama-quantize on an fp16 / fp32 GGUF
  verify                read-validate a GGUF and print headers
  info                  package + optional GGUF header summary
  upload                push files to a HuggingFace repo
  doctor                environment diagnostic  (pass --fix to install deps)
  fetch-llama-quantize  pre-seed the llama-quantize cache
  train init            create a fresh HyperNix snapshot
  train expand          warm-start a bigger model from a smaller one
  train run             minimal causal-LM training loop
  generate              sample text from a local snapshot
  oven                  code-generation wrapper (preheat + complete / fill)
  chat                  interactive chat REPL against any supported model

Quant aliases accepted by --quants and hypernix quantize (v0.51.3 ships 49 aliases mapping to 30 distinct quant types — the table below shows the headline subset; hypernix.quant_list_types() returns the full list at runtime, and hypernix.QUANT_CATALOG[name] gives you the full QuantSpec for any one; see the sketch after the table):

| Alias | llama.cpp enum | bpw | Recommended? |
| --- | --- | --- | --- |
| fp32, f32 | F32 | 32.0 | reference |
| fp16, f16 | F16 | 16.0 | ✓ baseline |
| bf16 | BF16 | 16.0 | |
| q4_0, q4_1, q5_0, q5_1 | Q4_0 / Q4_1 / Q5_0 / Q5_1 | 4.5 – 6.0 | legacy |
| q8, q8_0 | Q8_0 | 8.5 | ✓ near-lossless |
| q2_k, q2_k_s, q3_k_s, q3_k_m, q3_k_l | Q2_K … Q3_K_L | 2.5 – 4.0 | |
| q4_k_s, q4km, q4_k_m | Q4_K_S, Q4_K_M | 4.5, 4.83 | ✓ chat sweet spot |
| q5_k_s, q5km, q5_k_m | Q5_K_S, Q5_K_M | 5.5, 5.83 | |
| q6, q6_k | Q6_K | 6.56 | ✓ near-fp16 |
| iq1_s, iq1_m, iq2_*, iq3_*, iq4_nl, iq4_xs | IQ1_S … IQ4_XS | 1.56 – 4.5 | imatrix-friendly |
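
A sketch of poking the catalog from Python. quant_list_types and QUANT_CATALOG are documented as top-level, and the helper functions are listed under hypernix.quantize above; the QuantSpec attribute names and the catalog's key casing are assumptions.

import hypernix
from hypernix import quantize

print(hypernix.quant_list_types())           # all 30 quant-type names
spec = hypernix.QUANT_CATALOG["Q4_K_M"]      # key casing assumed
# QuantSpec carries bits-per-weight, category, and a recommendation;
# the exact attribute names are not documented here.

print(quantize.quant_resolve_spec("q4km"))   # alias -> QuantSpec
print(quantize.quant_by_category("k"))       # every k-quant
fp16_bytes = 200 * 1024**2                   # suppose the fp16 GGUF is 200 MB
print(quantize.quant_estimate_size("q4_k_m", fp16_bytes))  # projected size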

Supported model families

Short names (CLI & Python)

Pass any of these to hypernix chat --repo-id, old_oven.preheat, download_model, etc.

| Family | Short names |
| --- | --- |
| HyperNix | hyper-nix.1, hyper-nix, hypernix, nano-nano-v4, nano-mini-6.99-v2, nano-nano-927-v3 |
| Nix (ray0rf1re/nix collection) | nix, nix2.5, nix2.6-m, nix2.6-mm, nix-2.7a, nix2.7, nix2.6 |
| Llama 3.x | llama-3.1-8b, llama-3.1-8b-instruct, llama-3.2-1b, llama-3.2-3b, llama-3.3-70b-instruct |
| Qwen 2.5 / 3 / 3.5 / 3.6 | qwen2.5-*, qwen3-0.6b, qwen3-8b, qwen3.5-{0.8b,2b,4b,9b,27b,35b-a3b,122b-a10b,397b-a17b}, qwen3.6-35b-a3b |
| Gemma 2 / 3 / 4 | gemma-2-{2b,9b,27b}, gemma-3-{1b,4b}, gemma-4-{e2b,e4b,26b-a4b,31b} |
| Phi 3 / 3.5 / 4 | phi-3-mini, phi-3.5-mini, phi-4 |
| DeepSeek | deepseek-r1-distill-llama-8b, deepseek-r1-distill-qwen-7b, deepseek-v2-lite, deepseek-v3 |
| GLM 4 / 5 / 5.1 | glm-4-9b-chat, glm-4.1v, glm-5, glm-5.1, glm-5.1-fp8 |
| Mistral / Mixtral | mistral-7b-instruct, mixtral-8x7b-instruct |
| NVIDIA | nemotron-4-15b, llama-3.1-nemotron-70b-instruct, mistral-nemo-12b |
| OpenAI gpt-oss | gpt-oss-20b, gpt-oss-120b |

The full registry lives in hypernix.KNOWN_MODELS.
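orig="The full registry"

Resolution is a plain dictionary lookup, so the registry is easy to inspect. A sketch; the value type is an assumption (the docs above only say short names resolve to Hub snapshots):

import hypernix

print("nix2.5" in hypernix.KNOWN_MODELS)   # True
print(len(hypernix.KNOWN_MODELS))          # number of registered short names
# Each entry presumably maps a short name to its full Hub repo id,
# e.g. "llama-3.2-1b" -> "meta-llama/Llama-3.2-1B".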

ARCH_PRESETS (seeds for new_oven)

new_oven(arch="...", ...) spins a fresh, parametric model in the shape of any of these families:

  • hypernix, llama, llama3, llama3.1, llama3.2, llama3.3, llama4
  • qwen2, qwen2.5, qwen3, qwen3.5, qwen3.6
  • gemma, gemma2, gemma3, gemma4
  • mistral, phi3, phi4
  • glm4, glm5, glm5.1
  • deepseek, deepseek-r1, nemotron, gpt-oss / gptoss
  • nix, nix2

Presets are seeds for brand-new parametric models. Loading a pretrained checkpoint for any of these families works without a matching preset because non-HyperNix model_type values route through transformers.AutoModelForCausalLM.
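
So spinning up a brand-new, untrained model in one of these shapes is a single call; per the subsystem table, new_oven returns the same CodeOven wrapper that preheat does (any keyword arguments beyond arch= are not documented here):

from hypernix import old_oven

oven = old_oven.new_oven(arch="qwen3")     # fresh random weights, qwen3 shape
print(oven.complete("def fibonacci(n):"))  # same surface as a preheated oven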

Examples

Runnable end-to-end scripts live in the examples/ directory; the two training commands in the Quickstart above come straight from it.

Wiki / deep dives

Topic-focused reference guides live in the wiki/ directory, including the wiki/Pascal.md and wiki/macOS-legacy.md pages referenced above.

How the GGUF pipeline works

  1. huggingface_hub.snapshot_download pulls weights + tokenizer files.
  2. The converter loads the state dict, infers dimensions from tensor shapes (so any HyperNix size works), and maps tensor names onto llama.cpp's canonical GGUF layout when a recognizable pattern matches (Llama, GPT-NeoX, GPT-2, nanoGPT). Unknown names round-trip verbatim.
  3. llama-quantize consumes the fp16 GGUF to produce each k-quant.

The CLI emits exactly one fp16 intermediate and reuses it for every k-quant in the plan.
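
Step 1 is stock huggingface_hub, so it is easy to replicate by hand; steps 2 and 3 are the Quickstart CLI shown earlier:

from huggingface_hub import snapshot_download

# Step 1: pull weights + tokenizer files into the local cache.
local = snapshot_download(repo_id="ray0rf1re/hyper-nix.1")
print(local)  # the snapshot directory the converter reads in step 2

# Steps 2-3, from the Quickstart:
#   hypernix --repo-id ray0rf1re/hyper-nix.1 --output-dir ./out \
#       --quants fp16 q4_k_m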

Platform notes

  • Linux: full support; tested on Ubuntu, Debian, and Arch.
  • macOS: Metal for inference, Homebrew for llama-quantize. (untested)
  • Windows: native support; doctor accepts Windows; llama-quantize auto-downloads Windows binaries; use scoop / chocolatey for system deps. (untested)
  • Pascal (GTX 1080 / 1080 Ti / Titan Xp): install torch from the CUDA 11.8 index first (see above). Use OldFreezer or auto_freezer(); pascal_safe_dtype() picks fp16. hypernix.freezer.pascal_mode_hints() returns the full Pascal cheat sheet.
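
The Pascal bullet in code, using the helper names cited above (their exact modules and return shapes are assumptions beyond what the docs state):

from hypernix import freezer

fz = freezer.auto_freezer()             # picks OldFreezer(fp16) on a GTX 1080
print(freezer.pascal_safe_dtype())      # fp16 -- Pascal has no native bf16
print(freezer.pascal_mode_hints())      # the full Pascal cheat sheet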

Build / release

pip install build twine
python -m build
twine check --strict dist/*

Release tags (vX.Y.Z) fire .github/workflows/release.yml which publishes to PyPI via Trusted Publishing and attaches the wheel + sdist + an examples-scripts tarball + SHA256SUMS to a GitHub Release.

License

Apache-2.0.
