
Distill teacher chains-of-thought into a LoRA adapter via a strict boxed-answer format contract and a two-phase Train→Nudge schedule (silver-medal NVIDIA Nemotron reasoning recipe).


tracedistill — distill reasoning traces into a LoRA adapter (NVIDIA Nemotron silver medal)


Distill teacher chains-of-thought into a LoRA adapter, so the model re-derives every answer itself in settings where no code may run.

tracedistill is the generalized core of team VCDAD's silver-medal solution to the NVIDIA Nemotron Model Reasoning Challenge (65 / 4163, Top 1.6%), extracted into a small, tested library you can run on your own data. The medal-winning code is preserved verbatim in competition/ and pinned to this library byte-for-byte by golden tests.

Give it (problem, teacher chain-of-thought, answer) triples and it trains a LoRA adapter that reasons step-by-step and then emits a parseable \boxed{}. This is the recipe for tasks where the grader can't run your code, so the solving procedure has to live inside the model's chain-of-thought.


Why not just SFTTrainer on your traces?

Four design choices, each implemented as a library piece:

  1. A strict format contract (formatting.py). The SFT target is built byte-for-byte identical to the eval protocol — <think> … </think>\boxed{answer} — and the reasoning (from the teacher trace) is decoupled from the final answer (rewritten with the authoritative label). Train input ≈ eval input, so the model reliably boxes a correct answer instead of trailing off.
  2. Two-phase Train → Nudge (training.py). A hard, fast pass (high LR, clipping off) for broad coverage, then a tiny continuation (1/40 LR, cosine, clipping on) that squeezes the hard problem types while a balanced sprinkle of fresh easy data prevents catastrophic forgetting.
  3. Type-stratified batching (sampling.py). With a tiny effective batch, a naive shuffle can make a whole batch one problem type and swing the gradient. A round-robin "deal the cards" order keeps every effective batch type-balanced.
  4. Architecture-aware LoRA (lora.py). The competition base is a hybrid Mamba-2 + MoE model, so the LoRA targets cover the SSM in_proj/out_proj as well as the attention and MLP projections, a detail a vanilla Llama recipe misses.

flowchart LR
  D["CoT dataset<br/>prompt · cot · answer · type"] --> F["format contract<br/>&lt;think&gt;…&lt;/think&gt;\boxed{}"]
  F --> S["two-phase split<br/>(hard in both)"]
  S --> P1["Phase 1 · Train<br/>lr 2e-4 · clip off"]
  P1 --> P2["Phase 2 · Nudge<br/>lr 5e-6 · cosine · clip on"]
  P2 --> A["LoRA adapter"]
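
The format contract in item 1 can be sketched in a few lines. This is a minimal illustration, not the library's build_records: the helper name make_target, the newline placement inside the <think> tags, and the regex used to drop a teacher-boxed answer are all assumptions made for the example.

```python
import re

# Hypothetical helper, NOT tracedistill's build_records; it only illustrates the contract.
# Matches a simple (non-nested) \boxed{...} that the teacher may have written.
_BOXED = re.compile(r"\\boxed\{[^{}]*\}")

def make_target(teacher_cot: str, label: str) -> str:
    """Build an SFT target: teacher reasoning in <think>, official label in \\boxed{}."""
    # Drop any answer the teacher boxed itself, so the label is the only boxed answer.
    reasoning = _BOXED.sub("", teacher_cot).strip()
    # Assumed layout: the exact whitespace must match whatever the eval protocol uses.
    return f"<think>\n{reasoning}\n</think>\\boxed{{{label}}}"

# The teacher trace ends in a wrong boxed answer; the authoritative label replaces it.
target = make_target("7 * 6 = 42, so the answer is \\boxed{41}", "42")
print(target)
```

The point of decoupling is visible in the example: the reasoning text survives, but the final boxed value always comes from the label rather than from the trace.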

Install

pip install tracedistill            # light core: numpy / pandas / pyyaml
pip install "tracedistill[train]"   # + torch / transformers / trl / peft / datasets to train

The core (build_records, the stratified order, the split, target selection, config) is torch-free — it imports and unit-tests without a GPU stack.

60 seconds

import tracedistill as td

# Your data: a DataFrame (or list of dicts) with prompt / generated_cot / answer / type.
records, types = td.build_records(df)      # the <think>…</think>\boxed{} format contract
order = td.build_stratified_index_order(types, batch_size=8, seed=42)  # type-balanced order
targets = td.target_modules_from_model(model)   # attention + Mamba SSM + MLP, auto-detected

# The two non-overlapping training sets for Train → Nudge:
phase1_df, phase2_df = td.two_phase_split(df, hard_types=["cryptarithm_deduce"], seed=42)
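
To make the "deal the cards" idea concrete, here is a rough sketch of a round-robin ordering over problem types. It is not the library's build_stratified_index_order (which also takes a batch_size); the function name round_robin_order and its tie-breaking rules are assumptions for illustration only.

```python
import random
from collections import defaultdict

def round_robin_order(types, seed=42):
    """Return dataset indices ordered so consecutive items cycle through the types."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, t in enumerate(types):
        by_type[t].append(idx)

    # One shuffled pile per type; types are sorted so the result is deterministic.
    piles = []
    for t in sorted(by_type):
        rng.shuffle(by_type[t])
        piles.append(by_type[t])

    # Deal one index from each non-empty pile per round until every pile is empty.
    order = []
    while any(piles):
        for pile in piles:
            if pile:
                order.append(pile.pop())
    return order

types = ["a"] * 4 + ["b"] * 4
order = round_robin_order(types)
print([types[i] for i in order])
```

With equal-sized types, any window of consecutive indices contains the types in near-equal proportion, which is the property a small effective batch needs. When types are unbalanced, the larger piles simply run longer at the tail.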

Full two-phase training on an already-LoRA'd model:

from tracedistill import TwoPhaseConfig, PhaseConfig, train_two_phase

cfg = TwoPhaseConfig(hard_types=["cryptarithm_deduce", "cryptarithm_guess"],
                     phase1=PhaseConfig.train(), phase2=PhaseConfig.nudge())
train_two_phase(model, tokenizer, df, cfg)   # Phase 2 continues from Phase 1's weights
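
The learning-rate relationship between the two phases is simple arithmetic. The sketch below uses the values shown in the flowchart (2e-4 for Train, 5e-6 for Nudge) and a plain cosine decay; whether the library's schedule adds warmup or a non-zero floor is not stated here, so treat cosine_lr as an assumed, simplified form.

```python
import math

PHASE1_LR = 2e-4             # Phase 1 (Train) learning rate
PHASE2_LR = PHASE1_LR / 40   # Phase 2 (Nudge): 1/40 of Phase 1, i.e. 5e-6

def cosine_lr(step, total_steps, peak_lr):
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps (no warmup assumed)."""
    progress = step / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, middle, and end of a hypothetical 100-step Nudge phase.
for step in (0, 50, 100):
    print(step, cosine_lr(step, 100, PHASE2_LR))
```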

CLI

One YAML config drives an end-to-end run (load base model → architecture-aware LoRA → Train → Nudge → save / package the adapter):

tracedistill --cfg examples/configs/quickstart.yaml             # small single-GPU
tracedistill --cfg examples/configs/reproduce_competition.yaml  # the medal setup (Kaggle)

Measured: does distilling the trace actually help? (GSM8K, one RTX 4080)

examples/gsm8k_trace_distillation.py runs four arms on Qwen2.5-0.5B-Instruct + LoRA through the public API and scores boxed-answer accuracy on held-out GSM8K (greedy, parse \boxed{} exactly like a grader). The only difference between answer-only SFT and trace-distill is whether a reasoning trace sits between the <think> tags — so the gap isolates the value of distilling the trace.
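
For readers who want to see what "parse \boxed{} like a grader" can look like, here is a small sketch of a scorer. It is not the example script's code: extract_boxed and score are hypothetical names, and exact string equality is an assumed matching rule (a real grader may normalize numbers or whitespace).

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if none parses."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:          # matching close brace of \boxed{
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None                     # unbalanced braces count as unparseable

def score(completions, labels):
    """Return (boxed accuracy, parse rate) over paired completions and labels."""
    parsed = [extract_boxed(c) for c in completions]
    n = len(labels)
    accuracy = sum(p == y for p, y in zip(parsed, labels)) / n
    parse_rate = sum(p is not None for p in parsed) / n
    return accuracy, parse_rate

outs = ["<think>6*7</think>\\boxed{42}", "I think it is 42", "\\boxed{\\frac{1}{2}}"]
print(score(outs, ["42", "42", "\\frac{1}{2}"]))
```

The second completion contains the right number but no box, so it lowers both metrics. That is the failure mode the parse-rate column in the table below tracks.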

[Figure: GSM8K results. Answer-only SFT collapses to 4.5%; trace distillation reaches ~100% parse rate and doubles hard-problem accuracy.]

| arm | boxed accuracy | parse rate | hard-problem acc (≥5 steps) |
| --- | --- | --- | --- |
| zero-shot (no training) | 32.0% | 64% | 6.1% |
| answer-only SFT | 4.5% | 100% | 6.1% |
| trace-distill, 1 phase | 28.0% | 100% | 12.1% |
| trace-distill, 2 phase (Train→Nudge) | 29.5% | 99.5% | 12.1% |

Distill the trace, not the answer. Answer-only SFT learns to always emit a \boxed{} (100% parse rate) but, taught to skip the reasoning, collapses to 4.5% boxed accuracy. Trace distillation keeps the reasoning intact and doubles accuracy on the hard, ≥5-step problems (6.1% → 12.1%) while lifting the parse rate from 64% to ~100%.

The Nudge rebalances. Phase 2 recovers the mid-tier (3–4-step: 24.8% → 28.7%) without giving up the hard-tier gain, edging 1-phase 28.0% → 29.5%.

Honest caveat. Qwen2.5-0.5B already has strong native GSM8K chain-of-thought, and the terse GSM8K solutions are a weak teacher, so overall accuracy stays just under the 32% zero-shot baseline — the wins here are reliability (≈100% parse), hard-slice accuracy (2×), and not destroying the model the way answer-only SFT does. On the competition's harder, code-derived puzzles — where the base model can't already solve them and a strong teacher trace exists — the same recipe took silver.

pip install "tracedistill[train]" datasets
python examples/gsm8k_trace_distillation.py        # ~1h on one RTX 4080 (16 GB)

The competition result

On the hidden test set, the two-phase recipe on Nemotron-3-Nano-30B-A3B earned a silver medal (65 / 4163, Top 1.6%). About 84% of the benchmark is "free" points that almost everyone clears (gravity, unit conversion, Roman numerals, ciphers); the ranking is decided by two hard families, cryptarithm and bit-manipulation, which is exactly what the two-phase Nudge and the hard/easy split target. See docs/solution.md, docs/dataset.md and docs/model-card.md for the full methodology (in Chinese), and competition/ for the verbatim solution.
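
The hard/easy split mentioned above can be pictured as follows. This is one possible reading, not the library's two_phase_split: the function split_two_phase, the easy_per_type parameter, and the choice to reuse every hard row in both phases are assumptions, and the type names are illustrative labels borrowed from the families listed above.

```python
import random
from collections import defaultdict

def split_two_phase(rows, hard_types, easy_per_type=1, seed=42):
    """Hypothetical split: hold out a few easy rows per type as fresh data for phase 2."""
    rng = random.Random(seed)
    hard = [r for r in rows if r["type"] in hard_types]
    easy_by_type = defaultdict(list)
    for r in rows:
        if r["type"] not in hard_types:
            easy_by_type[r["type"]].append(r)

    fresh_easy, seen_easy = [], []
    for t in sorted(easy_by_type):
        pool = easy_by_type[t][:]
        rng.shuffle(pool)
        fresh_easy += pool[:easy_per_type]   # never shown in phase 1
        seen_easy += pool[easy_per_type:]

    phase1 = hard + seen_easy                # broad coverage pass
    phase2 = hard + fresh_easy               # hard types again, plus a balanced easy sprinkle
    return phase1, phase2

rows = [{"id": i, "type": t} for i, t in enumerate(
    ["cryptarithm"] * 3 + ["roman"] * 4 + ["gravity"] * 4)]
p1, p2 = split_two_phase(rows, hard_types={"cryptarithm"})
print(len(p1), len(p2))
```

Taking the same number of easy rows from every easy type keeps the phase 2 sprinkle balanced, so no single easy family dominates the small continuation set.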

How it compares

| | vanilla SFTTrainer | tracedistill |
| --- | --- | --- |
| Target format | freeform text | strict <think>…</think>\boxed{} contract |
| Answer source | as written in the trace | decoupled: official label re-boxed |
| Schedule | single pass | two-phase Train → Nudge |
| Batching | shuffle | type-stratified round-robin |
| LoRA targets | attention (+ MLP) | + Mamba-2 SSM in_proj/out_proj |
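
The "LoRA targets" row comes down to choosing which module names receive adapters. The sketch below shows one way such a selection could work on a plain list of module names; it is not the library's target_modules_from_model. The suffix list in WANTED and the example module paths are assumptions, since real checkpoints may name these layers differently.

```python
# Assumed leaf-name suffixes; real checkpoints may use other names.
WANTED = ("q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
          "in_proj", "out_proj",                     # Mamba-2 SSM projections
          "up_proj", "down_proj", "gate_proj")       # MLP projections

def pick_targets(module_names, wanted=WANTED):
    """Return the sorted leaf names present in module_names that are in wanted."""
    leaves = {name.rsplit(".", 1)[-1] for name in module_names}
    return sorted(leaves & set(wanted))

# Hypothetical module paths for a hybrid model; with PyTorch these would come
# from iterating model.named_modules().
names = [
    "layers.0.mixer.in_proj", "layers.0.mixer.out_proj", "layers.0.mixer.conv1d",
    "layers.1.self_attn.q_proj", "layers.1.self_attn.o_proj",
    "layers.2.mlp.up_proj", "layers.2.mlp.down_proj", "layers.2.norm",
]
print(pick_targets(names))
```

A list like this is the kind of value that PEFT's LoraConfig accepts as target_modules. A recipe that only lists attention names would silently skip the in_proj/out_proj layers of the SSM blocks.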

Provenance & validation

  • competition/ — the original silver-medal solution, unmodified.
  • tests/: golden tests. tests/reference_impl.py holds verbatim copies of the competition's build_records / build_stratified_index_order, and the suite asserts that tracedistill reproduces them byte-for-byte over hundreds of fuzzed cases. 46 tests, torch-free, run in well under a second.
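
A golden test of this kind can be sketched as below. The two build functions are stand-ins written for the example (they are not the competition's or the library's code), and the fuzz alphabet is an arbitrary choice. The pattern is what matters: generate many random inputs and require byte-identical output from both implementations.

```python
import random

def reference_build(prompt, cot, answer):
    """Stand-in for a frozen reference implementation (hypothetical)."""
    return prompt + "\n<think>\n" + cot + "\n</think>\\boxed{" + answer + "}"

def library_build(prompt, cot, answer):
    """Stand-in for the refactored library function under test (hypothetical)."""
    return f"{prompt}\n<think>\n{cot}\n</think>\\boxed{{{answer}}}"

def fuzz_cases(n, seed=0):
    """Yield n random (prompt, cot, answer) triples, including awkward characters."""
    rng = random.Random(seed)
    alphabet = "abc 123{}\\\n"
    for _ in range(n):
        yield tuple(
            "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
            for _ in range(3)
        )

# Collect every case where the two implementations differ at the byte level.
mismatches = [case for case in fuzz_cases(300)
              if reference_build(*case).encode() != library_build(*case).encode()]
print(len(mismatches))
```

Including braces, backslashes, and newlines in the fuzz alphabet exercises exactly the characters that tend to break string-formatting refactors.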

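Summary (translated from Chinese)

tracedistill is the reusable core of team VCDAD's silver-medal solution to the Kaggle NVIDIA Nemotron Model Reasoning Challenge (65/4163, Top 1.6%), packaged as a tested open-source library. Give it (problem, teacher chain-of-thought, answer) triples and it trains a LoRA adapter that lets the model reason step by step on its own and then emit a parseable \boxed{}. It is built for tasks where the evaluator cannot run code and the solving algorithm has to be distilled into the chain-of-thought.

The four core pieces: the format contract (<think>…</think>\boxed{} matches the evaluation protocol verbatim; the reasoning comes from the upstream CoT and the answer from the official label, decoupled), two-phase Train→Nudge (a high-LR broad pass, then a 1/40-LR refinement on hard problems that guards against forgetting), type-stratified sampling, and architecture-aware LoRA (covering Mamba's in_proj/out_proj). The original competition code is preserved verbatim in competition/ and pinned byte-for-byte by the golden tests in tests/. The full methodology, written in Chinese, is in docs/.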

Citation

@misc{li2026tracedistill,
  title  = {tracedistill: Two-Phase LoRA Trace-Distillation for Reasoning Models},
  author = {Li, Daoyuan},
  year   = {2026},
  note   = {Silver medal (65/4163), NVIDIA Nemotron Model Reasoning Challenge},
  url    = {https://github.com/DaoyuanLi2816/tracedistill}
}

License

MIT. The license covers the code and documentation in this repository; it does not extend to the competition data or the base model, which remain under their respective terms (see data/README.md).
