Verified-on-Spark patterns lifted from the ai-field-notes blog into one importable Python package.
fieldkit
Every essay in ai-field-notes ends with `evidence/`, a folder of working code that produced the article's numbers. After 30+ articles, the same patterns kept reappearing: the same NIM client wrapper, the same chunk-embed-store dance, the same bench harness, the same verifier-loop math. fieldkit is what those `evidence/` folders look like once the boilerplate is lifted into a real package.
The blog stays the long-form rationale. fieldkit is the pip-installable surface, so you can reproduce and extend the work without re-pasting 80 lines of NIM-client setup per article.
Install
```bash
pip install fieldkit
```
For the bleeding edge between releases, install from the git tag instead:
```bash
pip install "git+https://github.com/manavsehgal/ai-field-notes.git@fieldkit/v0.2.0#subdirectory=fieldkit"
```
Quickstart
```python
from fieldkit.nim import NIMClient

# Assumes a NIM container is serving its OpenAI-compatible API on localhost:8000.
client = NIMClient(base_url="http://localhost:8000/v1", model="meta/llama-3.1-8b-instruct")
print(client.chat([{"role": "user", "content": "Hello, Spark."}]))
```
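Under the hood, an OpenAI-compatible client mostly POSTs JSON to `/chat/completions` and retries transient failures. The stdlib-only sketch below shows that shape; it is not fieldkit's implementation, and the helper names `with_retries` and `chat`, the backoff schedule, and the choice to retry only on HTTP 429, 5xx, and connection errors are illustrative assumptions.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying transient failures with exponential backoff (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            # Client errors such as 400/404 will not fix themselves: surface them immediately.
            if not (err.code == 429 or err.code >= 500) or attempt == attempts - 1:
                raise
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ... with the defaults
    raise AssertionError("unreachable")


def chat(base_url: str, model: str, messages: List[dict], timeout: float = 60.0) -> str:
    """POST to an OpenAI-compatible /chat/completions route and return the reply text."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")

    def call() -> str:
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}/chat/completions",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]

    return with_retries(call)
```

Passing `sleep` in as a parameter keeps the backoff testable without real waits. Against the quickstart's server, a call would look like `chat("http://localhost:8000/v1", "meta/llama-3.1-8b-instruct", [{"role": "user", "content": "Hello, Spark."}])`.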
What's in v0.2.0
| Module | Purpose | Source articles |
|---|---|---|
| `fieldkit.capabilities` | Typed Python facade over `spark-capabilities.json`: KV cache math, weight bytes, inference envelope. | kv-cache-arithmetic-at-inference, gpu-sizing-math-for-fine-tuning |
| `fieldkit.nim` | OpenAI-compatible NIM client wrapper with retry, chunking, and the 8192-token context guard. | nim-first-inference-dgx-spark and friends |
| `fieldkit.rag` | `Pipeline(embed_url, rerank_url, pgvector_dsn, generator)`: ingest → retrieve → rerank → fuse. | naive-rag-on-spark and friends |
| `fieldkit.eval` | `Bench`, `Judge`, `Trajectory`, plus v0.2's `AssertionGrader`, `PassAtK`, `AgentRun`, `MatchedBaseComparison`. | Every article with a `bench.py` or `benchmark.py`, plus clawgym-on-spark, autoresearchbench-on-spark, pass-at-k-after-the-seventh-patch |
| `fieldkit.training` (new in v0.2) | `LoraReferenceSnapshot` (sidesteps peft 0.19's offloader bug) and `WeightDeltaTracker`, for any RL or SFT loop. Lazy torch import, so pure-inference environments don't pay for it. | clawgym-on-spark-grpo |
| `fieldkit.cli` | `fieldkit bench rag`, `fieldkit feasibility <id>`, `fieldkit envelope <size>`. | discoverability |
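The `fieldkit.capabilities` row mentions KV cache math and weight bytes. The standard back-of-envelope arithmetic is sketched below; the function names are hypothetical and this is not the `fieldkit.capabilities` API. The Llama 3.1 8B shape in the example (32 layers, 8 KV heads under grouped-query attention, head dimension 128) comes from the public model config, and the estimate ignores allocator overhead, activations, and quantized KV formats.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int, seq_len: int,
                   batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes to hold K and V for every layer, KV head, and token (bf16/fp16 by default)."""
    # Leading 2: one K tensor and one V tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def weight_bytes(num_params: int, bytes_per_param: float = 2) -> float:
    """Bytes for the model weights alone (2 for bf16, 1 for FP8, 0.5 for 4-bit)."""
    return num_params * bytes_per_param


GIB = 1024 ** 3

# Llama 3.1 8B shape, 8192 tokens, bf16 KV cache, batch of one.
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(kv / GIB)  # 1.0

# Roughly 8e9 parameters in bf16 (rounded for illustration).
print(weight_bytes(8_000_000_000) / 1e9)  # 16.0
```

Per token this works out to 128 KiB, and the cache grows linearly: doubling either the batch or the sequence length doubles it.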
What v0.2 adds
- `fieldkit.training`: new module. `LoraReferenceSnapshot` is a CPU-resident snapshot of a peft adapter's LoRA tensors, plus a context manager that swaps the snapshot in for one no-grad forward pass and restores the trainable weights on exit. It works around a real peft 0.19 bug: `model.load_adapter(adapter_name="reference", is_trainable=False)` crashes with `KeyError` under `device_map="auto"` whenever the GPU has anything else resident, because peft's offload detection over-triggers on Spark's unified memory. `WeightDeltaTracker` takes pre/post snapshots of the trainable parameters and reports L2 and max|Δ|, a sanity check that a fine-tuning step actually moved the weights.
- `fieldkit.eval.AssertionGrader`: a pure-function grader over five file-system assertion primitives (`file_exists`, `file_not_exists`, `file_contents_contain`, `file_contents_match_regex`, `file_unchanged`). Lifted from clawgym-on-spark's deterministic grader; no LLM, no fuzzy matching.
- `fieldkit.eval.PassAtK` + `pass_at_k_estimator`: a verifier loop with the Chen et al. (2021) unbiased pass@k estimator. It stays unbiased for finite n, whereas the naive plug-in `1 - (1-p)^k` with p estimated as c/n underestimates pass@k.
- `fieldkit.eval.AgentRun` + `TurnDetail` + `summarize_agent_runs`: a per-question agent-bench schema with overridable field-name path tuples for layouts other than AutoResearchBench's.
- `fieldkit.eval.MatchedBaseComparison` + `GroupStats`: a two-rollout B−A driver with per-group and per-assertion-kind deltas and a markdown `.report()`. Reusable for any LoRA/adapter ablation, fine-tuned-vs-base, or system-prompt A-vs-B comparison.
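For reference, the Chen et al. (2021) estimator behind the `PassAtK` entry fits in a few lines. This is a sketch of the published formula, pass@k = 1 - C(n-c, k) / C(n, k) for n samples of which c pass, computed as a running product to avoid huge binomials; it is not fieldkit's `pass_at_k_estimator`, whose signature may differ.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples of which c passed (Chen et al., 2021)."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # 1 - C(n-c, k) / C(n, k) == 1 - prod_{i=n-c+1}^{n} (1 - k / i)
    prod = 1.0
    for i in range(n - c + 1, n + 1):
        prod *= 1.0 - k / i
    return 1.0 - prod


def naive_pass_at_k(n: int, c: int, k: int) -> float:
    """Plug-in estimate 1 - (1 - c/n)^k, shown for comparison; biased low for finite n."""
    return 1.0 - (1.0 - c / n) ** k


print(round(pass_at_k(10, 3, 2), 3))        # 0.533 (exactly 24/45)
print(round(naive_pass_at_k(10, 3, 2), 3))  # 0.51
```

The gap in the example is the bias: 1 - (1-p)^k is concave in p, so plugging in the sample rate c/n pulls the estimate down on average.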
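The five assertion primitives map naturally onto `pathlib`. Below is a minimal sketch of a deterministic grader in that spirit; the dict-based assertion format, the `baseline` hash map used for `file_unchanged`, and the fraction-passed score are assumptions for illustration, not `AssertionGrader`'s actual interface.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, List, Optional


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check(assertion: Dict[str, str], root: Path,
          baseline: Optional[Dict[str, str]] = None) -> bool:
    """Evaluate one file-system assertion; reads files under root, never writes."""
    kind = assertion["kind"]
    rel = assertion["path"]
    path = root / rel
    if kind == "file_exists":
        return path.is_file()
    if kind == "file_not_exists":
        return not path.exists()
    if kind == "file_contents_contain":
        return path.is_file() and assertion["text"] in path.read_text()
    if kind == "file_contents_match_regex":
        return path.is_file() and re.search(assertion["pattern"], path.read_text(), re.MULTILINE) is not None
    if kind == "file_unchanged":
        # baseline maps relative path -> SHA-256 recorded before the agent ran.
        return path.is_file() and baseline is not None and baseline.get(rel) == file_sha256(path)
    raise ValueError(f"unknown assertion kind: {kind!r}")


def grade(assertions: List[Dict[str, str]], root: Path,
          baseline: Optional[Dict[str, str]] = None) -> float:
    """Fraction of assertions that hold; 0.0 for an empty list."""
    if not assertions:
        return 0.0
    return sum(check(a, root, baseline) for a in assertions) / len(assertions)
```

Because every check is a plain function of the files on disk, the same workspace always gets the same score, which is the property that makes a grader like this usable as a reward signal.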
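The `WeightDeltaTracker` idea (snapshot before, snapshot after, report L2 and max|Δ|) also fits in a few lines. This sketch uses plain Python lists in place of tensors so it runs without torch; the names `snapshot` and `weight_delta` are hypothetical, and it reports one global L2 norm across all tracked parameters rather than per-tensor figures.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]


def snapshot(params: Params) -> Params:
    """Copy every parameter list so later in-place updates don't leak into the snapshot."""
    return {name: list(values) for name, values in params.items()}


def weight_delta(before: Params, after: Params) -> Dict[str, float]:
    """Global L2 norm and max absolute change across all tracked parameters."""
    sq_sum = 0.0
    max_abs = 0.0
    for name, old in before.items():
        new = after[name]
        if len(new) != len(old):
            raise ValueError(f"shape changed for {name}")
        for a, b in zip(old, new):
            diff = b - a
            sq_sum += diff * diff
            max_abs = max(max_abs, abs(diff))
    return {"l2": math.sqrt(sq_sum), "max_abs": max_abs}


params = {"lora_A": [0.0, 0.0], "lora_B": [1.0]}
before = snapshot(params)
params["lora_A"][0] += 3.0  # pretend an optimizer step touched these
params["lora_B"][0] += 4.0
print(weight_delta(before, params))  # {'l2': 5.0, 'max_abs': 4.0}
```

With torch, the same pattern would store `p.detach().cpu().clone()` for each parameter that has `requires_grad` set before the step, then compare after it; an all-zero report is the signal that the step never reached the weights.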
Deferred to v0.3+: fieldkit.agents (Persona / WorkspaceSeed / SynthTask / TaskAuthor / Sandbox / RolloutDriver / Trajectory + TurnRecord — 7 symbols), fieldkit.inference.VLLMClient, and replay_messages_from_trajectory. Each needs a second consuming article before its public API locks.
Hardware
Every code path is verified on a DGX Spark (GB10, 128 GB unified memory, NIM 8B + embed NIM + pgvector co-resident). The torch and safetensors imports in `fieldkit.training` are lazy, so the package costs nothing on inference-only boxes. Install torch and safetensors yourself in the training environment when you need the training primitives: the NeMo, Triton, and pytorch-base containers ship them; pure-inference environments don't.
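One common way to make heavy imports lazy is to defer them into the functions that need them, as sketched below. `require` and `save_adapter_tensors` are hypothetical names, not fieldkit functions; `safetensors.torch.save_file` is the real safetensors API. A module written this way imports only the standard library at load time, and torch is pulled in the first time a training helper actually runs.

```python
import importlib
from types import ModuleType


def require(module_name: str, hint: str = "") -> ModuleType:
    """Import module_name on first use, turning a missing dependency into an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(f"{module_name} is required for this feature. {hint}".strip()) from err


def save_adapter_tensors(tensors: dict, path: str) -> None:
    """Hypothetical training-side helper: safetensors (and torch) load only when this runs."""
    st = require("safetensors.torch", "Install torch and safetensors in the training environment.")
    st.save_file(tensors, path)
```

PEP 562's module-level `__getattr__` is an alternative when the heavy names should still look like ordinary module attributes.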
Portability to non-Spark CUDA 12.x boxes lands when there's demand.
License
Apache-2.0. See LICENSE.
Download files
Source Distribution
Built Distribution
File details
Details for the file fieldkit-0.2.0.post1.tar.gz.
File metadata
- Download URL: fieldkit-0.2.0.post1.tar.gz
- Upload date:
- Size: 91.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9af04ae5f21af6f259ed49d1bf79d480df1b568eace518e63da535fcfc51f704 |
| MD5 | 27356691a641323604a38ddf85fcbfcc |
| BLAKE2b-256 | b17ee7c18b35ec5011b450074018bd3ca75e73f352596ead72866e8a0b029e4a |
File details
Details for the file fieldkit-0.2.0.post1-py3-none-any.whl.
File metadata
- Download URL: fieldkit-0.2.0.post1-py3-none-any.whl
- Upload date:
- Size: 54.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fbf6a6515ee3d44d3f665444e77f794335622bce3b1e0c88d164b2bb9af04c4 |
| MD5 | 2d0db06a3977331db75b592c089e8c83 |
| BLAKE2b-256 | b75575a97e02560b7e66d8d9d9b9a0d7d925bc194458db557d4881a666c9e280 |