A coding agent that learns from your corrections in real-time via on-policy self-distillation
continualcode
A coding agent that learns from your corrections in real-time. Built on Tinker.
When you deny a tool call with feedback, the model uses your correction as context to teach itself via on-policy self-distillation, takes one gradient step on its LoRA weights, and retries with the updated weights.
You: "fix the test"
Agent: write(test.py, ...) # overwrites the file
You: n → "use edit_lines; don't overwrite"
→ SDPO update runs immediately
→ agent retries with updated weights
Agent: edit_lines(test.py, 14, 17, ...)
You: y
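The exchange above can be read as a small control loop: propose, review, train on denial, retry. The sketch below is a hypothetical illustration only, not continualcode's actual API. `Review`, `run_turn`, `propose`, `review`, `train_step`, and `max_retries` are names invented here, and the "model" is a stub whose behavior changes when `train_step` is called.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    """Hypothetical user verdict: approve (y) or deny (n) with feedback text."""
    approved: bool
    feedback: Optional[str] = None


def run_turn(
    propose: Callable[[], str],
    review: Callable[[str], Review],
    train_step: Callable[[str, str], None],
    max_retries: int = 3,
) -> Optional[str]:
    """Propose a tool call; on denial, train once on the feedback and retry."""
    for _ in range(max_retries + 1):
        action = propose()                 # sample from the current weights
        verdict = review(action)           # user answers y, or n plus a correction
        if verdict.approved:
            return action
        # Assumption: one update per denial, applied before the next sample.
        train_step(action, verdict.feedback or "")
    return None                            # retries exhausted, nothing approved


# Stub "model": a dict standing in for weights that an update can change.
state = {"tool": "write"}


def propose() -> str:
    return f"{state['tool']}(test.py)"


def review(action: str) -> Review:
    if action.startswith("write"):
        return Review(False, "use edit_lines; don't overwrite")
    return Review(True)


def train_step(action: str, feedback: str) -> None:
    # Stand-in for the real update; here the stub simply switches tools.
    state["tool"] = "edit_lines"


result = run_turn(propose, review, train_step)
print(result)
```

In this toy version the retry differs from the first attempt only because `train_step` changed the stub's state, which mirrors the idea that the retry relies on updated weights rather than on a longer prompt.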
Install
pip install continualcode
export TINKER_API_KEY=<your-key>
continualcode
How it works
Four feedback types, one training signal. Your correction becomes privileged context for a self-teacher: the same model, given richer input. The per-token KL between teacher and student is a dense training signal, O(N) bits per correction rather than O(1). The agent then takes one gradient step on LoRA and retries with updated weights.
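To make the "dense signal" idea concrete, here is a minimal sketch of a per-token divergence between a student and a teacher distribution. It assumes the reverse direction, KL(student || teacher), which matches the mode-seeking behavior mentioned in this README but is an assumption about the exact loss. The function names and toy distributions are hypothetical, and nothing here calls Tinker or continualcode.

```python
import math
from typing import List


def per_token_reverse_kl(
    student_logprobs: List[List[float]],
    teacher_logprobs: List[List[float]],
) -> List[float]:
    """KL(student || teacher) at each position.

    Each input is a list of per-position log-probability vectors over the
    same vocabulary. Returns one non-negative number per position.
    """
    kls = []
    for s_row, t_row in zip(student_logprobs, teacher_logprobs):
        kls.append(sum(math.exp(s) * (s - t) for s, t in zip(s_row, t_row)))
    return kls


def sampled_token_signal(
    student_lp: List[float], teacher_lp: List[float]
) -> List[float]:
    """Single-sample variant: teacher minus student log-prob of each sampled token.

    Positive values mark tokens the teacher (which saw the correction)
    likes more than the student did; negative values mark the opposite.
    """
    return [t - s for s, t in zip(student_lp, teacher_lp)]


# Toy two-token vocabulary, two positions.
student = [[math.log(0.5), math.log(0.5)], [math.log(0.9), math.log(0.1)]]
teacher = [[math.log(0.5), math.log(0.5)], [math.log(0.1), math.log(0.9)]]

kls = per_token_reverse_kl(student, teacher)
signal = sampled_token_signal([-1.0, -2.0], [-0.5, -3.0])
print(kls, signal)
```

Every position contributes its own number, so a single correction yields one value per generated token instead of one scalar for the whole response.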
Why not DPO / GRPO / PPO / SFT
DPO needs preference pairs and has no per-token credit. GRPO needs 64 samples per prompt, which is absurd UX for a CLI. PPO doubles memory with a critic network. SFT on corrections is off-policy and causes catastrophic forgetting. Self-distillation is the unique intersection: dense signal, on-policy, no extra models, mode-seeking stability.
Code layout
- train.py: SDPO core (teacher prompt, logprob scoring, IS update, sampler refresh)
- tui.py: interactive CLI (approve/deny/edit, correction prompt, /metrics)
- tools.py: tool implementations and structured feedback
- benchmarks/auto_train.py: automated training loop (LCB, multi-rollout GRPO + SDPO)
- demo/: deny → train → retry end-to-end
File details
Details for the file continualcode-0.6.0.tar.gz.
File metadata
- Download URL: continualcode-0.6.0.tar.gz
- Upload date:
- Size: 10.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f974e4e3a9ee9ee4a05bff4dddbc17ffe6afe2da0bfc765b829532b0e3b3f2e0 |
| MD5 | 238a2f073f330d5d2e6def09d76584c0 |
| BLAKE2b-256 | d50c6325a31fc1cceba5ac58620edb116a85fca5a006ad7c445525a50dd03002 |
File details
Details for the file continualcode-0.6.0-py3-none-any.whl.
File metadata
- Download URL: continualcode-0.6.0-py3-none-any.whl
- Upload date:
- Size: 37.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0da9b056f14fc66580522ec31a04dfd25cef89e1541f994009f4ef0e367aa31a |
| MD5 | ea89be238a393b14b7318f7f422ae633 |
| BLAKE2b-256 | 8025c928a83cd9ef54596a74e4f53d07b7913184e660d64ed95afa5f1e410fa1 |