A coding agent that learns from your corrections in real time via on-policy self-distillation

continualcode

A coding agent that learns from your corrections in real time. Built on Tinker.

When you deny a tool call with feedback, the model uses your correction as context to teach itself via on-policy self-distillation, takes one gradient step on its LoRA weights, and retries with the updated weights.

You: "fix the test"
Agent: write(test.py, ...)       # overwrites the file
You: n → "use edit_lines; don't overwrite"
  → SDPO update runs immediately
  → agent retries with updated weights
Agent: edit_lines(test.py, 14, 17, ...)
You: y
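
The deny → train → retry flow above can be sketched as a small control loop. This is an illustrative toy, not the code in tui.py or train.py: `ToyAgent`, `Review`, and `run_turn` are hypothetical names, and `update` is a placeholder for the real training step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Review:
    approved: bool
    feedback: Optional[str] = None  # correction text when the call is denied


@dataclass
class ToyAgent:
    """Stand-in for the model; `version` counts weight updates (hypothetical)."""
    version: int = 0
    log: List[str] = field(default_factory=list)

    def propose(self, task: str) -> str:
        # Toy behaviour: before any update it overwrites, afterwards it edits in place.
        if self.version == 0:
            return "write(test.py, ...)"
        return "edit_lines(test.py, 14, 17, ...)"

    def update(self, task: str, rejected_call: str, feedback: str) -> None:
        # Placeholder for the real step (teacher scoring + one LoRA gradient step).
        self.log.append(f"update on {rejected_call!r} with feedback {feedback!r}")
        self.version += 1


def run_turn(agent: ToyAgent, task: str,
             review: Callable[[str], Review], max_retries: int = 3) -> str:
    """Propose a tool call, train on each denial, and retry until approved."""
    for _ in range(max_retries + 1):
        call = agent.propose(task)
        verdict = review(call)
        if verdict.approved:
            return call
        agent.update(task, call, verdict.feedback or "")
    raise RuntimeError("no approved tool call within the retry budget")


def reviewer(call: str) -> Review:
    # Mirrors the transcript: deny overwrites, approve in-place edits.
    if call.startswith("write("):
        return Review(False, "use edit_lines; don't overwrite")
    return Review(True)


agent = ToyAgent()
approved = run_turn(agent, "fix the test", reviewer)
```

The retry budget is an assumption added for the sketch; the source does not say whether retries are capped.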

Install

pip install continualcode
export TINKER_API_KEY=<your-key>
continualcode

How it works

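Four feedback types feed one training signal. Your correction becomes privileged context for a self-teacher: the same model, given richer input. The per-token KL divergence between teacher and student provides a dense training signal, on the order of O(N) bits per correction for an N-token response rather than the O(1) of a single accept/reject label. The agent then takes one gradient step on the LoRA weights and retries with the updated weights.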
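
To make the dense-signal point concrete, here is a minimal sketch of a per-token KL between a student distribution (no correction in context) and a self-teacher distribution (same model, correction in context). The logits are made-up toy numbers over a three-token vocabulary, the function names are hypothetical, and the KL direction (student relative to teacher) is an assumption chosen to match the "mode-seeking" description; train.py may compute this differently.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def per_token_kl(student_logits: Sequence[Sequence[float]],
                 teacher_logits: Sequence[Sequence[float]]) -> List[float]:
    """KL(student || teacher) at each token position, over the full vocabulary."""
    kls = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_lp = log_softmax(s_row)
        t_lp = log_softmax(t_row)
        kls.append(sum(math.exp(s) * (s - t) for s, t in zip(s_lp, t_lp)))
    return kls


# Toy example: the teacher disagrees with the student at position 0 only.
student = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
teacher = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
kl = per_token_kl(student, teacher)  # one value per token, not one per response
```

Each position contributes its own value, so the update can tell which tokens the correction actually changed; a single accept/reject label cannot.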

Full explanation →

Why not DPO / GRPO / PPO / SFT

DPO needs preference pairs and assigns no per-token credit. GRPO needs a group of samples per prompt (for example, 64), which is impractical for an interactive CLI. PPO adds a critic network, roughly doubling memory. SFT on corrections is off-policy and risks catastrophic forgetting. Self-distillation sits at the intersection: dense signal, on-policy, no extra models, and mode-seeking stability.

Full reasoning →

Code layout

  • train.py — SDPO core: teacher prompt, logprob scoring, IS update, sampler refresh
  • tui.py — interactive CLI: approve/deny/edit, correction prompt, /metrics
  • tools.py — tool implementations + structured feedback
  • benchmarks/auto_train.py — automated training loop (LCB, multi-rollout GRPO + SDPO)
  • demo/ — deny → train → retry end-to-end
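
As a rough illustration of the "logprob scoring, IS update" steps listed for train.py, the sketch below assumes "IS" means importance sampling and that per-token advantages are the teacher's log-probability minus the student's on the sampled tokens. Both are assumptions, the function name is hypothetical, the numbers are invented, and no Tinker calls are made.

```python
import math
from typing import Sequence


def is_surrogate_loss(new_logprobs: Sequence[float],
                      sampled_logprobs: Sequence[float],
                      advantages: Sequence[float]) -> float:
    """Negative importance-weighted advantage, summed over sampled tokens."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, sampled_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # equals 1.0 when exactly on-policy
        total += ratio * adv
    return -total


# Log-probabilities of the tokens the student actually sampled.
student_lp = [-0.2, -1.5, -0.7]   # scored without the correction
teacher_lp = [-0.3, -0.4, -0.7]   # same tokens, scored with the correction in context

# Positive advantage: the teacher finds the token more likely than the student did.
advantages = [t - s for s, t in zip(student_lp, teacher_lp)]

# On-policy step: current logprobs equal the sampling logprobs, so every ratio is 1.
loss = is_surrogate_loss(student_lp, student_lp, advantages)
```

Minimizing this loss raises the probability of tokens the teacher prefers and lowers the rest, which is the direction the per-token signal is meant to push.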

References

  • SDPO — Hübotter et al. 2026
  • SDFT — Shenfeld, Damani, Guestrin 2026
  • GKD — Agarwal et al. 2023
  • Tinker — training API
