# code2lora
Connect a git repository to the pre-trained Code2LoRA-GRU hypernetwork and incrementally regenerate a LoRA adapter for your code model every few commits — no per-repo training required.
code2lora streams your commit history through a frozen embedder and a trained
GRU hypernetwork. The GRU's hidden state is persisted between runs, so each
update only has to process the commits that arrived since last time. Every N
commits it emits a standard PEFT adapter
you can load on top of the base model.
```mermaid
flowchart LR
    newCommits["New commits since cursor"] --> embed["Qwen3 diff embedding"]
    embed --> gruStep["GRU step (per commit)"]
    state["Persisted hidden state + cursor SHA"] --> gruStep
    gruStep --> state
    gruStep -->|"every N commits"| head["LoRA head -> A,B per type"]
    head --> exporter["PEFT adapter (safetensors + config)"]
    exporter --> consume["peft.PeftModel / generate"]
```
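To make the "GRU step (per commit)" box concrete, here is a toy, pure-Python sketch of a standard GRU cell (the gate equations documented for PyTorch's `torch.nn.GRUCell`, biases omitted) folding a stream of commit embeddings into a hidden state that is saved between runs. It assumes the hypernetwork uses a standard GRU cell; the tiny sizes, random weights, and the `ToyGRUCell`/`fold_in` names are hypothetical and are not code2lora's trained model or API.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyGRUCell:
    """Standard GRU cell with random stand-in weights (hypothetical, untrained)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One input matrix (W) and one hidden matrix (U) per gate: reset r, update z, new n.
        self.W = {g: mat(hidden_size, input_size) for g in "rzn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "rzn"}

    def step(self, x, h):
        """One update: h' = (1 - z) * n + z * h."""
        wr, ur = matvec(self.W["r"], x), matvec(self.U["r"], h)
        wz, uz = matvec(self.W["z"], x), matvec(self.U["z"], h)
        wn, un = matvec(self.W["n"], x), matvec(self.U["n"], h)
        r = [sigmoid(a + b) for a, b in zip(wr, ur)]                # reset gate
        z = [sigmoid(a + b) for a, b in zip(wz, uz)]                # update gate
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wn, r, un)]  # candidate state
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def fold_in(cell, state, commits):
    """Step the GRU once per (sha, embedding) pair and advance the cursor."""
    h, cursor = list(state["h"]), state["cursor"]
    for sha, embedding in commits:
        h = cell.step(embedding, h)
        cursor = sha
    return {"h": h, "cursor": cursor}


cell = ToyGRUCell(input_size=3, hidden_size=4)
start = {"h": [0.0] * 4, "cursor": None}  # cold start
commits = [("a1", [0.1, 0.2, 0.3]), ("b2", [0.0, -0.1, 0.4]), ("c3", [0.5, 0.5, 0.5])]
# Folding everything in one run gives the same state as two incremental runs.
once = fold_in(cell, start, commits)
later = fold_in(cell, fold_in(cell, start, commits[:2]), commits[2:])
print(once == later, once["cursor"])  # True c3
```

The equality check is the point: because the only thing carried between runs is `h` plus the cursor, a later run never needs to revisit commits it has already folded in.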
## Install

```bash
pip install code2lora            # core: stream commits + export adapters
pip install 'code2lora[infer]'   # + run generation with the adapter (peft)
pip install 'code2lora[all]'     # + watcher extra
```
The first run downloads the base model config and the hypernetwork checkpoint
(gru_head.best.pt) from the Hugging Face Hub.
## Quick start (CLI)

```bash
cd my-repo
code2lora init           # write .code2lora/config.toml
code2lora install-hook   # update automatically after every `git commit`
code2lora sync           # process new commits now; export adapter when due
code2lora status         # cursor SHA, pending commits, adapter path
code2lora generate "def add(a, b):"
```

Wire it into CI/cron instead of (or alongside) the hook by running
`code2lora sync` in your pipeline, or run the foreground watcher:

```bash
code2lora watch
```
## Quick start (Python)

```python
from code2lora import Code2Lora

c2l = Code2Lora.from_repo(".")
result = c2l.sync()              # fold in new commits, export if due
print(result.summary())

model = c2l.load_model()         # base model + latest adapter (needs [infer])
print(c2l.generate("def add(a, b):"))

# the exported adapter is a normal PEFT adapter:
print(c2l.adapter_path())        # .code2lora/adapters/<branch>/
```

You can also load it yourself, anywhere:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B")
model = PeftModel.from_pretrained(base, ".code2lora/adapters/main")
```
## How "update every few commits" works

The GRU consumes one commit at a time. code2lora persists, per branch, the
GRU hidden state plus a cursor (the last commit folded in) in
`.code2lora/state.pt`. A sync:

- lists new first-parent commits after the cursor,
- keeps the ones matching `commit_filter`,
- steps the GRU on each (one cheap embedder forward pass plus one GRU cell step; no LLM),
- re-exports the adapter once at least `update.every` commits have accrued since the last export (and always on the first export, or with `--force`).
Stepping is cheap; only the export touches disk. Nothing re-processes history you've already folded in.
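The sync steps above can be sketched in a few lines of Python. This is a minimal illustration, not code2lora's implementation: `SyncState`, `sync`, and the callables passed in (`matches_filter`, `gru_step`, `export`) are hypothetical stand-ins, commits are assumed to arrive oldest first, and the cursor is advanced only over commits that were actually folded in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class SyncState:
    hidden: Any = None            # persisted GRU hidden state (opaque here)
    cursor: Optional[str] = None  # last commit folded in
    since_export: int = 0         # commits folded in since the last export
    exported_once: bool = False


def sync(
    state: SyncState,
    new_commits: Iterable[str],             # first-parent commits after the cursor, oldest first
    every: int,                             # update.every
    matches_filter: Callable[[str], bool],  # stand-in for commit_filter
    gru_step: Callable[[Any, str], Any],    # stand-in for embed + one GRU cell step
    export: Callable[[Any], None],          # stand-in for writing the PEFT adapter
    force: bool = False,
) -> bool:
    """Fold new commits into `state`; return True if an adapter was exported."""
    for sha in new_commits:
        if not matches_filter(sha):
            continue                        # filtered-out commits do not step the GRU
        state.hidden = gru_step(state.hidden, sha)
        state.cursor = sha
        state.since_export += 1
    due = force or not state.exported_once or state.since_export >= every
    if due:
        export(state.hidden)                # the only step that touches disk
        state.exported_once = True
        state.since_export = 0
    return due


def count_step(hidden, sha):
    return hidden + 1                       # toy "GRU": counts folded-in commits


def skip_c3(sha):
    return sha != "c3"                      # toy filter


exported = []
state = SyncState(hidden=0)
runs = [["c1", "c2", "c3"], ["c4", "c5", "c6"], ["c7", "c8"]]
results = [sync(state, batch, 5, skip_c3, count_step, exported.append) for batch in runs]
print(results, exported, state.cursor)  # [True, False, True] [2, 7] c8
```

With `every = 5`, the first sync exports unconditionally, the second accrues only three commits and skips the export, and the third reaches five and exports again.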
## Configuration

Everything lives in `.code2lora/config.toml`. Defaults work out of the box; edit
to taste.

```toml
[model]
base_model = "Qwen/Qwen2.5-Coder-1.5B"
hypernetwork = "code2lora/code2lora-gru"  # HF repo holding gru_head.*.pt
checkpoint_file = "gru_head.best.pt"
checkpoint_path = ""                      # local gru_head.*.pt (overrides HF)
embedder = "Qwen/Qwen3-Embedding-0.6B"
device = "auto"                           # auto | cpu | cuda | cuda:N
rank = 16
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"]
max_repo_state_files = 400                # cap files embedded for the cold-start seed

[update]
every = 5                                 # re-export adapter every N commits
commit_filter = "production"              # production | test-touching | all
branch_strategy = "current"               # current | per-branch | main-only
main_branch = "main"
export_on_every_sync = false

[hook]
background = true                         # don't block `git commit`

[watch]
interval = 30                             # seconds between HEAD polls
```
### `update.every`

How many newly processed commits must accrue before the adapter is re-exported. Set it low for tight feedback, high to amortise export cost.
### `update.commit_filter`

Which commits advance the GRU:

| value | commits that count |
|---|---|
| `production` | commits touching non-test source (`.py`/`.md`/`.rst`); the default |
| `test-touching` | commits that change a Python test file (approximates the training regime) |
| `all` | every first-parent commit |
The diff fed at each step is the filtered production-code diff from the previous selected commit to the current one, so changes in skipped commits still flow in.
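A minimal sketch of how a filter like this could classify a commit from its changed paths (which you could list with `git diff-tree --no-commit-id --name-only -r <sha>`). The test-file rule used here (`test_*.py`, `*_test.py`, or a `.py` file under a `test/` or `tests/` directory) is an assumption for illustration; code2lora's actual heuristic may differ, and `is_test_file`/`commit_counts` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Iterable

PRODUCTION_SUFFIXES = {".py", ".md", ".rst"}  # from the table above


def is_test_file(path: str) -> bool:
    """Assumed heuristic: a Python file named test_*.py / *_test.py or under test(s)/."""
    p = PurePosixPath(path)
    if p.suffix != ".py":
        return False
    in_test_dir = any(part in {"test", "tests"} for part in p.parts[:-1])
    return in_test_dir or p.name.startswith("test_") or p.stem.endswith("_test")


def commit_counts(changed_paths: Iterable[str], commit_filter: str = "production") -> bool:
    """Does a commit touching `changed_paths` advance the GRU under `commit_filter`?"""
    paths = list(changed_paths)
    if commit_filter == "all":
        return True
    if commit_filter == "test-touching":
        return any(is_test_file(p) for p in paths)
    if commit_filter == "production":
        return any(
            PurePosixPath(p).suffix in PRODUCTION_SUFFIXES and not is_test_file(p)
            for p in paths
        )
    raise ValueError(f"unknown commit_filter: {commit_filter!r}")


print(commit_counts(["src/app.py", "tests/test_app.py"]))     # True
print(commit_counts(["tests/test_app.py"]))                   # False
print(commit_counts(["tests/test_app.py"], "test-touching"))  # True
print(commit_counts(["setup.cfg"]))                           # False
```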
### `branch_strategy`

How branches map onto GRU state:

| value | behaviour |
|---|---|
| `current` | track the checked-out branch (default) |
| `main-only` | always follow `main_branch`, regardless of what's checked out |
| `per-branch` | keep an independent hidden state + adapter per branch, forked from the parent at branch point |
For per-branch, when a new branch first appears its state is forked from a
parent branch whose cursor is an ancestor of the new tip; otherwise that branch
cold-starts. Each branch's adapter lands in .code2lora/adapters/<branch>/.
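The fork rule can be sketched over a simple commit-to-first-parent map. Everything here is hypothetical: `first_parent` stands in for the commit graph (code2lora reads it from git), `states` maps a branch name to its persisted `{"cursor": ..., "hidden": ...}` entry, and only first-parent ancestry is checked, which is a simplification of git's full ancestry. When several parent branches qualify, this sketch picks the one whose cursor is nearest the new tip; the real tie-breaking rule is not documented here.

```python
import copy
from typing import Dict, Optional, Tuple

Graph = Dict[str, Optional[str]]  # commit SHA -> first-parent SHA (None at the root)


def first_parent_distances(tip: str, first_parent: Graph) -> Dict[str, int]:
    """Distance from `tip` to each commit on its first-parent chain (tip itself is 0)."""
    dist: Dict[str, int] = {}
    sha: Optional[str] = tip
    while sha is not None and sha not in dist:
        dist[sha] = len(dist)
        sha = first_parent.get(sha)
    return dist


def fork_state(tip: str, states: Dict[str, dict], first_parent: Graph) -> Tuple[Optional[str], dict]:
    """Copy the state of a branch whose cursor is an ancestor of `tip`.

    Returns (parent_branch, state); parent_branch is None on a cold start.
    """
    dist = first_parent_distances(tip, first_parent)
    candidates = [
        (dist[s["cursor"]], name) for name, s in states.items() if s.get("cursor") in dist
    ]
    if not candidates:
        return None, {"cursor": None, "hidden": None}  # cold start
    _, parent = min(candidates)                        # nearest ancestor wins
    return parent, copy.deepcopy(states[parent])       # fork, don't share


# main: a <- b <- c ; feature branches off c: c <- d <- e
first_parent = {"a": None, "b": "a", "c": "b", "d": "c", "e": "d"}
states = {"main": {"cursor": "c", "hidden": [0.3]}, "docs": {"cursor": "x", "hidden": [0.9]}}
parent, state = fork_state("e", states, first_parent)
print(parent, state["cursor"])  # main c
```

After forking, the new branch only has to fold in `d` and `e`; a tip with no qualifying ancestor cursor falls back to a cold start.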
## Using a private / local checkpoint

```toml
[model]
checkpoint_path = "/path/to/gru_head.best.pt"
```

Or via environment variables (also honoured): `CODE2LORA_CKPT`,
`CODE2LORA_CKPT_REPO`, `CODE2LORA_CKPT_FILE`.
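One way a tool could decide which checkpoint to load from these settings, sketched under assumptions: it treats `CODE2LORA_CKPT` as a local path and `CODE2LORA_CKPT_REPO`/`CODE2LORA_CKPT_FILE` as the Hub repo and file name, lets a local path win over the Hub (as the `checkpoint_path` comment states), and lets environment variables override `config.toml`. That last precedence is an assumption, and `resolve_checkpoint` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_checkpoint(model_cfg: Mapping[str, str], env: Optional[Mapping[str, str]] = None) -> dict:
    """Return where the hypernetwork checkpoint should come from (assumed precedence)."""
    env = os.environ if env is None else env
    # A local path (env first, then config) overrides the Hub.
    local = env.get("CODE2LORA_CKPT") or model_cfg.get("checkpoint_path", "")
    if local:
        return {"source": "local", "path": local}
    repo = env.get("CODE2LORA_CKPT_REPO") or model_cfg.get("hypernetwork", "code2lora/code2lora-gru")
    filename = env.get("CODE2LORA_CKPT_FILE") or model_cfg.get("checkpoint_file", "gru_head.best.pt")
    return {"source": "hub", "repo_id": repo, "filename": filename}


print(resolve_checkpoint({}, {}))
# {'source': 'hub', 'repo_id': 'code2lora/code2lora-gru', 'filename': 'gru_head.best.pt'}
print(resolve_checkpoint({"checkpoint_path": "/path/to/gru_head.best.pt"}, {}))
# {'source': 'local', 'path': '/path/to/gru_head.best.pt'}
```

For a `hub` result, `huggingface_hub.hf_hub_download(repo_id=..., filename=...)` downloads the file (or reuses the local cache) and returns its path.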
## CLI reference

| command | description |
|---|---|
| `init` | write `.code2lora/config.toml` (+ `--install-hook`) |
| `install-hook` | add a git `post-commit` hook that runs `sync` |
| `uninstall-hook` | remove the hook |
| `sync [--force]` | process new commits; export when due (`--force` exports now) |
| `status` | show cursor / pending commits / adapter per branch |
| `watch` | run the polling watcher in the foreground |
| `export [-o DIR]` | force-generate the adapter into a directory |
| `generate PROMPT` | load base + adapter and generate |

Add `--repo PATH` before the command to target a repo other than the cwd.
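If a CI step drives the CLI from Python rather than a shell, a small wrapper keeps the argument order right (`--repo` goes before the subcommand). `sync_command` and `run_sync` are hypothetical helpers that only assemble the commands documented above; actually running them requires `code2lora` on `PATH`.

```python
import subprocess
from typing import List, Optional


def sync_command(repo: Optional[str] = None, force: bool = False) -> List[str]:
    """Build the argv for `code2lora [--repo PATH] sync [--force]`."""
    argv = ["code2lora"]
    if repo is not None:
        argv += ["--repo", repo]  # global option: must precede the subcommand
    argv.append("sync")
    if force:
        argv.append("--force")
    return argv


def run_sync(repo: Optional[str] = None, force: bool = False) -> subprocess.CompletedProcess:
    # check=True makes a failing sync fail the CI job.
    return subprocess.run(sync_command(repo, force), check=True)


print(sync_command("/srv/my-repo", force=True))
# ['code2lora', '--repo', '/srv/my-repo', 'sync', '--force']
```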
## What gets written

```
.code2lora/
  config.toml          # your settings
  state.pt             # persisted GRU state + cursor per branch
  adapters/<branch>/   # adapter_model.safetensors + adapter_config.json
  .gitignore           # ignores state.pt and adapters/ by default
```
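Because each `adapters/<branch>/` directory is a normal PEFT adapter, its `adapter_config.json` can be sanity-checked against your `[model]` settings before loading. A minimal standard-library sketch: `r`, `target_modules`, and `base_model_name_or_path` are standard PEFT LoRA config keys, while `check_adapter` and the sample files written in the example are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_adapter(adapter_dir, expected_rank, expected_modules, base_model):
    """Return a list of mismatches between an adapter directory and expected settings."""
    d = Path(adapter_dir)
    cfg = json.loads((d / "adapter_config.json").read_text())
    problems = []
    if cfg.get("r") != expected_rank:
        problems.append(f"rank {cfg.get('r')} != {expected_rank}")
    modules = cfg.get("target_modules") or []
    if isinstance(modules, str):  # PEFT also allows a single name/regex string
        modules = [modules]
    if set(modules) != set(expected_modules):
        problems.append(f"target_modules {sorted(modules)} != {sorted(expected_modules)}")
    if cfg.get("base_model_name_or_path") != base_model:
        problems.append(f"base model {cfg.get('base_model_name_or_path')!r} != {base_model!r}")
    if not (d / "adapter_model.safetensors").exists():
        problems.append("adapter_model.safetensors missing")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample adapter directory for the example.
    (Path(tmp) / "adapter_config.json").write_text(json.dumps({
        "peft_type": "LORA",
        "r": 16,
        "target_modules": ["q_proj", "v_proj"],
        "base_model_name_or_path": "Qwen/Qwen2.5-Coder-1.5B",
    }))
    (Path(tmp) / "adapter_model.safetensors").write_bytes(b"")  # placeholder file
    print(check_adapter(tmp, 16, ["v_proj", "q_proj"], "Qwen/Qwen2.5-Coder-1.5B"))  # []
```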
## Limitations

- Inference-only: the hypernetwork is not fine-tuned on your repo.
- Trained on Python repositories with test suites; other languages will run but are out of the trained distribution.
- `test-touching` uses a "changes a test file" heuristic, a lighter, incremental-friendly approximation of the trainer's "introduces a new assertion" selection.
- Defaults target `Qwen/Qwen2.5-Coder-1.5B`; the PEFT key layout assumes a Qwen2/Llama-style decoder (`self_attn.*`/`mlp.*`).
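To illustrate what the Qwen2/Llama-style layout assumption means, the sketch below lists the tensor names a LoRA adapter for such a decoder typically carries: attention projections under `self_attn`, MLP projections under `mlp`. It follows the naming PEFT commonly uses when saving LoRA weights (`base_model.model.model.layers.<i>.<block>.<proj>.lora_A.weight`), but treat the exact prefix as an assumption and compare it with your own `adapter_model.safetensors` (for example via `safetensors.safe_open(path, framework="pt").keys()`). `expected_lora_keys` is a hypothetical helper.

```python
from typing import Iterable, List

ATTN = {"q_proj", "k_proj", "v_proj", "o_proj"}  # live under self_attn
MLP = {"up_proj", "gate_proj", "down_proj"}      # live under mlp


def expected_lora_keys(num_layers: int, target_modules: Iterable[str]) -> List[str]:
    """Assumed PEFT key names for a Qwen2/Llama-style decoder."""
    modules = list(target_modules)
    keys = []
    for i in range(num_layers):
        for mod in modules:
            if mod in ATTN:
                block = "self_attn"
            elif mod in MLP:
                block = "mlp"
            else:
                raise ValueError(f"{mod!r} is not a Qwen2/Llama-style projection")
            for part in ("lora_A", "lora_B"):
                keys.append(f"base_model.model.model.layers.{i}.{block}.{mod}.{part}.weight")
    return keys


keys = expected_lora_keys(2, ["q_proj", "down_proj"])
print(len(keys))  # 8
print(keys[0])    # base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
```

With the default seven target modules, each decoder layer contributes 14 tensors (an A and a B matrix per module).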
## License
Apache-2.0.
## File details

Details for the file `code2lora-0.1.0.tar.gz`.

File metadata:

- Download URL: code2lora-0.1.0.tar.gz
- Size: 46.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `09146e0463469a61326e10ce50a076df4303abc1f561ca3c6315d495dc45ad68` |
| MD5 | `6c6a6f4d275e54fedc20e6dcfa88068c` |
| BLAKE2b-256 | `5eb21f43081b99c79d4a5d5885d8e5028fe7967771c1f0ebbab5184bfded3dc5` |
## File details

Details for the file `code2lora-0.1.0-py3-none-any.whl`.

File metadata:

- Download URL: code2lora-0.1.0-py3-none-any.whl
- Size: 50.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `fb6c5449e0a8436aabbb66c8e28bededca46725ddc43a8bbdb231a9a4ed19261` |
| MD5 | `a80feb27c9e2db0af6be3542e1eda061` |
| BLAKE2b-256 | `3ea195f9422fa818d8f05476016726038373649aa32cff580514d903ca0ec435` |