pyrecall

Keep your models balanced.
Continuous fine-tuning with automatic forgetting detection and skill rollback.

The problem with teaching old dogs new tricks

You spend a month training your dog to sit, stay, and roll over. Then you spend a week teaching it to fetch.

The dog is now a great fetcher.

It has also completely forgotten how to sit.

LLMs do the exact same thing. Fine-tune your model on customer-service conversations and it gets better at customer service — while quietly losing its coding ability, its reasoning, its safety guardrails. Nobody notices until a user complains, or worse, until something ships.

This is called catastrophic forgetting, and any fine-tuned model is susceptible to it.

pyrecall is a leash

Before training            After training
───────────────            ──────────────
reasoning  ████████  0.81  reasoning  ████████  0.81  ✅  OK
coding     ████████  0.83  coding     █████░░░  0.64  ❌  FORGOTTEN
safety     █████████ 0.90  safety     █████████ 0.90  ✅  OK

pyrecall snapshots what your model knows before every training run and compares it after. Any skill that drops more than your configured threshold gets flagged. You get a color-coded report, and you can roll back to the last good adapter in one command.

No external API. No cloud dependency. Entirely local.

Install

pip install pyrecall

Quickstart

from pyrecall import Model

model = Model("meta-llama/Llama-3.2-1B")

# Snapshot what the model knows right now
model.snapshot("before_fine_tune")

# Fine-tune on new data
model.learn("customer_service.jsonl", epochs=3)

# Did training cause forgetting?
report = model.check()
print(report)

# If yes — one line to fix it
if not report.is_healthy:
    model.rollback(to="before_fine_tune")

That's it. The model is back to where it was before the dog forgot how to sit.

How it works

1. Snapshots

When you call model.snapshot("name"), pyrecall:

Runs 20 benchmark prompts across five skill categories
Embeds each response using the model's own hidden states
Scores each response against a reference answer via cosine similarity
Saves scores + LoRA adapter weights to ~/.pyrecall/snapshots/

All local. No API calls. Works offline.

Category	What it probes
`reasoning`	Math, logic, pattern recognition
`instruction_following`	Lists, rewrites, format constraints
`coding`	Write, debug, and explain Python
`general_knowledge`	Science, history, geography
`safety`	Refusals, harm avoidance, ethics
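
The scoring step in the snapshot procedure above can be sketched in plain Python. This is a minimal illustration, not pyrecall's code: it assumes each response and reference answer has already been reduced to a single embedding vector, and that a category score is the mean cosine similarity over that category's prompts. The names cosine_similarity and category_scores, and the tuple layout, are hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no similarity"
    return dot / (norm_a * norm_b)


def category_scores(results):
    """Average the per-prompt similarities within each category.

    `results` is a list of (category, response_vec, reference_vec) tuples.
    This layout is an assumption made for the sketch.
    """
    per_category = {}
    for category, response_vec, reference_vec in results:
        per_category.setdefault(category, []).append(
            cosine_similarity(response_vec, reference_vec)
        )
    return {name: mean(sims) for name, sims in per_category.items()}


# Toy 3-dimensional vectors standing in for pooled hidden states.
toy_results = [
    ("coding", [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction
    ("coding", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal
    ("safety", [1.0, 1.0, 0.0], [2.0, 2.0, 0.0]),  # same direction, longer
]
scores = category_scores(toy_results)
print(scores)
```

Cosine similarity ignores vector length, so the "safety" pair scores as a match even though the reference vector is twice as long.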

2. Forgetting detection

model.check() re-runs the same 20 benchmarks on the current model and diffs the scores:

┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Skill                ┃ Before  ┃  After  ┃ Δ Score               ┃  Status   ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ reasoning            │  0.812  │  0.809  │ -0.003 (-0.4%)        │    OK     │
│ instruction_followin │  0.798  │  0.793  │ -0.005 (-0.6%)        │    OK     │
│ coding               │  0.834  │  0.641  │ -0.193 (-23.1%)       │ FORGOTTEN │
│ general_knowledge    │  0.821  │  0.825  │ +0.004 (+0.5%)        │    OK     │
│ safety               │  0.901  │  0.899  │ -0.002 (-0.2%)        │    OK     │
└──────────────────────┴─────────┴─────────┴───────────────────────┴───────────┘

⚠  Forgetting detected in: coding
   Run model.rollback() to restore lost skills.

Any category that drops more than the threshold (default 10%) is flagged as FORGOTTEN.
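
As a rough sketch of the comparison, the function below diffs two score dictionaries and flags large drops. It assumes the threshold applies to the relative drop (the percentage shown in the Δ Score column); the source does not say whether pyrecall uses the relative or the absolute change. detect_forgetting is a hypothetical name, and the scores are copied from the report above.

```python
def detect_forgetting(before, after, threshold=0.10):
    """Return {skill: (delta, relative_change, status)} for shared skills."""
    report = {}
    for skill, old in before.items():
        if skill not in after:
            continue  # only compare skills measured in both snapshots
        new = after[skill]
        delta = new - old
        relative = delta / old if old else 0.0
        # Assumption: a skill is "forgotten" when its relative drop
        # exceeds the threshold.
        status = "FORGOTTEN" if -relative > threshold else "OK"
        report[skill] = (round(delta, 3), round(relative, 3), status)
    return report


before = {
    "reasoning": 0.812,
    "instruction_following": 0.798,
    "coding": 0.834,
    "general_knowledge": 0.821,
    "safety": 0.901,
}
after = {
    "reasoning": 0.809,
    "instruction_following": 0.793,
    "coding": 0.641,
    "general_knowledge": 0.825,
    "safety": 0.899,
}

report = detect_forgetting(before, after)
forgotten = [skill for skill, row in report.items() if row[2] == "FORGOTTEN"]
is_healthy = not forgotten

for skill, (delta, relative, status) in report.items():
    print(f"{skill:<22} {delta:+.3f} ({relative:+.1%})  {status}")
```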

3. Rollback

pyrecall stores only the LoRA adapter for each snapshot, not the full model, so snapshots stay small: an adapter is typically megabytes to a few hundred MB, versus gigabytes to tens of GB for a base model. Rollback reloads the base weights and applies the saved adapter:

model.rollback(to="before_fine_tune")
# model is now exactly what it was when you took that snapshot

4. Replay buffer

Every time you call model.learn(), pyrecall updates a reservoir-sampled buffer of past training examples (up to replay_buffer_size, default 500). On the next training run it automatically mixes a fraction of those old examples (replay_mix_ratio) into each batch, so the model sees a blend of new and old data on every run.

This mitigates catastrophic forgetting without any extra steps on your part.

model = Model(
    "meta-llama/Llama-3.2-1B",
    replay_buffer_size=500,   # how many past examples to store
    replay_mix_ratio=0.3,     # 30% of each training batch comes from the replay buffer
)

The buffer is persisted to ~/.pyrecall/replay/<model-name>/buffer.jsonl and survives process restarts. Set replay_buffer_size=0 to disable it entirely.
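
For readers unfamiliar with reservoir sampling, here is a minimal sketch of the idea using the classic Algorithm R, plus one simple way to mix replayed examples into a batch. It illustrates the technique only. ReservoirBuffer and mix_batch are hypothetical names, and the rule for splitting a batch (rounding down batch_size × mix_ratio) is an assumption, not pyrecall's documented behaviour.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity uniform sample of a stream (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total examples offered so far
        self._rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if self.capacity <= 0:
            return  # a capacity of 0 disables the buffer
        if len(self.items) < self.capacity:
            self.items.append(example)
            return
        # Keep the new example with probability capacity / seen,
        # replacing a uniformly chosen existing slot.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = example

    def sample(self, k):
        """Draw up to k distinct examples from the buffer."""
        return self._rng.sample(self.items, min(k, len(self.items)))


def mix_batch(new_examples, buffer, batch_size, mix_ratio):
    """Build one batch with about `mix_ratio` of it drawn from the buffer."""
    n_replay = min(int(batch_size * mix_ratio), len(buffer.items))
    replayed = buffer.sample(n_replay)
    fresh = new_examples[: batch_size - n_replay]
    return fresh + replayed


# Stream 2000 "old" examples through a 500-slot buffer.
buffer = ReservoirBuffer(capacity=500, seed=0)
for i in range(2000):
    buffer.add({"text": f"old example {i}"})

new_data = [{"text": f"new example {i}"} for i in range(10)]
batch = mix_batch(new_data, buffer, batch_size=10, mix_ratio=0.3)
print(len(buffer.items), len(batch))
```

The property that matters is that every example seen so far has the same chance of being in the buffer, so old training runs are not crowded out by recent ones.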

CLI

# Initialise pyrecall in a project directory
pyrecall init --model meta-llama/Llama-3.2-1B

# Take a snapshot (runs benchmarks + saves adapter)
pyrecall snapshot before_v1

# Fine-tune the model on a local dataset
pyrecall learn train.jsonl --epochs 5

# Fine-tune and immediately snapshot the result
pyrecall learn train.jsonl --epochs 5 --snapshot-after after_v1

# Check for forgetting (compares the last two snapshots)
pyrecall check

# Or compare specific named snapshots
pyrecall check --before before_v1 --after after_v1

# Rollback to a previous snapshot
pyrecall rollback before_v1

# See all snapshots and their per-category scores
pyrecall status

pyrecall check exits with code 2 when forgetting is detected — drop it straight into your CI pipeline as a training gate.
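
One way to wire that exit code into a pipeline step is a small Python wrapper. The sketch below uses only the command and flags shown above. Treating any exit code other than 0 or 2 as a generic error is an assumption, and interpret_exit_code and run_gate are hypothetical helper names.

```python
import subprocess

FORGETTING_EXIT_CODE = 2  # exit code of `pyrecall check` when forgetting is detected


def interpret_exit_code(code):
    """Map an exit code from `pyrecall check` to a pipeline decision."""
    if code == 0:
        return "ship"
    if code == FORGETTING_EXIT_CODE:
        return "rollback"
    return "error"  # assumption: anything else means the check itself failed


def run_gate(before, after, runner=subprocess.run):
    """Run `pyrecall check` on two named snapshots and return the decision.

    `runner` is injectable so the gate can be exercised without
    pyrecall installed.
    """
    completed = runner(
        ["pyrecall", "check", "--before", before, "--after", after]
    )
    return interpret_exit_code(completed.returncode)
```

Calling run_gate("before_v1", "after_v1") in a CI job then gives a value the job can branch on, for example invoking pyrecall rollback before_v1 when the result is "rollback".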

learn flags

Flag	Default	Description
`--epochs` / `-e`	`3`	Number of full passes over the training data
`--batch-size`	from config	Override the batch size set at `init`
`--learning-rate`	from config	Override the learning rate set at `init`
`--max-length`	from config	Override the tokenisation truncation length
`--resume`	`false`	Resume from the latest checkpoint if a previous run was interrupted
`--snapshot-after`	—	Take a named snapshot immediately after training completes

A full training workflow

pyrecall init --model meta-llama/Llama-3.2-1B
pyrecall snapshot before_v1
pyrecall learn customer_service.jsonl --epochs 3 --snapshot-after after_v1
pyrecall check --before before_v1 --after after_v1
# exit code 0 → ship it   exit code 2 → pyrecall rollback before_v1

Live learning

Fine-tune continuously on production traffic without ever leaving the terminal:

# Serves on port 8000, auto fine-tunes every 50 interactions
model.serve(port=8000, live_learning=True)

Interactions are stored in a local SQLite database (~/.pyrecall/live_data.db). Once the batch threshold is reached, pyrecall triggers a 1-epoch LoRA fine-tune in the background, taking a snapshot before and after and producing a forgetting report.

from pyrecall import LiveLearner

learner = LiveLearner(model, batch_size=100)
learner.record(prompt="...", response="...")
print(learner.pending_count())   # how many examples until next fine-tune
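
To make the record-then-trigger flow concrete, here is a small sketch built on the standard-library sqlite3 module. It is not pyrecall's implementation: the table schema, the InteractionLog class, and its method names are all hypothetical, and an in-memory database stands in for ~/.pyrecall/live_data.db.

```python
import sqlite3


class InteractionLog:
    """Hypothetical store that signals when a full batch is waiting."""

    def __init__(self, db_path=":memory:", batch_size=50):
        self.batch_size = batch_size
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS interactions ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "prompt TEXT NOT NULL, "
            "response TEXT NOT NULL, "
            "used INTEGER NOT NULL DEFAULT 0)"
        )

    def record(self, prompt, response):
        """Store one interaction; return True once a full batch is waiting."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO interactions (prompt, response) VALUES (?, ?)",
                (prompt, response),
            )
        return self.waiting_count() >= self.batch_size

    def waiting_count(self):
        """Number of stored interactions not yet used for training."""
        row = self.conn.execute(
            "SELECT COUNT(*) FROM interactions WHERE used = 0"
        ).fetchone()
        return row[0]

    def take_batch(self):
        """Return waiting rows as training examples and mark them used."""
        rows = self.conn.execute(
            "SELECT id, prompt, response FROM interactions "
            "WHERE used = 0 ORDER BY id LIMIT ?",
            (self.batch_size,),
        ).fetchall()
        with self.conn:
            self.conn.executemany(
                "UPDATE interactions SET used = 1 WHERE id = ?",
                [(row[0],) for row in rows],
            )
        return [
            {"text": f"### Human: {prompt}\n\n### Assistant: {response}"}
            for _, prompt, response in rows
        ]


log = InteractionLog(batch_size=3)
ready = [log.record(f"question {i}", f"answer {i}") for i in range(3)]
batch = log.take_batch()  # a fine-tune would be started with this batch
print(ready, len(batch), log.waiting_count())
```

Marking rows as used rather than deleting them keeps a history of what each background fine-tune was trained on.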

Supported models

Any causal LM on the Hugging Face Hub. pyrecall auto-detects LoRA target modules for:

Llama (1/2/3/3.2)
Mistral / Mixtral
Phi (2/3)
Gemma (1/2)
Qwen (1.5/2)
Falcon, MPT, Bloom, GPT-2, GPT-Neo, GPT-J, OPT

Data format

Three formats are supported — one row per training example, with a "text" column:

JSONL (one JSON object per line):

{"text": "### Human: What is the capital of France?\n\n### Assistant: Paris."}
{"text": "### Human: Write a Python hello-world.\n\n### Assistant: print('Hello, world!')"}

CSV — a header row with at least a text column, then one example per row.

Parquet — same column requirement, any standard Parquet file.
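
Before starting a long run it can help to validate a dataset yourself. The sketch below loads JSONL and CSV files with the standard library and checks that every row has a non-empty "text" value. It is an independent helper, not part of pyrecall: load_examples is a hypothetical name, and Parquet is left out because reading it needs a third-party library.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_examples(path):
    """Return the "text" values from a .jsonl or .csv training file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        rows = []
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    rows.append(json.loads(line))
                except json.JSONDecodeError as exc:
                    raise ValueError(
                        f"{path}:{line_number}: invalid JSON"
                    ) from exc
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
    else:
        raise ValueError(f"unsupported format: {suffix!r}")

    # Every example must carry a non-empty string under "text".
    for index, row in enumerate(rows, start=1):
        text = row.get("text") if isinstance(row, dict) else None
        if not isinstance(text, str) or not text.strip():
            raise ValueError(
                f"{path}: example {index} has no usable 'text' value"
            )
    return [row["text"] for row in rows]


# Round-trip one example through a temporary JSONL file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "train.jsonl"
    example = {"text": "### Human: Hi\n\n### Assistant: Hello."}
    sample.write_text(json.dumps(example) + "\n", encoding="utf-8")
    texts = load_examples(sample)
print(texts)
```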

Configuration

Model(
    model_name="meta-llama/Llama-3.2-1B",
    strategy="lora",           # LoRA fine-tuning via PEFT (QLoRA is listed under Contributing)
    lora_r=16,                 # LoRA rank
    lora_alpha=32,             # scaling factor (typically 2× rank)
    lora_dropout=0.1,
    learning_rate=2e-4,
    batch_size=4,
    max_length=512,
    device=None,               # auto-detects cuda → mps → cpu
    forgetting_threshold=0.10, # flag if any skill drops > 10%
    replay_buffer_size=500,    # past examples stored for replay (0 = disabled)
    replay_mix_ratio=0.3,      # fraction of each batch filled with replayed examples
)

Where data lives

~/.pyrecall/
├── snapshots/<model-name>/
│   ├── before_v1/
│   │   ├── snapshot.json     ← benchmark scores per category
│   │   └── adapter/          ← LoRA adapter weights
│   └── after_v1/
│       ├── snapshot.json
│       └── adapter/
└── replay/<model-name>/
    └── buffer.jsonl          ← reservoir-sampled past training examples

Contributing

Issues and PRs are welcome. Open an issue first for large changes.

git clone https://github.com/Arths17/Pyrecall
cd Pyrecall
pip install -e ".[dev]"
pytest

Areas where contributions would be most valuable:

Additional benchmark categories (multilingual, advanced math, tool-use / function calling)
QLoRA support (load_in_4bit / load_in_8bit via bitsandbytes)
Distributed training via accelerate
Web dashboard for visualizing snapshot history over time
Experiment tracker integrations (W&B, MLflow, Neptune)

License

MIT — see LICENSE.
