Engineering notebook for AI-assisted development

buildlog

A measurable learning loop for AI-assisted work

Track what works. Prove it. Drop what doesn't.

RE: The art. Yes, it's AI-generated. Yes, that's hypocritical for a project about rigor over vibes. Looking for an actual artist to pay for a real logo. If you know someone good, open an issue or DM me. Budget exists.

Read the full documentation | Landing page

The Problem

Every AI-assisted work session produces decisions, corrections, and outcomes. Almost all of it gets discarded. The next session starts from scratch with the same blind spots.

buildlog captures structured trajectories from real work, extracts decision patterns, and uses Thompson Sampling to select which patterns to surface. Then it measures whether that selection actually reduced mistakes.

How It Works

1. Capture structured work trajectories

Each session is a dated entry documenting what you did, what went wrong, and what you learned -- a structured record of decisions and outcomes, not a chat transcript.

buildlog init               # scaffold a project
buildlog new my-feature      # start a session
# ... work ...
buildlog commit -m "feat: add auth"

2. Extract decision patterns as seeds

The seed engine watches your development patterns and extracts seeds: atomic observations about what works. A seed might be "always define interfaces before implementations" or "mock at the boundary, not the implementation." Each seed carries a category, a confidence score, and source provenance.

Extraction runs through a pipeline: sources -> extractors -> categorizers -> generators. Extractors range from regex-based (fast, cheap, brittle) to LLM-backed (accurate, expensive). The pipeline deduplicates semantically using embeddings.
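The pipeline stages above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not buildlog's actual seed engine: the `Seed` fields mirror the description (category, confidence, provenance), while the `Lesson:` line convention, the keyword categorizer, the file names, and the bag-of-words `embed` function (standing in for a real embedding model) are all hypothetical.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    text: str          # the atomic observation
    category: str      # coarse topic label
    confidence: float  # 0.0 to 1.0
    source: str        # provenance: the entry the seed came from


# Hypothetical regex extractor: picks up lines that start with "Lesson:".
LESSON_RE = re.compile(r"^Lesson:\s*(.+)$", re.MULTILINE)


def extract(entry_text: str) -> list[str]:
    return [match.strip() for match in LESSON_RE.findall(entry_text)]


def categorize(text: str) -> str:
    lowered = text.lower()
    return "testing" if ("mock" in lowered or "test" in lowered) else "general"


def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    counts: dict[str, float] = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def run_pipeline(entries: dict[str, str], threshold: float = 0.8) -> list[Seed]:
    """sources -> extractors -> categorizers, dropping near-duplicate candidates."""
    seeds: list[Seed] = []
    seen_vectors: list[dict[str, float]] = []
    for source, text in entries.items():
        for candidate in extract(text):
            vector = embed(candidate)
            if any(cosine(vector, seen) >= threshold for seen in seen_vectors):
                continue  # semantically a duplicate of an earlier seed
            seen_vectors.append(vector)
            seeds.append(Seed(candidate, categorize(candidate), 0.5, source))
    return seeds


entries = {
    "2026-01-16-auth.md": (
        "Lesson: Mock at the boundary, not the implementation.\n"
        "Lesson: Define interfaces before implementations."
    ),
    "2026-01-17-api.md": "Lesson: mock at the boundary not the implementation",
}
seeds = run_pipeline(entries)
for seed in seeds:
    print(seed.category, "|", seed.text, "|", seed.source)
```

The third candidate differs from the first only in case and punctuation, so the similarity check drops it and two seeds survive.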

3. Review with the gauntlet

The gauntlet is an automated quality gate with curated reviewer personas. It runs on your code and files findings categorized by severity. When a reviewer cites a rule in their review, that rule gets credited -- this is the sole feedback signal that drives learning.

4. Select which patterns to surface using Thompson Sampling

Seeds compete for inclusion in your agent's instruction set. The system treats each seed as an arm in a multi-armed bandit and uses Thompson Sampling (via qortex-learning) to balance exploration (trying under-tested rules) against exploitation (surfacing rules with strong track records).

Each seed maintains a Beta posterior updated by gauntlet review outcomes. Over time, the system converges on the rules that actually reduce mistakes in your specific codebase and workflow, not rules that sound good in the abstract.
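As a rough illustration of the selection step, here is a minimal Beta-Bernoulli Thompson Sampling sketch using only the standard library. It is not the qortex-learning API: the `Arm` class, the `select` and `update` functions, and the rule IDs are hypothetical names chosen for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    rule_id: str
    alpha: float = 1.0  # prior pseudo-count of successes
    beta: float = 1.0   # prior pseudo-count of failures

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def select(arms: list[Arm], k: int, rng: random.Random) -> list[str]:
    """Draw one sample from each arm's Beta posterior and keep the top k."""
    draws = {arm.rule_id: rng.betavariate(arm.alpha, arm.beta) for arm in arms}
    return sorted(draws, key=draws.get, reverse=True)[:k]


def update(arm: Arm, reward: float) -> None:
    """Bernoulli-style update: reward 1.0 raises alpha, reward 0.0 raises beta."""
    arm.alpha += reward
    arm.beta += 1.0 - reward


rng = random.Random(0)
arms = [
    Arm("interfaces-first", alpha=9.0, beta=1.0),  # strong track record
    Arm("mock-at-boundary", alpha=2.0, beta=2.0),  # mixed evidence
    Arm("new-untested-rule"),                      # uniform prior, no evidence
]

# Simulate many single-slot selections to see how often each rule surfaces.
picks = [select(arms, k=1, rng=rng)[0] for _ in range(1000)]
share = {arm.rule_id: picks.count(arm.rule_id) / len(picks) for arm in arms}
print(share)
```

Because each selection samples from the posterior rather than ranking by the mean, the untested rule still wins a share of slots (exploration), while the rule with the strongest record wins most of them (exploitation).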

5. Render to every agent format

Selected rules are written into the instruction files your agents actually read:

CLAUDE.md (Claude Code)
.cursorrules (Cursor)
.github/copilot-instructions.md (GitHub Copilot)
Windsurf, Continue.dev, generic settings.json

buildlog skills   # render current policy to agent files
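To make the rendering step concrete, the sketch below writes one rule block into several instruction files and replaces that block on later runs. The three paths come from the list above; everything else (the marker strings, `render_block`, `write_rules`) is a hypothetical illustration and not how `buildlog skills` is implemented.

```python
import tempfile
from pathlib import Path

# Paths taken from the list above; the marker strings are hypothetical.
TARGETS = ["CLAUDE.md", ".cursorrules", ".github/copilot-instructions.md"]
BEGIN, END = "<!-- buildlog:begin -->", "<!-- buildlog:end -->"


def render_block(rules: list[str]) -> str:
    lines = [BEGIN, *(f"- {rule}" for rule in rules), END]
    return "\n".join(lines) + "\n"


def write_rules(root: Path, rules: list[str]) -> list[Path]:
    """Insert or replace the managed block, leaving hand-written text intact."""
    block = render_block(rules)
    written = []
    for relative in TARGETS:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        existing = path.read_text(encoding="utf-8") if path.exists() else ""
        if BEGIN in existing and END in existing:
            head, _, rest = existing.partition(BEGIN)
            _, _, tail = rest.partition(END)
            updated = head + block.rstrip("\n") + tail
        else:
            updated = existing + block
        path.write_text(updated, encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("# Project notes\n", encoding="utf-8")
    write_rules(root, ["old rule"])
    write_rules(root, ["Define interfaces before implementations"])
    claude_md = (root / "CLAUDE.md").read_text(encoding="utf-8")
    copilot_exists = (root / ".github" / "copilot-instructions.md").exists()
    print(claude_md)
```

Delimiting the managed block means a second render swaps the rule set in place instead of appending a duplicate.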

6. Close the loop

The gauntlet closes the loop. Every gauntlet run credits the rules its reviewers cite; calling log_reward(outcome="accepted") after PR approval then updates the Thompson Sampling posteriors for those rules. No extra ceremony required.

For teams that want longitudinal tracking across many sessions, buildlog also ships optional experiment/session commands that measure Repeated Mistake Rate (RMR) over time:

# Optional: for longitudinal RMR tracking
buildlog experiment start
# ... work across sessions ...
buildlog experiment end
buildlog experiment report
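The text does not spell out how RMR is computed, so the following is only one plausible reading, stated as an assumption: for each session, the fraction of logged mistakes whose label already appeared in an earlier session. The function name and the mistake labels are hypothetical.

```python
def repeated_mistake_rate(sessions: list[list[str]]) -> list[float]:
    """Per-session share of mistakes already seen in earlier sessions.

    Assumed definition for illustration only; buildlog's own metric may differ.
    """
    seen: set[str] = set()
    rates: list[float] = []
    for mistakes in sessions:
        if mistakes:
            repeats = sum(1 for label in mistakes if label in seen)
            rates.append(repeats / len(mistakes))
        else:
            rates.append(0.0)  # no mistakes logged, nothing repeated
        seen.update(mistakes)
    return rates


# Hypothetical mistake labels logged across three sessions.
sessions = [
    ["missing-null-check", "mocked-internals"],
    ["missing-null-check", "mocked-internals", "unpinned-dependency", "stale-fixture"],
    ["missing-null-check", "slow-query", "wrong-timezone", "leaky-handle"],
]
rates = repeated_mistake_rate(sessions)
print(rates)  # [0.0, 0.5, 0.25]
```

Under this reading, a falling rate across sessions would suggest the surfaced rules are preventing repeats; the first session is always 0.0 because nothing has been seen yet.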

The Learning Loop

The feedback loop is fully closed and mechanically proven:

Gauntlet Review
    |
    v
gauntlet_process_issues()
    |-- credits rules cited by reviewers
    |-- persists credited rule IDs to SQLite (gauntlet_credits table)
    v
log_reward(outcome="accepted")
    |-- reads latest gauntlet_credits from SQLite
    |-- calls bandit.batch_update(rules, reward)
    v
qortex Learner (Thompson Sampling)
    |-- Beta(alpha, beta) posteriors shift
    |-- next select() favors rules with higher posteriors

The gauntlet is the sole feedback source. Rules get credited when cited in reviews, not from session selection. This eliminates the credit assignment problem: only rules that demonstrably contributed to review quality get reinforced.

Each gauntlet citation followed by an accepted reward increments alpha in the rule's Beta posterior. A rule starts at Beta(1,1), the uniform prior, with mean 0.5 (no evidence). Its posterior mean, alpha / (alpha + beta), moves toward 1.0 as positive evidence accumulates, making the rule increasingly likely to be selected for future sessions.
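The credit-then-reward flow can be traced end to end with an in-memory SQLite database. This is a sketch under assumptions rather than buildlog's code: the function names and the gauntlet_credits table name come from the diagram above, but the signatures, the column layout, the posteriors table, and the handling of non-accepted outcomes (incrementing beta) are guesses made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gauntlet_credits (run_id INTEGER, rule_id TEXT);
    CREATE TABLE posteriors (rule_id TEXT PRIMARY KEY, alpha REAL, beta REAL);
    """
)


def gauntlet_process_issues(run_id: int, cited_rule_ids: list[str]) -> None:
    """Persist the rules that reviewers cited during one gauntlet run."""
    conn.executemany(
        "INSERT INTO gauntlet_credits VALUES (?, ?)",
        [(run_id, rule_id) for rule_id in sorted(set(cited_rule_ids))],
    )


def log_reward(outcome: str) -> None:
    """Apply one reward to every rule credited by the latest gauntlet run."""
    reward = 1.0 if outcome == "accepted" else 0.0  # assumed mapping
    rows = conn.execute(
        "SELECT rule_id FROM gauntlet_credits "
        "WHERE run_id = (SELECT MAX(run_id) FROM gauntlet_credits)"
    ).fetchall()
    for (rule_id,) in rows:
        # Every rule starts from the uniform prior Beta(1, 1).
        conn.execute("INSERT OR IGNORE INTO posteriors VALUES (?, 1.0, 1.0)", (rule_id,))
        conn.execute(
            "UPDATE posteriors SET alpha = alpha + ?, beta = beta + ? WHERE rule_id = ?",
            (reward, 1.0 - reward, rule_id),
        )


def posterior_mean(rule_id: str) -> float:
    row = conn.execute(
        "SELECT alpha, beta FROM posteriors WHERE rule_id = ?", (rule_id,)
    ).fetchone()
    alpha, beta = row if row else (1.0, 1.0)
    return alpha / (alpha + beta)


for run_id in range(1, 4):
    gauntlet_process_issues(run_id, ["interfaces-first"])
    log_reward("accepted")

print(posterior_mean("interfaces-first"))  # 0.8, i.e. Beta(4, 1)
print(posterior_mean("never-cited"))       # 0.5, still the uniform prior
```

Three cited-and-accepted runs move the rule from Beta(1,1) to Beta(4,1), so its mean goes 0.5, 0.667, 0.75, 0.8; a rule that is never cited stays at the prior.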

What Else Is In the Box

LLM-backed extraction: when regex isn't enough, the seed engine can use OpenAI, Anthropic, or Ollama to extract patterns from code and logs. Metered backend tracks token usage and cost.
Global SQLite storage: all buildlog data is stored in a single global database at ~/.buildlog/buildlog.db (SQLite with WAL mode, schema v7). Project isolation via hashed project IDs derived from git remote URLs. Legacy per-project JSON/JSONL files are still supported as a fallback.
Migration and export: buildlog migrate converts legacy JSON/JSONL files to the global database (idempotent, non-destructive). buildlog export dumps data back to JSONL for portability or backup.
Ambient emission protocol: mistakes and learned rules are automatically emitted as JSON artifacts to ~/.buildlog/emissions/pending/ for offline ingestion by downstream systems (knowledge graphs, analytics). Fire-and-forget -- emission failure never breaks the primary operation.
Workflow enforcement: buildlog verify checks your setup (CLAUDE.md workflow section, MCP registration, branch protection hooks) and --fix repairs it. buildlog init installs pre-commit hooks that prevent direct commits to main.
Interactive dashboard: buildlog viz launches a marimo notebook in your browser with live visualizations of reward trends, bandit posteriors, session history, mistake analysis, and insight breakdowns.
Posterior history: Every gauntlet credit and reward event snapshots the bandit's alpha/beta/mean for credited rules. Query evolution over time with buildlog_posterior_history() to verify convergence or detect stale rules.
MCP server: buildlog exposes 36 tools as an MCP server so agents can query seeds, skills, and build history programmatically during sessions.
npm wrapper: npx @peleke.s/buildlog for JS/TS projects. Thin shim that finds and invokes the Python CLI.
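The storage item above mentions hashed project IDs derived from git remote URLs and a WAL-mode SQLite database. A minimal sketch of that idea follows; the normalization rules, the SHA-256 choice, the 16-character truncation, and the mistakes table are assumptions for illustration, not buildlog's actual schema or hashing scheme.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def normalize_remote(url: str) -> str:
    """Map SSH and HTTPS forms of the same remote to one canonical string."""
    url = url.strip().lower()  # assumes the host treats paths case-insensitively
    if url.startswith("git@"):
        host, _, path = url[len("git@"):].partition(":")
        url = f"{host}/{path}"
    else:
        url = url.split("://", 1)[-1]
    return url.rstrip("/").removesuffix(".git")


def project_id(remote_url: str) -> str:
    digest = hashlib.sha256(normalize_remote(remote_url).encode("utf-8")).hexdigest()
    return digest[:16]


def open_db(path: Path) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mistakes (project_id TEXT, description TEXT)"
    )
    return conn


https_id = project_id("https://github.com/Peleke/buildlog-template")
ssh_id = project_id("git@github.com:Peleke/buildlog-template.git")
other_id = project_id("https://example.com/someone/other-repo")  # hypothetical remote

with tempfile.TemporaryDirectory() as tmp:
    conn = open_db(Path(tmp) / "buildlog.db")
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.execute("INSERT INTO mistakes VALUES (?, ?)", (https_id, "forgot to pin a dependency"))
    conn.execute("INSERT INTO mistakes VALUES (?, ?)", (other_id, "mocked internals"))
    conn.commit()
    # One shared database, but each query is scoped to a single project.
    rows = conn.execute(
        "SELECT description FROM mistakes WHERE project_id = ?", (ssh_id,)
    ).fetchall()
    conn.close()

print(https_id == ssh_id, mode, rows)
```

Hashing a normalized remote gives the same ID whether a clone uses HTTPS or SSH, which is what lets one global database keep projects apart without storing their paths.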

Current Limits

This is v0.21, not the end state.

Extraction quality is uneven. Regex extractors miss nuance; LLM extractors are accurate but expensive. The middle ground is still being found.
Single-agent only. Multi-agent coordination (shared learning across agents) is designed but not implemented.
Long-horizon learning is not modeled. The bandit updates once per gauntlet citation, and sessions are only optional grouping containers. Longer arcs of competence building need richer policy models.

What's next

Two layers are planned on top of the global SQLite backend and qortex integration:

Cross-project convergence -- detect rules independently rediscovered across projects, track salience
Emergent rule graphs -- cluster embeddings into concept nodes, derive edges from co-occurrence and bandit correlation, contextual bandits with embedding-space context vectors (LinUCB)

Embedding persistence via sqlite-vec is already available through the qortex learning backend.

See the full roadmap for details.

Installation

Requires Python >= 3.11

Always-On Mode (recommended)

We run buildlog as an ambient data capture layer across all projects. One command, works everywhere:

pipx install buildlog         # or: uv tool install buildlog
buildlog init-mcp --global -y # registers MCP + writes instructions to ~/.claude/CLAUDE.md

That's it. Claude Code now has all 36 buildlog tools and knows how to use them in every project you open. No per-project setup needed.

The --global flag:

Registers the MCP server in ~/.claude.json (Claude Code's global config)
Creates ~/.claude/CLAUDE.md with usage instructions so Claude proactively uses buildlog
Works immediately in any repo, even without a local buildlog/ directory

The -y flag skips confirmation prompts (useful for scripts and CI).

This is how we use buildlog ourselves: always on, capturing structured trajectories from every session, feeding downstream systems that generate engineering logs, courses, and content.

Per-project setup

If you prefer explicit per-project control:

pip install buildlog          # MCP server included by default
buildlog init --defaults      # scaffold buildlog/, register MCP, update CLAUDE.md

This creates a buildlog/ directory with templates and configures Claude Code for that specific project.

For JS/TS projects

npx @peleke.s/buildlog init

Dependencies

Core dependencies installed automatically:

Package	Purpose
`qortex-learning`	Thompson Sampling backend (default learning engine)
`mcp`	MCP server for Claude Code integration
`sqlite-vec`	Vector similarity for semantic deduplication
`numpy`	Numerical operations for bandit computations

Optional extras:

pip install "buildlog[viz]"         # marimo dashboard + plotly
pip install "buildlog[embeddings]"  # local sentence-transformers
pip install "buildlog[llm]"         # Ollama + Anthropic extractors
pip install "buildlog[openai]"      # OpenAI embeddings
pip install "buildlog[qortex-full]" # full qortex KG + REST + MCP
pip install "buildlog[all]"         # everything

Verify installation

buildlog mcp-test          # verify all 36 tools are registered
buildlog overview          # check project state (works without init in global mode)

Quick Start

buildlog init --defaults      # scaffold + MCP + CLAUDE.md
buildlog new my-feature       # start a session
# ... work ...
buildlog commit -m "feat: add auth"
buildlog gauntlet-loop --target src/  # review with curated personas
buildlog log-reward --outcome accepted  # close the feedback loop

Sessions and experiments are optional. If you want longitudinal RMR tracking:

# Optional: for tracking RMR across many sessions
buildlog experiment start
# ... work across sessions ...
buildlog experiment end
buildlog experiment report

Want the full picture? The Learning Loop E2E Trace walks through all 13 steps with explicit code citations: installation, Thompson Sampling, gauntlet review, bandit updates, emission pipeline, cross-domain discovery via qortex, and rule re-export. Every claim above has a mechanical proof.

Configuration

Learning backend

buildlog defaults to qortex-learning for Thompson Sampling. To force the builtin bandit fallback:

export BUILDLOG_LEARNING_BACKEND=builtin

If qortex-learning is not installed, buildlog falls back to the builtin bandit automatically with a warning.
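The resolution order described here (explicit override first, then the default backend if present, then a warned fallback) can be sketched as below. Only the BUILDLOG_LEARNING_BACKEND variable and its builtin value come from the text; the `resolve_backend` function, the `qortex_learning` import name, and the returned labels are assumptions for illustration.

```python
import importlib.util
import os
import warnings
from typing import Callable, Mapping, Optional


def resolve_backend(
    env: Optional[Mapping[str, str]] = None,
    is_installed: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return "builtin" or "qortex" following the documented precedence."""
    env = os.environ if env is None else env
    if is_installed is None:
        is_installed = lambda name: importlib.util.find_spec(name) is not None

    # 1. An explicit override always wins.
    if env.get("BUILDLOG_LEARNING_BACKEND", "").strip().lower() == "builtin":
        return "builtin"
    # 2. Otherwise prefer the default backend when it can be imported.
    if is_installed("qortex_learning"):  # assumed import name for qortex-learning
        return "qortex"
    # 3. Fall back, but say so.
    warnings.warn("qortex-learning is not installed; falling back to the builtin bandit")
    return "builtin"


forced = resolve_backend({"BUILDLOG_LEARNING_BACKEND": "builtin"}, lambda name: True)
default = resolve_backend({}, lambda name: True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = resolve_backend({}, lambda name: False)

print(forced, default, fallback, len(caught))
```

Passing the environment and the install check as parameters keeps the decision testable without touching the real environment or installing anything.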

Session ceremony

Sessions and experiments are optional. log_mistake() works without an active session, and the gauntlet can credit rules and update posteriors without any session ceremony.

Documentation

Section	Description
Installation	Setup, extras, and initialization
Quick Start	Full pipeline walkthrough
Learning Loop E2E	Complete 13-step trace with code citations -- the proof
Core Concepts	The problem, the claim, and the metric
Theory	From restaurant intuition to contextual bandits -- the full tutorial
CLI Reference	Every command documented
MCP Integration	Claude Code setup and available tools
Storage Architecture	Global SQLite backend, migration, and export
Experiments	Optional longitudinal RMR tracking across sessions
Dashboard	Interactive marimo dashboard (`buildlog viz`)
Review Gauntlet	Reviewer personas and the gauntlet loop
Multi-Agent Setup	Render rules to any AI coding agent
Roadmap	Embeddings, cross-project convergence, rule graphs
Philosophy	Principles and honest limitations

Contributing

git clone https://github.com/Peleke/buildlog-template
cd buildlog-template
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest

We're especially interested in better context representations, credit assignment approaches, statistical methodology improvements, and real-world experiment results (positive or negative).

License

MIT License. See LICENSE

Release history

Version	Released
0.23.0	Mar 14, 2026
0.22.0	Mar 14, 2026
0.21.1	Mar 12, 2026
0.21.0 (this version)	Mar 12, 2026
0.20.0	Mar 7, 2026
0.18.4	Feb 15, 2026
0.18.2	Feb 14, 2026
0.18.1	Feb 14, 2026
0.15.0	Feb 13, 2026
0.13.1	Feb 7, 2026
0.13.0	Feb 6, 2026
0.12.0	Feb 5, 2026
0.11.1	Feb 5, 2026
0.11.0	Feb 5, 2026
0.10.5	Feb 4, 2026
0.10.4	Feb 4, 2026
0.10.3	Feb 4, 2026
0.10.2	Feb 4, 2026
0.10.1	Feb 4, 2026
0.10.0	Feb 4, 2026
0.9.0	Feb 2, 2026
0.8.0	Jan 31, 2026
0.7.0	Jan 22, 2026
0.6.1	Jan 22, 2026
0.6.0	Jan 22, 2026
0.5.0	Jan 22, 2026
0.4.0	Jan 17, 2026
0.3.0	Jan 17, 2026
0.2.0	Jan 16, 2026
0.1.0	Jan 16, 2026

Download files

Source Distribution

buildlog-0.21.0.tar.gz (198.1 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

buildlog-0.21.0-py3-none-any.whl (235.1 kB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file buildlog-0.21.0.tar.gz.

File metadata

Download URL: buildlog-0.21.0.tar.gz
Upload date: Mar 12, 2026
Size: 198.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for buildlog-0.21.0.tar.gz
Algorithm	Hash digest
SHA256	`580cb4098eed6bb8dd871a127cccea2a31a90f3ae5115f87edf997023f9c26d5`
MD5	`6aed4f84fb5a6fd627aad0962881e664`
BLAKE2b-256	`8ce7534e8be9821af21088d1be79fa421e1110d12fc413eed53143359ed80fe6`

File details

Details for the file buildlog-0.21.0-py3-none-any.whl.

File metadata

Download URL: buildlog-0.21.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 235.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for buildlog-0.21.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`28ad3e74c93f4fd194d977fde8c0a799eaadf648d76cf23583da85772fabe1f3`
MD5	`d9cb5050c049183150bfebd5414e0e67`
BLAKE2b-256	`ba0b7fdf4fc8c11b1f8fa1a0218d99cfa334c3c2c0244e68108ba493ff5c289c`
