Skip to main content

Software Poet Conscience — language-agnostic code aesthetic evaluation

Project description

eigenhelm

Catch low-quality AI-generated code before it lands.

License: AGPL-3.0


The problem

AI agents write working code fast. But "working" isn't "good." Tests pass, the diff looks plausible, and it gets merged — but complexity concentrates in the wrong places, patterns repeat where they should be abstracted, and structure decays toward GitHub average.

LLM reviewers help, but they share the agent's blind spots. They reason from text, not structure. Run the same review twice, get different comments.

The argument against caring: it works, tests pass, ship it. But structural quality isn't aesthetics — it predicts what happens next. High cyclomatic density and concentrated complexity produce more post-merge defects. Agents generate code at machine speed; without measurement, you accumulate structural debt just as fast. And review doesn't scale to agent output volume — the human eye glazes over at 500 lines.

eigenhelm measures code structure using information theory — not an LLM. It parses the AST, extracts a structural fingerprint, and scores how closely the code resembles curated high-quality corpora. Deterministic, trainable on your code, zero API cost.


Before and after

An agent writes a module. eigenhelm evaluates it:

src/pipeline.py
  decision: reject
  score:    0.72
  directives:
    [high] reduce_complexity → process_batch (lines 15-89)
    [high] extract_repeated_logic → validate_row (lines 42-67)

The agent reads the directives, refactors, tests still pass. Re-evaluate:

src/pipeline.py
  decision: accept
  score:    0.35

0.72 → 0.35. Structurally sound. No human reviewed it.

In controlled benchmarks, agents using eigenhelm produced code rated 46% higher on design, robustness, and spec compliance — with zero correctness regressions.


Install

pip install eigenhelm

Or with uv (no venv required):

uv tool install eigenhelm

A bundled model is included — no setup needed.

eh evaluate path/to/file.py --classify

What the scores mean

  • accept (score < 0.4): Structurally sound. Move on.
  • marginal (score 0.4-0.6): Acceptable; review directives if improvement is straightforward.
  • reject (score > 0.6): Worth reviewing. Read the directives for guidance.

Scores are relative to elite open-source training corpora. Most production code scores marginal — that's normal, not a problem.


How is this different from CodeRabbit?

eigenhelm LLM reviewer
Input AST structure (69-dim vector) Source text
Deterministic Yes — same code, same score No
Trainable on your corpus Yes — eh train No
Hard CI gate Yes — exit codes, strict mode Suggestions only
Tracks quality over time Yes — comparable scores No stable metric
Catches logic bugs No Yes
Cost Zero (local) Per-token LLM cost

They're complementary. eigenhelm runs first — in the agent's inner loop. LLM review runs second, on the PR. Full comparison.


Agent integration

eh skill --install

The skill teaches AI agents the correct workflow: evaluate after tests pass, two passes maximum, never sacrifice correctness for score. In a controlled benchmark (3 scenarios, scored by a separate reviewer not involved in generation), agents using the skill produced code rated 46% higher on quality metrics. Full guide.


CLI Reference

All commands are available as eigenhelm <command> or eh <command>:

Command Description
eh evaluate Evaluate source files against the aesthetic manifold
eh train Train a new eigenspace model from a corpus directory
eh inspect Inspect a saved model's metadata
eh serve Run the evaluation HTTP server
eh harness Run a statistical comparison harness across two code sets
eh benchmark Run real-world use case benchmarks
eh skill Install the agent skill file
eh model Manage eigenhelm models (list, pull, info)
eh init Generate a starter .eigenhelm.toml configuration
eh corpus Manage training corpora (sync from manifest)
eh mcp Start the MCP stdio server

Run eh --help or eh <command> --help for details.


HTTP API

Endpoint Method Description
/health GET Liveness probe
/ready GET Readiness probe (model loaded)
/v1/evaluate POST Evaluate a code unit
/v1/evaluate/batch POST Evaluate multiple code units

Supported Languages

Trained models: Python, JavaScript, TypeScript, Go, Rust.

Parser support (feature extraction available, bring your own model): Java, C, C++, Ruby, Kotlin.


Development Setup

git clone https://github.com/metacogdev/eigenhelm.git
cd eigenhelm
uv sync --extra dev --extra serve
uv run pytest
uv run ruff check .

Architecture

eigenhelm/
├── virtue_extractor.py   — Tree-sitter + Lizard → FeatureVector (69 dimensions)
├── critic/               — AestheticCritic: 5-dim scoring (drift, alignment, entropy, compression, NCD)
├── declarations/         — Declaration-aware scoring (type defs, barrel files, data tables)
├── regions/              — Test/production code region detection
├── eigenspace/           — EigenspaceModel: PCA projection, drift scoring
├── attribution/          — Score attribution and directive generation
├── training/             — PCA training, calibration, exemplar selection
├── helm/                 — DynamicHelm: threshold-calibrated evaluation + PID steering
├── config/               — .eigenhelm.toml loader and models
├── output/               — SARIF 2.1.0 and JSON formatters
├── scoring/              — Per-repo scorecard (M1-M5, Q1-Q5)
├── harness/              — Statistical evaluation harness (Mann-Whitney U)
├── parsers/              — Language parsing (tree-sitter integration)
├── mcp/                  — Model Context Protocol stdio server
├── registry/             — Model registry and resolution
├── trained_models/       — Bundled .npz models
└── serve/                — HTTP evaluation server (requires `eigenhelm[serve]` extra)

Current Status

  • 5-dim scoring: manifold drift, alignment, entropy, compression, NCD exemplar distance
  • 5 languages: Python, JavaScript, TypeScript, Go, Rust — all discriminating (Cohen's d > 0.5)
  • Human correlation: Spearman rho = 0.54 overall (n = 92, 5 languages), 0.66 Python-only (n = 52)
  • Declaration-aware: Automatically detects type-definition and data-table files, adjusts scoring and directives
  • Agent-tested: Skill contract validated in controlled arena (3 scenarios, 46% quality improvement)

License

eigenhelm is licensed under the GNU Affero General Public License v3.0.

Commercial Licensing

Looking to use eigenhelm in a proprietary SaaS or enterprise product without AGPL-3.0 obligations? A commercial license is available.

Contact us at licensing@eigenhelm.sh to discuss terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eigenhelm-0.8.0.tar.gz (1.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

eigenhelm-0.8.0-py3-none-any.whl (1.5 MB view details)

Uploaded Python 3

File details

Details for the file eigenhelm-0.8.0.tar.gz.

File metadata

  • Download URL: eigenhelm-0.8.0.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for eigenhelm-0.8.0.tar.gz
Algorithm Hash digest
SHA256 fbf367097b7b0986814979457c58397350cdb27b400c9e8b3b1540ea0d24d91d
MD5 0cc9ff28483d66904bc4a3137dd1ff28
BLAKE2b-256 552580a91f000b2feb23aab14e7db6b599e5f6f57e1ac10a34e92881bc10cf8e

See more details on using hashes here.

Provenance

The following attestation bundles were made for eigenhelm-0.8.0.tar.gz:

Publisher: publish.yml on metacogdev/eigenhelm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file eigenhelm-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: eigenhelm-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for eigenhelm-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c198561cf2ec9e87cc39aaf5c90728a19a35ccc87e2151ea56f1b094d27c5e5c
MD5 c19898164d0eeb8580c7f03771e0add9
BLAKE2b-256 63743cc18c0b789ab520432e74222db3d3ab4edd2fc6a8c521a61bf97e47d24e

See more details on using hashes here.

Provenance

The following attestation bundles were made for eigenhelm-0.8.0-py3-none-any.whl:

Publisher: publish.yml on metacogdev/eigenhelm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page