
vallm

A complete toolkit for validating LLM-generated code.


vallm validates code proposals through a four-tier pipeline — from millisecond syntax checks to LLM-as-judge semantic review — before a single line ships.

Features

  • Multi-language AST parsing via tree-sitter (165+ languages)
  • Syntax validation with ast.parse (Python) and tree-sitter error detection
  • Import resolution checking for Python, JavaScript/TypeScript, Go, Rust, Java, C/C++
  • Complexity metrics via radon (Python) and lizard (16 languages)
  • Security scanning with language-specific patterns and optional bandit integration
  • LLM-as-judge semantic review via Ollama, litellm, or direct HTTP
  • Code graph analysis — import/call graph diffing for structural regression detection
  • AST similarity scoring with normalized fingerprinting
  • Pluggy-based plugin system for custom validators
  • Rich CLI with JSON/text output formats
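The AST similarity scoring above can be pictured with a minimal sketch using only Python's stdlib `ast` module: normalize identifiers and constants away so that structurally identical code produces the same fingerprint. This is an illustration, not vallm's actual implementation (which lives in `core/ast_compare.py` and covers many languages via tree-sitter):

```python
import ast

def fingerprint(code: str) -> str:
    """Normalize a Python snippet by blanking identifiers and constants,
    so structurally identical code yields the same fingerprint."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, ast.FunctionDef):
            node.name = "_"
        elif isinstance(node, ast.Constant):
            node.value = 0
    return ast.dump(tree)

# Two snippets that differ only in names share a fingerprint
a = fingerprint("def add(x, y):\n    return x + y")
b = fingerprint("def plus(a, b):\n    return a + b")
print(a == b)  # True
```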

Supported Languages

| Language | Syntax | Imports | Complexity | Security |
|---|---|---|---|---|
| Python | ✅ AST + tree-sitter | ✅ Full resolution (22 methods) | ✅ radon + lizard | ✅ bandit + patterns |
| JavaScript | ✅ tree-sitter | ✅ Node.js builtins | ✅ lizard | ✅ XSS, eval patterns |
| TypeScript | ✅ tree-sitter | ✅ Node.js builtins | ✅ lizard | ✅ XSS, eval patterns |
| Go | ✅ tree-sitter | ✅ stdlib + modules | ✅ lizard | ✅ SQL injection, exec |
| Rust | ✅ tree-sitter | ✅ crates | ✅ lizard | ✅ unsafe, unwrap |
| Java | ✅ tree-sitter | ✅ stdlib packages | ✅ lizard | ✅ Runtime.exec, SQL |
| C/C++ | ✅ tree-sitter | ✅ std headers | ✅ lizard | ✅ buffer overflow, system |
| Ruby | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| PHP | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| Swift | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| Kotlin | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| Scala | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
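The pattern-based security checks in the last column can be sketched with a few regexes. These patterns are illustrative only; vallm's real rules in `validators/security.py` are more extensive and language-aware:

```python
import re

# Illustrative language-specific danger patterns (not vallm's actual rules)
PATTERNS = {
    "python": [r"\beval\s*\(", r"\bexec\s*\("],
    "javascript": [r"\beval\s*\(", r"\.innerHTML\s*="],
    "go": [r"exec\.Command\s*\("],
}

def scan(code: str, language: str) -> list[str]:
    """Return the patterns that match, i.e. potential security issues."""
    return [p for p in PATTERNS.get(language, []) if re.search(p, code)]

print(scan("user_input = eval(raw)", "python"))   # flags eval(
print(scan("el.innerHTML = data", "javascript"))  # flags innerHTML assignment
```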

Installation

```shell
pip install vallm
```

With optional dependencies:

```shell
pip install vallm[all]        # Everything
pip install vallm[llm]        # Ollama + litellm for semantic review
pip install vallm[security]   # bandit integration
pip install vallm[semantic]   # CodeBERTScore
pip install vallm[graph]      # NetworkX graph analysis
```

Quick Start

Validate Entire Project

```shell
# Install with LLM support
pip install vallm[llm]

# Set up Ollama (for semantic review)
ollama pull qwen2.5-coder:7b
ollama serve

# Validate entire project recursively
vallm batch . --recursive --semantic --model qwen2.5-coder:7b
```

Python API

```python
from vallm import Proposal, validate, VallmSettings

code = """
def fibonacci(n: int) -> list[int]:
    if n <= 0:
        return []
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib
"""

proposal = Proposal(code=code, language="python")
result = validate(proposal)
print(f"Verdict: {result.verdict.value}")  # pass / review / fail
print(f"Score: {result.weighted_score:.2f}")
```

CLI Commands Reference

```shell
# Batch validation (best for entire projects)
vallm batch . --recursive --semantic --model qwen2.5-coder:7b
vallm batch src/ --recursive --include "*.py,*.js" --exclude "*/test/*"
vallm batch . --recursive --format json --fail-fast
vallm batch . --recursive --verbose --show-issues  # Detailed per-file results

# Output formats for batch results
vallm batch . --recursive --format json   # Machine-readable JSON
vallm batch . --recursive --format yaml   # YAML format
vallm batch . --recursive --format toon   # Compact TOON format
vallm batch . --recursive --format text   # Plain text

# Single file validation
vallm validate --file mycode.py --semantic --model qwen2.5-coder:7b
vallm validate --file app.js --security
vallm validate --file mycode.py --format json  # JSON output

# Quick syntax check only
vallm check mycode.py
vallm check src/main.go

# Configuration and info
vallm info
```

Batch Command Options

| Option | Short | Description |
|---|---|---|
| --recursive | -r | Recurse into subdirectories |
| --include | | File patterns to include (e.g., "*.py,*.js") |
| --exclude | | File patterns to exclude |
| --use-gitignore | | Respect .gitignore patterns (default: true) |
| --format | -f | Output format: rich, json, yaml, toon, text |
| --fail-fast | -x | Stop on first failure |
| --semantic | | Enable LLM-as-judge semantic review |
| --security | | Enable security checks |
| --model | -m | LLM model for semantic review |
| --verbose | -v | Show detailed validation results for each file |
| --show-issues | -i | Show issues for failed files |

With Ollama (LLM-as-judge)

```shell
# 1. Install and start Ollama
ollama pull qwen2.5-coder:7b
ollama serve

# 2. Run with semantic review
vallm validate --file mycode.py --semantic
```

Or via the Python API:

```python
from vallm import Proposal, validate, VallmSettings

settings = VallmSettings(
    enable_semantic=True,
    llm_provider="ollama",
    llm_model="qwen2.5-coder:7b",
)

proposal = Proposal(
    code=new_code,
    language="python",
    reference_code=existing_code,  # optional: compare against reference
)
result = validate(proposal, settings)
```

Validation Pipeline

| Tier | Speed | Validators | What it catches |
|---|---|---|---|
| 1 | ms | syntax, imports | Parse errors, missing modules |
| 2 | seconds | complexity, security | High CC, dangerous patterns |
| 3 | seconds | semantic (LLM) | Logic errors, poor practices |
| 4 | minutes | regression (tests) | Behavioral regressions |

The pipeline fails fast — Tier 1 errors stop execution immediately.
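Conceptually, the fail-fast behavior looks like the following sketch. The names (`run_pipeline`, toy validators) are hypothetical, not vallm's API; the point is only that later tiers never run once an earlier tier hard-fails:

```python
# Illustrative fail-fast loop: run validators tier by tier and stop
# as soon as a tier produces a hard failure.

def run_pipeline(proposal, tiers):
    results = []
    for tier, validators in sorted(tiers.items()):
        tier_results = [v(proposal) for v in validators]
        results.extend(tier_results)
        if any(r["score"] == 0.0 for r in tier_results):
            break  # e.g. a Tier 1 parse error stops execution immediately
    return results

# Toy validators standing in for the real syntax/complexity checks
def syntax_ok(p):
    return {"validator": "syntax", "score": 1.0 if p["parses"] else 0.0}

def complexity(p):
    return {"validator": "complexity", "score": 0.9}

tiers = {1: [syntax_ok], 2: [complexity]}
print(len(run_pipeline({"parses": False}, tiers)))  # 1 — tier 2 never ran
```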

Configuration

Via environment variables (VALLM_*), vallm.toml, or pyproject.toml [tool.vallm]:

```toml
# vallm.toml
pass_threshold = 0.8
review_threshold = 0.5
max_cyclomatic_complexity = 15
enable_semantic = true
llm_provider = "ollama"
llm_model = "qwen2.5-coder:7b"
```
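How the two thresholds map a weighted score onto a verdict can be illustrated with a small sketch. This is an assumption about the mechanics (the real logic lives in `scoring.py`), using the default thresholds from `vallm.toml` above:

```python
def verdict(weighted_score: float, pass_threshold: float = 0.8,
            review_threshold: float = 0.5) -> str:
    """Map an aggregate score to pass / review / fail using the
    configured thresholds (illustrative sketch, not vallm's code)."""
    if weighted_score >= pass_threshold:
        return "pass"
    if weighted_score >= review_threshold:
        return "review"
    return "fail"

print(verdict(0.92))  # pass
print(verdict(0.61))  # review
print(verdict(0.30))  # fail
```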

Plugin System

Write custom validators using pluggy:

```python
from vallm.hookspecs import hookimpl
from vallm.scoring import ValidationResult

class MyValidator:
    tier = 2
    name = "custom"
    weight = 1.0

    @hookimpl
    def validate_proposal(self, proposal, context):
        # Your validation logic
        return ValidationResult(validator=self.name, score=1.0, weight=self.weight)
```

Register via pyproject.toml:

```toml
[project.entry-points."vallm.validators"]
custom = "mypackage.validators:MyValidator"
```

Multi-Language Support

vallm supports 30+ programming languages via tree-sitter parsers:

Auto-Detection

```python
from vallm import detect_language, Language

# Auto-detect from file path
lang = detect_language("main.rs")  # → Language.RUST
print(lang.display_name)  # "Rust"
print(lang.is_compiled)   # True
```

CLI with Auto-Detection

```shell
# Language auto-detected from file extension
vallm validate --file script.py   # → Python
vallm check main.go               # → Go
vallm validate --file lib.rs      # → Rust

# Batch validation with mixed languages
vallm batch src/ --recursive --include "*.py,*.js,*.ts,*.go,*.rs"
```

Supported Languages

| Language | Category | Complexity | Syntax |
|---|---|---|---|
| Python | Scripting | ✓ radon + lizard | ✓ ast + tree-sitter |
| JavaScript | Web/Scripting | ✓ lizard | ✓ tree-sitter |
| TypeScript | Web/Scripting | ✓ lizard | ✓ tree-sitter |
| Go | Compiled | ✓ lizard | ✓ tree-sitter |
| Rust | Compiled | ✓ lizard | ✓ tree-sitter |
| Java | Compiled | ✓ lizard | ✓ tree-sitter |
| C/C++ | Compiled | ✓ lizard | ✓ tree-sitter |
| Ruby | Scripting | ✓ lizard | ✓ tree-sitter |
| PHP | Web | ✓ lizard | ✓ tree-sitter |
| Swift | Compiled | ✓ lizard | ✓ tree-sitter |
| +20 more | via tree-sitter | ✓ tree-sitter | ✓ tree-sitter |

See examples/07_multi_language/ for a comprehensive demo.

Examples

Each example lives in its own folder with main.py and README.md. Run all at once:

```shell
cd examples && ./run.sh
```
| Example | What it demonstrates |
|---|---|
| 01_basic_validation/ | Default pipeline — good, bad, and complex code |
| 02_ast_comparison/ | AST similarity scoring, tree-sitter multi-language parsing |
| 03_security_check/ | Security pattern detection (eval, exec, hardcoded secrets) |
| 04_graph_analysis/ | Import/call graph building and structural diffing |
| 05_llm_semantic_review/ | Ollama Qwen 2.5 Coder 7B LLM-as-judge review |
| 06_multilang_validation/ | JavaScript and C validation via tree-sitter |
| 07_multi_language/ | Comprehensive multi-language support — 8+ languages with auto-detection |
| 08_code2llm_integration/ | Project analysis integration with code2llm |
| 09_code2logic_integration/ | Call graph analysis with code2logic |
| 10_mcp_ollama_demo/ | MCP (Model Context Protocol) demo with Ollama |
| 11_claude_code_autonomous/ | Autonomous refactoring with Claude Code |
| 12_ollama_simple_demo/ | Simplified Ollama integration example |

Architecture

```text
src/vallm/
├── cli.py                 # Typer CLI (401L, 8 methods, CC=42) - needs refactoring
├── config.py              # pydantic-settings (VALLM_* env vars)
├── hookspecs.py           # pluggy hook specifications
├── scoring.py             # Weighted scoring + verdict engine (CC=18 validate function)
├── core/
│   ├── languages.py       # Language enum, auto-detection, 30+ languages
│   ├── proposal.py        # Proposal model
│   ├── ast_compare.py     # tree-sitter + Python AST similarity
│   ├── graph_builder.py   # Import/call graph construction
│   └── graph_diff.py      # Before/after graph comparison
├── validators/
│   ├── syntax.py          # Tier 1: ast.parse + tree-sitter (multi-lang)
│   ├── imports.py         # Tier 1: module resolution (653L, 22 methods) - god module
│   ├── complexity.py      # Tier 2: radon (Python) + lizard (16+ langs)
│   ├── security.py        # Tier 2: patterns + bandit
│   └── semantic.py        # Tier 3: LLM-as-judge
└── sandbox/
    └── runner.py          # subprocess / Docker execution
```
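The before/after comparison in `graph_diff.py` can be pictured with a minimal sketch, using plain edge sets in place of the NetworkX graphs vallm actually builds (the function name here is hypothetical):

```python
# Minimal import-graph diff: each graph is a set of (importer, imported)
# edges; the diff reports which edges were added or removed between versions.

def diff_import_graph(before: set, after: set) -> dict:
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }

before = {("app", "os"), ("app", "json"), ("utils", "re")}
after = {("app", "os"), ("app", "requests"), ("utils", "re")}

delta = diff_import_graph(before, after)
print(delta["added"])    # [('app', 'requests')]
print(delta["removed"])  # [('app', 'json')]
```

A structural regression (e.g., a dropped import or a new heavyweight dependency) shows up directly in these added/removed edges.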

Code Health Metrics

Current codebase metrics (generated by code2llm analysis):

| Metric | Current | Target |
|---|---|---|
| Avg Cyclomatic Complexity (CC̄) | 3.5 | ≤2.4 |
| Max CC | 42 | ≤20 |
| God Modules (>500L) | 2 | 0 |
| High CC Functions (≥15) | 2 | ≤1 |
| Total Functions | 91 | - |
| Total Classes | 19 | - |

Critical Functions (CC ≥ 10):

| Function | Location | CC | Fan-out | Priority |
|---|---|---|---|---|
| batch | cli.py:140 | 42 | 34 | 🔴 Split immediately |
| validate | scoring.py:122 | 18 | 20 | 🟡 Refactor |
| _check_lizard | complexity.py | 12 | 9 | 🟡 Simplify |
| _parse_response | semantic.py | 12 | 17 | 🟡 Simplify |

God Modules:

  • src/vallm/validators/imports.py (653L, 22 methods, 22 dependent imports)
  • src/vallm/cli.py (401L, 8 methods, CC=42)

See project/ directory for full analysis files:

  • analysis.toon - Health diagnostics and complexity metrics
  • evolution.toon - Refactoring queue with ranked priorities
  • context.md - Architecture summary for LLM assistance

Roadmap

v0.2 — Completeness

  • Wire pluggy plugin manager (entry_point-based validator discovery)
  • Add LogicalErrorValidator (pyflakes) and LintValidator (ruff)
  • TOML config loading (vallm.toml, [tool.vallm])
  • Pre-commit hook integration
  • GitHub Actions CI/CD
  • Refactoring: Split batch function (CC=42)
  • Refactoring: Modularize imports.py god module

v0.3 — Depth

  • AST edit distance via apted/zss
  • CodeBERTScore embedding similarity
  • NetworkX cycle detection and centrality in graph analysis
  • RegressionValidator (Tier 4) with pytest-json-report
  • TypeCheckValidator (mypy/pyright)
  • Refactoring: Extract output formatters

v0.4 — Intelligence

  • --fix auto-repair mode (LLM-based retry loop)
  • hypothesis/crosshair property-based test generation
  • E2B cloud sandbox backend
  • Streaming LLM output

See TODO.md for the full task breakdown.

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com
