# vallm
A complete toolkit for validating LLM-generated code.
vallm validates code proposals through a four-tier pipeline — from millisecond syntax checks to LLM-as-judge semantic review — before a single line ships.
## Features
- Multi-language AST parsing via tree-sitter (165+ languages)
- Syntax validation with ast.parse (Python) and tree-sitter error detection
- Import resolution checking for Python, JavaScript/TypeScript, Go, Rust, Java, C/C++
- Complexity metrics via radon (Python) and lizard (16 languages)
- Security scanning with language-specific patterns and optional bandit integration
- LLM-as-judge semantic review via Ollama, litellm, or direct HTTP
- Code graph analysis — import/call graph diffing for structural regression detection
- AST similarity scoring with normalized fingerprinting
- Pluggy-based plugin system for custom validators
- Rich CLI with JSON/text output formats
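Pattern-based security scanning, as listed above, boils down to regex matching over source text. A simplified illustration of the idea — the patterns below are examples for this sketch, not vallm's actual rule set:

```python
import re

# Illustrative patterns only; vallm ships its own per-language rules.
PATTERNS = {
    "dangerous-eval": re.compile(r"\beval\s*\("),
    "shell-exec": re.compile(r"\bos\.system\s*\("),
    "hardcoded-secret": re.compile(r"(?i)(password|api_key)\s*=\s*['\"][^'\"]+['\"]"),
}

def scan(source: str) -> list[tuple[str, int]]:
    """Return (rule, line_number) for every pattern hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((rule, lineno))
    return hits

issues = scan("password = 'hunter2'\nresult = eval(user_input)\n")
```

Real scanners add per-language rules and suppress matches inside strings and comments, which a line-based regex pass cannot do.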
## Supported Languages
| Language | Syntax | Imports | Complexity | Security |
|---|---|---|---|---|
| Python | ✅ AST + tree-sitter | ✅ Full resolution (22 methods) | ✅ radon + lizard | ✅ bandit + patterns |
| JavaScript | ✅ tree-sitter | ✅ Node.js builtins | ✅ lizard | ✅ XSS, eval patterns |
| TypeScript | ✅ tree-sitter | ✅ Node.js builtins | ✅ lizard | ✅ XSS, eval patterns |
| Go | ✅ tree-sitter | ✅ stdlib + modules | ✅ lizard | ✅ SQL injection, exec |
| Rust | ✅ tree-sitter | ✅ crates | ✅ lizard | ✅ unsafe, unwrap |
| Java | ✅ tree-sitter | ✅ stdlib packages | ✅ lizard | ✅ Runtime.exec, SQL |
| C/C++ | ✅ tree-sitter | ✅ std headers | ✅ lizard | ✅ buffer overflow, system |
| Ruby | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| PHP | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| Swift | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| Kotlin | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
| Scala | ✅ tree-sitter | ⚠️ Limited | ✅ lizard | ⚠️ Limited |
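Python import resolution (the "Imports" column above) can be approximated with the standard library alone. A minimal sketch of the idea — vallm's own resolver is considerably more involved:

```python
import ast
import importlib.util

def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names that cannot be found on this interpreter."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # resolve only the top-level package
            if importlib.util.find_spec(top) is None:
                missing.append(top)
    return missing

print(unresolved_imports("import os\nimport definitely_not_a_real_module\n"))
```

This only checks resolvability in the current environment; it says nothing about whether the declared dependency set matches.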
## Installation

```bash
pip install vallm
```

With optional dependencies:

```bash
pip install vallm[all]       # Everything
pip install vallm[llm]       # Ollama + litellm for semantic review
pip install vallm[security]  # bandit integration
pip install vallm[semantic]  # CodeBERTScore
pip install vallm[graph]     # NetworkX graph analysis
```
## Quick Start

### Validate an Entire Project

```bash
# Install with LLM support
pip install vallm[llm]

# Set up Ollama (for semantic review)
ollama pull qwen2.5-coder:7b
ollama serve

# Validate the entire project recursively
vallm batch . --recursive --semantic --model qwen2.5-coder:7b
```
### Python API

```python
from vallm import Proposal, validate

code = """
def fibonacci(n: int) -> list[int]:
    if n <= 0:
        return []
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i - 1] + fib[i - 2])
    return fib[:n]
"""

proposal = Proposal(code=code, language="python")
result = validate(proposal)

print(f"Verdict: {result.verdict.value}")  # pass / review / fail
print(f"Score: {result.weighted_score:.2f}")
```
## CLI Commands Reference

```bash
# Batch validation (best for entire projects)
vallm batch . --recursive --semantic --model qwen2.5-coder:7b
vallm batch src/ --recursive --include "*.py,*.js" --exclude "*/test/*"
vallm batch . --recursive --format json --fail-fast
vallm batch . --recursive --verbose --show-issues   # Detailed per-file results

# Output formats for batch results
vallm batch . --recursive --format json   # Machine-readable JSON
vallm batch . --recursive --format yaml   # YAML
vallm batch . --recursive --format toon   # Compact TOON
vallm batch . --recursive --format text   # Plain text

# Single-file validation
vallm validate --file mycode.py --semantic --model qwen2.5-coder:7b
vallm validate --file app.js --security
vallm validate --file mycode.py --format json   # JSON output

# Quick syntax check only
vallm check mycode.py
vallm check src/main.go

# Configuration and info
vallm info
```
## Batch Command Options

| Option | Short | Description |
|---|---|---|
| `--recursive` | `-r` | Recurse into subdirectories |
| `--include` | | File patterns to include (e.g., `"*.py,*.js"`) |
| `--exclude` | | File patterns to exclude |
| `--use-gitignore` | | Respect `.gitignore` patterns (default: true) |
| `--format` | `-f` | Output format: `rich`, `json`, `yaml`, `toon`, `text` |
| `--fail-fast` | `-x` | Stop on first failure |
| `--semantic` | | Enable LLM-as-judge semantic review |
| `--security` | | Enable security checks |
| `--model` | `-m` | LLM model for semantic review |
| `--verbose` | `-v` | Show detailed validation results for each file |
| `--show-issues` | `-i` | Show issues for failed files |
## With Ollama (LLM-as-judge)

```bash
# 1. Pull a model (Ollama must be installed and running)
ollama pull qwen2.5-coder:7b

# 2. Run with semantic review
vallm validate --file mycode.py --semantic
```

```python
from vallm import Proposal, validate, VallmSettings

settings = VallmSettings(
    enable_semantic=True,
    llm_provider="ollama",
    llm_model="qwen2.5-coder:7b",
)

proposal = Proposal(
    code=new_code,
    language="python",
    reference_code=existing_code,  # optional: compare against a reference
)

result = validate(proposal, settings)
```
## Validation Pipeline
| Tier | Speed | Validators | What it catches |
|---|---|---|---|
| 1 | ms | syntax, imports | Parse errors, missing modules |
| 2 | seconds | complexity, security | High CC, dangerous patterns |
| 3 | seconds | semantic (LLM) | Logic errors, poor practices |
| 4 | minutes | regression (tests) | Behavioral regressions |
The pipeline fails fast — Tier 1 errors stop execution immediately.
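The fail-fast behavior can be pictured as running validators tier by tier and stopping as soon as any tier reports an error. An illustrative sketch — the validator callables and their `(ok, name)` return shape are invented for this example:

```python
def run_pipeline(code, tiers):
    """Run validator callables tier by tier; stop at the first failing tier."""
    results = []
    for tier, validators in sorted(tiers.items()):
        tier_results = [validator(code) for validator in validators]
        results.extend(tier_results)
        if any(not ok for ok, _ in tier_results):
            break  # fail fast: later (slower) tiers never run
    return results

# Hypothetical validators, each returning (ok, name).
syntax_ok = lambda code: ("def" in code, "syntax")
semantic_ok = lambda code: (True, "semantic")

results = run_pipeline("x = 1", {1: [syntax_ok], 3: [semantic_ok]})
# Tier 1 fails here, so the Tier-3 validator never runs.
```

Ordering cheap checks first means the expensive LLM and test tiers only pay their cost on code that already parses and resolves.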
## Configuration

Configure via environment variables (`VALLM_*`), `vallm.toml`, or a `[tool.vallm]` table in `pyproject.toml`:

```toml
# vallm.toml
pass_threshold = 0.8
review_threshold = 0.5
max_cyclomatic_complexity = 15
enable_semantic = true
llm_provider = "ollama"
llm_model = "qwen2.5-coder:7b"
```
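The two thresholds map a weighted score onto the pass / review / fail verdict. A sketch of that mapping, consistent with the defaults above (the exact scoring code lives in `scoring.py`; this is only the shape of the logic):

```python
def weighted_score(results):
    """results: iterable of (score, weight) pairs from individual validators."""
    total_weight = sum(weight for _, weight in results)
    return sum(score * weight for score, weight in results) / total_weight

def verdict(score, pass_threshold=0.8, review_threshold=0.5):
    if score >= pass_threshold:
        return "pass"
    if score >= review_threshold:
        return "review"
    return "fail"

# Two equally weighted validators: (1.0 + 0.6) / 2 = 0.8, right at the pass bar.
score = weighted_score([(1.0, 1.0), (0.6, 1.0)])
```

Raising `pass_threshold` pushes more borderline code into the "review" bucket instead of auto-passing it.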
## Plugin System

Write custom validators using pluggy:

```python
from vallm.hookspecs import hookimpl
from vallm.scoring import ValidationResult

class MyValidator:
    tier = 2
    name = "custom"
    weight = 1.0

    @hookimpl
    def validate_proposal(self, proposal, context):
        # Your validation logic
        return ValidationResult(validator=self.name, score=1.0, weight=self.weight)
```

Register the validator via `pyproject.toml`:

```toml
[project.entry-points."vallm.validators"]
custom = "mypackage.validators:MyValidator"
```
## Multi-Language Support

vallm supports 30+ programming languages via tree-sitter parsers.

### Auto-Detection

```python
from vallm import detect_language, Language

# Auto-detect from the file path
lang = detect_language("main.rs")  # → Language.RUST
print(lang.display_name)  # "Rust"
print(lang.is_compiled)   # True
```

### CLI with Auto-Detection

```bash
# Language auto-detected from the file extension
vallm validate --file script.py   # → Python
vallm check main.go               # → Go
vallm validate --file lib.rs      # → Rust

# Batch validation with mixed languages
vallm batch src/ --recursive --include "*.py,*.js,*.ts,*.go,*.rs"
```
### Supported Languages

| Language | Category | Complexity | Syntax |
|---|---|---|---|
| Python | Scripting | ✓ radon + lizard | ✓ ast + tree-sitter |
| JavaScript | Web/Scripting | ✓ lizard | ✓ tree-sitter |
| TypeScript | Web/Scripting | ✓ lizard | ✓ tree-sitter |
| Go | Compiled | ✓ lizard | ✓ tree-sitter |
| Rust | Compiled | ✓ lizard | ✓ tree-sitter |
| Java | Compiled | ✓ lizard | ✓ tree-sitter |
| C/C++ | Compiled | ✓ lizard | ✓ tree-sitter |
| Ruby | Scripting | ✓ lizard | ✓ tree-sitter |
| PHP | Web | ✓ lizard | ✓ tree-sitter |
| Swift | Compiled | ✓ lizard | ✓ tree-sitter |
| 20+ more via tree-sitter | | ✓ tree-sitter | ✓ tree-sitter |

See `examples/07_multi_language/` for a comprehensive demo.
## Examples

Each example lives in its own folder with `main.py` and `README.md`. Run all at once:

```bash
cd examples && ./run.sh
```

| Example | What it demonstrates |
|---|---|
| `01_basic_validation/` | Default pipeline — good, bad, and complex code |
| `02_ast_comparison/` | AST similarity scoring, tree-sitter multi-language parsing |
| `03_security_check/` | Security pattern detection (eval, exec, hardcoded secrets) |
| `04_graph_analysis/` | Import/call graph building and structural diffing |
| `05_llm_semantic_review/` | Ollama Qwen 2.5 Coder 7B LLM-as-judge review |
| `06_multilang_validation/` | JavaScript and C validation via tree-sitter |
| `07_multi_language/` | Comprehensive multi-language support — 8+ languages with auto-detection |
| `08_code2llm_integration/` | Project analysis integration with code2llm |
| `09_code2logic_integration/` | Call graph analysis with code2logic |
| `10_mcp_ollama_demo/` | MCP (Model Context Protocol) demo with Ollama |
| `11_claude_code_autonomous/` | Autonomous refactoring with Claude Code |
| `12_ollama_simple_demo/` | Simplified Ollama integration example |
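The AST similarity idea in example 02 can be sketched with the standard library: normalize both programs to their sequences of AST node types (so identifiers and literal values drop out), then compare the sequences. This is a toy fingerprint, not vallm's actual algorithm:

```python
import ast
from difflib import SequenceMatcher

def fingerprint(source: str) -> list[str]:
    """Normalize code to its AST node-type sequence (names/literals ignored)."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def similarity(a: str, b: str) -> float:
    """Structural similarity in [0, 1] between two Python snippets."""
    return SequenceMatcher(None, fingerprint(a), fingerprint(b)).ratio()

# Identical structure with renamed variables and different literals scores 1.0.
same = similarity("x = 1 + 2", "total = 3 + 4")
```

Because the fingerprint discards names and values, this kind of scoring is insensitive to renames but still penalizes real structural changes.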
## Architecture

```text
src/vallm/
├── cli.py           # Typer CLI (401L, 8 methods, CC=42) - needs refactoring
├── config.py        # pydantic-settings (VALLM_* env vars)
├── hookspecs.py     # pluggy hook specifications
├── scoring.py       # Weighted scoring + verdict engine (CC=18 validate function)
├── core/
│   ├── languages.py      # Language enum, auto-detection, 30+ languages
│   ├── proposal.py       # Proposal model
│   ├── ast_compare.py    # tree-sitter + Python AST similarity
│   ├── graph_builder.py  # Import/call graph construction
│   └── graph_diff.py     # Before/after graph comparison
├── validators/
│   ├── syntax.py      # Tier 1: ast.parse + tree-sitter (multi-lang)
│   ├── imports.py     # Tier 1: module resolution (653L, 22 methods) - god module
│   ├── complexity.py  # Tier 2: radon (Python) + lizard (16+ langs)
│   ├── security.py    # Tier 2: patterns + bandit
│   └── semantic.py    # Tier 3: LLM-as-judge
└── sandbox/
    └── runner.py      # subprocess / Docker execution
```
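The before/after comparison done by `core/graph_diff.py` can be illustrated by diffing import edges extracted from two versions of a module. A simplified sketch using Python's `ast`; vallm's graph diff also covers call graphs:

```python
import ast

def import_edges(module_name: str, source: str) -> set[tuple[str, str]]:
    """Edges (importer, imported) extracted from a module's source."""
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges |= {(module_name, alias.name) for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.add((module_name, node.module))
    return edges

before = import_edges("app", "import os\nimport json\n")
after = import_edges("app", "import os\nimport requests\n")

# Structural diff: which dependencies appeared or disappeared?
added, removed = after - before, before - after
```

A non-empty diff flags a structural change (a new dependency, a dropped one) that plain text review can easily miss.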
## Code Health Metrics
Current codebase metrics (generated by code2llm analysis):
| Metric | Current | Target |
|---|---|---|
| Avg Cyclomatic Complexity (CC̄) | 3.5 | ≤2.4 |
| Max CC | 42 | ≤20 |
| God Modules (>500L) | 2 | 0 |
| High CC Functions (≥15) | 2 | ≤1 |
| Total Functions | 91 | - |
| Total Classes | 19 | - |
Critical Functions (CC ≥ 10):

| Function | Location | CC | Fan-out | Priority |
|---|---|---|---|---|
| `batch` | `cli.py:140` | 42 | 34 | 🔴 Split immediately |
| `validate` | `scoring.py:122` | 18 | 20 | 🟡 Refactor |
| `_check_lizard` | `complexity.py` | 12 | 9 | 🟡 Simplify |
| `_parse_response` | `semantic.py` | 12 | 17 | 🟡 Simplify |
God Modules:

- `src/vallm/validators/imports.py` (653L, 22 methods, 22 dependent imports)
- `src/vallm/cli.py` (401L, 8 methods, CC=42)
See the `project/` directory for full analysis files:

- `analysis.toon` - Health diagnostics and complexity metrics
- `evolution.toon` - Refactoring queue with ranked priorities
- `context.md` - Architecture summary for LLM assistance
## Roadmap

### v0.2 — Completeness

- Wire pluggy plugin manager (entry_point-based validator discovery)
- Add LogicalErrorValidator (pyflakes) and LintValidator (ruff)
- TOML config loading (`vallm.toml`, `[tool.vallm]`)
- Pre-commit hook integration
- GitHub Actions CI/CD
- Refactoring: split the `batch` function (CC=42)
- Refactoring: modularize the `imports.py` god module

### v0.3 — Depth

- AST edit distance via apted/zss
- CodeBERTScore embedding similarity
- NetworkX cycle detection and centrality in graph analysis
- RegressionValidator (Tier 4) with pytest-json-report
- TypeCheckValidator (mypy/pyright)
- Refactoring: extract output formatters

### v0.4 — Intelligence

- `--fix` auto-repair mode (LLM-based retry loop)
- hypothesis/crosshair property-based test generation
- E2B cloud sandbox backend
- Streaming LLM output

See `TODO.md` for the full task breakdown.
## License

Apache License 2.0 - see `LICENSE` for details.

## Author

Created by Tom Sapletta - tom@sapletta.com