Nightjar — formal verification CLI for AI-generated code
Project description
╔╗╔╦╔═╗╦ ╦╔╦╗ ╦╔═╗╦═╗
║║║║║ ╦╠═╣ ║ ║╠═╣╠╦╝
╝╚╝╩╚═╝╩ ╩ ╩╚╝╩╩ ╩╩╚═
Your LLM writes code. Nightjar proves it.
Not tested. Proved.
$ nightjar verify --spec .card/payment.card.md
Stage 0 (preflight) PASS 12ms
Stage 1 (deps) PASS 45ms
Stage 2 (schema) PASS 23ms
Stage 3 (pbt) FAIL 340ms
INV-01 violated: counterexample x=0 -> ZeroDivisionError
Stage 4 (formal) SKIP
Result: 1 violation found
Trust: PROPERTY_VERIFIED (0.60)
Generate, verify, prove. In 60 seconds.
The problem
"I 'Accept All' always, I don't read the diffs anymore." -- Andrej Karpathy
84% of developers use AI coding tools. 96% don't fully trust what comes out (Sonar 2025). Only half even bother to verify before committing.
Meanwhile, 45% of AI-generated code ships with OWASP vulnerabilities (Veracode 2025). Georgetown CSET measured an 86% XSS failure rate. AI-coauthored code carries 2.74× higher vulnerability rates than human-written code — and 35 new CVEs from AI-generated code landed in March 2026 alone.
There's a gap between "the AI wrote it" and "you can trust it." Nightjar closes it with math.
"The limiting factor will not be the technology, but the culture change required for people to realise that formal methods have become viable in practice." — Martin Kleppmann, 2025
Nightjar's job is to make that culture change unnecessary. Vericoding — formally verified code generation — should require zero PhD and five minutes of setup.
"An AI that generates provably correct code is qualitatively different from one that merely generates plausible code." -- Leonardo de Moura, creator of Lean and Z3
What Nightjar does
pip install nightjar
nightjar verify --spec .card/payment.card.md
Five verification stages, cheapest first, short-circuit on failure:
| Stage | What | Time | What you get |
|---|---|---|---|
| 0. Preflight | Syntax, imports | <100ms | Valid Python |
| 1. Dependencies | CVE scan, SBOM | <500ms | No known vulns |
| 2. Schema | Type checking | <200ms | Type-correct |
| 3. Property | Hypothesis PBT | 300ms-8s | Statistical confidence |
| 4. Formal | Dafny/CrossHair | 1-30s | Mathematical proof |
Simple functions skip Dafny and go straight to CrossHair (about 70% faster). Complex ones get the full treatment. Nightjar figures out which is which by measuring cyclomatic complexity and AST depth.
The difference between finding bugs and proving they can't exist.
Quick start
# Install
pip install nightjar
# Describe what you want in plain English
nightjar auto "payment processor that charges credit cards"
# Verify the generated code
nightjar verify --spec .card/payment.card.md
# Or leave it running in the background
nightjar watch
You need Python 3.11+ and optionally Dafny 4.x for Stage 4. Without Dafny, verification gracefully falls back to CrossHair and Hypothesis. You still get a confidence score; it just won't hit 100.
See it work
# Generate a safe spec from plain English
$ nightjar auto "payment processor with balance >= 0 invariant"
Created spec: .card/payment.card.md (8 invariants, 6 approved)
# Verify the generated code
$ nightjar verify --spec .card/payment.card.md
VERIFIED -- all stages passed
# Fast check (skip Dafny)
$ nightjar verify --spec .card/payment.card.md --fast
VERIFIED -- all stages passed
What makes it different
We looked at what else exists. The short version: nobody else does formal proof + auto-invariants + watch mode in a single CLI tool.
| Nightjar | Cursor | Kiro | Tessl | Axiom | CodeRabbit | |
|---|---|---|---|---|---|---|
| Generates code | yes | yes | yes | yes | yes | no |
| Formal proof | yes | no | no | no | yes | no |
| Security proof | yes | no | no | no | no | no |
| Auto invariants | yes | no | partial | partial | no | no |
| Watch mode | yes | no | no | no | no | no |
| Safety gate | yes | no | no | no | no | no |
| Price | free | $20/mo | free | beta | enterprise | $15/seat |
| Open source | AGPL | no | no | partial | no | no |
Features worth knowing about
Shadow CI. A GitHub Action that runs verification on every PR but never fails the build. It just leaves a comment. The first time it catches a real SQL injection that passed all your tests, it sells itself.
- uses: nightjar/verify@v1
with:
mode: shadow
security-pack: owasp
Zero-friction mode. You type nightjar auto "payment processor" and it generates invariants from your description, classifies them (numerical, behavioral, state, formal), lets you approve each one, and writes the .card.md. The whole thing takes about 30 seconds of your time.
Watch mode. Runs in the background. When you save a .card.md file, it runs four tiers of verification (syntax in <100ms, structural in <2s, property in <10s, formal in 1-30s). Repeat runs with no changes finish in under 50ms thanks to Salsa-style per-stage caching.
Confidence score. Every verification produces a score from 0 to 100: pyright (+15), deal static analysis (+10), CrossHair symbolic (+35), Hypothesis PBT (+20), Dafny formal proof (+20). You always know exactly where you stand.
Safety gate. Before regenerating code, Nightjar compares new proofs against the previous verified state. If any invariants would be lost, it blocks the regeneration. If the confidence score drops, it warns you.
OWASP security pack. Invariant templates that formally prove the absence of SQL injection, XSS, and command injection. Not pattern-matching. Actual proof. The EU CRA compliance deadline is September 2026, and Nightjar can generate the structured compliance certificates you'll need.
Spec preprocessing. 19 deterministic rewrite rules (from Proven, MIT) normalize your specs before they touch the LLM. This roughly doubles Dafny success rates on local models, and pushes Claude Sonnet from 65% to 78%.
CEGIS retry. When Dafny fails, Nightjar parses the concrete counterexample and includes it in the retry prompt. "Your spec fails on input X=5, Y=-3 because..." is a lot more useful to an LLM than a generic error message.
The immune system
Nightjar learns from your production failures. When something breaks, it gets smarter.
Sentry errors + runtime traces
|
v
Collector (sys.monitoring, PEP 669, <5% overhead)
|
v
Miner (19 Ernst/Daikon templates)
|
v
Quality gate (Wonda scoring, filters out trivial invariants)
|
v
Adversarial debate (a skeptic agent tries to refute each candidate)
|
v
Houdini filter (Z3 finds the maximal inductive subset)
|
v
CrossHair + Hypothesis (verify with 1000+ inputs)
|
v
Auto-append to .card.md (your spec evolves)
Old invariants decay over time if no new observations confirm them. New observations can supersede stale ones. The spec stays current with how your code actually behaves.
Three tiers of mining run in parallel: semantic (LLM-based), runtime (Daikon + Houdini), and API-level (MINES, from OTel logs).
CLI
nightjar init [module] Scaffold a .card.md spec
nightjar auto "intent" Generate spec from plain English
nightjar generate Generate code from spec via LLM
nightjar verify Run the verification pipeline
nightjar verify --fast Stages 0-3 only (skip Dafny)
nightjar build Generate + verify + compile
nightjar ship Build + sign artifact
nightjar retry Counterexample-guided repair loop
nightjar lock Freeze deps into deps.lock
nightjar explain Root-cause diagnosis (LP dual variables)
nightjar watch Background daemon, 4-tier streaming
nightjar badge Shields.io badge from last verification
nightjar optimize DSPy SIMBA prompt optimization
nightjar immune Run immune system mining cycle
Architecture
Intent (.card.md)
|
+-> nightjar auto -> Generate invariants -> Approve (30s)
|
v
Spec Preprocessing (19 rewrite rules)
|
v
Generation (litellm -> any LLM)
|
v
Verification Pipeline
+-- Stage 0: Preflight (AST + YAML)
+-- Stage 1: Dependencies (pip-audit + SBOM)
+-- Stage 2: Schema (Pydantic v2)
+-- Stage 2.5: Negation-Proof (catch weak specs early)
+-- Stage 3: Property (Hypothesis PBT)
+-- Stage 4: Formal (Dafny + CrossHair + fallback)
|
v
Verified Artifact + Proof Certificate (.card/verify.json)
|
v
Immune System (Sentry + runtime mining -> stronger specs over time)
The .card.md format
You write specs. AI generates code. Nightjar proves the code matches the spec.
---
card-version: "1.0"
id: payment
title: Payment Processor
contract:
inputs:
- name: balance
type: float
constraints: ">= 0"
outputs:
- name: new_balance
type: float
constraints: ">= 0"
invariants:
- tier: formal
rule: "deduct(balance, amount) requires amount <= balance ensures result >= 0"
- tier: property
rule: "for any balance >= 0: deposit(balance, amount) > balance"
---
## Intent
A payment processor that charges credit cards safely.
Invariants come in three tiers. example generates unit tests (any developer can write these). property generates Hypothesis property-based tests (requires some testing experience). formal generates Dafny proofs or CrossHair symbolic checks (for security-critical code).
Model agnostic
All LLM calls go through litellm. Swap models with an environment variable:
NIGHTJAR_MODEL=claude-sonnet-4-6 # Default
NIGHTJAR_MODEL=deepseek/deepseek-chat # Budget option
NIGHTJAR_MODEL=openai/o3 # Premium option
MCP server
Nightjar is also an MCP server, so it works inside any IDE that supports the protocol: Cursor, Windsurf, Claude Code, VS Code.
Three tools: verify_contract (run verification), get_violations (get the report), suggest_fix (LLM-suggested repair).
Contractual computing
The idea behind Nightjar is that the contract (the .card.md spec) is the permanent artifact, not the code. Code gets regenerated every build. The spec is what survives.
This leads to some interesting properties: contracts are discoverable (the immune system mines them from runtime), transferable (any agent can verify any code against a spec), and they compound (your codebase gets safer the longer you run it, because more invariants accumulate).
We call this "contractual computing." As far as we can tell, nobody else has claimed the term.
Tech stack
| Component | Tool |
|---|---|
| Language | Python 3.11+ |
| CLI | Click |
| TUI | Textual |
| LLM | litellm |
| Formal verification | Dafny 4.x |
| Property-based testing | Hypothesis |
| Symbolic execution | CrossHair |
| Runtime contracts | icontract |
| Schema | Pydantic v2 |
| Invariant mining | sys.monitoring (PEP 669) |
| Invariant filtering | Z3 via Houdini algorithm |
| Quality scoring | Wonda-style AST normalization |
| Spec preprocessing | Proven (MIT) |
| File watching | watchdog |
| Error tracking | Sentry |
| MCP | MCP SDK |
References
The algorithms in Nightjar come from published papers. If you want to understand why something works the way it does, these are the places to look:
- Proven - spec preprocessing rewrite rules
- DafnyPro - diff-checker, invariant pruner, hint-augmentation
- VerMCTS - BFS proof search
- SpecLoop - CEGIS counterexample-guided retry
- Wonda - invariant quality scoring
- Ernst 2007 - dynamic invariant detection
- Houdini - greatest-fixpoint invariant filter
- SafePilot - complexity-discriminated routing
Full citation library: docs/REFERENCES.md
夜鹰 (Nightjar)
你的 LLM 写代码,夜鹰证明代码是正确的。
问题
84% 的开发者在使用 AI 编程工具,但 96% 的人不完全信任生成的代码。45% 的 AI 生成代码包含 OWASP 安全漏洞。我们写了越来越多的代码,却越来越少地去验证它。
夜鹰是什么
夜鹰是一个面向 AI 生成代码的形式化验证平台。你用 .card.md 文件描述意图和约束条件,AI 生成代码,夜鹰用数学方法证明代码满足所有约束。
核心流程:
# 安装
pip install nightjar
# 用自然语言描述你的需求
nightjar auto "支付处理器,余额不能为负"
# 验证生成的代码
nightjar verify --spec .card/payment.card.md
# 后台持续监控
nightjar watch
五阶段验证流水线
| 阶段 | 内容 | 耗时 | 保证 |
|---|---|---|---|
| 0. 预检 | 语法、导入 | <100ms | 合法 Python |
| 1. 依赖 | CVE 扫描 | <500ms | 无已知漏洞 |
| 2. 模式 | 类型检查 | <200ms | 类型正确 |
| 3. 属性 | Hypothesis PBT | 300ms-8s | 统计覆盖 |
| 4. 形式化 | Dafny/CrossHair | 1-30s | 数学证明 |
核心功能
影子 CI - GitHub Action,只报告不阻塞构建。第一次捕获真正的 SQL 注入时,它就会成为团队标配。
零摩擦模式 - nightjar auto "你的需求" 自动生成所有不变式,30 秒内完成审批。
监控模式 - 保存文件时自动验证,首次反馈 <100ms。
免疫系统 - 从生产故障中学习。Sentry 错误自动进入挖掘流水线,生成新的不变式。每次失败都让下次构建更安全。
OWASP 安全包 - 形式化证明不存在 SQL 注入、XSS、命令注入。不是模式匹配,是数学证明。
安全闸门 - 如果重新生成的代码丢失了之前已证明的不变式,夜鹰会阻止部署。
合约计算
夜鹰实现了"合约计算"范式: 合约 (.card.md) 是唯一的永久产物,代码每次构建都重新生成。合约可以被发现 (免疫系统从运行时挖掘)、可以转移 (任何代理都能验证任何代码)、可以累积 (代码库随时间变得更安全)。
不是测试。是证明。
Verified with Nightjar
Nightjar's own parser and spec engine are verified using Nightjar itself.
nightjar verify --spec .card/parser.card.md --fast
# ✓ Stage 0: preflight 12ms
# ✓ Stage 1: deps 8ms
# ✓ Stage 2: schema 3ms
# ✓ Stage 3: property-tests 847ms
# VERIFIED — 4 stages passed
Commercial License
Nightjar is AGPL-3.0. If your organization cannot comply with AGPL (e.g., embedding Nightjar in a proprietary product), a commercial license is available.
Pricing: $2,400/yr (teams) · $12,000/yr (enterprise) Contact: nightjar-license@proton.me
One enterprise deal funds months of development. The AGPL is not punitive — it's a structural incentive to keep the ecosystem open.
License
AGPL-3.0. Free for open source. Commercial license for enterprises — see above.
Support
- GitHub Sponsors — $5/mo supporter · $25/mo cloud beta · $99/mo priority support
- Commercial license for enterprise embedding
- Star the repo if Nightjar saved you from a bug
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nightjarzzz-0.1.0.tar.gz.
File metadata
- Download URL: nightjarzzz-0.1.0.tar.gz
- Upload date:
- Size: 239.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8466f5dbfcded50676e1db4a46a6521a5b778abe17c5fa891ac66c58b54f049d
|
|
| MD5 |
96d958841e71fd18b2b1e654528e5495
|
|
| BLAKE2b-256 |
1c4d60a71f32e7accc7b269bf13677fdae37a4711c6e2dd37695feba0fde6e78
|
File details
Details for the file nightjarzzz-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nightjarzzz-0.1.0-py3-none-any.whl
- Upload date:
- Size: 266.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c01436b1429bbfb74ab0d0111e8ce614173a4d96cc2be52e7128aad6b093b72a
|
|
| MD5 |
2ae0158bafe0963a9e602ab644cd2167
|
|
| BLAKE2b-256 |
2554960a62326c6b6892bbb0b0a2014b43cbe9b4eb25f12ddd9c39bebff317c8
|