Skip to main content

Nightjar — formal verification CLI for AI-generated code

Project description

    ╔╗╔╦╔═╗╦ ╦╔╦╗ ╦╔═╗╦═╗
    ║║║║║ ╦╠═╣ ║  ║╠═╣╠╦╝
    ╝╚╝╩╚═╝╩ ╩ ╩╚╝╩╩ ╩╩╚═
  

Your LLM writes code. Nightjar proves it.

Not tested. Proved.

Build Tests License Verified Python


$ nightjar verify --spec .card/payment.card.md

  Stage 0 (preflight)    PASS    12ms
  Stage 1 (deps)         PASS    45ms
  Stage 2 (schema)       PASS    23ms
  Stage 3 (pbt)          FAIL    340ms
    INV-01 violated: counterexample x=0 -> ZeroDivisionError
  Stage 4 (formal)       SKIP

  Result: 1 violation found
  Trust: PROPERTY_VERIFIED (0.60)

Generate, verify, prove. In 60 seconds.


The problem

"I 'Accept All' always, I don't read the diffs anymore." -- Andrej Karpathy

84% of developers use AI coding tools. 96% don't fully trust what comes out (Sonar 2025). Only half even bother to verify before committing.

Meanwhile, 45% of AI-generated code ships with OWASP vulnerabilities (Veracode 2025). Georgetown CSET measured an 86% XSS failure rate. AI-coauthored code carries 2.74× higher vulnerability rates than human-written code — and 35 new CVEs from AI-generated code landed in March 2026 alone.

There's a gap between "the AI wrote it" and "you can trust it." Nightjar closes it with math.

"The limiting factor will not be the technology, but the culture change required for people to realise that formal methods have become viable in practice."Martin Kleppmann, 2025

Nightjar's job is to make that culture change unnecessary. Vericoding — formally verified code generation — should require zero PhD and five minutes of setup.

"An AI that generates provably correct code is qualitatively different from one that merely generates plausible code." -- Leonardo de Moura, creator of Lean and Z3


What Nightjar does

pip install nightjar
nightjar verify --spec .card/payment.card.md

Five verification stages, cheapest first, short-circuit on failure:

Stage What Time What you get
0. Preflight Syntax, imports <100ms Valid Python
1. Dependencies CVE scan, SBOM <500ms No known vulns
2. Schema Type checking <200ms Type-correct
3. Property Hypothesis PBT 300ms-8s Statistical confidence
4. Formal Dafny/CrossHair 1-30s Mathematical proof

Simple functions skip Dafny and go straight to CrossHair (about 70% faster). Complex ones get the full treatment. Nightjar figures out which is which by measuring cyclomatic complexity and AST depth.

The difference between finding bugs and proving they can't exist.


Quick start

# Install
pip install nightjar

# Describe what you want in plain English
nightjar auto "payment processor that charges credit cards"

# Verify the generated code
nightjar verify --spec .card/payment.card.md

# Or leave it running in the background
nightjar watch

You need Python 3.11+ and optionally Dafny 4.x for Stage 4. Without Dafny, verification gracefully falls back to CrossHair and Hypothesis. You still get a confidence score; it just won't hit 100.


See it work

# Generate a safe spec from plain English
$ nightjar auto "payment processor with balance >= 0 invariant"
Created spec: .card/payment.card.md (8 invariants, 6 approved)

# Verify the generated code
$ nightjar verify --spec .card/payment.card.md
VERIFIED -- all stages passed

# Fast check (skip Dafny)
$ nightjar verify --spec .card/payment.card.md --fast
VERIFIED -- all stages passed

What makes it different

We looked at what else exists. The short version: nobody else does formal proof + auto-invariants + watch mode in a single CLI tool.

Nightjar Cursor Kiro Tessl Axiom CodeRabbit
Generates code yes yes yes yes yes no
Formal proof yes no no no yes no
Security proof yes no no no no no
Auto invariants yes no partial partial no no
Watch mode yes no no no no no
Safety gate yes no no no no no
Price free $20/mo free beta enterprise $15/seat
Open source AGPL no no partial no no

Features worth knowing about

Shadow CI. A GitHub Action that runs verification on every PR but never fails the build. It just leaves a comment. The first time it catches a real SQL injection that passed all your tests, it sells itself.

- uses: nightjar/verify@v1
  with:
    mode: shadow
    security-pack: owasp

Zero-friction mode. You type nightjar auto "payment processor" and it generates invariants from your description, classifies them (numerical, behavioral, state, formal), lets you approve each one, and writes the .card.md. The whole thing takes about 30 seconds of your time.

Watch mode. Runs in the background. When you save a .card.md file, it runs four tiers of verification (syntax in <100ms, structural in <2s, property in <10s, formal in 1-30s). Repeat runs with no changes finish in under 50ms thanks to Salsa-style per-stage caching.

Confidence score. Every verification produces a score from 0 to 100: pyright (+15), deal static analysis (+10), CrossHair symbolic (+35), Hypothesis PBT (+20), Dafny formal proof (+20). You always know exactly where you stand.

Safety gate. Before regenerating code, Nightjar compares new proofs against the previous verified state. If any invariants would be lost, it blocks the regeneration. If the confidence score drops, it warns you.

OWASP security pack. Invariant templates that formally prove the absence of SQL injection, XSS, and command injection. Not pattern-matching. Actual proof. The EU CRA compliance deadline is September 2026, and Nightjar can generate the structured compliance certificates you'll need.

Spec preprocessing. 19 deterministic rewrite rules (from Proven, MIT) normalize your specs before they touch the LLM. This roughly doubles Dafny success rates on local models, and pushes Claude Sonnet from 65% to 78%.

CEGIS retry. When Dafny fails, Nightjar parses the concrete counterexample and includes it in the retry prompt. "Your spec fails on input X=5, Y=-3 because..." is a lot more useful to an LLM than a generic error message.


The immune system

Nightjar learns from your production failures. When something breaks, it gets smarter.

Sentry errors + runtime traces
        |
        v
   Collector (sys.monitoring, PEP 669, <5% overhead)
        |
        v
   Miner (19 Ernst/Daikon templates)
        |
        v
   Quality gate (Wonda scoring, filters out trivial invariants)
        |
        v
   Adversarial debate (a skeptic agent tries to refute each candidate)
        |
        v
   Houdini filter (Z3 finds the maximal inductive subset)
        |
        v
   CrossHair + Hypothesis (verify with 1000+ inputs)
        |
        v
   Auto-append to .card.md (your spec evolves)

Old invariants decay over time if no new observations confirm them. New observations can supersede stale ones. The spec stays current with how your code actually behaves.

Three tiers of mining run in parallel: semantic (LLM-based), runtime (Daikon + Houdini), and API-level (MINES, from OTel logs).


CLI

nightjar init [module]       Scaffold a .card.md spec
nightjar auto "intent"       Generate spec from plain English
nightjar generate            Generate code from spec via LLM
nightjar verify              Run the verification pipeline
nightjar verify --fast       Stages 0-3 only (skip Dafny)
nightjar build               Generate + verify + compile
nightjar ship                Build + sign artifact
nightjar retry               Counterexample-guided repair loop
nightjar lock                Freeze deps into deps.lock
nightjar explain             Root-cause diagnosis (LP dual variables)
nightjar watch               Background daemon, 4-tier streaming
nightjar badge               Shields.io badge from last verification
nightjar optimize            DSPy SIMBA prompt optimization
nightjar immune              Run immune system mining cycle

Architecture

Intent (.card.md)
    |
    +-> nightjar auto -> Generate invariants -> Approve (30s)
    |
    v
Spec Preprocessing (19 rewrite rules)
    |
    v
Generation (litellm -> any LLM)
    |
    v
Verification Pipeline
    +-- Stage 0: Preflight (AST + YAML)
    +-- Stage 1: Dependencies (pip-audit + SBOM)
    +-- Stage 2: Schema (Pydantic v2)
    +-- Stage 2.5: Negation-Proof (catch weak specs early)
    +-- Stage 3: Property (Hypothesis PBT)
    +-- Stage 4: Formal (Dafny + CrossHair + fallback)
    |
    v
Verified Artifact + Proof Certificate (.card/verify.json)
    |
    v
Immune System (Sentry + runtime mining -> stronger specs over time)

The .card.md format

You write specs. AI generates code. Nightjar proves the code matches the spec.

---
card-version: "1.0"
id: payment
title: Payment Processor
contract:
  inputs:
    - name: balance
      type: float
      constraints: ">= 0"
  outputs:
    - name: new_balance
      type: float
      constraints: ">= 0"
invariants:
  - tier: formal
    rule: "deduct(balance, amount) requires amount <= balance ensures result >= 0"
  - tier: property
    rule: "for any balance >= 0: deposit(balance, amount) > balance"
---

## Intent
A payment processor that charges credit cards safely.

Invariants come in three tiers. example generates unit tests (any developer can write these). property generates Hypothesis property-based tests (requires some testing experience). formal generates Dafny proofs or CrossHair symbolic checks (for security-critical code).


Model agnostic

All LLM calls go through litellm. Swap models with an environment variable:

NIGHTJAR_MODEL=claude-sonnet-4-6       # Default
NIGHTJAR_MODEL=deepseek/deepseek-chat  # Budget option
NIGHTJAR_MODEL=openai/o3               # Premium option

MCP server

Nightjar is also an MCP server, so it works inside any IDE that supports the protocol: Cursor, Windsurf, Claude Code, VS Code.

Three tools: verify_contract (run verification), get_violations (get the report), suggest_fix (LLM-suggested repair).


Contractual computing

The idea behind Nightjar is that the contract (the .card.md spec) is the permanent artifact, not the code. Code gets regenerated every build. The spec is what survives.

This leads to some interesting properties: contracts are discoverable (the immune system mines them from runtime), transferable (any agent can verify any code against a spec), and they compound (your codebase gets safer the longer you run it, because more invariants accumulate).

We call this "contractual computing." As far as we can tell, nobody else has claimed the term.


Tech stack

Component Tool
Language Python 3.11+
CLI Click
TUI Textual
LLM litellm
Formal verification Dafny 4.x
Property-based testing Hypothesis
Symbolic execution CrossHair
Runtime contracts icontract
Schema Pydantic v2
Invariant mining sys.monitoring (PEP 669)
Invariant filtering Z3 via Houdini algorithm
Quality scoring Wonda-style AST normalization
Spec preprocessing Proven (MIT)
File watching watchdog
Error tracking Sentry
MCP MCP SDK

References

The algorithms in Nightjar come from published papers. If you want to understand why something works the way it does, these are the places to look:

  • Proven - spec preprocessing rewrite rules
  • DafnyPro - diff-checker, invariant pruner, hint-augmentation
  • VerMCTS - BFS proof search
  • SpecLoop - CEGIS counterexample-guided retry
  • Wonda - invariant quality scoring
  • Ernst 2007 - dynamic invariant detection
  • Houdini - greatest-fixpoint invariant filter
  • SafePilot - complexity-discriminated routing

Full citation library: docs/REFERENCES.md


夜鹰 (Nightjar)

你的 LLM 写代码,夜鹰证明代码是正确的。

问题

84% 的开发者在使用 AI 编程工具,但 96% 的人不完全信任生成的代码。45% 的 AI 生成代码包含 OWASP 安全漏洞。我们写了越来越多的代码,却越来越少地去验证它。

夜鹰是什么

夜鹰是一个面向 AI 生成代码的形式化验证平台。你用 .card.md 文件描述意图和约束条件,AI 生成代码,夜鹰用数学方法证明代码满足所有约束。

核心流程:

# 安装
pip install nightjar

# 用自然语言描述你的需求
nightjar auto "支付处理器,余额不能为负"

# 验证生成的代码
nightjar verify --spec .card/payment.card.md

# 后台持续监控
nightjar watch

五阶段验证流水线

阶段 内容 耗时 保证
0. 预检 语法、导入 <100ms 合法 Python
1. 依赖 CVE 扫描 <500ms 无已知漏洞
2. 模式 类型检查 <200ms 类型正确
3. 属性 Hypothesis PBT 300ms-8s 统计覆盖
4. 形式化 Dafny/CrossHair 1-30s 数学证明

核心功能

影子 CI - GitHub Action,只报告不阻塞构建。第一次捕获真正的 SQL 注入时,它就会成为团队标配。

零摩擦模式 - nightjar auto "你的需求" 自动生成所有不变式,30 秒内完成审批。

监控模式 - 保存文件时自动验证,首次反馈 <100ms。

免疫系统 - 从生产故障中学习。Sentry 错误自动进入挖掘流水线,生成新的不变式。每次失败都让下次构建更安全。

OWASP 安全包 - 形式化证明不存在 SQL 注入、XSS、命令注入。不是模式匹配,是数学证明。

安全闸门 - 如果重新生成的代码丢失了之前已证明的不变式,夜鹰会阻止部署。

合约计算

夜鹰实现了"合约计算"范式: 合约 (.card.md) 是唯一的永久产物,代码每次构建都重新生成。合约可以被发现 (免疫系统从运行时挖掘)、可以转移 (任何代理都能验证任何代码)、可以累积 (代码库随时间变得更安全)。

不是测试。是证明


Verified with Nightjar

Nightjar's own parser and spec engine are verified using Nightjar itself.

nightjar verify --spec .card/parser.card.md --fast
# ✓ Stage 0: preflight   12ms
# ✓ Stage 1: deps         8ms
# ✓ Stage 2: schema       3ms
# ✓ Stage 3: property-tests  847ms
# VERIFIED — 4 stages passed

Commercial License

Nightjar is AGPL-3.0. If your organization cannot comply with AGPL (e.g., embedding Nightjar in a proprietary product), a commercial license is available.

Pricing: $2,400/yr (teams) · $12,000/yr (enterprise) Contact: nightjar-license@proton.me

One enterprise deal funds months of development. The AGPL is not punitive — it's a structural incentive to keep the ecosystem open.


License

AGPL-3.0. Free for open source. Commercial license for enterprises — see above.


Support

  • GitHub Sponsors — $5/mo supporter · $25/mo cloud beta · $99/mo priority support
  • Commercial license for enterprise embedding
  • Star the repo if Nightjar saved you from a bug

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nightjarzzz-0.1.0.tar.gz (239.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nightjarzzz-0.1.0-py3-none-any.whl (266.4 kB view details)

Uploaded Python 3

File details

Details for the file nightjarzzz-0.1.0.tar.gz.

File metadata

  • Download URL: nightjarzzz-0.1.0.tar.gz
  • Upload date:
  • Size: 239.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for nightjarzzz-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8466f5dbfcded50676e1db4a46a6521a5b778abe17c5fa891ac66c58b54f049d
MD5 96d958841e71fd18b2b1e654528e5495
BLAKE2b-256 1c4d60a71f32e7accc7b269bf13677fdae37a4711c6e2dd37695feba0fde6e78

See more details on using hashes here.

File details

Details for the file nightjarzzz-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nightjarzzz-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 266.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for nightjarzzz-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c01436b1429bbfb74ab0d0111e8ce614173a4d96cc2be52e7128aad6b093b72a
MD5 2ae0158bafe0963a9e602ab644cd2167
BLAKE2b-256 2554960a62326c6b6892bbb0b0a2014b43cbe9b4eb25f12ddd9c39bebff317c8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page