nightjarzzz

Nightjar — formal verification CLI for AI-generated code

These details have not been verified by PyPI

Project links

Project description

    ╔╗╔╦╔═╗╦ ╦╔╦╗ ╦╔═╗╦═╗
    ║║║║║ ╦╠═╣ ║  ║╠═╣╠╦╝
    ╝╚╝╩╚═╝╩ ╩ ╩╚╝╩╩ ╩╩╚═

Your LLM writes code. Nightjar proves it.

Not tested. Proved.

$ nightjar verify --spec .card/payment.card.md

  Stage 0 (preflight)    PASS    12ms
  Stage 1 (deps)         PASS    45ms
  Stage 2 (schema)       PASS    23ms
  Stage 3 (pbt)          FAIL    340ms
    INV-01 violated: counterexample x=0 -> ZeroDivisionError
  Stage 4 (formal)       SKIP

  Result: 1 violation found
  Trust: PROPERTY_VERIFIED (0.60)

_{Generate, verify, prove. In 60 seconds.}

The problem

"I 'Accept All' always, I don't read the diffs anymore." -- Andrej Karpathy

84% of developers use AI coding tools. 96% don't fully trust what comes out (Sonar 2025). Only half even bother to verify before committing.

Meanwhile, 45% of AI-generated code ships with OWASP vulnerabilities (Veracode 2025). Georgetown CSET measured an 86% XSS failure rate. AI-coauthored code carries 2.74× higher vulnerability rates than human-written code — and 35 new CVEs from AI-generated code landed in March 2026 alone.

There's a gap between "the AI wrote it" and "you can trust it." Nightjar closes it with math.

"The limiting factor will not be the technology, but the culture change required for people to realise that formal methods have become viable in practice." — Martin Kleppmann, 2025

Nightjar's job is to make that culture change unnecessary. Vericoding — formally verified code generation — should require zero PhD and five minutes of setup.

"An AI that generates provably correct code is qualitatively different from one that merely generates plausible code." -- Leonardo de Moura, creator of Lean and Z3

What Nightjar does

pip install nightjar
nightjar verify --spec .card/payment.card.md

Five verification stages, cheapest first, short-circuit on failure:

Stage	What	Time	What you get
0. Preflight	Syntax, imports	<100ms	Valid Python
1. Dependencies	CVE scan, SBOM	<500ms	No known vulns
2. Schema	Type checking	<200ms	Type-correct
3. Property	Hypothesis PBT	300ms-8s	Statistical confidence
4. Formal	Dafny/CrossHair	1-30s	Mathematical proof

Simple functions skip Dafny and go straight to CrossHair (about 70% faster). Complex ones get the full treatment. Nightjar figures out which is which by measuring cyclomatic complexity and AST depth.

The difference between finding bugs and proving they can't exist.

Quick start

# Install
pip install nightjar

# Describe what you want in plain English
nightjar auto "payment processor that charges credit cards"

# Verify the generated code
nightjar verify --spec .card/payment.card.md

# Or leave it running in the background
nightjar watch

You need Python 3.11+ and optionally Dafny 4.x for Stage 4. Without Dafny, verification gracefully falls back to CrossHair and Hypothesis. You still get a confidence score; it just won't hit 100.

See it work

# Generate a safe spec from plain English
$ nightjar auto "payment processor with balance >= 0 invariant"
Created spec: .card/payment.card.md (8 invariants, 6 approved)

# Verify the generated code
$ nightjar verify --spec .card/payment.card.md
VERIFIED -- all stages passed

# Fast check (skip Dafny)
$ nightjar verify --spec .card/payment.card.md --fast
VERIFIED -- all stages passed

What makes it different

We looked at what else exists. The short version: nobody else does formal proof + auto-invariants + watch mode in a single CLI tool.

	Nightjar	Cursor	Kiro	Tessl	Axiom	CodeRabbit
Generates code	yes	yes	yes	yes	yes	no
Formal proof	yes	no	no	no	yes	no
Security proof	yes	no	no	no	no	no
Auto invariants	yes	no	partial	partial	no	no
Watch mode	yes	no	no	no	no	no
Safety gate	yes	no	no	no	no	no
Price	free	$20/mo	free	beta	enterprise	$15/seat
Open source	AGPL	no	no	partial	no	no

Features worth knowing about

Shadow CI. A GitHub Action that runs verification on every PR but never fails the build. It just leaves a comment. The first time it catches a real SQL injection that passed all your tests, it sells itself.

- uses: nightjar/verify@v1
  with:
    mode: shadow
    security-pack: owasp

Zero-friction mode. You type nightjar auto "payment processor" and it generates invariants from your description, classifies them (numerical, behavioral, state, formal), lets you approve each one, and writes the .card.md. The whole thing takes about 30 seconds of your time.

Watch mode. Runs in the background. When you save a .card.md file, it runs four tiers of verification (syntax in <100ms, structural in <2s, property in <10s, formal in 1-30s). Repeat runs with no changes finish in under 50ms thanks to Salsa-style per-stage caching.

Confidence score. Every verification produces a score from 0 to 100: pyright (+15), deal static analysis (+10), CrossHair symbolic (+35), Hypothesis PBT (+20), Dafny formal proof (+20). You always know exactly where you stand.

Safety gate. Before regenerating code, Nightjar compares new proofs against the previous verified state. If any invariants would be lost, it blocks the regeneration. If the confidence score drops, it warns you.

OWASP security pack. Invariant templates that formally prove the absence of SQL injection, XSS, and command injection. Not pattern-matching. Actual proof. The EU CRA compliance deadline is September 2026, and Nightjar can generate the structured compliance certificates you'll need.

Spec preprocessing. 19 deterministic rewrite rules (from Proven, MIT) normalize your specs before they touch the LLM. This roughly doubles Dafny success rates on local models, and pushes Claude Sonnet from 65% to 78%.

CEGIS retry. When Dafny fails, Nightjar parses the concrete counterexample and includes it in the retry prompt. "Your spec fails on input X=5, Y=-3 because..." is a lot more useful to an LLM than a generic error message.

The immune system

Nightjar learns from your production failures. When something breaks, it gets smarter.

Sentry errors + runtime traces
        |
        v
   Collector (sys.monitoring, PEP 669, <5% overhead)
        |
        v
   Miner (19 Ernst/Daikon templates)
        |
        v
   Quality gate (Wonda scoring, filters out trivial invariants)
        |
        v
   Adversarial debate (a skeptic agent tries to refute each candidate)
        |
        v
   Houdini filter (Z3 finds the maximal inductive subset)
        |
        v
   CrossHair + Hypothesis (verify with 1000+ inputs)
        |
        v
   Auto-append to .card.md (your spec evolves)

Old invariants decay over time if no new observations confirm them. New observations can supersede stale ones. The spec stays current with how your code actually behaves.

Three tiers of mining run in parallel: semantic (LLM-based), runtime (Daikon + Houdini), and API-level (MINES, from OTel logs).

CLI

nightjar init [module]       Scaffold a .card.md spec
nightjar auto "intent"       Generate spec from plain English
nightjar generate            Generate code from spec via LLM
nightjar verify              Run the verification pipeline
nightjar verify --fast       Stages 0-3 only (skip Dafny)
nightjar build               Generate + verify + compile
nightjar ship                Build + sign artifact
nightjar retry               Counterexample-guided repair loop
nightjar lock                Freeze deps into deps.lock
nightjar explain             Root-cause diagnosis (LP dual variables)
nightjar watch               Background daemon, 4-tier streaming
nightjar badge               Shields.io badge from last verification
nightjar optimize            DSPy SIMBA prompt optimization
nightjar immune              Run immune system mining cycle

Architecture

Intent (.card.md)
    |
    +-> nightjar auto -> Generate invariants -> Approve (30s)
    |
    v
Spec Preprocessing (19 rewrite rules)
    |
    v
Generation (litellm -> any LLM)
    |
    v
Verification Pipeline
    +-- Stage 0: Preflight (AST + YAML)
    +-- Stage 1: Dependencies (pip-audit + SBOM)
    +-- Stage 2: Schema (Pydantic v2)
    +-- Stage 2.5: Negation-Proof (catch weak specs early)
    +-- Stage 3: Property (Hypothesis PBT)
    +-- Stage 4: Formal (Dafny + CrossHair + fallback)
    |
    v
Verified Artifact + Proof Certificate (.card/verify.json)
    |
    v
Immune System (Sentry + runtime mining -> stronger specs over time)

The .card.md format

You write specs. AI generates code. Nightjar proves the code matches the spec.

---
card-version: "1.0"
id: payment
title: Payment Processor
contract:
  inputs:
    - name: balance
      type: float
      constraints: ">= 0"
  outputs:
    - name: new_balance
      type: float
      constraints: ">= 0"
invariants:
  - tier: formal
    rule: "deduct(balance, amount) requires amount <= balance ensures result >= 0"
  - tier: property
    rule: "for any balance >= 0: deposit(balance, amount) > balance"
---

## Intent
A payment processor that charges credit cards safely.

Invariants come in three tiers. example generates unit tests (any developer can write these). property generates Hypothesis property-based tests (requires some testing experience). formal generates Dafny proofs or CrossHair symbolic checks (for security-critical code).

Model agnostic

All LLM calls go through litellm. Swap models with an environment variable:

NIGHTJAR_MODEL=claude-sonnet-4-6       # Default
NIGHTJAR_MODEL=deepseek/deepseek-chat  # Budget option
NIGHTJAR_MODEL=openai/o3               # Premium option

MCP server

Nightjar is also an MCP server, so it works inside any IDE that supports the protocol: Cursor, Windsurf, Claude Code, VS Code.

Three tools: verify_contract (run verification), get_violations (get the report), suggest_fix (LLM-suggested repair).

Contractual computing

The idea behind Nightjar is that the contract (the .card.md spec) is the permanent artifact, not the code. Code gets regenerated every build. The spec is what survives.

This leads to some interesting properties: contracts are discoverable (the immune system mines them from runtime), transferable (any agent can verify any code against a spec), and they compound (your codebase gets safer the longer you run it, because more invariants accumulate).

We call this "contractual computing." As far as we can tell, nobody else has claimed the term.

Tech stack

Component	Tool
Language	Python 3.11+
CLI	Click
TUI	Textual
LLM	litellm
Formal verification	Dafny 4.x
Property-based testing	Hypothesis
Symbolic execution	CrossHair
Runtime contracts	icontract
Schema	Pydantic v2
Invariant mining	sys.monitoring (PEP 669)
Invariant filtering	Z3 via Houdini algorithm
Quality scoring	Wonda-style AST normalization
Spec preprocessing	Proven (MIT)
File watching	watchdog
Error tracking	Sentry
MCP	MCP SDK

References

The algorithms in Nightjar come from published papers. If you want to understand why something works the way it does, these are the places to look:

Proven - spec preprocessing rewrite rules
DafnyPro - diff-checker, invariant pruner, hint-augmentation
VerMCTS - BFS proof search
SpecLoop - CEGIS counterexample-guided retry
Wonda - invariant quality scoring
Ernst 2007 - dynamic invariant detection
Houdini - greatest-fixpoint invariant filter
SafePilot - complexity-discriminated routing

Full citation library: docs/REFERENCES.md

夜鹰 (Nightjar)

你的 LLM 写代码，夜鹰证明代码是正确的。

问题

84% 的开发者在使用 AI 编程工具，但 96% 的人不完全信任生成的代码。45% 的 AI 生成代码包含 OWASP 安全漏洞。我们写了越来越多的代码，却越来越少地去验证它。

夜鹰是什么

夜鹰是一个面向 AI 生成代码的形式化验证平台。你用 .card.md 文件描述意图和约束条件，AI 生成代码，夜鹰用数学方法证明代码满足所有约束。

核心流程:

# 安装
pip install nightjar

# 用自然语言描述你的需求
nightjar auto "支付处理器，余额不能为负"

# 验证生成的代码
nightjar verify --spec .card/payment.card.md

# 后台持续监控
nightjar watch

五阶段验证流水线

阶段	内容	耗时	保证
0. 预检	语法、导入	<100ms	合法 Python
1. 依赖	CVE 扫描	<500ms	无已知漏洞
2. 模式	类型检查	<200ms	类型正确
3. 属性	Hypothesis PBT	300ms-8s	统计覆盖
4. 形式化	Dafny/CrossHair	1-30s	数学证明

核心功能

影子 CI - GitHub Action，只报告不阻塞构建。第一次捕获真正的 SQL 注入时，它就会成为团队标配。

零摩擦模式 - nightjar auto "你的需求" 自动生成所有不变式，30 秒内完成审批。

监控模式 - 保存文件时自动验证，首次反馈 <100ms。

免疫系统 - 从生产故障中学习。Sentry 错误自动进入挖掘流水线，生成新的不变式。每次失败都让下次构建更安全。

OWASP 安全包 - 形式化证明不存在 SQL 注入、XSS、命令注入。不是模式匹配，是数学证明。

安全闸门 - 如果重新生成的代码丢失了之前已证明的不变式，夜鹰会阻止部署。

合约计算

夜鹰实现了"合约计算"范式: 合约 (.card.md) 是唯一的永久产物，代码每次构建都重新生成。合约可以被发现 (免疫系统从运行时挖掘)、可以转移 (任何代理都能验证任何代码)、可以累积 (代码库随时间变得更安全)。

不是测试。是证明。

Verified with Nightjar

Nightjar's own parser and spec engine are verified using Nightjar itself.

nightjar verify --spec .card/parser.card.md --fast
# ✓ Stage 0: preflight   12ms
# ✓ Stage 1: deps         8ms
# ✓ Stage 2: schema       3ms
# ✓ Stage 3: property-tests  847ms
# VERIFIED — 4 stages passed

Commercial License

Nightjar is AGPL-3.0. If your organization cannot comply with AGPL (e.g., embedding Nightjar in a proprietary product), a commercial license is available.

Pricing: $2,400/yr (teams) · $12,000/yr (enterprise) Contact: nightjar-license@proton.me

One enterprise deal funds months of development. The AGPL is not punitive — it's a structural incentive to keep the ecosystem open.

License

AGPL-3.0. Free for open source. Commercial license for enterprises — see above.

Support

GitHub Sponsors — $5/mo supporter · $25/mo cloud beta · $99/mo priority support
Commercial license for enterprise embedding
Star the repo if Nightjar saved you from a bug

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.1

Mar 29, 2026

This version

0.1.0

Mar 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nightjarzzz-0.1.0.tar.gz (239.0 kB view details)

Uploaded Mar 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nightjarzzz-0.1.0-py3-none-any.whl (266.4 kB view details)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file nightjarzzz-0.1.0.tar.gz.

File metadata

Download URL: nightjarzzz-0.1.0.tar.gz
Upload date: Mar 28, 2026
Size: 239.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for nightjarzzz-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8466f5dbfcded50676e1db4a46a6521a5b778abe17c5fa891ac66c58b54f049d`
MD5	`96d958841e71fd18b2b1e654528e5495`
BLAKE2b-256	`1c4d60a71f32e7accc7b269bf13677fdae37a4711c6e2dd37695feba0fde6e78`

See more details on using hashes here.

File details

Details for the file nightjarzzz-0.1.0-py3-none-any.whl.

File metadata

Download URL: nightjarzzz-0.1.0-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 266.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for nightjarzzz-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c01436b1429bbfb74ab0d0111e8ce614173a4d96cc2be52e7128aad6b093b72a`
MD5	`2ae0158bafe0963a9e602ab644cd2167`
BLAKE2b-256	`2554960a62326c6b6892bbb0b0a2014b43cbe9b4eb25f12ddd9c39bebff317c8`

See more details on using hashes here.

nightjarzzz 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Your LLM writes code. Nightjar proves it.

The problem

What Nightjar does

Quick start

See it work

What makes it different

Features worth knowing about

The immune system

CLI

Architecture

The .card.md format

Model agnostic

MCP server

Contractual computing

Tech stack

References

夜鹰 (Nightjar)

问题

夜鹰是什么

五阶段验证流水线

核心功能

合约计算

Verified with Nightjar

Commercial License

License

Support

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes