vgi-lint
A pydoclint-style metadata-quality linter for VGI workers. It attaches to an
arbitrary VGI worker, reads everything the worker contributes through DuckDB
system tables, and reports quality findings — missing descriptions, undocumented
columns/functions, absent or malformed example queries, untagged objects, and
more — with a quality score, per-data-version baselines, and machine output for
coding agents.
It works with any VGI worker regardless of implementation language (Python, Go, Rust, Java, TypeScript, …): it treats the worker as a black box and inspects only what surfaces post-attach.
Install / run
uv sync # haybarn is RC-only; prerelease = "allow" is set
uv run vgi-lint --help
Quick start
# Lint a local subprocess worker
uv run vgi-lint 'uv run volcano_worker.py'
# Lint a no-auth HTTP worker
uv run vgi-lint http://localhost:9009
# Machine output for a coding agent / CI
uv run vgi-lint http://localhost:9009 --format agent
uv run vgi-lint http://localhost:9009 --format json
In a worker's own repo, add a [tool.vgi-lint-check] block (see vgi-lint init)
with a location, then just run vgi-lint with no arguments.
v1 supports local subprocess and no-auth HTTP workers. Authenticated (OAuth) workers are not yet supported.
What it checks
Object coverage: schemas, tables, views, columns, scalar/aggregate functions, macros, settings, pragmas, and constraints. Rule families:
| Family | Codes | Examples |
|---|---|---|
| Descriptions | VGI1xx | schema/table/view comment, vgi.description_llm, vgi.description_md |
| Columns | VGI2xx | column-comment coverage (tables and views), comment-not-echo |
| Functions | VGI3xx | description (+ quality), documented parameters, named arguments, examples |
| Tags | VGI4xx | required tag keys (opt-in), reserved-tag validity |
| Examples | VGI5xx | vgi.example_queries present, valid JSON, complete entries, catalog-qualified |
| Settings | VGI6xx | setting descriptions |
| Pragmas | VGI7xx | pragma descriptions |
| Constraints | VGI8xx | foreign-key/PK/check validity — references must point at real tables & columns |
| Structure | VGI11x | schema object-count cap (opt-in) |
| Execution | VGI9xx | example queries & CHECK constraints bind/execute (opt-in, --execute) |
See RULES.md for the full per-rule reference (codes, default
severities, and what each checks). Run vgi-lint rules to list them from your
installed version, or vgi-lint explain VGI112 for one.
Data versions
A VGI worker can publish multiple data versions whose metadata differs. The tool can lint one or all of them and compare quality across versions:
uv run vgi-lint versions <location> # list published versions
uv run vgi-lint <location> --data-version 2.0.0
uv run vgi-lint <location> --all-data-versions # per-version report + comparison
Baselines (grandfathering)
Adopt the linter on an existing worker without a wall of failures: record current
findings as a baseline, then fail CI only on new findings. Baselines are
per data version (<prefix>.<version>.json).
uv run vgi-lint <location> --baseline vgi-lint-baseline --update-baseline
uv run vgi-lint <location> --baseline vgi-lint-baseline --fail-on warning
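The grandfathering idea above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the helper names `baseline_path` and `new_findings` are hypothetical, and identifying a finding by a `code` plus `object` pair is an assumption about the baseline format. Only the `<prefix>.<version>.json` naming comes from the text above.

```python
from pathlib import Path


def baseline_path(prefix: str, data_version: str) -> Path:
    """Build the per-data-version baseline file name: <prefix>.<version>.json."""
    return Path(f"{prefix}.{data_version}.json")


def new_findings(current: list[dict], baseline: list[dict]) -> list[dict]:
    """Return the findings in `current` that the baseline does not record.

    Assumption: a finding is identified by its rule code and object name.
    """
    known = {(f["code"], f["object"]) for f in baseline}
    return [f for f in current if (f["code"], f["object"]) not in known]


# Hypothetical findings: one grandfathered, one introduced after the baseline.
baseline = [{"code": "VGI201", "object": "volcanos.hans.eruptions"}]
current = baseline + [{"code": "VGI112", "object": "volcanos.hans"}]

print(baseline_path("vgi-lint-baseline", "2.0.0"))
print(new_findings(current, baseline))
```

Under these assumptions, only the second finding would count toward `--fail-on`; the first is already recorded in the baseline.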
Configuration
[tool.vgi-lint-check] in pyproject.toml (or a dedicated vgi-lint.toml):
[tool.vgi-lint-check]
location = "uv run worker.py"
select = ["ALL"]
ignore = ["VGI113"]
fail_on = "error"
[tool.vgi-lint-check.severity]
VGI201 = "error"
[tool.vgi-lint-check.options]
column_comment_min_ratio = 0.8
# Required tags are opt-in (empty by default) — set them if your workers have a
# tagging convention you want enforced:
# required_schema_tags = ["provider", "domain"]
[tool.vgi-lint-check.per-object]
"volcanos.hans.*" = { ignore = ["VGI112"] }
Precedence: defaults < pyproject.toml < vgi-lint.toml < CLI flags.
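The precedence chain can be pictured as a left-to-right merge in which later layers override earlier ones. The sketch below assumes a simple per-key override; how the tool actually merges nested tables such as `severity` or `per-object` is not stated here, and `merge_layers` and the sample values are hypothetical.

```python
from typing import Any


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge config layers left to right; a later layer wins for each key."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative values only, ordered from lowest to highest precedence.
defaults = {"select": ["ALL"], "ignore": [], "fail_on": "error"}
pyproject = {"ignore": ["VGI113"]}
vgi_lint_toml = {"fail_on": "warning"}
cli_flags = {"fail_on": "never"}

effective = merge_layers(defaults, pyproject, vgi_lint_toml, cli_flags)
print(effective)
```

Here `fail_on` ends up as the CLI value, while `ignore` keeps the `pyproject.toml` value because no later layer sets it.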
Exit codes
0 clean (or below --fail-on) · 1 config/tool error · 2 findings ≥
--fail-on (regressions only when a baseline is set) · 3 connection error.
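A wrapper script might branch on these exit codes, for example to retry only on connection errors. The following is a minimal sketch: the function names are hypothetical, and `run_lint` assumes that `uv` and the linter are available on the machine running it.

```python
import subprocess

# Exit codes as documented above.
EXIT_MEANINGS = {
    0: "clean, or all findings below --fail-on",
    1: "config/tool error",
    2: "findings at or above --fail-on",
    3: "connection error",
}


def describe_exit_code(code: int) -> str:
    """Translate a vgi-lint exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_retry(code: int) -> bool:
    """Only a connection error (3) is plausibly transient."""
    return code == 3


def run_lint(location: str) -> int:
    """Run the linter against `location` and return its exit code."""
    # Arguments are passed as a list, so no shell parses `location` here.
    result = subprocess.run(["uv", "run", "vgi-lint", location], check=False)
    return result.returncode
```

A caller could log `describe_exit_code(run_lint(location))` and loop while `should_retry` returns true.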
Security / trust boundary
A subprocess LOCATION is executed as a command to launch the worker (the
vgi extension spawns it). Treat location like any shell command: never pass
an attacker-controlled value, and in CI never derive it from untrusted input
(e.g. a fork PR title/branch). Prefer a fixed path or HTTP URL you control.
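One way to follow this advice in a wrapper script is to accept only locations from a fixed allow-list before handing them to the linter. The sketch below is not part of vgi-lint; `ALLOWED_LOCATIONS`, `is_trusted_location`, and `lint` are hypothetical names, and the listed locations are taken from the quick-start examples.

```python
import subprocess

# Locations the pipeline owner controls; anything else is rejected.
ALLOWED_LOCATIONS = frozenset({
    "http://localhost:9009",
    "uv run volcano_worker.py",
})


def is_trusted_location(location: str, allowed: frozenset = ALLOWED_LOCATIONS) -> bool:
    """Accept a location only if it exactly matches an allow-listed entry."""
    return location in allowed


def lint(location: str) -> int:
    """Run vgi-lint on an allow-listed location and return its exit code."""
    if not is_trusted_location(location):
        raise ValueError(f"refusing untrusted location: {location!r}")
    # A subprocess location is still executed as a command to launch the
    # worker, which is why the allow-list check comes first.
    return subprocess.run(["uv", "run", "vgi-lint", location], check=False).returncode
```

Exact matching is deliberately strict: a prefix or pattern check would let a crafted value append extra commands to an otherwise trusted location.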
GitHub Action (reusable)
This repo ships a composite action so a worker repo can lint itself in CI with a
single step — it installs uv, runs the linter (the signed vgi community
extension is installed automatically), gates on fail-on, and posts the findings
to the job summary. Build the worker first, then point the action at it:
# .github/workflows/ci.yml — inside a job that has already built the worker
- name: VGI metadata quality
  uses: Query-farm/vgi-lint-check@v1
  with:
    location: "$PWD/target/release/units-worker" # binary, command, or HTTP URL
    fail-on: warning # info | warning | error | never
Gate releases harder than everyday CI — e.g. fail-on: warning on push/PR while
the worker's quality is being raised, and fail-on: error (plus execute: true)
in the publish workflow:
- uses: Query-farm/vgi-lint-check@v1
  with:
    location: "$PWD/target/release/units-worker"
    fail-on: error
    execute: true # also run example queries / CHECK constraints (VGI9xx)
Key inputs: location (required), fail-on (default error), version (pin the
linter, e.g. 0.2.0), working-directory, data-version / all-data-versions,
baseline, execute, format (terminal|json|agent|jsonl), config, args,
summary. The action's exit-code is exposed as an output. The action ref @v1
tracks the latest v1.x of the action; pin to a tag or SHA for full reproducibility.
Development
uv run pytest # unit tests (offline)
uv run pytest --run-live # also run live tests against real workers
uv build # build sdist + wheel into dist/
Releasing (GitHub Actions → PyPI)
Publishing is automated via GitHub Actions using PyPI Trusted Publishing (OIDC — no API token secret to store):
- .github/workflows/ci.yml runs the offline test suite (Python 3.11–3.13) and a smoke build on every push/PR.
- .github/workflows/publish.yml builds, validates (twine check), and uploads to PyPI when a GitHub Release is published. It first checks that the release tag matches the version in pyproject.toml.
One-time setup on PyPI (Trusted Publisher), under the project's Publishing settings (use a "pending publisher" before the first release):
| Field | Value |
|---|---|
| Owner | Query-farm |
| Repository | vgi-lint-check |
| Workflow | publish.yml |
| Environment | pypi |
Also create a GitHub Environment named pypi in the repo settings (it gates the
publish job and is referenced for the OIDC claim).
To cut a release:
# bump version in pyproject.toml, commit, then tag + create the release
git tag v0.1.0 && git push origin v0.1.0
gh release create v0.1.0 --generate-notes
Publishing the release triggers the workflow. (Prefer a token instead of
OIDC? Replace the publish job's trusted-publishing step with
pypa/gh-action-pypi-publish configured with password: ${{ secrets.PYPI_API_TOKEN }}
and add that repository secret.)