A pluggable framework for parsing data tool artifacts into typed Python models — dbt-core first.
artifact-parser
A small, pluggable framework for turning the JSON artifacts that data tools spit out into typed, validated Python objects. Point it at a blob, get back a pydantic model — no manual key-spelunking, no guessing which schema version you're holding.
The framework is deliberately source-agnostic. Each plugin owns one family of artifacts and registers itself with a shared registry. The first one ships in the box: a full dbt-core parser (catalog, manifest, run-results, sources).
Install
uv add artifact-parser # or: pip install artifact-parser
Quick start
The headline entry point sniffs any supported artifact and routes it to the right plugin — you don't have to know what you're holding:
import json
from artifact_parser import parse
artifact = json.loads(open("target/manifest.json").read())
model = parse(artifact) # -> a ManifestV12 (or whatever version it is)
print(model.metadata.dbt_schema_version)
When you do know the artifact family, the dbt plugin's typed helpers are more precise (and give better editor autocomplete):
from artifact_parser.dbt import parse_manifest, parse_catalog
manifest = parse_manifest(json.loads(open("target/manifest.json").read()))
catalog = parse_catalog(json.loads(open("target/catalog.json").read()))
Hand it something it doesn't recognise and it tells you so, loudly, instead of returning a half-populated object:
from artifact_parser import parse, UnknownArtifactError
try:
    parse({"metadata": {"dbt_schema_version": "made-up/v99.json"}})
except UnknownArtifactError as exc:
    print(exc)  # No registered parser recognises this artifact. Tried: dbt.
Supported dbt artifacts
| Artifact | Versions | Generic parser | Version-pinned parsers |
|---|---|---|---|
| catalog | v1 | parse_catalog | parse_catalog_v1 |
| manifest | v1–v12 | parse_manifest | parse_manifest_v1 … _v12 |
| run-results | v1–v6 | parse_run_results | parse_run_results_v1 … _v6 |
| sources | v1–v3 | parse_sources | parse_sources_v1 … _v3 |
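The artifact family and version in the table come from the schema URL that dbt writes into `metadata.dbt_schema_version`, for example `https://schemas.getdbt.com/dbt/manifest/v12.json`. The following is a minimal sketch of how such a URL could be sniffed. The function name `sniff_schema_version` and the regular expression are assumptions made for illustration, not the package's actual `utils.py`.

```python
import re
from typing import Optional, Tuple

# Assumed URL shape: https://schemas.getdbt.com/dbt/<artifact>/v<N>.json
_SCHEMA_URL = re.compile(r"/dbt/(?P<artifact>[a-z-]+)/v(?P<version>\d+)\.json$")


def sniff_schema_version(artifact: dict) -> Optional[Tuple[str, int]]:
    """Return (artifact family, version), or None if the blob is not recognisable.

    Hypothetical helper: it only inspects metadata.dbt_schema_version.
    """
    metadata = artifact.get("metadata")
    if not isinstance(metadata, dict):
        return None
    url = metadata.get("dbt_schema_version")
    if not isinstance(url, str):
        return None
    match = _SCHEMA_URL.search(url)
    if match is None:
        return None
    return match.group("artifact"), int(match.group("version"))


manifest = {
    "metadata": {
        "dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v12.json"
    }
}
print(sniff_schema_version(manifest))  # ('manifest', 12)
```

A lookup keyed on the result (or on the URL itself) could then select the model class for that version, which is the role the Architecture section gives to `version_map.py`.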
Architecture
src/artifact_parser/
├── core/ # the framework — no knowledge of any specific tool
│ ├── base.py # BaseArtifactModel (shared pydantic root)
│ ├── parser.py # ArtifactParser (the plugin contract)
│ ├── registry.py # ParserRegistry + the shared `registry` instance
│ └── exceptions.py # ArtifactParserError + friends
└── dbt/ # the first plugin: dbt-core artifacts
    ├── plugin.py # DbtArtifactParser (implements ArtifactParser)
    ├── utils.py # schema-version sniffing
    ├── resources/ # committed dbt-core JSON schemas (codegen input)
    └── generated/ # droppable, rebuilt by `codegen dbt`
        ├── parser.py # parse_<artifact>[_vN] public API
        ├── version_map.py # schema-version URL -> model class
        └── models/ # typed pydantic models, one module per version
The generated code is walled off in generated/. You can rm -rf that whole
directory and rebuild it with codegen dbt (the package still imports while it's
gone — the dbt plugin just sits out until you regenerate).
The flow: a plugin answers "is this mine?" (can_parse) and "make it typed"
(parse). The registry tries plugins in registration order and returns the first
match. dbt registers itself on import, so parse(...) works out of the box.
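The dispatch described above can be sketched in a few lines. This is an illustration built from the description only, not the package's `core/registry.py`: the `Registry` class, the `Plugin` protocol, and the toy `KeyPlugin` are stand-ins, and the method signatures are assumed.

```python
from typing import Any, Protocol


class UnknownArtifactError(Exception):
    """Stand-in for the package's error of the same name."""


class Plugin(Protocol):
    """Assumed shape of the plugin contract: a name plus two methods."""

    name: str

    def can_parse(self, artifact: dict) -> bool: ...

    def parse(self, artifact: dict) -> Any: ...


class Registry:
    def __init__(self) -> None:
        self._plugins: list[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        # Registration order is the order in which plugins get asked.
        self._plugins.append(plugin)

    def parse(self, artifact: dict) -> Any:
        for plugin in self._plugins:
            if plugin.can_parse(artifact):  # "is this mine?"
                return plugin.parse(artifact)  # "make it typed"
        tried = ", ".join(p.name for p in self._plugins)
        raise UnknownArtifactError(
            f"No registered parser recognises this artifact. Tried: {tried}."
        )


class KeyPlugin:
    """Toy plugin that claims any dict containing a given key."""

    def __init__(self, name: str, key: str) -> None:
        self.name = name
        self.key = key

    def can_parse(self, artifact: dict) -> bool:
        return self.key in artifact

    def parse(self, artifact: dict) -> Any:
        return (self.name, artifact[self.key])


registry = Registry()
registry.register(KeyPlugin("first", "a"))
registry.register(KeyPlugin("second", "a"))  # also matches "a", but is asked later
registry.register(KeyPlugin("third", "b"))

print(registry.parse({"a": 1}))  # ('first', 1)
print(registry.parse({"b": 2}))  # ('third', 2)
```

Because the first match wins, a plugin whose `can_parse` is too permissive can shadow plugins registered after it, so a narrow check is the safer default.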
Adding a new parser
The whole point of the core/ framework is that the second parser is cheap.
By hand:
- Create src/artifact_parser/<tool>/.
- Define your models on BaseArtifactModel.
- Implement ArtifactParser (name, can_parse, parse) in plugin.py.
- Register it in the package __init__.py: registry.register(MyParser()).
- Import your plugin from the top-level artifact_parser/__init__.py.
That's it — parse() now routes matching artifacts to your plugin.
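As a rough illustration of steps 2 and 3, here is what a plugin for an imaginary tool might look like. Everything in it is hypothetical: the `acme` tool, its fields, and the method signatures are invented, the `ArtifactParser` class is a local stand-in for the real contract in `core/parser.py`, and a dataclass stands in for a model built on `BaseArtifactModel` so the sketch runs with the standard library alone.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ArtifactParser(ABC):
    """Local stand-in for the plugin contract (signatures are assumed)."""

    name: str

    @abstractmethod
    def can_parse(self, artifact: dict) -> bool: ...

    @abstractmethod
    def parse(self, artifact: dict): ...


@dataclass(frozen=True)
class AcmeReport:
    """Hypothetical model; a real plugin would subclass BaseArtifactModel."""

    tool_version: str
    row_count: int


class AcmeParser(ArtifactParser):
    name = "acme"

    def can_parse(self, artifact: dict) -> bool:
        # Kept cheap and narrow: only claim blobs that identify themselves.
        return isinstance(artifact, dict) and artifact.get("tool") == "acme"

    def parse(self, artifact: dict) -> AcmeReport:
        # Coerce the raw JSON values into the typed model.
        return AcmeReport(
            tool_version=str(artifact["tool_version"]),
            row_count=int(artifact["row_count"]),
        )


parser = AcmeParser()
blob = {"tool": "acme", "tool_version": "1.2", "row_count": "3"}
if parser.can_parse(blob):
    print(parser.parse(blob))  # AcmeReport(tool_version='1.2', row_count=3)
```

In the real package, the registration step would then be the `registry.register(AcmeParser())` call from the list above.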
Development
This project uses uv and Task. Common targets:
| Goal | Task |
|---|---|
| Sync the environment | task install |
| Format + autofix | task format |
| Lint (format-check + ruff) | task lint |
| Run tests at 100% coverage | task test |
task --list shows everything. The test suite enforces 100% coverage of the
framework and dbt dispatch code (the generated dbt models are excluded — they're
schema, not logic). Beyond the synthetic fixtures, real artifacts from a live dbt
build live in tests/data/ and round-trip through the public parse() in
tests/artifact_parser/dbt/test_roundtrip.py — the only tests that exercise
populated nodes end to end.
One non-obvious rule the generator enforces: the generated models are relaxed to
pydantic extra="ignore" (not the extra="forbid" dbt's schemas imply), because
real artifacts carry fields the published schema omits. A strict model would
reject a perfectly good manifest.json. See CLAUDE.md for the why.
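To make the trade-off concrete, here is a small standard-library emulation of the two policies. It is not pydantic and not the generated models: the `load` helper, the one-field `Metadata` class, and the `undocumented_field` key are made up for illustration, and `extra` only mimics the behaviour of pydantic's setting of the same name.

```python
from dataclasses import dataclass, fields


@dataclass
class Metadata:
    dbt_schema_version: str


def load(cls, data: dict, extra: str = "ignore"):
    """Build cls from data, either dropping or rejecting unknown keys."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(data) - known)
    if unknown and extra == "forbid":
        raise ValueError(f"unexpected fields: {unknown}")
    # With extra="ignore", keys missing from the schema are silently dropped.
    return cls(**{k: v for k, v in data.items() if k in known})


payload = {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v12.json",
    "undocumented_field": True,  # hypothetical key absent from the schema
}

print(load(Metadata, payload).dbt_schema_version)  # relaxed: parses fine

try:
    load(Metadata, payload, extra="forbid")
except ValueError as exc:
    print(exc)  # unexpected fields: ['undocumented_field']
```

The relaxed policy trades strictness for robustness: unknown keys no longer fail the parse, but they also do not appear on the resulting object.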
CI
GitHub Actions back the same gates:
| Workflow | What it does |
|---|---|
| ci.yml | Lint + 100%-coverage tests on Python 3.10–3.13, plus a codegen-in-sync job that fails if the committed generated/ drifts from a fresh regen. |
| schema-watch.yml | Weekly (and on demand): probes dbt's published schemas, regenerates, and opens a PR if a new version appeared. |
| release.yml | Build + coverage gate, then PyPI Trusted Publishing on a published Release (or TestPyPI via manual dispatch). |
Action versions and Python deps are kept current by Dependabot.
Agentic setup
This repo is wired for Claude Code: a project
CLAUDE.md, a parser-author subagent that owns src/, slash commands
(/test, /codegen), secret-blocking and post-edit lint hooks, and the
context7 MCP for pulling fresh library
docs. See CLAUDE.md for the full tour. It will not write your code for you, but
it tries hard to keep you from shipping a failing coverage gate.