Find unused, redundant and orphaned Hiera data in a Puppet code tree
hiera-gc
Find unused, redundant and orphaned Hiera data in a deployed Puppet code tree.
hiera-gc is a static analyser. It runs fully offline against
/etc/puppetlabs/code (or a copy of it) and reports:
- Unused keys: data keys with no visible consumer. Consumers it
understands: automatic parameter lookup (class parameters),
lookup() / hiera*() / Deferred('lookup', ...) calls in manifests, EPP
and ERB templates and Ruby plugins, and %{lookup(...)} / %{alias(...)}
interpolation inside data itself.
- Possibly used keys: keys it cannot prove unused, with the reason
(a dynamic lookup("${var}::key") pattern match, a lookup_options
reference, a scope[...] variable read, or the key name appearing
verbatim somewhere).
- Stale parameters: keys shaped class::param where the class exists
but the parameter does not. Strong removal candidates.
- Redundant overrides: a key re-defined at a higher hierarchy
level with the same value it would resolve to anyway. The report
names the copy to remove. Keys consumed by merging lookups
(hiera_hash, hiera_array, deep lookup_options) are excluded.
- Shadowed definitions: a definition that can never win because an
always-loaded higher-priority level defines a different value.
Usually a latent bug.
- Orphaned data files: files no hierarchy path or glob can ever
load (including .yml vs .yaml traps).
- Stale data files: group files (nodegroups/<x>.yaml etc.) whose
hierarchy variable never takes that value in any manifest selector,
and nodes/<fqdn>.yaml files matching no node definition (e.g.
decommissioned hosts). Skipped when the evidence is not static.
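The stale-parameter category above can be illustrated with a small sketch. It is not hiera-gc's implementation: the function name, the return labels and the `classes` mapping (class name to parameter names) are hypothetical, and a real analyser would have to extract the class signatures from the manifests first.

```python
def classify_param_key(key: str, classes: dict[str, set[str]]) -> str:
    """Classify a Hiera key against known class signatures.

    `classes` is a hypothetical input mapping each class name to the
    set of parameter names it declares.
    """
    # Split on the LAST '::' so that 'profile::base::motd' becomes
    # class 'profile::base' and parameter 'motd'.
    class_name, sep, param = key.rpartition("::")
    if not sep:
        return "not_class_shaped"   # no '::', cannot be class::param
    if class_name not in classes:
        return "unknown_class"      # nothing can be said about the parameter
    if param in classes[class_name]:
        return "used_by_apl"        # automatic parameter lookup consumes it
    return "stale_param"            # class exists, parameter does not


# Hypothetical class signatures for illustration only.
classes = {"ntp": {"servers", "package_ensure"}, "profile::base": {"motd"}}
```

With these signatures, `ntp::server` (a typo for `ntp::servers`) would be classified as a stale parameter, while a key for a class that is not in the tree at all is left undecided rather than flagged.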
Safety
The tool never prints data values: reports contain key names, file
paths, line numbers and reason descriptions only, so the report itself
is safe to share. eyaml ENC[...] values (GPG or PKCS7, in any file
extension) are treated as opaque; no decryption keys are needed or
used. Value comparisons for redundancy detection use SHA-256 digests.
A test suite canary asserts no value can reach the output.
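Digest-based comparison can be sketched as follows. This is an illustration under assumptions, not the tool's code: values are canonicalised with JSON (so only JSON-representable YAML values work here), the hierarchy is reduced to a single ordered chain of levels, and both function names are hypothetical.

```python
import hashlib
import json


def value_digest(value) -> str:
    """SHA-256 of a canonical JSON rendering, so key order does not matter."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def redundant_levels(definitions: list[tuple[str, object]]) -> list[str]:
    """Return the levels whose definition repeats the next lower one.

    `definitions` lists (level name, value) for one key, highest
    priority first. Only digests are compared and only level names are
    returned, so no data value has to appear in a report.
    """
    digests = [(level, value_digest(value)) for level, value in definitions]
    return [
        higher
        for (higher, d_high), (_lower, d_low) in zip(digests, digests[1:])
        if d_high == d_low
    ]


# Hypothetical definitions of one key at two hierarchy levels.
defs = [
    ("nodes/web01.yaml", {"port": 80, "tls": True}),
    ("common.yaml", {"tls": True, "port": 80}),
]
```

Here `redundant_levels(defs)` names `nodes/web01.yaml`, the higher-priority copy, because removing it leaves the resolved value unchanged.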
Installation
Requires Python >= 3.10 and PyYAML. Either:
pip install .
or build a self-contained zipapp (PyYAML bundled) to copy onto a puppetserver:
make zipapp
scp dist/hiera-gc puppet:/tmp/
ssh puppet python3 /tmp/hiera-gc --stats
Usage
hiera-gc [--code-dir /etc/puppetlabs/code] \
[--global-hiera /etc/puppetlabs/puppet/hiera.yaml] \
[--env production ...] [--env-glob 'prod*'] \
[--env-dir /data/extra-envs ...] \
[--format text|json] [--output report.txt] \
[--show unused,possibly_used,redundant,...] \
[--fail-on unused,redundant] \
[--allowlist allow.txt] [--extra-datadir PATH] \
[--fix --env production] [--fix-kinds unused,redundant,...] [--dry-run] \
[--strict] [--stats] [-v]
- The tool reads the deployed tree: environments under
<code-dir>/environments, global modules at <code-dir>/modules, plus
any datadir referenced by hiera.yaml files (absolute datadirs such as
/etc/puppetlabs/code/hieradata are rebased under --code-dir when
analysing a copied tree).
- --env-dir PATH adds another environments-root directory to search,
like an extra entry on Puppet's environmentpath. Each root holds
environment subdirectories; the default <code-dir>/environments is
searched first, then each --env-dir in order. If the same environment
name appears under more than one root the first wins (the rest are
reported as shadowed), matching Puppet. --env and --env-glob filter
across all roots.
- When --env or --env-glob narrows the run to a single environment,
the report (and the exit status) covers only that environment's own
files. Shared, global and module data is visible to other environments
the run did not analyse, so findings and warnings about it are
unreliable here and cannot be fixed from this environment; an
all-environments run (no --env filter, or one matching more than one
environment) lists them instead. Warnings about files outside the
environment's own tree (a module's hiera.yaml, a lookup() in a module
manifest) are dropped the same way, except parse errors, which are
kept because a file that fails to parse blinds the analysis. The
report names the environment and how many findings it hid. This
matches the scope of --fix.
- environment.conf modulepath entries (e.g.
site:modules:$basemodulepath) are honoured.
- Exit codes: 0 clean, 1 findings matched --fail-on, 2 usage or (with
--strict) parse errors. Diagnostics go to stderr; the report goes to
stdout.
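For CI use, the exit codes listed above can be wrapped in a few lines of Python. This is a minimal sketch: `describe_exit` and `run_hiera_gc` are hypothetical helper names, and the only things taken from hiera-gc are the documented exit codes and the fact that the report goes to stdout and diagnostics to stderr.

```python
import subprocess

# Meanings as documented in the exit-code list above.
EXIT_MEANINGS = {
    0: "clean",
    1: "findings matched --fail-on",
    2: "usage error, or parse errors under --strict",
}


def describe_exit(code: int) -> str:
    """Map a hiera-gc exit status to a human-readable meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_hiera_gc(args: list[str]) -> tuple[int, str, str]:
    """Run hiera-gc and return (exit code, report, diagnostics).

    check=False because a non-zero status is an expected outcome here,
    not an exception.
    """
    proc = subprocess.run(
        ["hiera-gc", *args], capture_output=True, text=True, check=False
    )
    return proc.returncode, proc.stdout, proc.stderr
```

A pipeline step might call `run_hiera_gc(["--env", "production", "--fail-on", "unused,redundant"])` and fail the build only when `describe_exit` reports anything other than "clean".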
The allowlist file holds one Python regex per line (# comments);
keys whose full name matches are reported separately and never fail
the run. Use it for keys consumed by systems the analyser cannot see.
Allowlisted keys are never fixed.
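The allowlist format described above can be sketched as below. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: comments are treated as whole lines starting with `#`, and "full name matches" is read as `re.fullmatch`.

```python
import re


def load_allowlist(text: str) -> list[re.Pattern[str]]:
    """Compile one regex per non-blank, non-comment line."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        patterns.append(re.compile(line))
    return patterns


def is_allowlisted(key: str, patterns: list[re.Pattern[str]]) -> bool:
    """True if any pattern matches the whole key name."""
    return any(p.fullmatch(key) for p in patterns)


# Hypothetical allowlist content for illustration.
allow = load_allowlist(
    "# consumed by the monitoring repo\n"
    "monitoring::.*\n"
    "legacy_key\n"
)
```

Under the full-match reading, `legacy_key` is allowlisted but `legacy_key_2` is not, so a pattern meant to cover a whole namespace needs an explicit `.*`.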
Fixing findings
--fix removes fixable findings from the one environment named by
--env. It acts on exactly one environment per run, so --env is
mandatory and must name a single environment (--fix will not fix
every environment at once, and cannot be combined with --env-glob):
hiera-gc --fix --env production --dry-run # show what would change
hiera-gc --fix --env production --fail-on none # apply it
- Only data files inside that environment's own directory are
touched, so each run yields one reviewable commit in one repo.
Findings in shared, global or module layer data (visible to other
environments, and module data is usually vendored by r10k) are
listed as out of scope; run --fix --env NAME again per environment
for the rest.
- --fix-kinds selects what to fix: unused, stale_params (just the
stale-parameter subset of unused), redundant (removes the
higher-priority copy), orphans and stale_files (deletes the file).
Default: unused,redundant,orphans,stale_files. Shadowed definitions
are never auto-fixed; they are likely bugs and the right fix is
ambiguous.
- Key removals are line-based edits driven by the parsed YAML node
positions: comments, ordering, anchors and eyaml ENC[...] values
elsewhere in the file are preserved. Definitions that are not safe to
cut out mechanically are skipped with a reason: values anchored and
aliased by another key, duplicate top-level keys, keys introduced via
YAML merge, flow-style root mappings (JSON) and files without line
information.
- With any parse errors in the run, --fix refuses to act: a file the
analyser could not read may hold the only consumer of a key.
- Exit codes are unchanged, so a fix run usually wants --fail-on none.
Run against a clean checkout and review the diff before pushing; the
tool does not keep backups.
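The "only files inside the environment's own directory" rule from the list above amounts to a path-containment check. The sketch below shows one way to express it; `in_fix_scope` is a hypothetical name and this is not necessarily how hiera-gc decides scope.

```python
from pathlib import Path


def in_fix_scope(data_file: str, env_dir: str) -> bool:
    """True if data_file lives inside env_dir after normalising both.

    resolve() collapses '..' segments and symlinks, so a path that
    escapes the environment directory is rejected. is_relative_to()
    compares whole path components, so 'production2' is not treated
    as being inside 'production'.
    """
    return Path(data_file).resolve().is_relative_to(Path(env_dir).resolve())


# Example environment directory under the default code dir.
env = "/etc/puppetlabs/code/environments/production"
```

For example, `env + "/data/common.yaml"` is in scope, while a module's data file under `/etc/puppetlabs/code/modules` is not and would be listed as out of scope rather than edited.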
Before you delete anything
Treat the report as a list of removal candidates, not a verdict:
- Keys may be consumed by things outside the code tree: cron jobs
running puppet lookup, monitoring scripts, other repositories. Grep
your ops repos before deleting.
- lookup($variable) calls with runtime-built keys are reported as
warnings; keys they consume cannot be detected.
- A key reported as used via automatic parameter lookup only proves
the class and parameter exist, not that the class is ever included on
a node.
Remove data in small batches and watch catalog compilation (e.g. a catalog diff run with a tool such as octocatalog-diff) before merging.
Development
python3 -m venv .venv && .venv/bin/pip install -e . pytest
.venv/bin/pytest
tox # full matrix
File details
Details for the file hiera_gc-0.1.0.tar.gz.
File metadata
- Download URL: hiera_gc-0.1.0.tar.gz
- Upload date:
- Size: 65.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37109a99caae5cb4d8504d62d07c3744c7e79df79383db411e33b0ae76df72e1 |
| MD5 | 1f9bdd6dbbcd9232936d15754c0263ef |
| BLAKE2b-256 | 7b208562b5fdd7e2d1f784185aa03bc08e45ec4ed311a780923483ccf775fa14 |
File details
Details for the file hiera_gc-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hiera_gc-0.1.0-py3-none-any.whl
- Upload date:
- Size: 54.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ebbf9cbf3b7986b996dafd5fd97f15ab6b00b58dd2f2d1e3b4efff414de118e |
| MD5 | 9e2664a5371aedbd6cc1e162e4d09ed1 |
| BLAKE2b-256 | b75732879baf0c752a2ae0e4836273daf5f6b9541ca8635d4f6b1420eb42034d |