
Find unused, redundant and orphaned Hiera data in a Puppet code tree


hiera-gc

Find unused, redundant and orphaned Hiera data in a deployed Puppet code tree.

hiera-gc is a static analyser. It runs fully offline against /etc/puppetlabs/code (or a copy of it) and reports:

  • Unused keys: data keys with no visible consumer. The consumers it recognises are automatic parameter lookup (class parameters); lookup() / hiera*() / Deferred('lookup', ...) calls in manifests, EPP and ERB templates and Ruby plugins; and %{lookup(...)} / %{alias(...)} interpolation inside the data itself.
  • Possibly used keys: keys it cannot prove unused, with the reason (a dynamic lookup("${var}::key") pattern match, a lookup_options reference, a scope[...] variable read, or the key name appearing verbatim somewhere).
  • Stale parameters: keys shaped class::param where the class exists but the parameter does not. These are strong removal candidates.
  • Redundant overrides: a key redefined at a higher hierarchy level with the same value it would resolve to anyway. The report names the copy to remove. Keys consumed by merging lookups (hiera_hash, hiera_array, deep lookup_options) are excluded.
  • Shadowed definitions: a definition that can never win because an always-loaded higher-priority level defines a different value. Usually a latent bug.
  • Orphaned data files: files no hierarchy path or glob can ever load (including .yml vs .yaml traps).
  • Stale data files: group files (nodegroups/<x>.yaml etc.) whose hierarchy variable never takes that value in any manifest selector, and nodes/<fqdn>.yaml files matching no node definition (e.g. decommissioned hosts). Skipped when the evidence is not static.
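The difference between a redundant override and a shadowed definition can be illustrated with a small Python sketch. It is not hiera-gc's algorithm: it assumes a simplified model in which each hierarchy level is a plain dict, levels are listed highest priority first, every level is always loaded, values are JSON-serialisable, and lookups are first-found (no merging). Real hierarchies have conditional levels such as per-node files, which this ignores. The names `digest` and `classify` are hypothetical.

```python
import hashlib
import json


def digest(value):
    # Compare values by the SHA-256 of a canonical JSON encoding, never by the
    # value itself. An eyaml ENC[...] string is simply hashed as an opaque string.
    blob = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def classify(levels):
    """levels: [(level_name, {key: value})], highest priority first, all always loaded.

    Returns (redundant, shadowed), each a list of (key, level_name) sorted by key.
    """
    redundant, shadowed = [], []
    for key in sorted({k for _, data in levels for k in data}):
        # Every definition of this key, highest priority first.
        defs = [(name, digest(data[key])) for name, data in levels if key in data]
        winner, winner_digest = defs[0]
        # Redundant: removing the winning copy would expose an equal value below it.
        if len(defs) > 1 and defs[1][1] == winner_digest:
            redundant.append((key, winner))
        # Shadowed: a lower definition whose value differs from the one that always wins.
        shadowed += [(key, name) for name, d in defs[1:] if d != winner_digest]
    return redundant, shadowed
```

For example, if `ntp::servers` is set to the same list at the two highest levels and to a different list at the lowest, the top copy is redundant (removing it changes nothing) and the bottom copy is shadowed (it can never win).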

Safety

The tool never prints data values: reports contain key names, file paths, line numbers and reason descriptions only, so the report itself is safe to share. eyaml ENC[...] values (GPG or PKCS7, in any file extension) are treated as opaque; no decryption keys are needed or used. Value comparisons for redundancy detection use SHA-256 digests. A test suite canary asserts no value can reach the output.
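A canary test of the kind described above can be sketched as follows. `Finding`, `render`, `unused_findings` and `canary_check` are hypothetical names for illustration, not hiera-gc's internals. The idea is that the finding record has no field that could carry a value, and a unique random value planted in the input is checked against the final rendered report.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Metadata only: deliberately no field that could carry a data value.
    kind: str
    key: str
    path: str
    line: int
    reason: str


def render(findings):
    return "\n".join(f"{f.path}:{f.line}: {f.kind}: {f.key} ({f.reason})" for f in findings)


def unused_findings(data, path):
    # Hypothetical analyser step: it can read the values but emits key names only.
    # Line numbers here are placeholders, not real YAML positions.
    return [Finding("unused", key, path, n, "no consumer found")
            for n, key in enumerate(sorted(data), start=1)]


def canary_check(render_fn, analyse):
    # Plant a unique random value in every key, run the pipeline and make sure
    # the value never appears in the rendered report.
    canary = "CANARY-" + secrets.token_hex(16)
    data = {"profile::db::password": canary, "profile::db::user": canary}
    report = render_fn(analyse(data))
    assert canary not in report, "a data value leaked into the report"
    return report
```

Because the canary is random per run, a leak cannot be hidden by a hard-coded exclusion in the renderer.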

Installation

Requires Python >= 3.10 and PyYAML. Either:

pip install .

or build a self-contained zipapp (PyYAML bundled) to copy onto a puppetserver:

make zipapp
scp dist/hiera-gc puppet:/tmp/
ssh puppet python3 /tmp/hiera-gc --stats
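The Makefile's zipapp target is not shown here. As an assumption, a comparable single-file archive can be built with Python's standard zipapp module, as sketched below. `build_zipapp` is a hypothetical helper, and so is any entry point you pass it. Compiled extension modules cannot be imported from a zip, so they are left out; PyYAML falls back to its pure-Python loader when its C extension is missing.

```python
import shutil
import tempfile
import zipapp
from pathlib import Path


def build_zipapp(packages, main, target, interpreter="/usr/bin/env python3"):
    """Bundle pure-Python package directories into one executable archive.

    packages: package directories to copy in (e.g. the tool's package and PyYAML's `yaml`).
    main: "module:function" entry point that zipapp writes into __main__.py.
    """
    with tempfile.TemporaryDirectory() as staging:
        for pkg in map(Path, packages):
            # Skip bytecode caches and compiled extensions (not importable from a zip).
            shutil.copytree(pkg, Path(staging, pkg.name),
                            ignore=shutil.ignore_patterns("__pycache__", "*.so", "*.pyd"))
        zipapp.create_archive(staging, target=target, interpreter=interpreter,
                              main=main, compressed=True)
    return Path(target)
```

The resulting file runs with `python3 archive` on any host with a suitable interpreter, which is what makes copying it onto a puppetserver convenient.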

Usage

hiera-gc [--code-dir /etc/puppetlabs/code] \
         [--global-hiera /etc/puppetlabs/puppet/hiera.yaml] \
         [--env production ...] [--env-glob 'prod*'] \
         [--env-dir /data/extra-envs ...] \
         [--format text|json] [--output report.txt] \
         [--show unused,possibly_used,redundant,...] \
         [--fail-on unused,redundant] \
         [--allowlist allow.txt] [--extra-datadir PATH] \
         [--fix --env production] [--fix-kinds unused,redundant,...] [--dry-run] \
         [--strict] [--stats] [-v]
  • The tool reads the deployed tree: environments under <code-dir>/environments, global modules at <code-dir>/modules, plus any datadir referenced by hiera.yaml files (absolute datadirs such as /etc/puppetlabs/code/hieradata are rebased under --code-dir when analysing a copied tree).
  • --env-dir PATH adds another environments-root directory to search, like an extra entry on Puppet's environmentpath. Each root holds environment subdirectories; the default <code-dir>/environments is searched first, then each --env-dir in order. If the same environment name appears under more than one root the first wins (the rest are reported as shadowed), matching Puppet. --env and --env-glob filter across all roots.
  • When --env or --env-glob narrows the run to a single environment, the report (and the exit status) covers only that environment's own files. Shared, global and module data is also visible to environments the run did not analyse, so findings and warnings about it would be unreliable here and cannot be fixed from this environment; an all-environments run (no --env filter, or one matching more than one environment) lists them instead. Warnings about files outside the environment's own tree (a module's hiera.yaml, a lookup() in a module manifest) are dropped the same way, except parse errors, which are kept because a file that fails to parse blinds the analysis. The report names the environment and how many findings it hid. This matches the scope of --fix.
  • environment.conf modulepath entries (e.g. site:modules:$basemodulepath) are honoured.
  • Exit codes: 0 clean, 1 findings matched --fail-on, 2 usage or (with --strict) parse errors. Diagnostics go to stderr; the report goes to stdout.
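The --env-dir precedence rule (first root wins, later same-named environments are shadowed) can be sketched in Python. This is an illustration under assumptions, not the tool's code: `resolve_environments` is a hypothetical name, every subdirectory of a root is treated as an environment, and --env-glob is approximated with case-sensitive `fnmatch` patterns.

```python
import fnmatch
from pathlib import Path


def resolve_environments(roots, names=(), globs=()):
    """Resolve environment directories across several environment roots.

    roots: searched in order (<code-dir>/environments first, then each --env-dir).
    names / globs: optional filters in the spirit of --env / --env-glob.
    Returns (selected, shadowed): selected maps name -> directory (first root wins);
    shadowed lists same-named directories under later roots.
    """
    def wanted(name):
        if not names and not globs:
            return True
        return name in names or any(fnmatch.fnmatchcase(name, g) for g in globs)

    selected, shadowed = {}, []
    for root in map(Path, roots):
        if not root.is_dir():
            continue  # a missing root is simply skipped in this sketch
        for env_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            if not wanted(env_dir.name):
                continue
            if env_dir.name in selected:
                shadowed.append(env_dir)  # an earlier root already provides this name
            else:
                selected[env_dir.name] = env_dir
    return selected, shadowed
```

Filtering is applied across all roots, so a glob such as 'prod*' can select environments from both the default root and an extra one.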

The allowlist file holds one Python regex per line (# starts a comment). Keys whose full name matches are reported separately and never fail the run. Use it for keys consumed by systems the analyser cannot see. Allowlisted keys are never fixed.
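A minimal sketch of allowlist handling, assuming that blank lines and lines starting with # are skipped and that "full name matches" means `re.fullmatch` against the whole key. The tool's exact comment rules may differ, and `load_allowlist`, `is_allowlisted` and `partition` are hypothetical names.

```python
import re


def load_allowlist(text):
    # One Python regex per line. Assumption: blank lines and lines whose first
    # non-space character is "#" are skipped.
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def is_allowlisted(key, patterns):
    # "Full name matches" is read here as re.fullmatch against the whole key.
    return any(p.fullmatch(key) for p in patterns)


def partition(keys, patterns):
    """Split finding keys into (reported, allowlisted); allowlisted keys never fail a run."""
    reported = [k for k in keys if not is_allowlisted(k, patterns)]
    allowed = [k for k in keys if is_allowlisted(k, patterns)]
    return reported, allowed
```

With full-match semantics, a pattern such as profile::legacy::(user|group) does not accidentally cover profile::legacy::users; append .* explicitly when a prefix match is intended.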

Fixing findings

--fix removes fixable findings from the environment named by --env. It acts on exactly one environment per run, so --env is mandatory, must name a single environment and cannot be combined with --env-glob:

hiera-gc --fix --env production --dry-run        # show what would change
hiera-gc --fix --env production --fail-on none   # apply it
  • Only data files inside that environment's own directory are touched, so each run yields one reviewable commit in one repo. Findings in shared, global or module layer data (visible to other environments, and module data is usually vendored by r10k) are listed as out of scope; run --fix --env NAME again per environment for the rest.
  • --fix-kinds selects what to fix: unused, stale_params (just the stale-parameter subset of unused), redundant (removes the higher-priority copy), orphans and stale_files (deletes the file). Default: unused,redundant,orphans,stale_files. Shadowed definitions are never auto-fixed; they are likely bugs and the right fix is ambiguous.
  • Key removals are line-based edits driven by the parsed YAML node positions: comments, ordering, anchors and eyaml ENC[...] values elsewhere in the file are preserved. Definitions that are not safe to cut out mechanically are skipped with a reason: values anchored and aliased by another key, duplicate top-level keys, keys introduced via YAML merge, flow-style root mappings (JSON) and files without line information.
  • If the run hits any parse error, --fix refuses to act: a file the analyser could not read may hold the only consumer of a key.
  • Exit codes are unchanged, so a fix run usually wants --fail-on none. Run against a clean checkout and review the diff before pushing; the tool does not keep backups.
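For scripted use, the fix workflow can be wrapped in a few lines of Python. `fix_argv`, `run_fix` and `EXIT_MEANING` are hypothetical helpers; only the flags and exit codes come from this README. Since the tool keeps no backups, run it in a clean checkout and review the result with git diff before pushing.

```python
import subprocess

# Exit statuses as documented in Usage; with --fail-on none a fix run should not return 1.
EXIT_MEANING = {
    0: "clean",
    1: "findings matched --fail-on",
    2: "usage error, or parse errors under --strict",
}


def fix_argv(env, dry_run=True, kinds=None, exe="hiera-gc"):
    # Build the command line for a single-environment fix run.
    argv = [exe, "--fix", "--env", env, "--fail-on", "none"]
    if kinds:
        argv += ["--fix-kinds", ",".join(kinds)]
    if dry_run:
        argv.append("--dry-run")
    return argv


def run_fix(env, dry_run=True, kinds=None, exe="hiera-gc"):
    # Run hiera-gc; the report is on stdout, diagnostics on stderr.
    proc = subprocess.run(fix_argv(env, dry_run, kinds, exe), capture_output=True, text=True)
    if proc.returncode != 0:
        meaning = EXIT_MEANING.get(proc.returncode, "unexpected status")
        raise RuntimeError(f"hiera-gc exited {proc.returncode} ({meaning}):\n{proc.stderr}")
    return proc.stdout
```

A typical sequence would be run_fix("production") to preview, then run_fix("production", dry_run=False) to apply, followed by a manual diff review in that environment's repository.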

Before you delete anything

Treat the report as a list of removal candidates, not a verdict:

  • Keys may be consumed by things outside the code tree: cron jobs running puppet lookup, monitoring scripts, other repositories. Grep your ops repos before deleting.
  • lookup($variable) calls with runtime-built keys are reported as warnings; keys they consume cannot be detected.
  • A key reported as used via automatic parameter lookup only proves the class and parameter exist, not that the class is ever included on a node.
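The difference between a partly dynamic key such as lookup("${var}::key"), which can still be pattern-matched and makes matching keys "possibly used", and a fully dynamic lookup($variable), which matches nothing and can only be warned about, can be sketched like this. It is a simplified illustration, not hiera-gc's matcher: `key_pattern` and `possibly_used` are hypothetical names, and Puppet interpolation is approximated as ${...} or $name (including qualified names like $a::b), each allowed to expand to any non-empty text.

```python
import re

# Simplified Puppet interpolation in a double-quoted string: ${expr} or $name.
# A bare $name swallows ::segments, as Puppet reads "$var::key" as one qualified variable.
INTERP = re.compile(r"\$\{[^}]*\}|\$[a-z_][a-zA-Z0-9_]*(?:::[a-z_][a-zA-Z0-9_]*)*")


def key_pattern(template):
    """Turn a lookup key template into a regex, or None if it has no literal text."""
    parts = INTERP.split(template)
    if not any(parts):
        return None  # fully dynamic, e.g. lookup($key): nothing to match against
    # Each interpolation may expand to any non-empty text.
    return re.compile(".+".join(re.escape(p) for p in parts))


def possibly_used(keys, templates):
    patterns = [p for p in map(key_pattern, templates) if p is not None]
    return sorted(k for k in keys if any(p.fullmatch(k) for p in patterns))
```

The more literal text a template keeps, the narrower its pattern; a template with none degenerates to matching everything, which is why such calls are reported as warnings instead.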

Remove data in small batches and watch catalog compilation (e.g. an r10k catalog-diff run) before merging.

Development

python3 -m venv .venv && .venv/bin/pip install -e . pytest
.venv/bin/pytest
tox          # full matrix


Download files


Source Distribution

hiera_gc-0.1.0.tar.gz (65.7 kB)

Built Distribution


hiera_gc-0.1.0-py3-none-any.whl (54.6 kB)

File details

Details for the file hiera_gc-0.1.0.tar.gz.

File metadata

  • Download URL: hiera_gc-0.1.0.tar.gz
  • Size: 65.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for hiera_gc-0.1.0.tar.gz
  • SHA256: 37109a99caae5cb4d8504d62d07c3744c7e79df79383db411e33b0ae76df72e1
  • MD5: 1f9bdd6dbbcd9232936d15754c0263ef
  • BLAKE2b-256: 7b208562b5fdd7e2d1f784185aa03bc08e45ec4ed311a780923483ccf775fa14


File details

Details for the file hiera_gc-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hiera_gc-0.1.0-py3-none-any.whl
  • Size: 54.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for hiera_gc-0.1.0-py3-none-any.whl
  • SHA256: 9ebbf9cbf3b7986b996dafd5fd97f15ab6b00b58dd2f2d1e3b4efff414de118e
  • MD5: 9e2664a5371aedbd6cc1e162e4d09ed1
  • BLAKE2b-256: b75732879baf0c752a2ae0e4836273daf5f6b9541ca8635d4f6b1420eb42034d

