Track configuration value origins and modification history through YAML parsing

herrkunft

From German "Herkunft" (origin, provenance)

Track configuration value origins and modification history through YAML parsing with modern Python best practices.

Overview

herrkunft is a standalone library extracted from esm_tools that provides transparent provenance tracking for configuration values loaded from YAML files. It tracks:

  • Where each value came from (file path, line number, column)
  • When it was set or modified
  • How conflicts were resolved using hierarchical categories
  • What the complete modification history is

Perfect for scientific computing, workflow configuration, and any application where configuration traceability matters.
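
As a rough mental model of the items tracked above (where, when, and the full history), the sketch below uses plain dataclasses. The names `Step` and `History` are hypothetical and chosen for illustration only; they are not herrkunft's actual classes, which are Pydantic models.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One entry in a value's history (hypothetical, not herrkunft's model)."""

    yaml_file: str                  # where the value came from
    line: int
    column: int
    category: Optional[str] = None  # e.g. "defaults" or "user"
    # when the value was set or modified
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class History:
    """Ordered list of steps; the last one describes the current value."""

    steps: List[Step] = field(default_factory=list)

    def record(self, step: Step) -> None:
        self.steps.append(step)

    @property
    def current(self) -> Step:
        return self.steps[-1]


history = History()
history.record(Step("defaults.yaml", 3, 8, category="defaults"))
history.record(Step("user.yaml", 12, 8, category="user"))

print(history.current.yaml_file)  # user.yaml
print(len(history.steps))         # 2
```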

Features

  • 🎯 Transparent Tracking: Values behave like normal Python types while tracking their provenance
  • 📍 Precise Location: Track exact file, line, and column for every configuration value
  • 🏗️ Hierarchical Resolution: Category-based conflict resolution (e.g., defaults < user < runtime)
  • 🔄 Modification History: Complete audit trail of all changes to configuration values
  • 🎨 Type-Safe: Full type hints and Pydantic validation throughout
  • 📝 YAML Round-Trip: Preserve provenance as comments when writing YAML
  • 🚀 Modern Python: Built with Pydantic 2.0, ruamel.yaml, and loguru
  • 📓 Interactive Docs: Try it in Binder without installing anything
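
To make "transparent tracking" concrete, here is a minimal sketch of one way a value can behave like a normal Python type while carrying extra metadata: subclassing `str`. The class `TrackedStr` and its `origin` attribute are assumptions made for this example and may differ from how herrkunft implements its wrappers.

```python
class TrackedStr(str):
    """A str that also carries an origin tuple (hypothetical illustration)."""

    def __new__(cls, value, origin=None):
        obj = super().__new__(cls, value)
        obj.origin = origin  # (file, line, column) or None
        return obj


url = TrackedStr("postgresql://localhost/mydb", origin=("config.yaml", 15, 8))

# Behaves like an ordinary string...
print(url.startswith("postgresql://"))       # True
print(url == "postgresql://localhost/mydb")  # True
print(isinstance(url, str))                  # True

# ...while the origin stays attached to the object
print(url.origin)  # ('config.yaml', 15, 8)

# Caveat of this naive approach: str methods return plain str,
# so derived values no longer carry the origin.
print(type(url.upper()).__name__)  # str
```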

Try It Now

Launch interactive notebooks in your browser with Binder (no installation required).

Installation

pip install herrkunft

For development:

pip install "herrkunft[dev]"

Quick Start

from provenance import load_yaml

# Load a configuration file with provenance tracking
config = load_yaml("config.yaml", category="defaults")

# Access values normally
database_url = config["database"]["url"]
print(database_url)  # postgresql://localhost/mydb

# Access provenance information
print(database_url.provenance.current.yaml_file)  # config.yaml
print(database_url.provenance.current.line)       # 15
print(database_url.provenance.current.column)     # 8

Hierarchical Configuration

from provenance import HierarchyManager, ProvenanceLoader

# Set up hierarchy: defaults < user < production
loader = ProvenanceLoader()

# Load multiple configs with different priorities
defaults = loader.load("defaults.yaml", category="defaults")
user_config = loader.load("user.yaml", category="user")
prod_config = loader.load("production.yaml", category="production")

# Merge with automatic conflict resolution
hierarchy = HierarchyManager(["defaults", "user", "production"])
final_config = hierarchy.merge(defaults, user_config, prod_config)

# Production values override user values, which override defaults
# Full history is preserved in provenance
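
The idea behind category-based resolution can be sketched with plain dictionaries. This is an illustration under simplifying assumptions (flat, non-nested configs and a hypothetical `merge_by_category` helper), not the logic `HierarchyManager` actually runs.

```python
from typing import Any, Dict, List, Tuple


def merge_by_category(order, layers):
    """Merge flat dicts so that higher-priority categories win.

    order:  category names from lowest to highest priority
    layers: list of (category, dict) pairs, in any order
    Returns (merged, history); both map each key to (value, category) data.
    """
    rank = {name: i for i, name in enumerate(order)}
    merged: Dict[str, Tuple[Any, str]] = {}
    history: Dict[str, List[Tuple[Any, str]]] = {}
    for category, values in layers:
        for key, value in values.items():
            # Keep every candidate so the audit trail is complete
            history.setdefault(key, []).append((value, category))
            # Replace only if the new category ranks at least as high
            if key not in merged or rank[category] >= rank[merged[key][1]]:
                merged[key] = (value, category)
    return merged, history


order = ["defaults", "user", "production"]
layers = [
    ("production", {"port": 5433}),
    ("defaults", {"port": 5432, "host": "localhost"}),
    ("user", {"host": "db.internal"}),
]
merged, history = merge_by_category(order, layers)

print(merged["port"])   # (5433, 'production')
print(merged["host"])   # ('db.internal', 'user')
print(history["port"])  # [(5433, 'production'), (5432, 'defaults')]
```

Because the winner is chosen by category rank rather than load order, the result is the same whichever file is read first. Nested configurations would need a recursive variant of the same rule.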

Save with Provenance Comments

from provenance import dump_yaml

# Save configuration with provenance as inline comments
dump_yaml(config, "output.yaml", include_provenance=True)

Output:

database:
  url: postgresql://localhost/mydb  # config.yaml:15:8
  port: 5432  # config.yaml:16:8
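
The comments in the output above follow a `file:line:column` pattern. Assuming that pattern holds, a small helper can turn such a comment back into structured data. `parse_location` and `Location` are hypothetical names for this sketch and are not part of herrkunft.

```python
from typing import NamedTuple


class Location(NamedTuple):
    yaml_file: str
    line: int
    column: int


def parse_location(comment: str) -> Location:
    """Parse a '# file:line:column' comment into its three parts."""
    text = comment.lstrip("#").strip()
    # Split from the right so that colons inside the file path
    # (such as a Windows drive letter) stay part of the path.
    yaml_file, line, column = text.rsplit(":", 2)
    return Location(yaml_file, int(line), int(column))


loc = parse_location("# config.yaml:15:8")
print(loc)  # Location(yaml_file='config.yaml', line=15, column=8)
```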

Architecture

herrkunft is built with modern Python best practices:

  • Pydantic 2.0: Type-safe data models and settings
  • ruamel.yaml: YAML parsing with position tracking and comment preservation
  • loguru: Simple, powerful logging
  • Type hints: Full typing support for IDE autocomplete and type checking

Core Components

herrkunft/
├── core/           # Provenance tracking and hierarchy management
├── types/          # Type wrappers (DictWithProvenance, etc.)
├── yaml/           # YAML loading and dumping
├── utils/          # Utilities for cleaning, validation, serialization
└── config/         # Library configuration and settings

Use Cases

Scientific Computing

Track which configuration file and parameters were used for each simulation run:

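from provenance import load_yaml

# Load the simulation configuration with provenance tracking
config = load_yaml("simulation.yaml")
run_simulation(config)  # run_simulation stands in for your own entry point

# Later, audit which file provided each parameter
for key, value in config.items():
    print(f"{key}: {value.provenance.current.yaml_file}")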

Multi-Environment Configuration

Manage development, staging, and production configs with clear conflict resolution:

from provenance import ProvenanceLoader

# Each entry pairs a file with its category
loader = ProvenanceLoader()
config = loader.load_multiple([
    ("defaults.yaml", "defaults"),
    ("production.yaml", "production"),
    ("secrets.yaml", "secrets"),  # Highest priority
])

Configuration Auditing

Export complete provenance history for compliance or debugging:

from provenance import to_json_file

# Export config with full provenance metadata
to_json_file(config, "audit.json")
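
For a sense of what an audit export could contain, the sketch below builds a comparable structure with the standard library only. The record layout and the `to_audit_record` helper are assumptions made for illustration; the JSON that herrkunft itself writes may be organized differently.

```python
import json


def to_audit_record(key, value, yaml_file, line, column):
    """Bundle a value with its origin in a JSON-serializable dict."""
    return {
        "key": key,
        "value": value,
        "provenance": {"yaml_file": yaml_file, "line": line, "column": column},
    }


records = [
    to_audit_record("database.url", "postgresql://localhost/mydb",
                    "config.yaml", 15, 8),
    to_audit_record("database.port", 5432, "config.yaml", 16, 8),
]

# Serialize for storage, then read back to confirm nothing is lost
audit_json = json.dumps(records, indent=2, sort_keys=True)
restored = json.loads(audit_json)

print(restored[1]["provenance"]["line"])  # 16
```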

Documentation

Full documentation is available at https://herrkunft.readthedocs.io

Development

Setup

git clone https://github.com/pgierz/herrkunft.git
cd herrkunft
pip install -e ".[dev]"

Testing

pytest                          # Run all tests
pytest --cov=provenance        # With coverage
pytest -v tests/test_core/     # Specific test directory

Code Quality

black provenance tests          # Format code
ruff check provenance tests     # Lint
mypy provenance                 # Type check

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Authors

Paul Gierz and Miguel Andrés-Martínez

License

MIT License - see LICENSE for details.

Acknowledgments

Extracted from the esm_tools project, which provides workflow management for Earth System Models. The provenance tracking feature was originally developed to track configuration origins in complex HPC simulation workflows.

Related Projects

  • esm_tools - Earth System Model workflow management
  • OmegaConf - Hierarchical configuration (no provenance tracking)
  • Dynaconf - Settings management (no provenance tracking)
  • Hydra - Configuration framework (no detailed provenance)

Citation

If you use herrkunft in your research, please cite:

@software{herrkunft2024,
  title = {herrkunft: Configuration Provenance Tracking for Python},
  author = {Gierz, Paul and Andrés-Martínez, Miguel},
  year = {2024},
  url = {https://github.com/pgierz/herrkunft}
}
