Modular configuration system with composable settings and environment variable overrides

DataKnobs Config

A modular, reusable configuration system for composable settings with environment variable overrides, file loading, and optional object construction helpers.

Features

  • Modular Configuration: Organize configurations by type with atomic configuration units
  • Multiple Input Formats: Load from YAML, JSON files, or Python dictionaries
  • Composable: Reference other configurations and compose complex setups
  • Environment Overrides: Override any configuration value via environment variables
  • Path Resolution: Automatically resolve relative paths to absolute paths
  • Object Construction: Optional helpers to build objects from configurations
  • Defaults Management: Global and type-specific default values
  • Caching: Cache constructed objects for efficiency

Installation

pip install dataknobs-config

Quick Start

from dataknobs_config import Config

# Load from dictionary
config = Config({
    "database": [
        {"name": "primary", "host": "localhost", "port": 5432},
        {"name": "secondary", "host": "backup.local", "port": 5433}
    ],
    "cache": [
        {"name": "redis", "host": "localhost", "port": 6379}
    ]
})

# Access configurations
primary_db = config.get("database", "primary")
print(primary_db["host"])  # localhost

# Load from file
config = Config.from_file("config.yaml")

# Load from multiple sources
config = Config("base.yaml", "overrides.json", {"extra": [...]})

Core Concepts

Atomic Configurations

Each configuration is an "atomic" unit - a dictionary of settings for a single object:

{
    "name": "primary",      # Optional, auto-generated if not provided
    "type": "database",     # Optional, inferred from parent key
    "host": "localhost",
    "port": 5432,
    # ... any other attributes
}

Configuration Structure

Internally, configurations are organized by type:

{
    "database": [           # Type name
        {...},              # Atomic config 1
        {...}               # Atomic config 2
    ],
    "cache": [
        {...}               # Atomic config
    ],
    "settings": {           # Special type for global settings
        "config_root": "/app/config",
        "default_timeout": 30
    }
}

String References (xref)

Reference other configurations using the xref format:

config = Config({
    "database": [
        {"name": "primary", "host": "db.example.com"}
    ],
    "api": [
        {
            "name": "main",
            "database": "xref:database[primary]"  # Reference
        }
    ]
})

# Resolve references
api = config.resolve_reference("xref:api[main]")
print(api["database"]["host"])  # db.example.com

Reference Formats

  • xref:type[name] - Reference by name
  • xref:type[0] - Reference by index
  • xref:type[-1] - Reference last item
  • xref:type - Reference first/only item
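
The four reference shapes above can be told apart with a small parser. The following is a minimal sketch using only the standard library; `parse_xref` and `lookup` are hypothetical helpers written for illustration and are not part of dataknobs_config, whose actual parsing rules may differ.

```python
import re

# Hypothetical pattern: "xref:<type>" with an optional "[<selector>]" suffix.
_XREF = re.compile(r"^xref:(?P<type>[^\[\]]+)(?:\[(?P<sel>[^\[\]]+)\])?$")


def parse_xref(ref):
    """Split an xref string into (type_name, selector)."""
    match = _XREF.match(ref)
    if match is None:
        raise ValueError(f"not an xref string: {ref!r}")
    type_name, selector = match.group("type"), match.group("sel")
    if selector is None:
        return type_name, 0                 # bare type: first/only item
    if re.fullmatch(r"-?\d+", selector):
        return type_name, int(selector)     # numeric selector: list index
    return type_name, selector              # anything else: lookup by name


def lookup(data, ref):
    """Find the atomic config that a reference points at."""
    type_name, selector = parse_xref(ref)
    items = data[type_name]
    if isinstance(selector, int):
        return items[selector]
    for item in items:
        if item.get("name") == selector:
            return item
    raise KeyError(ref)


data = {"database": [{"name": "primary"}, {"name": "secondary"}]}
print(parse_xref("xref:database[primary]"))
print(lookup(data, "xref:database[-1]"))
```

Treating a purely numeric selector as an index is an assumption of this sketch; it means a configuration literally named "0" could not be addressed by name.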

Environment Variable Overrides

Override any configuration value using environment variables:

export DATAKNOBS_DATABASE__PRIMARY__HOST=prod.example.com
export DATAKNOBS_DATABASE__PRIMARY__PORT=5433
export DATAKNOBS_CACHE__REDIS__TTL=7200

config = Config({
    "database": [{"name": "primary", "host": "localhost", "port": 5432}],
    "cache": [{"name": "redis", "ttl": 3600}]
})

# Environment variables automatically override values
db = config.get("database", "primary")
print(db["host"])  # prod.example.com
print(db["port"])  # 5433 (converted to int)

Environment Variable Format

  • Pattern: DATAKNOBS_<TYPE>__<NAME_OR_INDEX>__<ATTRIBUTE>
  • Nested attributes: DATAKNOBS_DATABASE__0__CONNECTION__TIMEOUT
  • Automatic type conversion for integers, floats, and booleans
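
To make the naming pattern concrete, here is a rough sketch of how such a variable name could be split and its value converted. `parse_override` and `convert` are hypothetical helpers, and the details (lowercasing the segments, accepting only "true"/"false" as booleans) are assumptions rather than the library's documented behavior.

```python
import re


def convert(raw):
    """Best-effort conversion to bool, int, or float; otherwise keep the string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):        # assumed boolean spellings
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_override(var_name, raw_value, prefix="DATAKNOBS_"):
    """Return (type_name, name_or_index, attribute_path, value), or None."""
    if not var_name.startswith(prefix):
        return None
    parts = var_name[len(prefix):].lower().split("__")
    if len(parts) < 3 or not all(parts):
        return None                         # need type, selector, and attribute
    type_name, selector, *attr_path = parts
    if re.fullmatch(r"-?\d+", selector):
        selector = int(selector)            # numeric segment: list index
    return type_name, selector, attr_path, convert(raw_value)


print(parse_override("DATAKNOBS_DATABASE__PRIMARY__PORT", "5433"))
print(parse_override("DATAKNOBS_DATABASE__0__CONNECTION__TIMEOUT", "2.5"))
```

Extra segments after the selector are read as a nested attribute path, which matches the CONNECTION__TIMEOUT example above.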

File References

Reference external configuration files using the @ prefix:

# main.yaml
database:
  - "@database/primary.yaml"    # Load from file
  - "@database/secondary.yaml"

settings:
  config_root: /app/config       # Base path for relative references
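
As a sketch of how an "@"-prefixed entry might be mapped to a file path, the helper below joins the reference with config_root. `resolve_file_reference` is a hypothetical name, and the handling of absolute references is an assumption; loading and parsing the file is left out.

```python
from pathlib import PurePosixPath


def resolve_file_reference(entry, config_root):
    """Map an "@"-prefixed entry to a path under config_root, else return None."""
    if not (isinstance(entry, str) and entry.startswith("@")):
        return None                          # inline config, not a file reference
    relative = PurePosixPath(entry[1:])
    if relative.is_absolute():
        return relative                      # assumed: absolute references are kept
    return PurePosixPath(config_root) / relative


print(resolve_file_reference("@database/primary.yaml", "/app/config"))
```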

Global Settings and Defaults

Configure global settings and defaults in the special settings section:

config = Config({
    "database": [{"name": "db1"}],
    "settings": {
        # Paths
        "config_root": "/app/config",           # Base path for "@"-prefixed config file references
        "global_root": "/app",                   # Base for path resolution (settings.path_resolution_attributes)
        "database.global_root": "/app/db",       # Type-specific base for path resolution
        
        # Path resolution (supports exact names and regex patterns)
        "path_resolution_attributes": [
            "config_path",                       # Exact match for all types
            "database.data_dir",                 # Exact match for database type only
            "/.*_path$/",                        # Regex: all attributes ending with "_path"
            "cache./.*_dir$/"                    # Regex: cache type attributes ending with "_dir"
        ],
        
        # Defaults
        "default_timeout": 30,                   # Global default
        "database.default_pool_size": 10        # Type-specific default
    }
})
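
The four pattern forms accepted by path_resolution_attributes can be illustrated with a small matcher. This is a sketch under stated assumptions: `attribute_matches` is a hypothetical helper, and using `re.search` for the regex forms is a guess at the matching semantics.

```python
import re


def attribute_matches(pattern, type_name, attribute):
    """Check one path_resolution_attributes entry against a type and attribute."""
    scope, body = None, pattern
    # A leading "/" means an unscoped regex, so dots inside it are not separators.
    if not pattern.startswith("/") and "." in pattern:
        scope, body = pattern.split(".", 1)  # "<type>.<rest>"
    if scope is not None and scope != type_name:
        return False
    if len(body) >= 2 and body.startswith("/") and body.endswith("/"):
        return re.search(body[1:-1], attribute) is not None
    return body == attribute


patterns = ["config_path", "database.data_dir", "/.*_path$/", "cache./.*_dir$/"]
for pattern in patterns:
    print(pattern, attribute_matches(pattern, "cache", "tmp_dir"))
```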

Path Resolution

Attributes listed in path_resolution_attributes are resolved against global_root when they hold relative paths; absolute paths are left unchanged:

config = Config({
    "database": [{
        "name": "db1",
        "data_dir": "./data",              # Relative path
        "backup_dir": "/abs/path"          # Absolute path unchanged
    }],
    "settings": {
        "global_root": "/app",              # Base for path resolution
        "path_resolution_attributes": ["data_dir", "backup_dir"]
    }
})

db = config.get("database", "db1")
print(db["data_dir"])     # /app/data (resolved)
print(db["backup_dir"])   # /abs/path (unchanged)
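
The resolution step itself amounts to joining each relative value with a base directory. The sketch below reproduces the example with the standard library only; `resolve_paths` is a hypothetical helper, and falling back from a type-specific root to global_root is an assumption based on the settings shown earlier.

```python
import posixpath


def resolve_paths(atomic, type_name, settings):
    """Return a copy of an atomic config with listed path attributes made absolute."""
    # Assumed lookup order: "<type>.global_root" first, then "global_root".
    root = settings.get(f"{type_name}.global_root", settings.get("global_root"))
    resolved = dict(atomic)
    for attribute in settings.get("path_resolution_attributes", []):
        value = resolved.get(attribute)
        if isinstance(value, str) and root and not posixpath.isabs(value):
            resolved[attribute] = posixpath.normpath(posixpath.join(root, value))
    return resolved


settings = {"global_root": "/app", "path_resolution_attributes": ["data_dir", "backup_dir"]}
db = resolve_paths({"name": "db1", "data_dir": "./data", "backup_dir": "/abs/path"}, "database", settings)
print(db["data_dir"])
print(db["backup_dir"])
```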

Object Construction (Optional)

Build objects directly from configurations:

# Using class attribute
config = Config({
    "database": [{
        "name": "primary",
        "class": "myapp.database.PostgreSQL",
        "host": "localhost",
        "port": 5432
    }]
})

# Build object
db = config.build_object("xref:database[primary]")
# Returns instance of myapp.database.PostgreSQL

# Using factory pattern
config = Config({
    "cache": [{
        "name": "redis",
        "factory": "myapp.cache.CacheFactory",
        "type": "redis",
        "host": "localhost"
    }]
})

cache = config.build_object("xref:cache[redis]")
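
For readers wondering how a dotted "class" string can become an object, the sketch below shows the usual importlib approach. `build_from_config` is hypothetical; dropping "name" and passing the remaining keys as keyword arguments is an assumption, and a standard-library class stands in for myapp.database.PostgreSQL so that the example runs anywhere.

```python
import importlib


def build_from_config(atomic):
    """Import the class named by "class" and instantiate it with the other keys."""
    settings = dict(atomic)
    dotted = settings.pop("class")
    settings.pop("name", None)               # assumed: the name is not a constructor argument
    module_name, _, class_name = dotted.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**settings)


delta = build_from_config({"name": "ttl", "class": "datetime.timedelta", "hours": 2})
print(delta)
```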

Implementing Configurable Classes

from dataknobs_config import ConfigurableBase

class MyDatabase(ConfigurableBase):
    def __init__(self, host, port, **kwargs):
        self.host = host
        self.port = port
        
    @classmethod
    def from_config(cls, config):
        # Custom configuration logic
        return cls(**config)

Implementing Factories

from dataknobs_config import FactoryBase

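class DatabaseFactory(FactoryBase):
    def create(self, **config):
        # PostgreSQL and MySQL stand for your own database classes
        db_type = config.pop("type", "postgresql")
        if db_type == "postgresql":
            return PostgreSQL(**config)
        elif db_type == "mysql":
            return MySQL(**config)
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unsupported database type: {db_type!r}")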

Lazy Factory Access

# Configuration with factory
config = Config({
    "database": [{
        "name": "primary",
        "factory": "myapp.db.DatabaseFactory",
        "type": "postgresql",
        "host": "localhost"
    }]
})

# Get the factory instance (cached)
factory = config.get_factory("database", "primary")
db1 = factory.create(database="app1")
db2 = factory.create(database="app2")

# Or get an instance directly
db = config.get_instance("database", "primary", database="myapp")
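
The idea behind lazy access is that the factory is created once and reused for every later call. The following sketch illustrates that caching pattern in isolation; `LazyFactories` and `EchoFactory` are hypothetical stand-ins, and merging the stored configuration with per-call keyword arguments is an assumption about how get_instance might behave.

```python
class EchoFactory:
    """Hypothetical stand-in for a user-defined factory class."""

    def create(self, **config):
        return dict(config)


class LazyFactories:
    """Illustrative cache keyed by (type_name, name); not the library's code."""

    def __init__(self, configs):
        self._configs = configs
        self._cache = {}
        self.constructed = 0                 # counts how many factories were built

    def get_factory(self, type_name, name):
        key = (type_name, name)
        if key not in self._cache:           # build on first access only
            self._cache[key] = EchoFactory()
            self.constructed += 1
        return self._cache[key]

    def get_instance(self, type_name, name, **overrides):
        stored = self._configs[(type_name, name)]
        base = {k: v for k, v in stored.items() if k not in ("name", "factory")}
        return self.get_factory(type_name, name).create(**{**base, **overrides})


lazy = LazyFactories({
    ("database", "primary"): {
        "name": "primary",
        "factory": "myapp.db.DatabaseFactory",
        "type": "postgresql",
        "host": "localhost",
    }
})
first = lazy.get_factory("database", "primary")
second = lazy.get_factory("database", "primary")
instance = lazy.get_instance("database", "primary", database="myapp")
print(first is second, lazy.constructed, instance)
```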

API Reference

Config Class

class Config:
    def __init__(self, *sources, use_env=True)
    @classmethod
    def from_file(cls, path) -> Config
    @classmethod
    def from_dict(cls, data) -> Config
    
    # Access
    def get_types(self) -> List[str]
    def get_count(self, type_name: str) -> int
    def get_names(self, type_name: str) -> List[str]
    def get(self, type_name: str, name_or_index: Union[str, int] = 0) -> dict
    def set(self, type_name: str, name_or_index: Union[str, int], config: dict)
    
    # References
    def resolve_reference(self, ref: str) -> dict
    def build_reference(self, type_name: str, name_or_index: Union[str, int]) -> str
    
    # Merging
    def merge(self, other: Config, precedence: str = "first")
    
    # Export
    def to_dict(self) -> dict
    def to_file(self, path: Path, format: str = None)
    
    # Object Construction
    def build_object(self, ref: str, cache: bool = True, **kwargs) -> Any
    def clear_object_cache(self, ref: str = None)
    
    # Lazy Factory Access
    def get_factory(self, type_name: str, name_or_index: Union[str, int] = 0) -> Any
    def get_instance(self, type_name: str, name_or_index: Union[str, int] = 0, **kwargs) -> Any

Examples

Multi-Environment Configuration

# base.yaml
database:
  - name: primary
    host: localhost
    port: 5432

# production.yaml  
database:
  - name: primary
    host: prod.db.example.com
    pool_size: 50

# Load with overrides
config = Config("base.yaml", "production.yaml")
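
The merge rules for multiple sources are not spelled out here. One plausible reading, sketched below, is that entries with the same name are combined attribute by attribute, with later sources winning. `merge_by_name` is a hypothetical helper and this behavior is an assumption, not a statement about how Config combines files.

```python
def merge_by_name(base, override):
    """Combine two lists of atomic configs, matching entries on "name"."""
    merged = {item["name"]: dict(item) for item in base}
    for item in override:
        # Later values win; attributes only in the base entry are kept.
        merged.setdefault(item["name"], {}).update(item)
    return list(merged.values())


base = [{"name": "primary", "host": "localhost", "port": 5432}]
production = [{"name": "primary", "host": "prod.db.example.com", "pool_size": 50}]
print(merge_by_name(base, production))
```

Under this assumption the production host and pool_size apply while the port from base.yaml is retained.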

Service Discovery Integration

config = Config({
    "services": [
        {"name": "auth", "url": "http://auth:8000"},
        {"name": "api", "url": "http://api:8080"}
    ],
    "app": [{
        "name": "main",
        "auth_service": "xref:services[auth]",
        "api_service": "xref:services[api]"
    }]
})

app = config.resolve_reference("xref:app[main]")
# app["auth_service"]["url"] == "http://auth:8000"

Dynamic Configuration with Environment

# Development: export DATAKNOBS_DATABASE__PRIMARY__HOST=localhost
# Production:  export DATAKNOBS_DATABASE__PRIMARY__HOST=prod.db.aws.com

config = Config.from_file("config.yaml")
db = config.get("database", "primary")
# Automatically uses environment-appropriate host

Configuration Inheritance

For simple YAML/JSON configuration files with inheritance support, use InheritableConfigLoader:

from dataknobs_config import InheritableConfigLoader

# Create a loader
loader = InheritableConfigLoader("./configs")

# Load configuration with inheritance
config = loader.load("my-domain")

Base Configuration

# configs/base.yaml
llm:
  provider: openai
  model: gpt-4
  temperature: 0.7

knowledge_base:
  chunk_size: 500
  overlap: 50

Child Configuration

# configs/domain.yaml
extends: base

llm:
  model: gpt-4-turbo  # Override just this field

domain_specific:
  feature_enabled: true
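
What "override just this field" means can be shown with a recursive merge of plain dictionaries. This is a minimal sketch, assuming nested mappings are merged key by key and everything else is replaced; `deep_merge_sketch` is a hypothetical stand-in and may not match the library's deep_merge in every detail.

```python
def deep_merge_sketch(base, override):
    """Recursively merge override into a copy of base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            merged[key] = value              # scalars and lists are replaced
    return merged


base = {
    "llm": {"provider": "openai", "model": "gpt-4", "temperature": 0.7},
    "knowledge_base": {"chunk_size": 500, "overlap": 50},
}
child = {
    "extends": "base",
    "llm": {"model": "gpt-4-turbo"},
    "domain_specific": {"feature_enabled": True},
}
# Assumed: the "extends" key names the parent and is not part of the result.
resolved = deep_merge_sketch(base, {k: v for k, v in child.items() if k != "extends"})
print(resolved["llm"])
```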

Environment Variable Substitution

# configs/production.yaml
extends: base

llm:
  api_key: ${OPENAI_API_KEY}
  model: ${LLM_MODEL:gpt-4}  # With default value

paths:
  data_dir: ${DATA_DIR:~/data}  # Supports ~ expansion
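
The `${VAR}` and `${VAR:default}` forms can be expanded with a regular expression. The sketch below is illustrative only: `substitute` is a hypothetical helper, leaving an unset variable without a default untouched is an assumption, and so is applying `~` expansion to every resulting string.

```python
import os
import re

_VAR = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<default>[^}]*))?\}")


def substitute(value, env=None):
    """Expand ${VAR} and ${VAR:default} in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: substitute(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, env) for item in value]
    if not isinstance(value, str):
        return value

    def replace(match):
        name, default = match.group("name"), match.group("default")
        if name in env:
            return env[name]
        # Assumed: with no value and no default, keep the placeholder as written.
        return match.group(0) if default is None else default

    return os.path.expanduser(_VAR.sub(replace, value))


print(substitute({"model": "${LLM_MODEL:gpt-4}"}, env={}))
print(substitute("${LLM_MODEL:gpt-4}", env={"LLM_MODEL": "gpt-4-turbo"}))
```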

InheritableConfigLoader API

class InheritableConfigLoader:
    def __init__(self, config_dir: str | Path | None = None)

    # Load configuration with inheritance
    def load(
        self,
        name: str,
        use_cache: bool = True,
        substitute_vars: bool = True,
    ) -> dict[str, Any]

    # Load from specific file path
    def load_from_file(
        self,
        filepath: str | Path,
        substitute_vars: bool = True,
    ) -> dict[str, Any]

    # List available configurations
    def list_available(self) -> list[str]

    # Validate a configuration
    def validate(self, name: str) -> tuple[bool, str | None]

    # Clear cache
    def clear_cache(self, name: str | None = None) -> None

Convenience Function

from dataknobs_config import load_config_with_inheritance

# Quick one-liner for loading a config file
config = load_config_with_inheritance("configs/my-domain.yaml")

Utility Functions

from dataknobs_config import deep_merge, substitute_env_vars

# Deep merge two dictionaries
merged = deep_merge(base_dict, override_dict)

# Substitute environment variables in any data structure
result = substitute_env_vars({"key": "${MY_VAR:default}"})

Best Practices

  1. Use Type Organization: Group related configurations by type
  2. Leverage Defaults: Define common values in settings to avoid repetition
  3. Environment Overrides: Use for deployment-specific values (hosts, ports, credentials)
  4. File References: Split large configurations into manageable files
  5. Path Resolution: Use relative paths in configs for portability
  6. Object Caching: Enable caching for expensive object construction
  7. Use Inheritance: Create base configs and extend them for specific environments/domains

Testing

Run tests with pytest:

pytest tests/

License

MIT License - see LICENSE file for details.
