Simple and flexible dataclass configuration system

haven

PyPI version PyTest License: MIT

A modular dataclass configuration system

Haven is a system for configuring applications using dataclasses and YAML. It strives to be relatively simple while still scaling to complex use cases.

Key Features

  • Builds plain dataclasses, so you can use all standard dataclass features, such as custom methods, __post_init__, etc.
  • Doesn't take over your CLI or impose a particular structure on your program.
  • Supports parsing a wide variety of types and type hints, including optionals and unions.
  • Scales to projects with many config variations or sub-components using choice and plugin fields.
  • Easily link each config variation to the matching variation of your code using Component.

Tour

Basic example

from dataclasses import dataclass, field

import haven

@dataclass
class ModelConfig:
    num_layers: int = 5
    embed_dim: int = 512

@dataclass
class TrainConfig:
    workers: int = 5
    steps: list[int] = field(default_factory=lambda: [50, 100, 150])
    model: ModelConfig = field(default_factory=ModelConfig)

# Load from yaml string
cfg = haven.load(TrainConfig, """
steps: [1,2,3]
model:
  num_layers: 16
""")
assert cfg.model.num_layers == 16

# Or load from file
with open("config.yaml") as f:
    cfg = haven.load(TrainConfig, f)

# Update using "dotlist" style overrides (e.g. from CLI args)
cfg = haven.update_from_dotlist(cfg, ["workers=3", "model.num_layers=2"])

# Print yaml
print(haven.dump(cfg))
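
To make the "dotlist" idea concrete, here is a minimal standard-library sketch of how an override such as model.num_layers=2 could be applied to nested dataclasses. It is not haven's implementation: apply_override and apply_overrides are hypothetical helpers, and the value coercion (reusing the type of the existing value) is a deliberate simplification that would mishandle booleans, lists, and optionals.

```python
from dataclasses import dataclass, field, is_dataclass, replace


@dataclass
class ModelConfig:
    num_layers: int = 5
    embed_dim: int = 512


@dataclass
class TrainConfig:
    workers: int = 5
    model: ModelConfig = field(default_factory=ModelConfig)


def apply_override(cfg, override: str):
    """Return a copy of `cfg` with one "a.b=value" override applied (hypothetical helper)."""
    path, raw = override.split("=", 1)
    head, _, rest = path.partition(".")
    current = getattr(cfg, head)
    if rest:
        # Recurse into the nested dataclass, then rebuild the parent around it.
        if not is_dataclass(current):
            raise TypeError(f"{head!r} is not a nested dataclass")
        new_value = apply_override(current, f"{rest}={raw}")
    else:
        # Simplification: coerce the string using the type of the existing value.
        new_value = type(current)(raw)
    return replace(cfg, **{head: new_value})


def apply_overrides(cfg, overrides):
    """Apply a list of dotlist overrides from left to right."""
    for item in overrides:
        cfg = apply_override(cfg, item)
    return cfg


cfg = apply_overrides(TrainConfig(), ["workers=3", "model.num_layers=2"])
print(cfg)
```

Because dataclasses.replace builds new objects, the original config is left untouched, which mirrors the way update_from_dotlist returns a config in the example above.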

Choice fields

More complex projects often want to support many variations for each application component. This can be accomplished through subclassing and choice fields.

@dataclass
class ModelConfig:
    name: str

# Two types of models
@dataclass
class GPT2Config(ModelConfig):
    num_layers: int

@dataclass
class Llama2Config(ModelConfig):
    embed_dim: int = 512

@dataclass
class TrainConfig:
    workers: int = 5
    steps: list[int] = field(default_factory=lambda: [50, 100, 150])

    # Choose config class based on value of `ModelConfig.name`.
    model: ModelConfig = haven.choice(
        [GPT2Config, Llama2Config],
        key_field="name",
        default_factory=Llama2Config,
    )

# Load from yaml string
cfg = haven.load(TrainConfig, """
steps: [1,2,3]
model:
  name: GPT2Config
  num_layers: 16
""")
assert isinstance(cfg.model, GPT2Config)
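
The selection step can be pictured as a lookup from the key field's value to a dataclass. The sketch below uses only the standard library and a plain dict in place of parsed YAML; build_choice is a hypothetical helper, and keying on the class name is an assumption taken from the example above rather than a statement about how haven resolves names internally.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    name: str


@dataclass
class GPT2Config(ModelConfig):
    num_layers: int


@dataclass
class Llama2Config(ModelConfig):
    embed_dim: int = 512


def build_choice(choices, data: dict, key_field: str = "name"):
    """Pick the class whose name matches data[key_field] and instantiate it."""
    by_name = {cls.__name__: cls for cls in choices}
    key = data[key_field]
    if key not in by_name:
        raise ValueError(f"unknown choice {key!r}; expected one of {sorted(by_name)}")
    # The key field is an ordinary dataclass field here, so pass it through.
    return by_name[key](**data)


model = build_choice(
    [GPT2Config, Llama2Config],
    {"name": "GPT2Config", "num_layers": 16},
)
print(model)
```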

Choices can also be module + object paths that are imported lazily:

@dataclass
class TrainConfig:
    model: ModelConfig = haven.choice([
        "models.llama.Llama2Config",
        "models.gpt.GPT2Config",
    ])
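
As a rough illustration of what "imported lazily" can mean, a dotted path can be kept as a string and resolved with importlib only when it is needed. import_object below is a hypothetical helper, not part of haven, and it is demonstrated on a standard-library path so that it runs anywhere.

```python
import importlib


def import_object(path: str):
    """Resolve a "package.module.Name" string to the object it names."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {path!r}")
    # The module is imported only at this point, not when the path is declared.
    module = importlib.import_module(module_name)
    return getattr(module, attr)


ordered = import_object("collections.OrderedDict")
print(ordered)
```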

The benefit of this style of configuration is that all of the available choices are documented directly in the config definition. This works well when there are a small to medium number of variations. For more flexibility, a plugin system is available:

@dataclass
class TrainConfig:
    model: ModelConfig = haven.plugin(
        discover_packages_path="mypackage.models",
        attr="MODEL_CONFIG",
    )

Each module under the mypackage.models namespace that contains the attribute MODEL_CONFIG will then be an available choice. The choice name is the same as the name of the module.
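
The discovery rule described above (one choice per module that defines the attribute, named after the module) can be approximated with pkgutil and importlib. This is a sketch of the idea only: discover_plugins is a hypothetical helper, it scans a single package level, and it makes no claim about how haven.plugin is implemented.

```python
import importlib
import pkgutil


def discover_plugins(package_name: str, attr: str) -> dict:
    """Map module name -> value of `attr` for each direct submodule defining it."""
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        if hasattr(module, attr):
            # The choice name is the module name, as described above.
            found[info.name] = getattr(module, attr)
    return found


# Usage, assuming a package laid out like the one in the text:
#   choices = discover_plugins("mypackage.models", "MODEL_CONFIG")
```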

Components

Choice fields alone only select a config. Typically, you also want to run different code in your application depending on which variant of the config was selected. haven.Component provides a simple mechanism for linking each variation to a callable.

# Sample model definitions
class ModelBase(nn.Module):
    pass

class Llama2(ModelBase):
    def __init__(self, cfg: Llama2Config):
        pass

class GPT(ModelBase):
    def __init__(self, cfg: GPT2Config):
        pass

@dataclass
class TrainConfig:
    model: haven.Component[ModelConfig, ModelBase] = haven.choice([
        Llama2,
        GPT,
    ])

cfg = haven.load(TrainConfig, "model: Llama2")

# Instantiate the chosen class, passing the appropriate config as the first arg.
model = cfg.model()
assert isinstance(model, Llama2)

The config dataclass to use for each variation is automatically derived from the type hint on the first argument of the callable.
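
One way to picture this derivation is with inspect and typing.get_type_hints from the standard library. The sketch below is an assumption about the general technique, not haven's code: config_type_for is a hypothetical helper, and the Llama2 class is a stand-in with no nn.Module base so that the example has no dependencies.

```python
import inspect
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Llama2Config:
    embed_dim: int = 512


class Llama2:
    def __init__(self, cfg: Llama2Config):
        self.cfg = cfg


def config_type_for(factory):
    """Return the type hint on the first non-self argument of a class or function."""
    is_class = inspect.isclass(factory)
    target = factory.__init__ if is_class else factory
    params = list(inspect.signature(target).parameters)
    if is_class:
        params = params[1:]  # drop `self`
    if not params:
        raise TypeError(f"{factory!r} takes no config argument")
    hints = get_type_hints(target)
    if params[0] not in hints:
        raise TypeError(f"first argument of {factory!r} has no type hint")
    return hints[params[0]]


cfg_cls = config_type_for(Llama2)
model = Llama2(cfg_cls(embed_dim=256))
print(cfg_cls.__name__, model.cfg)
```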

More examples

See the examples directory in the source code for more complete examples.

API

Full documentation here.

Acknowledgements

This project is inspired by and borrows code from Pyrallis, SimpleParsing, draccus, and Hydra.

