A framework for simple dataclass-based configurations.

Pyrallis - Simple Configuration with Dataclasses

Pyrausta (also called pyrallis (πυραλλίς), pyragones) is a mythological insect-sized dragon from Cyprus.

Pyrallis is a simple library, derived from simple-parsing, for automagically creating project configuration from a dataclass.

Why pyrallis?

With pyrallis your configuration is linked directly to your pre-defined dataclass, allowing you to easily create different configuration structures, including nested ones, using an object-oriented design. The parsed arguments are used to initialize your dataclass, giving you the typing hints and automatic code completion of a full dataclass object.

Getting to Know pyrallis in 5 Simple Steps 🐲

The best way to understand pyrallis is through examples, so let's get started!

:dragon: 1/5 pyrallis.ArgumentParser for dataclass Parsing :dragon:

Creating an argparse configuration is really simple: just use pyrallis.ArgumentParser on your predefined dataclass.

from dataclasses import dataclass, field
import pyrallis


@dataclass
class TrainConfig:
    """ Training config for Machine Learning """
    # The number of workers for training
    workers: int = field(default=8)
    # The experiment name
    exp_name: str = field(default='default_exp')


def main():
    cfg = pyrallis.ArgumentParser(config_class=TrainConfig).parse_args()
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')


if __name__ == '__main__':
    main()

Not familiar with dataclasses? You should probably check the Python Tutorial and come back here.

The config can then be parsed directly from the command line:

$ python train_model.py --exp_name=my_first_model
Training my_first_model with 8 workers...
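To demystify the step above, here is a rough sketch of the general idea using only the standard library: each dataclass field becomes an argparse flag, and the parsed values are passed to the dataclass constructor. This is an illustrative assumption about the approach, not pyrallis's implementation. The parse_into helper is hypothetical, and it only handles fields with plain default= values (no default_factory, and no help text taken from comments).

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class TrainConfig:
    """ Training config for Machine Learning """
    workers: int = field(default=8)
    exp_name: str = field(default='default_exp')


def parse_into(config_class, argv=None):
    """Build an argparse parser from the dataclass fields and instantiate the dataclass.

    Hypothetical helper for illustration only; not pyrallis's implementation.
    """
    parser = argparse.ArgumentParser(description=config_class.__doc__)
    for f in fields(config_class):
        # Each field becomes a --flag whose type and default come from the dataclass
        parser.add_argument(f'--{f.name}', type=f.type, default=f.default)
    args = parser.parse_args(argv)
    # Unlike a plain argparse Namespace, we return a real dataclass instance
    return config_class(**vars(args))


if __name__ == '__main__':
    cfg = parse_into(TrainConfig, ['--exp_name=my_first_model'])
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')  # Training my_first_model with 8 workers...
```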

Oh, and pyrallis also generates an --help string automatically using the comments in your dataclass 🪄

$ python train_model.py --help
usage: train_model.py [-h] [--CONFIG str] [--workers int] [--exp_name str]

optional arguments:
  -h, --help      show this help message and exit
  --CONFIG str    Path for a config file to parse with pyrallis (default:
                  None)

TrainConfig ['options']:
   Training config for Machine Learning

  --workers int   The number of workers for training (default: 8)
  --exp_name str  The experiment name (default: default_exp)

:dragon: 2/5 The pyrallis.wrap Decorator :dragon:

Is the pyrallis.ArgumentParser syntax too cumbersome?

def main():
    cfg = pyrallis.ArgumentParser(config_class=TrainConfig).parse_args()
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')

One can equivalently use the pyrallis.wrap syntax 😎

@pyrallis.wrap()
def main(cfg: TrainConfig):
    # The decorator automagically uses the type hint to parse arguments into TrainConfig
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')

We will use this syntax for the rest of our tutorial.
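The decorator can likewise be approximated in a few lines: it reads the type hint of the wrapped function's first parameter to find the config class, parses the arguments, and calls the function with the result. The wrap_sketch name and its argv parameter are hypothetical (argv is there so the example runs without a real command line); this is not necessarily how pyrallis.wrap is written.

```python
import argparse
import functools
import inspect
import typing
from dataclasses import dataclass, field, fields


@dataclass
class TrainConfig:
    workers: int = field(default=8)
    exp_name: str = field(default='default_exp')


def wrap_sketch(argv=None):
    """Hypothetical stand-in for a pyrallis.wrap-style decorator (illustration only)."""
    def decorator(fn):
        # Find the config dataclass from the type hint of the function's first parameter
        first_param = next(iter(inspect.signature(fn).parameters))
        config_class = typing.get_type_hints(fn)[first_param]

        @functools.wraps(fn)
        def wrapper():
            parser = argparse.ArgumentParser()
            for f in fields(config_class):
                parser.add_argument(f'--{f.name}', type=f.type, default=f.default)
            # argv=None makes argparse read sys.argv[1:], as a real script would
            cfg = config_class(**vars(parser.parse_args(argv)))
            return fn(cfg)
        return wrapper
    return decorator


@wrap_sketch(argv=['--workers=2'])
def main(cfg: TrainConfig):
    return f'Training {cfg.exp_name} with {cfg.workers} workers...'


if __name__ == '__main__':
    print(main())  # Training default_exp with 2 workers...
```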

:dragon: 3/5 Better Configs Using Inherent dataclass Features :dragon:

When using a dataclass, we can add functionality through existing dataclass features, such as the __post_init__ mechanism or @property methods :grin:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import pyrallis


@dataclass
class TrainConfig:
    """ Training config for Machine Learning """
    # The number of workers for training
    workers: int = field(default=8)
    # The number of workers for evaluation
    eval_workers: Optional[int] = field(default=None)
    # The experiment name
    exp_name: str = field(default='default_exp')
    # The experiment root folder path
    exp_root: Path = field(default=Path('/share/experiments'))

    def __post_init__(self):
        # A builtin method of dataclasses, used for post-processing our configuration.
        self.eval_workers = self.eval_workers or self.workers

    @property
    def exp_dir(self) -> Path:
        # Properties are great for arguments that can be derived from existing ones
        return self.exp_root / self.exp_name


@pyrallis.wrap()
def main(cfg: TrainConfig):
    print(f'Training {cfg.exp_name}...')
    print(f'\tUsing {cfg.workers} workers and {cfg.eval_workers} evaluation workers')
    print(f'\tSaving to {cfg.exp_dir}')


if __name__ == '__main__':
    main()

$ python train_model.py --exp_name=my_second_exp --workers=42
Training my_second_exp...
    Using 42 workers and 42 evaluation workers
    Saving to /share/experiments/my_second_exp
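Since the parsed arguments are simply used to initialize the dataclass, __post_init__ and properties run exactly as they would in plain Python, so you can check them without touching the command line. The sketch below swaps the `or` shortcut for an `is None` check; this is our own variant, not the original example: with `or`, an explicit eval_workers=0 would be silently replaced by workers.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class TrainConfig:
    workers: int = field(default=8)
    eval_workers: Optional[int] = field(default=None)
    exp_name: str = field(default='default_exp')
    exp_root: Path = field(default=Path('/share/experiments'))

    def __post_init__(self):
        # Use "is None" so an explicit eval_workers=0 is kept rather than overridden
        if self.eval_workers is None:
            self.eval_workers = self.workers

    @property
    def exp_dir(self) -> Path:
        return self.exp_root / self.exp_name


cfg = TrainConfig(exp_name='my_second_exp', workers=42)
print(cfg.eval_workers)          # 42, filled in by __post_init__
print(cfg.exp_dir.as_posix())    # /share/experiments/my_second_exp
```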

Notice that in all examples we use the explicit dataclass.field syntax. This isn't a requirement of pyrallis but rather a style choice. Since some of your arguments will probably require dataclass.field (mutable types, for example), we find it cleaner to always use the same notation.

:dragon: 4/5 Building Hierarchical Configurations :dragon:

Sometimes configs get too complex for a flat hierarchy 😕. Luckily, pyrallis supports nested dataclasses 💥

@dataclass
class ComputeConfig:
    """ Config for training resources """
    # The number of workers for training
    workers: int = field(default=8)
    # The number of workers for evaluation
    eval_workers: Optional[int] = field(default=None)

    def __post_init__(self):
        # A builtin method of dataclasses, used for post-processing our configuration.
        self.eval_workers = self.eval_workers or self.workers


@dataclass
class LogConfig:
    """ Config for logging arguments """
    # The experiment name
    exp_name: str = field(default='default_exp')
    # The experiment root folder path
    exp_root: Path = field(default=Path('/share/experiments'))

    @property
    def exp_dir(self) -> Path:
        # Properties are great for arguments that can be derived from existing ones
        return self.exp_root / self.exp_name

# TrainConfig will be our main configuration class.
# Notice that default_factory is the standard way to give a dataclass field a default that is itself an object (here, a nested config)

@dataclass
class TrainConfig:
    log: LogConfig = field(default_factory=LogConfig)
    compute: ComputeConfig = field(default_factory=ComputeConfig)

@pyrallis.wrap()
def main(cfg: TrainConfig):
    print(f'Training {cfg.log.exp_name}...')
    print(f'\tUsing {cfg.compute.workers} workers and {cfg.compute.eval_workers} evaluation workers')
    print(f'\tSaving to {cfg.log.exp_dir}')

The argument parser will be updated accordingly:

$ python train_model.py --log.exp_name=my_third_exp --compute.eval_workers=2
Training my_third_exp...
    Using 8 workers and 2 evaluation workers
    Saving to /share/experiments/my_third_exp
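One way to picture how dotted flags such as --log.exp_name reach the nested dataclasses is to group them by prefix and recurse. The build helper below is a hypothetical, stdlib-only sketch of that idea (it converts strings by calling the field type and unwraps Optional); pyrallis's actual decoding supports far more types.

```python
import typing
from dataclasses import dataclass, field, fields, is_dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ComputeConfig:
    workers: int = field(default=8)
    eval_workers: Optional[int] = field(default=None)

    def __post_init__(self):
        self.eval_workers = self.eval_workers or self.workers


@dataclass
class LogConfig:
    exp_name: str = field(default='default_exp')
    exp_root: Path = field(default=Path('/share/experiments'))


@dataclass
class TrainConfig:
    log: LogConfig = field(default_factory=LogConfig)
    compute: ComputeConfig = field(default_factory=ComputeConfig)


def build(config_class, flat: dict):
    """Build a (possibly nested) dataclass from {'log.exp_name': '...'}-style strings.

    Hypothetical sketch for illustration only.
    """
    hints = typing.get_type_hints(config_class)
    kwargs = {}
    for f in fields(config_class):
        tp = hints[f.name]
        if is_dataclass(tp):
            # Collect the keys under this field's prefix and recurse into the nested dataclass
            prefix = f.name + '.'
            sub = {k[len(prefix):]: v for k, v in flat.items() if k.startswith(prefix)}
            kwargs[f.name] = build(tp, sub)
        elif f.name in flat:
            # Unwrap Optional[X] to X before converting the string value
            args = [a for a in typing.get_args(tp) if a is not type(None)]
            kwargs[f.name] = (args[0] if args else tp)(flat[f.name])
        # Fields missing from `flat` keep their dataclass defaults
    return config_class(**kwargs)


cfg = build(TrainConfig, {'log.exp_name': 'my_third_exp', 'compute.eval_workers': '2'})
print(cfg.log.exp_name, cfg.compute.workers, cfg.compute.eval_workers)  # my_third_exp 8 2
```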

:dragon: 5/5 Easy Serialization with pyrallis.dump :dragon:

As your config gets longer, you will probably want to start working with configuration files. Pyrallis supports encoding a dataclass configuration into a yaml file 💾

The command pyrallis.dump(cfg, open('run_config.yaml','w')) will result in the following yaml file:

compute:
  eval_workers: 2
  workers: 8
log:
  exp_name: my_third_exp
  exp_root: /share/experiments

pyrallis.dump extends yaml.dump and uses the same syntax.
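To see what a serialized config contains without installing anything, the standard dataclasses.asdict can turn the nested config into plain dicts, with Path values converted to strings. The to_plain helper is hypothetical, and json.dumps stands in for yaml.dump here. The sorted keys match the alphabetical order of the file above, which is yaml.dump's default behavior (sort_keys=True); with PyYAML installed, yaml.dump(plain) should produce text like that file.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ComputeConfig:
    workers: int = field(default=8)
    eval_workers: Optional[int] = field(default=None)

    def __post_init__(self):
        self.eval_workers = self.eval_workers or self.workers


@dataclass
class LogConfig:
    exp_name: str = field(default='default_exp')
    exp_root: Path = field(default=Path('/share/experiments'))


@dataclass
class TrainConfig:
    log: LogConfig = field(default_factory=LogConfig)
    compute: ComputeConfig = field(default_factory=ComputeConfig)


def to_plain(obj):
    """Recursively convert values into YAML/JSON-friendly builtins (hypothetical helper)."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, Path):
        return obj.as_posix()
    return obj


cfg = TrainConfig(log=LogConfig(exp_name='my_third_exp'),
                  compute=ComputeConfig(eval_workers=2))
# asdict recurses into nested dataclasses and returns nested dicts
plain = to_plain(asdict(cfg))
# sort_keys=True mirrors the alphabetical key order of the yaml file above
print(json.dumps(plain, indent=2, sort_keys=True))
```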

Configuration files can also be loaded back into a dataclass, and can even be used together with the command-line arguments.

cfg = pyrallis.ArgumentParser(config_class=TrainConfig,
                              config_path='/share/configs/config.yaml').parse_args()

# or the decorator syntax
@pyrallis.wrap(config_path='/share/configs/config.yaml')
def main(cfg: TrainConfig):
    ...

# or with the CONFIG argument from the command line
$ python my_script.py --log.exp_name=readme_exp --CONFIG=/share/configs/config.yaml

# Or, if you just want to load from a .yaml file without command-line parsing
cfg = pyrallis.load(TrainConfig, '/share/configs/config.yaml')

Command-line arguments have a higher priority and will override the configuration file.
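The priority rule can be illustrated as a recursive dictionary merge in which later sources win. The deep_merge helper is a hypothetical sketch, not pyrallis's internals, and it assumes that anything set in neither the file nor the command line falls back to the dataclass defaults.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values from `override` win, merging nested dicts recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {'log': {'exp_name': 'default_exp', 'exp_root': '/share/experiments'},
            'compute': {'workers': 8, 'eval_workers': None}}
from_file = {'log': {'exp_name': 'my_third_exp'}, 'compute': {'eval_workers': 2}}
from_cli = {'log': {'exp_name': 'readme_exp'}}

# Later sources take priority: defaults < config file < command line
final = deep_merge(deep_merge(defaults, from_file), from_cli)
print(final)
```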

Finally, one can easily extend the serialization to support new types 🔥

import numpy as np
import pyrallis

# For decoding from cmd/yaml
pyrallis.decode.register(np.ndarray, np.asarray)

# For encoding to yaml
pyrallis.encode.register(np.ndarray, lambda x: str(list(x)))

# Or with the decorator version instead (use only one of the two encoding forms)
@pyrallis.encode.register
def encode_array(arr: np.ndarray) -> str:
    return str(list(arr))
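Both encoding registration forms above follow the interface of the standard functools.singledispatch: a call with a type and a function, or a decorator that reads the type from the first parameter's annotation. The sketch below uses singledispatch directly, with fractions.Fraction and a hypothetical Interval class standing in for np.ndarray so it runs without NumPy. Decoding is keyed on the target field type rather than on the value's type, so a small hypothetical dict registry models it instead; neither part is pyrallis's actual code.

```python
from fractions import Fraction
from functools import singledispatch


@singledispatch
def encode(obj):
    # Fallback: values that need no special handling are returned unchanged
    return obj


# Function-call form, mirroring encode.register(np.ndarray, ...) above
encode.register(Fraction, lambda x: f'{x.numerator}/{x.denominator}')


class Interval:
    """Hypothetical custom type used only for this example."""
    def __init__(self, low, high):
        self.low, self.high = low, high


# Decorator form: the dispatch type is read from the first parameter's annotation
@encode.register
def encode_interval(obj: Interval) -> list:
    return [obj.low, obj.high]


# Decoding dispatches on the *target* field type (the raw value comes from the
# command line or yaml), so a plain dict keyed by type is used instead.
_decoders = {}


def register_decoder(cls, fn):
    _decoders[cls] = fn


def decode(cls, raw):
    # Fall back to calling the type itself, e.g. int('3') or Fraction('3/4')
    return _decoders.get(cls, cls)(raw)


register_decoder(Interval, lambda raw: Interval(*raw))

print(encode(Fraction(3, 4)), encode(Interval(0, 5)), encode(7))  # 3/4 [0, 5] 7
```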

🐲 That's it, you are now a pyrallis expert! 🐲

Why Another Parsing Library?

XKCD 927 - Standards

The builtin argparse has many great features but is somewhat outdated :older_man:, with one of its greatest weaknesses being the lack of typing. This has led to the development of many great libraries tackling different weaknesses of argparse (shout-out to all the great projects out there! You rock! :metal:).

In our case, we were looking for a library that would support the vanilla dataclass without requiring dedicated classes, and would have a loading interface from both the command line and files. The closest candidates were hydra and simple-parsing, but they weren't exactly what we were looking for. Below are the pros and cons from our perspective:

Hydra

A framework for elegantly configuring complex applications from Facebook Research.

  • Supports complex configuration from multiple files and allows for overriding them from command-line.
  • Does not support non-standard types, does not play nicely with dataclass.__post_init__, and requires a ConfigStore registration.

SimpleParsing

A framework for simple, elegant and typed Argument Parsing by Fabrice Normandin

  • Strong integration with argparse, support for nested configurations together with standard arguments.
  • No support for joint loading from command-line and files, dataclasses are still wrapped by a Namespace, requires dedicated classes for serialization.

We decided to create a simple hybrid of the two approaches, building on SimpleParsing with some hydra features in mind. The result, pyrallis, is a simple library that is relatively low on features, but hopefully excels at what it does.

If pyrallis isn't what you're looking for, we strongly advise you to give hydra and SimpleParsing a try (other interesting options include click, ext_argparse, jsonargparse, datargs and tap). If you do :heart: pyrallis, then welcome aboard! We're gonna have a great journey together! :speedboat: :dragon_face:

Design Choices and Some More

Uniform Parsing Syntax

For parsing files we opted for yaml as our format of choice, following hydra, due to its concise format. Now, let us assume we have the following .yaml file, which yaml handles successfully:

compute:
  worker_inds: [0,2,3]

Intuitively, we would also want users to be able to use the same syntax on the command line:

python my_app.py --compute.worker_inds=[0,2,3]

However, the more standard syntax for an argparse application would be:

python my_app.py --compute.worker_inds 0 2 3

We decided to use the same syntax as in the yaml files to avoid confusion when loading from multiple sources.
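Both spellings can end up as the same Python list; the difference is only in how the value is written. In the sketch below, json.loads stands in for a yaml parser (the flow-style list [0,2,3] is valid in both), and the parse_yaml_style helper is hypothetical rather than how pyrallis reads arguments. The second half shows the conventional argparse alternative using nargs='+'.

```python
import argparse
import json


def parse_yaml_style(arg: str):
    """Split '--key=value' and decode the value as a flow-style literal (hypothetical helper).

    json.loads stands in for a yaml parser; both accept the [0,2,3] list syntax.
    """
    key, _, raw = arg.lstrip('-').partition('=')
    return key, json.loads(raw)


# yaml-style flag, matching the syntax used in the config file
print(parse_yaml_style('--compute.worker_inds=[0,2,3]'))  # ('compute.worker_inds', [0, 2, 3])

# Conventional argparse style, where each list element is a separate token
parser = argparse.ArgumentParser()
parser.add_argument('--compute.worker_inds', type=int, nargs='+', dest='worker_inds')
print(parser.parse_args(['--compute.worker_inds', '0', '2', '3']).worker_inds)  # [0, 2, 3]
```

Since shells may treat square brackets as glob characters, quoting the whole argument, as in '--compute.worker_inds=[0,2,3]', is a safe habit.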

Beware of Mutable Types (or use pyrallis.field)

Dataclasses are great (really!) but using mutable fields can sometimes be confusing. For example, say we try to write the following dataclass:

@dataclass
class OptimConfig:
    worker_inds: List[int] = []
    # Or the more explicit version
    worker_inds: List[int] = field(default=[])

As [] is mutable, every instance of this dataclass would share the same list object, so dataclasses rejects such a default with a ValueError. Instead, dataclasses directs you to default_factory, which calls a factory function to generate the field for every new instance of your dataclass:

worker_inds: List[int] = field(default_factory=list)
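The failure mode is easy to reproduce with the standard library alone: the mutable default is rejected when the class is created, while default_factory gives every instance its own list. The make_bad_config wrapper is a hypothetical helper that only exists so the error can be caught and inspected.

```python
from dataclasses import dataclass, field
from typing import List


def make_bad_config():
    # Defining the class inside a function lets us catch the error it raises
    @dataclass
    class OptimConfig:
        worker_inds: List[int] = field(default=[])
    return OptimConfig


try:
    make_bad_config()
except ValueError as err:
    # dataclasses rejects list/dict/set defaults at class-creation time
    print(f'ValueError: {err}')


@dataclass
class FixedOptimConfig:
    # default_factory is called once per instance, so each gets its own list
    worker_inds: List[int] = field(default_factory=list)


a, b = FixedOptimConfig(), FixedOptimConfig()
a.worker_inds.append(0)
print(a.worker_inds, b.worker_inds)  # [0] []
```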

Now, this works great for empty collections, but what would be the alternative for a non-empty default such as

worker_inds: List[int] = field(default=[1,2,3])

Well, you would have to create a dedicated factory function that regenerates the object, for example:

worker_inds: List[int] = field(default_factory=lambda: [1, 2, 3])

Kind of annoying, and it could be confusing for a new guest reading your code :confused:. Now, while this isn't really related to parsing or configuration, we decided it would be nice to offer some syntactic sugar for such cases as part of pyrallis:

from pyrallis import field
worker_inds: List[int] = field(default=[1,2,3], is_mutable=True)

pyrallis.field behaves like the regular dataclasses.field, with an additional is_mutable flag. When it is toggled, the default_factory is created automatically, offering the same functionality with a more reader-friendly syntax.
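The is_mutable idea can be sketched with the standard library by wrapping the default in a factory that returns a fresh copy for each instance. The mutable_field helper below is hypothetical and deep-copies the default; we don't claim pyrallis.field copies values the same way.

```python
import copy
from dataclasses import dataclass, field
from typing import List


def mutable_field(default, **kwargs):
    """Hypothetical helper: wrap a mutable default in a default_factory.

    Each instance receives a deep copy of `default`, so instances never share state.
    """
    return field(default_factory=lambda: copy.deepcopy(default), **kwargs)


@dataclass
class OptimConfig:
    worker_inds: List[int] = mutable_field([1, 2, 3])


a, b = OptimConfig(), OptimConfig()
a.worker_inds.append(4)
print(a.worker_inds, b.worker_inds)  # [1, 2, 3, 4] [1, 2, 3]
```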

TODOs:

  • Create documentation page?
      ◦ Create full documentation with mkdocs
  • Improve warnings and logs
  • Think about relative paths

Download files

Source Distribution

pyrallis-0.1.1.tar.gz (32.5 kB)

Uploaded Source

Built Distribution

pyrallis-0.1.1-py3-none-any.whl (32.6 kB)

Uploaded Python 3

File details

Details for the file pyrallis-0.1.1.tar.gz.

File metadata

  • Download URL: pyrallis-0.1.1.tar.gz
  • Upload date:
  • Size: 32.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for pyrallis-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a9e30955584d1db7275f4b60da40973596652f408c129c19e0b373b1f482c162
MD5 6c6a4b842a816c409e65583fe456f193
BLAKE2b-256 a571fe044e8a8cf9ca8905244b6d642cc2ab49bb09b561caec6c5d86c16083d1

File details

Details for the file pyrallis-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pyrallis-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 32.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for pyrallis-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f669b91af743e8f16c16f8b09e9f2f571d746c3b44c762282db3bf74e986890
MD5 5f975f1a80c6075fff2e034cba59010b
BLAKE2b-256 d934f7a9573c7e9210e27899b71bd37c3906746436835e4632ceae4c34e53f16
