Strongly typed, zero effort CLIs

dcargs

Overview

dcargs is a library for strongly-typed argument parsers and configuration objects.

pip install dcargs

Our core interface generates CLIs from type-annotated callables, which may be functions, classes, or dataclasses. The goal is a tool that's lightweight enough for simple interactive scripts, but powerful enough to replace the heavier frameworks typically used to build hierarchical configuration systems.

dcargs.cli(f: Callable[..., T], *, description: Optional[str] = None, args: Optional[Sequence[str]] = None, default_instance: Optional[T] = None) -> T
Call `f(...)`, with arguments populated from an automatically generated CLI
interface.

`f` should have type-annotated inputs, and can be a function, class, or dataclass.
Note that if `f` is a class, `dcargs.cli()` returns an instance.

The parser is generated by populating helptext from docstrings and types from
annotations. A broad range of core type annotations is supported:
    - Types natively accepted by `argparse`: str, int, float, pathlib.Path, etc.
    - Default values for optional parameters.
    - Booleans, which are automatically converted to flags when provided a default
      value.
    - Enums (via `enum.Enum`).
    - Various container types. Some examples:
      - `typing.ClassVar`.
      - `typing.Optional`.
      - `typing.Literal`.
      - `typing.Sequence`.
      - `typing.List`.
      - `typing.Tuple` types, such as `typing.Tuple[T1, T2, T3]` or
        `typing.Tuple[T, ...]`.
      - `typing.Set` types.
      - `typing.Final` types and `typing.Annotated`.
      - Nested combinations of the above: `Optional[Literal[T]]`,
        `Final[Optional[Sequence[T]]]`, etc.
    - Nested dataclasses.
      - Simple nesting.
      - Unions over nested dataclasses (subparsers).
      - Optional unions over nested dataclasses (optional subparsers).
    - Generic dataclasses (including nested generics).

Args:
    f: Callable with type-annotated inputs; can be a function, class, or dataclass.

Keyword Args:
    description: Description text for the parser, displayed when the --help flag is
        passed in. If not specified, the docstring of `f` is used. Mirrors the
        `description` argument of `argparse.ArgumentParser()`.
    args: If set, parse arguments from a sequence of strings instead of the
        command line. Mirrors the `args` argument of
        `argparse.ArgumentParser.parse_args()`.
    default_instance: An instance of `T` to use for default values; only supported
        if `T` is a dataclass type. Helpful for merging CLI arguments with values loaded
        from elsewhere (for example, a config object loaded from a YAML file).

Returns:
    The output of `f(...)`.
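
The mapping from annotations to parser behavior described above can be illustrated with a small standard-library sketch. This is not dcargs's implementation: `argparse_kwargs` is a hypothetical helper that covers only a few of the listed annotation kinds (`Optional`, `Literal`, `List`/`Sequence`, and enums), and the `Color` enum is made up for the demonstration.

```python
import argparse
import collections.abc
import enum
from typing import Any, Dict, List, Literal, Optional, Union, get_args, get_origin


def argparse_kwargs(annotation: Any) -> Dict[str, Any]:
    """Hypothetical helper: translate one annotation into add_argument() kwargs."""
    origin = get_origin(annotation)
    inner = get_args(annotation)

    # Optional[T] is Union[T, None]; unwrap it and parse the value as T.
    if origin is Union and type(None) in inner:
        (non_none,) = [a for a in inner if a is not type(None)]
        return argparse_kwargs(non_none)

    # Literal["a", "b"] becomes a fixed set of choices.
    if origin is Literal:
        return {"type": type(inner[0]), "choices": list(inner)}

    # List[T] and Sequence[T] accept one or more values of type T.
    if origin in (list, collections.abc.Sequence):
        return {"type": inner[0], "nargs": "+"}

    # Enum members are selected by name on the command line.
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        return {"type": lambda name: annotation[name], "choices": list(annotation)}

    # Anything else is assumed to be callable on a string (str, int, float, Path).
    return {"type": annotation}


class Color(enum.Enum):  # made-up enum, for illustration only
    RED = enum.auto()
    GREEN = enum.auto()


parser = argparse.ArgumentParser()
parser.add_argument("--mode", **argparse_kwargs(Literal["train", "eval"]))
parser.add_argument("--sizes", **argparse_kwargs(List[int]))
parser.add_argument("--color", **argparse_kwargs(Optional[Color]))
parsed = parser.parse_args(["--mode", "eval", "--sizes", "1", "2", "--color", "GREEN"])
print(parsed.mode, parsed.sizes, parsed.color)
```

The sketch leaves out tuples, sets, nested combinations, and subparsers, all of which the docstring above lists as supported by `dcargs.cli()`.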

Importantly, dcargs.cli() supports nested classes and dataclasses, which enable expressive hierarchical configuration objects built on standard Python features. Our goal is an interface that's:

  • Low-effort. Type annotations, docstrings, and default values can be used to automatically generate argument parsers with informative helptext. This includes bells and whistles like enums, containers, etc.
  • Strongly typed. Unlike dynamic configuration namespaces produced by libraries like argparse, YACS, abseil, hydra, or ml_collections, statically typed outputs mean that IDE-assisted autocomplete, rename, refactor, go-to-definition operations work out-of-the-box, as do static checking tools like mypy and pyright.
  • Modular. Most approaches to configuration objects require a centralized definition of all configurable fields. Supporting hierarchically nested configuration classes/dataclasses, however, makes it easy to distribute definitions, defaults, and documentation of configurable fields across modules or source files. A model configuration dataclass, for example, can be co-located in its entirety with the model implementation and dropped into any experiment configuration with an import — this eliminates redundancy and makes the entire module easy to port across codebases.
  • Noninvasive. Many popular approaches to argument parsing and configuration are treated as frameworks, with tentacles that squirm deep into project codebases.

Examples

A series of example scripts can be found in ./examples.

Functions

# examples/0_simple_function.py
import dcargs


def main(
    field1: str,
    field2: int,
    flag: bool = False,
) -> None:
    """Function, whose arguments will be populated from a CLI interface.

    Args:
        field1: First field.
        field2: Second field.
        flag: Boolean flag that we can set to true.
    """
    print(field1, field2, flag)


if __name__ == "__main__":
    dcargs.cli(main)

$ python 0_simple_function.py --help
usage: 0_simple_function.py [-h] --field1 STR --field2 INT [--flag]

Function, whose arguments will be populated from a CLI interface.

required arguments:
  --field1 STR  First field.
  --field2 INT  Second field.

optional arguments:
  -h, --help    show this help message and exit
  --flag        Boolean flag that we can set to true.
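
To give a feel for what happens behind a call like `dcargs.cli(main)`, here is a minimal sketch that builds an `argparse` parser from a function signature using only the standard library. The name `sketch_cli` is hypothetical and the logic is a simplified assumption, not dcargs's actual code: it handles plain types and `False`-default booleans only, and does not extract per-argument helptext from the docstring.

```python
import argparse
import inspect
from typing import Any, Callable, Optional, Sequence, get_type_hints


def sketch_cli(f: Callable[..., Any], args: Optional[Sequence[str]] = None) -> Any:
    """Hypothetical, simplified stand-in for the idea behind `dcargs.cli()`."""
    parser = argparse.ArgumentParser(description=inspect.getdoc(f))
    hints = get_type_hints(f)
    for name, param in inspect.signature(f).parameters.items():
        flag = "--" + name.replace("_", "-")
        annotation = hints[name]
        has_default = param.default is not inspect.Parameter.empty
        if annotation is bool and param.default is False:
            # Booleans that default to False become on/off flags.
            parser.add_argument(flag, dest=name, action="store_true")
        elif has_default:
            parser.add_argument(flag, dest=name, type=annotation, default=param.default)
        else:
            # Parameters without a default are required on the command line.
            parser.add_argument(flag, dest=name, type=annotation, required=True)
    namespace = parser.parse_args(args)
    return f(**vars(namespace))


def main(field1: str, field2: int, flag: bool = False) -> tuple:
    """Function whose arguments are populated from the command line."""
    return (field1, field2, flag)


# Passing `args` explicitly avoids reading sys.argv, which is handy in tests.
result = sketch_cli(main, args=["--field1", "hello", "--field2", "3", "--flag"])
print(result)
```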

Dataclasses

# examples/1_simple_dataclass.py
import dataclasses

import dcargs


@dataclasses.dataclass
class Args:
    """Description.
    This should show up in the helptext!"""

    field1: str  # A string field.
    field2: int  # A numeric field.
    flag: bool = False  # A boolean flag.


if __name__ == "__main__":
    args = dcargs.cli(Args)
    print(args)

$ python 1_simple_dataclass.py --help
usage: 1_simple_dataclass.py [-h] --field1 STR --field2 INT [--flag]

Description.
This should show up in the helptext!

required arguments:
  --field1 STR  A string field.
  --field2 INT  A numeric field.

optional arguments:
  -h, --help    show this help message and exit
  --flag        A boolean flag.
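
The `default_instance` argument described earlier merges CLI arguments with values loaded from elsewhere. A minimal standard-library sketch of that merging idea follows. `sketch_dataclass_cli` is a hypothetical name, the logic is an assumption rather than dcargs's implementation, and the sketch only handles flat dataclasses with simple field types.

```python
import argparse
import dataclasses
from typing import Optional, Sequence, Type, TypeVar

T = TypeVar("T")


def sketch_dataclass_cli(
    cls: Type[T],
    args: Optional[Sequence[str]] = None,
    default_instance: Optional[T] = None,
) -> T:
    """Hypothetical sketch: defaults come from `default_instance` when given."""
    parser = argparse.ArgumentParser(description=cls.__doc__)
    for field in dataclasses.fields(cls):
        flag = "--" + field.name.replace("_", "-")
        if default_instance is not None:
            # Values from the instance take priority over the class defaults.
            default = getattr(default_instance, field.name)
        else:
            default = field.default  # dataclasses.MISSING if there is none
        if default is dataclasses.MISSING:
            parser.add_argument(flag, dest=field.name, type=field.type, required=True)
        else:
            parser.add_argument(flag, dest=field.name, type=field.type, default=default)
    return cls(**vars(parser.parse_args(args)))


@dataclasses.dataclass
class Args:
    field1: str
    field2: int = 1


# Pretend `loaded` came from a config file; the CLI then overrides one field.
loaded = Args(field1="from-file", field2=5)
merged = sketch_dataclass_cli(Args, args=["--field2", "7"], default_instance=loaded)
print(merged)
```

Because every field of `loaded` supplies a default, `--field1` is no longer required in the merged parser, while any flag given on the command line still wins.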

Nested dataclasses

# examples/6_nested_dataclasses.py
import dataclasses
import enum

import dcargs


class OptimizerType(enum.Enum):
    ADAM = enum.auto()
    SGD = enum.auto()


@dataclasses.dataclass(frozen=True)
class OptimizerConfig:
    # Gradient-based optimizer to use.
    algorithm: OptimizerType = OptimizerType.ADAM

    # Learning rate to use.
    learning_rate: float = 3e-4

    # Coefficient for L2 regularization.
    weight_decay: float = 1e-2


@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    """A nested experiment configuration. Note that the argument parser description is
    pulled from this docstring by default, but can also be overridden with
    `dcargs.cli()`'s `description=` flag."""

    # Experiment name to use.
    experiment_name: str

    # Various configurable options for our optimizer.
    optimizer: OptimizerConfig

    # Random seed. This is helpful for making sure that our experiments are all
    # reproducible!
    seed: int = 0


if __name__ == "__main__":
    config = dcargs.cli(ExperimentConfig)
    print(config)
    print(dcargs.to_yaml(config))

$ python 6_nested_dataclasses.py --help
usage: 6_nested_dataclasses.py [-h] --experiment-name STR [--optimizer.algorithm {ADAM,SGD}]
                               [--optimizer.learning-rate FLOAT] [--optimizer.weight-decay FLOAT]
                               [--seed INT]

A nested experiment configuration. Note that the argument parser description is
pulled from this docstring by default, but can also be overridden with
`dcargs.cli()`'s `description=` flag.

required arguments:
  --experiment-name STR
                        Experiment name to use.

optional arguments:
  -h, --help            show this help message and exit
  --seed INT            Random seed. This is helpful for making sure that our experiments are all
                        reproducible! (default: 0)

optional optimizer arguments:
  Various configurable options for our optimizer.

  --optimizer.algorithm {ADAM,SGD}
                        Gradient-based optimizer to use. (default: ADAM)
  --optimizer.learning-rate FLOAT
                        Learning rate to use. (default: 0.0003)
  --optimizer.weight-decay FLOAT
                        Coefficient for L2 regularization. (default: 0.01)
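
The dotted flags in the help output above (`--optimizer.learning-rate` and so on) suggest how nesting can be flattened into a single parser. A minimal sketch of that idea using only the standard library is below. The helpers `add_fields` and `build` are hypothetical names, the enum field is left out for brevity, and this is an assumption about the general technique rather than a description of dcargs internals.

```python
import argparse
import dataclasses
from typing import Any, Dict


def add_fields(parser: argparse.ArgumentParser, cls: type, prefix: str = "") -> None:
    """Hypothetical helper: register one flag per leaf field, prefixed by its path."""
    for field in dataclasses.fields(cls):
        if dataclasses.is_dataclass(field.type):
            # Recurse into nested dataclasses, extending the dotted prefix.
            add_fields(parser, field.type, prefix + field.name + ".")
            continue
        dest = prefix + field.name
        flag = "--" + dest.replace("_", "-")
        if field.default is dataclasses.MISSING:
            parser.add_argument(flag, dest=dest, type=field.type, required=True)
        else:
            parser.add_argument(flag, dest=dest, type=field.type, default=field.default)


def build(cls: type, values: Dict[str, Any], prefix: str = "") -> Any:
    """Hypothetical helper: rebuild the nested instance from the flat values."""
    kwargs = {}
    for field in dataclasses.fields(cls):
        if dataclasses.is_dataclass(field.type):
            kwargs[field.name] = build(field.type, values, prefix + field.name + ".")
        else:
            kwargs[field.name] = values[prefix + field.name]
    return cls(**kwargs)


@dataclasses.dataclass(frozen=True)
class OptimizerConfig:
    learning_rate: float = 3e-4
    weight_decay: float = 1e-2


@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    experiment_name: str
    optimizer: OptimizerConfig
    seed: int = 0


parser = argparse.ArgumentParser()
add_fields(parser, ExperimentConfig)
namespace = parser.parse_args(
    ["--experiment-name", "demo", "--optimizer.learning-rate", "0.1"]
)
config = build(ExperimentConfig, vars(namespace))
print(config)
```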

Serialization

As a secondary feature aimed at enabling the use of dcargs.cli() for general configuration use cases, we also introduce functions for human-readable dataclass serialization:

  • dcargs.from_yaml(cls: Type[T], stream: Union[str, IO[str], bytes, IO[bytes]]) -> T and dcargs.to_yaml(instance: T) -> str convert between YAML-style strings and dataclass instances.

These functions attempt to strike a balance between flexibility and robustness. In contrast to naively dumping or loading dataclass instances (via pickle, PyYAML, etc.), explicit type references enable custom tags that are robust against code reorganization and refactoring, while a PyYAML backend enables serialization of arbitrary Python objects.
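
To illustrate why passing the target type explicitly helps robustness, here is a rough round-trip sketch. It uses the standard-library `json` module as a stand-in for YAML, and the helpers `to_json` and `from_json` are hypothetical; it does not reproduce dcargs's PyYAML backend or its custom tags. The point it shows is narrower: the serialized text stores only field names and values, and the class passed at load time decides how the data is rebuilt, so moving or renaming modules does not invalidate saved files.

```python
import dataclasses
import json
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def to_json(instance: Any) -> str:
    """Hypothetical helper: dump a (possibly nested) dataclass as readable text."""
    # asdict() recurses into nested dataclasses and returns plain dicts.
    return json.dumps(dataclasses.asdict(instance), indent=2)


def _from_dict(cls: type, data: Dict[str, Any]) -> Any:
    kwargs = {}
    for field in dataclasses.fields(cls):
        value = data[field.name]
        if dataclasses.is_dataclass(field.type):
            # The annotation, not the stored text, says which class to rebuild.
            value = _from_dict(field.type, value)
        kwargs[field.name] = value
    return cls(**kwargs)


def from_json(cls: Type[T], text: str) -> T:
    """Hypothetical helper: rebuild an instance of the explicitly given class."""
    return _from_dict(cls, json.loads(text))


@dataclasses.dataclass(frozen=True)
class OptimizerConfig:
    learning_rate: float = 3e-4
    weight_decay: float = 1e-2


@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    experiment_name: str
    optimizer: OptimizerConfig
    seed: int = 0


config = ExperimentConfig("demo", OptimizerConfig(learning_rate=0.1))
text = to_json(config)
restored = from_json(ExperimentConfig, text)
print(text)
```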

Alternative tools

The core functionality of dcargs, generating argument parsers from type annotations, can be found as a subset of the features offered by many other libraries. A summary of some distinguishing features:

Features compared: choices from literals, generics, docstrings as helptext, nesting, subparsers, containers.

Libraries compared:

  • dcargs
  • datargs
  • tap
  • simple-parsing (one entry listed as "soon")
  • argparse-dataclass
  • argparse-dataclasses
  • dataclass-cli
  • clout
  • hf_argparser
  • pyrallis

Note that most of these libraries are aimed specifically at dataclasses rather than general typed callables, but they offer other features that you might find useful, such as registration for custom types (pyrallis), different approaches for serialization and config files (tap, pyrallis), simultaneous parsing of multiple dataclasses (simple-parsing), etc.

Download files

Source Distribution

dcargs-0.1.0.tar.gz (26.2 kB)

Uploaded Source

Built Distribution

dcargs-0.1.0-py3-none-any.whl (26.4 kB)

Uploaded Python 3
