
Strongly typed, zero-effort CLIs


dcargs

pip install dcargs


Overview

dcargs is a library for typed CLI interfaces and configuration objects.

Our core interface generates argument parsers from type-annotated callables. In the simplest case, this can be used as a drop-in replacement for argparse:

With argparse:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--a", type=int, required=True)
parser.add_argument("--b", type=int, default=3)
args = parser.parse_args()

print(args.a + args.b)

With dcargs:

import dcargs

def main(a: int, b: int = 3) -> None:
    print(a + b)

dcargs.cli(main)

The broader goal is a replacement for tools like argparse, hydra, and ml_collections that's:

  • Low effort. Standard Python type annotations, docstrings, and default values are parsed to automatically generate argument parsers with informative helptext.

  • Expressive. dcargs.cli() understands functions, classes, and dataclasses (including nested classes and dataclasses), as well as frequently used annotations like unions, literals, collections, and generics. These can all be composed into hierarchical configuration objects built on standard Python features.

  • Typed. Unlike the dynamic configuration namespaces produced by libraries like argparse, YACS, abseil, hydra, or ml_collections, the outputs of dcargs.cli() are statically typed. IDE-assisted autocomplete, rename, refactor, and go-to-definition operations work out-of-the-box, as do static checking tools like mypy and pyright.

  • Modular. Most approaches to configuration objects require a centralized definition of all configurable fields. Hierarchically nesting configuration structures, however, makes it easy to distribute definitions, defaults, and documentation of configurable fields across modules or source files. A model configuration dataclass, for example, can be co-located in its entirety with the model implementation and dropped into any experiment configuration with an import. This eliminates redundancy and makes entire modules easy to port across codebases; a minimal sketch of this pattern is shown below.
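
A minimal sketch of this pattern (the ModelConfig and ExperimentConfig names are hypothetical; in a real project, ModelConfig would be imported from the module that implements the model):

import dataclasses

import dcargs


@dataclasses.dataclass(frozen=True)
class ModelConfig:
    # In practice, this dataclass would live alongside the model implementation and be
    # pulled in with a plain import, e.g. `from our_project.model import ModelConfig`.
    num_layers: int = 4
    units: int = 64


@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    # The model's fields, defaults, and documentation come along for free.
    model: ModelConfig = ModelConfig()

    # Experiment-level options.
    seed: int = 0


if __name__ == "__main__":
    # Exposes --model.num-layers, --model.units, and --seed on the command line.
    config = dcargs.cli(ExperimentConfig)
    print(config)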

API

def cli(
    f: Callable[..., T],
    *,
    prog: Optional[str] = None,
    description: Optional[str] = None,
    args: Optional[Sequence[str]] = None,
    default_instance: Optional[T] = None,
    avoid_subparsers: bool = False,
) -> T
Docstring
Call `f(...)`, with arguments populated from an automatically generated CLI
interface.

`f` should have type-annotated inputs, and can be a function or class. Note that if
`f` is a class, `dcargs.cli()` returns an instance.

The parser is generated by populating helptext from docstrings and types from
annotations; a broad range of core type annotations are supported...
    - Types natively accepted by `argparse`: str, int, float, pathlib.Path, etc.
    - Default values for optional parameters.
    - Booleans, which are automatically converted to flags when provided a default
      value.
    - Enums (via `enum.Enum`).
    - Various annotations from the standard typing library. Some examples:
      - `typing.ClassVar[T]`.
      - `typing.Optional[T]`.
      - `typing.Literal[T]`.
      - `typing.Sequence[T]`.
      - `typing.List[T]`.
      - `typing.Dict[K, V]`.
      - `typing.Tuple`, such as `typing.Tuple[T1, T2, T3]` or
        `typing.Tuple[T, ...]`.
      - `typing.Set[T]`.
      - `typing.Final[T]` and `typing.Annotated[T]`.
      - `typing.Union[T1, T2]`.
      - Various nested combinations of the above: `Optional[Literal[T]]`,
        `Final[Optional[Sequence[T]]]`, etc.
    - Hierarchical structures via nested dataclasses, TypedDict, NamedTuple,
      classes.
      - Simple nesting.
      - Unions over nested structures (subparsers).
      - Optional unions over nested structures (optional subparsers).
    - Generics (including nested generics).

Args:
    f: Callable.

Keyword Args:
    prog: The name of the program printed in helptext. Mirrors argument from
        `argparse.ArgumentParser()`.
    description: Description text for the parser, displayed when the --help flag is
        passed in. If not specified, `f`'s docstring is used. Mirrors argument from
        `argparse.ArgumentParser()`.
    args: If set, parse arguments from a sequence of strings instead of the
        commandline. Mirrors argument from `argparse.ArgumentParser.parse_args()`.
    default_instance: An instance of `T` to use for default values; only supported
        if `T` is a dataclass, TypedDict, or NamedTuple. Helpful for merging CLI
        arguments with values loaded from elsewhere (for example, a config object
        loaded from a YAML file).
    avoid_subparsers: Avoid creating a subparser when defaults are provided for
        unions over nested types. Generates cleaner but less expressive CLIs.

Returns:
    The output of `f(...)`.
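
For instance, a rough sketch of combining default_instance with args (the Config dataclass and config.yaml path below are hypothetical; dcargs.from_yaml() is described under Serialization):

import dataclasses
import pathlib

import dcargs


@dataclasses.dataclass
class Config:
    learning_rate: float = 3e-4
    seed: int = 0


# Use values loaded from elsewhere (here, a hypothetical YAML file previously written
# with dcargs.to_yaml()) as defaults; anything passed on the CLI overrides them.
defaults = dcargs.from_yaml(Config, pathlib.Path("config.yaml").read_text())
config = dcargs.cli(Config, default_instance=defaults)

# `args` parses from an explicit sequence instead of the command line (handy in tests).
config_for_test = dcargs.cli(Config, default_instance=defaults, args=["--seed", "42"])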

Examples

1. Functions

In the simplest case, dcargs.cli() can be used to run a function with arguments populated from the CLI.

Code:

import dcargs


def main(
    field1: str,
    field2: int = 3,
) -> None:
    """Function, whose arguments will be populated from a CLI interface.

    Args:
        field1: A string field.
        field2: A numeric field, with a default value.
    """
    print(field1, field2)


if __name__ == "__main__":
    dcargs.cli(main)

Example usage:

$ python ./01_functions.py --help
usage: 01_functions.py [-h] --field1 STR [--field2 INT]

Function, whose arguments will be populated from a CLI interface.

arguments:
  -h, --help    show this help message and exit
  --field1 STR  A string field. (required)
  --field2 INT  A numeric field, with a default value. (default: 3)
$ python ./01_functions.py --field1 hello
hello 3
$ python ./01_functions.py --field1 hello --field2 10
hello 10
2. Dataclasses

A common pattern: use dcargs.cli() to instantiate a dataclass. The resulting instance can be used as a typed alternative to an argparse namespace.

Code:

import dataclasses

import dcargs


@dataclasses.dataclass
class Args:
    """Description.
    This should show up in the helptext!"""

    field1: str  # A string field.
    field2: int = 3  # A numeric field, with a default value.


if __name__ == "__main__":
    args = dcargs.cli(Args)
    print(args)

Example usage:

$ python ./02_dataclasses.py --help
usage: 02_dataclasses.py [-h] --field1 STR [--field2 INT]

Description.
This should show up in the helptext!

arguments:
  -h, --help    show this help message and exit
  --field1 STR  A string field. (required)
  --field2 INT  A numeric field, with a default value. (default: 3)
$ python ./02_dataclasses.py --field1 hello
Args(field1='hello', field2=3)
$ python ./02_dataclasses.py --field1 hello --field2 5
Args(field1='hello', field2=5)
3. Enums And Containers

We can generate argument parsers from more advanced type annotations, like enums and tuple types.

Code:

import dataclasses
import enum
import pathlib
from typing import Optional, Tuple

import dcargs


class OptimizerType(enum.Enum):
    ADAM = enum.auto()
    SGD = enum.auto()


@dataclasses.dataclass(frozen=True)
class TrainConfig:
    # Example of a variable-length tuple. `typing.List`, `typing.Sequence`,
    # `typing.Set`, `typing.Dict`, etc are all supported as well.
    dataset_sources: Tuple[pathlib.Path, ...]
    """Paths to load training data from. This can be multiple!"""

    # Fixed-length tuples are also okay.
    image_dimensions: Tuple[int, int] = (32, 32)
    """Height and width of some image data."""

    # Enums are handled seamlessly.
    optimizer_type: OptimizerType = OptimizerType.ADAM
    """Gradient-based optimizer to use."""

    # We can also explicitly mark arguments as optional.
    checkpoint_interval: Optional[int] = None
    """Interval to save checkpoints at."""


if __name__ == "__main__":
    config = dcargs.cli(TrainConfig)
    print(config)

Example usage:

$ python ./03_enums_and_containers.py --help
usage: 03_enums_and_containers.py [-h] --dataset-sources PATH [PATH ...]
                                  [--image-dimensions INT INT]
                                  [--optimizer-type {ADAM,SGD}]
                                  [--checkpoint-interval {None}|INT]

arguments:
  -h, --help            show this help message and exit
  --dataset-sources PATH [PATH ...]
                        Paths to load training data from. This can be
                        multiple! (required)
  --image-dimensions INT INT
                        Height and width of some image data. (default: 32
                        32)
  --optimizer-type {ADAM,SGD}
                        Gradient-based optimizer to use. (default:
                        ADAM)
  --checkpoint-interval {None}|INT
                        Interval to save checkpoints at. (default:
                        None)
$ python ./03_enums_and_containers.py --dataset-sources ./data --image-dimensions 16 16
TrainConfig(dataset_sources=(PosixPath('data'),), image_dimensions=(16, 16), optimizer_type=<OptimizerType.ADAM: 1>, checkpoint_interval=None)
$ python ./03_enums_and_containers.py --dataset-sources ./data --optimizer-type SGD
TrainConfig(dataset_sources=(PosixPath('data'),), image_dimensions=(32, 32), optimizer_type=<OptimizerType.SGD: 2>, checkpoint_interval=None)
4. Flags

Booleans can either be expected to be explicitly passed in, or, if given a default value, automatically converted to flags.

Code:

import dataclasses
from typing import Optional

import dcargs


@dataclasses.dataclass
class Args:
    # Boolean. This expects an explicit "True" or "False".
    boolean: bool

    # Optional boolean. Same as above, but can be omitted.
    optional_boolean: Optional[bool]

    # Pass --flag-a in to set this value to True.
    flag_a: bool = False

    # Pass --no-flag-b in to set this value to False.
    flag_b: bool = True


if __name__ == "__main__":
    args = dcargs.cli(Args)
    print(args)

Example usage:

$ python ./04_flags.py --help
usage: 04_flags.py [-h] --boolean {True,False} --optional-boolean
                   {None,True,False} [--flag-a] [--no-flag-b]

arguments:
  -h, --help            show this help message and exit
  --boolean {True,False}
                        Boolean. This expects an explicit "True" or "False".
                        (required)
  --optional-boolean {None,True,False}
                        Optional boolean. Same as above, but can be omitted.
                        (required)
  --flag-a              Pass --flag-a in to set this value to True. (sets:
                        flag_a=True)
  --no-flag-b           Pass --no-flag-b in to set this value to False.
                        (sets: flag_b=False)
$ python ./04_flags.py --boolean True
usage: 04_flags.py [-h] --boolean {True,False} --optional-boolean
                   {None,True,False} [--flag-a] [--no-flag-b]
04_flags.py: error: the following arguments are required: --optional-boolean
$ python ./04_flags.py --boolean False --flag-a
usage: 04_flags.py [-h] --boolean {True,False} --optional-boolean
                   {None,True,False} [--flag-a] [--no-flag-b]
04_flags.py: error: the following arguments are required: --optional-boolean
$ python ./04_flags.py --boolean False --no-flag-b
usage: 04_flags.py [-h] --boolean {True,False} --optional-boolean
                   {None,True,False} [--flag-a] [--no-flag-b]
04_flags.py: error: the following arguments are required: --optional-boolean
5. Hierarchical Configs

Parsing of nested types (in this case nested dataclasses) enables hierarchical configuration objects that are both modular and highly expressive.

Code:

import dataclasses
import enum
import pathlib

import dcargs


class OptimizerType(enum.Enum):
    ADAM = enum.auto()
    SGD = enum.auto()


@dataclasses.dataclass(frozen=True)
class OptimizerConfig:
    # Gradient-based optimizer to use.
    algorithm: OptimizerType = OptimizerType.ADAM

    # Learning rate to use.
    learning_rate: float = 3e-4

    # Coefficient for L2 regularization.
    weight_decay: float = 1e-2


@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    # Various configurable options for our optimizer.
    optimizer: OptimizerConfig

    # Batch size.
    batch_size: int = 32

    # Total number of training steps.
    train_steps: int = 100_000

    # Random seed. This is helpful for making sure that our experiments are all
    # reproducible!
    seed: int = 0


def train(
    out_dir: pathlib.Path,
    config: ExperimentConfig,
    restore_checkpoint: bool = False,
    checkpoint_interval: int = 1000,
) -> None:
    """Train a model.

    Args:
        out_dir: Where to save logs and checkpoints.
        config: Experiment configuration.
        restore_checkpoint: Set to restore an existing checkpoint.
        checkpoint_interval: Training steps between each checkpoint save.
    """
    print(f"{out_dir=}, {restore_checkpoint=}, {checkpoint_interval=}")
    print(f"{config=}")
    print(dcargs.to_yaml(config))


if __name__ == "__main__":
    dcargs.cli(train)

Example usage:

$ python ./05_hierarchical_configs.py --help
usage: 05_hierarchical_configs.py [-h] --out-dir PATH
                                  [--config.optimizer.algorithm {ADAM,SGD}]
                                  [--config.optimizer.learning-rate FLOAT]
                                  [--config.optimizer.weight-decay FLOAT]
                                  [--config.batch-size INT]
                                  [--config.train-steps INT]
                                  [--config.seed INT] [--restore-checkpoint]
                                  [--checkpoint-interval INT]

Train a model.

arguments:
  -h, --help            show this help message and exit
  --out-dir PATH  Where to save logs and checkpoints.
                        (required)
  --restore-checkpoint  Set to restore an existing checkpoint. (sets:
                        restore_checkpoint=True)
  --checkpoint-interval INT
                        Training steps between each checkpoint save.
                        (default: 1000)

config.optimizer arguments:
  Various configurable options for our optimizer.

  --config.optimizer.algorithm {ADAM,SGD}
                        Gradient-based optimizer to use. (default:
                        ADAM)
  --config.optimizer.learning-rate FLOAT
                        Learning rate to use. (default: 0.0003)
  --config.optimizer.weight-decay FLOAT
                        Coefficient for L2 regularization. (default:
                        0.01)

config arguments:
  Experiment configuration.

  --config.batch-size INT
                        Batch size. (default: 32)
  --config.train-steps INT
                        Total number of training steps. (default:
                        100000)
  --config.seed INT  Random seed. This is helpful for making sure that our
                        experiments are all reproducible! (default: 0)
$ python ./05_hierarchical_configs.py . --config.optimizer.algorithm SGD
usage: 05_hierarchical_configs.py [-h] --out-dir PATH
                                  [--config.optimizer.algorithm {ADAM,SGD}]
                                  [--config.optimizer.learning-rate FLOAT]
                                  [--config.optimizer.weight-decay FLOAT]
                                  [--config.batch-size INT]
                                  [--config.train-steps INT]
                                  [--config.seed INT] [--restore-checkpoint]
                                  [--checkpoint-interval INT]
05_hierarchical_configs.py: error: the following arguments are required: --out-dir
$ python ./05_hierarchical_configs.py . --restore-checkpoint
usage: 05_hierarchical_configs.py [-h] --out-dir PATH
                                  [--config.optimizer.algorithm {ADAM,SGD}]
                                  [--config.optimizer.learning-rate FLOAT]
                                  [--config.optimizer.weight-decay FLOAT]
                                  [--config.batch-size INT]
                                  [--config.train-steps INT]
                                  [--config.seed INT] [--restore-checkpoint]
                                  [--checkpoint-interval INT]
05_hierarchical_configs.py: error: the following arguments are required: --out-dir
6. Base Configs

We can integrate dcargs.cli() into common configuration patterns: here, we select one of multiple possible base configurations, and then use the CLI to either override (existing) or fill in (missing) values.

Code:

import sys
from dataclasses import dataclass
from typing import Callable, Dict, Literal, Tuple, TypeVar, Union

from torch import nn

import dcargs


@dataclass(frozen=True)
class AdamOptimizer:
    learning_rate: float = 1e-3
    betas: Tuple[float, float] = (0.9, 0.999)


@dataclass(frozen=True)
class SgdOptimizer:
    learning_rate: float = 3e-4


@dataclass(frozen=True)
class ExperimentConfig:
    # Dataset to run experiment on.
    dataset: Literal["mnist", "imagenet-50"]

    # Optimizer parameters.
    optimizer: Union[AdamOptimizer, SgdOptimizer]

    # Model size.
    num_layers: int
    units: int

    # Batch size.
    batch_size: int

    # Total number of training steps.
    train_steps: int

    # Random seed. This is helpful for making sure that our experiments are all
    # reproducible!
    seed: int

    # Activation to use. Not specifiable via the commandline.
    activation: Callable[[], nn.Module]


# Note that we could also define this library using separate YAML files (similar to
# `config_path`/`config_name` in Hydra), but staying in Python enables seamless type
# checking + IDE support.
base_configs = {
    "small": ExperimentConfig(
        dataset="mnist",
        optimizer=SgdOptimizer(),
        batch_size=2048,
        num_layers=4,
        units=64,
        train_steps=30_000,
        # The dcargs.MISSING sentinel allows us to specify that the seed should have no
        # default, and needs to be populated from the CLI.
        seed=dcargs.MISSING,
        activation=nn.ReLU,
    ),
    "big": ExperimentConfig(
        dataset="imagenet-50",
        optimizer=AdamOptimizer(),
        batch_size=32,
        num_layers=8,
        units=256,
        train_steps=100_000,
        seed=dcargs.MISSING,
        activation=nn.GELU,
    ),
}


T = TypeVar("T")


def cli_from_base_configs(base_library: Dict[str, T]) -> T:
    """Populate an instance of `cls`, where the first positional argument is used to
    select from a library of named base configs."""
    # Get base configuration name from the first positional argument.
    if len(sys.argv) < 2 or sys.argv[1] not in base_library:
        valid_usages = map(lambda k: f"{sys.argv[0]} {k} --help", base_library.keys())
        raise SystemExit("usage:\n  " + "\n  ".join(valid_usages))

    # Get base configuration from our library, and use it for default CLI parameters.
    default_instance = base_library[sys.argv[1]]
    return dcargs.cli(
        type(default_instance),
        prog=" ".join(sys.argv[:2]),
        args=sys.argv[2:],
        default_instance=default_instance,
        # `avoid_subparsers` will avoid making a subparser for unions when a default is
        # provided; in this case, it simplifies our CLI but makes it less expressive
        # (cannot switch away from the base optimizer types).
        avoid_subparsers=True,
    )


if __name__ == "__main__":
    config = cli_from_base_configs(base_configs)
    print(config)

Example usage:

$ python ./06_base_configs_argv.py
usage:
  examples/06_base_configs.py small --help
  examples/06_base_configs.py big --help
$ python ./06_base_configs_argv.py small --help
usage: examples/06_base_configs.py small [-h] [--dataset {mnist,imagenet-50}]
                                         [--optimizer.learning-rate FLOAT]
                                         [--num-layers INT] [--units INT]
                                         [--batch-size INT]
                                         [--train-steps INT] --seed INT
                                         [--activation {<class 'torch.nn.modules.activation.ReLU'>}]

arguments:
  -h, --help            show this help message and exit
  --dataset {mnist,imagenet-50}
                        Dataset to run experiment on. (default: mnist)
  --num-layers INT  Model size. (default: 4)
  --units INT   Model size. (default: 64)
  --batch-size INT  Batch size. (default: 2048)
  --train-steps INT  Total number of training steps. (default:
                        30000)
  --seed INT    Random seed. This is helpful for making sure that our
                        experiments are all reproducible!
                        (required)
  --activation {<class 'torch.nn.modules.activation.ReLU'>}
                        Activation to use. Not specifiable via the
                        commandline. (fixed)

optimizer arguments:
  Optimizer parameters.

  --optimizer.learning-rate FLOAT
                        (default: 0.0003)
$ python ./06_base_configs_argv.py small --seed 94720
ExperimentConfig(dataset='mnist', optimizer=SgdOptimizer(learning_rate=0.0003), num_layers=4, units=64, batch_size=2048, train_steps=30000, seed=94720, activation=<class 'torch.nn.modules.activation.ReLU'>)
$ python ./06_base_configs_argv.py big --help
usage: examples/06_base_configs.py big [-h] [--dataset {mnist,imagenet-50}]
                                       [--optimizer.learning-rate FLOAT]
                                       [--optimizer.betas FLOAT FLOAT]
                                       [--num-layers INT] [--units INT]
                                       [--batch-size INT] [--train-steps INT]
                                       --seed INT
                                       [--activation {<class 'torch.nn.modules.activation.GELU'>}]

arguments:
  -h, --help            show this help message and exit
  --dataset {mnist,imagenet-50}
                        Dataset to run experiment on. (default:
                        imagenet-50)
  --num-layers INT  Model size. (default: 8)
  --units INT   Model size. (default: 256)
  --batch-size INT  Batch size. (default: 32)
  --train-steps INT  Total number of training steps. (default:
                        100000)
  --seed INT    Random seed. This is helpful for making sure that our
                        experiments are all reproducible!
                        (required)
  --activation {<class 'torch.nn.modules.activation.GELU'>}
                        Activation to use. Not specifiable via the
                        commandline. (fixed)

optimizer arguments:
  Optimizer parameters.

  --optimizer.learning-rate FLOAT
                        (default: 0.001)
  --optimizer.betas FLOAT FLOAT
                        (default: 0.9 0.999)
$ python ./06_base_configs_argv.py big --seed 94720
ExperimentConfig(dataset='imagenet-50', optimizer=AdamOptimizer(learning_rate=0.001, betas=(0.9, 0.999)), num_layers=8, units=256, batch_size=32, train_steps=100000, seed=94720, activation=<class 'torch.nn.modules.activation.GELU'>)
7. Literals And Unions

typing.Literal[] can be used to restrict inputs to a fixed set of literal choices; typing.Union[] can be used to restrict inputs to a fixed set of types.

Code:

import dataclasses
import enum
from typing import Literal, Optional, Tuple, Union

import dcargs


class Color(enum.Enum):
    RED = enum.auto()
    GREEN = enum.auto()
    BLUE = enum.auto()


@dataclasses.dataclass(frozen=True)
class Args:
    # We can use Literal[] to restrict the set of allowable inputs, for example, over
    # enums.
    restricted_enum: Literal[Color.RED, Color.GREEN] = Color.RED

    # Literals can also be marked Optional.
    integer: Optional[Literal[0, 1, 2, 3]] = None

    # Unions can be used to specify multiple allowable types.
    union_over_types: Union[int, str] = 0
    string_or_enum: Union[Literal["red", "green"], Color] = "red"

    # Unions also work over more complex nested types.
    union_over_tuples: Union[Tuple[int, int], Tuple[str]] = ("1",)

    # And can be nested in other types.
    tuple_of_string_or_enum: Tuple[Union[Literal["red", "green"], Color], ...] = (
        "red",
        Color.RED,
    )


if __name__ == "__main__":
    args = dcargs.cli(Args)
    print(args)

Example usage:

$ python ./07_literals_and_unions.py --help
usage: 07_literals_and_unions.py [-h] [--restricted-enum {RED,GREEN}]
                                 [--integer {None,0,1,2,3}]
                                 [--union-over-types INT|STR]
                                 [--string-or-enum {red,green,RED,GREEN,BLUE}]
                                 [--union-over-tuples {INT INT}|STR]
                                 [--tuple-of-string-or-enum {red,green,RED,GREEN,BLUE} [{red,green,RED,GREEN,BLUE} ...]]

arguments:
  -h, --help            show this help message and exit
  --restricted-enum {RED,GREEN}
                        We can use Literal[] to restrict the set of allowable
                        inputs, for example, over enums. (default:
                        RED)
  --integer {None,0,1,2,3}
                        Literals can also be marked Optional. (default:
                        None)
  --union-over-types INT|STR
                        Unions can be used to specify multiple allowable
                        types. (default: 0)
  --string-or-enum {red,green,RED,GREEN,BLUE}
                        Unions can be used to specify multiple allowable
                        types. (default: red)
  --union-over-tuples {INT INT}|STR
                        Unions also work over more complex nested types.
                        (default: 1)
  --tuple-of-string-or-enum {red,green,RED,GREEN,BLUE} [{red,green,RED,GREEN,BLUE} ...]
                        And can be nested in other types. (default: red
                        RED)
8. Positional Args

Positional-only arguments in functions are converted to positional CLI arguments.

Code:

from __future__ import annotations

import dataclasses
import enum
import pathlib
from typing import Tuple

import dcargs


def main(
    source: pathlib.Path,
    dest: pathlib.Path,
    /,  # Mark the end of positional arguments.
    optimizer: OptimizerConfig,
    force: bool = False,
    verbose: bool = False,
    background_rgb: Tuple[float, float, float] = (1.0, 0.0, 0.0),
) -> None:
    """Command-line interface defined using a function signature. Note that this
    docstring is parsed to generate helptext.

    Args:
        source: Source path.
        dest: Destination path.
        optimizer: Configuration for our optimizer object.
        force: Do not prompt before overwriting.
        verbose: Explain what is being done.
        background_rgb: Background color. Red by default.
    """
    print(f"{source=}\n{dest=}\n{optimizer=}\n{force=}\n{verbose=}\n{background_rgb=}")


class OptimizerType(enum.Enum):
    ADAM = enum.auto()
    SGD = enum.auto()


@dataclasses.dataclass(frozen=True)
class OptimizerConfig:
    algorithm: OptimizerType = OptimizerType.ADAM
    """Gradient-based optimizer to use."""

    learning_rate: float = 3e-4
    """Learning rate to use."""

    weight_decay: float = 1e-2
    """Coefficient for L2 regularization."""


if __name__ == "__main__":
    dcargs.cli(main)

Example usage:

$ python ./08_positional_args.py --help
usage: 08_positional_args.py [-h] [--optimizer.algorithm {ADAM,SGD}]
                             [--optimizer.learning-rate FLOAT]
                             [--optimizer.weight-decay FLOAT] [--force]
                             [--verbose] [--background-rgb FLOAT FLOAT FLOAT]
                             SOURCE DEST

Command-line interface defined using a function signature. Note that this
docstring is parsed to generate helptext.

positional arguments:
  SOURCE                Source path. (required)
  DEST                  Destination path. (required)

arguments:
  -h, --help            show this help message and exit
  --force               Do not prompt before overwriting. (sets:
                        force=True)
  --verbose             Explain what is being done. (sets:
                        verbose=True)
  --background-rgb FLOAT FLOAT FLOAT
                        Background color. Red by default. (default: 1.0
                        0.0 0.0)

optimizer arguments:
  Configuration for our optimizer object.

  --optimizer.algorithm {ADAM,SGD}
                        Gradient-based optimizer to use. (default:
                        ADAM)
  --optimizer.learning-rate FLOAT
                        Learning rate to use. (default: 0.0003)
  --optimizer.weight-decay FLOAT
                        Coefficient for L2 regularization. (default:
                        0.01)
$ python ./08_positional_args.py ./a ./b --optimizer.learning-rate 1e-5
source=PosixPath('a')
dest=PosixPath('b')
optimizer=OptimizerConfig(algorithm=<OptimizerType.ADAM: 1>, learning_rate=1e-05, weight_decay=0.01)
force=False
verbose=False
background_rgb=(1.0, 0.0, 0.0)
9. Subparsers

Unions over nested types (classes or dataclasses) are populated using subparsers.

Code:

from __future__ import annotations

import dataclasses
from typing import Union

import dcargs


@dataclasses.dataclass(frozen=True)
class Checkout:
    """Checkout a branch."""

    branch: str


@dataclasses.dataclass(frozen=True)
class Commit:
    """Commit changes."""

    message: str
    all: bool = False


def main(cmd: Union[Checkout, Commit]) -> None:
    print(cmd)


if __name__ == "__main__":
    dcargs.cli(main)

Example usage:

$ python ./09_subparsers.py --help
usage: 09_subparsers.py [-h] {checkout,commit}

arguments:
  -h, --help         show this help message and exit

subcommands:

  {checkout,commit}
$ python ./09_subparsers.py commit --help
usage: 09_subparsers.py commit [-h] --cmd.message STR [--cmd.all]

Commit changes.

arguments:
  -h, --help         show this help message and exit

cmd arguments:
  --cmd.message STR  (required)
  --cmd.all          (sets: all=True)
$ python ./09_subparsers.py commit --cmd.message hello --cmd.all
Commit(message='hello', all=True)
$ python ./09_subparsers.py checkout --help
usage: 09_subparsers.py checkout [-h] --cmd.branch STR

Checkout a branch.

arguments:
  -h, --help        show this help message and exit

cmd arguments:
  --cmd.branch STR  (required)
$ python ./09_subparsers.py checkout --cmd.branch main
Checkout(branch='main')
10. Multiple Subparsers

Multiple unions over nested types are populated using a series of subparsers.

Code:

from __future__ import annotations

import dataclasses
from typing import Literal, Tuple, Union

import dcargs

# Possible dataset configurations.


@dataclasses.dataclass
class MnistDataset:
    binary: bool = False
    """Set to load binary version of MNIST dataset."""


@dataclasses.dataclass
class ImageNetDataset:
    subset: Literal[50, 100, 1000]
    """Choose between ImageNet-50, ImageNet-100, ImageNet-1000, etc."""


# Possible optimizer configurations.


@dataclasses.dataclass
class AdamOptimizer:
    learning_rate: float = 1e-3
    betas: Tuple[float, float] = (0.9, 0.999)


@dataclasses.dataclass
class SgdOptimizer:
    learning_rate: float = 3e-4


# Train script.


def train(
    dataset: Union[MnistDataset, ImageNetDataset] = MnistDataset(),
    optimizer: Union[AdamOptimizer, SgdOptimizer] = AdamOptimizer(),
) -> None:
    """Example training script.

    Args:
        dataset: Dataset to train on.
        optimizer: Optimizer to train with.

    Returns:
        None:
    """
    print(dataset)
    print(optimizer)


if __name__ == "__main__":
    dcargs.cli(train)

Example usage:

$ python ./10_multiple_subparsers.py
MnistDataset(binary=False)
AdamOptimizer(learning_rate=0.001, betas=(0.9, 0.999))
$ python ./10_multiple_subparsers.py --help
usage: 10_multiple_subparsers.py [-h] [{mnist-dataset,image-net-dataset}]

Example training script.

arguments:
  -h, --help            show this help message and exit

optional subcommands:
  Dataset to train on.  (default: mnist-dataset)

  [{mnist-dataset,image-net-dataset}]
$ python ./10_multiple_subparsers.py mnist-dataset --help
usage: 10_multiple_subparsers.py mnist-dataset [-h] [--dataset.binary]
                                               [{adam-optimizer,sgd-optimizer}]

arguments:
  -h, --help            show this help message and exit

dataset arguments:
  --dataset.binary      Set to load binary version of MNIST dataset.
                        (sets: binary=True)

optional subcommands:
  Optimizer to train with.  (default: adam-optimizer)

  [{adam-optimizer,sgd-optimizer}]
$ python ./10_multiple_subparsers.py mnist-dataset adam-optimizer --optimizer.learning-rate 3e-4
MnistDataset(binary=False)
AdamOptimizer(learning_rate=0.0003, betas=(0.9, 0.999))
11. Dictionaries

Dictionary inputs can be specified using either a standard Dict[K, V] annotation, or a TypedDict type.

Code:

from typing import Dict, Tuple, TypedDict

import dcargs


class DictionarySchema(
    TypedDict,
    # Setting `total=False` specifies that not all keys need to exist.
    total=False,
):
    learning_rate: float
    betas: Tuple[float, float]


def main(
    typed_dict: DictionarySchema,
    standard_dict: Dict[str, float] = {
        "learning_rate": 3e-4,
        "beta1": 0.9,
        "beta2": 0.999,
    },
) -> None:
    assert isinstance(standard_dict, dict)
    assert isinstance(typed_dict, dict)
    print("Standard dict:", standard_dict)
    print("Typed dict:", typed_dict)


if __name__ == "__main__":
    dcargs.cli(main)

Example usage:

$ python ./11_dictionaries.py --help
usage: 11_dictionaries.py [-h] [--typed-dict.learning-rate FLOAT]
                          [--typed-dict.betas FLOAT FLOAT]
                          [--standard-dict STR FLOAT [STR FLOAT ...]]

arguments:
  -h, --help            show this help message and exit
  --standard-dict STR FLOAT [STR FLOAT ...]
                        (default: learning_rate 0.0003 beta1 0.9 beta2
                        0.999)

typed_dict arguments:

  --typed-dict.learning-rate FLOAT
                        Setting `total=False` specifies that not all keys need
                        to exist. (unset by default)
  --typed-dict.betas FLOAT FLOAT
                        Setting `total=False` specifies that not all keys need
                        to exist. (unset by default)
$ python ./11_dictionaries.py --typed-dict.learning-rate 3e-4
Standard dict: {'learning_rate': 0.0003, 'beta1': 0.9, 'beta2': 0.999}
Typed dict: {'learning_rate': 0.0003}
$ python ./11_dictionaries.py --typed-dict.betas 0.9 0.999
Standard dict: {'learning_rate': 0.0003, 'beta1': 0.9, 'beta2': 0.999}
Typed dict: {'betas': (0.9, 0.999)}
12. Named Tuples

Example using dcargs.cli() to instantiate a named tuple.

Code:

from typing import NamedTuple

import dcargs


class TupleType(NamedTuple):
    """Description.
    This should show up in the helptext!"""

    field1: str  # A string field.
    field2: int = 3  # A numeric field, with a default value.
    flag: bool = False  # A boolean flag.


if __name__ == "__main__":
    x = dcargs.cli(TupleType)
    assert isinstance(x, tuple)
    print(x)

Example usage:

$ python ./12_named_tuples.py --help
usage: 12_named_tuples.py [-h] --field1 STR [--field2 INT] [--flag]

Description.
This should show up in the helptext!

arguments:
  -h, --help    show this help message and exit
  --field1 STR  A string field. (required)
  --field2 INT  A numeric field, with a default value. (default: 3)
  --flag        A boolean flag. (sets: flag=True)
$ python ./12_named_tuples.py --field1 hello
TupleType(field1='hello', field2=3, flag=False)
13. Standard Classes

In addition to functions and dataclasses, we can also generate CLIs from (the constructors of) standard Python classes.

Code:

import dcargs


class Args:
    def __init__(
        self,
        field1: str,
        field2: int,
        flag: bool = False,
    ):
        """Arguments.

        Args:
            field1: A string field.
            field2: A numeric field.
            flag: A boolean flag.
        """
        self.data = [field1, field2, flag]


if __name__ == "__main__":
    args = dcargs.cli(Args)
    print(args.data)

Example usage:

$ python ./13_standard_classes.py --help
usage: 13_standard_classes.py [-h] --field1 STR --field2 INT [--flag]

Arguments.

arguments:
  -h, --help    show this help message and exit
  --field1 STR  A string field. (required)
  --field2 INT  A numeric field. (required)
  --flag        A boolean flag. (sets: flag=True)
$ python ./13_standard_classes.py --field1 hello --field2 7
['hello', 7, False]
14. Generics

Example of parsing for generic dataclasses.

Code:

import dataclasses
from typing import Generic, TypeVar

import dcargs

ScalarType = TypeVar("ScalarType")
ShapeType = TypeVar("ShapeType")


@dataclasses.dataclass(frozen=True)
class Point3(Generic[ScalarType]):
    x: ScalarType
    y: ScalarType
    z: ScalarType
    frame_id: str


@dataclasses.dataclass(frozen=True)
class Triangle:
    a: Point3[float]
    b: Point3[float]
    c: Point3[float]


@dataclasses.dataclass(frozen=True)
class Args(Generic[ShapeType]):
    point_continuous: Point3[float]
    point_discrete: Point3[int]
    shape: ShapeType


if __name__ == "__main__":
    args = dcargs.cli(Args[Triangle])
    print(args)

Example usage:

$ python ./14_generics.py --help
usage: 14_generics.py [-h] --point-continuous.x FLOAT --point-continuous.y
                      FLOAT --point-continuous.z FLOAT
                      --point-continuous.frame-id STR --point-discrete.x INT
                      --point-discrete.y INT --point-discrete.z INT
                      --point-discrete.frame-id STR --shape.a.x FLOAT
                      --shape.a.y FLOAT --shape.a.z FLOAT --shape.a.frame-id
                      STR --shape.b.x FLOAT --shape.b.y FLOAT --shape.b.z
                      FLOAT --shape.b.frame-id STR --shape.c.x FLOAT
                      --shape.c.y FLOAT --shape.c.z FLOAT --shape.c.frame-id
                      STR

arguments:
  -h, --help            show this help message and exit

point_continuous arguments:

  --point-continuous.x FLOAT
                        (required)
  --point-continuous.y FLOAT
                        (required)
  --point-continuous.z FLOAT
                        (required)
  --point-continuous.frame-id STR
                        (required)

point_discrete arguments:

  --point-discrete.x INT
                        (required)
  --point-discrete.y INT
                        (required)
  --point-discrete.z INT
                        (required)
  --point-discrete.frame-id STR
                        (required)

shape.a arguments:

  --shape.a.x FLOAT  (required)
  --shape.a.y FLOAT  (required)
  --shape.a.z FLOAT  (required)
  --shape.a.frame-id STR
                        (required)

shape.b arguments:

  --shape.b.x FLOAT  (required)
  --shape.b.y FLOAT  (required)
  --shape.b.z FLOAT  (required)
  --shape.b.frame-id STR
                        (required)

shape.c arguments:

  --shape.c.x FLOAT  (required)
  --shape.c.y FLOAT  (required)
  --shape.c.z FLOAT  (required)
  --shape.c.frame-id STR
                        (required)

Serialization

As a secondary feature aimed at enabling the use of dcargs.cli() for general configuration use cases, we also introduce functions for human-readable dataclass serialization:

  • dcargs.from_yaml(cls: Type[T], stream: Union[str, IO[str], bytes, IO[bytes]]) -> T and dcargs.to_yaml(instance: T) -> str convert between YAML-style strings and dataclass instances.

The functions attempt to strike a balance between flexibility and robustness: in contrast to naively dumping or loading dataclass instances (via pickle, PyYAML, etc.), explicit type references enable custom tags that are robust against code reorganization and refactoring, while a PyYAML backend enables serialization of arbitrary Python objects.
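
A minimal round-trip sketch (the Config dataclass here is hypothetical):

import dataclasses

import dcargs


@dataclasses.dataclass
class Config:
    learning_rate: float = 3e-4
    seed: int = 0


config = Config(seed=42)
yaml_string = dcargs.to_yaml(config)  # Human-readable YAML-style string, including a type reference.
recovered = dcargs.from_yaml(Config, yaml_string)  # Rebuild the dataclass instance from the string.
assert recovered == config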

Note that we generally prefer to use YAML purely for serialization, as opposed to a configuration interface that humans are expected to manually write or modify. Specifying things like loadable base configurations can be done directly in Python, which enables all of the usual autocompletion and type checking features.

Alternative tools

The core functionality of dcargs — generating argument parsers from type annotations — can be found as a subset of the features offered by many other libraries. A summary of some distinguishing features:

Features compared include: choices from literals, generics, docstrings as helptext, nesting, subparsers, and containers.

Libraries compared: dcargs, datargs, tap, simple-parsing, argparse-dataclass, argparse-dataclasses, dataclass-cli, clout, hf_argparser, and pyrallis.

Note that most of these other libraries are aimed specifically at dataclasses rather than general typed callables, but they offer other features you might find useful, such as registration for custom types (pyrallis), different approaches for serialization and config files (tap, pyrallis), simultaneous parsing of multiple dataclasses (simple-parsing), etc. Pull requests are welcome if you're missing any of these. 🙂
