Portable, reusable, strongly typed CLIs from dataclass definitions
dcargs
dcargs is a library for building dataclass-based argument parsers and configuration objects.
The vision: we use (potentially nested or generic) dataclasses to define
configuration objects that can be (a) populated via a CLI interface without
additional effort and (b) robustly and human-readably serialized. The result is
a statically typed replacement not only for argparse, but also for libraries like
YACS and ml_collections.
We expose a one-function argument parsing API:
dcargs.parse(cls: Type[T], *, description: Optional[str]) -> T
takes a dataclass type and instantiates it via an argparse-style CLI interface.
And two functions for dataclass serialization:
dcargs.from_yaml(cls: Type[T], stream: Union[str, IO[str], bytes, IO[bytes]]) -> T
and dcargs.to_yaml(instance: T) -> str
convert between YAML-style strings and dataclass instances. In contrast to naively dumping or loading (via pickle, PyYAML, etc.), explicit type references make the serialized form robust against code reorganization and refactoring.
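To illustrate why explicit type references help, here is a minimal sketch using only the standard library. It is not how dcargs implements serialization: the `to_tagged` and `from_tagged` helpers and the JSON-based format are assumptions made for illustration. The idea is that the serialized text carries a short type tag, and the loader checks that tag against the class the caller passes in, rather than relying on an import path recorded at dump time (as pickle does).

```python
import dataclasses
import json
from typing import Any, Type, TypeVar

T = TypeVar("T")


def to_tagged(instance: Any) -> str:
    """Hypothetical helper: serialize a flat dataclass with an explicit type tag."""
    payload = {
        "!dataclass": type(instance).__name__,
        "fields": dataclasses.asdict(instance),
    }
    return json.dumps(payload, indent=2)


def from_tagged(cls: Type[T], text: str) -> T:
    """Hypothetical helper: load an instance, checking the tag against `cls`."""
    payload = json.loads(text)
    if payload["!dataclass"] != cls.__name__:
        raise ValueError(f"expected {cls.__name__}, got {payload['!dataclass']}")
    return cls(**payload["fields"])


@dataclasses.dataclass
class Args:
    field1: str
    field2: int


text = to_tagged(Args(field1="string", field2=4))
restored = from_tagged(Args, text)
print(restored)
```

Because the class is supplied by the caller at load time, moving `Args` to a different module would not invalidate previously saved text in this sketch.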
Simple example
import dataclasses

import dcargs


@dataclasses.dataclass
class Args:
    field1: str  # A string field.
    field2: int  # A numeric field.


if __name__ == "__main__":
    args = dcargs.parse(Args)
    print(args)
    print()
    print(dcargs.to_yaml(args))
Running python simple.py --help would print:
usage: simple.py [-h] --field1 STR --field2 INT

optional arguments:
  -h, --help    show this help message and exit

required arguments:
  --field1 STR  A string field.
  --field2 INT  A numeric field.
And, from python simple.py --field1 string --field2 4:
Args(field1='string', field2=4)
!dataclass:Args
field1: string
field2: 4
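For intuition about what a dataclass-driven parser has to do, the following is a rough sketch built only on argparse and dataclasses. The `parse_flat` helper is hypothetical and far simpler than dcargs.parse: it assumes a flat dataclass whose annotations are plain callables such as str or int, and it generates no helptext.

```python
import argparse
import dataclasses
from typing import List, Optional, Type, TypeVar

T = TypeVar("T")


def parse_flat(cls: Type[T], argv: Optional[List[str]] = None) -> T:
    """Hypothetical helper: build an argparse parser from a flat dataclass."""
    parser = argparse.ArgumentParser()
    for field in dataclasses.fields(cls):
        has_default = field.default is not dataclasses.MISSING
        parser.add_argument(
            f"--{field.name}",
            type=field.type,  # Assumes the annotation is callable on a string.
            required=not has_default,
            default=field.default if has_default else None,
        )
    # The parsed namespace is converted straight back into the typed dataclass.
    namespace = parser.parse_args(argv)
    return cls(**vars(namespace))


@dataclasses.dataclass
class Args:
    field1: str
    field2: int


args = parse_flat(Args, ["--field1", "string", "--field2", "4"])
print(args)  # Args(field1='string', field2=4)
```

Fields without a default become required arguments, which mirrors the split between required and optional arguments in the help output above.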
Feature list
The parse function supports a wide range of dataclass definitions, while automatically generating helptext from comments/docstrings. Some of the basic features are shown in the nesting example below.
Our unit tests cover many more complex type annotations, including classes containing:
- Types natively accepted by argparse: str, int, float, pathlib.Path, etc.
- Default values for optional parameters
- Booleans, which are automatically converted to flags when provided a default value (e.g. action="store_true" or action="store_false"; in the latter case, we prefix names with no-)
- Enums (via enum.Enum; argparse's choices is populated and arguments are converted automatically)
- Various container types. Some examples:
  - typing.ClassVar types (omitted from parser)
  - typing.Optional types
  - typing.Literal types (populates argparse's choices)
  - typing.Sequence types (populates argparse's nargs)
  - typing.List types (populates argparse's nargs)
  - typing.Tuple types, such as typing.Tuple[T1, T2, T3] or typing.Tuple[T, ...] (populates argparse's nargs, and converts automatically)
  - typing.Set types (populates argparse's nargs, and converts automatically)
  - typing.Final types and typing.Annotated (for parsing, these are effectively no-ops)
  - Nested combinations of the above: Optional[Literal[T]], Final[Optional[Sequence[T]]], etc.
- Nested dataclasses
  - Simple nesting (see OptimizerConfig example below)
  - Unions over nested dataclasses (subparsers)
  - Optional unions over nested dataclasses (optional subparsers)
- Generic dataclasses (including nested generics, see ./examples/generics.py)
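A few of the mappings in the list above can be sketched with the standard library alone. The `argparse_kwargs` and `add_bool_flag` helpers below are hypothetical, not part of dcargs, and choices such as nargs="+" are assumptions; the sketch covers only literals, lists and sequences, enums, and boolean flags.

```python
import argparse
import collections.abc
import enum
from typing import Any, Dict, List, Literal, get_args, get_origin


def argparse_kwargs(annotation: Any) -> Dict[str, Any]:
    """Hypothetical helper: translate one annotation into add_argument() kwargs."""
    origin = get_origin(annotation)
    if origin is Literal:
        # Literal values become argparse choices.
        choices = get_args(annotation)
        return {"type": type(choices[0]), "choices": choices}
    if origin in (list, collections.abc.Sequence):
        # List[T] and Sequence[T] consume one or more values of type T.
        (inner,) = get_args(annotation)
        return {"type": inner, "nargs": "+"}
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        # Enum members are looked up by name, so "SGD" becomes OptimizerType.SGD.
        return {"type": lambda name: annotation[name], "choices": list(annotation)}
    return {"type": annotation}


def add_bool_flag(parser: argparse.ArgumentParser, name: str, default: bool) -> None:
    """Hypothetical helper: booleans with defaults become on/off flags."""
    if default:
        parser.add_argument(f"--no-{name}", dest=name, action="store_false")
    else:
        parser.add_argument(f"--{name}", dest=name, action="store_true")


class OptimizerType(enum.Enum):
    ADAM = enum.auto()
    SGD = enum.auto()


parser = argparse.ArgumentParser()
parser.add_argument("--optimizer", **argparse_kwargs(OptimizerType))
parser.add_argument("--mode", **argparse_kwargs(Literal["train", "eval"]))
parser.add_argument("--layers", **argparse_kwargs(List[int]))
add_bool_flag(parser, "shuffle", default=True)

ns = parser.parse_args(
    ["--optimizer", "SGD", "--mode", "eval", "--layers", "64", "32", "--no-shuffle"]
)
print(ns)
```

Keeping the annotation-to-kwargs translation in one function makes nested combinations a matter of recursion, although this sketch does not attempt that.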
Comparisons to alternative tools
There are several alternative libraries to the parsing functionality of dcargs; here's a rough summary of some of them:
dataclasses | attrs | Nesting | Subparsers | Containers | Choices from literals | Docstrings as helptext | Generics | |
---|---|---|---|---|---|---|---|---|
dcargs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
datargs | ✓ | ✓ | ✓ | ✓ | ✓ | |||
typed-argument-parser | ✓ | ✓ | ✓ | ✓ | ||||
simple-parsing | ✓ | ✓ | ✓ | ✓ | soon | ✓ | ||
argparse-dataclass | ✓ | |||||||
argparse-dataclasses | ✓ | |||||||
dataclass-cli | ✓ | |||||||
clout | ✓ | ✓ | ||||||
hf_argparser | ✓ | ✓ |
Some other distinguishing factors that dcargs has put effort into:
- Robust handling of forward references
- Support for nested containers and generics
- Strong typing: we actively avoid relying on strings or dynamic namespace objects (e.g. argparse.Namespace)
- Simplicity + strict abstractions: we're focused on a single-function API, and don't leak any argparse implementation details to the user level. We also intentionally don't offer any way to add argument parsing-specific logic to dataclass definitions. (In contrast, some of the libraries above rely heavily on dataclass field metadata or, at the more extreme end, on inheritance and decorators to make parsing-specific dataclasses.)
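As an illustration of the forward-reference point, the sketch below shows the general problem and one standard-library way to address it with typing.get_type_hints. The `resolve_hints` helper and the `Tree` class are hypothetical, and this is not necessarily how dcargs resolves references internally.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Tree:
    value: int
    # "Tree" is a forward reference: the class does not exist yet while its
    # body is being evaluated, so the annotation is stored unresolved.
    parent: typing.Optional["Tree"] = None


def resolve_hints(cls: type) -> typing.Dict[str, typing.Any]:
    """Hypothetical helper: resolve string annotations to real types.

    Passing the class itself as a local name lets self-references resolve
    even when the class is not reachable as a module-level global.
    """
    return typing.get_type_hints(cls, localns={cls.__name__: cls})


raw = {field.name: field.type for field in dataclasses.fields(Tree)}
resolved = resolve_hints(Tree)
print(raw["parent"])
print(resolved["parent"])
```

A parser that read `field.type` directly would see the unresolved reference; working from the resolved hints gives it a real class to recurse into.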
Nested example
This code:
"""An argument parsing example.
Note that there are multiple possible ways to document dataclass attributes, all
of which are supported by the automatic helptext generator.
"""

import dataclasses
import enum

import dcargs


class OptimizerType(enum.Enum):
    ADAM = enum.auto()
    SGD = enum.auto()


@dataclasses.dataclass
class OptimizerConfig:
    # Variant of SGD to use.
    type: OptimizerType

    # Learning rate to use.
    learning_rate: float = 3e-4

    # Coefficient for L2 regularization.
    weight_decay: float = 1e-2


@dataclasses.dataclass
class ExperimentConfig:
    experiment_name: str  # Experiment name to use.

    optimizer: OptimizerConfig

    seed: int = 0
    """Random seed. This is helpful for making sure that our experiments are
    all reproducible!"""


if __name__ == "__main__":
    config = dcargs.parse(ExperimentConfig, description=__doc__)
    print(config)
Generates the following argument parser:
$ python example.py --help
usage: example.py [-h] --experiment-name STR --optimizer.type {ADAM,SGD} [--optimizer.learning-rate FLOAT]
                  [--optimizer.weight-decay FLOAT] [--seed INT]

An argument parsing example.

Note that there are multiple possible ways to document dataclass attributes, all
of which are supported by the automatic helptext generator.

optional arguments:
  -h, --help            show this help message and exit
  --optimizer.learning-rate FLOAT
                        Learning rate to use. (default: 0.0003)
  --optimizer.weight-decay FLOAT
                        Coefficient for L2 regularization. (default: 0.01)
  --seed INT            Random seed. This is helpful for making sure that our experiments are
                        all reproducible! (default: 0)

required arguments:
  --experiment-name STR
                        Experiment name to use.
  --optimizer.type {ADAM,SGD}
                        Variant of SGD to use.
Examples of additional features can be found in our unit tests.
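The dotted flag names in the help output above suggest how nesting can be flattened onto a single parser. The following is a minimal sketch of that idea, assuming simplified config classes (the enum field is omitted) and two hypothetical helpers, `add_fields` and `build`; it is not dcargs's implementation.

```python
import argparse
import dataclasses
from typing import Any, Dict, Type


def add_fields(parser: argparse.ArgumentParser, cls: Type[Any], prefix: str = "") -> None:
    """Hypothetical helper: add one dotted flag per leaf field, recursing into nested dataclasses."""
    for field in dataclasses.fields(cls):
        if dataclasses.is_dataclass(field.type):
            add_fields(parser, field.type, prefix=f"{prefix}{field.name}.")
            continue
        has_default = field.default is not dataclasses.MISSING
        parser.add_argument(
            f"--{prefix}{field.name}".replace("_", "-"),
            dest=f"{prefix}{field.name}",
            type=field.type,
            required=not has_default,
            default=field.default if has_default else None,
        )


def build(cls: Type[Any], values: Dict[str, Any], prefix: str = "") -> Any:
    """Hypothetical helper: rebuild the nested instance from the flat parsed values."""
    kwargs = {}
    for field in dataclasses.fields(cls):
        if dataclasses.is_dataclass(field.type):
            kwargs[field.name] = build(field.type, values, prefix=f"{prefix}{field.name}.")
        else:
            kwargs[field.name] = values[f"{prefix}{field.name}"]
    return cls(**kwargs)


@dataclasses.dataclass
class OptimizerConfig:
    learning_rate: float = 3e-4
    weight_decay: float = 1e-2


@dataclasses.dataclass
class ExperimentConfig:
    experiment_name: str
    optimizer: OptimizerConfig
    seed: int = 0


parser = argparse.ArgumentParser()
add_fields(parser, ExperimentConfig)
ns = parser.parse_args(["--experiment-name", "run1", "--optimizer.learning-rate", "0.1"])
config = build(ExperimentConfig, vars(ns))
print(config)
```

The prefix threaded through both helpers keeps flag names and destination keys in sync, so the second pass can reassemble each nested object from the flat namespace.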