Portable, reusable, strongly typed CLIs from dataclass definitions
Project description
dcargs
Overview
dcargs
is a library for defining argument parsers and configuration
objects using standard Python dataclasses.
Installation is simple:
pip install dcargs
Core interface
Our core interface is composed of a single function, which instantiates a dataclass from an automatically generated CLI interface:
|
Example usage
If you're familiar with dataclasses, writing a script with dcargs.parse()
is
simple!
# examples/simple.py
import dataclasses
import dcargs
@dataclasses.dataclass
class Args:
field1: str # A string field.
field2: int # A numeric field.
flag: bool = False # A boolean flag.
if __name__ == "__main__":
args = dcargs.parse(Args)
print(args)
We can run this to get:
$ python simple.py --help
usage: simple.py [-h] --field1 STR --field2 INT [--flag]
required arguments:
--field1 STR A string field.
--field2 INT A numeric field.
optional arguments:
-h, --help show this help message and exit
--flag A boolean flag.
$ python simple.py --field1 string --field2 4
Args(field1='string', field2=4, flag=False)
Note that we also support significantly more complex structures and annotations, including nested dataclasses, container types, generics, optional and default arguments, enums, and more. Examples of additional features can be found in the examples and unit tests; a feature list is also included below.
Motivation
Compared to other options, using dataclasses for configuration is:
- Low effort. Type annotations, docstrings, and default values for dataclass fields can be used to automatically generate argument parsers.
- Noninvasive. Dataclasses themselves are part of the standard Python
library; defining them requires no external dependencies and they can be
easily instantiated without
dcargs
(for example, within quick experiments in Jupyter notebooks). - Modular. Most approaches to configuration objects require a centralized definition of all configurable fields. Hierarchically nesting dataclasses, however, makes it easy to distribute definitions, defaults, and documentation of configurable fields across modules or source files. A model configuration dataclass, for example, can be co-located in its entirety with the model implementation and dropped into any experiment configuration dataclass with an import — this eliminates the redundancy you typically see with the argparse equivalent and makes the entire module easy to port across codebases.
- Strongly typed. Unlike dynamic configuration namespaces produced by
libraries like
argparse
,YACS
,abseil
, orml_collections
, dataclasses are robustly supported by static type checking tools (mypy, pyright, etc), as well as IDEs and language servers. This means code can be checked automatically for errors and typos, and IDE-assisted autocomplete, rename, refactor, and jump operations work out-of-the-box.
Serialization
As a secondary feature, we also introduce two functions for human-readable dataclass serialization:
dcargs.from_yaml(cls: Type[T], stream: Union[str, IO[str], bytes, IO[bytes]]) -> T
anddcargs.to_yaml(instance: T) -> str
convert between YAML-style strings and dataclass instances.
The functions attempt to strike a balance between flexibility and robustness — in contrast to naively dumping or loading dataclass instances (via pickle, PyYAML, etc), explicit type references enable custom tags that are robust against code reorganization and refactor, while a PyYAML backend enables serialization of arbitrary Python objects.
Particularly for cases where serialized dataclasses need to exit the Python ecosystem, dacite is also a good option (at the cost of a little bit of flexibility).
Feature list
The parse function supports a wide range of dataclass definitions, while automatically generating helptext from comments/docstrings. A selection of features are shown in the examples.
Our unit tests cover many more complex type annotations, including classes containing:
- Types natively accepted by
argparse
: str, int, float, pathlib.Path, etc - Default values for optional parameters
- Booleans, which are automatically converted to flags when provided a default
value (eg
action="store_true"
oraction="store_false"
; in the latter case, we prefix names withno-
) - Enums (via
enum.Enum
; argparse'schoices
is populated and arguments are converted automatically) - Various container types. Some examples:
typing.ClassVar
types (omitted from parser)typing.Optional
typestyping.Literal
types (populates argparse'schoices
)typing.Sequence
types (populates argparse'snargs
)typing.List
types (populates argparse'snargs
)typing.Tuple
types, such astyping.Tuple[T1, T2, T3]
ortyping.Tuple[T, ...]
(populates argparse'snargs
, and converts automatically)typing.Set
types (populates argparse'snargs
, and converts automatically)typing.Final
types andtyping.Annotated
(for parsing, these are effectively no-ops)- Nested combinations of the above:
Optional[Literal[T]]
,Final[Optional[Sequence[T]]]
, etc
- Nested dataclasses
- Simple nesting (see
OptimizerConfig
in ./examples/example.py) - Unions over nested dataclasses (subparsers, see ./examples/subparsers.py)
- Optional unions over nested dataclasses (optional subparsers)
- Simple nesting (see
- Generic dataclasses (including nested generics, see ./examples/generics.py)
Comparisons to alternative tools
There are several alternatives for the parsing functionality of dcargs
; here's
a rough summary of some of them:
dataclasses | attrs | Choices from literals | Generics | Docstrings as helptext | Nesting | Subparsers | Containers | |
---|---|---|---|---|---|---|---|---|
dcargs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
datargs | ✓ | ✓ | ✓ | ✓ | ✓ | |||
typed-argument-parser | ✓ | ✓ | ✓ | ✓ | ||||
simple-parsing | ✓ | soon | ✓ | ✓ | ✓ | ✓ | ||
argparse-dataclass | ✓ | |||||||
argparse-dataclasses | ✓ | |||||||
dataclass-cli | ✓ | |||||||
clout | ✓ | ✓ | ||||||
hf_argparser | ✓ | ✓ |
Some other distinguishing factors that we've put effort into:
- Robust handling of forward references
- Support for nested containers and generics
- Strong typing: we actively avoid relying on strings or dynamic namespace
objects (eg
argparse.Namespace
) - Simplicity + strict abstractions: we're focused on a single function API, and don't leak any argparse implementation details to the user level. We also intentionally don't offer any way to add argument parsing-specific logic to dataclass definitions. (in contrast, some of the libaries above rely heavily on dataclass field metadata, or on the more extreme end inheritance+decorators to make parsing-specific dataclasses)
- POSIX compatibility. For example, field names with underscores
(
argument_name
) are parsed with hyphens (--argument-name
). (why)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.