A parser combining dacite and argparse
Parsley Coco
Overview
Parsley Coco is a Python library that combines the power of dacite and argparse to provide a flexible and extensible parser for command-line arguments and configuration files. It supports recursive YAML parsing, dataclass-based argument definitions, and merging of arguments from multiple sources.
Features
- Recursive YAML Parsing: Parse nested YAML files into Python dataclasses.
- Dataclass-Based Argument Parsing: Define arguments using Python dataclasses for type safety and clarity.
- Command-Line and Config File Integration: Merge arguments from command-line inputs, YAML config files, and extra arguments.
- extra_args Support: Programmatically provide additional arguments using an extended version of the base dataclass, allowing for flexible overrides and dynamic configurations.
- Overwrite Functionality: Automatically resolve nested configurations and apply overwrite values from YAML files or programmatically provided arguments.
Requirements
- Python 3.12 or higher
Installation
To install the library, use pip:
pip install parsley-coco
Usage
Table of Contents
- Basic Example
- Input Arguments
- Precedence of Arguments
- Union Types, Defaults, and Discriminator Fields
- Recursive YAML Parsing
- Using Classes with a YAML Path Method (e.g., Enum Integration)
- Default Behavior Without a Config File
- Command-Line Arguments Handling
- Using extra_args with parse_arguments
Basic Example
Define your dataclasses and use create_parsley to create a parser, then instantiate the dataclass from, for instance, a YAML config file:
from dataclasses import dataclass
from parsley import create_parsley, Parsley
@dataclass
class Config:
    x: int
    y: str
parser: Parsley[Config] = create_parsley(Config)
config: Config = parser.parse_arguments(config_file_path="path/to/config.yaml")
print(config)
Input Arguments
create_parsley
The create_parsley function initializes a Parsley parser for a given dataclass.
- Arguments:
  - dataclass_type:
    - The dataclass type you want to parse (e.g., Config).
    - This defines the structure of the configuration, including fields, types, and default values.
  - should_parse_command_line_arguments (optional):
    - A boolean indicating whether command-line arguments should be parsed.
    - Defaults to True. If set to False, command-line arguments will be ignored.
- Returns:
  - A Parsley parser instance that can parse arguments based on the provided dataclass.
parse_arguments
The parse_arguments method of the Parsley parser parses arguments from multiple sources (command-line, YAML, extra_args).
- Arguments:
  - extra_args (optional):
    - A dataclass instance or dictionary containing additional arguments.
    - These arguments are merged with the other sources and take precedence over both command-line arguments and the YAML file (see Precedence of Arguments).
  - config_file_path (optional):
    - A string specifying the path to a YAML configuration file.
    - If not provided, the parser looks for a config_file_name key in extra_args or command-line arguments.
- Returns:
  - An instance of the dataclass populated with the merged arguments from all sources.
Precedence of Arguments
Parsley Coco merges arguments from multiple sources in a specific order of precedence. The final configuration is determined by the following hierarchy (from highest to lowest priority):
- extra_args: Programmatically provided arguments via the extra_args parameter in parse_arguments take the highest priority. These values overwrite all other sources, including command-line arguments.
- Command-Line Arguments: Arguments provided via the command line take precedence over YAML configuration files and default values in the dataclass.
- YAML Configuration File: Values from the YAML configuration file are used if not overridden by extra_args or command-line arguments.
- Default Values in the Dataclass: If no value is provided from extra_args, command-line arguments, or the YAML file, the default values defined in the dataclass are used.
This precedence ensures flexibility while maintaining a clear and predictable merging process.
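The hierarchy above can be pictured as a layered dictionary merge. The following is only a minimal sketch of that idea using plain dictionaries; merge_by_precedence is a hypothetical helper written for this illustration and is not part of the Parsley Coco API. It assumes that a value of None means "not provided by this source".

```python
from typing import Any


def merge_by_precedence(*sources: dict[str, Any]) -> dict[str, Any]:
    """Merge dictionaries listed from lowest to highest priority.

    A value of None is treated as "not provided" and is skipped, so it
    never masks a value coming from a lower-priority source.
    """
    merged: dict[str, Any] = {}
    for source in sources:
        for key, value in source.items():
            if value is not None:
                merged[key] = value
    return merged


defaults = {"x": 0, "y": "default"}
yaml_values = {"x": 10, "y": "from_yaml"}
command_line = {"x": 42, "y": None}          # only --x was passed
extra_args = {"x": None, "y": "from_extra"}  # only y was set programmatically

# Lowest priority first, highest priority last.
result = merge_by_precedence(defaults, yaml_values, command_line, extra_args)
print(result)  # {'x': 42, 'y': 'from_extra'}
```

Listing the sources from lowest to highest priority keeps the rule easy to read: a later source wins whenever it actually provides a value.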
Example
Consider the following setup:
- Dataclass:
  from dataclasses import dataclass

  @dataclass
  class Config:
      x: int = 0
      y: str = "default"
- YAML File (config.yaml):
  x: 10
  y: "from_yaml"
- Command-Line Arguments:
  --x 42
- extra_args:
  {"y": "from_extra"}
Code Example
from parsley.alternative_dataclasses import make_partial_dataclass_with_optional_paths
from parsley.factory import create_parsley

# Config is the dataclass defined in the setup above
parser = create_parsley(Config)

# Create an extended Config dataclass that allows partial instantiation (more later in the README)
ExtendedConfig = make_partial_dataclass_with_optional_paths(Config)

# Parse arguments
config = parser.parse_arguments(
    config_file_path="tests/yaml_files/config.yaml",
    extra_args=ExtendedConfig(y="from_extra"),
)
Resulting Configuration
Config(x=42, y="from_extra")
Explanation
- The value of x is 42 because the command-line argument --x 42 overrides the YAML file value (10).
- The value of y is "from_extra" because extra_args takes precedence over the YAML file ("from_yaml") and the default value ("default").
Union Types, Defaults, and Discriminator Fields
Parsley Coco uses dacite to parse dictionaries into dataclasses, with dacite's Config(strict=False). This means that if a dataclass field is a union of multiple types (e.g., int | MyDataClass), any compatible type in the union can be used during parsing. For example, if your YAML provides an integer, it will be parsed as an int; if it provides a mapping, it will be parsed as a dataclass.
However, when using unions of dataclasses, we strongly recommend adding a discriminator field (such as Literal["my_type"]) to each dataclass. This helps dacite and Parsley Coco reliably determine which dataclass to instantiate when parsing nested structures.
Example
from dataclasses import dataclass
from typing import Literal
@dataclass
class OptionA:
    discriminator: Literal["A"]
    value: int

@dataclass
class OptionB:
    discriminator: Literal["B"]
    name: str

@dataclass
class Config:
    option: OptionA | OptionB | int = 0
YAML Example:
option:
  discriminator: B
  name: "hello"
This will be parsed as Config(option=OptionB(discriminator="B", name="hello")).
YAML Example:
option: 42
This will be parsed as Config(option=42).
Recommendation:
Always include a discriminator field (using Literal[...]) in each dataclass used in a union. This ensures robust and predictable parsing, especially when your configuration can match multiple types.
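To see why a discriminator removes ambiguity, here is a minimal standard-library sketch that selects a dataclass by comparing the discriminator value in the data with each candidate's Literal annotation. This is not how dacite or Parsley Coco match union members internally; pick_by_discriminator is a hypothetical helper used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args, get_type_hints


@dataclass
class OptionA:
    discriminator: Literal["A"]
    value: int


@dataclass
class OptionB:
    discriminator: Literal["B"]
    name: str


def pick_by_discriminator(data: dict[str, Any], candidates: list[type]) -> Any:
    """Instantiate the candidate whose Literal discriminator matches the data."""
    tag = data.get("discriminator")
    for candidate in candidates:
        hint = get_type_hints(candidate)["discriminator"]
        # get_args(Literal["A"]) returns ("A",)
        if tag in get_args(hint):
            return candidate(**data)
    raise ValueError(f"No candidate accepts discriminator {tag!r}")


chosen = pick_by_discriminator(
    {"discriminator": "B", "name": "hello"}, [OptionA, OptionB]
)
print(chosen)  # OptionB(discriminator='B', name='hello')
```

Without the tag, two dataclasses with overlapping or fully defaulted fields could both accept the same mapping, and the result would depend on the order in which they are tried.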
Recursive YAML Parsing
Parsley Coco supports recursive YAML parsing. For example:
# config.yaml
x: 10
y: "hello"
nested_config_path_to_yaml_file: "nested_config.yaml"
# nested_config.yaml
z: 42
To handle this, you need to define your Config dataclass to include a field for the nested configuration:
from dataclasses import dataclass
from typing import Optional
from parsley import create_parsley, Parsley
@dataclass
class NestedConfig:
    z: int

@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig
parser: Parsley[Config] = create_parsley(Config)
config = parser.parse_arguments(config_file_path="path/to/config.yaml")
print(config)
In this example:
- The Config dataclass includes a field nested_config of type NestedConfig.
- The NestedConfig dataclass defines the structure of the nested YAML file.
- Parsley Coco will automatically resolve the nested YAML file referenced by nested_config_path_to_yaml_file into the nested_config field.
This ensures that the recursive YAML parsing works seamlessly with your dataclass structure.
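The resolution step can be sketched without touching the file system. The example below is an illustration under stated assumptions, not the library's implementation: FAKE_FILES stands in for YAML files that have already been loaded into dictionaries, and resolve is a hypothetical helper that replaces each key ending in _path_to_yaml_file with the content of the referenced file.

```python
from typing import Any

SUFFIX = "_path_to_yaml_file"

# Stand-in for files on disk: file name -> already-parsed YAML content.
FAKE_FILES: dict[str, dict[str, Any]] = {
    "config.yaml": {
        "x": 10,
        "y": "hello",
        "nested_config_path_to_yaml_file": "nested_config.yaml",
    },
    "nested_config.yaml": {"z": 42},
}


def resolve(data: dict[str, Any]) -> dict[str, Any]:
    """Replace every '<field>_path_to_yaml_file' key with '<field>' holding the loaded content."""
    resolved: dict[str, Any] = {}
    for key, value in data.items():
        if key.endswith(SUFFIX):
            field_name = key[: -len(SUFFIX)]
            # Recurse so that the referenced file may itself reference other files.
            resolved[field_name] = resolve(FAKE_FILES[value])
        elif isinstance(value, dict):
            resolved[key] = resolve(value)
        else:
            resolved[key] = value
    return resolved


resolved_config = resolve(FAKE_FILES["config.yaml"])
print(resolved_config)  # {'x': 10, 'y': 'hello', 'nested_config': {'z': 42}}
```

The resulting dictionary has the same shape as the Config dataclass, which is what allows it to be converted into Config(x=10, y="hello", nested_config=NestedConfig(z=42)).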
Using Classes with a YAML Path Method (e.g., Enum Integration)
Parsley Coco supports advanced configuration patterns where a field in your dataclass can be an object (such as an Enum member) that provides a method to retrieve a YAML file path. If the class has a method (for example, get_yaml_file_path) that returns the path to a YAML file, Parsley will automatically load and parse the YAML file to instantiate the corresponding object.
This is especially useful for scenarios where you want to select a configuration "profile" or "preset" by name, and have the details loaded from a separate YAML file.
Example
Suppose you have an Enum for model presets, and each preset has a YAML file describing its configuration:
from enum import Enum
from dataclasses import dataclass
from parsley import create_parsley, Parsley
class ModelPreset(str, Enum):
    small = "small"
    large = "large"

    def get_yaml_file_path(self) -> str:
        return f"presets/{self.value}.yaml"

@dataclass
class ModelConfig:
    preset: ModelPreset
    # other fields...

@dataclass
class AppConfig:
    model: ModelConfig
YAML Example (config.yaml):
model:
  preset: large
YAML Example (presets/large.yaml):
# Any fields for ModelConfig, e.g.:
layers: 24
hidden_size: 1024
How it works
- When you specify preset: large in your main YAML, Parsley will instantiate the ModelPreset.large enum.
- Since ModelPreset has a get_yaml_file_path method, Parsley will call this method to get the path (presets/large.yaml), load the YAML file, and use its contents to populate the ModelConfig dataclass.
Usage
parser = create_parsley(AppConfig)
config = parser.parse_arguments(config_file_path="config.yaml")
print(config)
This pattern allows you to keep your main configuration clean and delegate detailed settings to separate YAML files, referenced by simple tags or enum values.
Tip:
You can use this approach with any class, not just Enums, as long as it provides a get_yaml_file_path() method returning the YAML path as a string.
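As a rough illustration of the tip above, the sketch below defines a non-Enum class that exposes get_yaml_file_path, together with a Protocol describing that shape. DatasetTag, HasYamlPath, and the datasets/ directory are hypothetical names chosen for this example; how Parsley Coco constructs such an object from a YAML value is not covered here.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasYamlPath(Protocol):
    """Structural type: anything with a get_yaml_file_path() method."""

    def get_yaml_file_path(self) -> str: ...


@dataclass(frozen=True)
class DatasetTag:
    """A plain class (not an Enum) that points to its own YAML file."""

    name: str

    def get_yaml_file_path(self) -> str:
        return f"datasets/{self.name}.yaml"


tag = DatasetTag("imagenet")
print(isinstance(tag, HasYamlPath))  # True
print(tag.get_yaml_file_path())      # datasets/imagenet.yaml
```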
Default Behavior Without a Config File
If no configuration file is provided, Parsley Coco will instantiate the dataclass using its default arguments. This means that all fields in your dataclass should either have default values or be optional to ensure proper initialization.
Example
from dataclasses import dataclass
from parsley import create_parsley, Parsley
@dataclass
class Config:
    x: int = 0  # Default value
    y: str = "default"  # Default value
parser: Parsley[Config] = create_parsley(Config)
# No config file provided
config = parser.parse_arguments()
print(config)
Output
Config(x=0, y="default")
Key Points:
- Default Values: Fields in the dataclass should have default values or be optional to avoid errors when no configuration file is provided.
- Fallback Behavior: This ensures that your application can run with default settings even if no external configuration is supplied.
By designing your dataclass with defaults, you make your application more robust and user-friendly.
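One way to check up front whether a dataclass can be built without any external input is to list its fields that have no default. The helper below is a small standard-library sketch; fields_without_default and StrictConfig are hypothetical names and not part of Parsley Coco.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class Config:
    x: int = 0
    y: str = "default"


@dataclass
class StrictConfig:
    x: int  # no default: must come from a config file, the command line, or extra_args
    y: str = "default"


def fields_without_default(dataclass_type: type) -> list[str]:
    """Return the names of fields that have neither a default nor a default_factory."""
    return [
        f.name
        for f in fields(dataclass_type)
        if f.default is MISSING and f.default_factory is MISSING
    ]


print(fields_without_default(Config))        # []
print(fields_without_default(StrictConfig))  # ['x']
```

An empty list means the dataclass can be instantiated from its defaults alone.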
Command-Line Arguments Handling
Parsley Coco integrates with argparse to handle command-line arguments. Command-line arguments override values from YAML files and dataclass defaults; only values passed programmatically through extra_args rank higher (see Precedence of Arguments).
How It Works
- Automatic Argument Parsing: Parsley Coco automatically generates command-line arguments based on the fields in your dataclass.
- Priority: Command-line arguments override values from YAML files and default values in the dataclass, and are themselves overridden by extra_args.
- Type Safety: The types of the arguments are inferred from the dataclass fields, ensuring type safety.
Example
from dataclasses import dataclass
from parsley import create_parsley, Parsley
@dataclass
class Config:
    x: int
    y: str
parser: Parsley[Config] = create_parsley(Config)
# Parse arguments from the command line
config = parser.parse_arguments()
print(config)
Command-Line Usage
If the script above is saved as example.py, you can run it with command-line arguments:
python example.py --x 42 --y "hello world"
Output
Config(x=42, y="hello world")
Key Points:
- Automatic Argument Names: The argument names are derived from the field names in the dataclass (e.g., x becomes --x).
- Type Conversion: Parsley Coco automatically converts the command-line arguments to the appropriate types based on the dataclass field types.
- Help Message: A help message is automatically generated for the command-line arguments.
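The key points above can be approximated with argparse and dataclasses.fields. This is a simplified sketch of the general technique, not Parsley Coco's actual argument generation: build_parser is a hypothetical helper and only handles fields annotated with a plain callable type such as int or str.

```python
import argparse
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Config:
    x: int
    y: str


def build_parser(dataclass_type: type) -> argparse.ArgumentParser:
    """Create one optional --<field> argument per dataclass field."""
    parser = argparse.ArgumentParser()
    hints = get_type_hints(dataclass_type)
    for f in fields(dataclass_type):
        # The field type doubles as the argparse converter (int("42") -> 42).
        # default=None lets us tell "not passed" apart from a real value.
        parser.add_argument(f"--{f.name}", type=hints[f.name], default=None)
    return parser


# An explicit argument list is used here instead of sys.argv.
namespace = build_parser(Config).parse_args(["--x", "42", "--y", "hello world"])
provided = {k: v for k, v in vars(namespace).items() if v is not None}
print(provided)  # {'x': 42, 'y': 'hello world'}
```

Keeping only the arguments that were actually passed is what makes it possible to merge them with other sources without overwriting values the user never set.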
Example with YAML and Command-Line Arguments
If a YAML file is provided along with command-line arguments, the command-line arguments will take precedence:
# config.yaml
x: 10
y: "from_yaml"
Run the script with:
python example.py --x 42
Resulting configuration:
Config(x=42, y="from_yaml")
This demonstrates how command-line arguments can override specific fields while retaining other values from the YAML file.
Using extra_args with parse_arguments
The extra_args parameter in the parse_arguments function allows you to programmatically provide additional arguments. These arguments are passed as an instance of a dataclass that extends the base dataclass. This extended dataclass is created using the make_partial_dataclass_with_optional_paths function.
How make_partial_dataclass_with_optional_paths Works
The make_partial_dataclass_with_optional_paths function:
- Extends the Base Dataclass: It adds optional fields for:
  - Paths to YAML files (e.g., field_name_path_to_yaml_file).
  - Overwrite values (e.g., field_name_overwrite).
- Makes All Fields Optional: This allows partial instantiation of the dataclass, making it flexible for use with extra_args.
This function combines two steps:
- make_dataclass_with_optional_paths_and_overwrite: Adds optional fields for paths and overwrite values.
- make_partial_dataclass: Makes all fields in the dataclass optional, including nested dataclasses.
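To make the two steps concrete, here is a simplified standard-library sketch built on dataclasses.make_dataclass. It is not the library's implementation: make_partial_with_paths is a hypothetical helper, it only looks at top-level fields, and unlike the real function it does not make nested dataclasses partial.

```python
from dataclasses import dataclass, field, fields, is_dataclass, make_dataclass
from typing import Any, Optional, get_type_hints


@dataclass
class NestedConfig:
    z: int


@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig


def make_partial_with_paths(base: type) -> type:
    """Return a new dataclass in which every field is optional and every
    dataclass-typed field gains '<name>_path_to_yaml_file' and '<name>_overwrite'."""
    hints = get_type_hints(base)
    new_fields: list[tuple[str, Any, Any]] = []
    for f in fields(base):
        field_type = hints[f.name]
        new_fields.append((f.name, Optional[field_type], field(default=None)))
        if is_dataclass(field_type):
            new_fields.append(
                (f"{f.name}_path_to_yaml_file", Optional[str], field(default=None))
            )
            new_fields.append(
                (f"{f.name}_overwrite", Optional[field_type], field(default=None))
            )
    return make_dataclass(f"Partial{base.__name__}", new_fields)


PartialConfigSketch = make_partial_with_paths(Config)

# Only the fields we care about need to be set; the rest stay None.
partial = PartialConfigSketch(x=20, nested_config_overwrite=NestedConfig(z=100))
print([f.name for f in fields(PartialConfigSketch)])
print(partial.x, partial.y, partial.nested_config_overwrite)
```

Because every field defaults to None, an instance can carry just the overrides, which is the property that makes the extended dataclass suitable for extra_args.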
Example
Base Dataclass
from dataclasses import dataclass
from parsley import create_parsley, Parsley
from parsley.alternative_dataclasses import make_partial_dataclass_with_optional_paths
@dataclass
class NestedConfig:
    z: int

@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig
Extended Dataclass
Using make_partial_dataclass_with_optional_paths, we create an extended version of the Config dataclass:
PartialConfig = make_partial_dataclass_with_optional_paths(Config)
This will generate a new dataclass with the following structure:
- All fields from Config are optional.
- Additional fields are added:
  - nested_config_path_to_yaml_file: Optional[str]
  - nested_config_overwrite: Optional[NestedConfig]
Using extra_args in parse_arguments
You can now use the extended dataclass to provide additional arguments via extra_args:
# Create the parser
parser: Parsley[Config] = create_parsley(Config)
# Define extra arguments using the extended dataclass
extra_args = PartialConfig(
    x=20,  # Override the value of x
    nested_config_overwrite=NestedConfig(z=100),  # Override the nested configuration
)

# Parse arguments with extra_args
config = parser.parse_arguments(
    config_file_path="path/to/config.yaml",
    extra_args=extra_args,
)
print(config)
Example YAML File
# config.yaml
x: 10
y: "hello"
nested_config:
  z: 42
Output
Config(x=20, y="hello", nested_config=NestedConfig(z=100))
Using package_name for Package-Relative YAML Paths
Parsley Coco supports resolving YAML file paths that start with package:// by allowing you to specify a package_name (or package root path) in several functions, such as parse_arguments, resolve_yaml_file_to_base_dataclass, and related utilities. This enables you to reference data files within a package using a consistent, package-relative syntax.
Why Use package_name?
When developing a Python package that includes internal YAML configuration files, you may want to reference those files using paths like package://data/config.yaml. This is especially useful when your package is imported and used by another script or project, and you want to ensure that YAML file paths are resolved relative to the package’s location, not the caller’s working directory.
Example Scenario
Suppose you have a package my_package with the following structure:
my_package/
    data/
        config.yaml
    main.py
And your YAML file references another YAML file using a package-relative path:
# config.yaml
nested_config_path_to_yaml_file: "package://data/nested.yaml"
Code Example
from dataclasses import dataclass
from parsley import create_parsley, Parsley

@dataclass
class NestedConfig:
    z: int

@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig

parser: Parsley[Config] = create_parsley(Config)

# Suppose your package root is '/home/user/my_package'
config = parser.parse_arguments(
    config_file_path="/home/user/my_package/data/config.yaml",
    package_name="/home/user/my_package",
)
print(config)
In this example:
- Any YAML path in your config that starts with package:// will be resolved relative to /home/user/my_package.
- For instance, package://data/nested.yaml becomes /home/user/my_package/data/nested.yaml.
When Is This Needed?
- Package Internal Data: When your package ships with YAML data files and you want to reference them reliably.
- Imported Usage: When your package is imported by another script, and you want YAML references to resolve to your package’s data, not the importing script’s directory.
- Portability: Ensures that YAML references work regardless of where or how your package is used.
Tip:
Always use the package_name argument when you expect to resolve package:// paths, especially in reusable libraries or packages.
Key Points
- Extended Dataclass: The make_partial_dataclass_with_optional_paths function creates an extended version of the base dataclass with optional fields for paths and overwrites.
- Partial Instantiation: The extended dataclass allows partial instantiation, making it flexible for use with extra_args.
- Priority: Values provided via extra_args take precedence over those in the YAML file or command-line arguments.
This approach provides a powerful way to programmatically override or extend configurations while maintaining type safety and flexibility.
Testing
Run the tests using tox:
tox
Development
Setting Up the Environment
- Clone the repository:
  git clone https://github.com/victorgabillon/parsley-coco.git
  cd parsley-coco
- Install dependencies:
  python -m pip install --upgrade pip
  pip install .
Running Tests
Use tox to run the tests:
tox
Code Formatting and Linting
- Format code with black and isort:
  tox -e black
  tox -e isort
- Lint code with flake8:
  tox -e flake8
- Type-check with mypy:
  tox -e mypy
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
License
This project is licensed under the GPL-3.0 License.
Acknowledgments