
A parser combining dacite and argparse

Parsley Coco

Overview

Parsley Coco is a Python library that combines the power of dacite and argparse to provide a flexible and extensible parser for command-line arguments and configuration files. It supports recursive YAML parsing, dataclass-based argument definitions, and merging of arguments from multiple sources.

Features

  • Recursive YAML Parsing: Parse nested YAML files into Python dataclasses.
  • Dataclass-Based Argument Parsing: Define arguments using Python dataclasses for type safety and clarity.
  • Command-Line and Config File Integration: Merge arguments from command-line inputs, YAML config files, and extra arguments.
  • extra_args Support: Programmatically provide additional arguments using an extended version of the base dataclass, allowing for flexible overrides and dynamic configurations.
  • Overwrite Functionality: Automatically resolve nested configurations and apply overwrite values from YAML files or programmatically provided arguments.

Requirements

  • Python 3.12 or higher

Installation

To install the library, use pip:

pip install parsley-coco

Usage


Basic Example

Define your dataclasses and use create_parsley to create a parser, then instantiate the dataclass from a source such as a YAML configuration file:

from dataclasses import dataclass
from parsley import create_parsley, Parsley

@dataclass
class Config:
    x: int
    y: str

parser: Parsley[Config] = create_parsley(Config)
config: Config = parser.parse_arguments(config_file_path="path/to/config.yaml")
print(config)

Input Arguments

create_parsley

The create_parsley function initializes a Parsley parser for a given dataclass.

  • Arguments:

    1. dataclass_type:
      • The dataclass type you want to parse (e.g., Config).
      • This defines the structure of the configuration, including fields, types, and default values.
    2. should_parse_command_line_arguments (optional):
      • A boolean indicating whether command-line arguments should be parsed.
      • Defaults to True. If set to False, command-line arguments will be ignored.
  • Returns:

    • A Parsley parser instance that can parse arguments based on the provided dataclass.

parse_arguments

The parse_arguments method of the Parsley parser parses arguments from multiple sources (command-line, YAML, extra_args).

  • Arguments:

    1. extra_args (optional):
      • A dataclass instance or dictionary containing additional arguments.
      • These arguments are merged with the other sources and take precedence over all of them, including YAML values and command-line arguments (see Precedence of Arguments).
    2. config_file_path (optional):
      • A string specifying the path to a YAML configuration file.
      • If not provided, the parser looks for a config_file_name key in extra_args or command-line arguments.
  • Returns:

    • An instance of the dataclass populated with the merged arguments from all sources.

Precedence of Arguments

Parsley Coco merges arguments from multiple sources in a specific order of precedence. The final configuration is determined by the following hierarchy (from highest to lowest priority):

  1. extra_args: Programmatically provided arguments via the extra_args parameter in parse_arguments take the highest priority. These values overwrite all other sources, including command-line arguments.
  2. Command-Line Arguments: Arguments provided via the command line take precedence over YAML configuration files and default values in the dataclass.
  3. YAML Configuration File: Values from the YAML configuration file are used if not overridden by extra_args or command-line arguments.
  4. Default Values in the Dataclass: If no value is provided from extra_args, command-line arguments, or the YAML file, the default values defined in the dataclass are used.

This precedence ensures flexibility while maintaining a clear and predictable merging process.


Example

Consider the following setup:

  • Dataclass:

    from dataclasses import dataclass
    
    @dataclass
    class Config:
        x: int = 0
        y: str = "default"
    
  • YAML File (config.yaml):

    x: 10
    y: "from_yaml"
    
  • Command-Line Arguments:

    --x 42
    
  • extra_args:

    {"y": "from_extra"}
    

Code Example

from parsley.alternative_dataclasses import make_partial_dataclass_with_optional_paths
from parsley.factory import create_parsley

# Config is the dataclass defined above (x: int = 0, y: str = "default")
parser = create_parsley(Config)

# Create an extended Config dataclass that allows more flexibility (described later in this README)
ExtendedConfig = make_partial_dataclass_with_optional_paths(Config)

# Parse arguments (run the script with --x 42 on the command line)
config = parser.parse_arguments(
    config_file_path="config.yaml",
    extra_args=ExtendedConfig(y="from_extra"),
)
print(config)

Resulting Configuration

Config(x=42, y="from_extra")

Explanation

  1. The value of x is 42 because the command-line argument --x 42 overrides the YAML file value (10).
  2. The value of y is "from_extra" because extra_args takes precedence over the YAML file ("from_yaml") and the default value ("default").
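
To make the ordering concrete, here is a minimal sketch that reproduces this example with plain dictionaries. It is not Parsley Coco's implementation: merge_by_precedence is a hypothetical helper, and the YAML and command-line values are written out as dicts instead of being read from config.yaml and argparse.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Config:
    x: int = 0
    y: str = "default"


def merge_by_precedence(*sources: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: merge dicts from lowest to highest priority; later sources win."""
    merged: dict[str, Any] = {}
    for source in sources:
        merged |= source  # keys in the later source overwrite earlier ones
    return merged


defaults = asdict(Config())                # lowest priority: dataclass defaults
yaml_values = {"x": 10, "y": "from_yaml"}  # what config.yaml would provide
cli_values = {"x": 42}                     # what --x 42 would provide
extra_values = {"y": "from_extra"}         # extra_args: highest priority

config = Config(**merge_by_precedence(defaults, yaml_values, cli_values, extra_values))
print(config)  # Config(x=42, y='from_extra')
```

Note that the `|=` operator replaces nested values wholesale; merging nested configurations field by field would require a recursive merge.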


Union Types, Defaults, and Discriminator Fields

Parsley Coco uses dacite to parse dictionaries into dataclasses, with dacite's Config(strict=False). This means that if a dataclass field is annotated with a union of several types (e.g., int | MyDataClass) and has a default value, any compatible member of the union can be used during parsing. For example, if your YAML provides an integer, it is parsed as an int; if it provides a mapping, it is parsed as the dataclass.

However, when using unions of dataclasses, we strongly recommend adding a discriminator field (such as Literal["my_type"]) to each dataclass. This helps dacite and Parsley Coco reliably determine which dataclass to instantiate when parsing nested structures.

Example

from dataclasses import dataclass
from typing import Literal

@dataclass
class OptionA:
    discriminator: Literal["A"]
    value: int

@dataclass
class OptionB:
    discriminator: Literal["B"]
    name: str

@dataclass
class Config:
    option: OptionA | OptionB | int = 0

YAML Example:

option:
  discriminator: B
  name: "hello"

This will be parsed as Config(option=OptionB(discriminator="B", name="hello")).

YAML Example:

option: 42

This will be parsed as Config(option=42).

Recommendation: Always include a discriminator field (using Literal[...]) in each dataclass used in a union. This ensures robust and predictable parsing, especially when your configuration can match multiple types.
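
Why the discriminator helps can be seen in a small sketch: with a Literal tag on each class, choosing the right member of the union becomes a direct lookup rather than trial and error. pick_by_discriminator is a hypothetical helper written for illustration; it is not how dacite or Parsley Coco resolve unions internally.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args, get_type_hints


@dataclass
class OptionA:
    discriminator: Literal["A"]
    value: int


@dataclass
class OptionB:
    discriminator: Literal["B"]
    name: str


def pick_by_discriminator(data: dict[str, Any], candidates: tuple[type, ...]) -> Any:
    """Hypothetical helper: build the candidate whose Literal tag matches data["discriminator"]."""
    for candidate in candidates:
        hint = get_type_hints(candidate).get("discriminator")
        # get_args(Literal["A"]) returns ("A",)
        if hint is not None and data.get("discriminator") in get_args(hint):
            return candidate(**data)
    raise ValueError(f"no candidate matches discriminator {data.get('discriminator')!r}")


option = pick_by_discriminator({"discriminator": "B", "name": "hello"}, (OptionA, OptionB))
print(option)  # OptionB(discriminator='B', name='hello')
```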

Recursive YAML Parsing

Parsley Coco supports recursive YAML parsing. For example:

# config.yaml
x: 10
y: "hello"
nested_config_path_to_yaml_file: "nested_config.yaml"
# nested_config.yaml
z: 42

To handle this, you need to define your Config dataclass to include a field for the nested configuration:

from dataclasses import dataclass
from parsley import create_parsley, Parsley

@dataclass
class NestedConfig:
    z: int

@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig

parser: Parsley[Config] = create_parsley(Config)
config = parser.parse_arguments(config_file_path="path/to/config.yaml")
print(config)

In this example:

  • The Config dataclass includes a field nested_config of type NestedConfig.
  • The NestedConfig dataclass defines the structure of the nested YAML file.
  • Parsley Coco recognizes the nested_config_path_to_yaml_file key in config.yaml (the field name followed by _path_to_yaml_file), loads the referenced file, and uses its contents to populate the nested_config field.

This ensures that the recursive YAML parsing works seamlessly with your dataclass structure.
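
The resolution step can be sketched in a few lines. This is a hypothetical illustration, not the library's code: it assumes the <field>_path_to_yaml_file naming convention shown above and replaces real YAML loading with an in-memory FILES dict so that it runs without a YAML parser. In Parsley Coco, a resolved dictionary like this is what dacite would then turn into Config.

```python
from typing import Any

# Hypothetical in-memory stand-in for YAML files on disk (path -> parsed content)
FILES: dict[str, dict[str, Any]] = {
    "config.yaml": {"x": 10, "y": "hello", "nested_config_path_to_yaml_file": "nested_config.yaml"},
    "nested_config.yaml": {"z": 42},
}

SUFFIX = "_path_to_yaml_file"


def resolve_paths(data: dict[str, Any]) -> dict[str, Any]:
    """Replace each '<field>_path_to_yaml_file' key with the loaded, recursively resolved content."""
    resolved: dict[str, Any] = {}
    for key, value in data.items():
        if key.endswith(SUFFIX):
            # 'nested_config_path_to_yaml_file' -> 'nested_config'
            resolved[key.removesuffix(SUFFIX)] = resolve_paths(FILES[value])
        elif isinstance(value, dict):
            resolved[key] = resolve_paths(value)  # descend into inline nested mappings
        else:
            resolved[key] = value
    return resolved


print(resolve_paths(FILES["config.yaml"]))
# {'x': 10, 'y': 'hello', 'nested_config': {'z': 42}}
```

The sketch does not guard against files that reference each other in a cycle; a production resolver would need to track visited paths.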

Using Classes with a YAML Path Method (e.g., Enum Integration)

Parsley Coco supports advanced configuration patterns where a field in your dataclass can be an object (such as an Enum member) that provides a method to retrieve a YAML file path. If the class has a method (for example, get_yaml_file_path) that returns the path to a YAML file, Parsley will automatically load and parse the YAML file to instantiate the corresponding object.

This is especially useful for scenarios where you want to select a configuration "profile" or "preset" by name, and have the details loaded from a separate YAML file.

Example

Suppose you have an Enum for model presets, and each preset has a YAML file describing its configuration:

from enum import Enum
from dataclasses import dataclass
from parsley import create_parsley, Parsley

class ModelPreset(str, Enum):
    small = "small"
    large = "large"

    def get_yaml_file_path(self) -> str:
        return f"presets/{self.value}.yaml"

@dataclass
class ModelConfig:
    preset: ModelPreset
    # other fields...

@dataclass
class AppConfig:
    model: ModelConfig

YAML Example (config.yaml):

model:
  preset: large

YAML Example (presets/large.yaml):

# Any fields for ModelConfig, e.g.:
layers: 24
hidden_size: 1024

How it works

  • When you specify preset: large in your main YAML, Parsley will instantiate the ModelPreset.large enum.
  • Since ModelPreset has a get_yaml_file_path method, Parsley will call this method to get the path (presets/large.yaml), load the YAML file, and use its contents to populate the ModelConfig dataclass.

Usage

parser = create_parsley(AppConfig)
config = parser.parse_arguments(config_file_path="config.yaml")
print(config)

This pattern allows you to keep your main configuration clean and delegate detailed settings to separate YAML files, referenced by simple tags or enum values.

Tip: You can use this approach with any class, not just Enums, as long as it provides a get_yaml_file_path() method returning the YAML path as a string.

Default Behavior Without a Config File

If no configuration file is provided, Parsley Coco will instantiate the dataclass using its default arguments. This means that all fields in your dataclass should either have default values or be optional to ensure proper initialization.

Example

from dataclasses import dataclass
from parsley import create_parsley, Parsley

@dataclass
class Config:
    x: int = 0  # Default value
    y: str = "default"  # Default value

parser: Parsley[Config] = create_parsley(Config)

# No config file provided
config = parser.parse_arguments()
print(config)

Output

Config(x=0, y="default")

Key Points:

  1. Default Values: Fields in the dataclass should have default values or be optional to avoid errors when no configuration file is provided.
  2. Fallback Behavior: This ensures that your application can run with default settings even if no external configuration is supplied.

By designing your dataclass with defaults, you make your application more robust and user-friendly.
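
Before relying on this fallback, it can help to check which fields would be left without a value. A minimal sketch using only the standard dataclasses module (fields_without_defaults and StrictConfig are hypothetical, not part of Parsley Coco):

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class Config:
    x: int = 0
    y: str = "default"


@dataclass
class StrictConfig:
    x: int  # no default: needs a value from YAML, the command line, or extra_args
    y: str = "default"


def fields_without_defaults(cls: type) -> list[str]:
    """List dataclass fields that have neither a default nor a default_factory."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


print(fields_without_defaults(Config))        # []
print(fields_without_defaults(StrictConfig))  # ['x']
```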

Command-Line Arguments Handling

Parsley Coco integrates with argparse to handle command-line arguments. When configurations from multiple sources are merged, command-line arguments take precedence over YAML files and dataclass defaults; only values passed programmatically via extra_args override them (see Precedence of Arguments).

How It Works

  1. Automatic Argument Parsing: Parsley Coco automatically generates command-line arguments based on the fields in your dataclass.
  2. Priority: Command-line arguments override values from YAML files and default values in the dataclass, but are themselves overridden by extra_args.
  3. Type Safety: The types of the arguments are inferred from the dataclass fields, ensuring type safety.

Example

from dataclasses import dataclass
from parsley import create_parsley, Parsley

@dataclass
class Config:
    x: int
    y: str

parser: Parsley[Config] = create_parsley(Config)

# Parse arguments from the command line
config = parser.parse_arguments()
print(config)

Command-Line Usage

If the script above is saved as example.py, you can run it with command-line arguments:

python example.py --x 42 --y "hello world"

Output

Config(x=42, y="hello world")

Key Points:

  1. Automatic Argument Names: The argument names are derived from the field names in the dataclass (e.g., x becomes --x).
  2. Type Conversion: Parsley Coco automatically converts the command-line arguments to the appropriate types based on the dataclass field types.
  3. Help Message: A help message is automatically generated for the command-line arguments.
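
The mapping from fields to options can be illustrated with the standard argparse module. This is a simplified sketch, not Parsley Coco's code: build_parser is hypothetical, and it only handles fields whose type is a plain callable such as int or str (unions and nested dataclasses would need more work).

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class Config:
    x: int
    y: str


def build_parser(cls: type) -> argparse.ArgumentParser:
    """Hypothetical helper: add one --<field> option per dataclass field."""
    parser = argparse.ArgumentParser(description=f"Arguments for {cls.__name__}")
    for f in fields(cls):
        # type=f.type converts the string from the command line (e.g. "42" -> 42);
        # default=None lets a caller tell "not given" apart from a real value
        parser.add_argument(f"--{f.name}", type=f.type, default=None, help=f"{f.name} ({f.type.__name__})")
    return parser


args = build_parser(Config).parse_args(["--x", "42", "--y", "hello world"])
print(vars(args))  # {'x': 42, 'y': 'hello world'}
```

Using None as the default makes it easy to drop options the user did not pass, so they do not mask values coming from the YAML file.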

Example with YAML and Command-Line Arguments

If a YAML file is provided along with command-line arguments, the command-line arguments will take precedence:

# config.yaml
x: 10
y: "from_yaml"

Run the script with:

python example.py --x 42

Resulting configuration:

Config(x=42, y="from_yaml")

This demonstrates how command-line arguments can override specific fields while retaining other values from the YAML file.



Using extra_args with parse_arguments

The extra_args parameter of the parse_arguments method allows you to programmatically provide additional arguments. These arguments are passed as an instance of a dataclass that extends the base dataclass. This extended dataclass is created with the make_partial_dataclass_with_optional_paths function.


How make_partial_dataclass_with_optional_paths Works

The make_partial_dataclass_with_optional_paths function:

  1. Extends the Base Dataclass: It adds optional fields for:
    • Paths to YAML files (e.g., field_name_path_to_yaml_file).
    • Overwrite values (e.g., field_name_overwrite).
  2. Makes All Fields Optional: This allows partial instantiation of the dataclass, making it flexible for use with extra_args.

This function combines two steps:

  • make_dataclass_with_optional_paths_and_overwrite: Adds optional fields for paths and overwrite values.
  • make_partial_dataclass: Makes all fields in the dataclass optional, including nested dataclasses.
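
To see what these steps produce, here is a simplified re-creation using dataclasses.make_dataclass. It is a hypothetical sketch, not the library's implementation: make_partial_sketch does not recurse into nested dataclasses (the real function also makes those partial), and the order of the generated fields is an assumption.

```python
from dataclasses import dataclass, field, fields, is_dataclass, make_dataclass
from typing import Any, Optional


@dataclass
class NestedConfig:
    z: int


@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig


def make_partial_sketch(cls: type) -> type:
    """Hypothetical: make every field Optional (default None) and give
    dataclass-typed fields extra <name>_path_to_yaml_file and <name>_overwrite fields."""
    new_fields: list[tuple[str, Any, Any]] = []
    for f in fields(cls):
        new_fields.append((f.name, Optional[f.type], field(default=None)))
        if is_dataclass(f.type):
            new_fields.append((f"{f.name}_path_to_yaml_file", Optional[str], field(default=None)))
            new_fields.append((f"{f.name}_overwrite", Optional[f.type], field(default=None)))
    return make_dataclass(f"Partial{cls.__name__}", new_fields)


PartialConfig = make_partial_sketch(Config)
print([f.name for f in fields(PartialConfig)])
# ['x', 'y', 'nested_config', 'nested_config_path_to_yaml_file', 'nested_config_overwrite']
extra = PartialConfig(x=20)  # partial instantiation: every other field stays None
```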

Example

Base Dataclass

from dataclasses import dataclass
from parsley import create_parsley, Parsley
from parsley.alternative_dataclasses import make_partial_dataclass_with_optional_paths

@dataclass
class NestedConfig:
    z: int

@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig

Extended Dataclass

Using make_partial_dataclass_with_optional_paths, we create an extended version of the Config dataclass:

PartialConfig = make_partial_dataclass_with_optional_paths(Config)

This will generate a new dataclass with the following structure:

  • All fields from Config are optional.
  • Additional fields are added:
    • nested_config_path_to_yaml_file: Optional[str]
    • nested_config_overwrite: Optional[NestedConfig]

Using extra_args in parse_arguments

You can now use the extended dataclass to provide additional arguments via extra_args:

# Create the parser
parser: Parsley[Config] = create_parsley(Config)

# Define extra arguments using the extended dataclass
extra_args = PartialConfig(
    x=20,  # Override the value of x
    nested_config_overwrite=NestedConfig(z=100)  # Override the nested configuration
)

# Parse arguments with extra_args
config = parser.parse_arguments(
    config_file_path="path/to/config.yaml",
    extra_args=extra_args
)

print(config)

Example YAML File

# config.yaml
x: 10
y: "hello"
nested_config:
  z: 42

Output

Config(x=20, y="hello", nested_config=NestedConfig(z=100))
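
The output follows from a simple rule: fields left as None in the partial instance override nothing, while set fields and overwrite values replace what the YAML file provided. A hypothetical sketch of that rule: apply_extra_args and the trimmed-down PartialConfigSketch are illustrative only, and treating an overwrite as a wholesale replacement of the nested entry is an assumption.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Optional


@dataclass
class NestedConfig:
    z: int


@dataclass
class PartialConfigSketch:
    """Hypothetical, trimmed-down stand-in for the generated PartialConfig."""
    x: Optional[int] = None
    y: Optional[str] = None
    nested_config_overwrite: Optional[NestedConfig] = None


def apply_extra_args(yaml_data: dict[str, Any], extra: PartialConfigSketch) -> dict[str, Any]:
    """Overlay the non-None fields of extra onto a copy of the YAML data."""
    merged = dict(yaml_data)
    for f in fields(extra):
        value = getattr(extra, f.name)
        if value is None:
            continue  # unset fields keep whatever the YAML file provided
        if f.name.endswith("_overwrite"):
            # assumption: the overwrite value replaces the nested entry as a whole
            merged[f.name.removesuffix("_overwrite")] = asdict(value)
        else:
            merged[f.name] = value
    return merged


yaml_data = {"x": 10, "y": "hello", "nested_config": {"z": 42}}
extra = PartialConfigSketch(x=20, nested_config_overwrite=NestedConfig(z=100))
print(apply_extra_args(yaml_data, extra))
# {'x': 20, 'y': 'hello', 'nested_config': {'z': 100}}
```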

Using package_name for Package-Relative YAML Paths

Parsley Coco supports resolving YAML file paths that start with package:// by allowing you to specify a package_name (or package root path) in several functions, such as parse_arguments, resolve_yaml_file_to_base_dataclass, and related utilities. This enables you to reference data files within a package using a consistent, package-relative syntax.

Why Use package_name?

When developing a Python package that includes internal YAML configuration files, you may want to reference those files using paths like package://data/config.yaml. This is especially useful when your package is imported and used by another script or project, and you want to ensure that YAML file paths are resolved relative to the package’s location, not the caller’s working directory.

Example Scenario

Suppose you have a package my_package with the following structure:

my_package/
  data/
    config.yaml
    nested.yaml
  main.py

And your YAML file references another YAML file using a package-relative path:

# config.yaml
nested_config_path_to_yaml_file: "package://data/nested.yaml"

Code Example

from dataclasses import dataclass
from parsley import create_parsley, Parsley

@dataclass
class NestedConfig:
    z: int

@dataclass
class Config:
    x: int
    y: str
    nested_config: NestedConfig

parser: Parsley[Config] = create_parsley(Config)

# Suppose your package root is '/home/user/my_package'
config = parser.parse_arguments(
    config_file_path="/home/user/my_package/data/config.yaml",
    package_name="/home/user/my_package"
)
print(config)

In this example:

  • Any YAML path in your config that starts with package:// will be resolved relative to /home/user/my_package.
  • For instance, package://data/nested.yaml becomes /home/user/my_package/data/nested.yaml.
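
The rewriting rule can be sketched as a small string-to-path mapping. resolve_package_path is a hypothetical helper for illustration; it assumes package_name is given as a root directory, as in the example above, whereas locating an installed package by its import name could use importlib.resources.files() instead.

```python
from pathlib import Path

PACKAGE_PREFIX = "package://"


def resolve_package_path(path: str, package_root: str) -> str:
    """Map 'package://rel/path' to '<package_root>/rel/path'; leave other paths unchanged."""
    if path.startswith(PACKAGE_PREFIX):
        return str(Path(package_root) / path.removeprefix(PACKAGE_PREFIX))
    return path


print(resolve_package_path("package://data/nested.yaml", "/home/user/my_package"))
# /home/user/my_package/data/nested.yaml  (on POSIX systems)
print(resolve_package_path("local/other.yaml", "/home/user/my_package"))
# local/other.yaml
```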

When Is This Needed?

  • Package Internal Data: When your package ships with YAML data files and you want to reference them reliably.
  • Imported Usage: When your package is imported by another script, and you want YAML references to resolve to your package’s data, not the importing script’s directory.
  • Portability: Ensures that YAML references work regardless of where or how your package is used.

Tip: Always use the package_name argument when you expect to resolve package:// paths, especially in reusable libraries or packages.

Key Points About extra_args

  1. Extended Dataclass: The make_partial_dataclass_with_optional_paths function creates an extended version of the base dataclass with optional fields for paths and overwrites.
  2. Partial Instantiation: The extended dataclass allows partial instantiation, making it flexible for use with extra_args.
  3. Priority: Values provided via extra_args take precedence over those in the YAML file or command-line arguments.

This approach provides a powerful way to programmatically override or extend configurations while maintaining type safety and flexibility.

Testing

Run the tests using tox:

tox

Development

Setting Up the Environment

  1. Clone the repository:

    git clone https://github.com/victorgabillon/parsley-coco.git
    cd parsley-coco
    
  2. Install dependencies:

    python -m pip install --upgrade pip
    pip install .
    

Running Tests

Use tox to run the tests:

tox

Code Formatting and Linting

  • Format code with black and isort:

    tox -e black
    tox -e isort
    
  • Lint code with flake8:

    tox -e flake8
    
  • Type-check with mypy:

    tox -e mypy
    

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the GPL-3.0 License.

Acknowledgments


Download files

Source Distribution

parsley_coco-0.1.36.tar.gz (47.0 kB)

Uploaded Source

Built Distribution

parsley_coco-0.1.36-py3-none-any.whl (37.2 kB)

Uploaded Python 3

File details

Details for the file parsley_coco-0.1.36.tar.gz.

File metadata

  • Download URL: parsley_coco-0.1.36.tar.gz
  • Upload date:
  • Size: 47.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parsley_coco-0.1.36.tar.gz
Algorithm Hash digest
SHA256 4d3d22c31145ddf16f06f717f9f596856f98ffce419d0749f73f80659cbd753c
MD5 d4f1c11267d69fccc988aa27673e77d1
BLAKE2b-256 aa35044c4e3a2d579b1d89e7449b3f71bb033fc50f554182978c76d9e7928677

Provenance

The following attestation bundles were made for parsley_coco-0.1.36.tar.gz:

Publisher: release.yml on victorgabillon/parsley

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file parsley_coco-0.1.36-py3-none-any.whl.

File metadata

  • Download URL: parsley_coco-0.1.36-py3-none-any.whl
  • Upload date:
  • Size: 37.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parsley_coco-0.1.36-py3-none-any.whl
Algorithm Hash digest
SHA256 cb5d7b685194893aef8aef7300b01d5d12be8180b0d49ada55193a10f67489e7
MD5 effb4305d849fb1359976aba1747ad44
BLAKE2b-256 337b5532c9736c8d57808d0fb79ce953202443ac61c735a87f14a9fc1f67afb0

Provenance

The following attestation bundles were made for parsley_coco-0.1.36-py3-none-any.whl:

Publisher: release.yml on victorgabillon/parsley

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
