Skip to main content

XML Utility tools with Pydantic and Typer

Project description

xmlu

XML Utility - Transform XML files into type-safe Pydantic models or JSON format with automatic type inference and structure detection.

Python 3.12+ License: MIT

Features

  • Automatic Type Inference: Detects str, int, and bool types from XML values
  • Smart Field Detection: Identifies required vs. optional fields based on occurrence patterns
  • Nested Model Support: Recognizes and generates nested models for XML elements with Name/Value attributes
  • Pythonic Naming: Converts XML tags to snake_case with field aliases to preserve original names
  • JSON Conversion: Convert XML files to JSON format with structured output
  • CLI & API: Use as a command-line tool or import as a Python library
  • Type-Safe: Generated models use Pydantic v2 for runtime validation
  • Rich Output: Beautiful terminal interface with progress information

Installation

# Using uv (recommended)
uv pip install xmlu

# Using pip
pip install xmlu

Quick Start

CLI Usage

# Generate Pydantic models from XML file
xmlu generate schedule.xml --parent Event

# Convert XML to JSON
xmlu xml-to-json schedule.xml --output schedule.json

# Specify custom output file
xmlu generate data.xml --parent Item --output models.py

# Verbose mode for detailed progress
xmlu generate schedule.xml --parent Event --verbose

# Show version
xmlu version

Python API

from xmlu import generate_pydantic_models, create_models_file, xml_to_json, xml_to_dict

# Generate Pydantic models
models = generate_pydantic_models("schedule.xml", parent_element="Event")
Event, Fields = models

# Create a models file
output_file = create_models_file("schedule.xml", list(models))
print(f"Generated: {output_file}")

# Convert XML to JSON
json_output = xml_to_json("schedule.xml", "schedule.json")

# Convert XML to dictionary (for programmatic use)
xml_dict = xml_to_dict("schedule.xml")
print(xml_dict)

# Use the generated models
event = Event(
    is_fixed=True,
    fields=Fields(duration="01:00:00", enabled=True)
)

How It Works

xmlu analyzes your XML structure and generates Pydantic models with intelligent defaults:

Input XML:

<Schedule>
    <Event>
        <EventId>1</EventId>
        <IsFixed>True</IsFixed>
        <Fields>
            <Parameter Name="Duration" Value="01:00:00" />
            <Parameter Name="Device" Value="SM-1" />
        </Fields>
    </Event>
    <Event>
        <EventId>2</EventId>
        <IsFixed>False</IsFixed>
        <Fields>
            <Parameter Name="Duration" Value="02:00:00" />
        </Fields>
    </Event>
</Schedule>

Generated Pydantic Models:

"""Auto-generated Pydantic models from Schedule.xml"""
from pydantic import BaseModel, Field, ConfigDict
from typing import Optional

class Fields(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,
        validate_assignment=True,
        populate_by_name=True,
    )

    duration: str = Field(alias="Duration")
    device: Optional[str] = Field(None, alias="Device")  # Optional - not in all Events

class Event(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,
        validate_assignment=True,
        populate_by_name=True,
    )

    event_id: int = Field(alias="EventId")
    is_fixed: bool = Field(alias="IsFixed")
    fields: Fields = Field(alias="Fields")

JSON Conversion

xmlu can also convert XML files directly to JSON format, preserving the hierarchical structure and handling repeated elements automatically.

Input XML:

<Schedule>
    <Event>
        <EventId>1</EventId>
        <IsFixed>True</IsFixed>
    </Event>
    <Event>
        <EventId>2</EventId>
        <IsFixed>False</IsFixed>
    </Event>
</Schedule>

Generated JSON:

{
  "Schedule": {
    "Event": [
      {
        "Event": {
          "EventId": "1",
          "IsFixed": "True"
        }
      },
      {
        "Event": {
          "EventId": "2", 
          "IsFixed": "False"
        }
      }
    ]
  }
}

JSON Conversion Features

  • Preserves Structure: Maintains XML hierarchy in JSON format
  • Handles Attributes: XML attributes are preserved as key-value pairs
  • Repeated Elements: Automatically converts repeated XML elements to JSON arrays
  • CLI and API: Available via command line and Python API
  • Flexible Output: Print to console or save to file

Features in Detail

Type Inference

xmlu automatically infers Python types:

  • Boolean: True, False, 0, 1bool
  • Integer: Numeric strings → int
  • String: All other values → str

Optional Fields

Fields that don't appear in every XML element are marked as Optional[T]:

# Field present in 2 out of 3 events → Optional
device: Optional[str] = None

Nested Structures

XML elements with child elements containing Name and Value attributes are automatically converted to nested models:

<Fields>
    <Parameter Name="Duration" Value="01:00:00" />
    <Parameter Name="Device" Value="SM-1" />
</Fields>

Becomes:

class Fields(BaseModel):
    duration: str
    device: str

Snake Case Conversion

XML tags are converted to Pythonic snake_case while preserving original names via field aliases:

# XML: <EventId>123</EventId>
event_id: int = Field(alias="EventId")

# Both work for parsing:
Event(event_id=123)
Event(EventId=123)

CLI Commands

generate

Generate Pydantic models from an XML file.

xmlu generate [FILE] [OPTIONS]

Options:

  • --parent, -p TEXT: XML tag to use as parent model (default: "Event")
  • --output, -o PATH: Output file path (default: auto-generated)
  • --verbose, -v: Show detailed progress information
  • --help: Show help message

Examples:

# Basic usage
xmlu generate schedule.xml

# Custom parent element
xmlu generate data.xml --parent CustomElement

# Specify output file
xmlu generate data.xml -o app/models.py

# Verbose output
xmlu generate schedule.xml --verbose

xml-to-json

Convert an XML file to JSON format.

xmlu xml-to-json [FILE] [OPTIONS]

Options:

  • --output, -o PATH: Output JSON file path (default: prints to console)
  • --help: Show help message

Examples:

# Convert to JSON and print to console
xmlu xml-to-json schedule.xml

# Save to specific JSON file
xmlu xml-to-json schedule.xml --output schedule.json

# Save to custom location
xmlu xml-to-json data.xml -o output/data.json

version

Show version information.

xmlu version

API Reference

generate_pydantic_models(file_path, parent_element="Event")

Generate Pydantic models from an XML file.

Parameters:

  • file_path (str): Path to the XML file
  • parent_element (str): XML tag name to use as parent model

Returns:

  • tuple[type[BaseModel], ...]: Tuple of Pydantic models (parent first, then nested models)

Example:

models = generate_pydantic_models("schedule.xml", "Event")
Event, Fields = models

create_models_file(file_path, models)

Write Pydantic models to a Python file.

Parameters:

  • file_path (str): Path to source XML file (used for naming output)
  • models (list[type[BaseModel]]): List of Pydantic models to write

Returns:

  • str: Name of the generated file

Example:

output_file = create_models_file("schedule.xml", list(models))
# Returns: "schedule_models.py"

xml_to_json(source_file, output_file=None, indent=4)

Convert an XML file to JSON format.

Parameters:

  • source_file (str | Path): Path to the XML file to convert
  • output_file (str | Path, optional): Path to save JSON output (if None, prints to console)
  • indent (int): Number of spaces for JSON indentation (default: 4)

Returns:

  • str: The JSON string representation of the XML

Example:

# Convert and print to console
json_str = xml_to_json("schedule.xml")

# Convert and save to file
xml_to_json("schedule.xml", "schedule.json")

# Custom indentation
xml_to_json("schedule.xml", "schedule.json", indent=2)

xml_to_dict(source_file)

Convert an XML file to a Python dictionary.

Parameters:

  • source_file (str | Path): Path to the XML file to convert

Returns:

  • dict: Dictionary representation of the XML structure

Example:

xml_dict = xml_to_dict("schedule.xml")
print(xml_dict["Schedule"]["Event"][0]["EventId"])

xml_to_console(source_file)

Print XML structure analysis to console (useful for debugging).

Parameters:

  • source_file (str | Path): Path to the XML file to analyze

Returns:

  • str: String representation of the XML structure

Example:

xml_to_console("schedule.xml")
# Prints hierarchical structure with attributes and text content

Utility Functions

convert_to_snake_case(name)

Convert any string to snake_case.

from xmlu import convert_to_snake_case

convert_to_snake_case("CamelCase")      # "camel_case"
convert_to_snake_case("HTTPServer")     # "http_server"
convert_to_snake_case("kebab-case")     # "kebab_case"

convert_to_pascal_case(name)

Convert any string to PascalCase.

from xmlu import convert_to_pascal_case

convert_to_pascal_case("snake_case")    # "SnakeCase"
convert_to_pascal_case("HTTPServer")    # "HTTPServer"
convert_to_pascal_case("kebab-case")    # "KebabCase"

Development

Setup

# Clone the repository
git clone https://github.com/MichielMe/xmlu.git
cd xmlu

# Install dependencies with uv
uv sync

# Run tests
make test

Running Tests

# Run all tests with pytest
make test

# Or directly with uv
uv run pytest -v

Project Structure

xmlu/
├── src/
│   └── xmlu/
│       ├── __init__.py              # Public API exports
│       ├── main.py                  # CLI application
│       ├── convert_to_pydantic.py   # Core model generation
│       ├── convert_to_json.py       # XML to JSON conversion
│       └── utils.py                 # String utilities & type inference
├── tests/
│   ├── test_generator.py            # Model generation tests
│   ├── test_json.py                 # JSON conversion tests
│   └── test_string_utils.py         # Utility function tests
├── pyproject.toml                   # Project configuration
└── README.md

Configuration

Generated models include sensible defaults:

model_config = ConfigDict(
    str_strip_whitespace=True,      # Auto-strip whitespace
    validate_assignment=True,        # Validate on field assignment
    populate_by_name=True,           # Accept both snake_case and original names
)

Requirements

  • Python 3.12+
  • lxml >= 6.0.2
  • pydantic >= 2.12.3
  • typer >= 0.20.0

License

MIT License - see LICENSE file for details.

Author

Michiel Meire

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Roadmap

  • JSON output format support
  • XML Schema (XSD) validation
  • Support for XML attributes (not just elements)
  • List/array type detection for repeated elements
  • Custom type mapping configuration
  • Watch mode for automatic regeneration

Acknowledgments

Built with:

  • Pydantic - Data validation using Python type hints
  • lxml - XML processing library
  • Typer - CLI framework
  • Rich - Terminal formatting (via Typer)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xmlu-0.1.2.tar.gz (42.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xmlu-0.1.2-py3-none-any.whl (14.6 kB view details)

Uploaded Python 3

File details

Details for the file xmlu-0.1.2.tar.gz.

File metadata

  • Download URL: xmlu-0.1.2.tar.gz
  • Upload date:
  • Size: 42.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.4

File hashes

Hashes for xmlu-0.1.2.tar.gz
Algorithm Hash digest
SHA256 31e681a5b627743cf826bbf9c7ddc69b15356af87008aca3574b58cfe85aaa53
MD5 cdc32f6ed99e27c14199816d89e11107
BLAKE2b-256 a2b4ef4169418f4b8cb4294c5357c39e5c5611a0e0812cb2a34ba6b200de9f6b

See more details on using hashes here.

File details

Details for the file xmlu-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: xmlu-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.4

File hashes

Hashes for xmlu-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 aa2dfdfd795fdec9703944d936fa2ec081c458475fde004618bfb7e29ae00daf
MD5 319fb50c9a927e87cb739e28741d7f44
BLAKE2b-256 d1661341a36f6c89dc216f519f4babf876a1e4586d101ba651740d31529fdda8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page