xml-to-pydantic
xml-to-pydantic is a library for Python to convert XML or HTML to pydantic models. This can be used to:
- Parse and validate a scraped HTML page into a python object
- Parse and validate an XML response from an XML-based API
- Parse and validate data stored in XML format
(Please note that this project is not affiliated in any way with the great team at pydantic.)
pydantic is a Python library for data validation that uses type hints (annotations). It lets you define simple or complex validation rules for processing external data, which usually arrives as JSON or as a Python dictionary.
Processing and validating HTML or XML with pydantic models would therefore require two steps: convert the HTML or XML to a Python dictionary, then convert the dictionary to the pydantic model. This library provides a convenient way to combine those steps.
Note: if you use this library to parse external, uncontrolled HTML or XML, be aware of possible XML-based attack vectors (see https://github.com/tiran/defusedxml). This library uses lxml under the hood.
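For comparison, the manual two-step route might look like the following minimal sketch. It uses only the standard library's xml.etree.ElementTree for the first step; the helper name children_to_dict is made up for this illustration and is not part of xml-to-pydantic or pydantic.

```python
import xml.etree.ElementTree as ET

xml_text = "<root><element>4.53</element><element>3.25</element></root>"


def children_to_dict(root: ET.Element) -> dict[str, list[str]]:
    """Group the text of each direct child element by its tag name.

    Hypothetical helper for illustration only.
    """
    data: dict[str, list[str]] = {}
    for child in root:
        data.setdefault(child.tag, []).append(child.text or "")
    return data


# Step 1: XML -> plain dictionary of strings.
data = children_to_dict(ET.fromstring(xml_text))
print(data)
#> {'element': ['4.53', '3.25']}

# Step 2 (not run here): pass `data` to a pydantic model, for example
# SomeModel.model_validate(data), to coerce and validate the values.
```

The dictionary still holds raw strings at this point; type coercion and validation only happen in the second step.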
Installation
Use pip, or your favorite Python package manager (pipenv, poetry, pdm, ...):
pip install xml-to-pydantic
Usage
The HTML or XML data is extracted using XPath. For simple documents, the XPath can be calculated from the model:
from xml_to_pydantic import ConfigDict, XmlBaseModel
html_bytes = b"""
<!doctype html>
<html lang="en-US">
<head>
<meta charset="utf-8" />
<title>My page title</title>
</head>
<body>
<header>
<h1>Header</h1>
</header>
<main>
<p>Paragraph1</p>
<p>Paragraph2</p>
<p>Paragraph3</p>
</main>
</body>
</html>
"""
class MainContent(XmlBaseModel):
    model_config = ConfigDict(xpath_root="/html/body/main")
    p: list[str]
result = MainContent.model_validate_html(html_bytes)
print(result)
#> p=['Paragraph1', 'Paragraph2', 'Paragraph3']
from xml_to_pydantic import XmlBaseModel
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<element>4.53</element>
<element>3.25</element>
</root>
"""
class MyModel(XmlBaseModel):
    element: list[float]
model = MyModel.model_validate_xml(xml_bytes)
print(model)
#> element=[4.53, 3.25]
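One way to picture how a field name could translate into an XPath is sketched below with the standard library's xml.etree.ElementTree. The rule in default_xpath and the extract helper are assumptions made for this illustration; the library's actual derivation may differ.

```python
import xml.etree.ElementTree as ET

xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>4.53</element>
    <element>3.25</element>
</root>
"""


def default_xpath(field_name: str) -> str:
    """Assumed rule: a field maps to the same-named children of the root."""
    return f"./{field_name}"


def extract(root: ET.Element, fields: dict[str, type]) -> dict[str, list]:
    """Collect the text of every match for each field and convert it."""
    return {
        name: [convert(el.text) for el in root.findall(default_xpath(name))]
        for name, convert in fields.items()
    }


values = extract(ET.fromstring(xml_bytes), {"element": float})
print(values)
#> {'element': [4.53, 3.25]}
```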
However, for more complicated XML, this one-to-one correspondence may not be convenient, and a better approach is to supply the XPath directly (similar to how pydantic allows specifying an alias for a field):
from xml_to_pydantic import XmlBaseModel, XmlField
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<element>4.53</element>
<a href="https://example.com">Link</a>
</root>
"""
class MyModel(XmlBaseModel):
    number: float = XmlField(xpath="./element/text()")
    href: str = XmlField(xpath="./a/@href")
model = MyModel.model_validate_xml(xml_bytes)
print(model)
#> number=4.53 href='https://example.com'
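To make clear what the two XPath expressions select, here is a rough standard-library equivalent. xml.etree.ElementTree supports only a subset of XPath and cannot end a path with text() or @href, so this sketch locates the element first and then reads its text or attribute. It illustrates the selection only and is not how xml-to-pydantic is implemented.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root><element>4.53</element><a href="https://example.com">Link</a></root>'
)

# "./element/text()" selects the text node inside <element>.
number_text = root.find("./element").text

# "./a/@href" selects the href attribute of <a>.
href = root.find("./a").get("href")

print(number_text, href)
#> 4.53 https://example.com
```

Both values are plain strings here; in the model above, the float annotation on number is what turns '4.53' into 4.53.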
The parsing can also deal with nested models and lists:
from xml_to_pydantic import XmlBaseModel, XmlField
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<level1>
<level2>value1</level2>
<level2>value2</level2>
<level2>value3</level2>
</level1>
<level11>value11</level11>
</root>
"""
class NextLevel(XmlBaseModel):
    level2: list[str] = XmlField(xpath="./level2/text()")

class MyModel(XmlBaseModel):
    next_level: NextLevel = XmlField(xpath="./level1")
    level_11: list[str] = XmlField(xpath="./level11/text()")
model = MyModel.model_validate_xml(xml_bytes)
print(model)
#> next_level=NextLevel(level2=['value1', 'value2', 'value3']) level_11=['value11']
Development
Prerequisites:
- Any Python version from 3.8 through 3.12
- poetry for dependency management
- git
- make (to use the helper scripts in the Makefile)
Autoformatting can be applied by running
make lintable
Before committing, remember to run
make lint
make test