QwQ Tag for parsing XML
QwQ Tag
A lightweight Python library for parsing XML/HTML content into structured, type-safe objects using Pydantic models. qwq-tag provides a simple and intuitive way to work with XML/HTML data while maintaining strong type safety and validation.
Features
- 🚀 Simple API: Parse XML/HTML strings with a single method call
- 🔒 Type Safety: Built on Pydantic for robust data validation and type hints
- 🌐 Flexible Parsing: Handles malformed XML/HTML with recovery parsing
- 📦 Lightweight: Minimal dependencies (only lxml and pydantic)
- 🎯 Mixed Content Support: Properly handles text and nested elements
- 🔄 Multiple Root Elements: Can parse fragments with multiple top-level elements
- 🧹 Clean Output: Automatically handles whitespace normalization
Installation
Using pip
pip install qwq-tag
Using PDM
pdm add qwq-tag
Requirements
- Python 3.10+
- lxml >= 6.0.0
- pydantic >= 2.11.7
Quick Start
from qwq_tag import QwqTag
# Parse simple XML
html = '<div class="container">Hello World</div>'
tags = QwqTag.from_str(html)
# Access the parsed content
tag = tags[0]
print(tag.name) # "div"
print(tag.content) # ["Hello World"]
print(tag.attr) # {"class": "container"}
print(tag.content_text) # "Hello World"
Usage Examples
Basic XML Parsing
from qwq_tag import QwqTag
# Simple element with attributes
xml = '<p class="text" id="intro">Hello World</p>'
result = QwqTag.from_str(xml)
tag = result[0]
print(f"Tag: {tag.name}") # Tag: p
print(f"Content: {tag.content}") # Content: ['Hello World']
print(f"Class: {tag.attr['class']}") # Class: text
print(f"ID: {tag.attr['id']}") # ID: intro
Nested Elements
# Nested structure
xml = """
<div class="container">
<h1>Title</h1>
<p>Paragraph content</p>
</div>
"""
result = QwqTag.from_str(xml)
div_tag = result[0]
print(f"Container has {len(div_tag.content)} children")
for child in div_tag.content:
    if isinstance(child, QwqTag):
        print(f"- {child.name}: {child.content_text}")
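The loop above inspects only direct children. To search a whole tree, a recursive walk over `content` is enough. The sketch below is a hypothetical helper, not part of qwq-tag: `find_all` and the `FakeTag` stand-in are names introduced here for illustration, and the helper assumes only the `name` and `content` fields shown in this README.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTag:
    """Hypothetical stand-in exposing only the `name` and `content` fields."""
    name: str
    content: list = field(default_factory=list)


def find_all(tag, name):
    """Collect `tag` and every descendant whose name equals `name`, depth first."""
    matches = [tag] if tag.name == name else []
    for child in tag.content:
        # Plain strings are text nodes; anything else is treated as a nested tag.
        if not isinstance(child, str):
            matches.extend(find_all(child, name))
    return matches


tree = FakeTag("div", [
    FakeTag("h1", ["Title"]),
    FakeTag("p", ["Paragraph ", FakeTag("p", ["inner"])]),
])
print([t.content[0] for t in find_all(tree, "p")])
# ['Paragraph ', 'inner']
```

Because the helper is duck-typed, it should work unchanged on objects returned by `QwqTag.from_str`, provided they expose `name` and `content` as in the examples here.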
Mixed Content (Text + Elements)
# Mixed content with text and nested elements
xml = '<p>Before <strong>bold text</strong> and <em>italic</em> after</p>'
result = QwqTag.from_str(xml)
p_tag = result[0]
print("Content breakdown:")
for item in p_tag.content:
    if isinstance(item, str):
        print(f" Text: '{item}'")
    else:
        print(f" Element: <{item.name}>{item.content_text}</{item.name}>")
# Output:
# Text: 'Before'
# Element: <strong>bold text</strong>
# Text: 'and'
# Element: <em>italic</em>
# Text: 'after'
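For readers wondering where the interleaved text comes from: element-tree parsers typically store leading text on the parent's `text` and trailing text on each child's `tail`. The sketch below uses the standard library's `xml.etree.ElementTree` to rebuild an ordered content list from those two fields. It illustrates the general idea only; `mixed_content` is a hypothetical name, and qwq-tag's actual implementation may differ.

```python
import xml.etree.ElementTree as ET


def mixed_content(element: ET.Element) -> list:
    """Return the element's children and text runs in document order."""
    items = []
    # Text that appears before the first child lives on element.text.
    if element.text and element.text.strip():
        items.append(element.text.strip())
    for child in element:
        items.append(child)
        # Text that follows a child lives on that child's .tail.
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


root = ET.fromstring("<p>Before <strong>bold text</strong> and <em>italic</em> after</p>")
parts = [p if isinstance(p, str) else f"<{p.tag}>" for p in mixed_content(root)]
print(parts)
# ['Before', '<strong>', 'and', '<em>', 'after']
```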
Multiple Root Elements
# Fragment with multiple root elements
xml = '<h1>Title</h1><p>First paragraph</p><p>Second paragraph</p>'
result = QwqTag.from_str(xml)
print(f"Found {len(result)} root elements:")
for tag in result:
    print(f"- {tag.name}: {tag.content_text}")
# Output:
# Found 3 root elements:
# - h1: Title
# - p: First paragraph
# - p: Second paragraph
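A strict XML parser accepts exactly one root element, so fragment support usually needs a workaround. One common approach, shown here with the standard library only, is to wrap the fragment in a throwaway element and return its children. This is a sketch of the technique under that assumption, not a description of how qwq-tag does it; `parse_fragment` and the `wrapper` element name are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment: str) -> list:
    """Parse a fragment that may have several top-level elements."""
    # Wrap the fragment in a synthetic root so the strict parser accepts it,
    # then hand back only that root's children.
    wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    return list(wrapper)


roots = parse_fragment("<h1>Title</h1><p>First paragraph</p><p>Second paragraph</p>")
print([(el.tag, el.text) for el in roots])
# [('h1', 'Title'), ('p', 'First paragraph'), ('p', 'Second paragraph')]
```

Note that this sketch drops any bare text that precedes the first top-level element, since that text ends up on the wrapper itself.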
Error Recovery
# Malformed XML/HTML
malformed = '<div><p>Unclosed paragraph<span>Text</div>'
try:
    result = QwqTag.from_str(malformed)
    print("Successfully parsed malformed XML!")
    print(str(result[0]))
except Exception as e:
    print(f"Parsing failed: {e}")
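To make "recovery" concrete, the sketch below uses the standard library's tolerant `html.parser` to show one simple recovery rule: when a closing tag arrives, any descendants still open are closed implicitly. This is an illustration only. qwq-tag depends on lxml, whose recovery mode may produce a different tree, and `TolerantOutline` is a hypothetical class written for this example.

```python
from html.parser import HTMLParser


class TolerantOutline(HTMLParser):
    """Record the nesting of tags, closing any left open by the input."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tag names
        self.outline = []     # (depth, tag) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.outline.append((len(self.open_tags), tag))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore a stray closing tag that matches nothing on the stack.
        if tag not in self.open_tags:
            return
        # Pop until the matching tag is closed, which implicitly closes
        # any unclosed descendants such as <p> and <span> below.
        while self.open_tags.pop() != tag:
            pass


parser = TolerantOutline()
parser.feed("<div><p>Unclosed paragraph<span>Text</div>")
parser.close()
print(parser.outline)    # [(0, 'div'), (1, 'p'), (2, 'span')]
print(parser.open_tags)  # []
```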
Converting Back to String
# Create a tag programmatically
tag = QwqTag(
    name="article",
    content=["Article content"],
    attr={"class": "post", "id": "123"}
)
print(str(tag))
# Output: <article class="post" id="123">Article content</article>
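The round trip from object to markup can be pictured as a small recursive renderer. The following is a minimal sketch with a hypothetical `SketchTag` dataclass that mirrors the three fields used above. The escaping of text and attribute values is an assumption of this sketch; the README does not say how qwq-tag handles special characters.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class SketchTag:
    """Hypothetical stand-in with the same three fields as the example above."""
    name: str
    content: list = field(default_factory=list)
    attr: dict = field(default_factory=dict)

    def __str__(self) -> str:
        # Render attributes as key="value" pairs, escaping quotes in values.
        attrs = "".join(
            f' {key}="{escape(value, quote=True)}"' for key, value in self.attr.items()
        )
        # Text children are escaped; nested tags render themselves recursively.
        body = "".join(
            escape(child, quote=False) if isinstance(child, str) else str(child)
            for child in self.content
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"


tag = SketchTag("article", ["Article content"], {"class": "post", "id": "123"})
print(str(tag))
# <article class="post" id="123">Article content</article>
```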
Development
Setup Development Environment
# Clone the repository
git clone https://github.com/yanli/qwq-tag.git
cd qwq-tag
# Install PDM if you haven't already
pip install pdm
# Install dependencies
pdm install
# Install development dependencies
pdm install -G dev
Running Tests
# Run all tests
pdm run test
# Run with coverage
pdm run pytest --cov=qwq_tag tests/
# Run specific test file
pdm run pytest tests/test_qwq_tag.py
Code Quality
# Format code
pdm run fix
# Check code quality
pdm run check
Available Scripts
- pdm run test - Run the test suite
- pdm run fix - Auto-fix code formatting and linting issues
- pdm run check - Check code formatting and linting without making changes
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Workflow
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for your changes
- Run the test suite (pdm run test)
- Check code quality (pdm run check)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request