QwQ Tag for parsing XML

QwQ Tag

Requires Python 3.10+. License: MIT.

A lightweight Python library that parses XML/HTML content into structured, type-safe objects built on Pydantic models. Each parsed element exposes its tag name, attributes, and content (text and nested elements) as typed fields, so XML/HTML data can be handled with validation and type hints.

Features

  • 🚀 Simple API: Parse XML/HTML strings with a single method call
  • 🔒 Type Safety: Built on Pydantic for robust data validation and type hints
  • 🌐 Flexible Parsing: Handles malformed XML/HTML with recovery parsing
  • 📦 Lightweight: Minimal dependencies (only lxml and pydantic)
  • 🎯 Mixed Content Support: Properly handles text and nested elements
  • 🔄 Multiple Root Elements: Can parse fragments with multiple top-level elements
  • 🧹 Clean Output: Automatically handles whitespace normalization

Installation

Using pip

pip install qwq-tag

Using PDM

pdm add qwq-tag

Requirements

  • Python 3.10+
  • lxml >= 6.0.0
  • pydantic >= 2.11.7

Quick Start

from qwq_tag import QwqTag

# Parse a simple HTML snippet
html = '<div class="container">Hello World</div>'
tags = QwqTag.from_str(html)

# Access the parsed content
tag = tags[0]
print(tag.name)           # "div"
print(tag.content)        # ["Hello World"]
print(tag.attr)           # {"class": "container"}
print(tag.content_text)   # "Hello World"

Usage Examples

Basic XML Parsing

from qwq_tag import QwqTag

# Simple element with attributes
xml = '<p class="text" id="intro">Hello World</p>'
result = QwqTag.from_str(xml)

tag = result[0]
print(f"Tag: {tag.name}")                    # Tag: p
print(f"Content: {tag.content}")             # Content: ['Hello World']
print(f"Class: {tag.attr['class']}")         # Class: text
print(f"ID: {tag.attr['id']}")               # ID: intro

Nested Elements

# Nested structure
xml = """
<div class="container">
    <h1>Title</h1>
    <p>Paragraph content</p>
</div>
"""

result = QwqTag.from_str(xml)
div_tag = result[0]

print(f"Container has {len(div_tag.content)} children")
for child in div_tag.content:
    if isinstance(child, QwqTag):
        print(f"- {child.name}: {child.content_text}")
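Walking deeper than one level needs recursion. The helper below is a minimal sketch, not part of qwq-tag: `iter_tags` and `StandInTag` are hypothetical names, and the code assumes only what the examples above show, namely that a tag has a `name` and a `content` list mixing strings and nested tags.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class StandInTag:
    """Hypothetical stand-in with the same `name`/`content` shape as the examples."""
    name: str
    content: list = field(default_factory=list)


def iter_tags(tag) -> Iterator:
    """Yield `tag` and every nested tag in document order (depth-first)."""
    yield tag
    for child in tag.content:
        # Text nodes are plain strings; anything else is treated as a nested tag.
        if not isinstance(child, str):
            yield from iter_tags(child)


tree = StandInTag("div", [
    StandInTag("h1", ["Title"]),
    StandInTag("p", ["Paragraph ", StandInTag("em", ["content"])]),
])
names = [t.name for t in iter_tags(tree)]
print(names)  # ['div', 'h1', 'p', 'em']
```

Because the helper only relies on duck typing, it should work on any object with those two attributes.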

Mixed Content (Text + Elements)

# Mixed content with text and nested elements
xml = '<p>Before <strong>bold text</strong> and <em>italic</em> after</p>'
result = QwqTag.from_str(xml)

p_tag = result[0]
print("Content breakdown:")
for item in p_tag.content:
    if isinstance(item, str):
        print(f"  Text: '{item}'")
    else:
        print(f"  Element: <{item.name}>{item.content_text}</{item.name}>")

# Output:
# Content breakdown:
#   Text: 'Before'
#   Element: <strong>bold text</strong>
#   Text: 'and'
#   Element: <em>italic</em>
#   Text: 'after'
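For background, element-tree style parsers store mixed content in two places: text before the first child sits on the parent's `.text`, and text after a child sits on that child's `.tail`. The sketch below uses the standard library's `xml.etree.ElementTree` (which shares this model with lxml) to flatten both into one ordered list. It illustrates the idea only and is not qwq-tag's actual implementation; `flatten_content` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def flatten_content(element: ET.Element) -> list:
    """Return the element's direct content as an ordered list of strings and child elements."""
    items = []
    # Text before the first child lives on the parent's `.text`.
    if element.text and element.text.strip():
        items.append(element.text.strip())
    for child in element:
        items.append(child)
        # Text after a child lives on that child's `.tail`.
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


p = ET.fromstring("<p>Before <strong>bold text</strong> and <em>italic</em> after</p>")
parts = [item if isinstance(item, str) else f"<{item.tag}>" for item in flatten_content(p)]
print(parts)  # ['Before', '<strong>', 'and', '<em>', 'after']
```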

Multiple Root Elements

# Fragment with multiple root elements
xml = '<h1>Title</h1><p>First paragraph</p><p>Second paragraph</p>'
result = QwqTag.from_str(xml)

print(f"Found {len(result)} root elements:")
for tag in result:
    print(f"- {tag.name}: {tag.content_text}")

# Output:
# Found 3 root elements:
# - h1: Title
# - p: First paragraph
# - p: Second paragraph
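Strict XML allows only one root element, so fragment parsing needs some workaround. One common approach, shown here with the standard library only, is to wrap the fragment in a throwaway element and return its children. This is a sketch of the general technique, assuming well-formed input; it is not a description of how qwq-tag does it, and `parse_fragment` and `_root` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment: str) -> list[ET.Element]:
    """Parse a well-formed fragment that may have several top-level elements."""
    # XML allows a single root, so wrap the fragment in a throwaway element
    # (the name `_root` is arbitrary) and hand back its children.
    wrapper = ET.fromstring(f"<_root>{fragment}</_root>")
    return list(wrapper)


roots = parse_fragment("<h1>Title</h1><p>First paragraph</p><p>Second paragraph</p>")
print([(el.tag, el.text) for el in roots])
# [('h1', 'Title'), ('p', 'First paragraph'), ('p', 'Second paragraph')]
```

Note that this simple version drops any text sitting between top-level elements and fails if the fragment starts with an XML declaration.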

Error Recovery

# Malformed XML/HTML
malformed = '<div><p>Unclosed paragraph<span>Text</div>'
try:
    result = QwqTag.from_str(malformed)
    print("Successfully parsed malformed XML!")
    print(str(result[0]))
except Exception as e:
    print(f"Parsing failed: {e}")
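To make "recovery" concrete, here is a minimal sketch of one recovery strategy built on the standard library's `html.parser`: when a closing tag arrives, every element still open inside it is closed implicitly. It is an illustration under simplifying assumptions (void elements such as `<br>` are not special-cased), and it is not how qwq-tag or lxml recover; `LenientTreeBuilder` and the dict layout are hypothetical.

```python
from html.parser import HTMLParser


class LenientTreeBuilder(HTMLParser):
    """Build a nested dict tree, closing any tags left open (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.root = {"name": "_root", "attr": {}, "content": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "attr": dict(attrs), "content": []}
        self.stack[-1]["content"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Ignore a stray closing tag that matches nothing currently open.
        if not any(node["name"] == tag for node in self.stack[1:]):
            return
        # Otherwise pop until the matching element is closed, implicitly
        # closing any unclosed descendants on the way.
        while self.stack.pop()["name"] != tag:
            pass

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["content"].append(data.strip())


builder = LenientTreeBuilder()
builder.feed("<div><p>Unclosed paragraph<span>Text</div>")
builder.close()
div = builder.root["content"][0]
p = div["content"][0]
span = p["content"][1]
print(div["name"], p["name"], p["content"][0], span["name"], span["content"])
# div p Unclosed paragraph span ['Text']
```

Here the unclosed `<p>` and `<span>` are both closed when `</div>` is seen, so the result is still a properly nested tree.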

Converting Back to String

# Create a tag programmatically
tag = QwqTag(
    name="article",
    content=["Article content"],
    attr={"class": "post", "id": "123"}
)

print(str(tag))
# Output: <article class="post" id="123">Article content</article>
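Serializing a tag tree is a short recursive routine. The sketch below shows one way to do it with escaping for text and attribute values, using only the standard library. `SketchTag` is a hypothetical stand-in with the same three fields as above; whether qwq-tag escapes special characters in the same way is not stated here, so treat that part as an assumption.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class SketchTag:
    """Hypothetical stand-in with the same three fields used above."""
    name: str
    content: list = field(default_factory=list)
    attr: dict = field(default_factory=dict)

    def __str__(self) -> str:
        # quoteattr() adds the surrounding quotes and escapes the value.
        attrs = "".join(f" {key}={quoteattr(value)}" for key, value in self.attr.items())
        # Escape text nodes; nested tags serialize themselves recursively.
        body = "".join(
            escape(item) if isinstance(item, str) else str(item)
            for item in self.content
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"


tag = SketchTag(
    "article",
    ["Fish & chips ", SketchTag("em", ["today"])],
    {"class": "post", "id": "123"},
)
print(str(tag))
# <article class="post" id="123">Fish &amp; chips <em>today</em></article>
```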

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/yanli/qwq-tag.git
cd qwq-tag

# Install PDM if you haven't already
pip install pdm

# Install dependencies
pdm install

# Install development dependencies
pdm install -G dev

Running Tests

# Run all tests
pdm run test

# Run with coverage
pdm run pytest --cov=qwq_tag tests/

# Run specific test file
pdm run pytest tests/test_qwq_tag.py

Code Quality

# Auto-fix formatting and linting issues
pdm run fix

# Check code quality
pdm run check

Available Scripts

  • pdm run test - Run the test suite
  • pdm run fix - Auto-fix code formatting and linting issues
  • pdm run check - Check code formatting and linting without making changes

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for your changes
  5. Run the test suite (pdm run test)
  6. Check code quality (pdm run check)
  7. Commit your changes (git commit -m 'Add amazing feature')
  8. Push to the branch (git push origin feature/amazing-feature)
  9. Open a Pull Request

File details

Details for the file qwq_tag-0.1.2.tar.gz.

File metadata

  • Download URL: qwq_tag-0.1.2.tar.gz
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: pdm/2.25.6 CPython/3.13.5 Linux/6.11.0-1018-azure

File hashes

Hashes for qwq_tag-0.1.2.tar.gz

  • SHA256: e5c0d89de168a5b0057c25dd9a6ad00ae6bfc7ea488baa2ecb217cd0e97be84d
  • MD5: 4f4d11e20df25860197b96fa6f93a0b1
  • BLAKE2b-256: 1b2c81fc7afeb777a017dae72ae536005645b630c5249e9a31e9f50ede274990

File details

Details for the file qwq_tag-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: qwq_tag-0.1.2-py3-none-any.whl
  • Size: 4.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: pdm/2.25.6 CPython/3.13.5 Linux/6.11.0-1018-azure

File hashes

Hashes for qwq_tag-0.1.2-py3-none-any.whl

  • SHA256: c8dada889c92101ad14dbdae366d9bb31fb080f9f38d9daeaa66b1b4b9e2a79d
  • MD5: a43cd19b7d856bf6822755ed260cadf4
  • BLAKE2b-256: d154e3d4e4e5898e6a976236ad8ec6e963664aec52f4ffe41b2fbafcbb32d725
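
A downloaded file can be checked against the SHA256 digests listed above with a few lines of standard-library Python. This is a minimal sketch; `sha256_matches` is a hypothetical helper name, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Usage (assumes the sdist was saved in the current directory):
# sha256_matches(
#     Path("qwq_tag-0.1.2.tar.gz"),
#     "e5c0d89de168a5b0057c25dd9a6ad00ae6bfc7ea488baa2ecb217cd0e97be84d",
# )
```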
