Vexy Markliff
A Python package and CLI tool for bidirectional conversion between Markdown/HTML and XLIFF 2.1 format, enabling high-fidelity localization workflows.
Features
- Bidirectional Conversion: Seamless Markdown ↔ XLIFF and HTML ↔ XLIFF conversion
- XLIFF 2.1 Compliant: Full compliance with OASIS XLIFF 2.1 standard
- Format Style Module: Preserves HTML attributes and structure using fs:fs and fs:subFs
- ITS 2.0 Support: Native integration with W3C Internationalization Tag Set
- Flexible Modes: One-document and two-document translation workflows
- Round-trip Fidelity: Lossless Markdown → XLIFF → Markdown conversion
- Intelligent Segmentation: Smart sentence splitting for translation units
- Skeleton Management: External skeleton files for document structure preservation
- Rich CLI: Comprehensive command-line interface built with Fire
- Modern Python: Type hints, Pydantic models, and async support
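To make the segmentation feature concrete, here is a minimal sketch of sentence splitting for translation units. It is not the package's own splitter (the configuration shown later names nltk); `split_sentences` is a hypothetical helper, and the regex rule is a deliberately naive assumption.

```python
import re

# Hypothetical helper, not part of vexy-markliff: a boundary is whitespace that
# follows ., ! or ? and precedes an uppercase letter or a digit.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences found in ``text``."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


segments = split_sentences("Install the package. Run the CLI! Does it work? Yes.")
for index, segment in enumerate(segments, start=1):
    print(index, segment)
```

A rule this simple will wrongly split after abbreviations such as "e.g." when a capitalized word follows, which is one reason a trained splitter is usually preferred for production use.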
Installation
uv pip install --system vexy-markliff
or
uv add vexy-markliff
Quick Start
CLI Usage
# Convert Markdown to XLIFF
vexy-markliff md2xliff document.md document.xlf
# Convert HTML to XLIFF
vexy-markliff html2xliff page.html page.xlf
# Convert XLIFF back to Markdown
vexy-markliff xliff2md translated.xlf result.md
# Two-document mode (parallel source and target)
vexy-markliff md2xliff --mode=two-doc source.md target.md aligned.xlf
Python API
from vexy_markliff import Config, process_data
summary = process_data(
    ["alpha", "beta", "alpha"],
    config=Config(name="demo", value="count", options={"mode": "summary"}),
)
print(summary["unique"]) # -> 2
Advanced Usage
Configuration
Create a vexy-markliff.yaml configuration file:
source_language: en
target_language: es
markdown:
  extensions:
    - tables
    - footnotes
    - task_lists
  html_passthrough: true
xliff:
  version: "2.1"
  format_style: true
  its_support: true
segmentation:
  split_sentences: true
  sentence_splitter: nltk
Use the configuration:
vexy-markliff md2xliff --config=vexy-markliff.yaml input.md output.xlf
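The sketch below shows one way to check such a configuration after it has been parsed into a Python mapping. It assumes the keys are grouped under `markdown`, `xliff`, and `segmentation` sections; `ConversionSettings` and `settings_from_mapping` are hypothetical names for illustration and are not the package's own configuration classes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConversionSettings:
    """Hypothetical typed view of a few keys from the YAML file."""

    source_language: str
    target_language: str
    markdown_extensions: tuple[str, ...] = ()
    xliff_version: str = "2.1"
    split_sentences: bool = True


def settings_from_mapping(data: dict[str, Any]) -> ConversionSettings:
    """Build settings from an already-parsed mapping, checking required keys."""
    missing = [key for key in ("source_language", "target_language") if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    markdown = data.get("markdown", {})
    xliff = data.get("xliff", {})
    segmentation = data.get("segmentation", {})
    return ConversionSettings(
        source_language=data["source_language"],
        target_language=data["target_language"],
        markdown_extensions=tuple(markdown.get("extensions", ())),
        xliff_version=str(xliff.get("version", "2.1")),
        split_sentences=bool(segmentation.get("split_sentences", True)),
    )


# The mapping a YAML loader would be expected to produce for the file above.
parsed = {
    "source_language": "en",
    "target_language": "es",
    "markdown": {"extensions": ["tables", "footnotes", "task_lists"]},
    "xliff": {"version": "2.1"},
    "segmentation": {"split_sentences": True, "sentence_splitter": "nltk"},
}
settings = settings_from_mapping(parsed)
print(settings)
```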
Two-Document Mode
Process parallel source and target documents for alignment:
from vexy_markliff import VexyMarkliff, TwoDocumentMode
converter = VexyMarkliff()
# Load source and target content
with open("source.md", "r") as f:
    source = f.read()
with open("target.md", "r") as f:
    target = f.read()
# Process parallel documents
result = converter.process_parallel(
    source_content=source,
    target_content=target,
    mode=TwoDocumentMode.ALIGNED
)
# Generate XLIFF with aligned segments
xliff_content = result.to_xliff()
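How the package aligns parallel documents is not described here. As an illustration of the general idea only, the following sketch pairs source and target blocks by position, assuming both documents share the same block order. `split_blocks` and `align_blocks` are hypothetical helpers, not vexy-markliff functions.

```python
import re
from itertools import zip_longest
from typing import Optional


def split_blocks(markdown: str) -> list[str]:
    """Split Markdown into blocks separated by one or more blank lines."""
    return [block.strip() for block in re.split(r"\n\s*\n", markdown) if block.strip()]


def align_blocks(source: str, target: str) -> list[tuple[Optional[str], Optional[str]]]:
    """Pair blocks by position; the shorter document is padded with None."""
    return list(zip_longest(split_blocks(source), split_blocks(target), fillvalue=None))


pairs = align_blocks(
    "# Title\n\nFirst paragraph.\n\nSecond.",
    "# Título\n\nPrimer párrafo.",
)
for source_block, target_block in pairs:
    print(repr(source_block), "->", repr(target_block))
```

Positional pairing breaks as soon as a translator merges or reorders paragraphs, so pairs containing `None` are worth flagging for manual review.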
Custom Processing Pipeline
from pathlib import Path
from vexy_markliff import Pipeline, MarkdownParser, XLIFFGenerator
# Build custom pipeline
pipeline = Pipeline()
pipeline.add_stage(MarkdownParser())
# CustomProcessor is your own stage class; define or import it before this point
pipeline.add_stage(CustomProcessor())
pipeline.add_stage(XLIFFGenerator())
# Process content
markdown_content = Path("document.md").read_text(encoding="utf-8")
result = pipeline.process(markdown_content)
Supported Formats
Markdown Elements
- CommonMark compliant base
- Tables (GitHub Flavored Markdown)
- Task lists
- Strikethrough
- Footnotes
- Front matter (YAML/TOML)
- Raw HTML passthrough
HTML Elements
- All HTML5 structural elements
- Text content elements (p, h1-h6, etc.)
- Inline formatting (strong, em, a, etc.)
- Tables with complex structures
- Forms and inputs
- Media elements (img, video, audio)
- Web Components and custom elements
XLIFF Features
- XLIFF 2.1 Core compliance
- Format Style (fs) module for attribute preservation
- ITS 2.0 metadata support
- Translation unit notes
- Preserve space handling
- External skeleton files
- Inline element protection
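To show what attribute preservation with the Format Style module can look like, the sketch below builds a one-unit XLIFF 2.1 document with the standard library. The document shape and the helper names `encode_sub_fs` and `build_unit_document` are assumptions for illustration; the output vexy-markliff actually produces may differ.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"  # core namespace shared by XLIFF 2.x
FS_NS = "urn:oasis:names:tc:xliff:fs:2.0"  # Format Style module namespace

ET.register_namespace("", XLIFF_NS)
ET.register_namespace("fs", FS_NS)


def encode_sub_fs(attributes: dict[str, str]) -> str:
    """Encode HTML attributes as an fs:subFs value: name,value pairs joined by backslashes."""

    def escape(value: str) -> str:
        # Backslashes and commas inside a value are escaped with a backslash.
        return value.replace("\\", "\\\\").replace(",", "\\,")

    return "\\".join(f"{name},{escape(value)}" for name, value in attributes.items())


def build_unit_document() -> str:
    """Return a tiny XLIFF document holding one paragraph with an inline link."""
    xliff = ET.Element(
        f"{{{XLIFF_NS}}}xliff", {"version": "2.1", "srcLang": "en", "trgLang": "es"}
    )
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {"id": "f1"})
    # fs:fs records the HTML element that the unit came from.
    unit = ET.SubElement(file_el, f"{{{XLIFF_NS}}}unit", {"id": "u1", f"{{{FS_NS}}}fs": "p"})
    segment = ET.SubElement(unit, f"{{{XLIFF_NS}}}segment")
    source = ET.SubElement(segment, f"{{{XLIFF_NS}}}source")
    source.text = "Read the "
    link = ET.SubElement(
        source,
        f"{{{XLIFF_NS}}}pc",
        {
            "id": "1",
            f"{{{FS_NS}}}fs": "a",
            f"{{{FS_NS}}}subFs": encode_sub_fs({"href": "https://example.com/docs"}),
        },
    )
    link.text = "documentation"
    link.tail = "."
    return ET.tostring(xliff, encoding="unicode")


document = build_unit_document()
print(document)
```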
How It Works
- Parsing: Markdown is parsed using markdown-it-py, HTML using lxml
- HTML Conversion: Markdown is converted to HTML as intermediate format
- Content Extraction: Translatable content is identified and extracted
- Structure Preservation: Document structure is stored in skeleton files
- XLIFF Generation: Content is formatted as XLIFF 2.1 with Format Style attributes
- Round-trip: Translated XLIFF is merged with skeleton to reconstruct the original format
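The extraction and merge steps above can be illustrated with a toy skeleton round trip. This is a simplified sketch under stated assumptions, not the package's implementation: it uses a regex instead of lxml, handles only `p` and `h1`-`h6` elements without attributes, and the `%%u1%%` placeholder format and the `extract` and `merge` helpers are invented for the example.

```python
import re

# Matches an attribute-free <p> or <h1>-<h6> element and captures its text.
_BLOCK = re.compile(r"(<(?:p|h[1-6])>)(.*?)(</(?:p|h[1-6])>)", flags=re.DOTALL)
_PLACEHOLDER = re.compile(r"%%(u\d+)%%")


def extract(html: str) -> tuple[str, dict[str, str]]:
    """Return (skeleton, units): text is replaced by placeholders and stored by unit id."""
    units: dict[str, str] = {}

    def stash(match: re.Match) -> str:
        unit_id = f"u{len(units) + 1}"
        units[unit_id] = match.group(2)
        return f"{match.group(1)}%%{unit_id}%%{match.group(3)}"

    skeleton = _BLOCK.sub(stash, html)
    return skeleton, units


def merge(skeleton: str, translations: dict[str, str]) -> str:
    """Fill each placeholder in the skeleton with the text for its unit id."""
    return _PLACEHOLDER.sub(lambda match: translations[match.group(1)], skeleton)


html = "<h1>Hello</h1>\n<p>Good morning.</p>"
skeleton, units = extract(html)
translated = merge(skeleton, {"u1": "Hola", "u2": "Buenos días."})
print(skeleton)
print(translated)
```

Merging the skeleton with the untranslated units reproduces the input exactly, which is the property the round-trip step relies on.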
Development
This project uses Hatch for development workflow management.
Setup Development Environment
# Install hatch if you haven't already
pip install hatch
# Create and activate development environment
hatch shell
# Run tests
hatch run test
# Run tests with coverage
hatch run test-cov
# Run linting
hatch run lint
# Format code
hatch run format
Testing
# Run all tests (preferred)
uvx hatch run test
# Run with coverage
uvx hatch run test-cov
# Underlying command if hatch env already active
python -m pytest
# Run specific test file
python -m pytest tests/test_markdown_parser.py
# Run with verbose output
python -m pytest -xvs
Documentation
Full documentation is available in the docs/ folder:
- 500-intro.md - Introduction to HTML-XLIFF handling
- 510-512-prefs-html*.md - HTML element handling specifications
- 513-prefs-md.md - Markdown element handling specifications
- 530-vexy-markliff-spec.md - Complete technical specification
Contributing
Contributions are welcome! Please ensure:
- All tests pass
- Code follows PEP 8 style guidelines
- Type hints are provided
- Documentation is updated
License
MIT License
Acknowledgments
Built on the XLIFF 2.1 OASIS standard and leverages:
- markdown-it-py for Markdown parsing
- lxml for XML/HTML processing
- Fire for CLI interface
- Pydantic for data validation
File details
Details for the file vexy_markliff-1.3.1.tar.gz.
File metadata
- Download URL: vexy_markliff-1.3.1.tar.gz
- Upload date:
- Size: 11.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e60d1b6def448d7893ca2fef620b341e770c5d68f3e04b2619c036d232806fc |
| MD5 | c6ff8f35b6995f0596832eb47079f48c |
| BLAKE2b-256 | 3221a938926546a4d35d57abf9a2f2b8eb19c444f308898e6f6ba3925a22a315 |
File details
Details for the file vexy_markliff-1.3.1-py3-none-any.whl.
File metadata
- Download URL: vexy_markliff-1.3.1-py3-none-any.whl
- Upload date:
- Size: 17.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a541a7b40116da613295ac79c2b0e9a4072e0d514901bb5a4a9f6310dc916ab |
| MD5 | 2d892a0607b1e4d832974343f2f1c817 |
| BLAKE2b-256 | 5ce5890a677e4597cf6668c57d6b7a985ce1997eca0ff637f9ee954c898c146f |
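A downloaded file can be checked against a published SHA256 digest with the standard library. The sketch below is a generic illustration; `sha256_matches` is a hypothetical helper, and the commented usage assumes the wheel has been saved in the current directory.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True when the SHA256 digest of ``data`` equals ``expected_hex``."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Example with an in-memory payload; for a real download, read the file bytes
# first, e.g. Path("vexy_markliff-1.3.1-py3-none-any.whl").read_bytes().
payload = b"example payload"
expected = hashlib.sha256(payload).hexdigest()
print(sha256_matches(payload, expected))
```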