Vexy Markliff

A Python package and CLI tool for bidirectional conversion between Markdown/HTML and XLIFF 2.1 format, enabling high-fidelity localization workflows.

Features

  • Bidirectional Conversion: Seamless Markdown ↔ XLIFF and HTML ↔ XLIFF conversion
  • XLIFF 2.1 Compliant: Full compliance with the OASIS XLIFF 2.1 standard
  • Format Style Module: Preserves HTML attributes and structure using fs:fs and fs:subFs
  • ITS 2.0 Support: Native integration with W3C Internationalization Tag Set
  • Flexible Modes: One-document and two-document translation workflows
  • Round-trip Fidelity: Lossless Markdown → XLIFF → Markdown conversion
  • Intelligent Segmentation: Smart sentence splitting for translation units
  • Skeleton Management: External skeleton files for document structure preservation
  • Rich CLI: Comprehensive command-line interface built with Fire
  • Modern Python: Type hints, Pydantic models, and async support
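
To illustrate the sentence splitting that the Intelligent Segmentation feature refers to, here is a minimal sketch using only the standard library. It is not the package's own splitter (the configuration shown later names nltk for that); the function name `split_sentences` and the regular expression are assumptions made for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences (hypothetical helper, not package API).

    A sentence is assumed to end with '.', '!' or '?' followed by whitespace.
    Abbreviations such as "e.g." will be split incorrectly; a real splitter
    (for example nltk) handles such cases.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Hello world. How are you? Fine.")
```

Each resulting sentence would then become one translation segment, which keeps units small enough for translation memory matching.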

Installation

uv pip install --system vexy-markliff

or

uv add vexy-markliff

Quick Start

CLI Usage

# Convert Markdown to XLIFF
vexy-markliff md2xliff document.md document.xlf

# Convert HTML to XLIFF
vexy-markliff html2xliff page.html page.xlf

# Convert XLIFF back to Markdown
vexy-markliff xliff2md translated.xlf result.md

# Two-document mode (parallel source and target)
vexy-markliff md2xliff --mode=two-doc source.md target.md aligned.xlf
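
For batch conversion, the CLI calls above can be driven from a script. The following is a sketch only: it uses just the `md2xliff` subcommand and the `--config` option that appear in this README, while the helper names `build_md2xliff_command` and `convert_tree` are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_md2xliff_command(
    source: Path, output: Path, config: Optional[Path] = None
) -> list[str]:
    """Return the argument list for one `vexy-markliff md2xliff` call."""
    command = ["vexy-markliff", "md2xliff"]
    if config is not None:
        command.append(f"--config={config}")
    command += [str(source), str(output)]
    return command


def convert_tree(root: Path, dry_run: bool = True) -> list[list[str]]:
    """Build one command per Markdown file under root; run them unless dry_run."""
    commands = [
        build_md2xliff_command(md, md.with_suffix(".xlf"))
        for md in sorted(root.rglob("*.md"))
    ]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a conversion fails
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` (the default here) nothing is executed, so the generated commands can be inspected first.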

Python API

from vexy_markliff import Config, process_data

summary = process_data(
    ["alpha", "beta", "alpha"],
    config=Config(name="demo", value="count", options={"mode": "summary"}),
)

print(summary["unique"])  # -> 2

Advanced Usage

Configuration

Create a vexy-markliff.yaml configuration file:

source_language: en
target_language: es

markdown:
  extensions:
    - tables
    - footnotes
    - task_lists
  html_passthrough: true

xliff:
  version: "2.1"
  format_style: true
  its_support: true

segmentation:
  split_sentences: true
  sentence_splitter: nltk

Use the configuration:

vexy-markliff md2xliff --config=vexy-markliff.yaml input.md output.xlf
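
The structure of that configuration file can also be pictured as nested Python objects. The sketch below mirrors the YAML keys with standard-library dataclasses; the class names and the `validate` check are assumptions for illustration and do not describe the package's actual Pydantic models.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MarkdownOptions:
    extensions: list[str] = field(
        default_factory=lambda: ["tables", "footnotes", "task_lists"]
    )
    html_passthrough: bool = True


@dataclass
class XliffOptions:
    version: str = "2.1"
    format_style: bool = True
    its_support: bool = True


@dataclass
class SegmentationOptions:
    split_sentences: bool = True
    sentence_splitter: str = "nltk"


@dataclass
class ConversionConfig:
    """Hypothetical mirror of vexy-markliff.yaml (not the package's own class)."""

    source_language: str = "en"
    target_language: str = "es"
    markdown: MarkdownOptions = field(default_factory=MarkdownOptions)
    xliff: XliffOptions = field(default_factory=XliffOptions)
    segmentation: SegmentationOptions = field(default_factory=SegmentationOptions)

    def validate(self) -> None:
        # Assumed rule: this README only describes XLIFF 2.1 output.
        if self.xliff.version != "2.1":
            raise ValueError(f"unsupported XLIFF version: {self.xliff.version}")


config = ConversionConfig()
config.validate()
as_dict = asdict(config)  # nested dict with the same keys as the YAML file
```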

Two-Document Mode

Process parallel source and target documents for alignment:

from vexy_markliff import VexyMarkliff, TwoDocumentMode

converter = VexyMarkliff()

# Load source and target content (explicit UTF-8 avoids platform-dependent decoding)
with open("source.md", "r", encoding="utf-8") as f:
    source = f.read()
with open("target.md", "r", encoding="utf-8") as f:
    target = f.read()

# Process parallel documents
result = converter.process_parallel(
    source_content=source,
    target_content=target,
    mode=TwoDocumentMode.ALIGNED
)

# Generate XLIFF with aligned segments
xliff_content = result.to_xliff()
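
To make the idea of alignment concrete, here is a deliberately naive sketch that pairs source and target blocks by position. It assumes both documents contain the same number of blank-line-separated blocks in the same order; the functions `split_blocks` and `align_blocks` are hypothetical and say nothing about how the package aligns content internally.

```python
import re


def split_blocks(markdown: str) -> list[str]:
    """Split Markdown into blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", markdown.strip())
    return [block.strip() for block in blocks if block.strip()]


def align_blocks(source: str, target: str) -> list[tuple[str, str]]:
    """Pair the n-th source block with the n-th target block."""
    source_blocks = split_blocks(source)
    target_blocks = split_blocks(target)
    if len(source_blocks) != len(target_blocks):
        raise ValueError(
            f"cannot align {len(source_blocks)} source blocks "
            f"with {len(target_blocks)} target blocks"
        )
    return list(zip(source_blocks, target_blocks))


pairs = align_blocks("# Title\n\nHello world.\n", "# Título\n\nHola mundo.\n")
```

Each pair would correspond to one source/target segment pair in the generated XLIFF. Positional pairing fails as soon as a translator merges or splits paragraphs, which is why the sketch raises an error instead of guessing.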

Custom Processing Pipeline

from vexy_markliff import Pipeline, MarkdownParser, XLIFFGenerator

# Build custom pipeline
pipeline = Pipeline()
pipeline.add_stage(MarkdownParser())
pipeline.add_stage(CustomProcessor())  # Your custom processor (define or import it first)
pipeline.add_stage(XLIFFGenerator())

# Process content (markdown_content holds the Markdown source text)
result = pipeline.process(markdown_content)

Supported Formats

Markdown Elements

  • CommonMark compliant base
  • Tables (GitHub Flavored Markdown)
  • Task lists
  • Strikethrough
  • Footnotes
  • Front matter (YAML/TOML)
  • Raw HTML passthrough

HTML Elements

  • All HTML5 structural elements
  • Text content elements (p, h1-h6, etc.)
  • Inline formatting (strong, em, a, etc.)
  • Tables with complex structures
  • Forms and inputs
  • Media elements (img, video, audio)
  • Web Components and custom elements

XLIFF Features

  • XLIFF 2.1 Core compliance
  • Format Style (fs) module for attribute preservation
  • ITS 2.0 metadata support
  • Translation unit notes
  • Preserve space handling
  • External skeleton files
  • Inline element protection
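
As an illustration of what Format Style attribute preservation can look like, the sketch below builds a minimal XLIFF 2.1 document with the standard library. The namespaces and the `fs:fs` / `fs:subFs` attributes come from the OASIS specification; the IDs, the sample text, and the function name `build_link_unit` are assumptions, and the output is not claimed to match what the package emits.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.1 keeps the 2.0 core namespace; the version attribute carries "2.1".
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
FS_NS = "urn:oasis:names:tc:xliff:fs:2.0"
ET.register_namespace("", XLIFF_NS)
ET.register_namespace("fs", FS_NS)


def q(namespace: str, name: str) -> str:
    """Return a namespace-qualified name in ElementTree's {uri}name form."""
    return f"{{{namespace}}}{name}"


def build_link_unit() -> ET.Element:
    """Build one unit for: <p>Read the <a href="...">documentation</a>.</p>"""
    xliff = ET.Element(
        q(XLIFF_NS, "xliff"), {"version": "2.1", "srcLang": "en", "trgLang": "es"}
    )
    file_el = ET.SubElement(xliff, q(XLIFF_NS, "file"), {"id": "f1"})
    # fs:fs records the HTML element the unit came from
    unit = ET.SubElement(file_el, q(XLIFF_NS, "unit"), {"id": "u1", q(FS_NS, "fs"): "p"})
    segment = ET.SubElement(unit, q(XLIFF_NS, "segment"))
    source = ET.SubElement(segment, q(XLIFF_NS, "source"))
    source.text = "Read the "
    # fs:subFs stores HTML attributes as "name,value" pairs
    link = ET.SubElement(
        source,
        q(XLIFF_NS, "pc"),
        {
            "id": "1",
            q(FS_NS, "fs"): "a",
            q(FS_NS, "subFs"): "href,https://example.com/docs",
        },
    )
    link.text = "documentation"
    link.tail = "."
    return xliff


xliff_text = ET.tostring(build_link_unit(), encoding="unicode")
```

Because the link's tag name and `href` live in attributes of the `pc` element, the translator sees only the text while the original markup can be rebuilt later.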

How It Works

  1. Parsing: Markdown is parsed using markdown-it-py, HTML using lxml
  2. HTML Conversion: Markdown is converted to HTML as an intermediate format
  3. Content Extraction: Translatable content is identified and extracted
  4. Structure Preservation: Document structure is stored in skeleton files
  5. XLIFF Generation: Content is formatted as XLIFF 2.1 with Format Style attributes
  6. Round-trip: Translated XLIFF is merged with skeleton to reconstruct the original format
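
The extraction, skeleton, and merge steps above can be sketched in a few lines. This is a toy model under stated assumptions: it uses a regular expression on simple, non-nested `<p>` and heading elements instead of lxml, and the placeholder format and the function names `extract` and `merge` are invented for illustration.

```python
import re

# Hypothetical placeholder syntax marking where a unit's text was removed.
PLACEHOLDER = "%%%{}%%%"
BLOCK_PATTERN = re.compile(r"(<(p|h[1-6])\b[^>]*>)(.*?)(</\2>)", re.DOTALL)


def extract(html: str) -> tuple[str, dict[str, str]]:
    """Return (skeleton, units): structure with placeholders, and the texts."""
    units: dict[str, str] = {}

    def stash(match):
        unit_id = f"u{len(units) + 1}"
        units[unit_id] = match.group(3)  # translatable content
        # Keep the opening and closing tags in the skeleton
        return f"{match.group(1)}{PLACEHOLDER.format(unit_id)}{match.group(4)}"

    skeleton = BLOCK_PATTERN.sub(stash, html)
    return skeleton, units


def merge(skeleton: str, translations: dict[str, str]) -> str:
    """Put (translated) unit texts back into the skeleton."""
    return re.sub(
        r"%%%(u\d+)%%%", lambda match: translations[match.group(1)], skeleton
    )


html = '<h1>Title</h1>\n<p class="lead">Hello world.</p>'
skeleton, units = extract(html)
translated = merge(skeleton, {"u1": "Título", "u2": "Hola mundo."})
```

Merging the untranslated `units` back into the skeleton reproduces the input exactly, which is the round-trip property the workflow depends on.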

Development

This project uses Hatch for development workflow management.

Setup Development Environment

# Install hatch if you haven't already
pip install hatch

# Create and activate development environment
hatch shell

# Run tests
hatch run test

# Run tests with coverage
hatch run test-cov

# Run linting
hatch run lint

# Format code
hatch run format

Testing

# Run all tests (preferred)
uvx hatch run test

# Run with coverage
uvx hatch run test-cov

# Underlying command if hatch env already active
python -m pytest

# Run specific test file
python -m pytest tests/test_markdown_parser.py

# Run verbosely, stop at the first failure, and show print output
python -m pytest -xvs

Documentation

Full documentation is available in the docs/ folder:

  • 500-intro.md - Introduction to HTML-XLIFF handling
  • 510-512-prefs-html*.md - HTML element handling specifications
  • 513-prefs-md.md - Markdown element handling specifications
  • 530-vexy-markliff-spec.md - Complete technical specification

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass
  2. Code follows PEP 8 style guidelines
  3. Type hints are provided
  4. Documentation is updated

License

MIT License

Acknowledgments

Built on the OASIS XLIFF 2.1 standard, the project leverages:

  • markdown-it-py for Markdown parsing
  • lxml for XML/HTML processing
  • Fire for the command-line interface
  • Pydantic for data validation
