Vexy Markliff

A Python package and CLI tool for bidirectional conversion between Markdown/HTML and XLIFF 2.1 format, enabling high-fidelity localization workflows.

Features

  • Bidirectional Conversion: Seamless Markdown ↔ XLIFF and HTML ↔ XLIFF conversion
  • XLIFF 2.1 Compliant: Full compliance with the OASIS XLIFF 2.1 standard
  • Format Style Module: Preserves HTML attributes and structure using fs:fs and fs:subFs
  • ITS 2.0 Support: Native integration with W3C Internationalization Tag Set
  • Flexible Modes: One-document and two-document translation workflows
  • Round-trip Fidelity: Lossless Markdown → XLIFF → Markdown conversion
  • Intelligent Segmentation: Smart sentence splitting for translation units
  • Skeleton Management: External skeleton files for document structure preservation
  • Rich CLI: Comprehensive command-line interface built with Fire
  • Modern Python: Type hints, Pydantic models, and async support
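
As a rough illustration of what sentence segmentation into translation units involves, the sketch below splits text with a naive regular expression. The name split_sentences is hypothetical and is not part of the vexy-markliff API; a dedicated splitter such as nltk handles abbreviations and other edge cases far better.

```python
import re

# Split on whitespace that follows sentence-ending punctuation and
# precedes an uppercase letter, quote, or opening parenthesis.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'(])")


def split_sentences(text: str) -> list[str]:
    """Return the sentences of `text` as a list of stripped strings (naive)."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(sentence)
```

Each returned sentence would then map to one segment inside a translation unit.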

Installation

uv pip install --system vexy-markliff

or

uv add vexy-markliff

Quick Start

CLI Usage

# Convert Markdown to XLIFF
vexy-markliff md2xliff document.md document.xlf

# Convert HTML to XLIFF
vexy-markliff html2xliff page.html page.xlf

# Convert XLIFF back to Markdown
vexy-markliff xliff2md translated.xlf result.md

# Two-document mode (parallel source and target)
vexy-markliff md2xliff --mode=two-doc source.md target.md aligned.xlf
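
To convert many files in one go, the md2xliff command above can be driven from a small script. The sketch below only builds the argument lists; the helper name build_md2xliff_commands and the output directory layout are assumptions made for illustration.

```python
from pathlib import Path


def build_md2xliff_commands(markdown_files, out_dir):
    """Build one `vexy-markliff md2xliff` argument list per Markdown file."""
    commands = []
    for md_file in markdown_files:
        md_path = Path(md_file)
        # Hypothetical layout: write <name>.xlf into out_dir
        xlf_path = Path(out_dir) / (md_path.stem + ".xlf")
        commands.append(["vexy-markliff", "md2xliff", str(md_path), str(xlf_path)])
    return commands


if __name__ == "__main__":
    for cmd in build_md2xliff_commands(["intro.md", "guide.md"], "xliff"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the tool is installed.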

Python API

from vexy_markliff import Config, process_data

summary = process_data(
    ["alpha", "beta", "alpha"],
    config=Config(name="demo", value="count", options={"mode": "summary"}),
)

print(summary["unique"])  # -> 2

Advanced Usage

Configuration

Create a vexy-markliff.yaml configuration file:

source_language: en
target_language: es

markdown:
  extensions:
    - tables
    - footnotes
    - task_lists
  html_passthrough: true

xliff:
  version: "2.1"
  format_style: true
  its_support: true

segmentation:
  split_sentences: true
  sentence_splitter: nltk

Use the configuration:

vexy-markliff md2xliff --config=vexy-markliff.yaml input.md output.xlf
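
Note that version is quoted in the YAML above: an unquoted 2.1 is read by YAML parsers as a floating-point number rather than a string. The sketch below shows a pre-flight check on an already parsed configuration dictionary. The function check_config and its rules are hypothetical and do not describe the validation vexy-markliff itself performs.

```python
def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in a parsed config."""
    problems = []

    # Both language codes must be present and non-empty strings.
    for key in ("source_language", "target_language"):
        value = cfg.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")

    # An unquoted 2.1 in YAML arrives here as a float, not a string.
    version = cfg.get("xliff", {}).get("version")
    if version is not None and not isinstance(version, str):
        problems.append('xliff.version should be quoted, e.g. "2.1"')

    return problems


if __name__ == "__main__":
    good = {"source_language": "en", "target_language": "es",
            "xliff": {"version": "2.1"}}
    print(check_config(good))
```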

Two-Document Mode

Process parallel source and target documents for alignment:

from vexy_markliff import VexyMarkliff, TwoDocumentMode

converter = VexyMarkliff()

# Load source and target content
with open("source.md", "r", encoding="utf-8") as f:
    source = f.read()
with open("target.md", "r", encoding="utf-8") as f:
    target = f.read()

# Process parallel documents
result = converter.process_parallel(
    source_content=source,
    target_content=target,
    mode=TwoDocumentMode.ALIGNED
)

# Generate XLIFF with aligned segments
xliff_content = result.to_xliff()
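
To make the idea of aligned segments concrete, the following standalone sketch pairs source and target blocks by position and writes them as XLIFF 2.1 units using only the standard library. It is not how process_parallel is implemented; the function names, the blank-line block splitting, and the ID scheme are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.1 documents use the 2.0 core namespace with version="2.1".
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def split_blocks(text: str) -> list[str]:
    """Split a document into blocks separated by blank lines."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def align_to_xliff(source_md: str, target_md: str,
                   src_lang: str = "en", trg_lang: str = "es") -> str:
    """Pair blocks by position and serialize them as XLIFF units."""
    src_blocks = split_blocks(source_md)
    trg_blocks = split_blocks(target_md)
    if len(src_blocks) != len(trg_blocks):
        raise ValueError("source and target have different block counts")

    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(f"{{{XLIFF_NS}}}xliff",
                      {"version": "2.1", "srcLang": src_lang, "trgLang": trg_lang})
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {"id": "f1"})
    for index, (src, trg) in enumerate(zip(src_blocks, trg_blocks), start=1):
        unit = ET.SubElement(file_el, f"{{{XLIFF_NS}}}unit", {"id": f"u{index}"})
        segment = ET.SubElement(unit, f"{{{XLIFF_NS}}}segment")
        ET.SubElement(segment, f"{{{XLIFF_NS}}}source").text = src
        ET.SubElement(segment, f"{{{XLIFF_NS}}}target").text = trg
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(align_to_xliff("# Hi\n\nWorld", "# Hola\n\nMundo"))
```

Positional pairing only works when both documents have the same block structure, which is why the sketch raises an error on a count mismatch instead of guessing.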

Custom Processing Pipeline

from vexy_markliff import Pipeline, MarkdownParser, XLIFFGenerator

# Build custom pipeline
pipeline = Pipeline()
pipeline.add_stage(MarkdownParser())
pipeline.add_stage(CustomProcessor())  # Placeholder: a stage class you define yourself
pipeline.add_stage(XLIFFGenerator())

# Process content (markdown_content is the Markdown source as a string)
result = pipeline.process(markdown_content)
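
The interface a custom stage must implement is not documented here. As a generic illustration of the stage pattern, the standalone sketch below assumes each stage exposes a process(data) method; the names Stage, SimplePipeline, and NormalizeWhitespace are hypothetical and are not taken from vexy-markliff.

```python
from typing import Any, Protocol


class Stage(Protocol):
    """Assumed stage interface: transform the data and return the result."""

    def process(self, data: Any) -> Any: ...


class NormalizeWhitespace:
    """Example stage: strip trailing whitespace from every line."""

    def process(self, data: str) -> str:
        return "\n".join(line.rstrip() for line in data.splitlines())


class SimplePipeline:
    """Run stages in the order they were added, feeding each the previous output."""

    def __init__(self) -> None:
        self.stages: list[Stage] = []

    def add_stage(self, stage: Stage) -> "SimplePipeline":
        self.stages.append(stage)
        return self

    def process(self, data: Any) -> Any:
        for stage in self.stages:
            data = stage.process(data)
        return data


if __name__ == "__main__":
    pipeline = SimplePipeline().add_stage(NormalizeWhitespace())
    print(repr(pipeline.process("# Title  \nbody \t")))
```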

Supported Formats

Markdown Elements

  • CommonMark compliant base
  • Tables (GitHub Flavored Markdown)
  • Task lists
  • Strikethrough
  • Footnotes
  • Front matter (YAML/TOML)
  • Raw HTML passthrough

HTML Elements

  • All HTML5 structural elements
  • Text content elements (p, h1-h6, etc.)
  • Inline formatting (strong, em, a, etc.)
  • Tables with complex structures
  • Forms and inputs
  • Media elements (img, video, audio)
  • Web Components and custom elements

XLIFF Features

  • XLIFF 2.1 Core compliance
  • Format Style (fs) module for attribute preservation
  • ITS 2.0 metadata support
  • Translation unit notes
  • Preserve space handling
  • External skeleton files
  • Inline element protection
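
In the XLIFF Format Style module, fs:fs names the HTML element and fs:subFs carries its attributes as name/value pairs: a comma separates name from value, a backslash separates pairs, and literal commas and backslashes are escaped with a backslash. The sketch below encodes and decodes such a value. The helper names are hypothetical, and the code reflects one reading of the module rather than vexy-markliff's own implementation.

```python
def _escape(text: str) -> str:
    """Escape literal backslashes and commas with a backslash prefix."""
    return text.replace("\\", "\\\\").replace(",", "\\,")


def encode_subfs(attrs: dict[str, str]) -> str:
    """Encode HTML attributes as an fs:subFs value."""
    return "\\".join(f"{_escape(name)},{_escape(value)}"
                     for name, value in attrs.items())


def decode_subfs(value: str) -> dict[str, str]:
    """Decode an fs:subFs value back into a dict of HTML attributes."""
    attrs: dict[str, str] = {}
    parts = [""]  # [name] or [name, value] for the pair being read
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in "\\,":
            parts[-1] += value[i + 1]  # escaped literal character
            i += 2
            continue
        if ch == "\\":  # unescaped backslash ends the current pair
            attrs[parts[0]] = parts[1] if len(parts) > 1 else ""
            parts = [""]
        elif ch == "," and len(parts) == 1:  # first unescaped comma splits name/value
            parts.append("")
        else:
            parts[-1] += ch
        i += 1
    if parts != [""]:
        attrs[parts[0]] = parts[1] if len(parts) > 1 else ""
    return attrs


if __name__ == "__main__":
    encoded = encode_subfs({"src": "smileface.png", "alt": "My Happy Smile"})
    print(encoded)
    print(decode_subfs(encoded))
```

An img element with these attributes would then be represented with fs:fs="img" and the encoded string as fs:subFs.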

How It Works

  1. Parsing: Markdown is parsed using markdown-it-py, HTML using lxml
  2. HTML Conversion: Markdown is converted to HTML as intermediate format
  3. Content Extraction: Translatable content is identified and extracted
  4. Structure Preservation: Document structure is stored in skeleton files
  5. XLIFF Generation: Content is formatted as XLIFF 2.1 with Format Style attributes
  6. Round-trip: Translated XLIFF is merged with the skeleton to reconstruct the original format
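
The extraction and round-trip steps above can be sketched with the standard library alone. This toy version uses a regular expression instead of markdown-it-py and lxml, handles only a few unnested block elements without attributes, and uses a made-up [[uN]] placeholder syntax; none of these choices are taken from vexy-markliff.

```python
import re

# Match simple, attribute-free block elements and capture their content.
_BLOCK = re.compile(r"<(p|h[1-6]|li)>(.*?)</\1>", re.S)
_PLACEHOLDER = re.compile(r"\[\[(u\d+)\]\]")


def extract(html: str) -> tuple[str, dict[str, str]]:
    """Return (skeleton, units): text is replaced by [[uN]] placeholders."""
    units: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        unit_id = f"u{len(units) + 1}"
        units[unit_id] = match.group(2)
        tag = match.group(1)
        return f"<{tag}>[[{unit_id}]]</{tag}>"

    skeleton = _BLOCK.sub(replace, html)
    return skeleton, units


def merge(skeleton: str, translations: dict[str, str]) -> str:
    """Fill the skeleton's placeholders with translated text."""
    return _PLACEHOLDER.sub(lambda m: translations[m.group(1)], skeleton)


if __name__ == "__main__":
    skeleton, units = extract("<h1>Hello</h1><p>World</p>")
    print(skeleton)
    print(units)
    print(merge(skeleton, {"u1": "Hola", "u2": "Mundo"}))
```

Merging the untranslated units back into the skeleton reproduces the input exactly, which is the property a round-trip test would check.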

Development

This project uses Hatch for development workflow management.

Setup Development Environment

# Install hatch if you haven't already
pip install hatch

# Create and activate development environment
hatch shell

# Run tests
hatch run test

# Run tests with coverage
hatch run test-cov

# Run linting
hatch run lint

# Format code
hatch run format

Testing

# Run all tests (preferred)
uvx hatch run test

# Run with coverage
uvx hatch run test-cov

# Underlying command if hatch env already active
python -m pytest

# Run specific test file
python -m pytest tests/test_markdown_parser.py

# Run verbosely, stop at the first failure, and show print output
python -m pytest -xvs

Documentation

Full documentation is available in the docs/ folder:

  • 500-intro.md - Introduction to HTML-XLIFF handling
  • 510-512-prefs-html*.md - HTML element handling specifications
  • 513-prefs-md.md - Markdown element handling specifications
  • 530-vexy-markliff-spec.md - Complete technical specification

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass
  2. Code follows PEP 8 style guidelines
  3. Type hints are provided
  4. Documentation is updated

License

MIT License

Acknowledgments

Vexy Markliff is built on the OASIS XLIFF 2.1 standard and leverages:

  • markdown-it-py for Markdown parsing
  • lxml for XML/HTML processing
  • Fire for CLI interface
  • Pydantic for data validation
