
Python library for manipulating, creating and editing tmx files

Reason this release was yanked:

Renamed to hypomnema, keeping only v0.4.2 for legacy compat

Project description

python-tmx

License: MIT | Python 3.12+

The industrial-grade TMX framework for Python.

python-tmx is a strictly typed, policy-driven parser and generator for the TMX 1.4b standard. It provides robust infrastructure for building localization and NLP tools and is designed to handle messy translation memories without crashing.

🚀 Why this library?

Most TMX parsers are simple XML wrappers. python-tmx is an infrastructure library offering:

  • 🛡️ Policy-Driven Recovery: Configure exactly how to handle errors (missing segments, extra text, invalid tags). Choose between raise, ignore, log, or repair.
  • 🔌 Backend Agnostic: Runs on lxml for speed or standard xml.etree for zero-dependency environments.
  • ✨ Type Safe: Fully annotated with modern Python 3.12+ types. Returns structured dataclasses, not raw XML nodes.
  • 🏗️ Symmetrical: Deserialize XML to objects, manipulate them, and serialize back to XML with round-trip integrity.
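Round-trip integrity can be checked independently of the library. The following is a minimal sketch that uses only the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) to compare two XML documents while ignoring attribute order, quoting and indentation. `same_xml` is a hypothetical helper, and in practice the second argument would be the output of re-serializing the deserialized objects.

```python
import xml.etree.ElementTree as ET


def same_xml(original: str, roundtripped: str) -> bool:
    """Return True if both XML strings are equal after C14N 2.0 canonicalization.

    Hypothetical helper: canonicalization normalizes attribute order, quoting and
    empty-element syntax, so only meaningful differences remain.
    """
    # strip_text=True strips leading/trailing whitespace from text and tail content,
    # which pretty-printing often changes without altering the data.
    return ET.canonicalize(original, strip_text=True) == ET.canonicalize(
        roundtripped, strip_text=True
    )


a = '<tu tuid="1" srclang="en-US"><tuv xml:lang="en-US"><seg>Hello</seg></tuv></tu>'
# Same content with a different attribute order, quote style and indentation.
b = """<tu srclang='en-US' tuid='1'>
  <tuv xml:lang='en-US'><seg>Hello</seg></tuv>
</tu>"""
print(same_xml(a, b))
```

Note that `strip_text=True` also hides whitespace changes inside `<seg>`. Drop it if leading or trailing spaces in segments matter for your data.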

📦 Installation

pip install python-tmx
OR
uv add python-tmx

For maximum performance, install with lxml support and use the LxmlBackend:

pip install "python-tmx[lxml]"
OR
uv add "python-tmx[lxml]"
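Because lxml is an optional extra, code that must run in both kinds of environment can check for it before picking a backend. The following minimal sketch uses only `importlib.util.find_spec`. It chooses a backend *name* only, because the import path of `LxmlBackend` is not shown in this README.

```python
import importlib.util


def lxml_available() -> bool:
    """Return True if the optional lxml dependency is installed."""
    # find_spec returns None when the top-level package cannot be found,
    # without actually importing it.
    return importlib.util.find_spec("lxml") is not None


# Prefer the faster backend when the extra is installed; otherwise fall back
# to the zero-dependency standard-library backend.
backend_name = "LxmlBackend" if lxml_available() else "StandardBackend"
print(f"Using {backend_name}")
```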

⚡ Usage (Low-Level API)

Note: v0.4 exposes the core architecture components. Better docs and high-level convenience facades (load/dump) are coming in v0.5.

1. Deserializing (Reading)

To parse a file, you compose a Backend (the parser) with a Deserializer (the logic).

import xml.etree.ElementTree as ET
from python_tmx.xml.backends.standard import StandardBackend
from python_tmx.xml.deserialization import Deserializer
from python_tmx.base.types import Tmx

# 1. Initialize the Backend
backend = StandardBackend()

# 2. Initialize the Deserializer
deserializer = Deserializer(backend=backend)

# 3. Parse content (using standard ET for I/O in this example)
tree = ET.parse("memory.tmx")
root_element = tree.getroot()

# 4. Deserialize to Python Objects
tmx: Tmx = deserializer.deserialize(root_element)

print(f"Source Language: {tmx.header.srclang}")
for tu in tmx.body:
    print(f"TU: {tu.tuid}")

2. Handling Dirty Data (Policies)

Real-world TMX files are often broken. Configure a DeserializationPolicy to handle errors gracefully.

If no policy is specified, the default is deliberately strict: it fails fast to prevent silent data corruption.

You can also configure the logging level for each policy value independently of its behavior.

import logging

from python_tmx.xml.deserialization import Deserializer
from python_tmx.xml.policy import DeserializationPolicy, PolicyValue

# Reuses `backend` and `root_element` from the previous example.

# Configure a permissive policy
policy = DeserializationPolicy()

# If a <tuv> has no <seg>, don't crash -> ignore the error (returns empty content)
policy.missing_seg = PolicyValue("ignore", logging.WARNING)

# If a <tu> has garbage text between tags, ignore it
policy.extra_text = PolicyValue("ignore", logging.INFO)

deserializer = Deserializer(backend=backend, policy=policy)
tmx = deserializer.deserialize(root_element)
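The idea behind a per-case policy value (a behavior paired with its own log level) can be sketched without the library. This is a hypothetical analogue, not python-tmx's implementation. `Rule`, `MissingSegError` and `content_for_missing_seg` are made-up names, and only the `raise` and `ignore` behaviors are modeled.

```python
import logging
from dataclasses import dataclass
from typing import Literal

log = logging.getLogger("policy_sketch")


@dataclass(frozen=True)
class Rule:  # hypothetical analogue of PolicyValue: what to do, and how loudly to log it
    behavior: Literal["raise", "ignore"]
    level: int = logging.WARNING


class MissingSegError(ValueError):
    """Raised when a <tuv> has no <seg> under a strict rule."""


def content_for_missing_seg(rule: Rule, lang: str) -> list[str]:
    """Decide what a <tuv> without <seg> becomes under the given rule."""
    message = f"<tuv xml:lang={lang!r}> has no <seg>"
    # The log level is independent of the behavior.
    log.log(rule.level, message)
    if rule.behavior == "raise":
        raise MissingSegError(message)
    return []  # "ignore": continue with empty content


strict = Rule("raise", logging.ERROR)
lenient = Rule("ignore", logging.INFO)
print(content_for_missing_seg(lenient, "fr-FR"))
try:
    content_for_missing_seg(strict, "fr-FR")
except MissingSegError as exc:
    print("strict policy:", exc)
```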

3. Serializing (Writing)

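import xml.etree.ElementTree as ET
from datetime import datetime, timezone

from python_tmx.base.types import Tmx, Header, Tu, Tuv, Segtype
from python_tmx.xml.backends.standard import StandardBackend
from python_tmx.xml.serialization import Serializer

backend = StandardBackend()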

# 1. Build the object tree
tmx_obj = Tmx(
    version="1.4",
    header=Header(
        creationtool="MyScript",
        creationtoolversion="1.0",
        segtype=Segtype.SENTENCE,
        o_tmf="JSON",
        adminlang="en-US",
        srclang="en-US",
        datatype="plaintext",
        creationdate=datetime.now(timezone.utc)
    ),
    body=[
        Tu(
            tuid="1",
            srclang="en-US",
            variants=[
                Tuv(lang="en-US", content=["Hello World"]),
                Tuv(lang="fr-FR", content=["Bonjour le monde"])
            ]
        )
    ]
)

# 2. Serialize to XML Element
serializer = Serializer(backend=backend)
xml_root = serializer.serialize(tmx_obj)

# 3. Write to file with ElementTree (this assumes StandardBackend, which works with xml.etree elements)
ET.ElementTree(xml_root).write("output.tmx", encoding="utf-8", xml_declaration=True)
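For comparison, you can write the same document with nothing but ElementTree. This shows the XML that the object tree above corresponds to. It is a hand-rolled sketch, not the Serializer's actual output, and attribute order and formatting from python-tmx may differ. The `o-tmf` attribute name and the `YYYYMMDDThhmmssZ` UTC date format come from the TMX 1.4b specification.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# ElementTree writes this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

tmx = ET.Element("tmx", version="1.4")
header_attrs = {
    "creationtool": "MyScript",
    "creationtoolversion": "1.0",
    "segtype": "sentence",
    "o-tmf": "JSON",  # hyphenated in XML, hence a dict rather than keyword arguments
    "adminlang": "en-US",
    "srclang": "en-US",
    "datatype": "plaintext",
    # TMX dates use the basic ISO 8601 form in UTC, e.g. 20240101T120000Z.
    "creationdate": datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
}
ET.SubElement(tmx, "header", header_attrs)

body = ET.SubElement(tmx, "body")
tu = ET.SubElement(body, "tu", tuid="1", srclang="en-US")
for lang, text in [("en-US", "Hello World"), ("fr-FR", "Bonjour le monde")]:
    tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
    ET.SubElement(tuv, "seg").text = text

ET.indent(tmx)  # Python 3.9+: pretty-print in place
xml_text = ET.tostring(tmx, encoding="unicode")
print(xml_text)
```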

🧩 Architecture

The library is built on three decoupled layers:

  1. Backend Layer: Abstracts the XML parser. Choose LxmlBackend (faster, richer feature set) or StandardBackend (portable, no third-party dependencies).
  2. Orchestration Layer: Serializer and Deserializer classes that manage recursion and dispatch.
  3. Handler Layer: Specialized classes (TuvDeserializer, NoteSerializer) that implement the business logic and policy checks for specific TMX elements.
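The following toy version illustrates how the three layers divide the work. Every name here is illustrative rather than one of the library's classes: `Backend`, `EtreeBackend`, `Orchestrator` and the handler functions are hypothetical. The backend hides the XML library, the orchestrator dispatches by tag and drives recursion, and each handler knows only one element.

```python
import xml.etree.ElementTree as ET
from collections.abc import Callable
from typing import Protocol


class Backend(Protocol):
    """Backend layer: the only code that touches the XML library's API."""

    def tag(self, el: ET.Element) -> str: ...

    def children(self, el: ET.Element) -> list[ET.Element]: ...


class EtreeBackend:
    def tag(self, el: ET.Element) -> str:
        return el.tag

    def children(self, el: ET.Element) -> list[ET.Element]:
        return list(el)


class Orchestrator:
    """Orchestration layer: dispatches each element to the handler for its tag."""

    def __init__(self, backend: Backend, handlers: dict[str, "Handler"]) -> None:
        self.backend = backend
        self.handlers = handlers

    def deserialize(self, el: ET.Element) -> object:
        tag = self.backend.tag(el)
        if tag not in self.handlers:
            raise KeyError(f"no handler registered for <{tag}>")
        return self.handlers[tag](self, el)


# Handler layer: each function handles one element type. Nested elements go back
# through the orchestrator, so recursion is managed in one place.
Handler = Callable[[Orchestrator, ET.Element], object]


def note_handler(orch: Orchestrator, el: ET.Element) -> object:
    return ("note", el.text or "")


def prop_handler(orch: Orchestrator, el: ET.Element) -> object:
    return ("prop", el.get("type"), el.text or "")


def header_handler(orch: Orchestrator, el: ET.Element) -> object:
    return [orch.deserialize(child) for child in orch.backend.children(el)]


orch = Orchestrator(
    EtreeBackend(),
    {"header": header_handler, "note": note_handler, "prop": prop_handler},
)
header = ET.fromstring('<header><note>Imported</note><prop type="x-domain">legal</prop></header>')
print(orch.deserialize(header))
```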

🛠️ Advanced Usage

Working with Mixed Content (Tags)

TMX segments often contain inline markup like placeholders (<ph>) or formatting (<bpt>). python-tmx parses these into a mixed list of strings and objects.

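from python_tmx.base.types import Ph

# Content is a list of strings and inline objects.
# `variant` is any parsed Tuv, e.g. the first variant of the first TU:
variant = tmx.body[0].variants[0]

# XML: <seg>Hello <ph x="1">Name</ph></seg>
print(variant.content)
# Output: ["Hello ", Ph(x=1, content=["Name"])]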

🤝 Contributing

We welcome contributions!

Before you submit a pull request, please ensure:

  • Your code is fully typed, with no type errors reported by Pylance in standard mode
  • All tests pass
  • Your code is formatted with ruff using the project's configuration
  • Code coverage is 100%

📄 License

MIT

Download files

Source Distribution

python_tmx-0.4.1.tar.gz (23.3 kB)

Uploaded Source

Built Distribution

python_tmx-0.4.1-py3-none-any.whl (26.4 kB)

Uploaded Python 3

File details

Details for the file python_tmx-0.4.1.tar.gz.

File metadata

  • Download URL: python_tmx-0.4.1.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.15 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for python_tmx-0.4.1.tar.gz
Algorithm Hash digest
SHA256 4aaa949a4ad0357e550abe98602090125a85c136ab3429bbd9e9b6406d59f4e8
MD5 dd8577d18869e6acda716055c184d11a
BLAKE2b-256 033af90572764bc4ebff4227c00a3693d677efb9cc7102860bb6ee17eff5eb6e

File details

Details for the file python_tmx-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: python_tmx-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.15 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for python_tmx-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0a06d2a09ce81d70f834c2355977e3752c20929052a9f1473952559e34410fc7
MD5 08988175d7cb9d8b3586b460c2147bd4
BLAKE2b-256 6e6b874b9e261fb1be626bb3c2ed79f638d110391f59c800979909612ffbfb85
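You can check a downloaded file against the SHA256 digests listed above using only the standard library. The following is a minimal sketch. The path in the usage comment assumes the wheel was downloaded to the current directory. pip can also enforce digests itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare case-insensitively, since published digests may be upper- or lowercase."""
    return sha256_of(path) == expected_hex.lower()


# SHA256 digest listed above for the 0.4.1 wheel.
WHEEL_SHA256 = "0a06d2a09ce81d70f834c2355977e3752c20929052a9f1473952559e34410fc7"
# Usage: verify(Path("python_tmx-0.4.1-py3-none-any.whl"), WHEEL_SHA256)
```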
