Python library for creating, editing, and manipulating TMX files

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.14

Project description

Hypomnema

Industrial-grade TMX 1.4b parsing and serialization for Python.

Hypomnema is a strictly typed infrastructure library for working with TMX 1.4b (Translation Memory eXchange) files. It is designed as a foundation for building localization tools, CAT software, and NLP pipelines.

Warning: Hypomnema is pre-1.0 software. Expect breaking changes without notice until version 1.0.0.

TMX 1.4b Level 2 Compliance

Hypomnema is the only Python library that fully implements the TMX 1.4b Level 2 specification. Key capabilities include:

Arbitrary Nesting Depth: No fixed limit on inline element nesting. <bpt>/<ept> pairs, <ph> placeholders, and <sub> elements can be nested to any depth, matching the full expressiveness of the TMX 1.4b spec.
Complete Inline Element Support: All six inline markup elements (<bpt>, <ept>, <it>, <ph>, <hi>, <sub>) with proper handling of mixed content (text and elements intermixed).
Full Attribute Modeling: Every TMX attribute is modeled with proper types, including enumerations for segtype, pos, and assoc.
Metadata Preservation: Properties (<prop>) and notes (<note>) are fully supported at all valid nesting levels.
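
To make "mixed content" concrete, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It is not Hypomnema's implementation; the helper name mixed_content and the tuple representation are assumptions made for illustration. It shows how text and inline elements inside a <seg> can be collected in document order, recursing to any depth.

```python
import xml.etree.ElementTree as ET


def mixed_content(element: ET.Element) -> list:
    """Return the element's text and children in document order.

    Text is kept as plain str; each child element becomes a
    (tag, attributes, nested_content) tuple, recursing to any depth.
    """
    items = []
    if element.text:
        items.append(element.text)
    for child in element:
        items.append((child.tag, dict(child.attrib), mixed_content(child)))
        if child.tail:  # text that follows the child inside the parent
            items.append(child.tail)
    return items


seg = ET.fromstring(
    '<seg>Click <bpt i="1" type="link">&lt;a href="x"&gt;</bpt>here'
    '<ept i="1">&lt;/a&gt;</ept> or press <ph x="1">{key}</ph>.</seg>'
)
content = mixed_content(seg)
for item in content:
    print(item)
```

ElementTree stores the text after a child in that child's tail attribute, which is why the helper appends child.tail right after each child.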

Intentionally Omitted Elements

The following TMX 1.4b elements are not implemented:

<ude> — User Defined Encoding (custom encoding handling)
<map> — Character mapping

These elements relate to custom encodings and character mapping. They are rarely encountered in practice and were excluded to keep the library focused and maintainable. If you need support for them, you can implement custom handlers by subclassing the existing handler classes — the architecture is designed to be extensible. See the source code in xml/deserialization/_handlers.py and xml/serialization/_handlers.py for patterns to follow.

What is TMX?

TMX (Translation Memory eXchange) is an open XML standard for exchanging translation memory data between tools and providers. A TMX file contains translation units (TU) with source and target language variants (TUV), each containing segmented text. TMX files often include inline markup for formatting, placeholders, and tags that must be preserved during processing.
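
As an illustration of that structure, the sketch below parses a tiny hand-written TMX document with the standard library alone (no Hypomnema involved). The sample document, the read_units helper, and all attribute values in it are made up for this example.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace; ElementTree expands it to this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="greeting-001">
      <tuv xml:lang="en"><seg>Hello, world!</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde !</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each tuid to a {language: segment text} dictionary."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    units = {}
    for tu in root.iter("tu"):
        units[tu.get("tuid")] = {
            tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
            for tuv in tu.iter("tuv")
        }
    return units


units = read_units(SAMPLE_TMX)
print(units)
```

Using itertext() flattens any inline markup inside <seg>, which is acceptable for a quick look but loses exactly the information a Level 2 parser is meant to preserve.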

Why Hypomnema?

Most TMX parsers are simple XML wrappers. Hypomnema offers:

Policy-Driven Error Handling: Configure exactly how to handle malformed data (missing segments, extra text, invalid tags, etc.)
Backend Agnostic: Use lxml for speed or standard library xml.etree for zero-dependency deployments
Full Type Safety: Modern Python 3.14+ type annotations with structured dataclasses, not raw XML nodes
Roundtrip Integrity: Deserialize to objects, manipulate, serialize back — with optional validation at each step
Streaming API: Process large TMX files element-by-element without loading everything into memory

Installation

pip install hypomnema
# or
uv add hypomnema

For maximum performance with large files:

pip install "hypomnema[lxml]"
# or
uv add "hypomnema[lxml]"

Quick Start

import hypomnema as hm

# Load a TMX file
tmx = hm.load("translations.tmx")

# Inspect the content
print(f"Source language: {tmx.header.srclang}")
print(f"Translation units: {len(tmx.body)}")

# Print the French variant of every translation unit
for tu in tmx.body:
    for tuv in tu.variants:
        if tuv.lang == "fr":
            print(f"French: {tuv.content}")

High-Level API

The load() and save() functions provide the simplest interface for common tasks:

import hypomnema as hm

# Load entire file
tmx = hm.load("input.tmx")

# Filter loading - only get translation units (streaming, memory efficient)
for tu in hm.load("large.tmx", filter="tu"):
    print(tu.tuid)

# Load specific element types
for element in hm.load("file.tmx", filter=["tu", "header"]):
    if isinstance(element, hm.Header):
        print(element.creationtool)

# Save back to disk
hm.save(tmx, "output.tmx")

# Specify encoding
tmx = hm.load("file.tmx", encoding="utf-16")
hm.save(tmx, "output.tmx", encoding="utf-16")

Low-Level API

For finer control over parsing and serialization, use the Deserializer and Serializer classes directly:

import hypomnema as hm

# Choose your backend
backend = hm.LxmlBackend()   # Fast, feature-rich (requires lxml)
# or, for a stdlib-only setup:
# backend = hm.StandardBackend()

# Deserialize
deserializer = hm.Deserializer(backend=backend)
xml_tree = backend.parse("file.tmx")
tmx = deserializer.deserialize(xml_tree)

# Manipulate the object model
new_tuv = hm.create_tuv("de", content=["Guten Tag"])
new_tu = hm.create_tu(variants=[new_tuv])
tmx.body.append(new_tu)

# Serialize back
serializer = hm.Serializer(backend=backend)
xml_element = serializer.serialize(tmx)

# Write to file
backend.write(xml_element, "output.tmx")

Policy Configuration

Real-world TMX files are often imperfect. Policies let you configure how Hypomnema handles validation errors:

import hypomnema as hm
from hypomnema.xml.policy import PolicyValue
import logging

# Configure deserialization policy
policy = hm.DeserializationPolicy(
    missing_seg=PolicyValue("ignore", logging.WARNING),
    extra_text=PolicyValue("ignore", logging.INFO),
    invalid_attribute_value=PolicyValue("ignore", logging.DEBUG),
)

# Use custom policy when loading
tmx = hm.load("messy.tmx", policy=policy)

# Configure serialization policy
serial_policy = hm.SerializationPolicy(
    required_attribute_missing=PolicyValue("ignore", logging.ERROR),
)

hm.save(tmx, "clean.tmx", policy=serial_policy)

Available deserialization policies:

missing_handler: No handler for element type
invalid_tag: Unexpected XML tag encountered
required_attribute_missing: Mandatory TMX attribute absent
invalid_attribute_value: Attribute violates TMX spec
extra_text: Unexpected text within elements
invalid_child_element: Child not permitted by TMX structure
multiple_headers: Multiple <header> elements
missing_header: Mandatory <header> missing
missing_seg: <tu>/<tuv> missing required <seg>
multiple_seg: <tuv> has multiple <seg> elements
empty_content: Element has no text content

Available serialization policies:

required_attribute_missing: Mandatory dataclass field is None
invalid_attribute_type: Field type incompatible with XML
invalid_content_type: Content is not a string
missing_handler: No serializer for dataclass type
invalid_object_type: Handler received unexpected type
invalid_child_element: Child invalid for parent element
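
The general pattern behind such policies (pair an action with a log level, then consult it whenever a problem is found) can be sketched as follows. This is a standalone illustration, not Hypomnema's code: PolicyRule and apply_rule are hypothetical names, and the "raise" action is an assumption, since the examples above only show "ignore".

```python
import logging
from dataclasses import dataclass
from typing import Literal

logger = logging.getLogger("policy_sketch")


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical stand-in for a policy value: what to do, and how loudly."""

    action: Literal["raise", "ignore"]  # "raise" is assumed for illustration
    log_level: int


def apply_rule(rule: PolicyRule, message: str) -> bool:
    """Log the problem, then either raise or signal that processing continues."""
    logger.log(rule.log_level, message)
    if rule.action == "raise":
        raise ValueError(message)
    return True


strict = PolicyRule("raise", logging.ERROR)
lenient = PolicyRule("ignore", logging.WARNING)

# The lenient rule logs and carries on.
continued = apply_rule(lenient, "<tuv> is missing its <seg>")

# The strict rule logs and aborts.
try:
    apply_rule(strict, "<tuv> is missing its <seg>")
    raised = False
except ValueError:
    raised = True
```

Keeping the action and the log level separate lets one file be processed quietly in a batch job and strictly in a validation step, using the same parsing code.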

Creating TMX from Scratch

import hypomnema as hm

# Create a header with metadata
header = hm.create_header(
    srclang="en",
    creationtool="my-tool",
    segtype=hm.Segtype.SENTENCE,
)

# Create a segment with complex nested inline markup demonstrating arbitrary depth
# This example shows the full expressiveness of TMX 1.4b Level 2
segment_content = [
    "Click the ",
    hm.create_bpt(
        i=1,
        type="link",
        x=100,
        content=[
            hm.create_sub(
                content=["here", hm.create_hi(content=["important"])],
                datatype="text",
            )
        ],
    ),
    " button to proceed. ",
    "For special cases, use ",
    hm.create_ph(
        assoc=hm.Assoc.B,
        type="variable",
        x=200,
        content=[
            hm.create_sub(
                content=[
                    "the ",
                    hm.create_bpt(i=2, type="emphasis", content=[hm.create_sub(content=["default"])]),
                    " value",
                ],
                datatype="text",
            )
        ],
    ),
    ". ",
    "End of ",
    hm.create_it(pos=hm.Pos.BEGIN, type="closing", x=300),
    "document",
    hm.create_it(pos=hm.Pos.END, type="closing", x=300),
    ".",
]

source_tuv = hm.create_tuv("en", content=segment_content)

# Create target with equivalent nested structure
target_content = [
    "Cliquez sur le ",
    hm.create_bpt(
        i=1,
        type="link",
        x=100,
        content=[
            hm.create_sub(
                content=["ici", hm.create_hi(content=["important"])],
                datatype="text",
            )
        ],
    ),
    " pour continuer. ",
    "Pour les cas spéciaux, utilisez ",
    hm.create_ph(
        assoc=hm.Assoc.B,
        type="variable",
        x=200,
        content=[
            hm.create_sub(
                content=[
                    "la valeur ",
                    hm.create_bpt(i=2, type="emphasis", content=[hm.create_sub(content=["par défaut"])]),
                ],
                datatype="text",
            )
        ],
    ),
    ". ",
    "Fin du ",
    hm.create_it(pos=hm.Pos.BEGIN, type="closing", x=300),
    "document",
    hm.create_it(pos=hm.Pos.END, type="closing", x=300),
    ".",
]

target_tuv = hm.create_tuv("fr", content=target_content)

# Create a translation unit with metadata
tu = hm.create_tu(
    tuid="complex-nesting-001",
    srclang="en",
    variants=[source_tuv, target_tuv],
    props=[
        hm.create_prop("customer", "acme-corp"),
        hm.create_prop("domain", "technical"),
    ],
    notes=[hm.create_note("Demonstrates full TMX 1.4b Level 2 nesting support")],
)

# Assemble the TMX
tmx = hm.create_tmx(header=header, body=[tu])

# Save
hm.save(tmx, "complex.tmx")

# Verify the nesting structure
print(f"Source TUV has {len(source_tuv.content)} content elements")
print(f"Target TUV has {len(target_tuv.content)} content elements")

# Inspect the nested structure programmatically
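def inspect_content(content, indent=0):
    """Print a tree view of nested inline content."""
    prefix = "  " * indent
    for item in content:
        if isinstance(item, str):
            # Append an ellipsis only when the text was actually truncated
            suffix = "..." if len(item) > 50 else ""
            print(f"{prefix}Text: {item[:50]!r}{suffix}")
        else:
            print(f"{prefix}{type(item).__name__}")
            # Recurse into elements that carry nested content
            if getattr(item, "content", None):
                inspect_content(item.content, indent + 1)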

print("\nSource content structure:")
inspect_content(source_tuv.content)

Architecture

Hypomnema is built on three decoupled layers:

Backend Layer (hypomnema.xml.backends)
- Abstracts the XML parser implementation
- LxmlBackend: Fast, feature-rich (requires lxml)
- StandardBackend: Portable, stdlib only
- Implement XmlBackend to add custom backends
Orchestration Layer (hypomnema.xml)
- Serializer: Converts Python objects to XML
- Deserializer: Converts XML to Python objects
- Manages recursion and dispatches to handlers
Handler Layer
- Specialized classes for each TMX element type
- Implement business logic and policy checks
- Examples: NoteSerializer, PropDeserializer
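
A rough sketch of how three such layers can fit together is shown below. It is only an illustration of the layering idea built on the standard library; every class and function name in it (Backend, StdlibBackend, Dispatcher, note_handler, prop_handler) is hypothetical and does not reflect Hypomnema's actual interfaces.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Protocol


class Backend(Protocol):
    """Hypothetical backend interface: hides which XML parser is in use."""

    def parse_string(self, text: str) -> ET.Element: ...


class StdlibBackend:
    """Backend layer: a stdlib-only implementation of the interface."""

    def parse_string(self, text: str) -> ET.Element:
        return ET.fromstring(text)


@dataclass
class Note:
    text: str


@dataclass
class Prop:
    type: str
    text: str


# Handler layer: one small function per element type.
def note_handler(elem: ET.Element) -> Note:
    return Note(text=elem.text or "")


def prop_handler(elem: ET.Element) -> Prop:
    return Prop(type=elem.get("type", ""), text=elem.text or "")


class Dispatcher:
    """Orchestration layer: looks up the handler registered for each tag."""

    def __init__(self, handlers: dict[str, Callable[[ET.Element], object]]):
        self.handlers = handlers

    def deserialize(self, elem: ET.Element) -> object:
        try:
            handler = self.handlers[elem.tag]
        except KeyError:
            raise LookupError(f"no handler for <{elem.tag}>") from None
        return handler(elem)


backend: Backend = StdlibBackend()
dispatcher = Dispatcher({"note": note_handler, "prop": prop_handler})
note = dispatcher.deserialize(backend.parse_string("<note>Reviewed</note>"))
prop = dispatcher.deserialize(backend.parse_string('<prop type="domain">legal</prop>'))
print(note, prop)
```

Because the dispatcher only knows tag names and callables, supporting a new element type means registering one more handler, without touching the backend or the dispatch logic.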

Supported Elements

Hypomnema implements the TMX 1.4b object model, with the exception of the intentionally omitted <ude> and <map> elements:

Structural elements: Tmx, Header, Tu (Translation Unit), Tuv (Translation Unit Variant)

Inline elements: Bpt (Begin Paired Tag), Ept (End Paired Tag), It (Isolated Tag), Ph (Placeholder), Hi (Highlight), Sub (Sub-flow)

Auxiliary elements: Prop (Property), Note (Annotation)

Enumerations: Segtype (segmentation level), Pos (tag position), Assoc (placeholder association)

Terminology Reference

See TERMINOLOGY.md for a quick reference of TMX 1.4b terminology used throughout the library.

Contributing

Contributions are welcome. Please read the TMX 1.4b specification first — it is essential understanding for any changes to this library.

License

MIT

Release history

- 0.8: Apr 9, 2026
- 0.7: Feb 25, 2026
- 0.6: Jan 28, 2026
- 0.5.0: Jan 15, 2026 (this version)
- 0.4.4: Dec 19, 2025
- 0.4.3: Dec 15, 2025
- 0.4.2: Dec 4, 2025

Download files

Source Distribution

hypomnema-0.5.0.tar.gz (41.8 kB view details)

Uploaded Jan 15, 2026 Source

Built Distribution

hypomnema-0.5.0-py3-none-any.whl (46.6 kB view details)

Uploaded Jan 15, 2026 Python 3

File details

Details for the file hypomnema-0.5.0.tar.gz.

File metadata

Download URL: hypomnema-0.5.0.tar.gz
Upload date: Jan 15, 2026
Size: 41.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.25

File hashes

Hashes for hypomnema-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`98b5d90d4205f49afa41edc5c73a30d128adef51c592f227908485888cd1fd03`
MD5	`307758dcf4577381a7de1f38142c24e5`
BLAKE2b-256	`a9c470e807f615ec3e08e3da595e893a8b517905ff4a3fffd21fb7757b70318f`

File details

Details for the file hypomnema-0.5.0-py3-none-any.whl.

File metadata

Download URL: hypomnema-0.5.0-py3-none-any.whl
Upload date: Jan 15, 2026
Size: 46.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.25

File hashes

Hashes for hypomnema-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ee541357778268f4c84726558438e5da4ac0d47db6db95222a4a6a4583fc7dfc`
MD5	`a486493c2f20c8a2fb324228afbe97a2`
BLAKE2b-256	`e8acaed231cec17fe312719c0397c2e3d30bed23f9a4448000b9757ffc815417`

