Python library for creating, manipulating, and editing TMX files

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.13
- Python :: 3.14

Project description

Hypomnema

Industrial-grade TMX 1.4b parsing and serialization for Python.

Hypomnema is a strictly typed infrastructure library for working with TMX 1.4b (Translation Memory eXchange) files. It is designed as a foundation for building localization tools, CAT software, and NLP pipelines.

Warning: Hypomnema is pre-1.0 software. Expect breaking changes without notice until version 1.0.0.

Why Hypomnema?

Most TMX parsers are simple XML wrappers. Hypomnema offers:

Full TMX 1.4b Level 2 Compliance: Arbitrary inline element nesting depth, complete attribute modeling
Policy-Driven Error Handling: Configure exactly how to handle malformed data
Backend Agnostic: Use lxml for speed or standard library xml.etree for zero-dependency deployments
Full Type Safety: Modern Python 3.13+ type annotations with structured dataclasses
Roundtrip Integrity: Deserialize to objects, manipulate, serialize back
Streaming API: Process large TMX files element-by-element without loading everything into memory
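
The streaming idea in the last point can be illustrated with the standard library alone. The sketch below is not Hypomnema's implementation: it uses `xml.etree.ElementTree.iterparse` on an in-memory sample, and the `SAMPLE` document and `iter_tuids` helper are invented for this illustration. It shows how translation units can be visited one at a time and released once handled.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-unit TMX document, used only for this illustration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="001"><tuv xml:lang="en"><seg>Hello</seg></tuv></tu>
    <tu tuid="002"><tuv xml:lang="en"><seg>World</seg></tuv></tu>
  </body>
</tmx>
"""


def iter_tuids(source):
    """Yield the tuid of each <tu> as soon as its closing tag has been parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tu":
            yield elem.get("tuid")
            # Drop the unit's children and text so the partial tree stays small.
            elem.clear()


tuids = list(iter_tuids(io.BytesIO(SAMPLE)))
print(tuids)
```

The same pattern works with a file path in place of the `io.BytesIO` object.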

What is TMX?

TMX (Translation Memory eXchange) is an open XML standard for exchanging translation memory data between tools and providers. A TMX file contains translation units (TU) with source and target language variants (TUV), each containing segmented text. TMX files often include inline markup for formatting, placeholders, and tags that must be preserved during processing.
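
To make the TU/TUV/segment structure concrete, here is a minimal sketch that walks a hand-written TMX fragment with the standard library's `xml.etree.ElementTree`. The sample document is made up for illustration, and the code does not use Hypomnema.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it in Clark notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one translation unit with an English source and a French target.
SAMPLE = """
<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="greeting">
      <tuv xml:lang="en"><seg>Hello, world!</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour, le monde !</seg></tuv>
    </tu>
  </body>
</tmx>
"""

root = ET.fromstring(SAMPLE)

# Map each variant's language code to its segment text.
variants = {}
for tu in root.iter("tu"):
    for tuv in tu.findall("tuv"):
        seg = tuv.find("seg")
        variants[tuv.get(XML_LANG)] = "".join(seg.itertext())

print(root.find("header").get("srclang"))
print(variants)
```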

Installation

pip install hypomnema
# or
uv add hypomnema

For maximum performance with large files:

pip install "hypomnema[lxml]"
# or
uv add "hypomnema[lxml]"

Quick Start

import hypomnema as hm

# Load a TMX file
tmx = hm.load("translations.tmx")

# Inspect the content
print(f"Source language: {tmx.header.srclang}")
print(f"Translation units: {len(tmx.body)}")

# Print the French variant of each translation unit
for tu in tmx.body:
    for tuv in tu.variants:
        if tuv.lang == "fr":
            print(f"French: {tuv.text}")

# Save changes
hm.dump(tmx, "output.tmx")

High-Level API

Loading Files

import hypomnema as hm

# Load entire file
tmx = hm.load("input.tmx")

# Streaming: yield translation units one at a time (memory efficient)
for tu in hm.load("large.tmx", filter="tu"):
    print(tu.tuid)

# Filter multiple element types
for element in hm.load("file.tmx", filter=["tu", "prop"]):
    if isinstance(element, hm.Tu):
        print(element.creationtool)
    else:
        print(element.type)

# Specify encoding
tmx = hm.load("file.tmx", encoding="utf-16")

Saving Files

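import hypomnema as hm

tmx = hm.load("input.tmx")  # any Tmx object works here

hm.dump(tmx, "output.tmx")                     # library default encoding
hm.dump(tmx, "output.tmx", encoding="utf-16")  # explicit encoding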

Element Creation

Convenience functions for creating TMX elements:

import hypomnema as hm

# Structural elements
header = hm.create_header(srclang="en", creationtool="my-tool")
tuv = hm.create_tuv("en", content=["Hello"])
tu = hm.create_tu(tuid="001", variants=[tuv])
tmx = hm.create_tmx(header=header, body=[tu])

# Inline elements
bpt = hm.create_bpt(i=1, type="bold", content=["text"])
ept = hm.create_ept(i=1)
it = hm.create_it(pos=hm.Pos.BEGIN, type="italic")
ph = hm.create_ph(type="variable", x=100)
hi = hm.create_hi(content=["highlighted"])
sub = hm.create_sub(content=["sub-flow"], datatype="text")

# Auxiliary elements
prop = hm.create_prop("customer", "acme-corp")
note = hm.create_note("Translation note")

Text Extraction

Extract plain text content from elements, skipping inline markup:

import hypomnema as hm

tuv = hm.create_tuv(
    "en",
    content=[
        "Hello ",
        hm.create_bpt(i=1, content=["Bpt text"]),
        "World",
        hm.create_ept(i=1, content=["Ept text"]),
    ],
)

# Quick access via .text property
print(tuv.text)  # "Hello World"

# Iterate over text segments
for text in hm.iter_text(tuv):
    print(text)  # "Hello " then "Bpt text" then "World" then "Ept text"

# Ignore specific element types
for text in hm.iter_text(tuv, Ignore=[hm.Bpt]):
    print(text)  # "Hello " then "World" then "Ept text"
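
The difference between a segment's own text and the content carried inside inline elements can also be seen at the raw XML level. The following is a sketch using only `xml.etree.ElementTree`, not Hypomnema's code; `plain_text` and `all_text` are hypothetical helper names, and the sketch only looks at direct children of the segment.

```python
import xml.etree.ElementTree as ET

# A segment whose <bpt>/<ept> elements carry escaped native formatting codes.
seg = ET.fromstring(
    '<seg>Hello <bpt i="1">&lt;b&gt;</bpt>World<ept i="1">&lt;/b&gt;</ept>!</seg>'
)


def plain_text(element):
    """Join the element's own text with the tail text that follows each direct child."""
    parts = [element.text or ""]
    for child in element:
        parts.append(child.tail or "")
    return "".join(parts)


def all_text(element):
    """Join every text node, including the codes stored inside inline elements."""
    return "".join(element.itertext())


print(plain_text(seg))  # Hello World!
print(all_text(seg))    # Hello <b>World</b>!
```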

Policy Configuration

Real-world TMX files are often imperfect. Policies let you configure how Hypomnema handles validation errors:

import logging
import hypomnema as hm
from hypomnema.xml.policy import PolicyValue

policy = hm.XmlPolicy(
    missing_seg=PolicyValue("ignore", logging.WARNING),
    extra_text=PolicyValue("ignore", logging.INFO),
    invalid_attribute_value=PolicyValue("ignore", logging.DEBUG),
    required_attribute_missing=PolicyValue("ignore", logging.ERROR),
)

tmx = hm.load("messy.tmx", policy=policy)
hm.dump(tmx, "clean.tmx", policy=policy)
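
As a rough mental model of policy-driven error handling, the sketch below pairs an action with a log level and applies it when a problem is found. It is a generic illustration, not Hypomnema's internals: the `Rule` class, the `handle_violation` function, and the `"raise"` action name are all assumptions made for this example.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("policy_demo")


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a policy entry: what to do and how loudly to log it."""

    action: str  # "raise" or "ignore" (assumed action names)
    log_level: int


def handle_violation(rule: Rule, message: str) -> bool:
    """Log the problem, then raise or carry on. Returns True if processing may continue."""
    logger.log(rule.log_level, message)
    if rule.action == "raise":
        raise ValueError(message)
    return True


strict = Rule("raise", logging.ERROR)
lenient = Rule("ignore", logging.WARNING)

# A lenient rule logs the issue and lets processing continue.
continued = handle_violation(lenient, "tuv has no <seg>")

# A strict rule logs the issue and then stops with an exception.
try:
    handle_violation(strict, "tuv has no <seg>")
    raised = False
except ValueError:
    raised = True

print(continued, raised)
```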

Available Policy Keys

Deserialization:

missing_handler — No handler for element type
invalid_tag — Unexpected XML tag encountered
required_attribute_missing — Mandatory TMX attribute absent
invalid_attribute_value — Attribute violates TMX spec
extra_text — Unexpected text within elements
missing_seg — TUV missing required segment
multiple_seg — TUV has multiple segments
empty_content — Element has no text content

Serialization:

required_attribute_missing — Mandatory dataclass field is None
invalid_attribute_type — Field type incompatible with XML
invalid_content_type — Content is not a string
missing_handler — No handler for dataclass type
invalid_object_type — Handler received unexpected type
invalid_child_element — Child not permitted by TMX structure
multiple_headers — Multiple header elements
missing_header — Mandatory header missing

Namespace:

invalid_namespace — Invalid namespace prefix or URI
existing_namespace — Namespace already registered
missing_namespace — Namespace not registered

Low-Level API

For finer control over parsing and serialization:

import hypomnema as hm

# Choose backend
backend = hm.LxmlBackend()      # Fast, feature-rich
# or
backend = hm.StandardBackend()  # Portable, stdlib only

# Deserialize
deserializer = hm.Deserializer(backend=backend)
root = backend.parse("file.tmx")
tmx = deserializer.deserialize(root)

# Manipulate
new_tuv = hm.create_tuv("de", content=["Guten Tag"])
new_tu = hm.create_tu(variants=[new_tuv])
tmx.body.append(new_tu)

# Serialize
serializer = hm.Serializer(backend=backend)
xml_element = serializer.serialize(tmx)
backend.write(xml_element, "output.tmx")

QName Support

Work with XML qualified names:

import hypomnema as hm
from hypomnema.xml.qname import QName

# Simple name
qname = QName("tag")

# Clark notation
# namespace map required when using prefixed/Clark notation
qname = QName("{http://www.example.com}tag", nsmap={"ns": "http://www.example.com"})
print(qname.uri)             # "http://www.example.com"
print(qname.local_name)      # "tag"
print(qname.prefix)          # "ns"
print(qname.qualified_name)  # "{http://www.example.com}tag"

# Use with tag filtering
for tu in hm.load("file.tmx", filter=qname):
    print(tu.tuid)
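
For readers unfamiliar with Clark notation, the sketch below shows how a `{uri}local` name can be split and matched against a namespace map. It is a standalone illustration and not the `QName` implementation; `split_clark` is a hypothetical helper.

```python
from __future__ import annotations


def split_clark(name: str, nsmap: dict[str, str]) -> tuple[str | None, str | None, str]:
    """Return (prefix, uri, local_name) for a plain or Clark-notation name."""
    if not name.startswith("{"):
        # Plain name: no namespace information at all.
        return None, None, name
    uri, closing, local = name[1:].partition("}")
    if not closing or not local:
        raise ValueError(f"malformed Clark notation: {name!r}")
    # Reverse lookup: find the prefix registered for this URI, if any.
    prefix = next((p for p, u in nsmap.items() if u == uri), None)
    return prefix, uri, local


nsmap = {"ns": "http://www.example.com"}
print(split_clark("{http://www.example.com}tag", nsmap))
print(split_clark("tag", nsmap))
```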

Creating TMX from Scratch

import hypomnema as hm

header = hm.create_header(
    srclang="en",
    creationtool="my-tool",
    segtype=hm.Segtype.SENTENCE,
)

source = hm.create_tuv(
    "en",
    content=[
        "Click ",
        hm.create_bpt(i=1, type="link"),
        "here",
        hm.create_ept(i=1),
        " to continue.",
    ],
)

target = hm.create_tuv(
    "fr",
    content=[
        "Cliquez ",
        hm.create_bpt(i=1, type="link"),
        "ici",
        hm.create_ept(i=1),
        " pour continuer.",
    ],
)

tu = hm.create_tu(
    tuid="001",
    variants=[source, target],
    props=[hm.create_prop("domain", "ui")],
    notes=[hm.create_note("Button label")],
)

tmx = hm.create_tmx(header=header, body=[tu])
hm.dump(tmx, "output.tmx")

TMX 1.4b Level 2 Compliance

Hypomnema aims to implement the TMX 1.4b Level 2 specification in full, apart from the two elements listed under Intentionally Omitted Elements:

Arbitrary Nesting Depth: No limits on inline element nesting. <bpt>/<ept> pairs, <ph> placeholders, and <sub> elements can nest to any depth.
Complete Inline Element Support: All six inline markup elements (<bpt>, <ept>, <it>, <ph>, <hi>, <sub>) with proper mixed content handling.
Full Attribute Modeling: Every TMX attribute is typed, including enumerations for segtype, pos, and assoc.
Metadata Preservation: Properties and notes supported at all valid nesting levels.
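
One practical consequence of Level 2 markup is that `<bpt>` and `<ept>` elements are matched through their `i` attribute, wherever they sit in the segment. The sketch below checks that pairing with the standard library. It is an illustration rather than part of Hypomnema, and `unpaired_indices` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def unpaired_indices(seg):
    """Return the `i` values found on a <bpt> without an <ept>, or the reverse."""
    # Element.iter() descends to any depth, so nested inline elements are covered.
    opened = {el.get("i") for el in seg.iter("bpt")}
    closed = {el.get("i") for el in seg.iter("ept")}
    return opened ^ closed  # symmetric difference: present on one side only


balanced = ET.fromstring(
    '<seg><hi type="term"><bpt i="1"/>bold<ept i="1"/></hi> text</seg>'
)
broken = ET.fromstring(
    '<seg><bpt i="1"/>bold<ept i="1"/> <bpt i="2"/>dangling</seg>'
)

print(unpaired_indices(balanced))
print(unpaired_indices(broken))
```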

Intentionally Omitted Elements

<ude> — User Defined Encoding
<map> — Character mapping

These elements relate to custom encodings and are rarely encountered. If needed, subclass the handler classes in xml/deserialization/_handlers.py and xml/serialization/_handlers.py.

Architecture

Hypomnema is built on three decoupled layers:

Backend Layer (hypomnema.xml.backends) — Abstracts XML parser implementation
Orchestration Layer (hypomnema.xml) — Manages serialization/deserialization dispatch
Handler Layer — Specialized classes for each TMX element type
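
The three-layer split can be pictured with a toy example. Everything below is hypothetical (`StdlibBackend`, `Dispatcher`, the handler functions, and the simplified `Note` and `Prop` dataclasses) and is meant only to show how a backend, a dispatcher, and per-element handlers can stay independent. It does not mirror Hypomnema's actual classes.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Note:
    text: str


@dataclass
class Prop:
    type: str | None
    text: str


class StdlibBackend:
    """Backend layer: the only place that touches the XML parser."""

    def parse_string(self, xml: str):
        return ET.fromstring(xml)

    def tag(self, element) -> str:
        return element.tag

    def text(self, element) -> str:
        return element.text or ""

    def attr(self, element, name: str) -> str | None:
        return element.get(name)


# Handler layer: one function per element type, written against the backend interface.
def handle_note(backend, element) -> Note:
    return Note(text=backend.text(element))


def handle_prop(backend, element) -> Prop:
    return Prop(type=backend.attr(element, "type"), text=backend.text(element))


class Dispatcher:
    """Orchestration layer: routes each element to the handler registered for its tag."""

    def __init__(self, backend, handlers):
        self.backend = backend
        self.handlers = handlers

    def deserialize(self, element):
        tag = self.backend.tag(element)
        try:
            handler = self.handlers[tag]
        except KeyError:
            raise LookupError(f"no handler for <{tag}>") from None
        return handler(self.backend, element)


backend = StdlibBackend()
dispatcher = Dispatcher(backend, {"note": handle_note, "prop": handle_prop})

note = dispatcher.deserialize(backend.parse_string("<note>Button label</note>"))
prop = dispatcher.deserialize(backend.parse_string('<prop type="domain">ui</prop>'))
print(note, prop)
```

Swapping the parser then means replacing only the backend class, while handlers and dispatch logic stay unchanged.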

Supported Elements

Structural: Tmx, Header, Tu, Tuv

Inline: Bpt, Ept, It, Ph, Hi, Sub

Auxiliary: Prop, Note

Enumerations: Segtype, Pos, Assoc

Terminology Reference

See TERMINOLOGY.md for TMX 1.4b terminology.

Contributing

Contributions are welcome! Please open an issue before submitting a pull request.

License

MIT

Release history

0.8: Apr 9, 2026
0.7: Feb 25, 2026
0.6: Jan 28, 2026 (this version)
0.5.0: Jan 15, 2026
0.4.4: Dec 19, 2025
0.4.3: Dec 15, 2025
0.4.2: Dec 4, 2025

Download files

Source Distribution

hypomnema-0.6.tar.gz (39.4 kB)

Uploaded Jan 28, 2026 Source

Built Distribution

hypomnema-0.6-py3-none-any.whl (46.6 kB)

Uploaded Jan 28, 2026 Python 3

File details

Details for the file hypomnema-0.6.tar.gz.

File metadata

Download URL: hypomnema-0.6.tar.gz
Upload date: Jan 28, 2026
Size: 39.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.27 (publish subcommand, on Ubuntu 24.04 in CI)

File hashes

Hashes for hypomnema-0.6.tar.gz
Algorithm	Hash digest
SHA256	`30e067d82ae6bf860e3b488f2d15c76cc51a6230b2ea1abbfdcfaab07820ef30`
MD5	`202bce6cd5c9a8896ee85fed544b815d`
BLAKE2b-256	`74ccf78e75efc433185253d0ffa8055c0c7ed9761bc7d87831cbc19c535d7cd7`
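
A downloaded archive can be checked against the SHA256 digest listed above with the standard library's `hashlib`. This is a minimal sketch: `sha256_of` and `matches_expected` are hypothetical helper names, and the local file path passed in is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published above for hypomnema-0.6.tar.gz.
EXPECTED_SHA256 = "30e067d82ae6bf860e3b488f2d15c76cc51a6230b2ea1abbfdcfaab07820ef30"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path):
    """Return True when the file's digest equals the published one."""
    return sha256_of(path) == EXPECTED_SHA256
```

Calling `matches_expected("hypomnema-0.6.tar.gz")` on the downloaded file should return `True` if the archive is intact.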

File details

Details for the file hypomnema-0.6-py3-none-any.whl.

File metadata

Download URL: hypomnema-0.6-py3-none-any.whl
Upload date: Jan 28, 2026
Size: 46.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.27 (publish subcommand, on Ubuntu 24.04 in CI)

File hashes

Hashes for hypomnema-0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3e13b996d951dde066e329b2cb3b9e04995e163584b374400fc94a3c0d51e26e`
MD5	`719e783abf596e738f33de7e4b60e4a3`
BLAKE2b-256	`a65b1798239477176b9489d9b8bca9c10bb332f50949d915a5dec9f2e8e34150`
