Python library for creating, manipulating, and editing TMX files

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.13
- Python :: 3.14

Project description

Hypomnema

Industrial-grade TMX 1.4b parsing and serialization for Python.

Hypomnema is a strictly typed infrastructure library for working with TMX 1.4b (Translation Memory eXchange) files. It is designed as a foundation for building localization tools, CAT software, and NLP pipelines, focusing on correctness, type safety, and memory efficiency when handling large datasets.

Warning
This project is currently in Alpha. It is a work in progress and should not be used for full production workflows until the 1.0 version is released. API changes may occur.

Why Hypomnema?

While other TMX libraries exist, Hypomnema is built with modern Python engineering standards to address common pain points:

Strict Type Safety: Every TMX element is modeled as a typed Python dataclass, so editor autocompletion works and type errors surface during static analysis rather than at runtime.
Policy-Driven Error Handling: Real-world TMX files are often messy. Instead of crashing on a single malformed date or missing attribute, Hypomnema uses a granular policy system: you define how each kind of error is handled (raise, ignore, use a default, or keep the raw value) without discarding the rest of the file.
Full TMX 1.4b Level 2 Compliance: Supports arbitrary inline element nesting depth and complete attribute modeling.
Memory Efficient: Supports streaming processing, so large TMX files need not be loaded into memory all at once.
Backend Agnostic: Works with the standard library's xml module or, for better performance, lxml.

Installation

Install using uv (recommended):

uv add hypomnema

Or using pip:

pip install hypomnema

For maximum performance with large files (enables lxml backend):

uv add "hypomnema[lxml]"
# or
pip install "hypomnema[lxml]"
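
If you want to check which backend is available in your environment, the common fallback idiom looks like the sketch below. This is an assumption about how an application might pick a parser, not a description of how Hypomnema selects its backend internally.

```python
import importlib.util

# find_spec returns None when the package is not installed,
# so this check does not import lxml as a side effect.
HAS_LXML = importlib.util.find_spec("lxml") is not None

if HAS_LXML:
    from lxml import etree as ET
else:
    import xml.etree.ElementTree as ET

# Both modules expose a compatible fromstring() for simple documents.
root = ET.fromstring("<tmx version='1.4'><body/></tmx>")
print(f"lxml available: {HAS_LXML}; parsed root tag: {root.tag}")
```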

Quick Start

import hypomnema as hm

# Load a TMX file
tmx = hm.load("translations.tmx")

# Inspect the content
print(f"Source language: {tmx.header.srclang}")
print(f"Translation units: {len(tmx.body)}")

# Print the French variant of each translation unit
for tu in tmx.body:
    for tuv in tu.variants:
        if tuv.lang == "fr":
            print(f"French: {tuv.content}")

# Save changes
hm.dump(tmx, "output.tmx")
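
The quick start assumes a translations.tmx file already exists. If you need one to experiment with, the following sketch builds a minimal TMX document using only the standard library. The attribute values (such as the `sample-script` tool name) are placeholders chosen for this example, and the file is written to a temporary directory.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_sample_tmx() -> ET.ElementTree:
    """Build a tiny TMX 1.4 document with one translation unit."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(
        tmx,
        "header",
        {
            "creationtool": "sample-script",  # placeholder value
            "creationtoolversion": "0.1",  # placeholder value
            "segtype": "sentence",
            "o-tmf": "none",
            "adminlang": "en",
            "srclang": "en",
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(tmx, "body")
    tu = ET.SubElement(body, "tu", tuid="1")
    for lang, text in (("en", "Hello world"), ("fr", "Bonjour le monde")):
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


out_path = Path(tempfile.mkdtemp()) / "translations.tmx"
build_sample_tmx().write(str(out_path), encoding="utf-8", xml_declaration=True)
print(out_path.read_text(encoding="utf-8"))
```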

Advanced Usage

Streaming Large Files

For large translation memories, use the streaming API to process units one by one without loading the whole file into RAM:

import hypomnema as hm

# Stream translation units ('tu') only
for tu in hm.load("massive_memory.tmx", filter="tu"):
    print(f"Processing TU: {tu.tuid}")
    # Process units here...

Creating and Saving TMX Files

You can programmatically create TMX files using the helper factory functions:

import hypomnema as hm
from hypomnema import helpers

# 1. Create a Header
header = helpers.create_header(
    creationtool="hypomnema",
    segtype="sentence",
    srclang="en-US",
    adminlang="en-US"
)

# 2. Create a Translation Unit (TU) with variants
# TUVs can contain plain text or mixed content with inline tags
tuv_en = helpers.create_tuv("en-US", content="Hello world")
tuv_fr = helpers.create_tuv(
    "fr-FR",
    content=["Bonjour ", helpers.create_ph(x=1, type="lb"), "le monde"],
)

tu = helpers.create_tu(
    tuid="1",
    srclang="en-US",
    variants=[tuv_en, tuv_fr]
)

# 3. Create the TMX object
tmx = helpers.create_tmx(header=header, body=[tu])

# 4. Save to disk
hm.dump(tmx, "output.tmx")

Policy Configuration

Real-world TMX files are often imperfect. Policies let you configure how Hypomnema handles validation errors:

import logging
import hypomnema as hm
from hypomnema.xml.policy import Behavior, XmlDeserializationPolicy

policy = XmlDeserializationPolicy(
    missing_seg=Behavior("ignore", logging.WARNING),
    extra_text=Behavior("ignore", logging.INFO),
)

tmx = hm.load("messy.tmx", deserializer_policy=policy)

Available Policy Keys

Deserialization:

invalid_child_tag: Action for unexpected child elements.
missing_text_content: Action for elements missing required text.
invalid_tag: Action for unexpected element tags.
extra_text: Action for unexpected text content.
required_attribute_missing: Action for missing required attributes.
multiple_seg: Action for multiple <seg> elements in a <tuv>.
multiple_headers: Action for multiple <header> elements.
invalid_datetime_value: Action for unparsable datetime values.
invalid_enum_value: Action for invalid enum values.
invalid_int_value: Action for unparsable integer values.
missing_deserialization_handler: Action for missing element handlers.
missing_seg: Action for <tuv> elements without a <seg>.
multiple_body: Action for multiple <body> elements.
missing_header: Action for <tmx> elements without a <header>.
missing_body: Action for <tmx> elements without a <body>.

Serialization:

invalid_element_type: Action for unexpected object types.
missing_text_content: Action for objects missing required text.
required_attribute_missing: Action for missing required attributes.
invalid_child_element: Action for invalid child element types.
invalid_attribute_type: Action for attributes with wrong types.
missing_serialization_handler: Action for missing element handlers.

Namespace:

existing_namespace: Action when registering an already-existing prefix.
inexistent_namespace: Action when resolving an unregistered prefix.

Text Extraction

Extract plain text content from elements, skipping inline markup:

from hypomnema import helpers, Bpt

tuv = helpers.create_tuv(
    "en",
    content=[
        "Hello ",
        helpers.create_bpt(i=1, content="Bpt text"),
        "World",
        helpers.create_ept(i=1, content="Ept text"),
    ],
)

# Quick access via text helper
print(helpers.text(tuv))  # "Hello World"

# Iterate over text segments
for text in helpers.iter_text(tuv):
    print(text)  # "Hello " then "Bpt text" then "World" then "Ept text"

# Ignore specific element types
for text in helpers.iter_text(tuv, ignore=Bpt):
    print(text)  # "Hello " then "World" then "Ept text"
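
To see what skipping inline markup means on the underlying XML, here is a standard-library sketch that works directly on a `<seg>` element. The function `plain_text` and its choice to descend into `<hi>` while skipping code-carrying elements are assumptions made for this illustration; Hypomnema's helpers may behave differently.

```python
import xml.etree.ElementTree as ET

# Inline elements whose content is native formatting code, not translatable text.
CODE_TAGS = {"bpt", "ept", "it", "ph"}


def plain_text(elem: ET.Element) -> str:
    """Collect text around inline elements, skipping the content of code tags."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            # For example <hi>, which wraps ordinary translatable text.
            parts.append(plain_text(child))
        # child.tail is the text that follows the inline element.
        parts.append(child.tail or "")
    return "".join(parts)


seg = ET.fromstring(
    '<seg>Hello <bpt i="1">&lt;b&gt;</bpt>World<ept i="1">&lt;/b&gt;</ept></seg>'
)
print(plain_text(seg))  # Hello World
print("".join(seg.itertext()))  # Hello <b>World</b>
```

The second line of output shows why a naive `itertext()` is not enough: it mixes the native formatting codes into the translatable text.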

TMX 1.4b Level 2 Compliance

Hypomnema is the only Python library that fully implements the TMX 1.4b Level 2 specification:

Arbitrary Nesting Depth: No limits on inline element nesting. <bpt>/<ept> pairs, <ph> placeholders, and <sub> elements can nest to any depth.
Complete Inline Element Support: All six inline markup elements (<bpt>, <ept>, <it>, <ph>, <hi>, <sub>) with proper mixed content handling.
Full Attribute Modeling: Every TMX attribute is typed, including enumerations for segtype, pos, and assoc.
Metadata Preservation: Properties and notes supported at all valid nesting levels.
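
As a rough illustration of what arbitrary nesting looks like in practice, the sketch below measures how deeply inline elements are nested inside a segment, using only the standard library. The function `max_inline_depth` and the sample segment are made up for this example and are unrelated to Hypomnema's API.

```python
import xml.etree.ElementTree as ET


def max_inline_depth(elem: ET.Element) -> int:
    """Return the deepest level of child elements below elem (0 if none)."""
    return max((1 + max_inline_depth(child) for child in elem), default=0)


# <bpt> contains a <sub>, which in turn contains a <hi>: three levels.
seg = ET.fromstring(
    '<seg>Click <bpt i="1">&lt;a title="<sub>Open the <hi>help</hi> page</sub>"&gt;</bpt>'
    'here<ept i="1">&lt;/a&gt;</ept></seg>'
)
print(max_inline_depth(seg))  # 3
```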

Development

To contribute or run tests locally:

Clone the repository.
Install dependencies using uv:
```
uv sync
```
Run the test suite:
```
uv run pytest
```

License

MIT

Project details

Release history

- 0.8: Apr 9, 2026
- 0.7: Feb 25, 2026 (this version)
- 0.6: Jan 28, 2026
- 0.5.0: Jan 15, 2026
- 0.4.4: Dec 19, 2025
- 0.4.3: Dec 15, 2025
- 0.4.2: Dec 4, 2025

Download files

Source Distribution

hypomnema-0.7.tar.gz (48.9 kB)

Uploaded Feb 25, 2026 Source

Built Distribution

hypomnema-0.7-py3-none-any.whl (55.3 kB)

Uploaded Feb 25, 2026 Python 3

File details

Details for the file hypomnema-0.7.tar.gz.

File metadata

Download URL: hypomnema-0.7.tar.gz
Upload date: Feb 25, 2026
Size: 48.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.6 (publish subcommand, run in CI on Ubuntu 24.04)

File hashes

Hashes for hypomnema-0.7.tar.gz
Algorithm	Hash digest
SHA256	`a45eb0259ee8c4c4f9745999bcbdfbcfd8cbf8d9a740177b1ab6dcbf4fb7053f`
MD5	`d4246c292758e73e76702d38c6578963`
BLAKE2b-256	`256d1d78a67edfc68f7ddec2a052e0875a564d10728bd97a2ffea88172be05d1`

File details

Details for the file hypomnema-0.7-py3-none-any.whl.

File metadata

Download URL: hypomnema-0.7-py3-none-any.whl
Upload date: Feb 25, 2026
Size: 55.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.6 (publish subcommand, run in CI on Ubuntu 24.04)

File hashes

Hashes for hypomnema-0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`90d8d770132c7be28dc2fa235dfb7c7124a20e59831dc8485940131117a85ea7`
MD5	`91e76e48a14c826e854e1ff62a635178`
BLAKE2b-256	`3e90e453885ddc5febaa5904c3d4a7568a114fcf91019bc268afc7115a07bcf4`
