A type-safe localization toolkit for parsing, converting, and matching TMX, XLIFF, PO, JSON, HTML, CSV, XLSX, and IDML files.

lokit

Warning: Beta release. lokit is currently in beta. The API is volatile and subject to rapid, breaking changes before the official v1 release.

lokit is a high-performance, strictly type-safe, and highly memory-efficient localization toolkit for Python. It supports Python 3.12+.

Unlike legacy tools that wrap in-memory XML DOM element trees, lokit shifts away from XML-based localization interchange formats towards native, language-level data structures. It ingests localization formats (TMX, XLIFF, PO, XLSX, CSV, JSON, HTML, IDML) and compiles them into a strict, unified structural data model. This enables not just parsing but also robust data manipulation, semantic extraction, and advanced translation memory features out of the box. lokit focuses on streaming and asynchronous processing rather than synchronous processing of files loaded fully into memory.

This unified structure can be easily converted to JSON for interchange with other systems. I've made parsing and data transfer as native as possible by capturing all elements of the traditional interchange formats in one common structure. This greatly improves compatibility, especially for segment matching and leveraging, because flattened strings are the standard representation. Tags are preserved in a common form, so the structure parsed from XLIFF is the same as the structure parsed from HTML.
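To make the idea of a common structure concrete, here is a minimal sketch of how a unit with flattened text and separately stored inline tags could be modeled and serialized to JSON. The class names (InlineTag, Segment, Unit) and their fields are hypothetical illustrations, not lokit's actual BaseStructure layout.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InlineTag:
    """Hypothetical inline tag record (not lokit's real schema)."""
    id: str
    kind: str    # e.g. "bold_open" or "bold_close"
    offset: int  # character position in the flattened text


@dataclass
class Segment:
    text: str  # flattened string with markup removed
    tags: list[InlineTag] = field(default_factory=list)


@dataclass
class Unit:
    unit_id: str
    source: Segment
    target: Segment


# The same shape would be produced whether the markup came from
# XLIFF inline elements or from HTML <b> tags.
unit = Unit(
    unit_id="greeting",
    source=Segment(
        "Hello world",
        [InlineTag("1", "bold_open", 6), InlineTag("1", "bold_close", 11)],
    ),
    target=Segment(
        "Hallo Welt",
        [InlineTag("1", "bold_open", 6), InlineTag("1", "bold_close", 10)],
    ),
)

# asdict() recurses through nested dataclasses and lists,
# so the whole unit serializes to plain JSON.
payload = json.dumps(asdict(unit), indent=2)
restored = json.loads(payload)
print(payload)
```

Because the text is stored flat and the tags carry only offsets, two units can be compared on their text alone, which is what makes matching across source formats straightforward in a design like this.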

These legacy file formats have supported vendor lock-in for many years, making it difficult for clients to move to another system. Because this is a major issue across the industry, something new is needed so that vendors cannot use hidden, legacy technology to lock in their clients. Localization deserves innovation.

Note: SDKs in other languages are coming soon.

Core Features

lokit provides a comprehensive suite of tools for managing localization data:

  • Native Structural Modeling: Converts disjointed interchange formats into a strict, unified Python dataclass, ensuring complete type safety across your entire localization pipeline.
  • Advanced Matching Engine: Provides Exact Matching, Fuzzy Matching (via SequenceMatcher), and In-Context Exact (ICE) Matching leveraging previous and next segment context, as well as inline tag signatures.
  • Deep Sub-segment Extraction: Automatically parses and isolates inline tags, properties, and formatting markers, allowing for safe manipulation of text without corrupting code.
  • Semantic Querying: Easily traverse and filter translation units using complex predicates, exact ID lookups, or deep nested JSON path querying (where()).
  • Plural Support: Native extraction and structuring of pluralized translation units.
  • Universal Format Conversion: Instantly import and export between any supported format (e.g., TMX to JSON, HTML to XLIFF) with zero data loss.
  • Synchronous and Asynchronous Streaming: Process massive enterprise files natively using Python async generators to keep memory overhead to an absolute minimum.
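As an illustration of the matching features listed above, the following sketch shows fuzzy matching built on the standard library's difflib.SequenceMatcher and a simple in-context exact check keyed on the previous and next source segments. The function names, the dictionary-based memory, and the lowercasing step are assumptions made for this example; lokit's own engine may work differently, and inline tag signatures are left out here.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, memory, threshold=0.75, limit=5):
    """Return (unit_id, score) pairs whose source text resembles `query`.

    `memory` is a hypothetical dict of unit_id -> source text.
    """
    scored = []
    for unit_id, source in memory.items():
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((unit_id, round(score, 3)))
    # Best matches first, then cut to the requested number of results.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def ice_match(segments, index, context_memory):
    """In-context exact match: the source and both neighbours must agree.

    `context_memory` is a hypothetical dict keyed by
    (previous_source, source, next_source) -> target text.
    """
    previous_source = segments[index - 1] if index > 0 else None
    next_source = segments[index + 1] if index + 1 < len(segments) else None
    return context_memory.get((previous_source, segments[index], next_source))


memory = {
    "checkout_1": "Complete your purchase",
    "checkout_2": "Complete your order",
    "login_1": "Enter your email",
}
matches = fuzzy_find("Complete your purchase", memory)

context_memory = {("Enter your email", "Submit", None): "Absenden"}
in_context = ice_match(["Enter your email", "Submit"], 1, context_memory)
out_of_context = ice_match(["Submit"], 0, context_memory)
```

The same source text ("Submit") yields a target only when its neighbours match the stored context, which is the property that distinguishes an ICE match from a plain exact match.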

Parsing Performance vs Translate-Toolkit

When dealing with enterprise-scale localization environments, parsing performance and memory efficiency are paramount. lokit is designed to be significantly leaner and faster than the industry standard.

In a stress-test benchmark on a 612 MB TMX file containing 557,058 segments, converting to XLIFF and back to TMX over 3 consecutive iterations, lokit and translate-toolkit produced the following averages:

Library            | Avg Duration (s) | Peak Memory (MB) | Memory Efficiency
lokit (async)      | 57.5             | 213.8            | ~10.6x less RAM
translate-toolkit  | 60.0             | 2,275.7          | ~2.3 GB

Because translate-toolkit loads whole files into string buffers and C-level DOM trees synchronously, its peak memory exceeds 2.2 GB. lokit uses generator-based async streaming, allowing it to complete the same workload using about 10.6x less RAM while running slightly faster overall.
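The headline ratios follow directly from the benchmark figures quoted above; a quick check of the arithmetic, using only those reported numbers:

```python
# Figures copied from the benchmark results above.
lokit_peak_mb = 213.8
toolkit_peak_mb = 2275.7
lokit_seconds = 57.5
toolkit_seconds = 60.0

memory_ratio = toolkit_peak_mb / lokit_peak_mb
time_saved_pct = (toolkit_seconds - lokit_seconds) / toolkit_seconds * 100

print(f"{memory_ratio:.1f}x less peak memory")             # 10.6x less peak memory
print(f"{time_saved_pct:.1f}% shorter average duration")  # 4.2% shorter average duration
```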

This low memory footprint leaves headroom for parallel processing, making lokit suitable for large-scale localization workflows and backend systems.

SDK Usage Reference

lokit is built around a central BaseStructure dataclass model, which standardizes localization units and segments. This gives more consistent standardization and branching, in a way that is more native to the language than XML-based file formats. Parsing SDKs are provided for both extraction and export of localization interchange formats and common file types.

Installation

Install lokit via pip:

pip install lokit-python

Basic Parsing and Conversion

Converting files synchronously is straightforward using the modular importer and exporter APIs.

from lokit.importers import import_tmx
from lokit.exporters import export_xliff

# Parse a localization file into a BaseStructure
document = import_tmx("path/to/source.tmx")

print(f"Loaded {len(document.data)} units")
print(f"Source Locale: {document.source_locale}")
print(f"Target Locale: {document.target_locale}")

# Export the BaseStructure out to a new format
export_xliff(document, "path/to/target.xliff")

Asynchronous Streaming for Massive Files

For files spanning hundreds of megabytes, parsing the entire DOM structure into memory is inefficient. lokit supports stream parsing natively.

import asyncio
from lokit.importers import import_tmx_async
from lokit.exporters.xliff import export_xliff_async
from lokit.data.structure import BaseStructure

async def process_large_file():
    units = {}
    
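    # Stream the file through an asynchronous generator.
    # Note: collecting every unit in a dict keeps them all in memory;
    # process units inside the loop instead when memory matters.
    async for unit_id, unit_data in import_tmx_async("massive_file.tmx"):
        units[unit_id] = unit_data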

    # Reconstruct the document safely
    doc = BaseStructure(
        source_locale="en_US", 
        target_locale="de_DE", 
        data=units
    )
    
    # Export asynchronously
    await export_xliff_async(doc, "massive_output.xliff")

asyncio.run(process_large_file())
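For readers curious how generator-based streaming keeps memory low, here is a minimal standard-library sketch of the general technique: an async generator that walks a TMX document with xml.etree.ElementTree.iterparse and releases each unit after yielding it. This is not lokit's implementation; the function names and the tiny sample document are assumptions for illustration, and iterparse itself reads synchronously, so a real async pipeline would need non-blocking I/O around it.

```python
import asyncio
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a large TMX file (hypothetical sample data).
SAMPLE_TMX = b"""<tmx version="1.4"><header srclang="en-US"/><body>
<tu tuid="u1"><tuv xml:lang="en-US"><seg>Hello</seg></tuv><tuv xml:lang="de-DE"><seg>Hallo</seg></tuv></tu>
<tu tuid="u2"><tuv xml:lang="en-US"><seg>Goodbye</seg></tuv><tuv xml:lang="de-DE"><seg>Auf Wiedersehen</seg></tuv></tu>
</body></tmx>"""


async def stream_units(stream):
    """Yield (unit_id, [segment texts]) one translation unit at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "tu":
            texts = [seg.text or "" for seg in elem.iter("seg")]
            yield elem.get("tuid"), texts
            # Drop the unit's children and text so they can be garbage collected.
            elem.clear()
            # Hand control back to the event loop between units.
            await asyncio.sleep(0)


async def collect_units():
    collected = []
    async for unit_id, texts in stream_units(io.BytesIO(SAMPLE_TMX)):
        collected.append((unit_id, texts))
    return collected


units = asyncio.run(collect_units())
print(units)
```

Only one translation unit's subtree is populated at any moment, so peak memory depends on the largest unit rather than on the size of the file.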

Advanced Querying and Matching

The Lokit logic wrapper provides access to the powerful matching engine and data manipulation features.

from lokit.logic import Lokit

# Wrap a parsed document or path in the Lokit logic engine
engine = Lokit.parse("path/to/source.xliff")

# Query specific nested data structures
button_units = engine.where("extensions.component", "checkout_button")

# Perform fuzzy matching against translation memory
results = engine.fuzzy_find("Complete your purchase", limit=5, threshold=0.75)
for match in results:
    print(f"Match found: {match.unit_id} (Score: {match.score})")

# Perform strict In-Context Exact (ICE) matching
ice_match = engine.match(
    source="Submit",
    target_unit_id="submit_btn_1",
    previous_source="Enter your email",
    require_context=True
)
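The where() call above filters units by a dotted path into nested data. A minimal sketch of that kind of lookup over plain dictionaries is shown below; the helper names and the sample unit layout are hypothetical and only illustrate the technique, not how lokit resolves paths internally.

```python
from typing import Any

_MISSING = object()  # sentinel so a stored None is distinguishable from "no such key"


def get_path(data: Any, dotted_path: str) -> Any:
    """Follow a dotted path such as "extensions.component" through nested dicts."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def where(units: dict[str, dict[str, Any]], dotted_path: str, expected: Any) -> dict[str, dict[str, Any]]:
    """Return the units whose value at `dotted_path` equals `expected`."""
    return {
        unit_id: unit
        for unit_id, unit in units.items()
        if get_path(unit, dotted_path) == expected
    }


# Hypothetical unit data for illustration.
sample_units = {
    "submit_btn_1": {"source": "Submit", "extensions": {"component": "checkout_button"}},
    "email_label": {"source": "Enter your email", "extensions": {"component": "login_form"}},
    "footer_note": {"source": "All rights reserved"},
}

button_units = where(sample_units, "extensions.component", "checkout_button")
print(list(button_units))  # ['submit_btn_1']
```

Units that lack the path entirely, like footer_note here, are skipped rather than raising an error, which keeps queries safe across heterogeneous source formats.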

Supported Formats

  • TMX (Translation Memory eXchange)
  • XLIFF (XML Localization Interchange File Format)
  • PO/POT (Gettext Portable Object)
  • XLSX / CSV (Spreadsheets)
  • JSON (Key-Value nested localization trees)
  • HTML (Hypertext Markup)
  • IDML (InDesign Markup Language)

Provenance

The following attestation bundles were made for lokit_python-0.1.2.tar.gz:

Publisher: publish.yml on ciarandarby/lokit
