A type-safe localization toolkit for parsing, converting, and matching TMX, XLIFF, PO, JSON, HTML, CSV, XLSX, and IDML files.

lokit

Warning: lokit is currently in beta. The API is volatile and subject to rapid, breaking changes before the official v1 release.

lokit is a high-performance, strictly type-safe, and highly memory-efficient localization toolkit for Python. Supports Python 3.12+.

Unlike legacy tools that wrap in-memory XML DOM element trees, lokit moves away from XML-based localization interchange formats towards native language parsing. It ingests localization formats (TMX, XLIFF, PO, XLSX, CSV, JSON, HTML, IDML) and compiles them into a strict, unified structural data model. This enables not just parsing but also robust data manipulation, semantic extraction, and advanced translation memory features out of the box. lokit favors streaming and asynchronous processing over synchronous processing of files held entirely in memory.

This unified model converts easily to JSON for interchange with other systems. I've made parsing and data transfer as native as possible by capturing all elements of the traditional interchange formats in one common structure. This improves compatibility, especially for segment matching and leveraging, because segments are stored as flattened strings by default. Tags are preserved in a common representation, so the structure parsed from XLIFF is the same as the structure parsed from HTML.
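
As a minimal sketch of what a unified, JSON-convertible model could look like, the example below uses plain standard-library dataclasses. The `InlineTag`, `Unit`, and `Document` classes and their fields are hypothetical stand-ins chosen for illustration; they are not lokit's actual classes, although `source_locale`, `target_locale`, and `data` mirror the attribute names used in the usage examples in this README.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InlineTag:
    """Hypothetical inline tag record, independent of the source format."""
    kind: str      # e.g. "bold", "link", "placeholder"
    position: int  # character offset in the flattened text


@dataclass
class Unit:
    """Hypothetical translation unit holding flattened source/target strings."""
    source: str
    target: str
    tags: list[InlineTag] = field(default_factory=list)


@dataclass
class Document:
    """Hypothetical stand-in for a unified document model."""
    source_locale: str
    target_locale: str
    data: dict[str, Unit] = field(default_factory=dict)


doc = Document(
    source_locale="en_US",
    target_locale="de_DE",
    data={"greeting": Unit("Hello world", "Hallo Welt", [InlineTag("bold", 6)])},
)

# asdict() recurses through nested dataclasses, lists, and dicts,
# so the whole document becomes plain JSON-serializable data.
payload = json.dumps(asdict(doc), ensure_ascii=False, indent=2)
restored = json.loads(payload)
```

Because the tags live next to the flattened text rather than inside it, the same shape can describe a segment regardless of whether it originally came from XLIFF or HTML.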

These legacy file formats have supported vendor lock-in for many years, making it difficult for any client to move to another system. This is a major issue across the domain, and something new is needed so that vendors cannot use hidden, legacy technology to lock in their clients. Localization deserves innovation.

Note: SDKs in other languages are coming soon.

Core Features

lokit provides a comprehensive suite of tools for managing localization data:

  • Native Structural Modeling: Converts disjointed interchange formats into a strict, unified Python dataclass, ensuring complete type safety across your entire localization pipeline.
  • Advanced Matching Engine: Provides Exact Matching, Fuzzy Matching (via SequenceMatcher), and In-Context Exact (ICE) Matching leveraging previous and next segment context, as well as inline tag signatures.
  • Deep Sub-segment Extraction: Automatically parses and isolates inline tags, properties, and formatting markers, allowing for safe manipulation of text without corrupting code.
  • Semantic Querying: Easily traverse and filter translation units using complex predicates, exact ID lookups, or deep nested JSON path querying (where()).
  • Plural Support: Native extraction and structuring of pluralized translation units.
  • Universal Format Conversion: Instantly import and export between any supported format (e.g., TMX to JSON, HTML to XLIFF) with zero data loss.
  • Synchronous and Asynchronous Streaming: Process massive enterprise files natively using Python async generators to keep memory overhead to an absolute minimum.
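
The fuzzy matching feature above is described as being built on `SequenceMatcher`. The following is a minimal sketch of that idea using `difflib.SequenceMatcher` from the standard library. The `fuzzy_find` function here is a hypothetical standalone helper written for illustration; its parameters borrow the `limit` and `threshold` names from the usage examples, but it does not claim to reproduce lokit's scoring or normalization.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, memory, limit=5, threshold=0.75):
    """Return up to `limit` (unit_id, score) pairs scoring >= threshold, best first."""
    scored = []
    for unit_id, source in memory.items():
        # ratio() is 2 * matched_chars / total_chars, a value between 0.0 and 1.0.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((unit_id, round(score, 3)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# A toy translation memory mapping unit IDs to source strings (assumed data).
memory = {
    "checkout_1": "Complete your purchase",
    "checkout_2": "Complete your purchase now",
    "login_1": "Enter your email",
}

matches = fuzzy_find("Complete your purchase", memory)
for unit_id, score in matches:
    print(f"{unit_id}: {score}")
```

With this toy data the identical string ranks first, the longer variant second, and the unrelated login string falls below the threshold.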

Parsing Performance vs Translate-Toolkit

When dealing with enterprise-scale localization environments, parsing performance and memory efficiency are paramount. lokit is designed to be significantly leaner and faster than the industry standard.

In a stress-test benchmark on a 612 MB TMX file containing 557,058 segments, parsing to XLIFF and back into TMX over 3 consecutive iterations, lokit yielded the following comparative averages:

Library            | Avg Duration (s) | Peak Memory (MB) | Memory Efficiency
lokit (async)      | 57.5             | 213.8            | ~10.6x less RAM
translate-toolkit  | 60.0             | 2,275.7          | ~2.3 GB

Because translate-toolkit loads whole files into string buffers and C-level DOM trees synchronously, its peak memory exceeds 2.2 GB. lokit uses generator-based async streaming, which lets it complete the same workload with about 10.6x less RAM (213.8 MB vs 2,275.7 MB) while running slightly faster overall (57.5 s vs 60.0 s).
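
For readers who want to check the comparison, the short calculation below derives the memory ratio and the relative time difference directly from the benchmark figures quoted above. The variable names are only for illustration.

```python
# Benchmark figures as reported in the table above.
lokit_peak_mb = 213.8
ttk_peak_mb = 2275.7
lokit_secs = 57.5
ttk_secs = 60.0

# How many times more peak memory translate-toolkit used.
memory_ratio = ttk_peak_mb / lokit_peak_mb

# How much shorter the lokit run was, as a percentage of the translate-toolkit run.
time_saved_pct = (ttk_secs - lokit_secs) / ttk_secs * 100

print(f"Memory ratio: {memory_ratio:.1f}x")
print(f"Time saved: {time_saved_pct:.1f}%")
```

The ratio rounds to 10.6x and the time difference to roughly 4%, which matches the "slightly faster" wording.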

This low memory footprint leaves room for processing events in parallel, making lokit suitable for large-scale localization workflows and backend systems.

SDK Usage Reference

lokit is built around a central BaseStructure dataclass model, which standardizes localization units and segments. This makes standardization and branching more natural in the host language than working directly with XML-based file formats. Parsing SDKs cover both extraction and export for localization interchange formats as well as common file types.

Installation

Install lokit via pip:

pip install lokit-python

Basic Parsing and Conversion

Converting files synchronously is straightforward using the modular importers and exporters APIs.

from lokit.importers import import_tmx
from lokit.exporters import export_xliff

# Parse a localization file into a BaseStructure
document = import_tmx("path/to/source.tmx")

print(f"Loaded {len(document.data)} units")
print(f"Source Locale: {document.source_locale}")
print(f"Target Locale: {document.target_locale}")

# Export the BaseStructure out to a new format
export_xliff(document, "path/to/target.xliff")

Asynchronous Streaming for Massive Files

For files spanning hundreds of megabytes, parsing the entire DOM structure into memory is inefficient. Lokit supports stream-parsing natively.

import asyncio
from lokit.importers import import_tmx_async
from lokit.exporters.xliff import export_xliff_async
from lokit.data.structure import BaseStructure

async def process_large_file():
    units = {}
    
    # Stream the file in an asynchronous generator
    async for unit_id, unit_data in import_tmx_async("massive_file.tmx"):
        units[unit_id] = unit_data

    # Reconstruct the document safely
    doc = BaseStructure(
        source_locale="en_US", 
        target_locale="de_DE", 
        data=units
    )
    
    # Export asynchronously
    await export_xliff_async(doc, "massive_output.xliff")

asyncio.run(process_large_file())
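
To illustrate the general idea of stream-parsing, here is a minimal sketch that uses only the standard library. It is not lokit's implementation: `stream_units`, `count_units`, and the tiny `SAMPLE_TMX` document are assumptions made for the example. It reads one `<tu>` element at a time with `xml.etree.ElementTree.iterparse` and exposes the units through an async generator. Note that `iterparse` itself is blocking, so the generator only hands control back to the event loop between units.

```python
import asyncio
import io
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, simplified TMX document used only for this sketch.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu tuid="1">
      <tuv xml:lang="en-US"><seg>Hello</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Hallo</seg></tuv>
    </tu>
    <tu tuid="2">
      <tuv xml:lang="en-US"><seg>Goodbye</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Auf Wiedersehen</seg></tuv>
    </tu>
  </body>
</tmx>
"""


async def stream_units(source):
    """Yield (tuid, {lang: text}) one <tu> at a time from a binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segments = {
            tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
            for tuv in elem.iter("tuv")
        }
        yield elem.get("tuid"), segments
        elem.clear()            # release the children of the finished unit
        await asyncio.sleep(0)  # hand control back to the event loop


async def count_units(source):
    """Consume the stream without keeping any unit after it is processed."""
    total = 0
    async for _unit_id, _segments in stream_units(source):
        total += 1
    return total


total = asyncio.run(count_units(io.BytesIO(SAMPLE_TMX)))
print(f"Streamed {total} units")
```

The memory benefit of streaming comes from the consumer: a loop that writes or counts each unit and then drops it keeps memory flat, whereas collecting every unit into a dictionary grows with the size of the file.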

Advanced Querying and Matching

The Lokit logic wrapper provides access to the powerful matching engine and data manipulation features.

from lokit.logic import Lokit

# Wrap a parsed document or path in the Lokit logic engine
engine = Lokit.parse("path/to/source.xliff")

# Query specific nested data structures
button_units = engine.where("extensions.component", "checkout_button")

# Perform fuzzy matching against translation memory
results = engine.fuzzy_find("Complete your purchase", limit=5, threshold=0.75)
for match in results:
    print(f"Match found: {match.unit_id} (Score: {match.score})")

# Perform strict In-Context Exact (ICE) matching
ice_match = engine.match(
    source="Submit",
    target_unit_id="submit_btn_1",
    previous_source="Enter your email",
    require_context=True
)

Supported Formats

  • TMX (Translation Memory eXchange)
  • XLIFF (XML Localization Interchange File Format)
  • PO/POT (Gettext Portable Object)
  • XLSX / CSV (Spreadsheets)
  • JSON (Key-Value nested localization trees)
  • HTML (Hypertext Markup)
  • IDML (InDesign Markup Language)

Download files

Source Distribution

  • lokit_python-0.1.1.tar.gz (50.6 kB): Source

Built Distributions

  • lokit_python-0.1.1-cp313-cp313-win_amd64.whl (788.7 kB): CPython 3.13, Windows x86-64
  • lokit_python-0.1.1-cp313-cp313-win32.whl (685.6 kB): CPython 3.13, Windows x86
  • lokit_python-0.1.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (876.3 kB): CPython 3.13, manylinux glibc 2.17+ x86-64, manylinux glibc 2.28+ x86-64
  • lokit_python-0.1.1-cp313-cp313-macosx_11_0_arm64.whl (693.0 kB): CPython 3.13, macOS 11.0+ ARM64
  • lokit_python-0.1.1-cp313-cp313-macosx_10_13_x86_64.whl (743.6 kB): CPython 3.13, macOS 10.13+ x86-64
  • lokit_python-0.1.1-cp312-cp312-win_amd64.whl (788.3 kB): CPython 3.12, Windows x86-64
  • lokit_python-0.1.1-cp312-cp312-win32.whl (685.8 kB): CPython 3.12, Windows x86
  • lokit_python-0.1.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (879.0 kB): CPython 3.12, manylinux glibc 2.17+ x86-64, manylinux glibc 2.28+ x86-64
  • lokit_python-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (694.6 kB): CPython 3.12, macOS 11.0+ ARM64
  • lokit_python-0.1.1-cp312-cp312-macosx_10_13_x86_64.whl (745.1 kB): CPython 3.12, macOS 10.13+ x86-64

File details

Details for the file lokit_python-0.1.1.tar.gz.

File metadata

  • Download URL: lokit_python-0.1.1.tar.gz
  • Upload date:
  • Size: 50.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lokit_python-0.1.1.tar.gz
Algorithm Hash digest
SHA256 743a84118ef6bf81351c925505454989a00d9d94e3bc836b5d5c927f4be2c568
MD5 c2a9428b9c45e8ebd9eb613b967d69ee
BLAKE2b-256 ca3880602086eb7d368bd93305f44d4732f591869165ad875cf0ba0795b597fd

Provenance

The following attestation bundles were made for lokit_python-0.1.1.tar.gz:

Publisher: publish.yml on ciarandarby/lokit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
