A type-safe localization toolkit for parsing, converting, and matching TMX, XLIFF, PO, JSON, HTML, CSV, XLSX, and IDML files.

Project description

lokit

[!WARNING] Beta Release: lokit is currently in Beta. The API is volatile and subject to rapid, breaking changes prior to the official V1 release.

lokit is a high-performance, strictly type-safe, and highly memory-efficient localization toolkit for Python. Supports Python 3.12+.

Unlike legacy tools that wrap in-memory XML DOM element trees, lokit moves away from XML-based localization interchange formats towards language-native parsing. It ingests localization formats (TMX, XLIFF, PO, XLSX, CSV, JSON, HTML, IDML) and compiles them into a strict, unified structural data model. This enables not just parsing but also data manipulation, semantic extraction, and translation memory features out of the box. lokit focuses on streaming and asynchronous processing rather than synchronous processing of files held entirely in memory.

This unified structure can be easily converted to JSON for interchange with other systems. I've made parsing and data transfer as native as possible by capturing all elements of the traditional interchange formats in one common structure. This improves compatibility, especially for segment matching and leveraging, because flattened strings are the standard representation. Tags are preserved in a common form, so the structure parsed from XLIFF is the same as the structure parsed from HTML.

These legacy file formats have supported vendor lock-in for many years, making it difficult for any client to move to another system. Because this is a major issue across the industry, something new is needed, where vendors cannot use hidden, legacy technology to lock in their clients. Localization deserves innovation.

Note: SDKs in other languages are coming soon.

Core Features

lokit provides a comprehensive suite of tools for managing localization data:

  • Native Structural Modeling: Converts disjointed interchange formats into a strict, unified Python dataclass, ensuring type safety across your entire localization pipeline.
  • Advanced Matching Engine: Provides Exact Matching, Fuzzy Matching (via SequenceMatcher), and In-Context Exact (ICE) Matching leveraging previous and next segment context, as well as inline tag signatures.
  • Deep Sub-segment Extraction: Automatically parses and isolates inline tags, properties, and formatting markers, allowing for safe manipulation of text without corrupting code.
  • Semantic Querying: Easily traverse and filter translation units using complex predicates, exact ID lookups, or deep nested JSON path querying (where()).
  • Plural Support: Native extraction and structuring of pluralized translation units.
  • Universal Format Conversion: Instantly import and export between any supported format (e.g., TMX to JSON, HTML to XLIFF) with zero data loss.
  • Synchronous and Asynchronous Streaming: Process massive enterprise files natively using Python async generators to keep memory overhead to an absolute minimum.
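
The fuzzy matching feature above is described as being based on SequenceMatcher. As a rough illustration of that technique only, the sketch below ranks translation-memory entries with the standard library's difflib.SequenceMatcher. The helper names (similarity, fuzzy_candidates) and the sample memory are hypothetical and are not part of lokit's API; lokit's own scoring may differ.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_candidates(
    query: str,
    memory: dict[str, str],
    threshold: float = 0.75,
    limit: int = 5,
) -> list[tuple[str, float]]:
    """Rank memory entries (unit_id -> source text) by similarity to `query`.

    Hypothetical helper for illustration; not lokit's implementation.
    """
    scored = [(unit_id, similarity(query, source)) for unit_id, source in memory.items()]
    # Drop weak candidates, then return the best ones first.
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


# Made-up translation memory used only for this sketch.
memory = {
    "checkout_1": "Complete your purchase",
    "checkout_2": "Complete your purchases",
    "login_1": "Sign in to your account",
}

matches = fuzzy_candidates("Complete your purchase", memory)
for unit_id, score in matches:
    print(f"{unit_id}: {score:.3f}")
```

The threshold and limit parameters mirror the ones shown later in the fuzzy_find usage example, but the defaults here are assumptions chosen for the sketch.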

Parsing Performance vs Translate-Toolkit

When dealing with enterprise-scale localization environments, parsing performance and memory efficiency are paramount. lokit is designed to be significantly leaner and faster than the industry standard.

In a stress-test benchmark on a 612 MB TMX file containing 557,058 segments, parsing to XLIFF and back into TMX over 3 consecutive iterations, lokit yielded the following comparative averages:

Library             Avg Duration (s)   Peak Memory (MB)   Memory Efficiency
lokit (async)       57.5               213.8              ~10.6x less RAM
translate-toolkit   60.0               2,275.7            Baseline (~2.3 GB)

Because translate-toolkit loads whole files into string buffers and C-level DOM trees synchronously, its memory spikes to over 2.2 Gigabytes. lokit leverages generator-based async streaming, allowing it to complete the exact same workload using 10.6x less RAM, while operating slightly faster overall.
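
The headline ratios can be re-derived from the reported averages. The short check below uses only the figures quoted above; the variable names are illustrative.

```python
# Averages reported in the benchmark above.
lokit_peak_mb = 213.8
ttk_peak_mb = 2275.7
lokit_secs = 57.5
ttk_secs = 60.0

# How many times more peak memory translate-toolkit used.
memory_ratio = ttk_peak_mb / lokit_peak_mb

# Relative reduction in average duration.
duration_reduction_pct = (ttk_secs - lokit_secs) / ttk_secs * 100

print(f"Memory ratio: {memory_ratio:.1f}x")                      # Memory ratio: 10.6x
print(f"Duration reduction: {duration_reduction_pct:.1f}%")      # Duration reduction: 4.2%
```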

This low memory footprint leaves headroom for processing several files in parallel, making lokit suitable for large-scale localization workflows and backend systems.

SDK Usage Reference

lokit operates around a central BaseStructure dataclass model, which standardizes localization units and segments. This encourages standardization and branching in a more language-native way than XML-based file formats. Parsing SDKs are provided for both extraction and export of localization interchange formats, as well as common file types.
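
To make the idea of a unified dataclass model concrete, here is a minimal sketch of what such a structure could look like and how it converts to JSON. The class names SketchUnit and SketchDocument and the unit fields (source, target, tags) are assumptions made for illustration; only source_locale, target_locale, and data appear in the usage examples in this document, and the real BaseStructure may be shaped differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SketchUnit:
    """Hypothetical translation unit: flattened text plus preserved inline tags."""
    source: str
    target: str = ""
    tags: list[dict[str, str]] = field(default_factory=list)


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a unified document model (not lokit's BaseStructure)."""
    source_locale: str
    target_locale: str
    data: dict[str, SketchUnit] = field(default_factory=dict)


doc = SketchDocument(
    source_locale="en_US",
    target_locale="de_DE",
    data={"submit_btn_1": SketchUnit(source="Submit", target="Absenden")},
)

# asdict() recurses into nested dataclasses, so the whole document
# becomes plain dicts and lists that json.dumps can serialize.
payload = json.dumps(asdict(doc), ensure_ascii=False, indent=2)
print(payload)
```

Because every format would be parsed into the same shape, downstream code can work with one type regardless of whether the input was TMX, XLIFF, or HTML.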

Installation

Install lokit via pip:

pip install lokit-python

Basic Parsing and Conversion

Converting files synchronously is straightforward using the modular importers and exporters APIs.

from lokit.importers import import_tmx
from lokit.exporters import export_xliff

# Parse a localization file into a BaseStructure
document = import_tmx("path/to/source.tmx")

print(f"Loaded {len(document.data)} units")
print(f"Source Locale: {document.source_locale}")
print(f"Target Locale: {document.target_locale}")

# Export the BaseStructure out to a new format
export_xliff(document, "path/to/target.xliff")

Asynchronous Streaming for Massive Files

For files spanning hundreds of megabytes, parsing the entire DOM structure into memory is inefficient. Lokit supports stream-parsing natively.

import asyncio
from lokit.importers import import_tmx_async
from lokit.exporters.xliff import export_xliff_async
from lokit.data.structure import BaseStructure

async def process_large_file():
    units = {}
    
    # Stream the file in an asynchronous generator
    async for unit_id, unit_data in import_tmx_async("massive_file.tmx"):
        units[unit_id] = unit_data

    # Reconstruct the document safely
    doc = BaseStructure(
        source_locale="en_US", 
        target_locale="de_DE", 
        data=units
    )
    
    # Export asynchronously
    await export_xliff_async(doc, "massive_output.xliff")

asyncio.run(process_large_file())
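
Note that the example above collects every unit into a dict before exporting, so the whole document still ends up in memory. When only an aggregate is needed, the stream can be consumed one unit at a time. The sketch below shows that pattern with a stand-in async generator; fake_unit_stream and the unit dictionary layout are hypothetical and only imitate the (unit_id, unit_data) pairs yielded in the example above.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_unit_stream(count: int) -> AsyncIterator[tuple[str, dict[str, str]]]:
    """Hypothetical stand-in for a streaming importer; yields (unit_id, unit_data)."""
    for i in range(count):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield f"unit_{i}", {"source": f"Segment {i}", "target": ""}


async def count_untranslated(stream: AsyncIterator[tuple[str, dict[str, str]]]) -> int:
    """Consume the stream unit by unit, keeping only a running total in memory."""
    missing = 0
    async for _unit_id, unit in stream:
        if not unit["target"]:
            missing += 1
    return missing


total = asyncio.run(count_untranslated(fake_unit_stream(1000)))
print(f"Untranslated units: {total}")
```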

Advanced Querying and Matching

The Lokit logic wrapper provides access to the powerful matching engine and data manipulation features.

from lokit.logic import Lokit

# Wrap a parsed document or path in the Lokit logic engine
engine = Lokit.parse("path/to/source.xliff")

# Query specific nested data structures
button_units = engine.where("extensions.component", "checkout_button")

# Perform fuzzy matching against translation memory
results = engine.fuzzy_find("Complete your purchase", limit=5, threshold=0.75)
for match in results:
    print(f"Match found: {match.unit_id} (Score: {match.score})")

# Perform strict In-Context Exact (ICE) matching
ice_match = engine.match(
    source="Submit",
    target_unit_id="submit_btn_1",
    previous_source="Enter your email",
    require_context=True
)

Supported Formats

  • TMX (Translation Memory eXchange)
  • XLIFF (XML Localization Interchange File Format)
  • PO/POT (Gettext Portable Object)
  • XLSX / CSV (Spreadsheets)
  • JSON (Key-Value nested localization trees)
  • HTML (Hypertext Markup)
  • IDML (InDesign Markup Language)

Download files

Source Distribution

  • lokit_python-0.1.3.tar.gz (56.0 kB): Source

Built Distributions

  • lokit_python-0.1.3-cp313-cp313-win_amd64.whl (871.6 kB): CPython 3.13, Windows x86-64
  • lokit_python-0.1.3-cp313-cp313-win32.whl (754.0 kB): CPython 3.13, Windows x86
  • lokit_python-0.1.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (970.0 kB): CPython 3.13, manylinux glibc 2.17+ x86-64, manylinux glibc 2.28+ x86-64
  • lokit_python-0.1.3-cp313-cp313-macosx_11_0_arm64.whl (773.7 kB): CPython 3.13, macOS 11.0+ ARM64
  • lokit_python-0.1.3-cp313-cp313-macosx_10_13_x86_64.whl (831.1 kB): CPython 3.13, macOS 10.13+ x86-64
  • lokit_python-0.1.3-cp312-cp312-win_amd64.whl (871.2 kB): CPython 3.12, Windows x86-64
  • lokit_python-0.1.3-cp312-cp312-win32.whl (754.2 kB): CPython 3.12, Windows x86
  • lokit_python-0.1.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (973.6 kB): CPython 3.12, manylinux glibc 2.17+ x86-64, manylinux glibc 2.28+ x86-64
  • lokit_python-0.1.3-cp312-cp312-macosx_11_0_arm64.whl (775.9 kB): CPython 3.12, macOS 11.0+ ARM64
  • lokit_python-0.1.3-cp312-cp312-macosx_10_13_x86_64.whl (832.6 kB): CPython 3.12, macOS 10.13+ x86-64

File details

Details for the file lokit_python-0.1.3.tar.gz.

File metadata

  • Download URL: lokit_python-0.1.3.tar.gz
  • Upload date:
  • Size: 56.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lokit_python-0.1.3.tar.gz
Algorithm Hash digest
SHA256 b51e38c668545891879e172760c60eb2b3a01b92a1fae18904bd32495703d878
MD5 c1b1688c6dd406b92aef1be225aa428d
BLAKE2b-256 8aabc68d60b50052782ee4633dfbc78a45a270840d3a0df8fc15f89ba60a6d0c

See more details on using hashes here.

Provenance

The following attestation bundles were made for lokit_python-0.1.3.tar.gz:

Publisher: publish.yml on ciarandarby/lokit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lokit_python-0.1.3-cp313-cp313-win_amd64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 d5e7182ca9e0bd583b99781cd87042b36912ef85ea768dfe6aaf9a9b23ff22e7
MD5 4de49f70888de07e4fa0ab7e9fab918e
BLAKE2b-256 05324b3b3a50199c0635de7b65c108deb03478639e2978adcc3df4b4d57a8c01

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp313-cp313-win_amd64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp313-cp313-win32.whl.

File metadata

  • Download URL: lokit_python-0.1.3-cp313-cp313-win32.whl
  • Upload date:
  • Size: 754.0 kB
  • Tags: CPython 3.13, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lokit_python-0.1.3-cp313-cp313-win32.whl
Algorithm Hash digest
SHA256 f473acf0d9af0a43a12ff8ee68cc80aeb91aff7a4e2706de9ab87606c3023d31
MD5 d8c9ab92cbb9bf42bdb9be960cb9521c
BLAKE2b-256 291b9ad74d0ce38c28fe219e6a3b9cdfa9a017b03c8d0f08890084533fb22d08

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp313-cp313-win32.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 46901b7e9f5f222963e230d32160eb645a42b7bde45fc40d8ca48dcf60d168c3
MD5 91e05511733f8e07ebb68ff0b4f5151a
BLAKE2b-256 99ab28ec27fd2d5786594734cb7e03efe99dd68a65233164f174cba9badff07b

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8eb82173acd03eeef2c7bd766b964abf85825ec8ecf69bed6197e8f05ceb3516
MD5 f0fc202f63b418e09a8489d56e2f1553
BLAKE2b-256 470f3546350f9bc83b1c71636698b154d36348021420a9309a8138f57ebe617b

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp313-cp313-macosx_10_13_x86_64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp313-cp313-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 630f19dfbbc0f8edb97fa8aef71b438aae2512cd5cf1841d083f55ffc20251a3
MD5 4ae3d95751a4ef7642d3881bfba18acd
BLAKE2b-256 2be8e069a8a61214f1ab748ceab889f591355c5a0f92061c3b5fd1a053f3f44a

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp313-cp313-macosx_10_13_x86_64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp312-cp312-win_amd64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 b6580851cd142489afaf0b77a1924c5341d2800f386f9d4fc2322dea0a79ce1d
MD5 95f1c577add0010b7d2ff2f5802f1a2c
BLAKE2b-256 d78cd30460396d096523f7fb99190fb447f5e9a2a34a84f3cd9043ea977acfc3

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp312-cp312-win32.whl.

File metadata

  • Download URL: lokit_python-0.1.3-cp312-cp312-win32.whl
  • Upload date:
  • Size: 754.2 kB
  • Tags: CPython 3.12, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lokit_python-0.1.3-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 cc69e6fdac84cda18e6d980027c55c08116fea4a9db3dd14a92304818213b487
MD5 4d98bb96506897ba91fa8e782a34d1bc
BLAKE2b-256 6de928c5d048d313b5de64f1820215d3486836802b28d6c9d2c2dd29630a11cc

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp312-cp312-win32.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 fe43f01b96fa762d7de01d22a91635d6a49ddf2125bc37af729bc844a3900756
MD5 70f1bf6f83d5f9b23e066d2179e1f60f
BLAKE2b-256 8568917bd0424d249070fdfa0850f4c93929e372af689cd62205e95e694e56f5

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f968ac6af23f7c5168a53e957b5fb1b16e1522f45514b89fd4c256293842db8d
MD5 c042c6cd7b322412fc53fa773460cfa1
BLAKE2b-256 7db093144c2a52809e3fbab88669c4f7614794049aef6f0643cadab5db16ce3d

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on ciarandarby/lokit

File details

Details for the file lokit_python-0.1.3-cp312-cp312-macosx_10_13_x86_64.whl.

File hashes

Hashes for lokit_python-0.1.3-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 2d0ce45350f86f467cb50b245221c2a35a308b6a8c4ae38b930a6a44c42660d5
MD5 6a38ecd2b41dd63dd7dcda15d4e10800
BLAKE2b-256 56a2c61c480c0648336e633740b6cc84fc89f6b32fe1e16bc23db3eea0119e31

Provenance

The following attestation bundles were made for lokit_python-0.1.3-cp312-cp312-macosx_10_13_x86_64.whl:

Publisher: publish.yml on ciarandarby/lokit
