lokit

Warning (beta release): lokit is currently in beta. The API is volatile and subject to rapid, breaking changes before the official v1 release.

lokit is a high-performance, strictly type-safe, and highly memory-efficient localization toolkit for Python. Supports Python 3.12+.

Unlike legacy tools that wrap XML DOM element trees in memory, lokit moves away from XML-based localization interchange formats towards language-native parsing. It ingests localization formats (TMX, XLIFF, PO, XLSX, CSV, JSON, HTML, IDML) and compiles them into a strict, unified structural data model. This enables not just parsing but also data manipulation, semantic extraction, and translation memory features out of the box. lokit favors streaming and asynchronous processing over synchronous parsing of files held entirely in memory.

The unified model converts easily to JSON for interchange with other systems. I've made parsing and data transfer as native as possible by capturing all elements of the traditional interchange formats in one common structure. Because it uses flattened strings as standard, this improves compatibility, especially for segment matching and leveraging. Tags are preserved in a common form, so the structure parsed from XLIFF is the same as the structure parsed from HTML.
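
As a rough illustration of the idea, the sketch below stores a segment as flattened text plus format-neutral tag records and serializes it to JSON. The `InlineTag` and `Segment` classes and their fields are assumptions made for this example; they are not lokit's actual data model.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class InlineTag:
    """Hypothetical inline tag record; not lokit's actual class."""
    kind: str   # format-neutral tag type, e.g. "bold"
    start: int  # offset into the flattened text where the tag opens
    end: int    # offset where the tag closes


@dataclass
class Segment:
    """Hypothetical unit: flattened text plus format-neutral tags."""
    unit_id: str
    source: str
    tags: list[InlineTag] = field(default_factory=list)


# The same sentence as it might look after import from XLIFF and from HTML:
# the markup differs in the files, but the stored structure is identical.
from_xliff = Segment("u1", "Click Save to continue", [InlineTag("bold", 6, 10)])
from_html = Segment("u1", "Click Save to continue", [InlineTag("bold", 6, 10)])

same_structure = asdict(from_xliff) == asdict(from_html)
as_json = json.dumps(asdict(from_xliff), ensure_ascii=False)
print(same_structure)
print(as_json)
```

Keeping the text flat and the tags as offsets means matching can compare plain strings, while the tag list still carries enough information to restore the markup on export.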

These legacy file formats have supported vendor lock-in for many years, making it difficult for clients to move to another system. Since this is a major issue across the industry, something new is needed so that vendors cannot use hidden, legacy technology to lock in their clients. Localization deserves innovation.

Note: SDKs in other languages are coming soon.

Core Features

lokit provides a comprehensive suite of tools for managing localization data:

  • Native Structural Modeling: Converts disjointed interchange formats into a strict, unified Python dataclass, ensuring complete type safety across your entire localization pipeline.
  • Advanced Matching Engine: Provides Exact Matching, Fuzzy Matching (via SequenceMatcher), and In-Context Exact (ICE) Matching leveraging previous and next segment context, as well as inline tag signatures.
  • Deep Sub-segment Extraction: Automatically parses and isolates inline tags, properties, and formatting markers, allowing for safe manipulation of text without corrupting code.
  • Semantic Querying: Easily traverse and filter translation units using complex predicates, exact ID lookups, or deep nested JSON path querying (where()).
  • Plural Support: Native extraction and structuring of pluralized translation units.
  • Universal Format Conversion: Instantly import and export between any supported format (e.g., TMX to JSON, HTML to XLIFF) with zero data loss.
  • Synchronous and Asynchronous Streaming: Process massive enterprise files natively using Python async generators to keep memory overhead to an absolute minimum.
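
The fuzzy matching mentioned in the list above is described as being based on SequenceMatcher. A minimal sketch of that technique with the standard library's `difflib` might look like the following. The `fuzzy_find` helper, its parameters, and the dict-based memory are assumptions for this example and do not reproduce lokit's engine.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, memory, limit=5, threshold=0.75):
    """Return (unit_id, score) pairs at or above threshold, best first.

    `memory` maps unit IDs to source strings. Hypothetical helper,
    not lokit's implementation.
    """
    scored = []
    for unit_id, source in memory.items():
        # ratio() is 2 * matched_chars / total_chars, between 0.0 and 1.0.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((unit_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


memory = {
    "u1": "Complete your purchase",
    "u2": "Complete your purchase now",
    "u3": "Cancel subscription",
}
matches = fuzzy_find("Complete your purchase", memory)
for unit_id, score in matches:
    print(f"{unit_id}: {score:.2f}")
```

Comparing every entry is linear in the size of the memory, so a real engine would likely narrow the candidates first, for example by length or by shared words.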

Parsing Performance vs Translate-Toolkit

When dealing with enterprise-scale localization environments, parsing performance and memory efficiency are paramount. lokit is designed to be significantly leaner and faster than the industry standard.

In a stress-test benchmark on a 612 MB TMX file containing 557,058 segments, parsing to XLIFF and back into TMX over 3 consecutive iterations, lokit yielded the following comparative averages:

Library             Avg duration (s)   Peak memory (MB)   Relative peak memory
lokit (async)       57.5               213.8              ~10.6x less RAM
translate-toolkit   60.0               2,275.7            baseline (~2.3 GB)

Because translate-toolkit synchronously loads whole files into string buffers and C-level DOM trees, its peak memory exceeds 2.2 GB. lokit uses generator-based async streaming, which lets it complete the same workload with about 10.6x less RAM while running slightly faster (57.5 s vs. 60.0 s on average).

This small memory footprint allows many events to be processed in parallel, which makes lokit suitable for large-scale localization workflows and backend systems.
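
To show why generator-based parsing keeps memory low, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree.iterparse`. It is not lokit's parser: the `iter_tmx_units` function and the tiny inline TMX sample are assumptions for this example.

```python
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tmx_units(stream):
    """Yield (index, {lang: text}) for each <tu>, one unit at a time."""
    index = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "tu":
            continue
        variants = {}
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            lang = tuv.get(XML_LANG) or tuv.get("lang", "")
            variants[lang] = "".join(seg.itertext()) if seg is not None else ""
        yield index, variants
        index += 1
        # Drop the finished unit's children and text. The empty <tu> shell
        # stays attached to its parent; a production parser would detach it too.
        elem.clear()


sample = io.BytesIO(
    b'<tmx version="1.4"><header/><body>'
    b'<tu><tuv xml:lang="en-US"><seg>Save</seg></tuv>'
    b'<tuv xml:lang="de-DE"><seg>Speichern</seg></tuv></tu>'
    b'<tu><tuv xml:lang="en-US"><seg>Cancel</seg></tuv>'
    b'<tuv xml:lang="de-DE"><seg>Abbrechen</seg></tuv></tu>'
    b'</body></tmx>'
)
units = list(iter_tmx_units(sample))
print(units)
```

The caller only ever holds the unit it is currently working on, so peak memory depends on the largest unit rather than on the size of the file.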

SDK Usage Reference

lokit is built around a central BaseStructure dataclass model, which standardizes localization units and segments. This allows standardization and branching in a more language-native way than XML-based file formats. Import and export SDKs are provided for localization interchange formats as well as common file types.

Installation

Install lokit via pip:

pip install lokit-python

Basic Parsing and Conversion

Converting files synchronously is straightforward using the modular importers and exporters APIs.

from lokit.importers import import_tmx
from lokit.exporters import export_xliff

# Parse a localization file into a BaseStructure
document = import_tmx("path/to/source.tmx")

print(f"Loaded {len(document.data)} units")
print(f"Source Locale: {document.source_locale}")
print(f"Target Locale: {document.target_locale}")

# Export the BaseStructure out to a new format
export_xliff(document, "path/to/target.xliff")

Asynchronous Streaming for Massive Files

For files spanning hundreds of megabytes, loading the entire DOM structure into memory is inefficient. lokit supports stream parsing natively.

import asyncio
from lokit.importers import import_tmx_async
from lokit.exporters.xliff import export_xliff_async
from lokit.data.structure import BaseStructure

async def process_large_file():
    units = {}
    
    # Stream the file in an asynchronous generator
    async for unit_id, unit_data in import_tmx_async("massive_file.tmx"):
        units[unit_id] = unit_data

    # Reconstruct the document safely
    doc = BaseStructure(
        source_locale="en_US", 
        target_locale="de_DE", 
        data=units
    )
    
    # Export asynchronously
    await export_xliff_async(doc, "massive_output.xliff")

asyncio.run(process_large_file())
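
The example above collects every unit into a dict before exporting. When units can be handled independently, a pipeline can instead consume the stream in bounded batches so that only one batch is held at a time. The sketch below illustrates that pattern with plain `asyncio`. The `fake_unit_stream` generator is a stand-in for an async importer and `batched` is a hypothetical helper; neither is part of lokit.

```python
import asyncio


async def fake_unit_stream(count):
    """Stand-in for an async importer: yields (unit_id, unit_data) pairs."""
    for i in range(count):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield f"unit-{i}", {"source": f"text {i}"}


async def batched(stream, size):
    """Group an async stream into lists of at most `size` items."""
    batch = []
    async for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


async def main():
    sizes = []
    async for batch in batched(fake_unit_stream(10), size=4):
        # A real pipeline would write or upload the batch here, then drop it.
        sizes.append(len(batch))
    return sizes


batch_sizes = asyncio.run(main())
print(batch_sizes)
```

With this shape, memory use is bounded by the batch size rather than by the number of units in the file.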

Advanced Querying and Matching

The Lokit logic wrapper provides access to the powerful matching engine and data manipulation features.

from lokit.logic import Lokit

# Wrap a parsed document or path in the Lokit logic engine
engine = Lokit.parse("path/to/source.xliff")

# Query specific nested data structures
button_units = engine.where("extensions.component", "checkout_button")

# Perform fuzzy matching against translation memory
results = engine.fuzzy_find("Complete your purchase", limit=5, threshold=0.75)
for match in results:
    print(f"Match found: {match.unit_id} (Score: {match.score})")

# Perform strict In-Context Exact (ICE) matching
ice_match = engine.match(
    source="Submit",
    target_unit_id="submit_btn_1",
    previous_source="Enter your email",
    require_context=True
)

Supported Formats

  • TMX (Translation Memory eXchange)
  • XLIFF (XML Localization Interchange File Format)
  • PO/POT (Gettext Portable Object)
  • XLSX / CSV (Spreadsheets)
  • JSON (Key-Value nested localization trees)
  • HTML (Hypertext Markup)
  • IDML (InDesign Markup Language)
