Lattice Path Edit Distance and romanization-aware string comparison

mòine

mòine is a Python and Rust library for romanization-aware string comparison.

It implements Lattice Path Edit Distance (Kaji, 2023), a distance metric that compares strings through possible reading paths rather than only through visible surface characters.

This is useful when romanized input and written Japanese or Chinese look far apart as strings, but stay close in reading space.

>>> import moine

>>> moine.distance("moine", "モイニャ", lang="ja")
2

>>> moine.distance("もいにゃ", "モイニャ", lang="ja")
0

>>> moine.distance("weishiji", "威士忌", lang="zh")
0

>>> moine.distance("布納哈奔", "布納哈本", lang="zh")
0
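As a rough illustration of comparing in reading space, the sketch below enumerates candidate romanizations for each input and takes the smallest plain Levenshtein distance over all pairs. It is a brute-force toy, not the lattice algorithm from the paper and not mòine's implementation. `TOY_READINGS`, `toy_romanizations`, and `toy_distance` are hypothetical names, and the reading table is hand-written for this example rather than derived from any dictionary artifact.

```python
# Hypothetical, hand-written reading table for illustration only.
# A real system would derive readings from dictionary data.
TOY_READINGS = {
    "も": ["mo"], "モ": ["mo"],
    "い": ["i"], "イ": ["i"],
    "にゃ": ["nya"], "ニャ": ["nya"],
    # Two common romanizations of the same kana (Hepburn and Kunrei-shiki).
    "し": ["shi", "si"], "シ": ["shi", "si"],
}


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def toy_romanizations(text: str) -> set[str]:
    """Enumerate every romanized form the toy table allows for `text`."""
    if not text:
        return {""}
    results: set[str] = set()
    matched = False
    for surface, readings in TOY_READINGS.items():
        if text.startswith(surface):
            matched = True
            for tail in toy_romanizations(text[len(surface):]):
                for reading in readings:
                    results.add(reading + tail)
    if not matched:
        # Characters outside the table (for example ASCII letters) pass through.
        for tail in toy_romanizations(text[1:]):
            results.add(text[0] + tail)
    return results


def toy_distance(left: str, right: str) -> int:
    """Smallest Levenshtein distance over all pairs of candidate readings."""
    return min(
        levenshtein(x, y)
        for x in toy_romanizations(left)
        for y in toy_romanizations(right)
    )


print(toy_distance("もいにゃ", "モイニャ"))  # 0
print(toy_distance("moine", "モイニャ"))  # 2
print(toy_distance("si", "し"))  # 0
```

Enumerating every reading pair grows quickly with input length, which is the cost a lattice-based formulation is meant to avoid.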

The project is inspired by Nobuhiro Kaji's EMNLP 2023 Industry Track paper, "Lattice Path Edit Distance: A Romanization-aware Edit Distance for Extracting Misspelling-Correction Pairs from Japanese Search Query Logs".

Name

The project name comes from Moine, a peated malt from Bunnahabhain and one of the developer's favorite Scotch whiskies. In Japanese, the name has several plausible katakana renderings, such as モイニャ, モーイン, and モアンヌ, which makes it a fitting name for a project about readings, spelling variation, and ambiguity in input sequences.

Features

  • Japanese comparison with UniDic-CWJ-derived reading artifacts.
  • Chinese comparison with CC-CEDICT-derived no-tone pinyin artifacts.
  • Plain string Levenshtein-compatible distance helpers.
  • Lattice-aware Damerau-Levenshtein distance for adjacent transpositions.
  • Normalized similarity / ratio helpers in 0.0..=1.0.
  • RapidFuzz-inspired APIs such as cdist and partial matching helpers.
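For readers unfamiliar with the adjacent-transposition and normalized-similarity ideas in the list above, here is a minimal plain-string sketch. It implements the restricted Damerau-Levenshtein variant (optimal string alignment) and normalizes by the longer input length. Both choices are assumptions made for illustration: the source does not say which variant or normalization mòine uses, and `osa_distance` and `normalized_similarity` here are local helper names, not the library's functions.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the distance into 0.0..=1.0, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


print(osa_distance("moine", "monie"))  # 1 (one adjacent swap)
print(osa_distance("moine", "moinya"))  # 2
```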

When To Use

mòine is best used after another system has produced candidates: lexical retrieval, n-gram search, BM25, embeddings, a product catalog, or an entity list. Use mòine to rescore those candidates in reading space.

| Good fit | Poor fit |
| --- | --- |
| Romanized, kana, kanji, or pinyin input mixed together | Same-script typo matching only |
| Query correction, search suggest, and candidate reranking | Replacing a full search engine |
| Japanese and Mandarin pinyin Chinese entity matching | Cantonese/Jyutping or arbitrary languages |
| Pipelines that can download dictionary artifacts explicitly | Install-only workflows with no data step |
| Hundreds or thousands of candidates after retrieval | Brute-force scoring over a whole corpus |

Installation

Install the Python package:

pip install moine
uv pip install moine

Install the Rust command-line tool:

cargo install moine

The packages do not bundle dictionary data. Download the language artifacts you need explicitly, either through the Python module or with the moine command-line tool:

uv run python -m moine download ja
uv run python -m moine download zh

moine download ja
moine download zh

Quick Start

Use the top-level Python API when you want mòine to load the default dictionary for a language:

import moine

print(moine.distance("もいにゃ", "モイニャ", lang="ja"))  # 0
print(moine.ratio("ピィート", "ピート", lang="ja"))  # 0.7142857142857143
print(moine.partial_ratio("ウイスキー", "ういすきーをのんでいます", lang="ja"))  # 1.0
print(moine.distance("weishiji", "威士忌", lang="zh"))  # 0

Load a dictionary explicitly when you want to control startup cost or artifact location:

import moine

dictionary = moine.load_dict(lang="ja")
moine.set_default_dictionary(dictionary)

print(moine.distance("もいにゃ", "モイニャ", lang="ja"))  # 0

Use cdist for query-by-choice matrices:

import moine

scores = moine.cdist(
    ["もいにゃ", "ぴーと", "ピィート"],
    ["モイニャ", "ピート", "ピーと", "ピィート"],
    lang="ja",
    metric="damerau_distance",
    score_cutoff=1,
)
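The query-by-choice layout can be pictured with a small stand-in. The sketch below is not mòine's cdist: `toy_cdist`, `best_choice_per_query`, and the `length_gap` scorer are hypothetical helpers, and the return type of the real function (list or array) is not stated in the source. It only shows that row i holds the scores for query i against every choice, in choice order.

```python
from typing import Callable, Sequence


def toy_cdist(
    queries: Sequence[str],
    choices: Sequence[str],
    scorer: Callable[[str, str], int],
) -> list[list[int]]:
    """Return a len(queries) x len(choices) matrix of scorer(query, choice)."""
    return [[scorer(query, choice) for choice in choices] for query in queries]


def best_choice_per_query(matrix: list[list[int]], choices: Sequence[str]) -> list[str]:
    """Pick the lowest-scoring choice in each row (ties go to the earliest choice)."""
    return [choices[min(range(len(row)), key=row.__getitem__)] for row in matrix]


def length_gap(a: str, b: str) -> int:
    """Deliberately trivial stand-in scorer: difference in string length."""
    return abs(len(a) - len(b))


queries = ["ab", "abcd"]
choices = ["a", "abc", "abcde"]

matrix = toy_cdist(queries, choices, length_gap)
print(matrix)  # [[1, 1, 3], [3, 1, 1]]
print(best_choice_per_query(matrix, choices))  # ['a', 'abc']
```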

For search or entity matching, generate candidates with your existing system and use mòine as a reading-aware reranker:

import moine

query = "moine"
candidates = ["モイニャ", "モーイン", "モアンヌ", "ストイーシャ"]

scores = moine.cdist(
    [query],
    candidates,
    lang="ja",
    metric="distance",
)[0]

ranked = sorted(zip(candidates, scores), key=lambda item: item[1])
print(ranked)
# [('モイニャ', 2), ('モーイン', 2), ('モアンヌ', 3), ('ストイーシャ', 7)]

Score interpretation is intentionally simple:

  • distance=0 means the best reading paths are identical.
  • Distance metrics are smaller-is-better.
  • ratio and normalized_similarity are in 0.0..=1.0 and larger-is-better.
  • score_cutoff filters in the RapidFuzz style.
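For readers who have not used RapidFuzz, its cutoff convention is roughly this: a distance above the cutoff is reported as score_cutoff + 1, and a similarity below the cutoff is reported as 0. The sketch below restates that convention with two hypothetical helpers, `apply_distance_cutoff` and `apply_similarity_cutoff`. It assumes mòine follows the same rule, which the text above suggests but does not spell out.

```python
from typing import Optional


def apply_distance_cutoff(distance: int, score_cutoff: Optional[int]) -> int:
    """Distances above the cutoff collapse to score_cutoff + 1."""
    if score_cutoff is not None and distance > score_cutoff:
        return score_cutoff + 1
    return distance


def apply_similarity_cutoff(similarity: float, score_cutoff: Optional[float]) -> float:
    """Similarities below the cutoff collapse to 0.0."""
    if score_cutoff is not None and similarity < score_cutoff:
        return 0.0
    return similarity


print(apply_distance_cutoff(1, 1))  # 1 (within the cutoff, kept as-is)
print(apply_distance_cutoff(7, 1))  # 2 (too far, reported as cutoff + 1)
print(apply_similarity_cutoff(0.5, 0.8))  # 0.0
```

Under this convention a filtered distance is only a "too far" marker, so it should not be compared against other filtered values when ranking.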

Command Line

Most users only need the public runtime commands:

moine download ja
moine download zh
moine list
moine where
moine compare --left "もいにゃ" --right "モイニャ" \
  --artifact-metadata /path/to/moine-unidic-cwj-202512/metadata.yaml
moine chinese-compare --left weishiji --right 威士忌 \
  --artifact-metadata /path/to/moine-cedict-20260520/metadata.yaml

The artifact bundle, verification, archive, and diagnostic commands are maintainer-facing tools for producing and checking release assets. They are documented in docs/development.md and docs/release_process.md.

Documentation

Developer and maintainer notes live under docs/, starting with docs/development.md and docs/release_process.md. See CONTRIBUTING.md before opening pull requests.

How It Differs From RapidFuzz

RapidFuzz is the better fit when both inputs should be compared directly as surface strings and you need a broad set of highly optimized fuzzy-matching scorers. mòine focuses on a narrower problem: comparing strings through possible reading paths before edit distance is computed.

Limitations

  • mòine does not reproduce the original paper's private search-query-log evaluation.
  • Dictionary-backed comparison requires separately distributed dictionary artifacts.
  • UniDic matching intentionally does not use MeCab/Viterbi costs.
  • Chinese support is Mandarin pinyin only; it does not model Cantonese/Jyutping or non-Mandarin readings.
  • processor, score_hint, NumPy dtype options, and worker parallelism are not part of the initial cdist API.

Reference

Caution: This project is not the official implementation by the paper author.

@inproceedings{kaji-2023-lattice,
    title = "Lattice Path Edit Distance: A {R}omanization-aware Edit Distance for Extracting Misspelling-Correction Pairs from {J}apanese Search Query Logs",
    author = "Kaji, Nobuhiro",
    editor = "Wang, Mingxuan  and
      Zitouni, Imed",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-industry.24/",
    doi = "10.18653/v1/2023.emnlp-industry.24",
    pages = "233--242",
    abstract = "Edit distance has been successfully used to extract training data, i.e., misspelling-correction pairs, of spelling correction models from search query logs in languages including English. However, the success does not readily apply to Japanese, where misspellings are often dissimilar to correct spellings due to the romanization-based input methods. To address this problem, we introduce lattice path edit distance, which utilizes romanization lattices to efficiently consider all possible romanized forms of input strings. Empirical experiments using Japanese search query logs demonstrated that the lattice path edit distance outperformed baseline methods including the standard edit distance combined with an existing transliterator and morphological analyzer. A training data collection pipeline that uses the lattice path edit distance has been deployed in production at our search engine for over a year."
}

License

mòine source code is licensed under either MIT or Apache-2.0. See LICENSE-MIT and LICENSE-APACHE.

Dictionary data is licensed separately. UniDic-derived and CC-CEDICT-derived artifacts carry their own license and attribution metadata. Keep that dictionary license information separate from the mòine source-code license. See THIRD_PARTY_NOTICES.md.
