
Build and query a local Japanese dictionary SQLite database.


Kotodex Package Guide

Kotodex builds a local SQLite database from Japanese dictionary sources and exposes a compact Python API for vocabulary, kanji, example sentence, and combined dictionary lookups.

Use it when you want Japanese dictionary data available locally without parsing XML or CSV source files at runtime.


Install

Default install, using the Sudachi core dictionary:

pip install kotodex

Optional Sudachi dictionary sizes:

pip install "kotodex[small]"
pip install "kotodex[core]"
pip install "kotodex[full]"

Development install from a checkout:

pip install -e ".[dev]"

Build A Database

Kotodex queries a local SQLite database. Build one before using the API.

kotodex update
kotodex rebuild --db /home/user/jisho.db
kotodex status --db /home/user/jisho.db

Default paths:

  • Source cache: ~/.cache/kotodex/sources
  • Database: ~/.local/share/kotodex/jisho.db

Use --db whenever you want to control the database location explicitly.
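You can resolve the default locations from Python and check that the database exists before opening it. The sketch below assumes the paths are exactly the ones listed above. It does not check whether Kotodex honours environment variables such as XDG_DATA_HOME, and the helper names are hypothetical, not part of the kotodex API.

```python
from pathlib import Path
from typing import Optional

# Hypothetical helpers mirroring the default locations listed above;
# they are not part of the kotodex API and ignore any XDG_* overrides.
def default_source_cache(home: Optional[Path] = None) -> Path:
    return (home or Path.home()) / ".cache" / "kotodex" / "sources"


def default_db_path(home: Optional[Path] = None) -> Path:
    return (home or Path.home()) / ".local" / "share" / "kotodex" / "jisho.db"


def require_db(db: Optional[Path] = None) -> Path:
    """Return the database path, failing early if it has not been built yet."""
    db = db or default_db_path()
    if not db.is_file():
        raise FileNotFoundError(
            f"{db} not found; run `kotodex update` then `kotodex rebuild --db {db}`"
        )
    return db


# Usage with kotodex installed: Jisho(str(require_db()))
```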

Python Quick Start

from kotodex import Jisho

with Jisho("/home/user/jisho.db") as j:
    print(j.imi("食べる"))
    print(j.kanji("食"))
    print(j.examples("食べる"))
    print(j.query("食べる"))

Jisho can also be managed manually:

from kotodex import Jisho

j = Jisho("/home/user/jisho.db")
try:
    result = j.imi("猫")
    print(result.meaning)
finally:
    j.close()

Vocabulary Lookups

Use imi() for meaning-focused vocabulary lookup. lookup() is an alias for imi().

with Jisho("/home/user/jisho.db") as j:
    result = j.imi("待つ")

print(result.found)
print(result.word)
print(result.reading)
print(result.meaning)
print(result.jlpt)
print(result.pos)

Return more ranked matches with zenbu=True:

with Jisho("/home/user/jisho.db") as j:
    result = j.imi("食べ%る", zenbu=True)

for entry in result.entries:
    print(entry.word, entry.reading, entry.meaning)

Use SQL wildcards in vocabulary lookups:

j.imi("猫%", zenbu=True)
j.imi("%する", zenbu=True)
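Kotodex calls these patterns SQL wildcards. The sketch below assumes they are passed through to SQLite's LIKE operator, which this page does not confirm. Under LIKE, % matches any run of characters (including none) and _ matches exactly one character. Whether Kotodex honours _ is not documented. The sketch reproduces LIKE matching against a throwaway in-memory table of a few illustrative words. It does not use the real Kotodex schema.

```python
import sqlite3
from typing import List

# Illustrative sample words only; this table is not the Kotodex schema.
SAMPLE_WORDS = ["食べる", "食べられる", "食べ物", "猫", "猫舌", "勉強する", "する"]


def like_matches(pattern: str) -> List[str]:
    """Return the sample words that a SQLite LIKE pattern matches, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in SAMPLE_WORDS])
        rows = conn.execute("SELECT word FROM words WHERE word LIKE ?", (pattern,))
        return sorted(row[0] for row in rows)
    finally:
        conn.close()


print(like_matches("猫%"))     # ['猫', '猫舌']
print(like_matches("%する"))   # ['する', '勉強する']
print(like_matches("食べ%る"))  # ['食べられる', '食べる']
print(like_matches("猫_"))     # ['猫舌']  (_ matches exactly one character)
```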

Use romaji input:

result = j.imi("taberu", romaji=True)
print(result.word)
print(result.reading)
print(result.reading_romaji)

Include linked example sentences on vocabulary entries:

result = j.imi("食べる", examples=2)

for sentence in result.examples:
    print(sentence.japanese)
    print(sentence.english)

Derived verb forms are recovered when direct lookup fails:

result = j.imi("待てる")

print(result.word)             # 待つ
print(result.reading)          # まつ
print(result.surface_reading)  # まてる
print(result.origin)           # 待つ
print(result.derivation)       # potential

Direct matches always win. Derivation metadata is only present when Kotodex had to recover a dictionary base form.
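Display code can branch on this metadata. The sketch below relies only on the found, word, reading, origin, derivation and surface_reading attributes of an imi() result shown above. It assumes derivation is None or empty when no base form had to be recovered. describe_match is a hypothetical helper, and SimpleNamespace stands in for a real result object.

```python
from types import SimpleNamespace
from typing import Any


def describe_match(query: str, result: Any) -> str:
    """Summarize whether an imi() result was a direct hit or a recovered base form."""
    if not getattr(result, "found", True):
        return f"{query}: no match"
    derivation = getattr(result, "derivation", None)
    if not derivation:
        # Direct matches always win, so no derivation metadata is expected here.
        return f"{query}: direct match {result.word} ({result.reading})"
    return (
        f"{query} ({result.surface_reading}) is the {derivation} form of "
        f"{result.origin} ({result.reading})"
    )


# Stub mirroring the documented 待てる output above.
derived = SimpleNamespace(found=True, word="待つ", reading="まつ", origin="待つ",
                          derivation="potential", surface_reading="まてる")
print(describe_match("待てる", derived))  # 待てる (まてる) is the potential form of 待つ (まつ)
```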

Kanji Lookups

Look up kanji directly:

result = j.kanji("猫")

print(result.literal)
print(result.meaning)
print(result.readings)
print(result.kun_readings)
print(result.on_readings)
print(result.stroke_count)
print(result.jlpt)

Look up every kanji in a string:

result = j.kanji("日本語")

for kanji in result.results:
    print(kanji.literal, kanji.meaning)

Search by radicals:

j.kanji(radicals=["氵", "木"])
j.kanji(radicals=["氵", "木"], radical_match="any")
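The two radical_match modes behave like set operations. The sketch below assumes the default behaves as "all". The examples imply this, but the default value is not stated. Under "all", a kanji qualifies only if it contains every requested radical. Under "any", it qualifies if it contains at least one. The component table is a tiny, simplified, hand-written illustration rather than Kotodex's radical data, and match_radicals is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set

# Simplified, hand-written component sets for illustration; not Kotodex data.
COMPONENTS: Dict[str, Set[str]] = {
    "沐": {"氵", "木"},
    "池": {"氵", "也"},
    "林": {"木"},
    "口": {"口"},
}


def match_radicals(radicals: Iterable[str], radical_match: str = "all") -> List[str]:
    """Filter COMPONENTS by radicals: 'all' needs every one, 'any' needs at least one."""
    wanted = set(radicals)
    if radical_match not in ("all", "any"):
        raise ValueError(f"radical_match must be 'all' or 'any', not {radical_match!r}")
    hits = []
    for literal, parts in COMPONENTS.items():
        if radical_match == "all":
            ok = wanted <= parts  # subset test
        else:
            ok = bool(wanted & parts)  # non-empty intersection
        if ok:
            hits.append(literal)
    return sorted(hits)


print(match_radicals(["氵", "木"]))                       # ['沐']
print(match_radicals(["氵", "木"], radical_match="any"))  # ['林', '池', '沐']
```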

Search by stroke count:

j.kanji(strokes=9)
j.kanji(strokes=(8, 10))
j.kanji(strokes=range(8, 11))
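The three strokes forms above, plus the CLI's 8-10 string, can all be reduced to one inclusive range. The sketch below treats (8, 10) as inclusive. That is an inference from it appearing next to range(8, 11), not something this page states. stroke_bounds is a hypothetical helper, not the parser Kotodex uses.

```python
from typing import Tuple, Union

StrokeSpec = Union[int, Tuple[int, int], range, str]


def stroke_bounds(spec: StrokeSpec) -> Tuple[int, int]:
    """Normalize a stroke-count filter to an inclusive (low, high) pair."""
    if isinstance(spec, bool):
        raise TypeError("strokes must not be a bool")
    if isinstance(spec, int):
        low = high = spec
    elif isinstance(spec, range):
        if spec.step != 1 or len(spec) == 0:
            raise ValueError("only non-empty ranges with step 1 are supported")
        low, high = spec[0], spec[-1]  # range stop is exclusive
    elif isinstance(spec, tuple) and len(spec) == 2:
        low, high = spec  # assumed inclusive on both ends
    elif isinstance(spec, str):
        # CLI style: "9" or "8-10"
        low_text, _, high_text = spec.partition("-")
        low = int(low_text)
        high = int(high_text) if high_text else low
    else:
        raise TypeError(f"unsupported strokes value: {spec!r}")
    if low > high:
        raise ValueError(f"empty stroke range: {low}-{high}")
    return low, high


print(stroke_bounds((8, 10)), stroke_bounds(range(8, 11)), stroke_bounds("8-10"))
# (8, 10) (8, 10) (8, 10)
```

An inclusive pair maps directly onto a BETWEEN low AND high filter, which is one reason to normalize to it.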

Search by JLPT level:

j.kanji(jlpt="N5", zenbu=True)

Include examples and similar kanji:

result = j.kanji("食", examples=2)

print(result.first.examples)
print(result.similar)

Disable similar kanji if you only need the base record:

j.kanji("食", include_similar=False)

Example Sentences

Use examples() to retrieve Tatoeba-linked example sentences.

result = j.examples("食べる", limit=5)

for sentence in result.sentences:
    print(sentence.japanese)
    print(sentence.english)
    print(sentence.attribution)

Filter by difficulty or JLPT level:

j.examples("食べる", difficulty="N5")
j.examples("食べる", jlpt="N5")

Use romaji input:

j.examples("taberu", romaji=True)

Combined Queries

Use query() when you want vocabulary, kanji, names, example sentences, and provenance in one result.

result = j.query("食べる")

print(result.found)
print(result.vocabulary)
print(result.kanji)
print(result.names)
print(result.examples)
print(result.provenance)

Combined queries also inherit vocabulary derivation metadata:

result = j.query("待てる")

print(result.vocabulary[0].word)  # 待つ
print(result.origin)              # 待つ
print(result.derivation)          # potential
print(result.surface_reading)     # まてる

Result Objects

All result objects support:

result.to_dict()
result.to_json(indent=2)

ImiLookupResult

Common fields and shortcuts:

  • query: original query text
  • lemma: Sudachi dictionary form when available
  • found: True when at least one entry was returned
  • count: number of returned entries
  • entries: list of ImiEntry
  • first: first entry or None
  • word, reading, meaning, meanings, jlpt, pos: shortcuts for the first entry
  • origin, derivation, surface_reading: populated for derived-form recovery

ImiEntry

Vocabulary entry fields:

  • word
  • reading
  • definitions
  • meaning
  • jlpt
  • pos
  • reading_romaji
  • common
  • priority
  • example
  • examples
  • entry_id
  • ent_seq
  • source
  • extra

KanjiLookupResult

Common fields and shortcuts:

  • query
  • found
  • count
  • results
  • first
  • literal, meaning, meanings, readings
  • kun_readings, on_readings
  • stroke_count, strokes
  • jlpt
  • radicals
  • similar

KanjiEntry

Kanji entry fields:

  • literal
  • meanings
  • on_readings
  • kun_readings
  • radicals
  • stroke_count
  • grade
  • jlpt
  • freq
  • radical_classical
  • on_romaji
  • kun_romaji
  • similar
  • examples
  • extra

ExampleLookupResult

Example lookup fields:

  • query
  • found
  • count
  • sentences
  • difficulty
  • lemma
  • first

ExampleSentence

Sentence fields:

  • tatoeba_id
  • japanese
  • english
  • attribution
  • difficulty
  • japanese_romaji
  • source

QueryResult

Combined query fields:

  • query
  • lemma
  • found
  • vocabulary
  • kanji
  • names
  • examples
  • provenance
  • raw
  • origin
  • derivation
  • surface_reading

JSON Export

Use JSON export for API responses, notebooks, scripts, and debugging.

print(j.imi("食べる").to_json(indent=2))
print(j.kanji("食").to_json(indent=2))
print(j.examples("食べる").to_json(indent=2))
print(j.query("食べる").to_json(indent=2))

to_json() writes Japanese text unescaped by default, because ensure_ascii=False is already the default. If you pass your own JSON settings, include ensure_ascii=False to keep that behaviour:

j.query("食べる").to_json(indent=2, ensure_ascii=False)
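The difference comes from the standard json module. The sketch below assumes to_json() forwards keyword arguments such as indent and ensure_ascii to json.dumps. The signature suggests this, but this page does not state it. Under that assumption, the two settings produce:

```python
import json

record = {"word": "食べる"}

escaped = json.dumps(record)  # json.dumps itself defaults to ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"word": "\u98df\u3079\u308b"}
print(readable)  # {"word": "食べる"}
```

Both strings decode to the same data. Only the on-disk or on-the-wire form differs.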

CLI Usage

Download or refresh source files:

kotodex update
kotodex update --force
kotodex update --cache-dir /tmp/kotodex-sources

Build a database:

kotodex rebuild --db /home/user/jisho.db
kotodex rebuild --force --db /home/user/jisho.db
kotodex rebuild --db /home/user/jisho.db --cache-dir /tmp/kotodex-sources

Check database and source status:

kotodex status --db /home/user/jisho.db

Vocabulary lookup:

kotodex imi 食べる --db /home/user/jisho.db
kotodex imi taberu --romaji --db /home/user/jisho.db
kotodex imi '食べ%る' --zenbu --db /home/user/jisho.db
kotodex imi 食べる --examples 2 --json --db /home/user/jisho.db

Kanji lookup and search:

kotodex kanji 食 --db /home/user/jisho.db
kotodex kanji 日本語 --db /home/user/jisho.db
kotodex kanji --radical 氵 --radical 木 --db /home/user/jisho.db
kotodex kanji --radical 氵 --radical 木 --radical-match any --db /home/user/jisho.db
kotodex kanji --strokes 9 --db /home/user/jisho.db
kotodex kanji --strokes 8-10 --db /home/user/jisho.db
kotodex kanji --jlpt N5 --zenbu --json --db /home/user/jisho.db

Example sentences:

kotodex examples 食べる --limit 5 --db /home/user/jisho.db
kotodex examples 食べる --difficulty N5 --db /home/user/jisho.db
kotodex examples taberu --romaji --json --db /home/user/jisho.db

Combined query:

kotodex query 食べる --db /home/user/jisho.db
kotodex query 食べる --examples 10 --json --db /home/user/jisho.db

Lemma Helper

Use Sudachi-based normalization directly when you only need the dictionary form.

from kotodex.lemma import get_lemma

print(get_lemma("待てる"))
print(get_lemma("食べました"))

Choose a Sudachi dictionary size:

get_lemma("食べました", dict_type="small")
get_lemma("食べました", dict_type="core")
get_lemma("食べました", dict_type="full")
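If you normalize the same surface forms many times, for example while processing a corpus, memoizing the helper avoids repeating the analysis. The sketch below assumes get_lemma is deterministic for a given text and dict_type. Whether Kotodex already caches internally is not documented. make_cached_lemmatizer is a hypothetical wrapper. The stub function stands in for get_lemma so the example runs without Sudachi installed.

```python
from functools import lru_cache
from typing import Callable, List, Optional


def make_cached_lemmatizer(
    lemmatize: Callable[..., Optional[str]], maxsize: int = 4096
) -> Callable[..., Optional[str]]:
    """Memoize a get_lemma-style function on (text, dict_type)."""

    @lru_cache(maxsize=maxsize)
    def cached(text: str, dict_type: Optional[str] = None) -> Optional[str]:
        # Only forward dict_type when given, so the wrapped function keeps its own default.
        if dict_type is None:
            return lemmatize(text)
        return lemmatize(text, dict_type=dict_type)

    return cached


# Stub standing in for kotodex.lemma.get_lemma; with kotodex installed,
# pass get_lemma itself: lemma = make_cached_lemmatizer(get_lemma)
calls: List[str] = []


def fake_get_lemma(text: str, dict_type: Optional[str] = None) -> Optional[str]:
    calls.append(text)
    return {"食べました": "食べる"}.get(text, text)


lemma = make_cached_lemmatizer(fake_get_lemma)
print(lemma("食べました"), lemma("食べました"), len(calls))  # 食べる 食べる 1
```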

Provenance And Licensing

Kotodex stores source provenance in the generated database.

with Jisho("/home/user/jisho.db") as j:
    print(j.notice())
    print(j.provenance())

Important licensing notes:

  • Source-derived content comes from EDRDG and Tatoeba and has attribution obligations.
  • Generated databases may be subject to CC BY-SA 4.0 due to EDRDG-derived content.
  • Per-sentence Tatoeba attribution is exposed as ExampleSentence.attribution.
  • Use Jisho.notice() and Jisho.provenance() to inspect the generated database metadata.
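To carry the per-sentence attribution through to anything you publish, keep it next to the sentence text. The sketch below uses only the japanese, english and attribution fields documented for ExampleSentence. format_with_attribution is a hypothetical helper. The sample sentence and its attribution string are placeholders, not real Tatoeba data.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def format_with_attribution(sentences: Iterable[Any]) -> List[str]:
    """Render each example sentence with its attribution kept alongside it."""
    lines: List[str] = []
    for s in sentences:
        lines.append(s.japanese)
        if s.english:
            lines.append(f"  {s.english}")
        # Keep the credit line: Tatoeba-derived content carries attribution obligations.
        lines.append(f"  ({s.attribution})" if s.attribution else "  (attribution unavailable)")
    return lines


sample = SimpleNamespace(japanese="これは例文です。", english="This is an example sentence.",
                         attribution="<attribution string from the database>")
print("\n".join(format_with_attribution([sample])))
```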

Typical Workflows

Build once, query many times:

kotodex update
kotodex rebuild --db ./jisho.db

from kotodex import Jisho

with Jisho("./jisho.db") as j:
    print(j.imi("猫").meaning)

Create a local JSON lookup endpoint:

from kotodex import Jisho

def lookup_json(text: str) -> str:
    with Jisho("./jisho.db") as j:
        return j.query(text).to_json(indent=2)
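To expose lookup_json over HTTP without extra dependencies, you can mount it on the standard library's http.server. The sketch below is for local use only. The /query?q=... URL shape, the port, and the route, make_handler and serve names are assumptions. Any str -> str function, such as lookup_json above, can be plugged in. The single-threaded HTTPServer handles one request at a time.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple, Type
from urllib.parse import parse_qs, urlsplit


def route(path: str, lookup: Callable[[str], str]) -> Tuple[int, str]:
    """Map a request path to (status, JSON body); pure, so it is testable without a socket."""
    parts = urlsplit(path)
    if parts.path != "/query":
        return 404, json.dumps({"error": "not found"})
    text = parse_qs(parts.query).get("q", [""])[0].strip()  # percent-decoded as UTF-8
    if not text:
        return 400, json.dumps({"error": "missing q parameter"})
    return 200, lookup(text)


def make_handler(lookup: Callable[[str], str]) -> Type[BaseHTTPRequestHandler]:
    class LookupHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = route(self.path, lookup)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return LookupHandler


def serve(lookup: Callable[[str], str], host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve GET /query?q=... on a local, single-threaded HTTPServer until interrupted."""
    with HTTPServer((host, port), make_handler(lookup)) as server:
        server.serve_forever()
```

Call serve(lookup_json) and request http://127.0.0.1:8000/query?q=食べる (percent-encoded by the client).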

Export study data:

from kotodex import Jisho

with Jisho("./jisho.db") as j:
    result = j.imi("食べ%る", zenbu=True)
    rows = [(entry.word, entry.reading, entry.meaning, entry.jlpt) for entry in result.entries]
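The standard csv module can write the rows tuples to a spreadsheet-friendly file. In the sketch below, write_study_csv and the column names are hypothetical. The sample rows are placeholders, not real dictionary output. csv.writer writes None (for example a missing JLPT level) as an empty cell.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO

HEADER = ("word", "reading", "meaning", "jlpt")


def write_study_csv(rows: Iterable[Sequence[object]], out: TextIO) -> int:
    """Write a header plus one CSV line per (word, reading, meaning, jlpt) row."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    count = 0
    for row in rows:
        writer.writerow(row)  # None values become empty cells
        count += 1
    return count


buffer = io.StringIO()
write_study_csv([("食べる", "たべる", "to eat", "N5"), ("猫", "ねこ", "cat", None)], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with newline="" as the csv module recommends. Use encoding="utf-8", or "utf-8-sig" if the file will be opened in Excel.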


Download files


Source Distribution

kotodex-0.3.0.tar.gz (289.4 kB)

Uploaded Source

Built Distribution


kotodex-0.3.0-py3-none-any.whl (286.8 kB)

Uploaded Python 3

File details

Details for the file kotodex-0.3.0.tar.gz.

File metadata

  • Download URL: kotodex-0.3.0.tar.gz
  • Size: 289.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for kotodex-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fd2c850e7cca2f7a0fb0c829448fcf6ce718c5658f61a3c73fee6fb660239d40
MD5 6f03fbe02dceb917b891ae982e60701b
BLAKE2b-256 e5f40253d9e40dfa3e5b4062513fd43c393b2261cf02696d21820569fb4af3e7


File details

Details for the file kotodex-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: kotodex-0.3.0-py3-none-any.whl
  • Size: 286.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for kotodex-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 018ec57eddcf2069ce067bcedc6ef2116bf244483b417d013927629be47c5ea4
MD5 fac0414c22ef5a26b805a56c8aa2afe2
BLAKE2b-256 bc7eabb66be6bd22b232a4f89e784810bce2062d01aeb4551fca21c4add3db92

