Build and query a local Japanese dictionary SQLite database.
Kotodex Package Guide
Kotodex builds a local SQLite database from Japanese dictionary sources and exposes a compact Python API for vocabulary, kanji, example sentence, and combined dictionary lookups.
Use it when you want Japanese dictionary data available locally without parsing XML or CSV source files at runtime.
Contents
- Install
- Build A Database
- Python Quick Start
- Vocabulary Lookups
- Kanji Lookups
- Example Sentences
- Combined Queries
- Result Objects
- JSON Export
- CLI Usage
- Lemma Helper
- Provenance And Licensing
- Typical Workflows
Install
Default install, using the Sudachi core dictionary:
pip install kotodex
Optional Sudachi dictionary sizes:
pip install "kotodex[small]"
pip install "kotodex[core]"
pip install "kotodex[full]"
Development install from a checkout:
pip install -e ".[dev]"
Build A Database
Kotodex queries a local SQLite database. Build one before using the API.
kotodex update
kotodex rebuild --db /home/user/jisho.db
kotodex status --db /home/user/jisho.db
Default paths:
- Source cache: ~/.cache/kotodex/sources
- Database: ~/.local/share/kotodex/jisho.db
Use --db whenever you want to control the database location explicitly.
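Because the build produces an ordinary SQLite file, you can sanity-check it with Python's standard sqlite3 module before pointing the API at it. The following is a minimal sketch: list_tables is a hypothetical helper, not part of Kotodex, and the table names it returns depend on the Kotodex schema, which this guide does not describe.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in an SQLite file (hypothetical helper, not a Kotodex API)."""
    path = Path(db_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No database at {path}; build one with `kotodex rebuild --db`")
    # Open read-only so that inspecting the file can never modify it.
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

An empty list or a FileNotFoundError here usually means the rebuild step has not run against that path yet.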
Python Quick Start
from kotodex import Jisho
with Jisho("/home/user/jisho.db") as j:
    print(j.imi("食べる"))
    print(j.kanji("食"))
    print(j.examples("食べる"))
    print(j.query("食べる"))
Jisho can also be managed manually:
from kotodex import Jisho
j = Jisho("/home/user/jisho.db")
try:
    result = j.imi("猫")
    print(result.meaning)
finally:
    j.close()
Vocabulary Lookups
Use imi() for meaning-focused vocabulary lookup. lookup() is an alias for imi().
with Jisho("/home/user/jisho.db") as j:
    result = j.imi("待つ")
    print(result.found)
    print(result.word)
    print(result.reading)
    print(result.meaning)
    print(result.jlpt)
    print(result.pos)
Return more ranked matches with zenbu=True:
with Jisho("/home/user/jisho.db") as j:
    result = j.imi("食べ%る", zenbu=True)
    for entry in result.entries:
        print(entry.word, entry.reading, entry.meaning)
Use SQL wildcards in vocabulary lookups:
j.imi("猫%", zenbu=True)
j.imi("%する", zenbu=True)
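To see what these patterns select, it can help to try them against a throwaway table. The sketch below uses only the standard sqlite3 module and a toy vocab table; it assumes Kotodex hands the pattern to SQL LIKE more or less unchanged, which this guide implies but does not spell out. The single-character wildcard _ is standard SQL; whether Kotodex accepts it is an assumption.

```python
import sqlite3

# Toy vocabulary table; Kotodex's real schema is not documented in this guide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vocab (word TEXT)")
conn.executemany(
    "INSERT INTO vocab (word) VALUES (?)",
    [("猫",), ("猫背",), ("子猫",), ("勉強する",), ("食べる",), ("食べられる",)],
)


def like(pattern):
    """Return words matching an SQL LIKE pattern, in insertion order."""
    rows = conn.execute(
        "SELECT word FROM vocab WHERE word LIKE ? ORDER BY rowid", (pattern,)
    )
    return [word for (word,) in rows]


print(like("猫%"))      # prefix match: ['猫', '猫背']
print(like("%する"))    # suffix match: ['勉強する']
print(like("食べ%る"))  # % may match zero characters: ['食べる', '食べられる']
print(like("猫_"))      # _ matches exactly one character: ['猫背']
```

Note that % matches an empty string too, so a pattern such as 食べ%る still returns the plain dictionary form.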
Use romaji input:
result = j.imi("taberu", romaji=True)
print(result.word)
print(result.reading)
print(result.reading_romaji)
Include linked example sentences on vocabulary entries:
result = j.imi("食べる", examples=2)
for sentence in result.examples:
    print(sentence.japanese)
    print(sentence.english)
Derived verb forms are recovered when direct lookup fails:
result = j.imi("待てる")
print(result.word) # 待つ
print(result.reading) # まつ
print(result.surface_reading) # まてる
print(result.origin) # 待つ
print(result.derivation) # potential
Direct matches always win. Derivation metadata is only present when Kotodex had to recover a dictionary base form.
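The precedence rule can be pictured as a two-step lookup. This is only an illustrative sketch with toy data: the DICTIONARY and DERIVATIONS tables and the lookup function are hypothetical stand-ins, and Kotodex's actual recovery logic is not shown in this guide.

```python
# Toy data: a two-entry dictionary and a hand-written derivation table.
DICTIONARY = {"待つ": "to wait", "食べる": "to eat"}
DERIVATIONS = {"待てる": ("待つ", "potential")}  # surface form -> (base form, label)


def lookup(text):
    """Try a direct match first; fall back to a base form only if that fails."""
    if text in DICTIONARY:
        # Direct hit: no derivation metadata is attached.
        return {"word": text, "meaning": DICTIONARY[text], "origin": None, "derivation": None}
    if text in DERIVATIONS:
        base, label = DERIVATIONS[text]
        if base in DICTIONARY:
            # Recovered hit: report the base entry plus how it was reached.
            return {"word": base, "meaning": DICTIONARY[base], "origin": base, "derivation": label}
    return None


print(lookup("待つ"))
print(lookup("待てる"))
```

Checking the direct match first keeps a surface form that is itself a dictionary entry from being rewritten to a different base form.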
Kanji Lookups
Look up kanji directly:
result = j.kanji("猫")
print(result.literal)
print(result.meaning)
print(result.readings)
print(result.kun_readings)
print(result.on_readings)
print(result.stroke_count)
print(result.jlpt)
Look up every kanji in a string:
result = j.kanji("日本語")
for kanji in result.results:
    print(kanji.literal, kanji.meaning)
Search by radicals:
j.kanji(radicals=["氵", "木"])
j.kanji(radicals=["氵", "木"], radical_match="any")
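The difference between the two calls is whether a kanji must contain every listed radical or just one of them. A small set-based sketch of that distinction follows; the COMPONENTS table is toy data for illustration, search_by_radicals is a hypothetical function, and the mode name "all" for the default is an assumption, since the guide only shows "any".

```python
# Toy component sets for illustration only; real data comes from the built database.
COMPONENTS = {
    "沐": {"氵", "木"},
    "林": {"木"},
    "池": {"氵", "也"},
    "明": {"日", "月"},
}


def search_by_radicals(radicals, radical_match="all"):
    """Return kanji whose components contain all (or any) of the given radicals."""
    wanted = set(radicals)
    if radical_match == "all":
        matches = lambda parts: wanted <= parts        # subset test
    elif radical_match == "any":
        matches = lambda parts: bool(wanted & parts)   # non-empty intersection
    else:
        raise ValueError(f"unknown radical_match: {radical_match!r}")
    return [kanji for kanji, parts in COMPONENTS.items() if matches(parts)]


print(search_by_radicals(["氵", "木"]))                       # ['沐']
print(search_by_radicals(["氵", "木"], radical_match="any"))  # ['沐', '林', '池']
```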
Search by stroke count:
j.kanji(strokes=9)
j.kanji(strokes=(8, 10))
j.kanji(strokes=range(8, 11))
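The three argument shapes above plausibly describe the same kind of filter: an exact count, or a span of counts. The sketch below normalizes them to inclusive bounds. It assumes a tuple is inclusive on both ends (consistent with the CLI form 8-10) and that a range follows Python's usual exclusive stop; stroke_bounds is a hypothetical helper, not a Kotodex function.

```python
def stroke_bounds(strokes):
    """Normalize an int, (low, high) tuple, or range into inclusive (low, high) bounds."""
    if isinstance(strokes, bool):
        raise TypeError("strokes must be an int, a (low, high) tuple, or a range")
    if isinstance(strokes, int):
        return strokes, strokes
    if isinstance(strokes, range):
        if strokes.step != 1 or len(strokes) == 0:
            raise ValueError("expected a non-empty range with step 1")
        # range stops are exclusive, so the last included value is stop - 1.
        return strokes.start, strokes.stop - 1
    low, high = strokes
    if low > high:
        raise ValueError("low must not exceed high")
    return low, high


print(stroke_bounds(9))             # (9, 9)
print(stroke_bounds((8, 10)))       # (8, 10)
print(stroke_bounds(range(8, 11)))  # (8, 10)
```

Under these assumptions, strokes=(8, 10) and strokes=range(8, 11) select the same kanji.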
Search by JLPT level:
j.kanji(jlpt="N5", zenbu=True)
Include examples and similar kanji:
result = j.kanji("食", examples=2)
print(result.first.examples)
print(result.similar)
Disable similar kanji if you only need the base record:
j.kanji("食", include_similar=False)
Example Sentences
Use examples() to retrieve Tatoeba-linked example sentences.
result = j.examples("食べる", limit=5)
for sentence in result.sentences:
    print(sentence.japanese)
    print(sentence.english)
    print(sentence.attribution)
Filter by difficulty or JLPT level:
j.examples("食べる", difficulty="N5")
j.examples("食べる", jlpt="N5")
Use romaji input:
j.examples("taberu", romaji=True)
Combined Queries
Use query() when you want vocabulary, kanji, names, example sentences, and provenance in one result.
result = j.query("食べる")
print(result.found)
print(result.vocabulary)
print(result.kanji)
print(result.names)
print(result.examples)
print(result.provenance)
Combined queries also inherit vocabulary derivation metadata:
result = j.query("待てる")
print(result.vocabulary[0].word) # 待つ
print(result.origin) # 待つ
print(result.derivation) # potential
print(result.surface_reading) # まてる
Result Objects
All result objects support:
result.to_dict()
result.to_json(indent=2)
ImiLookupResult
Common fields and shortcuts:
- query: original query text
- lemma: Sudachi dictionary form when available
- found: True when at least one entry was returned
- count: number of returned entries
- entries: list of ImiEntry
- first: first entry or None
- word, reading, meaning, meanings, jlpt, pos: shortcuts for the first entry
- origin, derivation, surface_reading: populated for derived-form recovery
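The relationship between entries, first, and the shortcut fields can be sketched with plain dataclasses. This is not Kotodex's implementation: Entry and LookupResult are hypothetical stand-ins, and returning None from a shortcut when nothing matched is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    word: str
    reading: str
    meaning: str


@dataclass
class LookupResult:
    query: str
    entries: List[Entry] = field(default_factory=list)

    @property
    def found(self) -> bool:
        return bool(self.entries)

    @property
    def count(self) -> int:
        return len(self.entries)

    @property
    def first(self) -> Optional[Entry]:
        return self.entries[0] if self.entries else None

    @property
    def word(self) -> Optional[str]:
        # Shortcut: delegate to the first entry, or None when nothing matched.
        return self.first.word if self.first else None


hit = LookupResult("猫", [Entry("猫", "ねこ", "cat")])
miss = LookupResult("zzz")
print(hit.found, hit.count, hit.word)
print(miss.found, miss.count, miss.word)
```

Whatever the real behavior on a miss, checking found before reading a shortcut keeps calling code independent of it.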
ImiEntry
Vocabulary entry fields:
- word
- reading
- definitions
- meaning
- jlpt
- pos
- reading_romaji
- common
- priority
- example
- examples
- entry_id
- ent_seq
- source
- extra
KanjiLookupResult
Common fields and shortcuts:
- query
- found
- count
- results
- first
- literal, meaning, meanings, readings
- kun_readings, on_readings
- stroke_count, strokes
- jlpt
- radicals
- similar
KanjiEntry
Kanji entry fields:
- literal
- meanings
- on_readings
- kun_readings
- radicals
- stroke_count
- grade
- jlpt
- freq
- radical_classical
- on_romaji
- kun_romaji
- similar
- examples
- extra
ExampleLookupResult
Example lookup fields:
- query
- found
- count
- sentences
- difficulty
- lemma
- first
ExampleSentence
Sentence fields:
- tatoeba_id
- japanese
- english
- attribution
- difficulty
- japanese_romaji
- source
QueryResult
Combined query fields:
- query
- lemma
- found
- vocabulary
- kanji
- names
- examples
- provenance
- raw
- origin
- derivation
- surface_reading
JSON Export
Use JSON export for API responses, notebooks, scripts, and debugging.
print(j.imi("食べる").to_json(indent=2))
print(j.kanji("食").to_json(indent=2))
print(j.examples("食べる").to_json(indent=2))
print(j.query("食べる").to_json(indent=2))
to_json() leaves Japanese text unescaped, because ensure_ascii=False is already the default. You can still pass it explicitly alongside your own JSON settings:
j.query("食べる").to_json(indent=2, ensure_ascii=False)
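For comparison, this is what the ensure_ascii setting changes in Python's standard json module, whose own default is the opposite (ensure_ascii=True). The payload is a hand-written sample, not Kotodex output.

```python
import json

payload = {"word": "食べる", "reading": "たべる"}

escaped = json.dumps(payload)                       # stdlib default: ensure_ascii=True
readable = json.dumps(payload, ensure_ascii=False)  # keeps Japanese text as-is

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # non-ASCII characters appear literally

# Both strings decode to the same data; only the text representation differs.
assert json.loads(escaped) == json.loads(readable)
```

When writing unescaped JSON to a file, open the file with encoding="utf-8" so the result does not depend on the platform's default encoding.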
CLI Usage
Download or refresh source files:
kotodex update
kotodex update --force
kotodex update --cache-dir /tmp/kotodex-sources
Build a database:
kotodex rebuild --db /home/user/jisho.db
kotodex rebuild --force --db /home/user/jisho.db
kotodex rebuild --db /home/user/jisho.db --cache-dir /tmp/kotodex-sources
Check database and source status:
kotodex status --db /home/user/jisho.db
Vocabulary lookup:
kotodex imi 食べる --db /home/user/jisho.db
kotodex imi taberu --romaji --db /home/user/jisho.db
kotodex imi '食べ%る' --zenbu --db /home/user/jisho.db
kotodex imi 食べる --examples 2 --json --db /home/user/jisho.db
Kanji lookup and search:
kotodex kanji 食 --db /home/user/jisho.db
kotodex kanji 日本語 --db /home/user/jisho.db
kotodex kanji --radical 氵 --radical 木 --db /home/user/jisho.db
kotodex kanji --radical 氵 --radical 木 --radical-match any --db /home/user/jisho.db
kotodex kanji --strokes 9 --db /home/user/jisho.db
kotodex kanji --strokes 8-10 --db /home/user/jisho.db
kotodex kanji --jlpt N5 --zenbu --json --db /home/user/jisho.db
Example sentences:
kotodex examples 食べる --limit 5 --db /home/user/jisho.db
kotodex examples 食べる --difficulty N5 --db /home/user/jisho.db
kotodex examples taberu --romaji --json --db /home/user/jisho.db
Combined query:
kotodex query 食べる --db /home/user/jisho.db
kotodex query 食べる --examples 10 --json --db /home/user/jisho.db
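The --json flag makes the CLI convenient to drive from scripts. The sketch below shows one way to do that from Python; run_json is a hypothetical helper, and it assumes the command prints a single JSON document to stdout and nothing else, which this guide does not state explicitly.

```python
import json
import subprocess


def run_json(argv):
    """Run a command that prints JSON on stdout and return the parsed value."""
    completed = subprocess.run(
        argv,
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
        encoding="utf-8",      # decode Japanese output consistently across platforms
    )
    return json.loads(completed.stdout)


# Intended use (requires an installed kotodex and a built database):
# data = run_json(["kotodex", "query", "食べる", "--json", "--db", "/home/user/jisho.db"])
```

Passing the arguments as a list avoids shell quoting problems with patterns such as '食べ%る'.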
Lemma Helper
Use Sudachi-based normalization directly when you only need the dictionary form.
from kotodex.lemma import get_lemma
print(get_lemma("待てる"))
print(get_lemma("食べました"))
Choose a Sudachi dictionary size:
get_lemma("食べました", dict_type="small")
get_lemma("食べました", dict_type="core")
get_lemma("食べました", dict_type="full")
Provenance And Licensing
Kotodex stores source provenance in the generated database.
with Jisho("/home/user/jisho.db") as j:
    print(j.notice())
    print(j.provenance())
Important licensing notes:
- Source-derived content comes from EDRDG and Tatoeba and has attribution obligations.
- Generated databases may be subject to CC BY-SA 4.0 due to EDRDG-derived content.
- Per-sentence Tatoeba attribution is exposed as ExampleSentence.attribution.
- Use Jisho.notice() and Jisho.provenance() to inspect the generated database metadata.
Typical Workflows
Build once, query many times:
kotodex update
kotodex rebuild --db ./jisho.db
from kotodex import Jisho
with Jisho("./jisho.db") as j:
    print(j.imi("猫").meaning)
Create a local JSON lookup endpoint:
from kotodex import Jisho
def lookup_json(text: str) -> str:
    with Jisho("./jisho.db") as j:
        return j.query(text).to_json(indent=2)
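To expose a function like lookup_json over HTTP without extra dependencies, one option is a small WSGI app. The sketch below is an assumption-laden illustration: make_app and the q query parameter are hypothetical names, and the app accepts any callable that maps query text to a JSON string, so it can be exercised without a database.

```python
import json
from urllib.parse import parse_qs


def make_app(lookup_json):
    """Build a WSGI app around any callable that maps query text to a JSON string."""

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        text = params.get("q", [""])[0]
        if text:
            status, body = "200 OK", lookup_json(text)
        else:
            status, body = "400 Bad Request", json.dumps({"error": "missing q parameter"})
        payload = body.encode("utf-8")
        start_response(status, [
            ("Content-Type", "application/json; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]

    return app


# To serve it locally with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, make_app(lookup_json)).serve_forever()
```

Opening the database once per request, as lookup_json does, is simple and safe; whether a long-lived Jisho instance can be shared across requests is not covered by this guide.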
Export study data:
from kotodex import Jisho
with Jisho("./jisho.db") as j:
    result = j.imi("食べ%る", zenbu=True)
    rows = [(entry.word, entry.reading, entry.meaning, entry.jlpt) for entry in result.entries]
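Tuples shaped like rows can then be written out with the standard csv module, for example to import into a flashcard tool. A minimal sketch follows; rows_to_csv is a hypothetical helper, and sample_rows is a hand-written stand-in for real lookup results.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (word, reading, meaning, jlpt) tuples to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["word", "reading", "meaning", "jlpt"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample in the same shape as the rows list above.
sample_rows = [("食べる", "たべる", "to eat", "N5")]
print(rows_to_csv(sample_rows))
```

When saving to disk, open the file with newline="" and encoding="utf-8"; some spreadsheet applications detect the encoding more reliably with "utf-8-sig".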