Python package for accessing a comprehensive Japanese language database.

Kotobase

Kotobase is a Python package for the Japanese language. It provides simple programmatic access to several data sources through a pre-built database that is rebuilt weekly by a GitHub Action.

Documentation

Detailed documentation is available at the GitHub Pages site.

Data Sources

Kotobase builds its database from the following sources.

  • JMDict : Japanese-Multilingual Dictionary.

  • JMnedict : A dictionary of Japanese proper names.

  • KanjiDic2 : A comprehensive kanji dictionary.

  • Tatoeba : A large database of example sentences.

  • JLPT Lists : Curated lists of grammar, vocabulary, and kanji, separated by Japanese Language Proficiency Test level, made available on Jonathan Waller's website.

Licenses

The licenses of these data sources and the NOTICE file are available at docs/licenses in this repository.

Features

  • Comprehensive Lookups → Search for words (kanji, kana, or romaji), kanji, and proper names.

  • Organized Data → Get detailed information including readings, senses, parts of speech, kanji stroke counts, meanings, and JLPT levels formatted into Python Data Objects.

  • Example Sentences → Find example sentences from Tatoeba that contain the searched query.

  • Wildcard Search → Use * or % for wildcard searches.

  • Command-Line Interface → User-friendly CLI for quick lookups from the terminal.

  • Self-Contained → All data is stored in a local SQLite database, so it's fast and works offline.

  • Easy Database Management → Includes commands to automatically download the latest pre-built database from the public Drive or download source files and build the database locally.
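To illustrate the wildcard search feature listed above, here is a minimal sketch of how a `*` wildcard could be translated into SQL's `%` for a `LIKE` query against SQLite. It shows the general technique only and is not Kotobase's actual implementation; the `words` table and the `to_like_pattern` and `search` helpers are hypothetical names invented for this example.

```python
import sqlite3
from typing import List


def to_like_pattern(term: str) -> str:
    """Translate '*' into SQL's '%' wildcard (hypothetical helper, not Kotobase's API)."""
    return term.replace("*", "%")


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return all entries in the hypothetical `words` table matching the wildcard term."""
    pattern = to_like_pattern(term)
    rows = conn.execute(
        "SELECT text FROM words WHERE text LIKE ? ORDER BY text", (pattern,)
    ).fetchall()
    return [row[0] for row in rows]


# Build a tiny in-memory table so the sketch runs without any database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (text TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [("日本",), ("日本語",), ("英語",)])

print(search(conn, "日本*"))  # entries starting with 日本
print(search(conn, "%語"))    # entries ending with 語
```

Passing the pattern as a bound parameter (`?`) rather than formatting it into the SQL string keeps user input from being interpreted as SQL.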

Installation

  • Install the package
pip install kotobase

This will install the kotobase package and its dependencies, and it will also make the kotobase command-line tool available in your shell.

  • Pull the database from Drive or build it locally by running one of the commands below in the environment where you installed kotobase
# Pull from Drive
kotobase pull-db
# Build locally
kotobase build

The database is downloaded or built inside the package at kotobase/src/db/kotobase.db, after which it is available for use.
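If you want to confirm from Python that the database file is in place, a rough sketch follows. It assumes that the leading `kotobase/` in the path above is the installed package directory, which the text does not spell out; `find_database` is a hypothetical helper, not part of Kotobase.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def find_database(package: str = "kotobase") -> Optional[Path]:
    """Return the database path if it exists, else None (hypothetical helper).

    Assumption: the database lives at <package dir>/src/db/kotobase.db.
    """
    spec = find_spec(package)  # locates the package without importing it
    if spec is None or spec.origin is None:
        return None  # package is not installed in this environment
    candidate = Path(spec.origin).parent / "src" / "db" / "kotobase.db"
    return candidate if candidate.is_file() else None


path = find_database()
if path is None:
    print("No database found; run 'kotobase pull-db' or 'kotobase build'.")
else:
    print(f"Database found at {path}")
```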

Usage

Kotobase can be used as a command-line tool or as a Python library.

Command-Line Interface

The kotobase command provides several subcommands for different types of lookups.

General Lookup

The lookup command is the most comprehensive way to search for a word.

kotobase lookup 日本語

This will show you dictionary entries, kanji information, JLPT levels, and example sentences for the word "日本語".

Options:

  • -n, --names: Include proper names from JMnedict in the search.
  • -w, --wildcard: Treat * or % as wildcards in the search term.
  • -s, --sentences: Specify the number of example sentences to show.
  • --json-out: Output the full results as a JSON object.
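When scripting the CLI, the options above can be assembled programmatically. The sketch below builds the argument list for `kotobase lookup` using only the flags documented here. It assumes that `--sentences` takes an integer value and that options may follow the search term; `build_lookup_command` is a hypothetical helper for illustration.

```python
import shlex
from typing import List, Optional


def build_lookup_command(
    word: str,
    names: bool = False,
    wildcard: bool = False,
    sentences: Optional[int] = None,
    json_out: bool = False,
) -> List[str]:
    """Assemble an argument list for `kotobase lookup` (hypothetical helper)."""
    cmd = ["kotobase", "lookup", word]
    if names:
        cmd.append("--names")
    if wildcard:
        cmd.append("--wildcard")
    if sentences is not None:
        # Assumption: the option accepts the sentence count as its value.
        cmd += ["--sentences", str(sentences)]
    if json_out:
        cmd.append("--json-out")
    return cmd


command = build_lookup_command("日本*", wildcard=True, sentences=3, json_out=True)
print(shlex.join(command))  # shell-quoted form, safe to paste into a terminal
```

With kotobase installed, the list can be passed to `subprocess.run(command, capture_output=True, text=True)`, and the output of `--json-out` can then be parsed with `json.loads`. Building a list instead of a single string avoids shell-quoting problems with `*`.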

Kanji Lookup

To get information about a specific kanji character:

kotobase kanji 

This will display the kanji's grade, stroke count, meanings, on'yomi and kun'yomi readings, and JLPT level.

JLPT Lookup

To check the JLPT level for a word or kanji:

kotobase jlpt 勉強

Python API

You can also use Kotobase in your own Python code.

from kotobase import Kotobase

kb = Kotobase()

# Comprehensive lookup
result = kb.lookup("日本語")
print(result.to_json())

# Get info for a single kanji
kanji_info = kb.kanji("語")
print(kanji_info)

# Get example sentences
sentences = kb.sentences("勉強")
for sentence in sentences:
    print(sentence.text)

Database

Kotobase relies on a local SQLite database. You can either download the pre-built database or build it from the source files yourself.

The following commands are available for managing the database:

  • kotobase pull-db: Downloads the pre-built SQLite database from a public Google Drive folder. This file is overwritten every week with a database rebuilt from updated sources. The rebuild and overwrite are managed by a GitHub Action in this repository.

  • kotobase build: Builds the SQLite database from the raw source files. This downloads the latest version of the source files (except the Tanos JLPT lists, which are shipped with the package itself) and builds the database locally.
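Because the pre-built database is refreshed weekly, a script may want to check whether its local copy is older than that. The following is a minimal sketch based on the file's modification time; `is_stale` is a hypothetical helper, the seven-day threshold simply mirrors the weekly rebuild, and the file path is a placeholder you would replace with your actual database location.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def is_stale(db_file: Path, max_age_days: float = 7.0, now: Optional[float] = None) -> bool:
    """Return True if the file is missing or older than max_age_days (hypothetical helper)."""
    if not db_file.is_file():
        return True  # nothing downloaded or built yet
    current = time.time() if now is None else now
    age_days = (current - db_file.stat().st_mtime) / SECONDS_PER_DAY
    return age_days > max_age_days


# Placeholder path for illustration; point this at your local database file.
if is_stale(Path("kotobase.db")):
    print("Database missing or older than a week; consider running 'kotobase pull-db'.")
```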
