
A Japanese text tokenizer with POS tagging and Jisho.org integration.


KanaSplit - Japanese Text Tokenizer

KanaSplit is a Japanese text tokenizer built on MeCab. It adds part-of-speech (POS) tagging, kanji readings, and word lookups through Jisho.org.


🚀 Features

  • Tokenization: Splits Japanese sentences into words and morphemes.
  • POS Tagging: Provides grammatical category for each token.
  • Furigana Support: Extracts readings for kanji words.
  • Jisho.org API Integration: Retrieves word meanings and definitions.
  • Command-Line Interface (CLI): Allows easy text tokenization from the terminal.
  • Graphical User Interface (GUI): Provides a user-friendly Tkinter-based interface.
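
As a rough illustration of the first three features, the sketch below parses text in MeCab's default output layout for the IPADIC dictionary (surface form, a tab, then comma-separated features with the coarse POS first and the katakana reading in the eighth field) and converts readings to hiragana. This is not KanaSplit's own implementation: `Token`, `parse_mecab_output`, and `kata_to_hira` are hypothetical names, and the sample string is hand-written rather than captured from a real run.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hand-written sample in MeCab's default IPADIC layout (assumption):
# surface<TAB>pos,pos1,pos2,pos3,conj_type,conj_form,base,reading,pronunciation
SAMPLE = (
    "こんにちは\t感動詞,*,*,*,*,*,こんにちは,コンニチハ,コンニチワ\n"
    "世界\t名詞,一般,*,*,*,*,世界,セカイ,セカイ\n"
    "EOS\n"
)


@dataclass
class Token:
    surface: str            # the text as it appears in the sentence
    pos: str                # coarse part of speech, as emitted by the dictionary
    reading: Optional[str]  # hiragana reading, or None if the dictionary has none


def kata_to_hira(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down by 0x60 into the hiragana block."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def parse_mecab_output(raw: str) -> List[Token]:
    """Turn MeCab-style output lines into Token objects."""
    tokens = []
    for line in raw.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        # Unknown words may carry fewer fields, so the reading can be absent.
        has_reading = len(features) > 7 and features[7] != "*"
        reading = kata_to_hira(features[7]) if has_reading else None
        tokens.append(Token(surface, features[0], reading))
    return tokens


for tok in parse_mecab_output(SAMPLE):
    print(tok.surface, tok.pos, tok.reading)
```

With `mecab-python3` and an IPADIC dictionary installed, the string returned by `MeCab.Tagger().parse(text)` should follow the same layout and could be passed to `parse_mecab_output` in place of `SAMPLE`. Other dictionaries order their feature fields differently.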

📦 Installation

Install from PyPI (Recommended)

The easiest way to install KanaSplit is via pip:

pip install kanasplit

Install from Source

Alternatively, clone the repository and install its dependencies:

git clone https://github.com/byteMe394/KanaSplit.git
cd KanaSplit
pip install -r requirements.txt

🖥 OS-Specific Installation Instructions

Windows

Simply run:

pip install kanasplit

macOS

On macOS, install MeCab with Homebrew before installing KanaSplit. The first command installs Homebrew itself and can be skipped if Homebrew is already present:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install mecab mecab-ipadic
pip install mecab-python3
pip install kanasplit

GNU/Linux (Debian/Ubuntu-based)

For Debian-based Linux distributions, install MeCab before installing KanaSplit:

sudo apt update
sudo apt install mecab mecab-ipadic-utf8
pip install mecab-python3
pip install kanasplit

For other distributions, install MeCab and a dictionary with your distribution's package manager, then run the same two pip commands.
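
After installing, a quick way to confirm that the Python-side dependencies are visible to your interpreter is to look them up without importing them. The snippet below is only a sketch: `check_modules` is a hypothetical helper, and the module names are assumed from the dependencies this README lists (`MeCab` is the import name provided by `mecab-python3`).

```python
import importlib.util

# Import names assumed from the dependencies this README lists.
REQUIRED_MODULES = ["MeCab", "requests", "ratelimit"]


def check_modules(names):
    """Return {module_name: bool} showing which modules can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

This only checks that the modules can be found. It does not verify that a MeCab dictionary is installed or that tokenization works.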


🎮 Usage

Command Line Interface (CLI)

You can tokenize Japanese text directly from the terminal:

kanasplit-cli "こんにちは世界"

Example Output:

Tokenized Text:
- こんにちは (Interjection)
- 世界 (Noun)

Fetching meanings from Jisho.org...
Word: こんにちは - Reading: こんにちは - Meanings: hello, good day
Word: 世界 - Reading: せかい - Meanings: world, society, universe
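
The lookup step can be approximated with Jisho.org's public search endpoint, which returns JSON with a `data` list whose entries carry `japanese` (word and reading) and `senses` (English definitions). The sketch below is an assumption about how such a lookup might work, not KanaSplit's actual code. `summarize_entry` and `lookup` are hypothetical names, the sample entry is hand-written and trimmed, and the standard library is used here in place of `requests` to keep the example self-contained.

```python
import json
import urllib.parse
import urllib.request

JISHO_SEARCH_URL = "https://jisho.org/api/v1/search/words"


def summarize_entry(entry: dict) -> dict:
    """Pick the first written form, its reading, and all English definitions."""
    japanese = entry.get("japanese") or [{}]
    first = japanese[0]
    meanings = []
    for sense in entry.get("senses", []):
        meanings.extend(sense.get("english_definitions", []))
    return {
        # Kana-only words have no "word" field, so fall back to the reading.
        "word": first.get("word") or first.get("reading"),
        "reading": first.get("reading"),
        "meanings": meanings,
    }


def lookup(keyword: str, timeout: float = 10.0):
    """Query Jisho.org and summarize the top result (needs network access)."""
    url = JISHO_SEARCH_URL + "?" + urllib.parse.urlencode({"keyword": keyword})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    data = payload.get("data", [])
    return summarize_entry(data[0]) if data else None


# Hand-written, trimmed entry in the shape the API returns (illustrative only).
SAMPLE_ENTRY = {
    "japanese": [{"word": "世界", "reading": "せかい"}],
    "senses": [{"english_definitions": ["world", "society", "universe"]}],
}

print(summarize_entry(SAMPLE_ENTRY))
```

Calling `lookup("世界")` performs a live request, so results depend on the service being reachable and on its current data. Spacing out repeated lookups keeps the load on Jisho.org low, which is presumably why `ratelimit` appears among the dependencies.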

Graphical User Interface (GUI)

KanaSplit also includes a Tkinter-based GUI for users who prefer a graphical interface.

To launch the GUI, run the following in the directory that contains GUI.py (for example, a clone of the repository):

python GUI.py

This will open a window where you can enter Japanese text, process it, and view tokenized results.


🛠 Dependencies

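KanaSplit depends on the following Python packages, which pip installs automatically. On macOS and Linux, the MeCab system packages described in the installation instructions must be installed separately: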

  • ratelimit
  • MeCab
  • requests

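Note: Tkinter ships with the standard Python installers for Windows and macOS. On Debian/Ubuntu it is packaged separately and may need to be installed with sudo apt install python3-tk.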


🤝 Contributing

To contribute, fork the repository, make your changes on a branch, and submit a pull request. Install the project's requirements with:

pip install -r requirements.txt

📜 License

This project is licensed under the MIT License.


📬 Contact

For questions or support, feel free to reach out:


🎌 Happy Tokenizing! 🎌

