A Japanese text tokenizer with POS tagging and Jisho.org integration.
KanaSplit - Japanese Text Tokenizer
KanaSplit is a Japanese text tokenizer built on MeCab, with part-of-speech (POS) tagging, furigana readings, and word lookups through Jisho.org.
🚀 Features
- Tokenization: Splits Japanese sentences into words and morphemes.
- POS Tagging: Provides grammatical category for each token.
- Furigana Support: Extracts readings for kanji words.
- Jisho.org API Integration: Retrieves word meanings and definitions.
- Command-Line Interface (CLI): Allows easy text tokenization from the terminal.
- Graphical User Interface (GUI): Provides a user-friendly Tkinter-based interface.
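To make the tokenization, POS tagging, and furigana features concrete, here is a minimal sketch of how MeCab output can be turned into tokens. It is not KanaSplit's actual code: the names `Token`, `POS_LABELS`, `kata_to_hira`, `parse_mecab_output`, and `tokenize` are hypothetical, and the field layout assumes the IPADIC dictionary installed in the instructions below.

```python
from dataclasses import dataclass

# Assumed mapping from IPADIC part-of-speech names to English labels.
POS_LABELS = {
    "名詞": "Noun",
    "動詞": "Verb",
    "形容詞": "Adjective",
    "副詞": "Adverb",
    "助詞": "Particle",
    "助動詞": "Auxiliary verb",
    "感動詞": "Interjection",
    "記号": "Symbol",
}


@dataclass
class Token:
    surface: str  # the word as it appears in the text
    pos: str      # English part-of-speech label
    reading: str  # hiragana reading, or "" if the dictionary gives none


def kata_to_hira(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to the matching hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch for ch in text
    )


def parse_mecab_output(raw: str) -> list[Token]:
    """Parse MeCab's default output: one 'surface<TAB>features' line per token."""
    tokens = []
    for line in raw.splitlines():
        if line == "EOS" or "\t" not in line:
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        pos = POS_LABELS.get(fields[0], fields[0])
        # IPADIC puts the katakana reading in the 8th field; unknown words omit it.
        reading = kata_to_hira(fields[7]) if len(fields) > 7 else ""
        tokens.append(Token(surface, pos, reading))
    return tokens


def tokenize(text: str) -> list[Token]:
    """Run MeCab on text (requires mecab-python3 and a dictionary)."""
    import MeCab  # imported lazily so the parser above works without MeCab
    return parse_mecab_output(MeCab.Tagger().parse(text))
```

Keeping the parser separate from the MeCab call means it can be exercised on a saved output string, without MeCab installed.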
📦 Installation
Install from PyPI (Recommended)
The easiest way to install KanaSplit is via pip:
pip install kanasplit
Install from Source
Alternatively, you can clone the repository and install it manually:
git clone https://github.com/byteMe394/KanaSplit.git
cd KanaSplit
pip install -r requirements.txt
🖥 OS-Specific Installation Instructions
Windows
Simply run:
pip install kanasplit
macOS
macOS users need to install MeCab manually before installing KanaSplit. The first command below installs Homebrew; skip it if Homebrew is already installed:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install mecab mecab-ipadic
pip install mecab-python3
pip install kanasplit
GNU/Linux (Debian/Ubuntu-based)
For Debian-based Linux distributions, install MeCab before installing KanaSplit:
sudo apt update
sudo apt install mecab mecab-ipadic-utf8
pip install mecab-python3
pip install kanasplit
For other distributions, install MeCab and a MeCab dictionary with your distribution's package manager, then run the same two pip commands.
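After following the steps for your platform, a quick check like the one below can confirm that MeCab is reachable. This is a hedged sketch using only the standard library; `check_mecab` is a hypothetical helper and not part of KanaSplit.

```python
import importlib.util
import shutil


def check_mecab() -> dict[str, bool]:
    """Report whether the MeCab binary and its Python binding can be found."""
    return {
        # `mecab` executable on PATH (installed by brew or apt above)
        "mecab_binary": shutil.which("mecab") is not None,
        # `MeCab` module provided by the mecab-python3 package
        "python_binding": importlib.util.find_spec("MeCab") is not None,
    }


if __name__ == "__main__":
    for name, found in check_mecab().items():
        print(f"{name}: {'found' if found else 'missing'}")
```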
🎮 Usage
Command Line Interface (CLI)
You can tokenize Japanese text directly from the terminal:
kanasplit-cli "こんにちは世界"
Example Output:
Tokenized Text:
- こんにちは (Interjection)
- 世界 (Noun)
Fetching meanings from Jisho.org...
Word: こんにちは - Reading: こんにちは - Meanings: hello, good day
Word: 世界 - Reading: せかい - Meanings: world, society, universe
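The lookup step in the output above can be approximated as follows. This is a sketch under assumptions, not KanaSplit's implementation: it uses Jisho's unofficial search endpoint, the standard library's `urllib` in place of `requests`, and hypothetical helper names (`build_url`, `summarize`, `format_entry`, `lookup`). The response fields (`data`, `japanese`, `senses`, `english_definitions`) reflect the JSON that endpoint has returned in practice and may change.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Jisho's public search endpoint; unofficial and not guaranteed to be stable.
JISHO_URL = "https://jisho.org/api/v1/search/words"


def build_url(word: str) -> str:
    """Build the search URL, percent-encoding the Japanese keyword."""
    return JISHO_URL + "?" + urllib.parse.urlencode({"keyword": word})


def summarize(payload: dict, max_meanings: int = 3) -> Optional[dict]:
    """Pick word, reading, and a few meanings from the first search result."""
    results = payload.get("data") or []
    if not results:
        return None
    first = results[0]
    japanese = (first.get("japanese") or [{}])[0]
    meanings = []
    for sense in first.get("senses", []):
        meanings.extend(sense.get("english_definitions", []))
    reading = japanese.get("reading", "")
    return {
        "word": japanese.get("word", reading),  # kana-only entries have no "word"
        "reading": reading,
        "meanings": meanings[:max_meanings],
    }


def format_entry(entry: dict) -> str:
    """Render one entry in the style of the example output."""
    return (
        f"Word: {entry['word']} - Reading: {entry['reading']} - "
        f"Meanings: {', '.join(entry['meanings'])}"
    )


def lookup(word: str, timeout: float = 10.0) -> Optional[dict]:
    """Fetch and summarize one word (performs a network request)."""
    with urllib.request.urlopen(build_url(word), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Separating `summarize` from `lookup` keeps the parsing logic testable against a saved response, with no network access.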
Graphical User Interface (GUI)
KanaSplit also includes a Tkinter-based GUI for users who prefer a graphical interface.
To launch the GUI, run the following from the directory that contains GUI.py (for example, a clone of the repository):
python GUI.py
This will open a window where you can enter Japanese text, process it, and view tokenized results.
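For readers curious how such a window fits together, the sketch below shows one possible Tkinter layout: a text entry, a button, and a results box. It is illustrative only and does not reproduce GUI.py; `format_tokens`, `build_window`, and the `process` callback are hypothetical names.

```python
from typing import Callable


def format_tokens(tokens: list[tuple[str, str]]) -> str:
    """Render (surface, pos) pairs as the bulleted lines shown in the CLI output."""
    return "\n".join(f"- {surface} ({pos})" for surface, pos in tokens)


def build_window(process: Callable[[str], list[tuple[str, str]]]):
    """Create the window; `process` is a hypothetical tokenizer callback."""
    import tkinter as tk  # imported here so format_tokens works without a display

    root = tk.Tk()
    root.title("KanaSplit sketch")

    entry = tk.Entry(root, width=40)
    entry.pack(padx=8, pady=8)

    output = tk.Text(root, width=40, height=10)

    def on_click() -> None:
        # Replace the previous results with the newly tokenized text.
        output.delete("1.0", tk.END)
        output.insert(tk.END, format_tokens(process(entry.get())))

    tk.Button(root, text="Tokenize", command=on_click).pack(pady=4)
    output.pack(padx=8, pady=8)
    return root


# To try it with a placeholder callback that returns the input as one token:
# build_window(lambda text: [(text, "Unknown")]).mainloop()
```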
🛠 Dependencies
KanaSplit requires the following dependencies, which pip installs automatically (on macOS and Linux, install the MeCab system packages first, as described above):
- ratelimit
- MeCab
- requests
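Note: Tkinter is part of the Python standard library and is not installed through pip. Some Linux distributions package it separately (for example, python3-tk on Debian/Ubuntu).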
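The ratelimit dependency suggests that calls to Jisho.org are paced, although this README does not say how. As an illustration of the idea only, here is a standard-library throttle that enforces a minimum gap between calls. It is a stand-in, not the ratelimit package's API, and `Throttle` is a hypothetical name.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` before each request keeps requests at least `min_interval` seconds apart. The injectable `clock` and `sleep` arguments exist so the behavior can be checked without real delays.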
🤝 Contributing
If you'd like to contribute, clone the repository and submit a pull request! You can install additional development tools with:
pip install -r requirements.txt
📜 License
This project is licensed under the MIT License.
📬 Contact
For questions or support, feel free to reach out:
- GitHub: byteMe394
- Email: joseantonio_tf@outlook.com
🎌 Happy Tokenizing! 🎌
Download files
File details
Details for the file kanasplit-1.0.2.tar.gz.
File metadata
- Download URL: kanasplit-1.0.2.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f84bb7b3ae586e7329fb61a9b4fa3ae05dee2912f03d0a1c0eab2845e917076a |
| MD5 | 4e99397affa3d601288a939c044d8349 |
| BLAKE2b-256 | 3f2b6c310e90661a703c8f1b8e151dc1437e92e0c42703f87338b87c5b11c721 |
File details
Details for the file kanasplit-1.0.2-py3-none-any.whl.
File metadata
- Download URL: kanasplit-1.0.2-py3-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c47556e5685b48ec3055c3bf62967efd881f82bd50631bd5e5267224226d6a9 |
| MD5 | a650e2697539ce50ec5aa6fe3e75aed7 |
| BLAKE2b-256 | ac205ca9254920b81d280ddfb585c6f236b3e591a4d749946adc618372668c01 |