A tool for normalizing Korean text
Korean Text Normalizer
Korean Text Normalizer is a Python package for normalizing Korean text. It provides functions for cleaning up informal Korean text data, such as expanding slang abbreviations, normalizing emoticons, and fixing sentence boundaries.
Features
- Expand common Korean abbreviations
- Perform basic spell checking
- Normalize emoticons
- Detect and correct sentence boundaries
- Decompose Hangul syllables into jamo (the individual consonant and vowel letters) and recombine jamo into syllables
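The jamo feature relies on the fact that Unicode encodes every modern Hangul syllable (U+AC00 to U+D7A3) arithmetically from its leading consonant, vowel, and optional trailing consonant. The following is a minimal standalone sketch of that decomposition and recomposition, not the package's own implementation; the function names decompose and compose are hypothetical. It assumes output in conjoining jamo (the U+1100 block); the package may instead work with compatibility jamo such as ㅎ (U+314E), which this sketch does not handle.

```python
# Hangul syllable arithmetic as defined in the Unicode Standard:
# syllable = S_BASE + (L * V_COUNT + V) * T_COUNT + T
S_BASE = 0xAC00  # first precomposed syllable, '가'
L_BASE = 0x1100  # first leading consonant (choseong)
V_BASE = 0x1161  # first vowel (jungseong)
T_BASE = 0x11A7  # one before the first trailing consonant (jongseong)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # syllables per leading consonant (588)
S_COUNT = L_COUNT * N_COUNT  # total precomposed syllables (11172)


def decompose(text: str) -> str:
    """Split each precomposed Hangul syllable into conjoining jamo."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))
            t_index = s_index % T_COUNT
            if t_index:  # 0 means the syllable has no trailing consonant
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)  # leave non-Hangul characters untouched
    return "".join(out)


def compose(text: str) -> str:
    """Recombine runs of conjoining jamo into precomposed syllables."""
    out = []
    for ch in text:
        code = ord(ch)
        if out:
            last = ord(out[-1])
            # Leading consonant followed by a vowel -> LV syllable.
            l_index = last - L_BASE
            v_index = code - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # LV syllable (no trailing consonant yet) followed by a
            # trailing consonant -> LVT syllable.
            s_index = last - S_BASE
            t_index = code - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)
```

For Hangul syllables this arithmetic matches Unicode canonical decomposition, so Python's unicodedata.normalize("NFD", ...) and unicodedata.normalize("NFC", ...) give the same results; a production implementation might simply call those.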
Installation
You can install the package using pip:
pip install korean-text-normalizer
Usage
Here's a basic example of how to use the Korean Text Normalizer:
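from korean_text_normalizer import KoreanTextNormalizer
normalizer = KoreanTextNormalizer()
# Informal input: slang abbreviations (ㅎㅇ "hi", ㄱㅅ "thanks"), an emoticon (^_^),
# and a final sentence with no closing punctuation.
# English: "Hi! The weather is nice today, thanks ^_^ I hope it's nice tomorrow too"
text = "ㅎㅇ! 오늘 날씨가 좋네요ㄱㅅ ^_^ 내일도 날씨가 좋았으면"
normalized_text = normalizer.normalize(text)
print(normalized_text)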
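The package's internal normalization rules are not documented on this page. As a rough illustration of what some of the listed features involve, here is a minimal standalone sketch that applies three of them to the sample sentence: abbreviation expansion, emoticon normalization, and closing an unterminated final sentence. The abbreviation table, the emoticon pattern, the <emoticon> placeholder token, and the function name normalize_sketch are all hypothetical assumptions, not part of korean_text_normalizer.

```python
import re

# Hypothetical abbreviation table (not the package's): slang written as
# initial consonants, mapped to its expanded form.
ABBREVIATIONS = {
    "ㅎㅇ": "하이",  # "hi"
    "ㄱㅅ": "감사",  # "thanks"
}
# Try longer keys first so overlapping abbreviations prefer the longer match.
_ABBREV_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
)

# Hypothetical emoticon pattern: ^_^ / ^^ style faces and runs of ㅠ or ㅜ.
_EMOTICON_RE = re.compile(r"\^_*\^|[ㅠㅜ]{2,}")


def normalize_sketch(text: str, emoticon_token: str = "<emoticon>") -> str:
    # 1. Expand known abbreviations, even when attached to other words.
    text = _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(0)], text)
    # 2. Replace every emoticon with a single placeholder token.
    text = _EMOTICON_RE.sub(emoticon_token, text)
    # 3. Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # 4. Close the final sentence if it lacks terminal punctuation.
    if text and text[-1] not in ".!?":
        text += "."
    return text


sample = "ㅎㅇ! 오늘 날씨가 좋네요ㄱㅅ ^_^ 내일도 날씨가 좋았으면"
print(normalize_sketch(sample))
```

Note that this sketch leaves 좋네요감사 as one run: finding a sentence boundary inside unpunctuated text takes more than a trailing-punctuation rule.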
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Download files

Source distribution: korean-text-normalizer-0.1.1.tar.gz

Algorithm | Hash digest
---|---
SHA256 | b35e1d2e98128584cda1c1390b3b05d70416f6625740e81bf6cb0a08ee561e6e
MD5 | fc1695969dfb3fb36c77f710c61ffad3
BLAKE2b-256 | aac64c9feff94806f4a6f5cc751e2c25329dd6d605a289bc94a876d35793fbcb

Built distribution: korean_text_normalizer-0.1.1-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 11bdef3b67145e596d717bde5fb0a7f27e6b97a085cdfe6a61012eb5d77c59c3
MD5 | 077210388ea34a2445376e0362f25b20
BLAKE2b-256 | 32370c9486ca934be0eb35e6d46de5af573115fe783b16d847b424456a8178fc
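The digests listed for each file can be used to check a downloaded archive before installing it. A minimal sketch using only the standard library follows; the helper names digest_chunks, digest_file, and verify_file are hypothetical, and BLAKE2b-256 is assumed to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable

# Map the algorithm labels used in the hash tables to hashlib constructors.
HASHERS: Dict[str, Callable[[], Any]] = {
    "SHA256": hashlib.sha256,
    "MD5": hashlib.md5,
    "BLAKE2b-256": lambda: hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
}


def digest_chunks(chunks: Iterable[bytes], algorithm: str = "SHA256") -> str:
    """Hash a sequence of byte chunks and return the hex digest."""
    hasher = HASHERS[algorithm]()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def digest_file(path: str, algorithm: str = "SHA256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""), algorithm)


def verify_file(path: str, expected_hex: str, algorithm: str = "SHA256") -> bool:
    """Compare a file's digest with a published value (case-insensitive)."""
    return digest_file(path, algorithm) == expected_hex.strip().lower()
```

For example, assuming the wheel was saved to the current directory, verify_file("korean_text_normalizer-0.1.1-py3-none-any.whl", "11bdef3b67145e596d717bde5fb0a7f27e6b97a085cdfe6a61012eb5d77c59c3") should return True for an intact download. Alternatively, pip's hash-checking mode (--require-hashes with --hash entries in a requirements file) performs the same check at install time.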