A tool for normalizing Korean text
Korean Text Normalizer
Korean Text Normalizer is a Python package for normalizing Korean text. It provides various functions to process and clean up Korean text data.
Features
- Expand common Korean abbreviations
- Perform basic spell checking
- Normalize emoticons
- Detect and correct sentence boundaries
- Decompose Hangul syllables into jamo (the individual consonant and vowel letters) and combine jamo back into syllables
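The jamo feature listed above can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3). This is a minimal sketch, not the package's implementation: the function names `decompose` and `compose` are hypothetical, and the sketch emits conjoining jamo from the U+1100 block, whereas the package may use a different representation.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # initials, vowels, finals (incl. "none")


def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        return syllable  # not a Hangul syllable: return unchanged
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail index 0 means "no final consonant"
        jamo += chr(T_BASE + tail)
    return jamo


def compose(jamo: str) -> str:
    """Combine two or three conjoining jamo into one Hangul syllable."""
    lead = ord(jamo[0]) - L_BASE
    vowel = ord(jamo[1]) - V_BASE
    tail = ord(jamo[2]) - T_BASE if len(jamo) == 3 else 0
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)


# Cross-check against the standard library's canonical decomposition (NFD).
print(decompose("한") == unicodedata.normalize("NFD", "한"))
print(compose(decompose("한")) == "한")
```

Both checks compare the hand-written arithmetic with `unicodedata.normalize`, which is usually the simpler choice in practice when conjoining jamo are acceptable.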
Installation
You can install the package using pip:
pip install korean-text-normalizer
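To confirm that the installation succeeded, the installed version can be read from the package metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version if the package is installed, otherwise None.
print(installed_version("korean-text-normalizer"))
```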
Usage
Here's a basic example of how to use the Korean Text Normalizer:
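from korean_text_normalizer import KoreanTextNormalizer

# Create a normalizer instance.
normalizer = KoreanTextNormalizer()
# Sample input with chat abbreviations (ㅎㅇ, ㄱㅅ), an emoticon (^_^), and an unfinished sentence.
text = "ㅎㅇ! 오늘 날씨가 좋네요ㄱㅅ ^_^ 내일도 날씨가 좋았으면"
# Run the normalizer on the text and print the result.
normalized_text = normalizer.normalize(text)
print(normalized_text)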
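The sample text above contains the chat abbreviations ㅎㅇ and ㄱㅅ. The source does not show the normalizer's output, so the following is only an independent sketch of what dictionary-based abbreviation expansion can look like. The mapping table and the function name `expand_abbreviations` are assumptions made for illustration and are not part of the package's API.

```python
import re

# Hypothetical mapping of chat abbreviations to full forms (illustrative only;
# not taken from korean-text-normalizer).
ABBREVIATIONS = {
    "ㅎㅇ": "하이",        # "hi"
    "ㄱㅅ": "감사합니다",  # "thank you"
}

# Try longer keys first so overlapping abbreviations resolve predictably.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(ABBREVIATIONS, key=len, reverse=True))
)


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its full form."""
    return _PATTERN.sub(lambda match: ABBREVIATIONS[match.group(0)], text)


print(expand_abbreviations("ㅎㅇ! 오늘 날씨가 좋네요ㄱㅅ"))
```

A plain lookup table like this cannot tell an abbreviation from the same jamo used for another purpose, so a real normalizer would likely need context-aware rules.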
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Hashes for korean-text-normalizer-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 06b40dc01c97179fa1dcfc30ec08c9dfacd17024f72d37b932b8871649260944
MD5 | fb66f4610a5cc143a89f6f7a08897d57
BLAKE2b-256 | 861c4fb03f3c27a277435e45b62f52c0fa50877b228e3474a1b00bea58bf2ca6
Hashes for korean_text_normalizer-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | fe7b98ec1786f9e5a1ca45e7f2d1c5cc1cfd3d60d5e394a194d9697a833f9424
MD5 | b4cb886c1ea547e5bc91aa0de2d0fccb
BLAKE2b-256 | eab1833c44ca373071185f31e5a0e458547738b290f063153b745cb3a0143385