High-performance text processing library for the Thai language, built with Rust
Thongna 🌾
Thongna (ท้องนา) is a high-performance text processing library for the Thai language, built with Rust and exposed as a Python package. It is aimed at developers who need fast Thai text processing, such as normalization and character replacement, inside their Python applications.
Features
- Efficient Thai text normalization: Clean and standardize Thai text by removing or replacing special characters, whitespace, and more.
- Fast and reliable: The core is written in Rust, so the processing runs as compiled native code, which suits large-scale text processing.
- Python integration: Easily use Thongna in your Python projects with its simple and intuitive API.
Installation
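To install Thongna, ensure you have Python installed, then use pip. A Rust toolchain is generally needed only when pip has to build from the source distribution because no prebuilt wheel matches your platform: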
pip install thongna
Usage
Here's a quick example of how to use Thongna for basic text processing:
import thongna
# Example text
thai_text = "สวัสดีค่ะ! นี่คือทดสอบการใช้งาน Thongna 🌾"
# Normalize the text
normalized_text = thongna.normalize_text(thai_text)
print("Normalized Text:", normalized_text)
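The README does not spell out which steps normalize_text performs. As a rough mental model only, here is a pure-Python sketch of what Thai text normalization commonly involves: Unicode NFC normalization, stripping zero-width characters, and collapsing whitespace. The function name normalize_sketch and the choice of steps are assumptions for illustration. This is not Thongna's implementation and its output may differ from thongna.normalize_text.

```python
import re
import unicodedata

# Zero-width characters that often end up in copied text.
# Mapping each code point to None makes str.translate delete it.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def normalize_sketch(text: str) -> str:
    """Hypothetical stand-in for illustration; NOT thongna.normalize_text."""
    # 1. Bring the text to a canonical Unicode form.
    text = unicodedata.normalize("NFC", text)
    # 2. Drop zero-width characters (an assumption: some pipelines keep
    #    U+200B because it can mark Thai word boundaries).
    text = text.translate(_ZERO_WIDTH)
    # 3. Collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    # 4. Trim leading and trailing whitespace.
    return text.strip()


if __name__ == "__main__":
    print(normalize_sketch("  สวัสดี\u200b   ค่ะ \n"))
```

If your pipeline depends on a specific step, such as keeping zero-width spaces, check the actual output of thongna.normalize_text on a sample rather than relying on this sketch.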
Functions
- normalize_text(text: str) -> str: Normalize Thai text by cleaning up unwanted characters and ensuring consistent formatting.
- replace_characters(text: str, replacements: dict) -> str: Replace specific characters in the text based on a given dictionary of replacements.
- More features to come...
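To illustrate the kind of mapping that replace_characters(text, replacements) describes, the sketch below reproduces the idea in plain Python with str.translate. The helper replace_characters_sketch and the THAI_TO_ARABIC_DIGITS table are hypothetical names introduced here. Whether Thongna's own function accepts multi-character keys, or how it handles edge cases, is not stated in this README.

```python
def replace_characters_sketch(text: str, replacements: dict) -> str:
    """Hypothetical stand-in for illustration; NOT Thongna's implementation."""
    # str.maketrans requires every key to be a single character;
    # values may be strings of any length (an empty string deletes the key).
    table = str.maketrans(replacements)
    return text.translate(table)


# Example mapping: Thai digits to ASCII digits.
THAI_TO_ARABIC_DIGITS = dict(zip("๐๑๒๓๔๕๖๗๘๙", "0123456789"))

if __name__ == "__main__":
    print(replace_characters_sketch("ปี ๒๕๖๗", THAI_TO_ARABIC_DIGITS))
    print(replace_characters_sketch("สวัสดีค่ะ!", {"!": ""}))
```

With the library installed, the equivalent call would presumably be thongna.replace_characters(text, THAI_TO_ARABIC_DIGITS), following the signature listed above.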
Why Thongna? 🌾
The name "Thongna" (ท้องนา) means "rice field" in Thai, symbolizing growth, nourishment, and the foundational aspects of life. Just like a rice field sustains life, Thongna provides the essential tools for working with Thai text, ensuring that your applications can grow and thrive.
Contributing
We welcome contributions from the community! If you’d like to contribute to Thongna, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a clear explanation of your changes.
License
Thongna is licensed under the MIT License. See the LICENSE file for more details.
Contact
For any questions, suggestions, or issues, feel free to open an issue or contact the maintainers directly.
Built Distribution
Hashes for thongna-0.1.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | 9b16059056886dbcb66a31ced13fbee2b61fc7afb97da1d543e238e8099bb5e9
MD5 | 86bdf7ff6d8ce1b4235c3aca90e980d1
BLAKE2b-256 | 43ba5cba10e1f24b6417b6bf658c1d28717661cff63b70ace57eb3097eeeb83b