A Turkish syllable splitter implemented in C with Python bindings
Turkish Syllable Splitter
turkish-syllable is a library for syllabifying Turkish text, written in C and accessible through Python bindings. It is fast and efficient, produces results that follow Turkish syllabification rules, and can optionally include punctuation in the output.
Important Note: This library splits words of Turkish origin into syllables according to the rules of the Turkish Language Association (TDK), but it does not provide a definitive solution for words of foreign origin. Such words are often split correctly, but incorrect splits can occur because their structure differs from that of native Turkish words.
Features
- Turkish Syllabification: Follows the syllabification rules specific to Turkish (for example, “merhaba” → ['mer', 'ha', 'ba']).
- Punctuation Support: Optionally includes punctuation marks and spaces in the syllable list (with_punctuation parameter).
- Fast Performance: The C-based algorithm stays fast even on large texts.
- Platform Compatibility: Works on Linux-based systems (manylinux compatible).
Installation
You can install it via PyPI:
pip install turkish-syllable
Sample Usage
Using from Python:
from turkish_syllable import syllabify
# with punctuation
result = syllabify("Merhaba, dünya!") # default value of with_punctuation is True
print(result)
# output: ['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!']
# without punctuation
result = syllabify("Merhaba, dünya!", with_punctuation=False)
print(result)
# output: ['Mer', 'ha', 'ba', 'dün', 'ya']
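If you need hyphenated text rather than a token list, the output of syllabify with with_punctuation=True (the default) can be stitched back together. The helper below, hyphenate, is hypothetical and not part of turkish-syllable; it assumes, as in the sample output above, that every token is either an alphabetic syllable or a punctuation/whitespace token, so two alphabetic tokens in a row always belong to the same word.

```python
from typing import List


def hyphenate(tokens: List[str]) -> str:
    """Join syllable tokens into text, inserting '-' inside words.

    Hypothetical helper: assumes `tokens` comes from
    syllabify(text, with_punctuation=True), where words are separated
    by space or punctuation tokens.
    """
    parts = []
    previous = ""
    for token in tokens:
        # Two alphabetic tokens in a row are syllables of one word.
        if previous.isalpha() and token.isalpha():
            parts.append("-")
        parts.append(token)
        previous = token
    return "".join(parts)


# Token list copied from the sample output above.
tokens = ['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!']
print(hyphenate(tokens))  # Mer-ha-ba, dün-ya!
```

With the package installed, hyphenate(syllabify(text)) would apply it directly. It should not be used with with_punctuation=False, because without the space tokens the syllables of neighbouring words would be joined into one hyphenated run.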
Using from the command line:
# with punctuation (default)
python -m turkish_syllable -i input.txt -o output.txt -p
# or enter the text directly:
python -m turkish_syllable -p
# sample input: "Merhaba, dünya!"
# output: Mer ha ba , dün ya !
# without punctuation
python -m turkish_syllable -i input.txt -o output.txt --no-punctuation
# or:
python -m turkish_syllable --no-punctuation
# sample input: "Merhaba, dünya!"
# output: Mer ha ba dün ya
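The CLI sample outputs above look like the Python token lists with whitespace tokens dropped and the remaining tokens joined by single spaces. The function below reproduces that formatting from a token list; format_like_cli is a hypothetical name, and the rule it applies is inferred from the two samples, not taken from the package's CLI source.

```python
from typing import List


def format_like_cli(tokens: List[str]) -> str:
    """Render a syllable token list the way the CLI samples appear.

    Assumption (inferred from the sample outputs, not the package
    source): whitespace tokens are dropped and the remaining tokens
    are joined with single spaces.
    """
    return " ".join(token for token in tokens if not token.isspace())


with_punct = ['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!']
without_punct = ['Mer', 'ha', 'ba', 'dün', 'ya']
print(format_like_cli(with_punct))     # Mer ha ba , dün ya !
print(format_like_cli(without_punct))  # Mer ha ba dün ya
```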
Technical Details
- Language: The algorithm is written in C and exposed to Python through ctypes.
- Syllabification Algorithm: Determines syllable boundaries from the pattern of vowels and consonants, following Turkish syllabification rules. It includes optimizations for special cases (for example, words of 3 or 4 letters).
- Dependencies: No third-party Python packages are required; only the standard library is used.
- File Structure:
  - syllable.c: C source code containing the syllabification logic.
  - libsyllable.so: Compiled shared library.
  - csyllable_en.py: Python wrapper that loads the shared library via ctypes.
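The vowel/consonant rule described above can be illustrated in pure Python. This is a minimal sketch, not the library's C implementation: it applies the basic Turkish rule that every syllable contains exactly one vowel and that, between two vowels, only the last consonant moves to the next syllable (V-CV, VC-CV, VCC-CV, as in o-kul, mer-ha-ba, kork-mak). The names split_word and syllabify_sketch and the tokenization regex are assumptions for illustration, and the sketch does not reproduce the library's special-case optimizations. Like any purely rule-based splitter, it targets native Turkish words and may misplace breaks in loanwords with consonant clusters.

```python
import re
from typing import List

# Turkish vowels, including circumflexed forms used in some loanwords.
VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")

# A run of letters, a run of digits, or any other single character.
TOKEN = re.compile(r"[^\W\d_]+|\d+|\S|\s")


def split_word(word: str) -> List[str]:
    """Split one word into syllables using the basic Turkish rule."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) < 2:
        # Zero or one vowel: the whole word is a single syllable.
        return [word] if word else []
    syllables = []
    start = 0
    for left, right in zip(vowel_positions, vowel_positions[1:]):
        if right - left == 1:
            cut = right  # adjacent vowels (sa-at): break between them
        else:
            cut = right - 1  # last consonant opens the next syllable
        syllables.append(word[start:cut])
        start = cut
    syllables.append(word[start:])
    return syllables


def syllabify_sketch(text: str, with_punctuation: bool = True) -> List[str]:
    """Hypothetical stand-in for syllabify(), for illustration only."""
    result = []
    for token in TOKEN.findall(text):
        if token.isalpha():
            result.extend(split_word(token))
        elif with_punctuation:
            result.append(token)  # punctuation, spaces and digits kept as-is
    return result


print(split_word("korkmak"))  # ['kork', 'mak']
print(syllabify_sketch("Merhaba, dünya!"))
print(syllabify_sketch("Merhaba, dünya!", with_punctuation=False))
```

On the README's example input "Merhaba, dünya!", syllabify_sketch produces the same token lists as the documented output of syllabify, with and without punctuation.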
Requirements
- Python 3.6 or higher
- Linux operating system (manylinux-compatible build)
License
Distributed under the MIT License.
Contribution
If you want to contribute:
- Fork the repository on GitHub.
- Make your changes and submit a pull request.
Contact
For questions or suggestions: ahmetozdemiir.ao@gmail.com
Version History
- 0.1.1: Added the with_punctuation parameter; shortened the function name to syllabify.
- 0.1.0: Initial release.
Download files
File details
Details for the file turkish_syllable-0.1.2.tar.gz.
File metadata
- Download URL: turkish_syllable-0.1.2.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc0893171f69a0b6cc144e42544f4e834bc33807fa5695d27a86894fdec8bf8f |
| MD5 | 2686aff9757d3c5d418d050ae1ed00f5 |
| BLAKE2b-256 | 0b309ea2ba37e98f522e1976703836f7738db6dfa57ddd6fdc087fcc8d7e0843 |
File details
Details for the file turkish_syllable-0.1.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: turkish_syllable-0.1.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 21.6 kB
- Tags: CPython 3.10, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a021db1be5b08140bb8cef824698e4be1c44348abf5b7ebd2c10567432b2cd9 |
| MD5 | a9ef99749017892118ea063c5f5d4666 |
| BLAKE2b-256 | 07beb2aa25d8822c4e3c1a4d68036cb210367ac561beb972a218484eb43b4565 |