
A Turkish syllable splitter implemented in C with Python bindings


Turkish Syllable Splitter

turkish-syllable is a library for syllabifying Turkish text. Its core is written in C and exposed to Python through bindings. It is fast, follows Turkish syllabification rules, and can optionally keep punctuation in its output.

Important Note: This library splits words of Turkish origin into syllables according to the rules of the Turkish Language Association (TDK), but it does not provide a definitive solution for words of foreign origin. Such words are often split correctly, but incorrect splits can occur because their structure (for example, consonant clusters) does not follow Turkish syllable patterns.

Features

  • Turkish Syllabification: Follows the syllabification rules of Turkish (for example, “merhaba” → ['mer', 'ha', 'ba']).
  • Punctuation Support: Optionally keeps punctuation marks and spaces as separate items in the output list (with_punctuation parameter).
  • Fast Performance: C-based algorithm provides fast results even for large texts.
  • Platform Compatibility: The library is platform independent as of version 0.2.0.
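The “merhaba” example reflects the basic rule for native Turkish words: every syllable contains exactly one vowel, and when consonants stand between two vowels, only the last of them starts the next syllable. The following is a minimal pure-Python sketch of that rule for a single word. It is not the library's C implementation, and the names VOWELS and syllabify_word are hypothetical.

```python
from typing import List

# Hypothetical vowel table: Turkish vowels in both cases, plus circumflexed forms.
VOWELS = frozenset("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def syllabify_word(word: str) -> List[str]:
    """Split one word into syllables using the basic native-Turkish rule (sketch).

    Each syllable carries exactly one vowel. Between two vowels, the last
    consonant (if any) starts the next syllable; the others stay behind.
    """
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_positions:
        # No vowel: nothing to split (empty input gives an empty list).
        return [word] if word else []
    syllables: List[str] = []
    start = 0
    for current, following in zip(vowel_positions, vowel_positions[1:]):
        if following - current == 1:
            boundary = following      # V-V: split directly between the vowels
        else:
            boundary = following - 1  # the last consonant moves to the next syllable
        syllables.append(word[start:boundary])
        start = boundary
    syllables.append(word[start:])
    return syllables
```

For example, syllabify_word("Türkçe") returns ['Türk', 'çe'] and syllabify_word("saat") returns ['sa', 'at']. A loanword such as "elektrik" comes out as ['e', 'lekt', 'rik'], which may not match its dictionary division; this mirrors the kind of foreign-origin case the Important Note warns about.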

Installation

You can install it via PyPI:

pip install turkish-syllable

Sample Usage

Usage from Python:

from turkish_syllable import syllabify

# with punctuation
result = syllabify("Merhaba, dünya!") # default value of with_punctuation is True
print(result)
# output: ['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!']

# without punctuation
result = syllabify("Merhaba, dünya!", with_punctuation=False)
print(result)
# output: ['Mer', 'ha', 'ba', 'dün', 'ya']
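The effect of with_punctuation can be modeled as a tokenization step: runs of word characters are syllabified, and every other character (a space or a punctuation mark) is either kept as its own item or dropped. The sketch below is an assumption about the behavior implied by the outputs above, not the library's code. split_text is a hypothetical name, \w (Unicode-aware in Python 3) stands in for "letters", and the per-word splitter is passed in as a callable so the example stays self-contained.

```python
import re
from typing import Callable, List

# Group 1: a run of word characters; group 2: any single other character.
_TOKEN = re.compile(r"(\w+)|(\W)")


def split_text(text: str,
               split_word: Callable[[str], List[str]],
               with_punctuation: bool = True) -> List[str]:
    """Illustrative model of the with_punctuation switch (hypothetical helper)."""
    result: List[str] = []
    for match in _TOKEN.finditer(text):
        word, other = match.group(1), match.group(2)
        if word is not None:
            result.extend(split_word(word))  # syllables of this word
        elif with_punctuation:
            result.append(other)             # keep space/punctuation as its own item
    return result
```

Passing a lookup such as {"Merhaba": ["Mer", "ha", "ba"], "dünya": ["dün", "ya"]}.__getitem__ as split_word reproduces both outputs shown above.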

To syllabify a file directly:

from turkish_syllable.csyllable_tr import process_input_output

input_file = "input.txt"
output_file = "output.txt"

# process_input_output: syllabifies the contents of a text file
#   input_file:       path of the file containing the text to syllabify
#   output_file:      path of the file the syllabified text is written to
#   with_punctuation: keep punctuation and space characters in the output (default: True)
process_input_output(input_file=input_file, output_file=output_file, with_punctuation=True)

with open(output_file, "r", encoding="utf-8") as f:
    print("With punctuation:")
    print(f.read())

process_input_output(input_file=input_file, output_file=output_file, with_punctuation=False)

with open(output_file, "r", encoding="utf-8") as f:
    print("\nWithout punctuation:")
    print(f.read())

Usage from the command line:

# with punctuation (default)
python3 -m turkish_syllable -i input.txt -o output.txt -p
# or enter the text directly:
python3 -m turkish_syllable -p
# sample input: "Merhaba, dünya!"
# output: Mer ha ba ,   dün ya !

# without punctuation
python3 -m turkish_syllable -i input.txt -o output.txt --no-punctuation
# or:
python3 -m turkish_syllable --no-punctuation
# sample input: "Merhaba, dünya!"
# output: Mer ha ba dün ya
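The command-line outputs above are consistent with printing the same token lists that syllabify() returns, joined by single spaces. Under that reading, the three spaces after the comma are the separator, the space token itself, and the next separator. A small check of that assumption (format_tokens is a hypothetical helper, not part of the package):

```python
from typing import List


def format_tokens(tokens: List[str]) -> str:
    """Join syllable and punctuation tokens with single spaces (assumed CLI format)."""
    return " ".join(tokens)


# Token lists taken from the Python examples above.
with_punct = format_tokens(['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!'])
without_punct = format_tokens(['Mer', 'ha', 'ba', 'dün', 'ya'])
print(repr(with_punct))     # 'Mer ha ba ,   dün ya !'
print(repr(without_punct))  # 'Mer ha ba dün ya'
```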

Technical Details

  • Language: The algorithm is written in C and linked to Python with ctypes.
  • Syllabification Algorithm: Syllable boundaries are placed according to the sequence of vowels and consonants, following Turkish syllabification rules. Special cases (for example, words of 3 or 4 letters) have dedicated handling.
  • Dependencies: No third-party Python packages are required; only the standard library is used.
  • File Structure:
    • syllable.c: C source code containing the syllabification logic.
    • libsyllable.so: Compiled shared library (Linux, manylinux).
    • libsyllable.dll: Compiled shared library (Windows).
    • libsyllable.dylib: Compiled shared library (macOS).
    • csyllable_tr.py: Python wrapper that loads the shared library via ctypes.
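Because a separate compiled library ships for each operating system, a ctypes-based wrapper has to pick the right file at load time. The sketch below shows one way to do this with ctypes.CDLL and sys.platform. The directory layout and the function names shared_library_name and load_syllable_library are assumptions, and the exported C functions are not documented in this README, so the sketch stops after loading the library.

```python
import ctypes
import sys
from pathlib import Path


def shared_library_name(platform_id: str = sys.platform) -> str:
    """Pick the library file name for a platform (names from the file list above)."""
    if platform_id.startswith("win"):
        return "libsyllable.dll"
    if platform_id == "darwin":
        return "libsyllable.dylib"
    return "libsyllable.so"  # Linux and other POSIX systems


def load_syllable_library(directory: Path) -> ctypes.CDLL:
    """Load the shared library from a directory (hypothetical layout)."""
    return ctypes.CDLL(str(directory / shared_library_name()))
```

For example, shared_library_name("win32") returns "libsyllable.dll" and shared_library_name("linux") returns "libsyllable.so". A real wrapper would then set argtypes and restype for each exported C function before calling it.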

Requirements

  • Python 3.6 or higher
  • Linux, Windows, or macOS (a compiled shared library is provided for each).

License

This project is distributed under the MIT License.

Contribution

If you want to contribute:

  1. Fork the repository on GitHub.
  2. Make your changes and submit a pull request.

Contact

For questions or suggestions: ahmetozdemiir.ao@gmail.com

Version History

  • 0.2.1: Platform independence; README improvements.
  • 0.1.4: README improvements.
  • 0.1.3: README improvements and bug fixes.
  • 0.1.2: Bug fixes.
  • 0.1.1: Added the with_punctuation parameter; shortened the function name to syllabify.
  • 0.1.0: Initial release.



Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


  • turkish_syllable-0.2.29-cp310-cp310-win_amd64.whl (14.2 kB): CPython 3.10, Windows x86-64
  • turkish_syllable-0.2.29-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl (14.3 kB): CPython 3.10, manylinux (glibc 2.5+) x86-64
  • turkish_syllable-0.2.29-cp310-cp310-macosx_10_9_universal2.whl (13.0 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)

File details

Hashes for turkish_syllable-0.2.29-cp310-cp310-win_amd64.whl:

  • SHA256: 45393c64c5e5a480ef7cf23bede9475d2a9822cc48e4cb34ba8abc3e8d219f86
  • MD5: 0438b135c6368c07428deaf867dc4dd8
  • BLAKE2b-256: 676c12e8b08720bafbb40db0a4abb4b27d6d10669540a745e4fb69ede24dfa20

Hashes for turkish_syllable-0.2.29-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl:

  • SHA256: 2730863b1f1b7791ae2166daadea492c25b465087cda4e8a10808c6cc364fec6
  • MD5: 390bde72ee25b50a0cf9f4c4c1045f6d
  • BLAKE2b-256: cc0cac2004495a344bb0a7c9417a69d90e24ab077aa00cc8a55d5974b8a0c070

Hashes for turkish_syllable-0.2.29-cp310-cp310-macosx_10_9_universal2.whl:

  • SHA256: 00e678fbb50e4b7287484363b8cb4c0776482d7fea17c33dad43088168d0fdba
  • MD5: 0e62a64ba60f3cc706ebf249402bce51
  • BLAKE2b-256: 5330beae7dd634da37418d84e5c64f063578da5c4e66da38d7a9fb19f43cad8f
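To check a downloaded wheel against the SHA256 digests listed above, the file can be hashed with Python's standard hashlib module (pip's "pip hash" command reports the same SHA256 value). A minimal sketch; the helper names sha256_of and matches_published are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a published hex string (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For an intact download, matches_published(Path("turkish_syllable-0.2.29-cp310-cp310-win_amd64.whl"), "45393c64c5e5a480ef7cf23bede9475d2a9822cc48e4cb34ba8abc3e8d219f86") should return True.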
