Dizge: The grammar analyzer for Turkish
Table of Contents
1. Introduction
2. How to Use
   2.1. Installation and First Run
   2.2. competence Functions
   2.3. tools Functions
   2.4. Data Loading
   2.5. Analysis and Getting Result
3. Contact
Introduction
Dizge is an open-source Python project for linguistic analysis of Turkish data. The project consists of two modules:
- competence: It represents the linguistic knowledge of an ideal native speaker and is programmed according to object-oriented programming (OOP) principles.
- tools: It includes rule-based functions representing linguistic processes.
The current version (v0.1.x) is a preliminary release that includes only a few phonetic/phonological functions.
The theoretical background of Dizge draws on the following resources:
- Ergenç, İ., & Uzun, İ. P. (2020). Türkçenin ses dizgesi (2nd ed.). Seçkin Yayınevi.
- IPA Chart, http://www.internationalphoneticassociation.org/content/ipa-chart, available under a Creative Commons Attribution-Sharealike 3.0 Unported License. Copyright © 2018 International Phonetic Association.
You can cite our project in your publications:
Mutlu, M. U., Yetimaslan, N., & Atagün, İ. (2021). Dizge: The grammar analyzer for Turkish. https://github.com/dizge
Dizge is also available as a no-code web service at https://dizge.pythonanywhere.com.
How to Use
Installation and First Run
You can install the package on your system with pip, using the following command:
pip install dizge
Then you can import it in any Python script:
import dizge
competence Functions
Dizge contains all Turkish phonemes together with their linguistic features. For example, you can easily list the Turkish vowels and consonants with their features:
import dizge
for phoneme in dizge.vowels:  # vars() shows each phoneme object's features
    print(vars(phoneme))
for phoneme in dizge.consonants:
    print(vars(phoneme))
Similarly, you can check whether a phoneme has a given feature with the functions in competence. Pass the grapheme of the phoneme as a string to any of the following functions:
- isVowel(): If a phoneme is a vowel, it'll return True; otherwise, it'll return False.
- isUnrounded(): If a phoneme is an unrounded vowel, it'll return True; otherwise, it'll return False.
- isRounded(): If a phoneme is a rounded vowel, it'll return True; otherwise, it'll return False.
- isClose(): If a phoneme is a close vowel, it'll return True; otherwise, it'll return False.
- isCloseMid(): If a phoneme is a close-mid vowel, it'll return True; otherwise, it'll return False.
- isOpenMid(): If a phoneme is an open-mid vowel, it'll return True; otherwise, it'll return False.
- isOpen(): If a phoneme is an open vowel, it'll return True; otherwise, it'll return False.
- isFront(): If a phoneme is a front vowel, it'll return True; otherwise, it'll return False.
- isCentral(): If a phoneme is a central vowel, it'll return True; otherwise, it'll return False.
- isBack(): If a phoneme is a back vowel, it'll return True; otherwise, it'll return False.
- isConsonant(): If a phoneme is a consonant, it'll return True; otherwise, it'll return False.
- isPlosive(): If a phoneme is a plosive consonant, it'll return True; otherwise, it'll return False.
- isNasal(): If a phoneme is a nasal consonant, it'll return True; otherwise, it'll return False.
- isTrill(): If a phoneme is a trill consonant, it'll return True; otherwise, it'll return False.
- isTaporFlap(): If a phoneme is a tap or flap consonant, it'll return True; otherwise, it'll return False.
- isFricative(): If a phoneme is a fricative consonant, it'll return True; otherwise, it'll return False.
- isLateralFricative(): If a phoneme is a lateral fricative consonant, it'll return True; otherwise, it'll return False.
- isApproximant(): If a phoneme is an approximant consonant, it'll return True; otherwise, it'll return False.
- isLateralApproximant(): If a phoneme is a lateral approximant consonant, it'll return True; otherwise, it'll return False.
- isBilabial(): If a phoneme is a bilabial consonant, it'll return True; otherwise, it'll return False.
- isLabiodental(): If a phoneme is a labiodental consonant, it'll return True; otherwise, it'll return False.
- isDental(): If a phoneme is a dental consonant, it'll return True; otherwise, it'll return False.
- isAlveolar(): If a phoneme is an alveolar consonant, it'll return True; otherwise, it'll return False.
- isPostalveolar(): If a phoneme is a postalveolar consonant, it'll return True; otherwise, it'll return False.
- isRetroflex(): If a phoneme is a retroflex consonant, it'll return True; otherwise, it'll return False.
- isPalatal(): If a phoneme is a palatal consonant, it'll return True; otherwise, it'll return False.
- isVelar(): If a phoneme is a velar consonant, it'll return True; otherwise, it'll return False.
- isUvular(): If a phoneme is a uvular consonant, it'll return True; otherwise, it'll return False.
- isPharyngeal(): If a phoneme is a pharyngeal consonant, it'll return True; otherwise, it'll return False.
- isGlottal(): If a phoneme is a glottal consonant, it'll return True; otherwise, it'll return False.
- isVoiced(): If a phoneme is voiced, it'll return True; otherwise, it'll return False.
- isVoiceless(): If a phoneme is voiceless, it'll return True; otherwise, it'll return False.
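Functions like these amount to looking up a grapheme in a feature table. The following is a minimal, self-contained sketch of that idea, not Dizge's code: the table, is_vowel() and has_feature() are hypothetical, and the table uses the traditional front/back, rounded/unrounded and close classification of the eight Turkish vowel letters. Dizge follows IPA-based descriptions and may classify some vowels differently (its transcriptions use the central vowel ɨ for <ı>, for example).

```python
# Hypothetical feature table (not Dizge's data): traditional classification
# of the eight Turkish vowel letters.
VOWEL_FEATURES = {
    "a": {"back", "unrounded"},
    "e": {"front", "unrounded"},
    "ı": {"back", "unrounded", "close"},
    "i": {"front", "unrounded", "close"},
    "o": {"back", "rounded"},
    "ö": {"front", "rounded"},
    "u": {"back", "rounded", "close"},
    "ü": {"front", "rounded", "close"},
}


def is_vowel(grapheme: str) -> bool:
    """Return True if the grapheme is one of the vowel letters in the table."""
    return grapheme in VOWEL_FEATURES


def has_feature(grapheme: str, feature: str) -> bool:
    """Return True if the grapheme carries the feature; consonants carry none here."""
    return feature in VOWEL_FEATURES.get(grapheme, set())
```

For example, has_feature("ü", "rounded") is True, while has_feature("a", "front") and is_vowel("k") are False.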
tools Functions
This module contains functions for linguistic processes such as transcription (grapheme-to-phoneme, or G2P) and syllabification.
First, the softG() function shows the effects of the <ğ> grapheme, which has no phoneme value of its own, such as vowel shifting or lengthening:
>>> dizge.softG('dağ')
'daː'
>>> dizge.softG('göğüs')
'göːüs'
>>> dizge.softG('eğlence')
'eylence'
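The three examples above can be reproduced by a deliberately simplified rule: <ğ> after <e> and before a consonant (or at the end of the word) is written as the glide y, and after any other vowel it lengthens that vowel (ː). The sketch below implements only this toy rule; soft_g() is a hypothetical name, and Dizge's own rule set is not shown here and very likely covers more cases.

```python
VOWELS = frozenset("aeıioöuü")


def soft_g(word: str) -> str:
    """Toy approximation of the effects of <ğ>; not Dizge's softG() rules."""
    out = []
    for i, ch in enumerate(word):
        if ch != "ğ":
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if prev == "e" and nxt not in VOWELS:
            out.append("y")  # e + ğ before a consonant or word end -> glide
        elif prev in VOWELS:
            out.append("ː")  # otherwise lengthen the preceding vowel
        # <ğ> after a consonant or word-initially is simply dropped in this sketch
    return "".join(out)
```

On the three words above, soft_g() returns 'daː', 'göːüs' and 'eylence'.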
We provide two options for syllabification. The first is orthography-based analysis, the traditional method, which the syllable_o() function implements:
>>> dizge.syllable_o('afyonkarahisarlılaştıramadıklarımızdanmışçasına')
['af', 'yon', 'ka', 'ra', 'hi', 'sar', 'lı', 'laş', 'tı', 'ra', 'ma', 'dık', 'la', 'rı', 'mız', 'dan', 'mış', 'ça', 'sı', 'na']
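The traditional orthographic rule for Turkish is that every syllable contains exactly one vowel and, when consonants stand between two vowels, only the last of them starts the next syllable. A minimal sketch of that rule follows; syllabify_orthographic() is a hypothetical helper and not necessarily how syllable_o() works internally.

```python
from __future__ import annotations

VOWELS = frozenset("aeıioöuüâîû")


def syllabify_orthographic(word: str) -> list[str]:
    """Split a Turkish word into orthographic syllables (one vowel each)."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_positions:
        return [word]  # no vowel: nothing to split
    boundaries = []
    for prev_v, next_v in zip(vowel_positions, vowel_positions[1:]):
        if next_v - prev_v > 1:
            # consonant(s) in between: the last one joins the next syllable
            boundaries.append(next_v - 1)
        else:
            # two adjacent vowels: split directly between them
            boundaries.append(next_v)
    starts = [0] + boundaries
    ends = boundaries + [len(word)]
    return [word[s:e] for s, e in zip(starts, ends)]
```

On the example word above, this sketch returns the same list as syllable_o(), and syllabify_orthographic('saat') returns ['sa', 'at'].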
If you need to take phonetic realizations into account during syllabification, use the syllable_p() function instead. It returns each syllable's transcription together with its CV pattern:
>>> dizge.syllable_p('afyonkarahisarlılaştıramadıklarımızdanmışçasına')
[('ɑf', 'VC'), ('jɔŋ', 'CVC'), ('kɑ', 'CV'), ('ɾɑ', 'CV'), ('çI', 'CV'), ('sɑɾ', 'CVC'), ('łɨ', 'CV'), ('łɑʃ', 'CVC'), ('tɨ', 'CV'), ('ɾɑ', 'CV'), ('mɑ', 'CV'), ('dɨk', 'CVC'), ('łɑ', 'CV'), ('ɾɨ', 'CV'), ('mɨz', 'CVC'), ('dɑn', 'CVC'), ('mɨʃ', 'CVC'), ('tʃɑ', 'CV'), ('sɨ', 'CV'), ('nɑ', 'CV')]
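Each CV label is the syllable's skeleton, with V for a vowel and C for a consonant. For orthographic syllables, where each Turkish letter stands for one segment, the skeleton can be sketched by classifying letters, as in the hypothetical cv_pattern() below. Because syllable_p() labels the phonetic transcription, its labels may differ from this letter-based sketch for words with <ğ> or long vowels.

```python
VOWELS = frozenset("aeıioöuüâîû")


def cv_pattern(syllable: str) -> str:
    """Map each letter of an orthographic syllable to V (vowel) or C (consonant)."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable)
```

For example, [cv_pattern(s) for s in ['af', 'yon', 'ka']] gives ['VC', 'CVC', 'CV'].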
You can also count how many syllables of each CV pattern a word contains with the countSyllable() function:
>>> dizge.countSyllable('afyonkarahisarlılaştıramadıklarımızdanmışçasına')
{'VC': 1, 'CVC': 7, 'CV': 12}
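The counts above are consistent with tallying the pattern labels that syllable_p() returns for the same word. A minimal sketch of such a tally with collections.Counter, assuming input in the same (transcription, pattern) format; count_patterns() is hypothetical and not necessarily how countSyllable() is implemented:

```python
from collections import Counter


def count_patterns(syllables):
    """Count CV patterns in a list of (transcription, pattern) pairs."""
    return dict(Counter(pattern for _, pattern in syllables))
```

Applied to the syllable_p() output shown earlier, it gives {'VC': 1, 'CVC': 7, 'CV': 12}.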
You can analyze the vowel harmony of a word, which signals that a word is of Turkish origin, with the harmony() function. It checks the e-type and i-type harmonies of the word and returns both results as a tuple:
>>> dizge.harmony('ankara')
(True, True)
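Traditional Turkish grammar states two word-level harmony rules: major (palatal) harmony, under which all vowels of a word are either back (a, ı, o, u) or front (e, i, ö, ü), and minor (labial) harmony, under which an unrounded vowel is followed by an unrounded vowel and a rounded vowel is followed by a or e, or by u or ü. The sketch below checks both rules on lowercase input. Treating them as equivalent to Dizge's e-type and i-type checks is an assumption, and harmony_sketch() is a hypothetical name.

```python
FRONT = frozenset("eiöü")
BACK = frozenset("aıou")
ALL_VOWELS = FRONT | BACK
UNROUNDED = frozenset("aeıi")
ROUNDED = frozenset("oöuü")
AFTER_ROUNDED = frozenset("aeuü")  # open unrounded (a, e) or close rounded (u, ü)


def vowels_of(word: str) -> list:
    """Return the vowel letters of a lowercase word, in order."""
    return [ch for ch in word if ch in ALL_VOWELS]


def palatal_harmony(word: str) -> bool:
    vs = vowels_of(word)
    return all(v in FRONT for v in vs) or all(v in BACK for v in vs)


def labial_harmony(word: str) -> bool:
    vs = vowels_of(word)
    for prev, nxt in zip(vs, vs[1:]):
        if prev in UNROUNDED and nxt not in UNROUNDED:
            return False
        if prev in ROUNDED and nxt not in AFTER_ROUNDED:
            return False
    return True


def harmony_sketch(word: str) -> tuple:
    """Return (palatal harmony, labial harmony) for a lowercase Turkish word."""
    return palatal_harmony(word), labial_harmony(word)
```

With this sketch, harmony_sketch('ankara') is (True, True), 'kitap' gives (False, True) and the well-known exception 'tavuk' gives (True, False).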
Finally, the g2p() function performs rule-based grapheme-to-phoneme transcription:
>>> dizge.g2p('afyonkarahisarlılaştıramadıklarımızdanmışçasına')
'ɑfjɔŋkɑɾɑçIsɑɾłɨłɑʃtɨɾɑmɑdɨkłɑɾɨmɨzdɑnmɨʃtʃɑsɨnɑ'
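A rule-based G2P combines a letter-to-sound table with context-sensitive rules. For instance, the output above renders <n> before <k> as [ŋ] (yonka → jɔŋkɑ). The sketch below pairs a small, hypothetical subset of letter mappings, taken from symbols in the output above, with that single context rule; toy_g2p() is hypothetical and far less complete than g2p().

```python
# Hypothetical subset of letter-to-IPA mappings (letters not listed map to themselves)
BASE = {"a": "ɑ", "ı": "ɨ", "r": "ɾ", "y": "j", "ş": "ʃ", "ç": "tʃ"}
VELARS = frozenset("kg")


def toy_g2p(word: str) -> str:
    """Transcribe a lowercase word with a lookup table plus one context rule."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "n" and nxt in VELARS:
            out.append("ŋ")  # context rule: /n/ assimilates to a following velar
        else:
            out.append(BASE.get(ch, ch))  # table lookup, else keep the letter
    return "".join(out)
```

For example, toy_g2p('ankara') returns 'ɑŋkɑɾɑ'. Further context rules would be added as extra branches before the table lookup.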
DISCLAIMER:
- Unlike our reference resources, Dizge ignores half-long realizations. They may be implemented in upcoming updates.
- Where our reference resources give alternatives based on the spoken language, Dizge also provides more than one alternative. You can pick the one you need by indexing.
Data Loading
Dizge can work with tabular data in xlsx and csv formats or with plain-text data in txt format. You can load the data with third-party data-processing libraries or with built-in Python functions, as you prefer. For example:
with open('data/sample.txt', 'r', encoding="utf-8") as f:
    data = f.read()
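For csv files, Python's built-in csv module is enough. A minimal sketch, assuming a file with a header row and one column of words; the helper load_column(), the column name word and the path data/sample.csv are hypothetical:

```python
from __future__ import annotations

import csv
from typing import TextIO


def load_column(fileobj: TextIO, column: str) -> list[str]:
    """Return the values of one named column from a csv file with a header row."""
    reader = csv.DictReader(fileobj)
    return [row[column] for row in reader]


# Usage with a real file (hypothetical path and column name):
# with open('data/sample.csv', newline='', encoding='utf-8') as f:
#     words = load_column(f, 'word')
```

Opening the file with newline='' follows the csv module's recommendation for reading csv files.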
After loading the data, you can select the part you need to analyze and standardize it before starting the analysis. If you use the analyze() function described in the next section, this step is not necessary, because analyze() already standardizes the data. Here is how to use the standardize() function:
words = dizge.standardize(data)
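The exact steps standardize() performs are not documented here. As an illustration of one step Turkish text likely needs, the sketch below lowercases text in a Turkish-aware way and splits it into words. Python's str.lower() maps "I" to "i" (not "ı") and "İ" to "i" followed by a combining dot, so both capitals are replaced first. standardize_sketch() is hypothetical and may behave differently from Dizge's function; for example, it splits words at apostrophes.

```python
from __future__ import annotations

import re


def standardize_sketch(text: str) -> list[str]:
    """Lowercase Turkish text correctly and return its words (letters only)."""
    # Turkish-specific capitals must be mapped before the generic lower()
    text = text.replace("I", "ı").replace("İ", "i").lower()
    # [^\W\d_] matches Unicode letters: word characters minus digits and '_'
    return re.findall(r"[^\W\d_]+", text)
```

For example, standardize_sketch("IRMAK ve İSTANBUL, 2021!") returns ['ırmak', 've', 'istanbul'].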
Analysis and Getting Result
The analyze() function runs the whole analysis in one step. It takes two inputs: the data you work on and the list of tools you want to apply. That's all:
>>> tools = ["g2p", "syllable_o", "syllable_p", "countSyllable", "harmony"]
>>> result = dizge.analyze(words, tools)
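Passing tool names as strings suggests a dispatch from names to functions. The sketch below shows that design with a hypothetical registry of two trivial tools; analyze_sketch(), the tool names and the per-word result format are assumptions, and Dizge's own analyze() and its result structure may differ.

```python
from __future__ import annotations

from typing import Callable

VOWELS = frozenset("aeıioöuü")

# Hypothetical tool registry: tool name -> function applied to one word
TOOLS: dict[str, Callable[[str], object]] = {
    "length": len,
    "vowel_count": lambda w: sum(ch in VOWELS for ch in w),
}


def analyze_sketch(words: list[str], tools: list[str]) -> list[dict]:
    """Apply each requested tool to each word; one result dict per word."""
    unknown = [t for t in tools if t not in TOOLS]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    return [{"word": w, **{t: TOOLS[t](w) for t in tools}} for w in words]
```

Validating the tool names before processing any word means a typo fails fast instead of after a long run; returning a list keeps duplicate words and input order.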
Contact
For any requests about Dizge, you can use the following contact info:
Email: dizgenlp@gmail.com
X (formerly Twitter): @dizgenlp
Instagram: @dizgenlp
Please also feel free to open issues or pull requests (PRs).
File details
Details for the file dizge-0.1.6.tar.gz.
File metadata
- Download URL: dizge-0.1.6.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9d33ed6034718d409cd9ad955c9509b77e6b7f940352b41402bb093b56c338f6
MD5 | e8e6b15e4721dd9ff8243ca7b3c1719c
BLAKE2b-256 | a31fa0e3fc73bb0064ae8f8a675735a12df80be0ecd1f623b8cf0af420d77cd5
File details
Details for the file dizge-0.1.6-py3-none-any.whl.
File metadata
- Download URL: dizge-0.1.6-py3-none-any.whl
- Upload date:
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1be3a1b83d540642c1b6320d6c3d330d7edceb71579ea6fad6f8c98fb679d23f
MD5 | 0cff2519b1982b7c7b4eba6f329896a5
BLAKE2b-256 | 3e455483b7458ec80faf645e80cd91b285f4d3179c388ff75e4a1f18f7f30266