Georgian Language Hyphenation Library v2.2.6 - Preserves compound word hyphens
Georgian Hyphenation
Georgian Language Hyphenation Library - Fast, accurate syllabification for Georgian (ქართული) text with support for Python 3.7+.
Features
- ✅ Accurate Georgian syllabification based on phonetic rules
- ✅ Harmonic consonant clusters recognition (ბრ, გრ, კრ, etc.)
- ✅ Gemination handling (double consonant splitting)
- ✅ Exception dictionary for irregular words
- ✅ Preserves compound word hyphens (new in v2.2.5)
- ✅ Zero dependencies
- ✅ Lightweight and fast
- ✅ Type hints for better IDE support
Installation
pip install georgian-hyphenation
Quick Start
from georgian_hyphenation import GeorgianHyphenator
# Create hyphenator instance
hyphenator = GeorgianHyphenator()
# Hyphenate a word
result = hyphenator.hyphenate('საქართველო')
print(result)  # საქართველო (contains soft hyphens, U+00AD, which are usually invisible)
# Get syllables as a list
syllables = hyphenator.get_syllables('თბილისი')
print(syllables) # ['თბი', 'ლი', 'სი']
# Hyphenate entire text
text = 'საქართველო არის ძალიან ლამაზი ქვეყანა'
hyphenated = hyphenator.hyphenate_text(text)
print(hyphenated)
Usage
Basic Hyphenation
from georgian_hyphenation import GeorgianHyphenator
hyphenator = GeorgianHyphenator()
# Single word
print(hyphenator.hyphenate('კომპიუტერი'))
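# Output: კომპიუტერი (with invisible soft hyphens at the break points)
# Multiple words
print(hyphenator.hyphenate_text('პროგრამირება არის შემოქმედება'))
# Output: პროგრამირება არის შემოქმედება (with invisible soft hyphens at the break points)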
Custom Hyphen Character
# Use visible hyphen instead of soft hyphen
hyphenator = GeorgianHyphenator(hyphen_char='-')
print(hyphenator.hyphenate('საქართველო'))
# Output: სა-ქარ-თვე-ლო
# Use custom separator
hyphenator = GeorgianHyphenator(hyphen_char='•')
print(hyphenator.hyphenate('საქართველო'))
# Output: სა•ქარ•თვე•ლო
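Because the default separator is the soft hyphen (U+00AD), hyphenated output often looks identical to the input in a terminal. The snippet below is a minimal standard-library sketch for making break points visible while debugging. `show_breaks` is a hypothetical helper, not part of this library, and the sample string is built by hand rather than produced by the hyphenator.

```python
SOFT_HYPHEN = "\u00AD"

def show_breaks(text: str, marker: str = "-") -> str:
    """Replace invisible soft hyphens with a visible marker (debugging aid)."""
    return text.replace(SOFT_HYPHEN, marker)

# Hand-built sample that mimics soft-hyphenated output
sample = SOFT_HYPHEN.join(["სა", "ქარ", "თვე", "ლო"])
print(len(sample))          # 13: 10 letters plus 3 soft hyphens
print(show_breaks(sample))  # სა-ქარ-თვე-ლო
```

The same helper can be applied to the return value of hyphenate() or hyphenate_text() when the default hyphen_char is in use.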
Get Syllables as List
hyphenator = GeorgianHyphenator()
syllables = hyphenator.get_syllables('განათლება')
print(syllables) # ['გა', 'ნათ', 'ლე', 'ბა']
# Count syllables
word = 'უნივერსიტეტი'
syllable_count = len(hyphenator.get_syllables(word))
print(f'{word} has {syllable_count} syllables')
Custom Dictionary
hyphenator = GeorgianHyphenator()
# Add custom hyphenation patterns
custom_words = {
    'განათლება': 'გა-ნათ-ლე-ბა',
    'უნივერსიტეტი': 'უ-ნი-ვერ-სი-ტე-ტი',
}
hyphenator.load_library(custom_words)
print(hyphenator.hyphenate('განათლება'))
# Uses your custom pattern
Load Default Dictionary
hyphenator = GeorgianHyphenator()
# Load built-in exception dictionary
hyphenator.load_default_library()
# Now hyphenator will use dictionary for common words
# and fall back to algorithm for unknown words
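The dictionary-first, algorithm-second lookup described above can be pictured with the following sketch. It is an illustration under assumptions, not the library's internal code: `hyphenate_with_fallback` and `no_op_algorithm` are hypothetical names, and the stand-in algorithm simply returns the word unchanged.

```python
from typing import Callable, Dict

def hyphenate_with_fallback(
    word: str,
    exceptions: Dict[str, str],
    algorithm: Callable[[str], str],
) -> str:
    """Return the dictionary pattern if the word is listed, else run the algorithm."""
    pattern = exceptions.get(word)
    if pattern is not None:
        return pattern
    return algorithm(word)

def no_op_algorithm(word: str) -> str:
    """Stand-in for a real rule-based hyphenation function."""
    return word

exceptions = {"განათლება": "გა-ნათ-ლე-ბა"}
print(hyphenate_with_fallback("განათლება", exceptions, no_op_algorithm))  # გა-ნათ-ლე-ბა
print(hyphenate_with_fallback("მამა", exceptions, no_op_algorithm))       # მამა
```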
Compound Words (v2.2.5+)
The library now preserves existing hyphens in compound words:
hyphenator = GeorgianHyphenator()
# Compound words keep their hyphens
print(hyphenator.hyphenate('მაგ-რამ'))
# Output: მაგ-რამ (hyphen preserved)
print(hyphenator.hyphenate('ხელ-ფეხი'))
# Output: ხელ-ფეხი (hyphen preserved)
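According to the changelog, hyphen stripping removes only soft hyphens and zero-width spaces, so a regular hyphen in a compound word survives. A minimal sketch of that idea follows. `strip_invisible_breaks` is a hypothetical name, and the library's internal _strip_hyphens() may be implemented differently.

```python
INVISIBLE_BREAKS = ("\u00AD", "\u200B")  # soft hyphen, zero-width space

def strip_invisible_breaks(word: str) -> str:
    """Remove invisible break characters but keep regular hyphens."""
    for ch in INVISIBLE_BREAKS:
        word = word.replace(ch, "")
    return word

# The regular hyphen stays; the soft hyphen inside the second part is removed
print(strip_invisible_breaks("ხელ-ფე\u00ADხი"))  # ხელ-ფეხი
```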
Convenience Functions
For quick one-off usage without creating an instance:
from georgian_hyphenation import hyphenate, get_syllables, hyphenate_text
# Quick hyphenation
print(hyphenate('საქართველო'))
# Quick syllable extraction
print(get_syllables('თბილისი'))
# Quick text hyphenation
print(hyphenate_text('ეს არის ტექსტი'))
Export Formats
TeX Pattern Format
from georgian_hyphenation import to_tex_pattern
pattern = to_tex_pattern('საქართველო')
print(pattern) # .სა1ქარ1თვე1ლო.
Hunspell Format
from georgian_hyphenation import to_hunspell_format
hunspell = to_hunspell_format('საქართველო')
print(hunspell) # სა=ქარ=თვე=ლო
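Judging from the two outputs above, both export formats are simple joins over the syllable list. The sketch below reproduces those strings from a hand-written syllable list. The function names are hypothetical, and the library's own to_tex_pattern() and to_hunspell_format() may handle edge cases differently.

```python
from typing import List

def tex_pattern_from_syllables(syllables: List[str]) -> str:
    """Join syllables with '1' and wrap the word in dots."""
    return "." + "1".join(syllables) + "."

def hunspell_from_syllables(syllables: List[str]) -> str:
    """Join syllables with '='."""
    return "=".join(syllables)

syllables = ["სა", "ქარ", "თვე", "ლო"]
print(tex_pattern_from_syllables(syllables))  # .სა1ქარ1თვე1ლო.
print(hunspell_from_syllables(syllables))     # სა=ქარ=თვე=ლო
```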
Algorithm
The library uses a rule-based phonetic algorithm built on Georgian syllable structure:
Rules Applied:
- Vowel Detection: Identifies Georgian vowels (ა, ე, ი, ო, უ)
- Consonant Cluster Analysis: Recognizes harmonic clusters (67 are listed below)
- Gemination Rules: Splits double consonants (კკ → კ-კ)
- Orphan Prevention: Ensures minimum syllable length (2 characters on each side)
Supported Harmonic Clusters:
ბლ, ბრ, ბღ, ბზ, გდ, გლ, გმ, გნ, გვ, გზ, გრ, დრ, თლ, თრ, თღ,
კლ, კმ, კნ, კრ, კვ, მტ, პლ, პრ, ჟღ, რგ, რლ, რმ, სწ, სხ, ტკ,
ტპ, ტრ, ფლ, ფრ, ფქ, ფშ, ქლ, ქნ, ქვ, ქრ, ღლ, ღრ, ყლ, ყრ, შთ,
შპ, ჩქ, ჩრ, ცლ, ცნ, ცრ, ცვ, ძგ, ძვ, ძღ, წლ, წრ, წნ, წკ, ჭკ,
ჭრ, ჭყ, ხლ, ხმ, ხნ, ხვ, ჯგ
Syllable Patterns:
- V-V: Split between vowels (გაანალიზა)
- V-C-V: Split after first vowel (მამა)
- V-CC-V: Split between consonants (ბარბარე)
- V-ტრ-V: Keep harmonic clusters together (ასტრონომია)
- V-კკ-V: Split gemination (კლასსი)
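The patterns above can be approximated in a few lines. What follows is a simplified sketch, not the library's actual algorithm: `syllabify_sketch` is a hypothetical function, the cluster set is passed in by the caller (the demo uses only ტრ), and orphan prevention is interpreted here as "never break closer than two characters to either end of the word". Results may differ from the library for some words.

```python
from typing import FrozenSet, List

VOWELS = frozenset("აეიოუ")

def syllabify_sketch(
    word: str,
    clusters: FrozenSet[str] = frozenset(),
    min_side: int = 2,
) -> List[str]:
    """Split a word between consecutive vowels using the simplified rules."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    breaks = []
    for left, right in zip(vowel_positions, vowel_positions[1:]):
        gap = right - left - 1  # consonants between the two vowels
        if gap == 0:
            cut = right          # V-V: split between the vowels
        elif gap == 1:
            cut = left + 1       # V-C-V: split after the first vowel
        elif word[right - 2:right] in clusters:
            cut = right - 2      # keep the cluster with the following vowel
        else:
            cut = left + 2       # V-CC-V: split after the first consonant
        # Assumed orphan rule: keep at least min_side characters on each side
        if min_side <= cut <= len(word) - min_side:
            breaks.append(cut)
    pieces, start = [], 0
    for cut in breaks:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces

print(syllabify_sketch("საქართველო"))  # ['სა', 'ქარ', 'თვე', 'ლო']
print(syllabify_sketch("ბარბარე"))     # ['ბარ', 'ბა', 'რე']
print(syllabify_sketch("კლასსი"))      # ['კლას', 'სი']
print(syllabify_sketch("ასტრონომია", clusters=frozenset({"ტრ"})))  # ['ას', 'ტრო', 'ნო', 'მია']
```

In this sketch a doubled consonant is split by the same branch as any other two-consonant run, so gemination needs no separate case.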
API Reference
GeorgianHyphenator(hyphen_char='\u00AD')
Main hyphenator class.
Parameters:
hyphen_char (str): Character inserted at hyphenation points. Default is the soft hyphen (U+00AD).
Methods:
hyphenate(word: str) -> str
Hyphenate a single Georgian word.
get_syllables(word: str) -> List[str]
Get syllables as a list without hyphen characters.
hyphenate_text(text: str) -> str
Hyphenate all Georgian words in text, preserving punctuation and spacing.
load_library(data: Dict[str, str]) -> None
Load custom dictionary mapping words to their hyphenation patterns.
load_default_library() -> None
Load built-in exception dictionary for common irregular words.
apply_algorithm(word: str) -> str
Apply the hyphenation algorithm directly (used internally).
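One way to hyphenate only the Georgian words in a text while leaving punctuation and spacing untouched, as hyphenate_text is described above, is a regular-expression substitution. The sketch below is an assumption about the general technique, not the library's tokenizer: `map_georgian_words` is a hypothetical helper, and the character range covers only the Mkhedruli letters U+10D0 to U+10FA.

```python
import re
from typing import Callable

# Mkhedruli letters; the library's own notion of a "word" may be broader
GEORGIAN_WORD = re.compile(r"[\u10D0-\u10FA]+")

def map_georgian_words(text: str, transform: Callable[[str], str]) -> str:
    """Apply transform to each run of Georgian letters, leaving the rest as-is."""
    return GEORGIAN_WORD.sub(lambda m: transform(m.group(0)), text)

# Bracket each matched word to show what gets passed to the transform
print(map_georgian_words("ეს არის ტექსტი, 2024!", lambda w: f"[{w}]"))
# [ეს] [არის] [ტექსტი], 2024!
```

Passing a hyphenator's hyphenate method as transform gives a text-level pass that never touches digits, Latin text, or punctuation.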
Convenience Functions
hyphenate(word: str, hyphen_char: str = '\u00AD') -> str
get_syllables(word: str) -> List[str]
hyphenate_text(text: str, hyphen_char: str = '\u00AD') -> str
to_tex_pattern(word: str) -> str
to_hunspell_format(word: str) -> str
Performance
- Speed: ~0.05ms per word on average
- Memory: ~50KB with dictionary loaded
- Optimization: Uses Set for O(1) cluster lookups
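Timing figures like the ones above depend on hardware and input, so it can be worth measuring on your own data. The helper below is a small standard-library sketch. `average_ms_per_call` is a hypothetical name, and `str.upper` is used only as a placeholder for the function under test.

```python
import time
from typing import Callable, Sequence

def average_ms_per_call(
    func: Callable[[str], object],
    words: Sequence[str],
    repeats: int = 100,
) -> float:
    """Return the mean wall-clock time per call in milliseconds."""
    if not words or repeats < 1:
        raise ValueError("need at least one word and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for word in words:
            func(word)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(words))

# Placeholder workload; swap in a hyphenator's hyphenate method to measure it
words = ["საქართველო", "თბილისი", "განათლება"]
print(f"{average_ms_per_call(str.upper, words):.5f} ms per call")
```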
Examples
Text Processing Pipeline
from georgian_hyphenation import GeorgianHyphenator
hyphenator = GeorgianHyphenator()
hyphenator.load_default_library()
def process_document(text):
    """Process Georgian document for web display"""
    return hyphenator.hyphenate_text(text)
# Use in your application
article = """
საქართველო არის ერთ-ერთი უძველესი ქვეყანა მსოფლიოში.
თბილისი არის დედაქალაქი და კულტურული ცენტრი.
"""
processed = process_document(article)
E-book Generator
from georgian_hyphenation import GeorgianHyphenator
def format_for_ebook(paragraphs):
    hyphenator = GeorgianHyphenator('\u00AD')  # soft hyphen
    hyphenator.load_default_library()
    formatted = []
    for paragraph in paragraphs:
        formatted.append(hyphenator.hyphenate_text(paragraph))
    return '\n\n'.join(formatted)
Syllable Counter
from georgian_hyphenation import get_syllables
def count_syllables_in_text(text):
    words = text.split()
    total = 0
    for word in words:
        # Remove punctuation
        clean_word = ''.join(c for c in word if c.isalpha())
        if clean_word:
            syllables = get_syllables(clean_word)
            total += len(syllables)
    return total
text = "საქართველო არის ლამაზი ქვეყანა"
print(f"Total syllables: {count_syllables_in_text(text)}")
Poetry Analyzer
from georgian_hyphenation import GeorgianHyphenator
def analyze_verse(line):
    """Analyze syllable structure of Georgian poetry"""
    hyphenator = GeorgianHyphenator('-')
    words = line.split()
    analysis = []
    for word in words:
        syllables = hyphenator.get_syllables(word)
        analysis.append({
            'word': word,
            'syllables': syllables,
            'count': len(syllables)
        })
    return analysis
verse = "მთვარე ანათებს ცისკარზე"
print(analyze_verse(verse))
Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
Changelog
v2.2.5 (2026-01-30)
- ✨ New: Preserves regular hyphens in compound words
- 🐛 Fixed: Hyphen stripping now only removes soft hyphens and zero-width spaces
- 📝 Improved: Documentation and examples
- 🔧 Changed: _strip_hyphens() method behavior
v2.2.2
- Dictionary support added
- Performance optimizations with Set-based lookups
v2.2.1
- Hybrid engine (Algorithm + Dictionary)
- Harmonic cluster support
- Gemination handling
v2.0.0
- Complete rewrite with academic phonological rules
- Anti-orphan protection
- Type hints added
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT © Guram Zhgamadze
Author
Guram Zhgamadze
- GitHub: @guramzhgamadze
- Email: guramzhgamadze@gmail.com
Related Projects
- georgian-hyphenation (npm) - JavaScript/Node.js version
- Georgian Language Resources
- Unicode Georgian Range
Citation
If you use this library in academic work, please cite:
@software{georgian_hyphenation,
author = {Zhgamadze, Guram},
title = {Georgian Hyphenation Library},
year = {2024},
publisher = {GitHub},
url = {https://github.com/guramzhgamadze/georgian-hyphenation}
}
Acknowledgments
- Based on Georgian phonological and syllabification rules
- Inspired by traditional Georgian typography standards
- Community feedback and contributions
Made with ❤️ for the Georgian language community
ქართული ენის თანამშრომლობისთვის