
Georgian alphabet and language utilities for Natural Language Processing, script conversion and more.


AnbaniPy

Georgian Python toolkit for NLP, transliteration and more. Partially based on anbani.js.

Install

```shell
pip install anbani
```

Quickstart

Transliteration example:

```python
from anbani.core.converter import interpret

# Convert Georgian text into the Asomtavruli script
interpret("გამარჯობა", "asomtavruli")
# 'ႢႠႫႠႰႿႭႡႠ'
```
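A conversion like the one above is possible with a simple shift because Unicode lists the 33 modern Mkhedruli letters (U+10D0 to U+10F0) and their Asomtavruli counterparts (U+10A0 to U+10C0) in the same order, exactly 0x30 code points apart. The sketch below relies only on that property. It is not anbani's implementation: `to_asomtavruli` is a hypothetical name, and anything outside the modern alphabet is passed through unchanged.

```python
# Hypothetical helper: Mkhedruli -> Asomtavruli by Unicode code-point offset.
# Mkhedruli U+10D0..U+10F0 (ა..ჰ) and Asomtavruli U+10A0..U+10C0 list the
# 33 modern letters in the same order, 0x30 code points apart.
MKHEDRULI_FIRST = 0x10D0
MKHEDRULI_LAST = 0x10F0
ASOMTAVRULI_OFFSET = 0x10D0 - 0x10A0  # 0x30


def to_asomtavruli(text: str) -> str:
    """Shift each modern Mkhedruli letter into the Asomtavruli block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if MKHEDRULI_FIRST <= cp <= MKHEDRULI_LAST:
            out.append(chr(cp - ASOMTAVRULI_OFFSET))
        else:
            out.append(ch)  # spaces, punctuation, Latin, etc. are left alone
    return "".join(out)


print(to_asomtavruli("გამარჯობა"))  # ႢႠႫႠႰႿႭႡႠ
```

A fixed offset only works between scripts that Unicode orders identically; Latin transliteration or QWERTY-encoded input needs an explicit mapping table instead.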

Georgianisation example (romanised Georgian to Mkhedruli):

```python
from anbani.nlp.georgianisation import georgianise

# Turn romanised (Latin-letter) Georgian into Mkhedruli
georgianise("gamarjoba - rogor xar - rasa iqm - kaia kata - kai erti")
# 'გამარჯობა - როგორ ხარ - რასა იქმ - კაია კატა - კაი ერთი'
```
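For comparison, the simplest possible Georgianisation is a greedy longest-match lookup over a romanisation table. The sketch below is hypothetical: the table is an assumed, simplified scheme (not anbani's), and `naive_georgianise` is an illustrative name. A fixed table cannot make the context-dependent choices visible in the output above, where t becomes ტ in კატა but თ in ერთი, so the library evidently does more than a one-to-one mapping.

```python
# Hypothetical, simplified romanisation table (an assumption, not anbani's).
# Two-letter sequences are tried before single letters (greedy longest match).
DIGRAPHS = {
    "sh": "შ", "ch": "ჩ", "zh": "ჟ", "gh": "ღ", "dz": "ძ", "ts": "ც",
}
# Note: "t" always maps to თ here; a fixed table cannot choose ტ by context.
SINGLES = {
    "a": "ა", "b": "ბ", "g": "გ", "d": "დ", "e": "ე", "v": "ვ", "z": "ზ",
    "t": "თ", "i": "ი", "k": "კ", "l": "ლ", "m": "მ", "n": "ნ", "o": "ო",
    "p": "პ", "r": "რ", "s": "ს", "u": "უ", "f": "ფ", "q": "ქ", "y": "ყ",
    "c": "ც", "w": "წ", "x": "ხ", "j": "ჯ", "h": "ჰ",
}


def naive_georgianise(text: str) -> str:
    """Map romanised Georgian to Mkhedruli with a fixed lookup table."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].lower()
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = text[i].lower()
        out.append(SINGLES.get(ch, text[i]))  # unknown characters pass through
        i += 1
    return "".join(out)


print(naive_georgianise("gamarjoba - rogor xar"))  # გამარჯობა - როგორ ხარ
```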

Convert e-books whose Georgian text uses the QWERTY (Latin keyboard) encoding into Unicode Mkhedruli:

```python
from anbani.nlp.utils import ebook2text
from anbani.core.converter import classify_text, convert

# Extract the raw text from the PDF
text = ebook2text("/home/george/Dev/georgian-text-corpus/sources/mylibrary/raw/files/ჩარლზ დიკენსი - დევიდ კოპერფილდი.pdf")

print(text[:300])
# Carlz dikensi daviT koperfildi Tavi pirveli dabadeba me viqnebi gmiri Cemive sakuTari Tavgadasavlisa Tu sxva...

# Detect which script/encoding the extracted text is in
print(classify_text(text))
# latin

# Re-map the QWERTY-encoded text to Unicode Mkhedruli
print(convert(text, "qwerty", "mkhedruli")[:300])
# ჩარლზ დიკენსი დავით კოპერფილდი თავი პირველი დაბადება მე ვიქნები გმირი ჩემივე საკუთარი თავგადასავლისა თუ სხვა...
```
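The QWERTY encoding seen above stores Georgian as the Latin keys that would type it on a Georgian keyboard (C for ჩ, T for თ, q for ქ). A conversion in that spirit can be sketched with `str.maketrans`, assuming the common Georgian QWERTY layout, in which lowercase keys give one letter each and seven shifted keys (W, R, T, S, J, Z, C) give others. anbani's actual table may differ; `qwerty_to_mkhedruli` and `guess_script` are hypothetical names, and the classifier is only a majority vote over letters, far cruder than `classify_text`.

```python
# Hypothetical sketch: assumes the common Georgian QWERTY keyboard layout,
# which may not match anbani's own conversion table.
LOWER_KEYS = "abcdefghijklmnopqrstuvwxyz"
LOWER_GEO = "აბცდეფგჰიჯკლმნოპქრსტუვწხყზ"
# Shifted keys that produce a different letter on that layout; any other
# uppercase Latin letter is left unchanged by this sketch.
UPPER_KEYS = "WRTSJZC"
UPPER_GEO = "ჭღთშჟძჩ"

QWERTY_TABLE = str.maketrans(LOWER_KEYS + UPPER_KEYS, LOWER_GEO + UPPER_GEO)


def qwerty_to_mkhedruli(text: str) -> str:
    """Map each QWERTY key to the Mkhedruli letter it types."""
    return text.translate(QWERTY_TABLE)


def guess_script(text: str) -> str:
    """Crude majority vote between U+10D0..U+10FF and ASCII letters."""
    georgian = sum(1 for ch in text if "\u10d0" <= ch <= "\u10ff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if georgian > latin:
        return "mkhedruli"
    if latin > georgian:
        return "latin"
    return "unknown"


sample = "Carlz dikensi daviT koperfildi"
print(guess_script(sample))  # latin
converted = qwerty_to_mkhedruli(sample)
print(converted)  # ჩარლზ დიკენსი დავით კოპერფილდი
print(guess_script(converted))  # mkhedruli
```

Because every ASCII letter is remapped, genuinely Latin words in a mixed document would be mangled as well; deciding which spans to convert is where a real classifier matters.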

Expand contractions:

```python
from anbani.nlp.contractions import expand_text

text = "ილია ჭავჭავაძე (დ. 8 ნოემბერი, 1837, სოფელი ყვარელი — გ. 12 სექტემბერი, 1907, წიწამური)"

print(text)
# ილია ჭავჭავაძე (დ. 8 ნოემბერი, 1837, სოფელი ყვარელი — გ. 12 სექტემბერი, 1907, წიწამური)

# Abbreviations such as დ. ("born") and გ. ("died") are written out in full
print(expand_text(text))
# ილია ჭავჭავაძე (დაბადება 8 ნოემბერი, 1837, სოფელი ყვარელი — გარდაცვალება 12 სექტემბერი, 1907, წიწამური)
```
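One straightforward way to implement this kind of expansion is a lookup table plus a regular expression that only fires at the start of a word. The sketch below uses Python's `re` module; its two-entry table is taken from the example above, while `ABBREVIATIONS` and `expand_abbreviations` are hypothetical names, not anbani's internals.

```python
import re

# Hypothetical abbreviation table built from the example above:
# დ. -> დაბადება ("birth"), გ. -> გარდაცვალება ("death").
ABBREVIATIONS = {
    "დ.": "დაბადება",
    "გ.": "გარდაცვალება",
}

# Match an abbreviation only when it is not preceded by a word character,
# so a word that merely ends in დ. or გ. is left alone. Longer keys are
# tried first in case one abbreviation is a prefix of another.
_ABBR_RE = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its full form."""
    return _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


print(expand_abbreviations("ილია ჭავჭავაძე (დ. 8 ნოემბერი, 1837)"))
# ილია ჭავჭავაძე (დაბადება 8 ნოემბერი, 1837)
```

Georgian letters count as word characters in Python's Unicode-aware `\w`, which is what keeps the lookbehind from firing inside words.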

To-Do

Feel free to fork this repo!

  • Tokenizer
  • Transliteration
  • Expand contractions
  • ebook2pdf converter
  • Stemmer
  • Lemmatizer
  • Stopwords

Resources used

Download files


Source Distribution

anbani-0.9.5.tar.gz (1.6 MB)

Built Distribution

anbani-0.9.5-py3-none-any.whl (1.6 MB)

File details

Details for the file anbani-0.9.5.tar.gz.

File metadata

  • Download URL: anbani-0.9.5.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anbani-0.9.5.tar.gz
Algorithm Hash digest
SHA256 7889541161c5939f29b72360f6a8014f4df4d9af5481104bc5b8128d1734fda4
MD5 a02d8a0552bc76f74ea70836f1b5237b
BLAKE2b-256 b5bfd8ac934d874b1c8bc4e1674057c25ace293b6a4fbabaa3bff4ca06f26c1f


File details

Details for the file anbani-0.9.5-py3-none-any.whl.

File metadata

  • Download URL: anbani-0.9.5-py3-none-any.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anbani-0.9.5-py3-none-any.whl
Algorithm Hash digest
SHA256 13924fb8eca85560510ad923a453eeb528c2ed02313e67e0d2350a64707c0ca1
MD5 7f8b7799a2b7e3811d130bbdb7aa69ec
BLAKE2b-256 049c99ebd4d6add9c05a42e6fa52a0ecc80fbd19635b7ec9c16c8f8c40061112

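The digests listed above let you confirm that a downloaded archive is byte-for-byte the file PyPI published. A minimal standard-library sketch of that check with `hashlib` follows; `sha256_of`, `matches` and `SDIST_SHA256` are illustrative names, not part of anbani.

```python
import hashlib

# SHA256 digest listed above for anbani-0.9.5.tar.gz.
SDIST_SHA256 = "7889541161c5939f29b72360f6a8014f4df4d9af5481104bc5b8128d1734fda4"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str) -> bool:
    """Return True if the file's SHA256 equals the published hex digest."""
    return sha256_of(path) == expected.strip().lower()
```

For an intact download, `matches("anbani-0.9.5.tar.gz", SDIST_SHA256)` should return `True`. pip can also enforce this itself in hash-checking mode: pin `anbani==0.9.5` in a requirements file with both the sdist and wheel SHA256 digests as `--hash=sha256:...` options, then install with `pip install --require-hashes -r requirements.txt`.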
