
A language processing tool for Sinhalese (සිංහල)


Update 2020.08.16: Integrated the part-of-speech tagger and the stemmer.

Update 2019.07.21: This tool no longer requires Java to run the Sinhala tokenizer. All Java code has been ported to Python for convenience.


How to get started

Steps:

  1. Download stat.split.pickle into the resources folder.
  2. Import the required tools from the sinling module in your project. You may have to add this project's path to your Python path (for example, through the PYTHONPATH environment variable).
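If you would rather not change environment variables, one option is to extend `sys.path` at runtime before importing. The following is a minimal sketch; the helper name `add_project_to_path` and the directory `path/to/sinling-checkout` are placeholders, not part of sinling, so substitute the location of your own copy.

```python
import sys
from pathlib import Path


def add_project_to_path(project_dir):
    """Put project_dir at the front of sys.path unless it is already listed.

    Returns the absolute path that was added (or found).
    """
    resolved = str(Path(project_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Placeholder location (an assumption): point this at the directory
# that contains the `sinling` package on your machine.
add_project_to_path("path/to/sinling-checkout")

# Once the real directory is on the path, imports such as
# `from sinling import SinhalaTokenizer` should resolve.
```

Setting PYTHONPATH in your shell has the same effect for every script you run, while the runtime approach keeps the change local to one program.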

How to use

Sinhala Tokenizer

from sinling import SinhalaTokenizer

tokenizer = SinhalaTokenizer()

sentence = '...'  # your sentence

tokenizer.tokenize(sentence)
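As an illustration of working with the tokenizer output, the sketch below counts token frequencies across several sentences. It assumes `tokenize` returns a list of string tokens per sentence; the helper `token_frequencies` is hypothetical and takes the tokenizing function as an argument, so it does not depend on sinling itself.

```python
from collections import Counter


def token_frequencies(sentences, tokenize):
    """Count how often each token occurs across the given sentences.

    `tokenize` is any callable that maps a sentence to a list of tokens,
    for example the `tokenize` method of a SinhalaTokenizer instance.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


# Usage, assuming sinling is importable:
#   from sinling import SinhalaTokenizer
#   tokenizer = SinhalaTokenizer()
#   counts = token_frequencies(['...', '...'], tokenizer.tokenize)
#   counts.most_common(10)
```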

Sinhala Stemmer (Experimental)

from sinling import SinhalaStemmer

stemmer = SinhalaStemmer()

word = '...'  # your word

stemmer.stem(word)

Please cite sinhala-stemmer if you are using this implementation.

Part-of-Speech Tagger

from sinling import SinhalaTokenizer, POSTagger

tokenizer = SinhalaTokenizer()

document = '...'  # may contain multiple sentences

tokenized_sentences = [tokenizer.tokenize(f'{ss}.') for ss in tokenizer.split_sentences(document)]

tagger = POSTagger()

pos_tags = tagger.predict(tokenized_sentences)
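The steps above can be wrapped in a small helper if you tag many documents. This is a sketch, not part of sinling: the function name `tag_document` is hypothetical, and skipping blank sentences is an added assumption rather than documented behaviour. It only relies on the `split_sentences`, `tokenize`, and `predict` calls shown above, and returns whatever `predict` returns.

```python
def tag_document(document, tokenizer, tagger):
    """Split a document into sentences, tokenize each one, and tag the tokens.

    `tokenizer` must provide split_sentences() and tokenize();
    `tagger` must provide predict().
    """
    # Assumption: drop blank sentences so the tagger never sees empty input.
    sentences = [s for s in tokenizer.split_sentences(document) if s.strip()]
    if not sentences:
        return []
    # As in the example above, a full stop is appended to every sentence
    # before it is tokenized.
    tokenized = [tokenizer.tokenize(f'{s}.') for s in sentences]
    return tagger.predict(tokenized)


# Usage, assuming sinling is importable:
#   from sinling import SinhalaTokenizer, POSTagger
#   pos_tags = tag_document('...', SinhalaTokenizer(), POSTagger())
```

Passing the tokenizer and tagger in as arguments lets you create them once and reuse them across documents.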

Word Joiner (Morphological Joiner)

from sinling import preprocess, word_joiner

w1 = preprocess('මුනි')
w2 = preprocess('උතුමා')
results = word_joiner.join(w1, w2)
# Returns a list of possible results after applying join rules ['මුනිතුමා', ...]

Word Splitter (Morphological Splitter, Corpus-Based, Experimental)

from sinling import word_splitter

word = '...'
results = word_splitter.split(word)
# Returns a dict containing debug information, base word and affix

Visit here to see some sample splits.

Contributions

  • Contact wayasas.13@cse.mrt.ac.lk if you would like to contribute to this project.

License

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/
