A language processing tool for Sinhalese (සිංහල)

Update 2020.08.16: Added a PyPI package at https://pypi.org/project/sinling/.

Update 2020.08.16: Integrated the part-of-speech tagger and stemmer tools.

Update 2019.07.21: This tool no longer requires Java to run the Sinhala tokenizer. All Java code has been ported to Python for convenience.

Installation

from source (recommended)

Steps:

  1. Download stat.split.pickle to the resources folder
  2. Import the required tools from the sinling module in your project (you may have to append this project's path to your Python import path, for example via the PYTHONPATH environment variable)
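
Step 2 mentions appending the project path so that Python can find the module. A minimal sketch of doing that at runtime follows. The checkout location `~/src/sinling` and the helper name `add_to_import_path` are assumptions made for illustration; substitute the directory where you actually placed the project.

```python
import sys
from pathlib import Path

# Hypothetical location of the downloaded project; adjust to your own checkout.
PROJECT_DIR = Path.home() / "src" / "sinling"


def add_to_import_path(project_dir: Path) -> str:
    """Append project_dir to sys.path once and return the entry that was used."""
    entry = str(project_dir.expanduser().resolve())
    if entry not in sys.path:
        sys.path.append(entry)
    return entry


added = add_to_import_path(PROJECT_DIR)
# After this, `from sinling import SinhalaTokenizer` can be resolved,
# provided the project really lives at PROJECT_DIR.
```

Setting the PYTHONPATH environment variable before starting Python has the same effect without touching your code.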

pip

Run the following command in your virtualenv to install this package [need fix].

pip install sinling
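
To check whether the install succeeded in the current environment, one option is the standard library's `importlib.util.find_spec`, which looks a package up without importing it. This is only a sketch; the helper name `is_installed` is made up for illustration.

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be located on the import path."""
    return find_spec(package) is not None


if is_installed("sinling"):
    print("sinling is available")
else:
    print("sinling not found; run `pip install sinling` in this environment")
```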

How to use

Sinhala Tokenizer

from sinling import SinhalaTokenizer

tokenizer = SinhalaTokenizer()

sentence = '...'  # your sentence

tokenizer.tokenize(sentence)

Sinhala Stemmer (Experimental)

from sinling import SinhalaStemmer

stemmer = SinhalaStemmer()

word = '...'  # your word

stemmer.stem(word)

Please cite sinhala-stemmer if you are using this implementation.

Part-of-Speech Tagger

from sinling import SinhalaTokenizer, POSTagger

tokenizer = SinhalaTokenizer()

document = '...'  # may contain multiple sentences

tokenized_sentences = [tokenizer.tokenize(f'{ss}.') for ss in tokenizer.split_sentences(document)]

tagger = POSTagger()

pos_tags = tagger.predict(tokenized_sentences)
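
The list comprehension above builds one token list per sentence, and that list of lists is what gets passed to `tagger.predict`. The sketch below illustrates that shape only. `StubTokenizer` is a hypothetical whitespace-based stand-in, not sinling's tokenizer, and it assumes that `split_sentences` returns sentences without their trailing full stop, which would explain why the snippet re-appends `.` before tokenizing.

```python
from typing import List


class StubTokenizer:
    """Hypothetical stand-in used only to show the data shape; not part of sinling."""

    def split_sentences(self, document: str) -> List[str]:
        # Assumption: sentences come back without their trailing full stop.
        return [s.strip() for s in document.split(".") if s.strip()]

    def tokenize(self, sentence: str) -> List[str]:
        # Naive tokenization: detach a trailing full stop, then split on whitespace.
        if sentence.endswith("."):
            return sentence[:-1].split() + ["."]
        return sentence.split()


tokenizer = StubTokenizer()
document = "මම ගෙදර යමි. ඔහු පාසල් යයි."

# Same pattern as in the snippet above: one list of tokens per sentence.
tokenized_sentences = [
    tokenizer.tokenize(f"{ss}.") for ss in tokenizer.split_sentences(document)
]
```

With the real `SinhalaTokenizer` the token boundaries may differ; only the nesting (a list of sentences, each a list of tokens) is the point here.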

Word Joiner (Morphological Joiner)

from sinling import preprocess, word_joiner

w1 = preprocess('මුනි')
w2 = preprocess('උතුමා')
results = word_joiner.join(w1, w2)
# Returns a list of possible results after applying join rules ['මුනිතුමා', ...]

Word Splitter (Morphological Splitter, Corpus-Based, Experimental)

from sinling import word_splitter

word = '...'
results = word_splitter.split(word)
# Returns a dict containing debug information, base word and affix

Visit here to see some sample splits.

Contributions

  • Contact wayasas.13@cse.mrt.ac.lk if you would like to contribute to this project.

License

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/
