A language processing tool for Sinhalese (සිංහල)

Update 2020.08.16: Added a PyPI package at https://pypi.org/project/sinling/.

Update 2020.08.16: Integrated the part-of-speech tagger and stemmer tools.

Update 2019.07.21: This tool no longer requires Java to run the Sinhala tokenizer. All Java code has been ported to Python for convenience.

Installation

from source (recommended)

Steps:

  1. Download stat.split.pickle to the resources folder.
  2. Import the required tools from the sinling module in your project (you may have to add this project's path to your Python path, for example via the PYTHONPATH environment variable).
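
The second step can be sketched as follows, assuming the repository has been cloned to a local directory. The helper name `add_project_to_path` and the directory `path/to/sinling` are illustrative placeholders, not part of sinling.

```python
import sys
from pathlib import Path


def add_project_to_path(project_dir):
    """Put a source checkout on sys.path so `import sinling` can find it.

    `project_dir` is a hypothetical location; point it at your own clone.
    Returns the resolved absolute path as a string.
    """
    resolved = str(Path(project_dir).expanduser().resolve())
    if resolved not in sys.path:
        # Insert at the front so the checkout wins over any installed copy.
        sys.path.insert(0, resolved)
    return resolved


# Placeholder clone location; adjust to your setup.
project_path = add_project_to_path("path/to/sinling")
```

Calling the helper twice with the same directory leaves a single entry on `sys.path`.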

pip

Run the following command in your virtualenv to install this package [need fix].

pip install sinling
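
To confirm that the install landed in the active environment, a quick check along these lines may help. It uses only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("sinling"):
    print("sinling is available")
else:
    print("sinling was not found; check that the right virtualenv is active")
```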

How to use

Sinhala Tokenizer

from sinling import SinhalaTokenizer

tokenizer = SinhalaTokenizer()

sentence = '...'  # your sentence

tokenizer.tokenize(sentence)
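
To tokenize several sentences in one pass, a small wrapper can help. This sketch relies only on the `tokenize` method shown above; `tokenize_all` and the `WhitespaceTokenizer` stand-in are hypothetical names, included so the example runs without sinling installed.

```python
def tokenize_all(tokenizer, sentences):
    """Tokenize each non-blank sentence with any object exposing tokenize()."""
    return [tokenizer.tokenize(s) for s in sentences if s.strip()]


class WhitespaceTokenizer:
    """Stand-in for illustration only; it does NOT reproduce sinling's rules."""

    def tokenize(self, sentence):
        return sentence.split()


# With sinling installed, pass SinhalaTokenizer() instead of the stand-in.
tokens = tokenize_all(
    WhitespaceTokenizer(),
    ["මම ගෙදර යනවා", "  ", "ඔහු පොත කියවයි"],
)
```

Blank entries are skipped, so `tokens` holds one token list per real sentence.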

Sinhala Stemmer (Experimental)

from sinling import SinhalaStemmer

stemmer = SinhalaStemmer()

word = '...'  # your word

stemmer.stem(word)

Please cite sinhala-stemmer if you are using this implementation.

Part-of-Speech Tagger

from sinling import SinhalaTokenizer, POSTagger

tokenizer = SinhalaTokenizer()

document = '...'  # may contain multiple sentences

tokenized_sentences = [tokenizer.tokenize(f'{ss}.') for ss in tokenizer.split_sentences(document)]

tagger = POSTagger()

pos_tags = tagger.predict(tokenized_sentences)
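
The pipeline above (split into sentences, re-append the full stop, tokenize, then tag) can be wrapped in one function. This is a sketch that uses only the calls shown in this README; `tag_document`, `StubTokenizer`, and `StubTagger` are hypothetical names. The stubs exist so the example runs without sinling, and the shape of their output is an assumption, since the return format of `POSTagger.predict` is not described here.

```python
def tag_document(tokenizer, tagger, document):
    """Split a document into sentences, tokenize each, and tag them all.

    A full stop is re-appended to every sentence before tokenizing, as in
    the snippet above. Whatever tagger.predict returns is passed through.
    """
    tokenized = [
        tokenizer.tokenize(f"{sentence}.")
        for sentence in tokenizer.split_sentences(document)
    ]
    return tagger.predict(tokenized)


class StubTokenizer:
    """Illustrative stand-in; not sinling's tokenization rules."""

    def split_sentences(self, document):
        return [part.strip() for part in document.split(".") if part.strip()]

    def tokenize(self, sentence):
        return sentence.replace(".", " .").split()


class StubTagger:
    """Illustrative stand-in; the (token, tag) output shape is an assumption."""

    def predict(self, tokenized_sentences):
        return [[(tok, "UNK") for tok in sent] for sent in tokenized_sentences]


# With sinling installed, pass SinhalaTokenizer() and POSTagger() instead.
tags = tag_document(StubTokenizer(), StubTagger(), "a b. c d.")
```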

Word Joiner (Morphological Joiner)

from sinling import preprocess, word_joiner

w1 = preprocess('මුනි')
w2 = preprocess('උතුමා')
results = word_joiner.join(w1, w2)
# Returns a list of possible results after applying join rules ['මුනිතුමා', ...]

Word Splitter (Morphological Splitter, corpus-based, experimental)

from sinling import word_splitter

word = '...'
results = word_splitter.split(word)
# Returns a dict containing debug information, base word and affix
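
Because the exact keys of the returned dict are not listed here, one way to inspect a result is to print every key it contains. The sketch below makes no assumption about key names; `describe_split` is a hypothetical helper, and the `example` dict is made-up placeholder data, not real splitter output.

```python
from pprint import pformat


def describe_split(result):
    """Format a result dict one key per line, without assuming key names."""
    return "\n".join(f"{key}: {pformat(value)}" for key, value in result.items())


# Placeholder data for illustration; the real key names may differ.
example = {"base": "...", "affix": "...", "debug": []}
print(describe_split(example))
```

With sinling installed, pass the value returned by `word_splitter.split(word)` instead of `example`.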

Visit here to see some sample splits.

Contributions

  • Contact wayasas.13@cse.mrt.ac.lk if you would like to contribute to this project.

License

Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)
