
A language processing tool for Sinhalese (සිංහල)


Update 2020.08.16: Added a PyPI package at https://pypi.org/project/sinling/.

Update 2020.08.16: Integrated the part-of-speech tagger and stemmer tools.

Update 2019.07.21: This tool no longer requires Java to run the Sinhala tokenizer. All Java code has been ported to Python for convenience.


Installation

From source (recommended)

Steps:

  1. Download stat.split.pickle to the resources folder.
  2. Import the required tools from the sinling module in your project (you may have to append this project's path to your Python path, for example via the PYTHONPATH environment variable).
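If the project is not on your Python path, one option is to append it at runtime before importing. The following is a minimal sketch, not part of sinling itself; the helper name add_project_to_path and the directory path/to/sinling are placeholders for illustration and should be replaced with the location of your own checkout.

```python
import sys
from pathlib import Path


def add_project_to_path(project_dir):
    """Append project_dir to sys.path once and return the absolute path used."""
    resolved = str(Path(project_dir).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Placeholder location of the downloaded project; adjust to your checkout.
SINLING_DIR = add_project_to_path("path/to/sinling")

# After this, `from sinling import SinhalaTokenizer` can be resolved,
# provided the directory above really contains the sinling module.
```

Setting the PYTHONPATH environment variable to the same directory before starting Python has a similar effect without changing your code.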

pip

Run the following command in your virtualenv to install this package [need fix].

pip install sinling

How to use

Sinhala Tokenizer

from sinling import SinhalaTokenizer

tokenizer = SinhalaTokenizer()

sentence = '...'  # your sentence

tokenizer.tokenize(sentence)
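As a small illustration of what can be done with the tokenizer output, the sketch below counts token frequencies over several sentences. It assumes that tokenize returns an iterable of string tokens, which this page does not state explicitly. The helper token_frequencies and the WhitespaceTokenizer stand-in are hypothetical and exist only so the example runs without sinling installed.

```python
from collections import Counter


def token_frequencies(tokenizer, sentences):
    """Count tokens across sentences using tokenizer.tokenize.

    Assumes tokenize(sentence) returns an iterable of string tokens.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenizer.tokenize(sentence))
    return counts


class WhitespaceTokenizer:
    """Stand-in exposing a tokenize() method; not part of sinling."""

    def tokenize(self, sentence):
        return sentence.split()


# With sinling installed, pass SinhalaTokenizer() instead of the stand-in.
freqs = token_frequencies(WhitespaceTokenizer(), ["a b a", "b c"])
print(freqs.most_common())
```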

Sinhala Stemmer (Experimental)

from sinling import SinhalaStemmer

stemmer = SinhalaStemmer()

word = '...'  # your word

stemmer.stem(word)

Please cite sinhala-stemmer if you are using this implementation.

Part-of-Speech Tagger

from sinling import SinhalaTokenizer, POSTagger

tokenizer = SinhalaTokenizer()

document = '...'  # may contain multiple sentences

tokenized_sentences = [tokenizer.tokenize(f'{ss}.') for ss in tokenizer.split_sentences(document)]

tagger = POSTagger()

pos_tags = tagger.predict(tokenized_sentences)
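The steps above can be wrapped in a single function. This is a sketch under stated assumptions: tag_document is a hypothetical helper, and the two stub classes only imitate the method names used on this page (split_sentences, tokenize, predict) so the example runs without sinling. Their return values say nothing about the real output format of POSTagger.predict.

```python
def tag_document(tokenizer, tagger, document):
    """Split a document into sentences, tokenize each one, and tag them all."""
    tokenized = [
        tokenizer.tokenize(f"{sentence}.")  # re-append the full stop, as in the snippet above
        for sentence in tokenizer.split_sentences(document)
    ]
    return tokenized, tagger.predict(tokenized)


class StubTokenizer:
    """Stand-in for SinhalaTokenizer; splits on full stops and whitespace."""

    def split_sentences(self, document):
        return [part.strip() for part in document.split(".") if part.strip()]

    def tokenize(self, sentence):
        return sentence.replace(".", " .").split()


class StubTagger:
    """Stand-in for POSTagger; labels every token with a dummy tag."""

    def predict(self, tokenized_sentences):
        return [[(token, "TAG") for token in sent] for sent in tokenized_sentences]


# With sinling installed, pass SinhalaTokenizer() and POSTagger() instead.
tokens, tags = tag_document(StubTokenizer(), StubTagger(), "one two. three.")
```

Passing the tokenizer and tagger in as arguments means each is constructed once and can be reused across many documents.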

Word Joiner (Morphological Joiner)

from sinling import preprocess, word_joiner

w1 = preprocess('මුනි')
w2 = preprocess('උතුමා')
results = word_joiner.join(w1, w2)
# Returns a list of possible results after applying join rules ['මුනිතුමා', ...]

Word Splitter (Morphological Splitter): corpus-based, experimental

from sinling import word_splitter

word = '...'
results = word_splitter.split(word)
# Returns a dict containing debug information, base word and affix
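Because the exact keys of the returned dict are not listed here, a simple way to explore a result is to print every entry. The sketch below does that for any dict. The helper describe_split and the example keys are hypothetical; they do not reflect the real structure returned by word_splitter.split.

```python
def describe_split(result):
    """Return one 'key: value' line per entry of the dict, sorted by key."""
    items = sorted(result.items(), key=lambda pair: str(pair[0]))
    return [f"{key}: {value!r}" for key, value in items]


# Hypothetical example dict; the real keys come from word_splitter.split(word).
example = {"base": "abc", "affix": "d"}
for line in describe_split(example):
    print(line)
```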

Visit here to see some sample splits.

Contributions

  • Contact wayasas.13@cse.mrt.ac.lk if you would like to contribute to this project.

License

Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)

