
Finnish syllabifier and compound segmenter

Project description


FinnSyll

FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles. It also includes a Finnish compound splitter. More details and documentation are to come.

Installation

$ pip install FinnSyll

Basic usage

First, instantiate a FinnSyll object.

>>> from finnsyll import FinnSyll
>>> f = FinnSyll()

To syllabify:

>>> f.syllabify('runoja')
['ru.no.ja']  # internal syllable boundaries are indicated with '.'

To segment compounds:

>>> f.split('sosiaalidemokraattien')
'sosiaali=demokraattien'  # internal word boundaries are indicated with '='
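
Since both methods return delimited strings, downstream code will often want the individual pieces. The following is a minimal sketch that works only on the output strings shown above; the helper names `syllables` and `constituents` are hypothetical and not part of FinnSyll.

```python
# Hypothetical helpers (not part of FinnSyll) for post-processing the
# delimited strings shown in the examples above.

def syllables(syllabified):
    """Split a syllabified string such as 'ru.no.ja' on '.'."""
    return syllabified.split('.')


def constituents(segmented):
    """Split a segmented compound such as 'a=b' on '='."""
    return segmented.split('=')


# Inputs below are copied from the documented outputs of
# f.syllabify('runoja') and f.split('sosiaalidemokraattien').
runoja_syllables = syllables('ru.no.ja')
compound_parts = constituents('sosiaali=demokraattien')
```

With the default settings, `f.syllabify` returns a list, so a helper like `syllables` would be applied to each element of that list.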

Optional arguments

The syllabifier can be customized with two parameters: variation and split_compounds.

variation

Instantiating a FinnSyll object with variation=True (the default) allows the syllabifier to return multiple syllabifications when variation is predicted. In this mode, the syllabifier always returns a list, even when only one syllabification is predicted. With variation=False, the syllabifier instead returns a single string containing the first predicted syllabification.

Variation:

>>> f = FinnSyll(variation=True)
>>> f.syllabify('runoja')
['ru.no.ja']
>>> f.syllabify('vapaus')
['va.pa.us', 'va.paus']

No variation:

>>> f = FinnSyll(variation=False)
>>> f.syllabify('runoja')
'ru.no.ja'
>>> f.syllabify('vapaus')
'va.pa.us'
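
Because the return type depends on the variation setting (a list when True, a string when False), calling code may want to normalize it. The sketch below assumes only the return shapes shown in the examples above; `as_list` is a hypothetical helper, not a FinnSyll function.

```python
# Hypothetical helper (not part of FinnSyll): normalize the result of
# syllabify() so callers can always iterate over a list of variants.

def as_list(result):
    """Wrap a single syllabification string in a list; copy a list as-is."""
    if isinstance(result, str):
        return [result]
    return list(result)


# Values copied from the documented outputs for 'vapaus'.
with_variation = as_list(['va.pa.us', 'va.paus'])  # variation=True
without_variation = as_list('va.pa.us')            # variation=False

# The first element is the same in both modes in the documented example.
first_choice_matches = with_variation[0] == without_variation[0]
```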

split_compounds

When a FinnSyll object is instantiated with split_compounds=True (the default), the syllabifier first attempts to split the input into its constituent words before syllabifying it. This forces a syllable boundary between the identified constituent words. With split_compounds=False, the syllabifier skips this step.

Compound splitting:

>>> f = FinnSyll(split_compounds=True)
>>> f.syllabify('rahoituserien')  # rahoitus=erien
['ra.hoi.tus.e.ri.en']

No compound splitting:

>>> f = FinnSyll(split_compounds=False)
>>> f.syllabify('rahoituserien')
['ra.hoi.tu.se.ri.en']  # incorrect: no syllable boundary between rahoitus and erien
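
One way to see the difference between the two outputs is to compare where the boundaries fall in the bare word. The sketch below is illustrative only: `boundary_offsets` is a hypothetical helper, and the strings are copied from the examples above rather than produced by calling FinnSyll.

```python
# Hypothetical helper (not part of FinnSyll): find the character offsets
# in the undelimited word at which a boundary marker occurs.

def boundary_offsets(marked, delimiter):
    """Return the set of offsets (in the bare word) preceding each delimiter."""
    offsets = set()
    position = 0  # number of non-delimiter characters seen so far
    for char in marked:
        if char == delimiter:
            offsets.add(position)
        else:
            position += 1
    return offsets


# Strings copied from the documented examples for 'rahoituserien'.
word_boundary = boundary_offsets('rahoitus=erien', '=')
with_splitting = boundary_offsets('ra.hoi.tus.e.ri.en', '.')
without_splitting = boundary_offsets('ra.hoi.tu.se.ri.en', '.')

# A compound-aware syllabification should contain every word boundary.
respects_compound = word_boundary <= with_splitting
misses_compound = not (word_boundary <= without_splitting)
```

In the documented example, the word boundary falls after the eighth character (rahoitus), which is a syllable boundary only in the output produced with split_compounds=True.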

