Skip to main content

Finnish syllabifier and compound segmenter

Project description

FinnSyll
********

FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles.
It is also equipped with a Finnish compound splitter.
More details/docs to come.

Installation
============

``$ pip install FinnSyll``

Basic usage
===========

First, instantiate a ``FinnSyll`` object.
::
>>> from finnsyll import FinnSyll
>>> f = FinnSyll()

To syllabify:
::
>>> f.syllabify('runoja')
['ru.no.ja'] # internal syllable boundaries are indicated with '.'

To segment compounds:
::
>>> f.split('sosiaalidemokraattien')
'sosiaali=demokraattien' # internal word boundaries are indicated with '='

Optional arguments
==================

The syllabifier can be customized along two different parameters: variation and compound splitting.

variation
---------

Instantiating a ``FinnSyll`` object with ``variation=True`` (default) will allow the syllabifier to return multiple syllabifications if variation is predicted. When ``variation=True``, the syllabifier will return a list. Setting ``variation`` to ``False`` will cause the syllabifier to return a string containing the first predicted syllabification.

*Variation*:
::
>>> f = FinnSyll(variation=True)
>>> f.syllabify('runoja')
['ru.no.ja']
>>> f.syllabify('vapaus')
['va.pa.us', 'va.paus']

*No variation*:
::
>>> f = FinnSyll(variation=False)
>>> f.syllabify('runoja')
'ru.no.ja'
>>> f.syllabify('vapaus')
'va.pa.us'

split_compounds
---------------

When instantiating a ``FinnSyll`` object with ``split_compounds=True`` (default), the syllabifier will first attempt to split the input into constituent words before syllabifying it. This forces the syllabifier to insert a syllable boundary in between identified constituent words. The syllabifier will skip this step if ``split_compounds`` is set to ``False``.

*Compound splitting*:
::
>>> f = FinnSyll(split_compounds=True)
>>> f.syllabify('rahoituserien') # rahoitus=erien
['ra.hoi.tus.e.ri.en']

*No compound splitting*:
::
>>> f = FinnSyll(split_compounds=False)
>>> f.syllabify('rahoituserien')
['ra.hoi.tu.se.ri.en'] # incorrect


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for FinnSyll, version 1.0.1
Filename, size File type Python version Upload date Hashes
Filename, size FinnSyll-1.0.1-py2.py3-none-any.whl (871.4 kB) File type Wheel Python version py2.py3 Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page