Finnish syllabifier and compound segmenter
## FinnSyll
FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles. It is also equipped with a Finnish compound splitter. More details/docs to come.
### Installation
`$ pip install FinnSyll`
### Basic usage
First, instantiate a `FinnSyll` object.
```python
>>> from finnsyll import FinnSyll
>>> f = FinnSyll()
```

To syllabify:

```python
>>> f.syllabify('runoja')
['ru.no.ja']  # internal syllable boundaries are indicated with '.'
```

To segment compounds:

```python
>>> f.split('sosiaalidemokraattien')
'sosiaali=demokraattien'  # internal word boundaries are indicated with '='
```
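Since both methods return plain strings with delimiter characters, the results can be post-processed with ordinary string operations. The following is a minimal sketch that uses the outputs shown above as literals, so it runs without FinnSyll installed. The helper names `syllables` and `constituents` are hypothetical and are not part of the library.

```python
def syllables(syllabified):
    """Split a syllabified string such as 'ru.no.ja' into its syllables."""
    return syllabified.split('.')


def constituents(segmented):
    """Split a segmented string such as 'a=b' into its constituent words."""
    return segmented.split('=')


# Literals copied from the examples above (assumed representative outputs).
print(syllables('ru.no.ja'))
print(constituents('sosiaali=demokraattien'))
```

A word with no internal boundary comes back as a one-element list, because `str.split` returns the whole string when the delimiter is absent.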
### Optional arguments
The syllabifier can be customized with two parameters: `variation` and `split_compounds`.
#### variation
Instantiating a `FinnSyll` object with `variation=True` (the default) allows the syllabifier to return multiple syllabifications when variation is predicted; in this mode it always returns a list. Setting `variation=False` makes the syllabifier return a single string containing the first predicted syllabification.
Variation:

```python
>>> f = FinnSyll(variation=True)
>>> f.syllabify('runoja')
['ru.no.ja']
>>> f.syllabify('vapaus')
['va.pa.us', 'va.paus']
```

No variation:

```python
>>> f = FinnSyll(variation=False)
>>> f.syllabify('runoja')
'ru.no.ja'
>>> f.syllabify('vapaus')
'va.pa.us'
```
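Because the return type depends on the `variation` setting (a list when `True`, a string when `False`), calling code that must work with either configuration may want to normalize the result. A minimal sketch, assuming Python 3; the helper `as_list` is hypothetical and not part of FinnSyll, and the inputs are the literal outputs shown above.

```python
def as_list(result):
    """Return syllabifications as a list, whichever setting produced them."""
    # A bare string (variation=False) becomes a one-element list.
    if isinstance(result, str):
        return [result]
    # A list (variation=True) is copied so the caller can modify it safely.
    return list(result)


print(as_list('va.pa.us'))
print(as_list(['va.pa.us', 'va.paus']))
```

The first element of the normalized list is the same in both cases, so `as_list(result)[0]` gives the first predicted syllabification regardless of configuration.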
#### split_compounds
When a `FinnSyll` object is instantiated with `split_compounds=True` (the default), the syllabifier first attempts to split the input into constituent words before syllabifying it. This forces a syllable boundary between the identified constituent words. Setting `split_compounds=False` skips this step.
Compound splitting:

```python
>>> f = FinnSyll(split_compounds=True)
>>> f.syllabify('rahoituserien')  # rahoitus=erien
['ra.hoi.tus.e.ri.en']
```

No compound splitting:

```python
>>> f = FinnSyll(split_compounds=False)
>>> f.syllabify('rahoituserien')
['ra.hoi.tu.se.ri.en']  # incorrect
```
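The effect of splitting first can be illustrated with a toy model: if each constituent is syllabified on its own and the pieces are joined with `.`, a syllable boundary necessarily falls at the word boundary. This is only a sketch of the idea, not FinnSyll's actual implementation. The function `syllabify_compound` is hypothetical, and the lookup table stands in for a real per-word syllabifier.

```python
def syllabify_compound(segmented, syllabify_word):
    """Syllabify each '='-separated constituent and join the results with '.'."""
    parts = segmented.split('=')
    return '.'.join(syllabify_word(part) for part in parts)


# Stand-in for a real syllabifier: a fixed table for the two constituents.
toy_syllabifier = {'rahoitus': 'ra.hoi.tus', 'erien': 'e.ri.en'}

result = syllabify_compound('rahoitus=erien', toy_syllabifier.__getitem__)
print(result)  # ra.hoi.tus.e.ri.en
```

Without the split, nothing stops the final `s` of `rahoitus` from being grouped with the following vowel, which is how the incorrect `tu.se` arises.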
Hashes for FinnSyll-1.0.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | eb2e99f0d6202dd97651dc9a89ab57996cd04dcd260f0021f26a231abc940ac4
MD5 | 400151c94fb6219e060b98b156d6573c
BLAKE2b-256 | 3d1d8846f9261181de7e3445ca4d344a25ddc0e3e7fc1c790d51bf6df1cc1578