Spacy Syllables
A multilingual syllable annotation pipeline component for spaCy.
A spaCy 2+ pipeline component that adds multilingual syllable annotations to tokens.
- Uses the well-established pyphen library for hyphenation.
- Supports a wide range of languages via pyphen's hyphenation dictionaries.
- Easy to use thanks to spaCy's pipeline framework.
Install
$ pip install spacy_syllables
which also installs the following dependencies:
- spacy = "^2.2.3"
- pyphen = "^0.9.5"
Usage
The SpacySyllables class autodetects the language from the given spaCy nlp instance. You can override the detected language by specifying the lang parameter during instantiation.
Typical use case
import spacy
from spacy_syllables import SpacySyllables
nlp = spacy.load("en_core_web_sm")
syllables = SpacySyllables(nlp)
nlp.add_pipe(syllables, after="tagger")
assert nlp.pipe_names == ["tagger", "syllables", "parser", "ner"]
doc = nlp("terribly long")
data = [(token.text, token._.syllables, token._.syllables_count) for token in doc]
assert data == [("terribly", ["ter", "ri", "bly"], 3), ("long", ["long"], 1)]
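Under the hood, each token receives two custom extension attributes: token._.syllables (the list of syllables) and token._.syllables_count. The per-token logic can be sketched without spaCy, using a hard-coded table as a stand-in for pyphen's hyphenator (pyphen.Pyphen(...).inserted(word) returns strings like "ter-ri-bly"):

```python
# Stand-in for pyphen's hyphenation dictionaries; the real component
# delegates this lookup to pyphen for the detected (or overridden) language.
HYPHENATED = {"terribly": "ter-ri-bly", "long": "long"}


def annotate(word: str):
    """Return the (syllables, syllables_count) pair stored on each token."""
    syllables = HYPHENATED.get(word.lower(), word).split("-")
    return syllables, len(syllables)


assert annotate("terribly") == (["ter", "ri", "bly"], 3)
assert annotate("long") == (["long"], 1)
```

Words missing from the dictionary come back as a single "syllable", which mirrors how a hyphenator treats unknown words.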
More examples can be found in the tests.
Dev setup / testing
We use poetry and nox. Install poetry, then install the dev package and the pyenv versions:
$ poetry install
$ poetry run nox --session install_pyenv_versions
run tests
$ poetry run nox
Download files
Source Distribution: spacy_syllables-0.0.1.tar.gz (3.8 kB)
Built Distribution: spacy_syllables-0.0.1-py3-none-any.whl
Hashes for spacy_syllables-0.0.1-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 4d09c0bd758b5ef9cd201b7e830c667dd4961c51001ba6989fbe0d69a5623afc
MD5 | 5b356f55b136f76a778ad781d2ce1f7c
BLAKE2b-256 | 7b0aa7e0fd07833e4f1858873de9255b084c0105ae5702ff2de132c9bc7df54f