spacy pipeline component for syllables
Spacy Syllables
A spacy 2+ pipeline component for adding multilingual syllable annotation to tokens.
- Uses the well-established pyphen library for syllable splitting.
- Supports many languages.
- Easy to use thanks to spacy's pipeline framework.
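To make the pyphen point concrete, here is a minimal sketch of splitting a word at pyphen's hyphenation points. The helper name `split_syllables` is hypothetical, and this is not necessarily how the component itself is implemented. It only assumes an object that exposes pyphen's `inserted()` method, such as `pyphen.Pyphen(lang="en_US")`.

```python
def split_syllables(hyphenator, word, sep="-"):
    """Split `word` at the break points proposed by `hyphenator`.

    `hyphenator` is assumed to expose pyphen's `inserted(word, hyphen=...)`
    method, e.g. an instance of `pyphen.Pyphen(lang="en_US")`.
    """
    # Skip anything that is not purely alphabetic (numbers, punctuation).
    if not word.isalpha():
        return None
    # inserted() returns the word with `sep` placed at each break point,
    # so splitting on `sep` yields the individual fragments.
    return hyphenator.inserted(word.lower(), hyphen=sep).split(sep)
```

With pyphen installed, `split_syllables(pyphen.Pyphen(lang="en_US"), "terribly")` returns a list of fragments whose concatenation is the lowercased word. The exact break points depend on the dictionary shipped with your pyphen version.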
Install
$ pip install spacy_syllables
which also installs the following dependencies:
- spacy = "^2.2.3"
- pyphen = "^0.9.5"
Usage
The SpacySyllables class autodetects the language from the given spacy nlp instance. You can also override the detected language by specifying the lang parameter when adding the component (see the migration section below for an example).
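As a rough sketch of the two modes described here, the hypothetical helper below adds the component either with autodetection or with an explicit language code. The name `add_syllables` is an illustration only. It assumes spacy 3.x, that `spacy_syllables` has already been imported so the "syllables" factory is registered, and that omitting `lang` from the config leaves autodetection in place.

```python
def add_syllables(nlp, lang=None, after="tagger"):
    """Add the "syllables" component to `nlp`.

    Assumes the caller has already run `from spacy_syllables import
    SpacySyllables`, which registers the "syllables" factory in spacy 3.x.
    """
    # Only pass `lang` when overriding; an empty config is assumed to keep
    # the component's default behaviour (language autodetection).
    config = {"lang": lang} if lang else {}
    return nlp.add_pipe("syllables", after=after, config=config)
```

For example, `add_syllables(nlp)` relies on the detected language, while `add_syllables(nlp, lang="en_US")` forces a specific pyphen dictionary.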
Normal use case
import spacy
from spacy_syllables import SpacySyllables
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("syllables", after="tagger")
assert nlp.pipe_names == ["tok2vec", "tagger", "syllables", "parser", "ner", "attribute_ruler", "lemmatizer"]
doc = nlp("terribly long")
data = [(token.text, token._.syllables, token._.syllables_count) for token in doc]
assert data == [("terribly", ["ter", "ri", "bly"], 3), ("long", ["long"], 1)]
More examples can be found in the tests.
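Once tokens carry the `syllables_count` extension, downstream code can aggregate it. The sketch below totals syllables for a whole doc. The helper name `total_syllables` is hypothetical, and it assumes that tokens the component does not annotate (for example punctuation) have `syllables_count` set to `None`.

```python
def total_syllables(doc):
    """Sum the syllable counts of all tokens in `doc`.

    Tokens without a count (assumed to be None, e.g. punctuation)
    contribute zero to the total.
    """
    return sum(token._.syllables_count or 0 for token in doc)
```

For the "terribly long" doc from the example above, this adds the per-token counts 3 and 1.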
Migrating from spacy 2.x to 3.0
In spacy 2.x, spacy_syllables was added to the pipeline by instantiating a SpacySyllables object with the desired options and passing that object to add_pipe():
from spacy_syllables import SpacySyllables
syllables = SpacySyllables(nlp, "en_US")
nlp.add_pipe(syllables, after="tagger")
In spacy 3.0, you add the component to the pipeline by name and pass any custom configuration through the config parameter of add_pipe():
from spacy_syllables import SpacySyllables
nlp.add_pipe("syllables", after="tagger", config={"lang": "en_US"})
In addition, the default pipeline components have changed between 2.x and 3.0, so make sure to update any asserts that check pipe_names. For example:
spacy 2.x:
assert nlp.pipe_names == ["tagger", "syllables", "parser", "ner"]
spacy 3.0:
assert nlp.pipe_names == ["tok2vec", "tagger", "syllables", "parser", "ner", "attribute_ruler", "lemmatizer"]
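Because the full list of default components differs between spacy versions and models, a check on relative position can be less brittle than comparing the whole list. The following is a minimal sketch of that idea. The helper name `assert_directly_after` is hypothetical and not part of spacy or spacy_syllables.

```python
def assert_directly_after(pipe_names, name, anchor):
    """Assert that `name` sits immediately after `anchor` in `pipe_names`."""
    assert anchor in pipe_names, f"{anchor!r} is not in the pipeline"
    assert name in pipe_names, f"{name!r} is not in the pipeline"
    # Compare positions instead of the whole list, so unrelated
    # components can be added or reordered without breaking the check.
    assert pipe_names.index(name) == pipe_names.index(anchor) + 1, (
        f"{name!r} does not directly follow {anchor!r}: {pipe_names}"
    )
```

Calling `assert_directly_after(nlp.pipe_names, "syllables", "tagger")` passes for both pipe_names lists shown above.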
Dev setup / testing
Install
Install the dev package and pyenv versions:
$ pip install -e ".[dev]"
$ python -m spacy download en_core_web_sm
Run tests
$ black .
$ pytest