spacy pipeline component for syllables

Spacy Syllables

A spacy 2+ pipeline component for adding multilingual syllable annotation to tokens.

  • Uses the well-established pyphen library for syllable splitting.
  • Supports a ton of languages.
  • Easy to use, thanks to the pipeline framework in spacy.

Install

$ pip install spacy_syllables

which also installs the following dependencies:

  • spacy = "^2.2.3"
  • pyphen = "^0.9.5"

Usage

The SpacySyllables class autodetects the language from the given spacy nlp instance. You can override the detected language by passing the lang parameter when the component is created, as shown in the migration section below.

Normal use case

import spacy
from spacy_syllables import SpacySyllables

nlp = spacy.load("en_core_web_sm")

nlp.add_pipe("syllables", after="tagger")

assert nlp.pipe_names == ["tok2vec", "tagger", "syllables", "parser", "ner", "attribute_ruler", "lemmatizer"]

doc = nlp("terribly long")

data = [(token.text, token._.syllables, token._.syllables_count) for token in doc]

assert data == [("terribly", ["ter", "ri", "bly"], 3), ("long", ["long"], 1)]

More examples can be found in the project's tests.
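
The (text, syllables, syllables_count) tuples built in the use case above can feed simple downstream metrics. The following is a minimal sketch, not part of spacy_syllables: `total_syllables` and `TokenRow` are hypothetical names, and the punctuation row assumes that tokens without a syllable split carry `None` in both attributes, which this README does not confirm.

```python
from typing import List, Optional, Sequence, Tuple

# One row per token: (text, syllables, syllables_count), as built in the use case above.
TokenRow = Tuple[str, Optional[List[str]], Optional[int]]


def total_syllables(rows: Sequence[TokenRow]) -> int:
    """Sum syllable counts, treating a missing count (None) as zero."""
    return sum(count or 0 for _text, _syllables, count in rows)


# Rows copied from the use case above, plus one hypothetical punctuation row.
rows = [
    ("terribly", ["ter", "ri", "bly"], 3),
    ("long", ["long"], 1),
    ("!", None, None),  # assumption: no syllable data for non-alphabetic tokens
]

print(total_syllables(rows))  # 4
```

With a real pipeline, the rows would come from the same list comprehension over `doc` shown above.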

Migrating from spacy 2.x to 3.0

In spacy 2.x, spacy_syllables was originally added to the pipeline by instantiating a SpacySyllables object with the desired options and adding it to the pipeline:

from spacy_syllables import SpacySyllables

syllables = SpacySyllables(nlp, "en_US")

nlp.add_pipe(syllables, after="tagger")

In spacy 3.0, you add the component to the pipeline by name and pass any custom configuration through the config parameter of add_pipe():

from spacy_syllables import SpacySyllables

nlp.add_pipe("syllables", after="tagger", config={"lang": "en_US"})

In addition, the default pipeline components changed between 2.x and 3.0, so make sure to update any asserts that check the pipeline names. For example:

spacy 2.x:

assert nlp.pipe_names == ["tagger", "syllables", "parser", "ner"]

spacy 3.0:

assert nlp.pipe_names == ["tok2vec", "tagger", "syllables", "parser", "ner", "attribute_ruler", "lemmatizer"]
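
Asserting the full list of pipe names ties a test to one spacy version and one model. A possible alternative, sketched here under the assumption that only the position of "syllables" relative to "tagger" matters, is to check that relationship alone. `directly_follows` is a hypothetical helper, not part of spacy or spacy_syllables.

```python
from typing import Sequence


def directly_follows(pipe_names: Sequence[str], component: str, anchor: str) -> bool:
    """Return True if `component` sits immediately after `anchor` in `pipe_names`."""
    if component not in pipe_names or anchor not in pipe_names:
        return False
    return pipe_names.index(component) == pipe_names.index(anchor) + 1


# Pipe name lists copied from the asserts above.
spacy2_names = ["tagger", "syllables", "parser", "ner"]
spacy3_names = [
    "tok2vec", "tagger", "syllables", "parser",
    "ner", "attribute_ruler", "lemmatizer",
]

# The same check passes for both layouts.
assert directly_follows(spacy2_names, "syllables", "tagger")
assert directly_follows(spacy3_names, "syllables", "tagger")
```

In a real test, `nlp.pipe_names` would be passed in place of the hard-coded lists.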

Dev setup / testing

install

install the dev package and pyenv versions

$ pip install -e ".[dev]"
$ python -m spacy download en_core_web_sm

run tests

$ black .
$ pytest


Source Distribution

spacy_syllables-3.0.2.tar.gz (4.7 kB)

Built Distribution

spacy_syllables-3.0.2-py3-none-any.whl (5.1 kB)
