spaCy pipeline component for adding emoji metadata to Doc, Token and Span objects

spacymoji: emoji for spaCy

spaCy extension and pipeline component for adding emoji metadata to Doc objects. It detects emoji consisting of one or more Unicode characters, and can optionally merge multi-character emoji (combined pictures, emoji with skin tone modifiers) into one token. Human-readable emoji descriptions are added as a custom attribute, and you can provide an optional lookup table with your own descriptions. The extension sets the custom Doc, Token and Span attributes ._.is_emoji, ._.emoji_desc, ._.has_emoji and ._.emoji. You can read more about custom pipeline components and extension attributes in the spaCy documentation.

Emoji are matched using spaCy's PhraseMatcher, and looked up in the data table provided by the emoji package.
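
To see why a single emoji can span several characters, here is a minimal sketch that uses only Python's standard library, not spacymoji itself. The helper name code_points is made up for this illustration; it lists the Unicode code points behind the two multi-character emoji used in the examples on this page.

```python
import unicodedata
from typing import List, Tuple


def code_points(text: str) -> List[Tuple[str, str]]:
    """Return (hex code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Thumbs up with dark skin tone: base emoji followed by a skin tone modifier.
thumbs_up_dark = "\U0001F44D\U0001F3FF"
# Man singer: man + zero width joiner + microphone.
singer = "\U0001F468\u200D\U0001F3A4"

for emoji_text in (thumbs_up_dark, singer):
    # Each emoji renders as one icon but consists of several code points.
    print(len(emoji_text), code_points(emoji_text))
```

Because one icon can be made of two or three code points, a tokenizer may split it into several tokens, which is the case the optional merging is meant to handle.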

⏳ Installation

spacymoji requires spaCy v3.0.0 or higher. For spaCy v2.x, install spacymoji==2.0.0.

pip install spacymoji

☝️ Usage

Import the component and add it anywhere in your pipeline using the string name of the "emoji" component factory:

import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("emoji", first=True)
doc = nlp("This is a test 😻 👍🏿")
assert doc._.has_emoji is True
assert doc[2:5]._.has_emoji is True
assert doc[0]._.is_emoji is False
assert doc[4]._.is_emoji is True
assert doc[5]._.emoji_desc == "thumbs up dark skin tone"
assert len(doc._.emoji) == 2
assert doc._.emoji[1] == ("👍🏿", 5, "thumbs up dark skin tone")

spacymoji only cares about the token text, so you can use it on a blank Language instance (it should work for all available languages!), or in a pipeline loaded from a trained model package. If your pipeline includes a tagger, parser and entity recognizer, make sure to add the emoji component with first=True, so the spans are merged right after tokenization, and before the document is parsed. If your text contains a lot of emoji, this might even improve parser accuracy, since each merged emoji then reaches the parser as a single token.

Available attributes

The extension sets attributes on the Doc, Span and Token. You can change the attribute names (and other parameters of the Emoji component) by passing them via the config parameter in the nlp.add_pipe(...) method. For more details on custom components and attributes, see the processing pipelines documentation.

Attribute	Type	Description
`Token._.is_emoji`	bool	Whether the token is an emoji.
`Token._.emoji_desc`	str	A human-readable description of the emoji.
`Doc._.has_emoji`	bool	Whether the document contains emoji.
`Doc._.emoji`	List[Tuple[str, int, str]]	`(emoji, index, description)` tuples of the document's emoji.
`Span._.has_emoji`	bool	Whether the span contains emoji.
`Span._.emoji`	List[Tuple[str, int, str]]	`(emoji, index, description)` tuples of the span's emoji.
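
The shape of the `(emoji, index, description)` tuples can be illustrated without spaCy. The following is a plain-Python sketch, not the component's implementation: collect_emoji, DESCRIPTIONS and the placeholder description are hypothetical names and values introduced only for this example, and plain strings stand in for tokens.

```python
from typing import Dict, List, Sequence, Tuple

EmojiEntry = Tuple[str, int, str]

CAT = "\U0001F63B"                        # smiling cat emoji from the usage example
THUMBS_UP_DARK = "\U0001F44D\U0001F3FF"   # thumbs up + dark skin tone modifier


def collect_emoji(tokens: Sequence[str], descriptions: Dict[str, str]) -> List[EmojiEntry]:
    """Return (emoji, index, description) for every token found in descriptions."""
    return [
        (text, index, descriptions[text])
        for index, text in enumerate(tokens)
        if text in descriptions
    ]


# Hypothetical stand-in for the description table; the first value is a placeholder.
DESCRIPTIONS = {
    CAT: "placeholder description",
    THUMBS_UP_DARK: "thumbs up dark skin tone",
}

# Token texts as they would look once multi-character emoji are merged.
tokens = ["This", "is", "a", "test", CAT, THUMBS_UP_DARK]

doc_emoji = collect_emoji(tokens, DESCRIPTIONS)         # analogous to Doc._.emoji
has_emoji = bool(doc_emoji)                             # analogous to Doc._.has_emoji
span_emoji = collect_emoji(tokens[0:3], DESCRIPTIONS)   # a slice without any emoji
```

The boolean attributes follow directly from the list: a document or span has emoji exactly when its list of tuples is non-empty.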

Settings

You can configure the emoji factory by setting any of the following parameters in the config dictionary:

Setting	Type	Description
`attrs`	Tuple[str, str, str, str]	Attributes to set on the `._` property. Defaults to `('has_emoji', 'is_emoji', 'emoji_desc', 'emoji')`.
`pattern_id`	str	ID of match pattern, defaults to `'EMOJI'`. Can be changed to avoid ID conflicts.
`merge_spans`	bool	Merge spans containing multi-character emoji, defaults to `True`. Will only merge combined emoji resulting in one icon, not sequences.
`lookup`	Dict[str, str]	Optional lookup table that maps emoji strings to custom descriptions, e.g. translations or other annotations.

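import spacy

# A blank pipeline is enough here, since spacymoji only looks at the token text.
nlp = spacy.blank("en")
# Custom attribute names plus a lookup table with a custom description.
emoji_config = {"attrs": ("has_e", "is_e", "e_desc", "e"), "lookup": {"👨‍🎤": "David Bowie"}}
nlp.add_pipe("emoji", first=True, config=emoji_config)
doc = nlp("We can be 👨‍🎤 heroes")
assert doc[3]._.is_e
assert doc[3]._.e_desc == "David Bowie"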
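
The effect of the `lookup` setting can be pictured as a dictionary check that runs before the default description is used. This is a simplified sketch in plain Python, not the component's code: the function describe and the table DEFAULT_DESCRIPTIONS are hypothetical, and the fallback to the default table for emoji missing from `lookup` is an assumption.

```python
from typing import Dict, Optional

SINGER = "\U0001F468\u200D\U0001F3A4"     # man + zero width joiner + microphone
THUMBS_UP_DARK = "\U0001F44D\U0001F3FF"   # thumbs up + dark skin tone modifier

# Hypothetical stand-in for the default description table.
DEFAULT_DESCRIPTIONS = {
    SINGER: "default description (placeholder)",
    THUMBS_UP_DARK: "thumbs up dark skin tone",
}


def describe(
    emoji_text: str,
    defaults: Dict[str, str],
    lookup: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Prefer a custom description from lookup, else fall back to defaults."""
    if lookup and emoji_text in lookup:
        return lookup[emoji_text]
    # Assumption: emoji missing from the lookup keep their default description.
    return defaults.get(emoji_text)


custom = {SINGER: "David Bowie"}
custom_desc = describe(SINGER, DEFAULT_DESCRIPTIONS, custom)
fallback_desc = describe(THUMBS_UP_DARK, DEFAULT_DESCRIPTIONS, custom)
```

A lookup table therefore only needs entries for the emoji whose descriptions you want to change, for example translations.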

If you're training a pipeline, you can define the component config in your config.cfg:

[nlp]
pipeline = ["emoji", "ner"]
# ...

[components.emoji]
factory = "emoji"
merge_spans = false
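
Before training, it can help to sanity-check that the component block and the pipeline list agree. The sketch below parses the fragment above with Python's standard configparser and json modules. This is only an approximation for simple values: spaCy loads config.cfg with its own config system, and the variable names here are made up for the example.

```python
import configparser
import json

CONFIG_TEXT = """
[nlp]
pipeline = ["emoji", "ner"]
# ...

[components.emoji]
factory = "emoji"
merge_spans = false
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Values in this fragment are JSON-compatible, so json.loads can decode them.
pipeline = json.loads(parser["nlp"]["pipeline"])
emoji_block = {key: json.loads(value) for key, value in parser["components.emoji"].items()}

# The component should be listed in the pipeline and have a matching block.
assert "emoji" in pipeline
assert emoji_block["factory"] == "emoji"
print(pipeline, emoji_block)
```

Here `merge_spans = false` decodes to the boolean False, so multi-character emoji would be detected but left as separate tokens.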

Release history

Version	Release date
3.1.0 (this version)	May 10, 2023
3.0.1	Apr 20, 2021
3.0.0	Apr 19, 2021
2.0.0	Apr 9, 2019
1.0.0	Dec 9, 2017
0.0.1	Oct 12, 2017

Download files

Source Distribution

spacymoji-3.1.0.tar.gz (9.0 kB)

Uploaded May 10, 2023. Tags: Source

Built Distribution

spacymoji-3.1.0-py2.py3-none-any.whl (8.5 kB)

Uploaded May 10, 2023. Tags: Python 2, Python 3

File details

Details for the file spacymoji-3.1.0.tar.gz.

File metadata

Download URL: spacymoji-3.1.0.tar.gz
Upload date: May 10, 2023
Size: 9.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for spacymoji-3.1.0.tar.gz
Algorithm	Hash digest
SHA256	`55f171fd88bb1131ea7dd19754541c3f9206b19d608ed965b5f95e1e81107e94`
MD5	`da4cff8205125923f6006be335acb79b`
BLAKE2b-256	`ef25fc60fecc03e34078f32402694139bab644e6f64a45341a3270539a93bf8b`

File details

Details for the file spacymoji-3.1.0-py2.py3-none-any.whl.

File metadata

Download URL: spacymoji-3.1.0-py2.py3-none-any.whl
Upload date: May 10, 2023
Size: 8.5 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for spacymoji-3.1.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`443df056e4bf23afb1f6ff8a372d9088e02d5eb2bd4a37a51fa0d19c35d0312b`
MD5	`279745c4d6abdc0aebd70641e7c5c687`
BLAKE2b-256	`3c5dcf1f18f9c3a88fc2cd51aad40f7bfeb9657d3c2c937ff950ede3e6029079`
