
spacyemoticon: emoticon for spaCy
*********************************

This extension was inspired by `spacymoji <https://pypi.org/project/spacymoji/>`_.

`spaCy v2.0 <https://spacy.io/usage/v2>`_ extension and pipeline component
for adding text emoticon meta data to ``Doc`` objects. Detects text emoticons
consisting of one or more characters or symbols and merges them into a single
token. The extension sets the custom ``Doc``, ``Token`` and ``Span`` attributes
``._.is_emoticon`` and ``._.emoticon``. You can read more about custom pipeline
components and extension attributes
`here <https://spacy.io/usage/processing-pipelines>`_.

Emoticons are matched using spaCy's ``PhraseMatcher``, and looked up in the
data table provided by ``emoticons.py``.
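Conceptually, the lookup can be sketched in plain Python. This is a simplified stand-in for the ``PhraseMatcher``-based implementation, and ``EMOTICONS`` here is a hypothetical three-entry excerpt, not the real table from ``emoticons.py``:

```python
# Hypothetical excerpt of the emoticon lookup table; the real table
# in emoticons.py is much larger.
EMOTICONS = {
    ":)": "happy face",
    ":(": "sad face",
    "<3": "heart",
}

def find_emoticons(tokens):
    """Return (emoticon, index, description) tuples for matching tokens."""
    return [(tok, i, EMOTICONS[tok])
            for i, tok in enumerate(tokens)
            if tok in EMOTICONS]

print(find_emoticons("This is a test :)".split()))
# → [(':)', 4, 'happy face')]
```

The actual extension does the same kind of table lookup, but matches against tokenized ``Doc`` objects via the shared vocab instead of whitespace-split strings.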


⏳ Installation
===============

``spacyemoticon`` requires ``spacy`` v2.0.0 or higher.

.. code:: bash

    pip install spacyemoticon

☝️ Usage
========

Import the component and initialise it with the shared ``nlp`` object (i.e. an
instance of ``Language``), which is used to initialise the ``PhraseMatcher``
with the shared vocab, and create the match patterns. Then add the component
anywhere in your pipeline.

.. code:: python

    import spacy
    from spacyemoticon import Emoticon

    nlp = spacy.load('en')
    emoticon = Emoticon(nlp)
    nlp.add_pipe(emoticon, first=True)

    doc = nlp(u"This is a test :) <\3")
    assert doc[0]._.is_emoticon == False
    assert doc[4]._.is_emoticon == True
    assert len(doc._.emoticon) == 2

``spacyemoticon`` only cares about the token text, so you can use it on a blank
``Language`` instance (it should work for all
`available languages <https://spacy.io/usage/models#languages>`_!), or in
a pipeline with a loaded model. If you're loading a model and your pipeline
includes a tagger, parser and entity recognizer, make sure to add the emoticon
component with ``first=True``, so the spans are merged right after tokenization
and *before* the document is parsed. If your text contains a lot of emoticons,
this might even give you a nice boost in parser accuracy.

Available attributes
--------------------

The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can
change the attribute names on initialisation of the extension. For more details
on custom components and attributes, see the
`processing pipelines documentation <https://spacy.io/usage/processing-pipelines#custom-components>`_.

======================= ======= ===
``Token._.is_emoticon`` bool    Whether the token is an emoticon.
``Doc._.emoticon``      list    ``(emoticon, index, description)`` tuples of the document's emoticons.
``Span._.emoticon``     list    ``(emoticon, index, description)`` tuples of the span's emoticons.
======================= ======= ===
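For illustration, the tuples can be consumed like this. The list below is a hypothetical example of what ``doc._.emoticon`` might contain; the descriptions are made up:

```python
# Hypothetical contents of doc._.emoticon for a document with two emoticons:
# each entry is an (emoticon, token index, description) tuple.
emoticons = [(u":)", 4, u"happy face"), (u"<3", 5, u"heart")]

for text, index, description in emoticons:
    print("token %d: %s (%s)" % (index, text, description))
# → token 4: :) (happy face)
# → token 5: <3 (heart)
```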

Settings
--------

On initialisation of ``Emoticon``, you can define the following settings:

=============== ============ ===
``nlp``         ``Language`` The shared ``nlp`` object, used to initialise the matcher with the shared ``Vocab`` and create the ``Doc`` match patterns.
``attrs``       tuple        Attributes to set on the ``._`` property. Defaults to ``('is_emoticon', 'emoticon')``.
``pattern_id``  unicode      ID of the match pattern, defaults to ``'EMOTICON'``. Can be changed to avoid ID conflicts.
``merge_spans`` bool         Merge spans containing multi-character emoticons, defaults to ``True``. Will only merge combined emoticons resulting in one icon, not sequences.
``lookup``      dict         Optional lookup table that maps emoticon text strings to custom descriptions, e.g. translations or other annotations.
=============== ============ ===

.. code:: python

    emoticon = Emoticon(nlp, attrs=('has_e', 'e'), lookup={u':S': u'confused'})
    nlp.add_pipe(emoticon)
    doc = nlp(u"We can be :S heroes")
    assert doc[3]._.is_e

