spacyemoticon

spaCy pipeline component that detects text emoticons and adds emoticon metadata to Doc, Token and Span objects.

spacyemoticon: emoticon for spaCy
*********************************

This extension was inspired by `spacymoji <https://pypi.org/project/spacymoji/>`_.

`spaCy v2.0 <https://spacy.io/usage/v2>`_ extension and pipeline component
for adding text emoticon metadata to ``Doc`` objects. It detects text emoticons
made up of one or more characters or symbols and merges each one into a single
token. The extension sets the custom ``Doc``, ``Token`` and ``Span`` attributes
``._.is_emoticon`` and ``._.emoticon``. You can read more about custom pipeline
components and extension attributes
`here <https://spacy.io/usage/processing-pipelines>`_.

Emoticons are matched using spaCy's ``PhraseMatcher`` and looked up in the data
table provided by ``emoticons.py``.
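
The matching step can be sketched without spaCy to show the idea. This is not the package's implementation: ``EMOTICONS`` and ``find_emoticons`` are hypothetical, tokens are plain strings instead of ``Token`` objects, and the greedy longest-match scan is a simplification of ``PhraseMatcher``, which can also report overlapping matches.

.. code:: python

    # Hypothetical pattern table: each key is a tuple of token texts, each value
    # a description. The package's real data lives in emoticons.py.
    EMOTICONS = {
        (":)",): "smiley",
        ("<3",): "heart",
        (":", "S"): "confused",
    }


    def find_emoticons(tokens, table=EMOTICONS):
        """Return a (start, end, description) triple for each match in ``tokens``.

        Simplified stand-in for PhraseMatcher: scans left to right, tries the
        longest patterns first and skips past each match, so results never overlap.
        """
        patterns = sorted(table, key=len, reverse=True)
        matches = []
        i = 0
        while i < len(tokens):
            for pattern in patterns:
                end = i + len(pattern)
                if tuple(tokens[i:end]) == pattern:
                    matches.append((i, end, table[pattern]))
                    i = end
                    break
            else:
                # No pattern starts at this token; move on.
                i += 1
        return matches

For example, ``find_emoticons(["We", "can", "be", ":", "S", "heroes"])`` returns ``[(3, 5, "confused")]``: the two tokens ``:`` and ``S`` form one match.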

⏳ Installation
===============

``spacyemoticon`` requires ``spacy`` v2.0.0 or higher.

.. code:: bash

    pip install spacyemoticon

☝️ Usage
========

Import the component and initialise it with the shared ``nlp`` object (i.e. an
instance of ``Language``), which is used to initialise the ``PhraseMatcher``
with the shared vocab, and create the match patterns. Then add the component
anywhere in your pipeline.

.. code:: python

    import spacy
    from spacyemoticon import Emoticon

    nlp = spacy.load('en')
    emoticon = Emoticon(nlp)
    nlp.add_pipe(emoticon, first=True)

    # Double the backslash: in a plain string literal "\3" is an octal escape.
    doc = nlp(u"This is a test :) <\\3")
    assert not doc[0]._.is_emoticon
    assert doc[4]._.is_emoticon
    assert len(doc._.emoticon) == 2

``spacyemoticon`` only cares about the token text, so you can use it on a blank
``Language`` instance (it should work for all
`available languages <https://spacy.io/usage/models#languages>`_!), or in
a pipeline with a loaded model. If you're loading a model and your pipeline
includes a tagger, parser and entity recognizer, make sure to add the emoticon
component with ``first=True``, so the spans are merged right after tokenization
and *before* the document is parsed. If your text contains a lot of emoticons,
this may even improve parser accuracy.
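
A rough illustration of why the merge should happen before parsing: once a multi-token match is collapsed, every later component sees one token. The ``merge_spans`` helper is hypothetical and works on plain strings; a real spaCy component would modify the ``Doc`` itself, which this sketch does not reproduce.

.. code:: python

    def merge_spans(tokens, spans):
        """Join each (start, end) span of ``tokens`` into a single token.

        ``spans`` must be sorted and non-overlapping, such as the (start, end)
        pairs from a matching step. Tokens are joined with no separator, which
        assumes they were adjacent (no whitespace) in the original text.
        """
        merged = []
        i = 0
        for start, end in spans:
            merged.extend(tokens[i:start])               # tokens before the span
            merged.append("".join(tokens[start:end]))    # the span as one token
            i = end
        merged.extend(tokens[i:])                        # tokens after the last span
        return merged

Applied to ``["We", "can", "be", ":", "S", "heroes"]`` with the span ``(3, 5)``, it yields ``["We", "can", "be", ":S", "heroes"]``, so a downstream parser would see ``:S`` as a single token.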

Available attributes
--------------------

The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can
change the attribute names on initialisation of the extension. For more details
on custom components and attributes, see the
`processing pipelines documentation <https://spacy.io/usage/processing-pipelines#custom-components>`_.

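======================== ====== ===========================================
Attribute                Type   Description
======================== ====== ===========================================
``Token._.is_emoticon``  bool   Whether the token is an emoticon.
``Doc._.emoticon``       list   ``(emoticon, index, description)`` tuples of the document's emoticons.
``Span._.emoticon``      list   ``(emoticon, index, description)`` tuples of the span's emoticons.
======================== ====== ===========================================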
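
To make the shape of these attributes concrete, here is a plain-Python sketch that derives them from token texts. The ``DESCRIPTIONS`` table and both functions are hypothetical; the sketch only mirrors the documented types (a ``bool`` per token, ``(emoticon, index, description)`` tuples for documents and spans) and assumes that ``index`` is the position within the ``Doc`` or ``Span`` being inspected, so a span's indices start at 0.

.. code:: python

    # Hypothetical descriptions; the package ships its own table in emoticons.py.
    DESCRIPTIONS = {":)": "smiley", "<3": "heart"}


    def is_emoticon(token_text):
        """Token-level flag, analogous to Token._.is_emoticon."""
        return token_text in DESCRIPTIONS


    def get_emoticons(tokens):
        """(emoticon, index, description) tuples, analogous to Doc/Span._.emoticon.

        Assumption: ``index`` is the position within the sequence passed in,
        so a span's indices are relative to the span's start.
        """
        return [
            (text, i, DESCRIPTIONS[text])
            for i, text in enumerate(tokens)
            if is_emoticon(text)
        ]


    doc_tokens = ["This", "is", "a", "test", ":)", "<3"]
    span_tokens = doc_tokens[3:6]  # "test :) <3"

Here ``get_emoticons(doc_tokens)`` returns ``[(":)", 4, "smiley"), ("<3", 5, "heart")]``, matching the two emoticons counted in the usage example.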

Settings
--------

On initialisation of ``Emoticon``, you can define the following settings:

================ ============ ===========================================
Setting          Type         Description
================ ============ ===========================================
``nlp``          ``Language`` The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.
``attrs``        tuple        Attributes to set on the ``._`` property. Defaults to ``('is_emoticon', 'emoticon')``.
``pattern_id``   unicode      ID of match pattern, defaults to ``'EMOTICON'``. Can be changed to avoid ID conflicts.
``merge_spans``  bool         Merge spans containing multi-character emoticons, defaults to ``True``. Will only merge combined emoticons resulting in one icon, not sequences.
``lookup``       dict         Optional lookup table that maps emoticon text strings to custom descriptions, e.g. translations or other annotations.
================ ============ ===========================================

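.. code:: python

    # attrs renames the extensions: the first name is the Token flag,
    # the second the Doc/Span list. lookup must be a dict (text -> description).
    emoticon = Emoticon(nlp, attrs=('has_e', 'e'), lookup={u':S': u'confused'})
    nlp.add_pipe(emoticon)
    doc = nlp(u"We can be :S heroes")
    assert doc[3]._.has_e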
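
The ``attrs`` and ``lookup`` settings can be modelled without spaCy in a short sketch. Everything here is hypothetical: ``resolve_attr_names`` and ``describe`` are not part of the package, ``DEFAULT_TABLE`` stands in for the bundled data, and the rule that a ``lookup`` entry overrides the default description (falling back to the default otherwise) is an assumption about how the option behaves. With ``attrs=('has_e', 'e')``, the token flag is ``has_e`` and the list attribute is ``e``.

.. code:: python

    from collections import namedtuple

    # Hypothetical stand-in for the package's bundled emoticon data.
    DEFAULT_TABLE = {":)": "smiley", ":S": "confused"}

    # Names of the two extension attributes controlled by ``attrs``.
    AttrNames = namedtuple("AttrNames", ["token_flag", "list_attr"])


    def resolve_attr_names(attrs=("is_emoticon", "emoticon")):
        """Split ``attrs`` into the Token flag name and the Doc/Span list name.

        Follows the order of the documented default ('is_emoticon', 'emoticon').
        """
        token_flag, list_attr = attrs
        return AttrNames(token_flag, list_attr)


    def describe(text, lookup=None):
        """Return a description for an emoticon string.

        Assumption: an entry in the custom ``lookup`` dict takes precedence,
        otherwise the default table is used; unknown strings give None.
        """
        if lookup and text in lookup:
            return lookup[text]
        return DEFAULT_TABLE.get(text)

A translation passed as ``lookup={":S": "confundido"}`` would make ``describe(":S", ...)`` return ``"confundido"``, while emoticons missing from ``lookup`` keep their default description.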

Project details

Release history

1.0.2 (this version): May 24, 2018

1.0.0: May 24, 2018

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

spacyemoticon-1.0.2-py3-none-any.whl (5.1 kB)

Uploaded May 24, 2018 Python 3

spacyemoticon-1.0.2-py2.py3-none-any.whl (5.1 kB)

Uploaded May 24, 2018 Python 2, Python 3

File details

Details for the file spacyemoticon-1.0.2-py3-none-any.whl.

File metadata

Download URL: spacyemoticon-1.0.2-py3-none-any.whl
Upload date: May 24, 2018
Size: 5.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for spacyemoticon-1.0.2-py3-none-any.whl
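
=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``ba9d4c9ae87299d356d1379b7293a08cf1cb628dcbe21db70b0caa13467af88c``
MD5         ``50282b267f8211215d94fce9cebb18e8``
BLAKE2b-256 ``1e7d8b2051633a2357d2d4981398a000e4a3c88ac5ceaa845f01ded0b76181ea``
=========== ====================================================================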

File details

Details for the file spacyemoticon-1.0.2-py2.py3-none-any.whl.

File metadata

Download URL: spacyemoticon-1.0.2-py2.py3-none-any.whl
Upload date: May 24, 2018
Size: 5.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for spacyemoticon-1.0.2-py2.py3-none-any.whl
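
=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``63845cfae72ca8055990097717f0015aa58335ea2c440e3c837592a97314d952``
MD5         ``b8b8b3c2e0344ff45d0394f5c35a42e0``
BLAKE2b-256 ``0995d885234eb9a23cb25e37e9e8abf3d9cd0eed11340d13aace4d29dd39d0e4``
=========== ====================================================================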
