spaCy pipeline component that adds emoticon metadata to Doc, Token and Span objects for text emoticons.
spacyemoticon: emoticon for spaCy
*********************************
This extension was inspired by `spacymoji <https://pypi.org/project/spacymoji/>`_.

`spaCy v2.0 <https://spacy.io/usage/v2>`_ extension and pipeline component
for adding text emoticon metadata to ``Doc`` objects. Detects text emoticons
consisting of one or more characters or symbols and merges them into one token.
The extension sets the custom ``Doc``, ``Token`` and ``Span`` attributes
``._.is_emoticon`` and ``._.emoticon``. You can read more about custom pipeline
components and extension attributes
`here <https://spacy.io/usage/processing-pipelines>`_.

Emoticons are matched using spaCy's ``PhraseMatcher``, and looked up in the data
table provided by ``emoticons.py``.
⏳ Installation
===============
``spacyemoticon`` requires ``spacy`` v2.0.0 or higher.

.. code:: bash

    pip install spacyemoticon

☝️ Usage
========
Import the component and initialise it with the shared ``nlp`` object (i.e. an
instance of ``Language``), which is used to initialise the ``PhraseMatcher``
with the shared vocab, and create the match patterns. Then add the component
anywhere in your pipeline.
.. code:: python

    import spacy
    from spacyemoticon import Emoticon

    nlp = spacy.load('en')
    emoticon = Emoticon(nlp)
    # Add the component first so emoticons are merged before tagging and parsing
    nlp.add_pipe(emoticon, first=True)

    doc = nlp(u"This is a test :) <\3")
    assert not doc[0]._.is_emoticon   # "This"
    assert doc[4]._.is_emoticon       # ":)"
    assert len(doc._.emoticon) == 2

``spacyemoticon`` only cares about the token text, so you can use it on a blank
``Language`` instance (it should work for all
`available languages <https://spacy.io/usage/models#languages>`_!), or in
a pipeline with a loaded model. If you're loading a model and your pipeline
includes a tagger, parser and entity recognizer, make sure to add the emoticon
component with ``first=True``, so the spans are merged right after tokenization,
and *before* the document is parsed. If your text contains a lot of emoticons,
this might even give you a nice boost in parser accuracy.
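
To make the idea of merging right after tokenization more concrete, here is a
minimal plain-Python sketch. It is not how ``spacyemoticon`` is implemented
(the extension uses spaCy's ``PhraseMatcher``). The ``EMOTICONS`` table, the
``merge_emoticons`` helper and its ``max_len`` parameter are hypothetical names
chosen for illustration. The sketch greedily joins adjacent tokens that
together spell a known emoticon, trying the longest candidate first.

```python
# Plain-Python illustration only; not spacyemoticon's actual implementation.

# Hypothetical lookup table. The real data lives inside the package.
EMOTICONS = {":)": "happy face", ":-)": "happy face", "<3": "heart"}


def merge_emoticons(tokens, table=EMOTICONS, max_len=3):
    """Return a new token list in which runs of adjacent tokens that
    together spell an emoticon from `table` are joined into one token."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest run first so ":-)" wins over shorter prefixes.
        for size in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + size])
            if candidate in table:
                merged.append(candidate)
                i += size
                break
        else:
            # No multi-token emoticon starts here: keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# A tokenizer may split ":-)" and "<3" into separate symbols.
split_tokens = ["nice", ":", "-", ")", "<", "3"]
merged_tokens = merge_emoticons(split_tokens)
```

Once the pieces are merged, later components such as a tagger or parser see a
single token per emoticon instead of a run of loose punctuation, which is the
reason for adding the component first.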
Available attributes
--------------------
The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can
change the attribute names on initialisation of the extension. For more details
on custom components and attributes, see the
`processing pipelines documentation <https://spacy.io/usage/processing-pipelines#custom-components>`_.

======================= ==== ===
``Token._.is_emoticon`` bool Whether the token is an emoticon.
``Doc._.emoticon``      list ``(emoticon, index, description)`` tuples of the document's emoticons.
``Span._.emoticon``     list ``(emoticon, index, description)`` tuples of the span's emoticons.
======================= ==== ===

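
The shape of the ``(emoticon, index, description)`` tuples can be illustrated
with a small plain-Python sketch. This is an assumption-laden illustration,
not the extension's code: the ``EMOTICONS`` table and the ``collect_emoticons``
helper are hypothetical, and ``index`` is assumed here to be the token's
position in the document.

```python
# Plain-Python illustration of the tuple layout; not the library's code.

# Hypothetical table mapping emoticon text to a description.
EMOTICONS = {":)": "happy face", "<3": "heart"}


def collect_emoticons(tokens, table=EMOTICONS):
    """Return (emoticon, index, description) tuples for every token
    whose text appears in `table`."""
    return [(tok, i, table[tok]) for i, tok in enumerate(tokens) if tok in table]


doc_tokens = ["This", "is", "a", "test", ":)", "<3"]
found = collect_emoticons(doc_tokens)

# Per-token flag, analogous in spirit to Token._.is_emoticon.
flags = [tok in EMOTICONS for tok in doc_tokens]
```

With this layout, ``len(found)`` gives the number of emoticons and each entry
carries enough information to locate the token again.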
Settings
--------
On initialisation of ``Emoticon``, you can define the following settings:

=============== ============ ===
``nlp``         ``Language`` The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.
``attrs``       tuple        Attributes to set on the ``._`` property. Defaults to ``('is_emoticon', 'emoticon')``.
``pattern_id``  unicode      ID of match pattern, defaults to ``'EMOTICON'``. Can be changed to avoid ID conflicts.
``merge_spans`` bool         Merge spans containing multi-character emoticons, defaults to ``True``. Only merges tokens that together form a single emoticon, not sequences of emoticons.
``lookup``      dict         Optional lookup table that maps emoticon text strings to custom descriptions, e.g. translations or other annotations.
=============== ============ ===

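.. code:: python

    # ``lookup`` must be a dict; the description used here is an illustrative value
    emoticon = Emoticon(nlp, attrs=('has_e', 'e'), lookup={u':S': u'confused'})
    nlp.add_pipe(emoticon)
    doc = nlp(u"We can be :S heroes")
    # 'has_e' replaces the default 'is_emoticon' attribute name
    assert doc[3]._.has_e
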
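
One way to picture how a custom ``lookup`` could interact with the built-in
descriptions is the plain-Python sketch below. Whether ``spacyemoticon``
falls back to its default table in exactly this way is an assumption. The
``DEFAULT_DESCRIPTIONS`` table and the ``describe`` helper are hypothetical
and only illustrate the idea of a caller-supplied override.

```python
# Plain-Python illustration; the fallback behaviour shown is an assumption.

# Hypothetical default descriptions, standing in for the package's data table.
DEFAULT_DESCRIPTIONS = {":)": "happy face", ":S": "confused"}


def describe(emoticon, lookup=None, defaults=DEFAULT_DESCRIPTIONS):
    """Prefer a caller-supplied description and fall back to the defaults.
    Returns None when the emoticon is unknown to both tables."""
    if lookup and emoticon in lookup:
        return lookup[emoticon]
    return defaults.get(emoticon)


# A custom table, e.g. holding translations of selected descriptions.
custom = {":S": "confundido"}
overridden = describe(":S", lookup=custom)
fallback = describe(":)", lookup=custom)
```

Keeping the override in a separate dict means custom annotations can be
supplied without editing the default data.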