Replace unicode emojis by its corresponding image representation. Supports Unicode 9 standard.
Project description
# emoji-unicode
[![Build Status](https://img.shields.io/travis/nitely/emoji-unicode.svg?style=flat-square)](https://travis-ci.org/nitely/emoji-unicode)
[![Coverage Status](https://img.shields.io/coveralls/nitely/emoji-unicode.svg?style=flat-square)](https://coveralls.io/r/nitely/emoji-unicode)
[![pypi](https://img.shields.io/pypi/v/emoji-unicode.svg?style=flat-square)](https://pypi.python.org/pypi/emoji-unicode)
[![licence](https://img.shields.io/pypi/l/emoji-unicode.svg?style=flat-square)](https://raw.githubusercontent.com/nitely/emoji-unicode/master/LICENSE)
Replace unicode emojis in a text. Supports *Unicode 10* standard.
## Compatibility
* Python 2.7 ([wide-build](http://emoji-unicode.readthedocs.org/en/latest/python2.html)),
3.3, 3.4, 3.5 and +3.6 (recommended)
## Install
```
$ pip install emoji-unicode
```
## Usage
### Replace
[docs](http://emoji-unicode.readthedocs.org/en/latest/api.html#emoji_unicode.replace)
```python
emoji_unicode.replace(
u'Time to ⛽',
lambda e: u'<img src="{filename}.svg" alt="{raw}">'.format(filename=e.code_points, raw=e.unicode)
)
# Time to <img src="26fd.svg" alt="⛽">
```
> Note: the [Emoji.code_points](http://emoji-unicode.readthedocs.org/en/latest/api.html#emoji_unicode.Emoji.code_points) are normalized.
### Normalize
This function removes optional characters that may appear depending on
the input source (Android, iOS, etc). For example the emoji variation `\\uFE0F`
may (or may not) appear in between a emoji and a skin tone modifier,
making the code points to be different. It should be used
to rename the image files.
[docs](http://emoji-unicode.readthedocs.org/en/latest/api.html#emoji_unicode.normalize)
```python
emoji_unicode.normalize(u'1F468-200D-2764-FE0F-200D-1F468')
# 1f468-2764-1f468
```
### Replace (advanced)
```python
PATTERN = re.compile(emoji_unicode.RE_PATTERN_TEMPLATE)
def match_handler(m):
e = emoji_unicode.Emoji(unicode=m.group('emoji'))
return u'<img src="{filename}.svg" alt="{raw}">'.format(
filename=e.code_points,
raw=e.unicode
)
re.sub(PATTERN, match_handler, u'Time to ⛽')
# Time to <img src="26fd.svg" alt="⛽">
```
## Docs
[docs](http://emoji-unicode.readthedocs.org/en/latest/)
## Unicode 8 emojis
If your current emoji package supports unicode 8,
which means it supports skin tones and [sequences](http://unicode.org/reports/tr51/),
then [normalizing](https://github.com/nitely/emoji-unicode#normalize) the file names
should be enough. But to handle unsupported emojis, for example future sequences,
they should be displayed as multiple glyphs.
Instead of displaying the `woman-kissing-man` glyph you may
display `woman`, `heart`, `kiss`, `man` glyphs.
Here is a example of how this could be handled:
```python
EMOJI_FILES = set(['1f469', '2764', '1f48b', '1f468']) # A set containing the emoji file names
def _render(unicode, code_points):
return u'<img src="{filename}.svg" alt="{alt}">'.format(filename=code_points, alt=unicode)
def render(e):
"""
Return the rendered html for the passed Emoji.
Return the html as multiple glyphs when the
emoji is a sequence not found within the files.
Return the raw unicode when one or more glyphs
are missing.
"""
if e.code_points in EMOJI_FILES:
return _render(e.unicode, e.code_points)
if any(c not in EMOJI_FILES for u, c in e.as_map()):
return e.unicode
return u''.join(_render(u, c) for u, c in e.as_map())
# This assumes `woman-kissing-man.svg` is missing
emoji_unicode.replace(
u'\U0001f469\u200d\u2764\ufe0f\u200d\U0001f48b\u200d\U0001f468',
render
)
# <img src="1f469.svg" alt="\U0001f469"><img src="2764.svg" alt="\u2764"> ...
```
## Dev
The `./emoji_unicode/pattern.py` file is generated
by parsing the `./emoji_unicode/emoji-data.txt` file,
then putting the output in a in-memory copy of
`./emoji_unicode/pattern_template.py`, and lastly
writing the result into `pattern.py`.
To generate the `pattern.py` file, run:
```
make gen
```
## Tests
```
make test
```
## Benchmark
This will run some silly benchmarks.
```
make bench
```
Here is the output on my machine:
```
emoji.replace()
text len: 10000
0.01640868396498263
re.sub() (raw match)
text len: 10000
0.005225047003477812
Text with no emojis
emoji.replace()
text len: 10000
0.0014624089817516506
```
## Acknowledgments
Thanks to [iamcal/emoji-data](https://github.com/iamcal/emoji-data)
for maintaining an incredible source of emojis that allowed me
to make a robust test suite.
## License
MIT
[![Build Status](https://img.shields.io/travis/nitely/emoji-unicode.svg?style=flat-square)](https://travis-ci.org/nitely/emoji-unicode)
[![Coverage Status](https://img.shields.io/coveralls/nitely/emoji-unicode.svg?style=flat-square)](https://coveralls.io/r/nitely/emoji-unicode)
[![pypi](https://img.shields.io/pypi/v/emoji-unicode.svg?style=flat-square)](https://pypi.python.org/pypi/emoji-unicode)
[![licence](https://img.shields.io/pypi/l/emoji-unicode.svg?style=flat-square)](https://raw.githubusercontent.com/nitely/emoji-unicode/master/LICENSE)
Replace unicode emojis in a text. Supports *Unicode 10* standard.
## Compatibility
* Python 2.7 ([wide-build](http://emoji-unicode.readthedocs.org/en/latest/python2.html)),
3.3, 3.4, 3.5 and +3.6 (recommended)
## Install
```
$ pip install emoji-unicode
```
## Usage
### Replace
[docs](http://emoji-unicode.readthedocs.org/en/latest/api.html#emoji_unicode.replace)
```python
emoji_unicode.replace(
u'Time to ⛽',
lambda e: u'<img src="{filename}.svg" alt="{raw}">'.format(filename=e.code_points, raw=e.unicode)
)
# Time to <img src="26fd.svg" alt="⛽">
```
> Note: the [Emoji.code_points](http://emoji-unicode.readthedocs.org/en/latest/api.html#emoji_unicode.Emoji.code_points) are normalized.
### Normalize
This function removes optional characters that may appear depending on
the input source (Android, iOS, etc). For example the emoji variation `\\uFE0F`
may (or may not) appear in between a emoji and a skin tone modifier,
making the code points to be different. It should be used
to rename the image files.
[docs](http://emoji-unicode.readthedocs.org/en/latest/api.html#emoji_unicode.normalize)
```python
emoji_unicode.normalize(u'1F468-200D-2764-FE0F-200D-1F468')
# 1f468-2764-1f468
```
### Replace (advanced)
```python
PATTERN = re.compile(emoji_unicode.RE_PATTERN_TEMPLATE)
def match_handler(m):
e = emoji_unicode.Emoji(unicode=m.group('emoji'))
return u'<img src="{filename}.svg" alt="{raw}">'.format(
filename=e.code_points,
raw=e.unicode
)
re.sub(PATTERN, match_handler, u'Time to ⛽')
# Time to <img src="26fd.svg" alt="⛽">
```
## Docs
[docs](http://emoji-unicode.readthedocs.org/en/latest/)
## Unicode 8 emojis
If your current emoji package supports unicode 8,
which means it supports skin tones and [sequences](http://unicode.org/reports/tr51/),
then [normalizing](https://github.com/nitely/emoji-unicode#normalize) the file names
should be enough. But to handle unsupported emojis, for example future sequences,
they should be displayed as multiple glyphs.
Instead of displaying the `woman-kissing-man` glyph you may
display `woman`, `heart`, `kiss`, `man` glyphs.
Here is a example of how this could be handled:
```python
EMOJI_FILES = set(['1f469', '2764', '1f48b', '1f468']) # A set containing the emoji file names
def _render(unicode, code_points):
return u'<img src="{filename}.svg" alt="{alt}">'.format(filename=code_points, alt=unicode)
def render(e):
"""
Return the rendered html for the passed Emoji.
Return the html as multiple glyphs when the
emoji is a sequence not found within the files.
Return the raw unicode when one or more glyphs
are missing.
"""
if e.code_points in EMOJI_FILES:
return _render(e.unicode, e.code_points)
if any(c not in EMOJI_FILES for u, c in e.as_map()):
return e.unicode
return u''.join(_render(u, c) for u, c in e.as_map())
# This assumes `woman-kissing-man.svg` is missing
emoji_unicode.replace(
u'\U0001f469\u200d\u2764\ufe0f\u200d\U0001f48b\u200d\U0001f468',
render
)
# <img src="1f469.svg" alt="\U0001f469"><img src="2764.svg" alt="\u2764"> ...
```
## Dev
The `./emoji_unicode/pattern.py` file is generated
by parsing the `./emoji_unicode/emoji-data.txt` file,
then putting the output in a in-memory copy of
`./emoji_unicode/pattern_template.py`, and lastly
writing the result into `pattern.py`.
To generate the `pattern.py` file, run:
```
make gen
```
## Tests
```
make test
```
## Benchmark
This will run some silly benchmarks.
```
make bench
```
Here is the output on my machine:
```
emoji.replace()
text len: 10000
0.01640868396498263
re.sub() (raw match)
text len: 10000
0.005225047003477812
Text with no emojis
emoji.replace()
text len: 10000
0.0014624089817516506
```
## Acknowledgments
Thanks to [iamcal/emoji-data](https://github.com/iamcal/emoji-data)
for maintaining an incredible source of emojis that allowed me
to make a robust test suite.
## License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
emoji-unicode-0.4.tar.gz
(24.6 kB
view details)
File details
Details for the file emoji-unicode-0.4.tar.gz
.
File metadata
- Download URL: emoji-unicode-0.4.tar.gz
- Upload date:
- Size: 24.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 046d4b18093adaae2311d5323dc5910159e3e9233dd262287a14284607dde92c |
|
MD5 | 7a109399946669a6b6f5f559826632b1 |
|
BLAKE2b-256 | f22d2eeac58c543477454cf15ba62c5df38d73ef49acc05ef0ccd4ab545605e3 |