Accurately remove and replace emojis in text strings

demoji

Accurately find or remove emojis from a blob of text using data from the Unicode Consortium's emoji code repository.

Major Changes in Version 1.x

Version 1.x of demoji now bundles Unicode data in the package at install time rather than requiring a download of the codes from unicode.org at runtime. Please see CHANGELOG.md for details and familiarize yourself with the changes before upgrading from 0.x to 1.x.

To report any regressions, please open a GitHub issue.

Basic Usage

demoji exports several text-related functions for find-and-replace functionality with emojis:

>>> import demoji
>>> tweet = """\
... #startspreadingthenews yankees win great start by 🎅🏾 going 5strong innings with 5k’s🔥 🐂
... solo homerun 🌋🌋 with 2 solo homeruns and👹 3run homerun… 🤡 🚣🏼 👨🏽‍⚖️ with rbi’s … 🔥🔥
... 🇲🇽 and 🇳🇮 to close the game🔥🔥!!!….
... WHAT A GAME!!..
... """
>>> demoji.findall(tweet)
{
    "🔥": "fire",
    "🌋": "volcano",
    "👨🏽\u200d⚖️": "man judge: medium skin tone",
    "🎅🏾": "Santa Claus: medium-dark skin tone",
    "🇲🇽": "flag: Mexico",
    "👹": "ogre",
    "🤡": "clown face",
    "🇳🇮": "flag: Nicaragua",
    "🚣🏼": "person rowing boat: medium-light skin tone",
    "🐂": "ox",
}

See below for function API.

Command-line Use

You can use demoji or python -m demoji to replace emojis in file(s) or stdin with their :code: equivalents:

$ cat out.txt
All done! ✨ 🍰 ✨
$ demoji out.txt
All done! :sparkles: :shortcake: :sparkles:

$ echo 'All done! ✨ 🍰 ✨' | demoji
All done! :sparkles: :shortcake: :sparkles:

$ demoji -
we didnt start the 🔥
we didnt start the :fire:

Reference

findall(string: str) -> Dict[str, str]

Find emojis within string. Return a mapping of {emoji: description}.

findall_list(string: str, desc: bool = True) -> List[str]

Find emojis within string. Return a list (with possible duplicates).

If desc is True, the list contains description codes. If desc is False, the list contains emojis.

replace(string: str, repl: str = "") -> str

Replace emojis in string with repl.

replace_with_desc(string: str, sep: str = ":") -> str

Replace emojis in string with their description codes. The codes are surrounded by sep.

last_downloaded_timestamp() -> datetime.datetime

Show the timestamp of last download for the emoji data bundled with the package.

Footnote: Emoji Sequences

Numerous emojis that look like single Unicode characters are actually multi-character sequences. Examples:

The keycap 2️⃣ is actually 3 characters: U+0032 (the ASCII digit 2), U+FE0F (variation selector), and U+20E3 (combining enclosing keycap).
The flag of Scotland has 7 component characters, b'\\U0001f3f4\\U000e0067\\U000e0062\\U000e0073\\U000e0063\\U000e0074\\U000e007f' in fully escaped notation.

(You can see any of these through s.encode("unicode-escape").)
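
As a quick check of the two examples above, the following sketch counts code points using only the standard library. The strings are written with explicit escapes so that they do not depend on editor or font support; the variable names are illustrative and not part of demoji.

```python
import unicodedata

# Keycap "2": digit two + variation selector-16 + combining enclosing keycap.
keycap_two = "\u0032\ufe0f\u20e3"

# Flag of Scotland: a black flag, five tag characters (g, b, s, c, t),
# and a cancel tag.
scotland = "\U0001f3f4\U000e0067\U000e0062\U000e0073\U000e0063\U000e0074\U000e007f"

# len() counts code points, not what a reader perceives as one emoji.
print(len(keycap_two))  # 3
print(len(scotland))    # 7

# unicode-escape makes each component visible.
print(keycap_two.encode("unicode-escape"))  # b'2\\ufe0f\\u20e3'

# Name each component of the keycap sequence.
for ch in keycap_two:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```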

demoji is careful to handle this and should find the full sequences rather than their incomplete subcomponents.

It does this by sorting emoji codes by length and then compiling a concatenated regular expression that greedily searches for longer emojis first, falling back to shorter ones if no longer match is found. This is by no means a super-optimized way of searching, as it has O(N²) properties, but the focus is on accuracy and completeness.

>>> from pprint import pprint
>>> seq = """\
... I bet you didn't know that 🙋, 🙋‍♂️, and 🙋‍♀️ are three different emojis.
... """
>>> pprint(seq.encode('unicode-escape'))  # Python 3
(b"I bet you didn't know that \\U0001f64b, \\U0001f64b\\u200d\\u2642\\ufe0f,"
 b' and \\U0001f64b\\u200d\\u2640\\ufe0f are three different emojis.\\n')
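
The longest-first matching strategy described in this section can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not demoji's actual implementation: the CODES table is a hypothetical four-entry stand-in for the bundled data, and the two helper functions only mimic the shape of findall() and replace_with_desc().

```python
import re

# Hypothetical miniature table; demoji's real data is far larger.
CODES = {
    "\U0001f64b": "person raising hand",
    "\U0001f64b\u200d\u2642\ufe0f": "man raising hand",
    "\U0001f64b\u200d\u2640\ufe0f": "woman raising hand",
    "\U0001f525": "fire",
}

# Sort longest first so that a full sequence wins over its own prefix:
# regex alternation tries branches from left to right.
_ordered = sorted(CODES, key=len, reverse=True)
EMOJI_PATTERN = re.compile("|".join(re.escape(e) for e in _ordered))


def findall(string):
    """Return {emoji: description} for emojis in the miniature table."""
    return {m: CODES[m] for m in EMOJI_PATTERN.findall(string)}


def replace_with_desc(string, sep=":"):
    """Replace each known emoji with its description wrapped in sep."""
    return EMOJI_PATTERN.sub(lambda m: sep + CODES[m.group(0)] + sep, string)


seq = "\U0001f64b, \U0001f64b\u200d\u2642\ufe0f, and \U0001f64b\u200d\u2640\ufe0f"
print(sorted(findall(seq).values()))
# ['man raising hand', 'person raising hand', 'woman raising hand']
print(replace_with_desc("lit \U0001f525"))  # lit :fire:
```

If the table were sorted shortest first instead, the bare U+1F64B entry would match inside both longer sequences and leave the joiner and gender sign behind as stray characters.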

Changelog

1.1.0

Add a __main__.py to allow running python -m demoji; add an entry-point demoji command; permit stdin (-), file name(s), or piped stdin. Contribution by @jap.

1.0.0

This is a backwards-incompatible release with several substantial changes.

The largest change is that demoji now bundles a static copy of Unicode emoji data with the package at install time, rather than requiring a runtime download of the codes from unicode.org.

Changes below are grouped by their corresponding Semantic Versioning identifier.

SemVer MAJOR:

Drop support for Python 2 and Python 3.5
The demoji package now bundles emoji data that is distributed with the package at install time, rather than requiring a download of the codes from the unicode.org site at runtime (closes #23)
As a result of the above change, the following functions are removed from the demoji API:
- download_codes()
- parse_unicode_sequence()
- parse_unicode_range()
- stream_unicodeorg_emojifile()

SemVer MINOR:

The demoji.DIRECTORY and demoji.CACHEPATH attributes are deprecated because they are no longer functionally used by the package. Accessing them will warn with a FutureWarning, and these attributes may be removed completely in a future release
demoji can now be installed with optional ujson support for faster loading of emoji data from file (versus the standard library's json, which is the default); use python -m pip install demoji[ujson]
The dependencies requests and colorama have been removed completely
importlib_resources (a backport module) is now required for Python < 3.7
The EMOJI_VERSION attribute, newly added to demoji, is a str denoting the Unicode database version in use
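
An optional accelerator such as ujson is commonly consumed through an import fallback. The following is a minimal sketch of that general pattern, not demoji's confirmed loading code; the JSON payload is a hypothetical stand-in for the bundled emoji data.

```python
# Prefer ujson when the optional extra is installed; otherwise fall back
# to the standard library. Both modules provide loads().
try:
    import ujson as json
except ImportError:
    import json

# Hypothetical payload shaped like an {emoji: description} mapping.
raw = '{"\U0001f525": "fire", "\U0001f30b": "volcano"}'
codes = json.loads(raw)

print(codes["\U0001f525"])  # fire
print(len(codes))           # 2
```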

SemVer PATCH:

Fix a typo in demoji.__all__ to properly include demoji.findall_list()
Internal change: Functions that call set_emoji_pattern() are now decorated with a @cache_setter to set the cache
Some unit tests have been removed to reflect the change in behavior from downloading codes to bundling codes with install
Update README to reflect bundling behavior

0.4.0

Update emoji source list to version 13.1. (See 5090eb5.)
Formally support Python 3.9. (See 6e9c34c.)
Bugfix: ensure that demoji.last_downloaded_timestamp() returns correct UTC time. (See 6c8ad15.)

0.3.0

Feature: add findall_list() and replace_with_desc() functions. (See 7cea333.)
Modernize setup config to use setup.cfg. (See 8f141e7.)

0.2.1

Tox: formally add Python 3.8 tests.

0.2.0

Windows: use the colorama package to support printing ANSI escape sequences on Windows; this introduces colorama as a dependency. (See cd343c1.)
Setup: Fix a bug in setup.py that would require dependencies to be installed prior to installation of demoji in order to find the __version__. (See d5f429c.)
Python 2 + Windows support: use io.open(..., encoding='utf-8') consistently in setup.py. (See 1efec5d.)
Distribution: use a universal wheel in PyPI release. (See 8636a32.)

0.1.5

Performance improvement: use re.escape() rather than failing to compile a small subset of codes.
Remove an unused constant in __init__.py.
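
The changelog does not say which codes failed to compile before re.escape() was used. As a plausible illustration only, the keycap asterisk begins with "*", which is a regex quantifier, so the raw sequence is not a valid pattern until it is escaped:

```python
import re

# Keycap asterisk: "*" + variation selector-16 + combining enclosing keycap.
keycap_star = "*\ufe0f\u20e3"

try:
    re.compile(keycap_star)
    compiled_raw = True
except re.error:
    # A leading "*" is a quantifier with nothing to repeat.
    compiled_raw = False

# Escaping turns the sequence into a literal pattern.
pattern = re.compile(re.escape(keycap_star))

print(compiled_raw)  # False
print(bool(pattern.search("press " + keycap_star + " to continue")))  # True
```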

Release history

1.1.0 (this version): Aug 29, 2021
1.0.0: Jul 25, 2021
1.0.0rc1 (pre-release): Jul 18, 2021
1.0.0rc0 (pre-release): Jul 18, 2021
0.4.0: Dec 13, 2020
0.3.0: Aug 30, 2020
0.3.0rc1 (pre-release): Aug 30, 2020
0.2.1: Apr 14, 2020
0.2.0: Apr 14, 2020
0.1.5: May 4, 2019
0.1.4: Feb 19, 2019
0.1.3: Feb 9, 2019
0.1.1: Feb 9, 2019
0.0.2: Feb 9, 2019
0.0.1: Feb 9, 2019

Download files

Source Distribution

demoji-1.1.0.tar.gz (46.3 kB)

Uploaded Aug 29, 2021 Source

Built Distribution

demoji-1.1.0-py3-none-any.whl (42.9 kB)

Uploaded Aug 29, 2021 Python 3

File details

Details for the file demoji-1.1.0.tar.gz.

File metadata

Download URL: demoji-1.1.0.tar.gz
Upload date: Aug 29, 2021
Size: 46.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for demoji-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`072efaeca725e6f63ab59d83abeb55b178842538ed9256455a82ebbd055ff216`
MD5	`de7bc1c03d0b947a445b7de4e4ac7ed3`
BLAKE2b-256	`9d62e6de96cf1ef2c6ac91b84a51af151d791f874529d8b146d3587771d05727`

File details

Details for the file demoji-1.1.0-py3-none-any.whl.

File metadata

Download URL: demoji-1.1.0-py3-none-any.whl
Upload date: Aug 29, 2021
Size: 42.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for demoji-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d3256c909aea299e97fe984f827a2a060c2a8f8bfcbafa7ec9659967c5df50f`
MD5	`51e0d6c9d964248ab65c8dda565da1a9`
BLAKE2b-256	`03669dc4b6d57f3a74ad8cf79f0cc4e965165871bfb3f612db77ccd4e0200b38`