Accurately remove and replace emojis in text strings

Project description

demoji

Accurately find or remove emojis from a blob of text using data from the Unicode Consortium's emoji code repository.

Install

pip install demoji
# or, with uv:
uv add demoji

demoji supports Python 3.10 – 3.14 and bundles Unicode emoji data (version 16.0) at install time, so no network access is required at runtime. See CHANGELOG.md for the full history.

To report a regression, please open a GitHub issue.

Basic Usage

demoji exports several functions for finding and replacing emojis in text:

>>> import demoji
>>> tweet = """\
... #startspreadingthenews yankees win great start by 🎅🏾 going 5strong innings with 5k’s🔥 🐂
... solo homerun 🌋🌋 with 2 solo homeruns and👹 3run homerun… 🤡 🚣🏼 👨🏽‍⚖️ with rbi’s … 🔥🔥
... 🇲🇽 and 🇳🇮 to close the game🔥🔥!!!….
... WHAT A GAME!!..
... """
>>> demoji.findall(tweet)
{
    "🔥": "fire",
    "🌋": "volcano",
    "👨🏽\u200d⚖️": "man judge: medium skin tone",
    "🎅🏾": "Santa Claus: medium-dark skin tone",
    "🇲🇽": "flag: Mexico",
    "👹": "ogre",
    "🤡": "clown face",
    "🇳🇮": "flag: Nicaragua",
    "🚣🏼": "person rowing boat: medium-light skin tone",
    "🐂": "ox",
}

See the Reference section below for the full function API.

Command-line Use

You can use demoji or python -m demoji to replace emojis in file(s) or stdin with their :code: equivalents:

$ cat out.txt
All done! ✨ 🍰 ✨
$ demoji out.txt
All done! :sparkles: :shortcake: :sparkles:

$ echo 'All done! ✨ 🍰 ✨' | demoji
All done! :sparkles: :shortcake: :sparkles:

$ demoji -
we didnt start the 🔥
we didnt start the :fire:
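To make the file/stdin behaviour concrete, here is a minimal Python sketch of a similar filter. It is not demoji's actual CLI: `TOY_TABLE`, `filter_stream`, and `convert_paths` are hypothetical names, the three-entry table stands in for the bundled Unicode data, and the `:description:` output format is assumed from the examples above.

```python
import re
import sys
from typing import Iterable, TextIO

# Hypothetical three-entry table standing in for the bundled Unicode data.
TOY_TABLE = {
    "\u2728": "sparkles",       # ✨
    "\U0001F370": "shortcake",  # 🍰
    "\U0001F525": "fire",       # 🔥
}

# Longest keys first so multi-code-point sequences would beat their prefixes.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(TOY_TABLE, key=len, reverse=True))
)


def filter_stream(src: TextIO, dst: TextIO, sep: str = ":") -> None:
    """Copy src to dst line by line, replacing each emoji with sep + desc + sep."""
    for line in src:
        dst.write(_PATTERN.sub(lambda m: f"{sep}{TOY_TABLE[m.group(0)]}{sep}", line))


def convert_paths(paths: Iterable[str], dst: TextIO) -> None:
    """Convert each named file in turn; the name '-' means standard input."""
    for path in paths:
        if path == "-":
            filter_stream(sys.stdin, dst)
        else:
            with open(path, encoding="utf-8") as fh:
                filter_stream(fh, dst)
```

A script entry point could then call `convert_paths(sys.argv[1:] or ["-"], sys.stdout)`, falling back to stdin when no file names are given, much like piping into `demoji` above.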

Reference

findall(string: str) -> Dict[str, str]

Find emojis within string. Return a mapping of {emoji: description}.

findall_list(string: str, desc: bool = True) -> List[str]

Find emojis within string. Return a list (with possible duplicates).

If desc is True, the list contains description codes. If desc is False, the list contains emojis.
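The two lookup functions differ mainly in shape: `findall` collapses matches into a deduplicated mapping, while `findall_list` keeps every occurrence in order. The sketch below imitates those documented signatures over a hypothetical four-entry table; `TOY_TABLE`, `toy_findall`, and `toy_findall_list` are illustrative names, not part of demoji, and the longest-first matching follows the approach described in the footnote on emoji sequences.

```python
import re
from typing import Dict, List

# Hypothetical miniature table; escapes make joiners and selectors visible.
TOY_TABLE = {
    "\U0001F525": "fire",     # 🔥
    "\U0001F30B": "volcano",  # 🌋
    "\U0001F468": "man",      # 👨, also the first code point of the next key
    "\U0001F468\U0001F3FD\u200d\u2696\ufe0f": "man judge: medium skin tone",
}

# Alternation ordered longest-first so full sequences beat their prefixes.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(TOY_TABLE, key=len, reverse=True))
)


def toy_findall(string: str) -> Dict[str, str]:
    """Map each distinct emoji found in string to its description."""
    return {m: TOY_TABLE[m] for m in _PATTERN.findall(string)}


def toy_findall_list(string: str, desc: bool = True) -> List[str]:
    """Return every match in order (duplicates kept), as descriptions or emojis."""
    matches = _PATTERN.findall(string)
    return [TOY_TABLE[m] for m in matches] if desc else matches


if __name__ == "__main__":
    text = "\U0001F525 judge \U0001F468\U0001F3FD\u200d\u2696\ufe0f \U0001F525"
    print(toy_findall_list(text))
    # ['fire', 'man judge: medium skin tone', 'fire']
```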

replace(string: str, repl: str = "") -> str

Replace emojis in string with repl.

replace_with_desc(string: str, sep: str = ":") -> str

Replace emojis in string with their description codes. The codes are surrounded by sep.
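Both replacement functions can be pictured as one regex substitution with different replacement text. Below is a sketch under that assumption, again over a hypothetical table; `toy_replace` and `toy_replace_with_desc` mirror the documented signatures but are not demoji's code.

```python
import re

# Hypothetical table; the entries come from the command-line examples above.
TOY_TABLE = {
    "\u2728": "sparkles",       # ✨
    "\U0001F370": "shortcake",  # 🍰
    "\U0001F525": "fire",       # 🔥
}

_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(TOY_TABLE, key=len, reverse=True))
)


def toy_replace(string: str, repl: str = "") -> str:
    """Replace every emoji with repl (by default, delete it)."""
    # A function replacement stops re.sub from parsing backslashes in repl.
    return _PATTERN.sub(lambda m: repl, string)


def toy_replace_with_desc(string: str, sep: str = ":") -> str:
    """Replace every emoji with its description surrounded by sep."""
    return _PATTERN.sub(lambda m: f"{sep}{TOY_TABLE[m.group(0)]}{sep}", string)


if __name__ == "__main__":
    print(toy_replace_with_desc("All done! \u2728 \U0001F370 \u2728"))
    # All done! :sparkles: :shortcake: :sparkles:
    print(toy_replace("we didnt start the \U0001F525", repl="[emoji]"))
    # we didnt start the [emoji]
```

Passing `repl` through a lambda rather than as a plain string is deliberate: `re.sub` would otherwise treat sequences such as `\n` or `\1` in `repl` as escapes or group references.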

last_downloaded_timestamp() -> datetime.datetime

Return the timestamp of the last download of the emoji data bundled with the package.

Footnote: Emoji Sequences

Numerous emojis that look like single Unicode characters are actually multi-character sequences. Examples:

  • The keycap 2️⃣ is actually 3 characters, U+0032 (the ASCII digit 2), U+FE0F (variation selector), and U+20E3 (combining enclosing keycap).
  • The flag of Scotland is 7 component characters, b'\\U0001f3f4\\U000e0067\\U000e0062\\U000e0073\\U000e0063\\U000e0074\\U000e007f' in full escaped notation.

(You can see any of these through s.encode("unicode-escape").)
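One way to check the counts in the list above is to walk a string's code points with the standard library's `unicodedata` module. `describe_code_points` is a hypothetical helper written for this illustration.

```python
import unicodedata


def describe_code_points(s: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


keycap_two = "2\ufe0f\u20e3"  # keycap 2
# Flag of Scotland: waving black flag, tag letters g b s c t, cancel tag.
scotland = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"

if __name__ == "__main__":
    print(len(keycap_two), len(scotland))  # 3 7
    for line in describe_code_points(keycap_two):
        print(line)
    # U+0032 DIGIT TWO
    # U+FE0F VARIATION SELECTOR-16
    # U+20E3 COMBINING ENCLOSING KEYCAP
    print(scotland.encode("unicode-escape"))
```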

demoji is careful to handle this and should find the full sequences rather than their incomplete subcomponents.

The way it does this is to sort emoji codes by their length and then compile a concatenated regular expression that tries longer emojis first, falling back to shorter ones if no longer match is found. This is by no means a highly optimized way of searching, as it has O(N²) properties, but the focus is on accuracy and completeness.

>>> from pprint import pprint
>>> seq = """\
... I bet you didn't know that 🙋, 🙋‍♂️, and 🙋‍♀️ are three different emojis.
... """
>>> pprint(seq.encode('unicode-escape'))  # Python 3
(b"I bet you didn't know that \\U0001f64b, \\U0001f64b\\u200d\\u2642\\ufe0f,"
 b' and \\U0001f64b\\u200d\\u2640\\ufe0f are three different emojis.\\n')
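The example above also shows why the ordering matters. Python's `re` alternation takes the first branch that matches rather than the longest one, so a shortest-first pattern stops at the bare 🙋 and leaves the rest of each sequence behind. The following is a minimal sketch of the length-sorted alternation, not demoji's actual source; `compile_alternation` is a hypothetical helper.

```python
import re

# The three emojis from the example above, written with explicit escapes.
PERSON = "\U0001F64B"                   # 🙋
MAN = "\U0001F64B\u200d\u2642\ufe0f"    # 🙋 + ZWJ + male sign + VS16
WOMAN = "\U0001F64B\u200d\u2640\ufe0f"  # 🙋 + ZWJ + female sign + VS16
EMOJIS = [PERSON, MAN, WOMAN]


def compile_alternation(emojis, longest_first=True):
    """Join the escaped emojis into one alternation, sorted by length."""
    ordered = sorted(emojis, key=len, reverse=longest_first)
    return re.compile("|".join(re.escape(e) for e in ordered))


text = f"{PERSON}, {MAN}, and {WOMAN} are three different emojis."

longest = compile_alternation(EMOJIS).findall(text)
shortest = compile_alternation(EMOJIS, longest_first=False).findall(text)

if __name__ == "__main__":
    print(longest == [PERSON, MAN, WOMAN])       # True
    print(shortest == [PERSON, PERSON, PERSON])  # True: sequences were split
```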

Download files

Download the file for your platform.

Source Distribution

demoji-2.0.0.tar.gz (42.7 kB)

Uploaded Source

Built Distribution

demoji-2.0.0-py3-none-any.whl (43.4 kB)

Uploaded Python 3

File details

Details for the file demoji-2.0.0.tar.gz.

File metadata

  • Download URL: demoji-2.0.0.tar.gz
  • Upload date:
  • Size: 42.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for demoji-2.0.0.tar.gz
Algorithm Hash digest
SHA256 570d34980588d6625867cda789973bc92f84b67ee9ef757cb877d37fa0d2a6ee
MD5 46927a2dac960e780a5f3971b501cad6
BLAKE2b-256 49c1fd43225c9a87628c9885370ef203d37920f03dac231de58f2d09fb737cae

Provenance

The following attestation bundles were made for demoji-2.0.0.tar.gz:

Publisher: publish.yml on bsolomon1124/demoji

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file demoji-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: demoji-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 43.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for demoji-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 512e94aa0e828912ec55d441d92921a14233465317e6336a819deecac5d652ef
MD5 d5d83b98f2a888288dfa19eeafef9b45
BLAKE2b-256 53e3143672272e410bb131755c1d12d13e0e53e2235714c0b9ba358408911c4d

Provenance

The following attestation bundles were made for demoji-2.0.0-py3-none-any.whl:

Publisher: publish.yml on bsolomon1124/demoji

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
