Accurately remove and replace emojis in text strings
demoji
Accurately find or remove emojis from a blob of text using data from the Unicode Consortium's emoji code repository.
Install
pip install demoji
# or, with uv:
uv add demoji
demoji supports Python 3.10 – 3.14 and bundles Unicode emoji data (version 16.0) at install time,
so no network access is required at runtime. See CHANGELOG.md for the full history.
To report a regression, please open a GitHub issue.
Basic Usage
demoji exports several functions for finding and replacing emojis in text:
>>> import demoji
>>> tweet = """\
... #startspreadingthenews yankees win great start by 🎅🏾 going 5strong innings with 5k’s🔥 🐂
... solo homerun 🌋🌋 with 2 solo homeruns and👹 3run homerun… 🤡 🚣🏼 👨🏽‍⚖️ with rbi’s … 🔥🔥
... 🇲🇽 and 🇳🇮 to close the game🔥🔥!!!….
... WHAT A GAME!!..
... """
>>> demoji.findall(tweet)
{
    "🔥": "fire",
    "🌋": "volcano",
    "👨🏽\u200d⚖️": "man judge: medium skin tone",
    "🎅🏾": "Santa Claus: medium-dark skin tone",
    "🇲🇽": "flag: Mexico",
    "👹": "ogre",
    "🤡": "clown face",
    "🇳🇮": "flag: Nicaragua",
    "🚣🏼": "person rowing boat: medium-light skin tone",
    "🐂": "ox",
}
See the Reference section below for the full function API.
Command-line Use
You can use demoji or python -m demoji to replace emojis
in file(s) or stdin with their :code: equivalents:
$ cat out.txt
All done! ✨ 🍰 ✨
$ demoji out.txt
All done! :sparkles: :shortcake: :sparkles:
$ echo 'All done! ✨ 🍰 ✨' | demoji
All done! :sparkles: :shortcake: :sparkles:
$ demoji -
we didnt start the 🔥
we didnt start the :fire:
Reference
findall(string: str) -> Dict[str, str]
Find emojis within string. Return a mapping of {emoji: description}.
findall_list(string: str, desc: bool = True) -> List[str]
Find emojis within string. Return a list (with possible duplicates).
If desc is True, the list contains description codes. If desc is False, the list contains emojis.
replace(string: str, repl: str = "") -> str
Replace emojis in string with repl.
replace_with_desc(string: str, sep: str = ":") -> str
Replace emojis in string with their description codes. The codes are surrounded by sep.
last_downloaded_timestamp() -> datetime.datetime
Return the timestamp of the last download of the emoji data bundled with the package.
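To make the return shapes in this reference concrete, here is a toy model of the four find/replace contracts. It is not demoji's implementation: the two-entry TABLE, the module-level PATTERN, and the toy_-prefixed functions are hypothetical stand-ins written only from the signatures listed above, so call the real demoji functions in actual code.

```python
import re

# Hypothetical two-entry table; demoji's real data comes from the Unicode Consortium.
TABLE: dict[str, str] = {"\U0001f525": "fire", "\u2728": "sparkles"}

# One alternation, longest codes first, so multi-character sequences would beat their prefixes.
PATTERN = re.compile("|".join(re.escape(code) for code in sorted(TABLE, key=len, reverse=True)))


def toy_findall(string: str) -> dict[str, str]:
    """Map each distinct emoji found to its description."""
    return {m: TABLE[m] for m in PATTERN.findall(string)}


def toy_findall_list(string: str, desc: bool = True) -> list[str]:
    """List every match in order (duplicates kept), as descriptions or as raw emojis."""
    return [TABLE[m] if desc else m for m in PATTERN.findall(string)]


def toy_replace(string: str, repl: str = "") -> str:
    """Replace every emoji with repl; the lambda keeps repl literal (no backslash escapes)."""
    return PATTERN.sub(lambda _m: repl, string)


def toy_replace_with_desc(string: str, sep: str = ":") -> str:
    """Replace every emoji with its description wrapped in sep on both sides."""
    return PATTERN.sub(lambda m: f"{sep}{TABLE[m.group()]}{sep}", string)


text = "All done! \u2728 \U0001f525 \u2728"
print(toy_findall(text))                   # {'✨': 'sparkles', '🔥': 'fire'}
print(toy_findall_list(text))              # ['sparkles', 'fire', 'sparkles']
print(toy_findall_list(text, desc=False))  # ['✨', '🔥', '✨']
print(repr(toy_replace(text)))             # 'All done!   '
print(toy_replace_with_desc(text))         # All done! :sparkles: :fire: :sparkles:
```

Passing repl through a function keeps it literal, so a value such as "\\1" is inserted as-is rather than read as a group reference; whether demoji behaves the same way is not stated in this reference.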
Footnote: Emoji Sequences
Numerous emojis that look like single Unicode characters are actually multi-character sequences. Examples:
- The keycap 2️⃣ is actually 3 characters: U+0032 (the ASCII digit 2), U+FE0F (variation selector), and U+20E3 (combining enclosing keycap).
- The flag of Scotland is 7 component characters,
b'\\U0001f3f4\\U000e0067\\U000e0062\\U000e0073\\U000e0063\\U000e0074\\U000e007f' in full escaped notation.
(You can see any of these through s.encode("unicode-escape").)
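The two sequences listed above can be inspected with nothing but the standard library. This sketch prints the length, each code point, and its Unicode name for the keycap and the Scotland flag, then reproduces the escaped notation with s.encode("unicode-escape"); the variable names are illustrative only.

```python
import unicodedata

# Keycap "2": DIGIT TWO + VARIATION SELECTOR-16 + COMBINING ENCLOSING KEYCAP
keycap = "2\ufe0f\u20e3"
# Flag of Scotland: WAVING BLACK FLAG + tag letters "gbsct" + CANCEL TAG
scotland = "\U0001f3f4\U000e0067\U000e0062\U000e0073\U000e0063\U000e0074\U000e007f"

for label, seq in (("keycap 2", keycap), ("flag of Scotland", scotland)):
    print(f"{label}: {len(seq)} code points")
    for ch in seq:
        # Fall back to a placeholder in case the running Python has no name for ch
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}")
    print("  escaped:", seq.encode("unicode-escape"))
```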
demoji is careful to handle this and should find the full sequences rather than their incomplete subcomponents.
It does this by sorting emoji codes by length and then compiling a single concatenated regular expression that tries longer emojis first, falling back to shorter ones if no longer match is found. This is by no means a highly optimized way of searching, as it has O(N²) properties, but the focus is on accuracy and completeness.
>>> from pprint import pprint
>>> seq = """\
... I bet you didn't know that 🙋, 🙋‍♂️, and 🙋‍♀️ are three different emojis.
... """
>>> pprint(seq.encode('unicode-escape')) # Python 3
(b"I bet you didn't know that \\U0001f64b, \\U0001f64b\\u200d\\u2642\\ufe0f,"
b' and \\U0001f64b\\u200d\\u2640\\ufe0f are three different emojis.\\n')
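The longest-first strategy described above can be sketched with Python's re module. The three-entry EMOJI table and the build_pattern helper are hypothetical, not demoji internals; the point is that re tries alternatives left to right and accepts the first one that matches, so ordering the alternation by descending length is what lets the man-raising-hand sequence win over its bare 🙋 prefix.

```python
import re
from collections.abc import Iterable

# Hypothetical three-entry table: a base emoji and two ZWJ sequences that start with it.
EMOJI: dict[str, str] = {
    "\U0001f64b": "person raising hand",
    "\U0001f64b\u200d\u2642\ufe0f": "man raising hand",
    "\U0001f64b\u200d\u2640\ufe0f": "woman raising hand",
}


def build_pattern(codes: Iterable[str], longest_first: bool = True) -> re.Pattern[str]:
    """Join escaped codes into one alternation, ordered by length."""
    ordered = sorted(codes, key=len, reverse=longest_first)
    return re.compile("|".join(re.escape(code) for code in ordered))


seq = ("I bet you didn't know that \U0001f64b, \U0001f64b\u200d\u2642\ufe0f, "
       "and \U0001f64b\u200d\u2640\ufe0f are three different emojis.")

good = build_pattern(EMOJI)                        # longest alternatives tried first
naive = build_pattern(EMOJI, longest_first=False)  # shortest first: the bare prefix always wins

print([EMOJI[m] for m in good.findall(seq)])
# ['person raising hand', 'man raising hand', 'woman raising hand']
print([EMOJI[m] for m in naive.findall(seq)])
# ['person raising hand', 'person raising hand', 'person raising hand']
```

Ties in length do not matter for correctness: two distinct literal codes of the same length cannot both match at the same position.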
Download files
File details
Details for the file demoji-2.0.0.tar.gz.
File metadata
- Download URL: demoji-2.0.0.tar.gz
- Upload date:
- Size: 42.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 570d34980588d6625867cda789973bc92f84b67ee9ef757cb877d37fa0d2a6ee |
| MD5 | 46927a2dac960e780a5f3971b501cad6 |
| BLAKE2b-256 | 49c1fd43225c9a87628c9885370ef203d37920f03dac231de58f2d09fb737cae |
Provenance
The following attestation bundles were made for demoji-2.0.0.tar.gz:
- Publisher: publish.yml on bsolomon1124/demoji
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: demoji-2.0.0.tar.gz
- Subject digest: 570d34980588d6625867cda789973bc92f84b67ee9ef757cb877d37fa0d2a6ee
- Sigstore transparency entry: 1340048341
- Sigstore integration time:
- Permalink: bsolomon1124/demoji@b8e166b914c74d3bf138829d846ee5b8ab587ede
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/bsolomon1124
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b8e166b914c74d3bf138829d846ee5b8ab587ede
- Trigger Event: push
File details
Details for the file demoji-2.0.0-py3-none-any.whl.
File metadata
- Download URL: demoji-2.0.0-py3-none-any.whl
- Upload date:
- Size: 43.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 512e94aa0e828912ec55d441d92921a14233465317e6336a819deecac5d652ef |
| MD5 | d5d83b98f2a888288dfa19eeafef9b45 |
| BLAKE2b-256 | 53e3143672272e410bb131755c1d12d13e0e53e2235714c0b9ba358408911c4d |
Provenance
The following attestation bundles were made for demoji-2.0.0-py3-none-any.whl:
- Publisher: publish.yml on bsolomon1124/demoji
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: demoji-2.0.0-py3-none-any.whl
- Subject digest: 512e94aa0e828912ec55d441d92921a14233465317e6336a819deecac5d652ef
- Sigstore transparency entry: 1340048355
- Sigstore integration time:
- Permalink: bsolomon1124/demoji@b8e166b914c74d3bf138829d846ee5b8ab587ede
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/bsolomon1124
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b8e166b914c74d3bf138829d846ee5b8ab587ede
- Trigger Event: push