Accurately remove and replace emojis in text strings.

demoji

Accurately find or remove emojis from a blob of text.

Basic Usage

demoji requires an initial data download from the Unicode Consortium's emoji code repository.

On first use of the package, call download_codes():

>>> import demoji
>>> demoji.download_codes()
Downloading emoji data ...
... OK (Got response in 0.14 seconds)
Writing emoji data to /Users/brad/.demoji/codes.json ...
... OK

This will store the Unicode hex-notated symbols at ~/.demoji/codes.json for future use.

demoji exports two text-related functions, findall() and replace(), which behave somewhat like the re module's findall() and sub(), respectively. However, findall() returns a dictionary that maps each emoji found to its full name (description):

>>> tweet = """\
... #startspreadingthenews yankees win great start by 🎅🏾 going 5strong innings with 5k’s🔥 🐂
... solo homerun 🌋🌋 with 2 solo homeruns and👹 3run homerun… 🤡 🚣🏼 👨🏽‍⚖️ with rbi’s … 🔥🔥
... 🇲🇽 and 🇳🇮 to close the game🔥🔥!!!….
... WHAT A GAME!!..
... """
>>> demoji.findall(tweet)
{
    "🔥": "fire",
    "🌋": "volcano",
    "👨🏽\u200d⚖️": "man judge: medium skin tone",
    "🎅🏾": "Santa Claus: medium-dark skin tone",
    "🇲🇽": "flag: Mexico",
    "👹": "ogre",
    "🤡": "clown face",
    "🇳🇮": "flag: Nicaragua",
    "🚣🏼": "person rowing boat: medium-light skin tone",
    "🐂": "ox",
}

demoji requires a download rather than coming pre-packaged with Unicode emoji data because the emoji list itself is frequently updated and changed. You are free to refresh the local cache periodically by calling demoji.download_codes() again.

To pull your last-downloaded date, you can use the last_downloaded_timestamp() helper:

>>> demoji.last_downloaded_timestamp()
datetime.datetime(2019, 2, 9, 7, 42, 24, 433776, tzinfo=<demoji.UTC object at 0x101b9ecf8>)

The result will be None if codes have not previously been downloaded.
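
As a rough sketch of how the timestamp could drive a periodic refresh, the helper below decides whether a cache is old enough to re-download. The name cache_is_stale and the 30-day default are assumptions made for illustration; they are not part of demoji.

```python
from datetime import datetime, timedelta, timezone


def cache_is_stale(last_downloaded, max_age=timedelta(days=30), now=None):
    """Return True if the emoji cache is missing or older than max_age.

    Hypothetical helper, not part of demoji. last_downloaded is expected to
    be a timezone-aware datetime, or None if nothing was downloaded yet.
    """
    if last_downloaded is None:
        # No previous download recorded, so a download is needed.
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_downloaded > max_age


# Example with fixed dates so the result does not depend on the clock.
reference = datetime(2019, 3, 15, tzinfo=timezone.utc)
print(cache_is_stale(None, now=reference))
print(cache_is_stale(datetime(2019, 3, 1, tzinfo=timezone.utc), now=reference))
```

With demoji installed, the value of demoji.last_downloaded_timestamp() could be passed as the first argument, followed by a call to demoji.download_codes() when the helper returns True.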

Footnote: Emoji Sequences

Numerous emojis that look like single Unicode characters are actually multi-character sequences. Examples:

The keycap 2️⃣ is actually 3 characters: U+0032 (the ASCII digit 2), U+FE0F (variation selector), and U+20E3 (combining enclosing keycap).
The flag of Scotland has 7 component characters, b'\\U0001f3f4\\U000e0067\\U000e0062\\U000e0073\\U000e0063\\U000e0074\\U000e007f' in full escaped notation.

(You can see any of these through s.encode("unicode-escape").)
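
The two examples above can be checked with the standard library alone. This is a small sketch using unicodedata; the helper name code_points is made up for illustration and is not part of demoji.

```python
import unicodedata


def code_points(text):
    """Return (U+XXXX, Unicode name) pairs for each code point in text."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The keycap "2" sequence and the flag of Scotland, written with escapes
# so that the component code points are visible in the source.
keycap_two = "2\ufe0f\u20e3"
scotland_flag = (
    "\U0001f3f4\U000e0067\U000e0062\U000e0073"
    "\U000e0063\U000e0074\U000e007f"
)

for label, name in code_points(keycap_two):
    print(label, name)

print(len(keycap_two), len(scotland_flag))
print(keycap_two.encode("unicode-escape"))
```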

demoji is careful to handle this and should find the full sequences rather than their incomplete subcomponents.

The way it does this is to sort emoji codes by their length, and then compile a concatenated regular expression that will greedily search for longer emojis first, falling back to shorter ones if not found. This is not by any means a super-optimized way of searching, as it has O(N²) properties, but the focus is on accuracy and completeness.

>>> from pprint import pprint
>>> seq = """\
... I bet you didn't know that 🙋, 🙋‍♂️, and 🙋‍♀️ are three different emojis.
... """
>>> pprint(seq.encode('unicode-escape'))  # Python 3
(b"I bet you didn't know that \\U0001f64b, \\U0001f64b\\u200d\\u2642\\ufe0f,"
 b' and \\U0001f64b\\u200d\\u2640\\ufe0f are three different emojis.\\n')
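
The longest-first approach described above can be sketched with the re module. This is a minimal illustration and not demoji's actual source: the EMOJI_NAMES table is a tiny hand-picked stand-in for the downloaded data, and the names compile_longest_first, find_emojis, and strip_emojis are hypothetical.

```python
import re

# Hypothetical, hand-picked subset of emoji data for illustration only.
# The real package reads a much larger table from its downloaded cache.
EMOJI_NAMES = {
    "\U0001f64b": "person raising hand",
    "\U0001f64b\u200d\u2642\ufe0f": "man raising hand",
    "\U0001f64b\u200d\u2640\ufe0f": "woman raising hand",
    "\U0001f525": "fire",
}


def compile_longest_first(codes):
    """Build one alternation pattern with the longest sequences first.

    Python's regex engine tries alternatives left to right, so putting
    longer sequences first stops a short emoji from matching the prefix
    of a longer one.
    """
    ordered = sorted(codes, key=len, reverse=True)
    return re.compile("|".join(re.escape(code) for code in ordered))


EMOJI_PATTERN = compile_longest_first(EMOJI_NAMES)


def find_emojis(text):
    """Map each distinct emoji found in text to its name."""
    return {match: EMOJI_NAMES[match] for match in set(EMOJI_PATTERN.findall(text))}


def strip_emojis(text, repl=""):
    """Replace every emoji found in text with repl."""
    return EMOJI_PATTERN.sub(repl, text)


seq = (
    "I bet \U0001f64b, \U0001f64b\u200d\u2642\ufe0f, "
    "and \U0001f64b\u200d\u2640\ufe0f differ"
)
print(find_emojis(seq))
print(strip_emojis(seq))
```

If the table were sorted shortest-first instead, the bare 🙋 would match inside both longer sequences and leave the joiner, gender sign, and variation selector behind in the text.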

Release history

1.1.0

Aug 29, 2021

1.0.0

Jul 25, 2021

1.0.0rc1 pre-release

Jul 18, 2021

1.0.0rc0 pre-release

Jul 18, 2021

0.4.0

Dec 13, 2020

0.3.0

Aug 30, 2020

0.3.0rc1 pre-release

Aug 30, 2020

0.2.1

Apr 14, 2020

0.2.0

Apr 14, 2020

0.1.5 (this version)

May 4, 2019

0.1.4

Feb 19, 2019

0.1.3

Feb 9, 2019

0.1.1

Feb 9, 2019

0.0.2

Feb 9, 2019

0.0.1

Feb 9, 2019

Download files

Download the file for your platform.

Source Distribution

demoji-0.1.5.tar.gz (5.4 kB)

Uploaded May 4, 2019 Source

Built Distribution

demoji-0.1.5-py3-none-any.whl (9.5 kB)

Uploaded May 4, 2019 Python 3

File details

Details for the file demoji-0.1.5.tar.gz.

File metadata

Download URL: demoji-0.1.5.tar.gz
Upload date: May 4, 2019
Size: 5.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for demoji-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`6509f777e4c0cf5794c799b5c4c355ff43df540d06b65570fc9675e46ebab82e`
MD5	`f6247c469177732b52908ec7e0f4644c`
BLAKE2b-256	`6bcc7721315f755ce031a32c3c4009c6d5c13fc90b592fb7a3d7b4be6dfc0a03`

File details

Details for the file demoji-0.1.5-py3-none-any.whl.

File metadata

Download URL: demoji-0.1.5-py3-none-any.whl
Upload date: May 4, 2019
Size: 9.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for demoji-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d81bda178b0415141fae6f8e9da275e99dedbcd887d937eb7e8435c845fe644`
MD5	`991de645d6cce1aed23d6d1a471473fd`
BLAKE2b-256	`d73287e7ca4d8a8462cb8a0cd8803cbe4933ce03e5e0e9e3e685f89df72a8110`
