Extract, detect and count emoji
Emoji Extractor
Extract and count emoji from text efficiently and accurately. Fully supports multi-part emoji (skin tones, zero-width joiners, flags).
Installation
pip install emoji_extractor
Usage examples: see this Jupyter notebook
Quick Start
The top-level convenience functions count and detect emoji using the default (latest) Unicode version:
from emoji_extractor import count_emoji, detect_emoji
# Returns a Counter object of emojis and their counts
counts = count_emoji("I love apples 🍎 and bananas 🍌🍌")
print(counts)
# Counter({'🍌': 2, '🍎': 1})
# Check if a string has emoji
has_emoji = detect_emoji("No emoji here") # False
Advanced Usage (Version Selection)
By default, the package uses the latest available Unicode Emoji data.
If you need to extract emoji precisely as they were defined in a specific historical Unicode version, instantiate the Extractor class:
from emoji_extractor import Extractor
# Initialise an extractor for a specific version
ext_14 = Extractor(version='14.0')
ext_15 = Extractor(version='15.0')
# 🩷 Pink heart was introduced in 15.0
print(ext_14.detect_emoji("🩷")) # False
print(ext_15.detect_emoji("🩷")) # True
Available versions: 4.0, 5.0, 11.0, 12.0, 12.1, 13.0, 13.1, 14.0, 15.0, 15.1, 16.0, 17.0.
Details & Features
- Accurate Counting: Uses dynamically generated regular expressions to properly capture multi-codepoint sequences, including ZWJ sequences like '💁🏽♂️' and flags.
- Historical Accuracy: Supports strict adherence to older Unicode specifications, avoiding false positives on newer emoji.
- Always Up to Date: Automatically checks for new Unicode releases via GitHub Actions and updates itself.
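To illustrate why multi-codepoint handling matters for accurate counting, here is a minimal sketch that uses only the Python standard library, not this package. It shows that a ZWJ sequence and a flag each span several code points, so counting per code point would overcount. The helper name `describe` is made up for this illustration.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner

# Woman technologist: WOMAN + ZWJ + PERSONAL COMPUTER
technologist = "\U0001F469" + ZWJ + "\U0001F4BB"
# Flag of France: two regional indicator symbols (F, R)
flag_fr = "\U0001F1EB\U0001F1F7"


def describe(seq):
    """Return the Unicode name of each code point in seq."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in seq]


# Each of these renders as one emoji but has more than one code point.
print(len(technologist), describe(technologist))
print(len(flag_fr), describe(flag_fr))
```

A counter that treats every code point as its own emoji would report the technologist as a woman plus a computer, which is the kind of miscount the feature list above refers to.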
How it works under the hood
The package relies on official Unicode data parsed from emoji-test.txt. Inside the data/ folder for each version, it generates:
- possible_emoji.json: A set of all characters that could possibly be part of an emoji (used as a fast initial filter before checking the regex).
- big_regex.txt: A massive list of exact matching strings piped together in order of decreasing length. This guarantees multi-part emojis are matched before their individual components.
- tme_regex.txt: Regex definitions for Tone-Modifiable Emoji.
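The two-stage idea described above (a cheap character filter, then a longest-first alternation) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's actual code: the four-entry `EMOJI` list and the names `POSSIBLE`, `BIG_REGEX` and `count_emoji_sketch` are hypothetical stand-ins for the generated data files.

```python
import re
from collections import Counter

# Hypothetical miniature emoji list; the real data comes from emoji-test.txt.
EMOJI = [
    "\U0001F469",                   # woman
    "\U0001F4BB",                   # laptop
    "\U0001F469\u200d\U0001F4BB",   # woman technologist (ZWJ sequence)
    "\U0001F34C",                   # banana
]

# Every character that can appear in some emoji (the fast pre-filter).
POSSIBLE = set("".join(EMOJI))

# Longest alternatives first, so full sequences win over their components.
BIG_REGEX = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOJI, key=len, reverse=True))
)


def count_emoji_sketch(text):
    """Count emoji in text, skipping the regex when no candidate character occurs."""
    if POSSIBLE.isdisjoint(text):
        return Counter()
    return Counter(BIG_REGEX.findall(text))
```

Because regex alternation tries alternatives left to right, placing the three-code-point technologist sequence before the single-code-point woman is what keeps it from being split into a woman and a laptop.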
(Note: Prior versions of this package used .pkl files, but we have migrated to standard formats like JSON/TXT for better security and cross-platform compatibility).
Some emoji include the variation selector 0xFE0F, but some platforms strip it and still render the emoji form. The regex used here captures both forms (e.g. 0xFE0F after each emoji codepoint vs no 0xFE0F). See Unicode's Full Emoji List and search for '0xFE0F' to see which emoji this potentially affects.
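One way to make a pattern tolerate a stripped variation selector is to treat 0xFE0F as optional after each code point. The sketch below shows that idea with the standard library only. It is an assumption about how such a pattern could be built, not the regex this package actually generates, and `vs_optional_pattern` is a hypothetical helper.

```python
import re

VS16 = "\ufe0f"  # variation selector-16 (0xFE0F)


def vs_optional_pattern(emoji):
    """Build a regex matching emoji with or without its variation selectors."""
    # Drop any VS16 already present, then allow an optional one after each code point.
    parts = [re.escape(ch) for ch in emoji if ch != VS16]
    return re.compile("".join(part + VS16 + "?" for part in parts))


# Red heart is U+2764 followed by U+FE0F in its fully qualified form.
heart = vs_optional_pattern("\u2764\ufe0f")

print(bool(heart.fullmatch("\u2764\ufe0f")))  # with the selector
print(bool(heart.fullmatch("\u2764")))        # selector stripped by a platform
```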
Other work
If you want to do stuff more complicated than simply detecting, extracting and counting emoji then you might find this Python package useful.
Anything else
Feel free to email me about any of this stuff.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file emoji_extractor-17.0.tar.gz.
File metadata
- Download URL: emoji_extractor-17.0.tar.gz
- Upload date:
- Size: 203.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d47851132a2cdba5728a0fe2aaba8bb41c6cb438077bb9610889fd036569251 |
| MD5 | 479bf45bfb2688573c9ef5a133f33049 |
| BLAKE2b-256 | 1aa1c104d19f656210fa7559e89661138683bd54dc1692fe6c671755fafdde74 |
Provenance
The following attestation bundles were made for emoji_extractor-17.0.tar.gz:
Publisher: publish.yml on alexanderrobertson/emoji-extractor
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: emoji_extractor-17.0.tar.gz
- Subject digest: 4d47851132a2cdba5728a0fe2aaba8bb41c6cb438077bb9610889fd036569251
- Sigstore transparency entry: 1429275230
- Sigstore integration time:
- Permalink: alexanderrobertson/emoji-extractor@5faf4f81a3f2f0756f49a55a9ff708e1f4b86540
- Branch / Tag: refs/tags/17.0
- Owner: https://github.com/alexanderrobertson
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5faf4f81a3f2f0756f49a55a9ff708e1f4b86540
- Trigger Event: release
File details
Details for the file emoji_extractor-17.0-py2.py3-none-any.whl.
File metadata
- Download URL: emoji_extractor-17.0-py2.py3-none-any.whl
- Upload date:
- Size: 204.6 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b12f24bc8c2a1168f4fb85f39545b753a767f637a6029f71c92d7fde31bce73f |
| MD5 | 96d805dbe18a83546f135d96d439c97b |
| BLAKE2b-256 | 6017fc8342152d9eaefc60652ed676e795a07c5b1dd3a947f902a3f4687ccbd2 |
Provenance
The following attestation bundles were made for emoji_extractor-17.0-py2.py3-none-any.whl:
Publisher: publish.yml on alexanderrobertson/emoji-extractor
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: emoji_extractor-17.0-py2.py3-none-any.whl
- Subject digest: b12f24bc8c2a1168f4fb85f39545b753a767f637a6029f71c92d7fde31bce73f
- Sigstore transparency entry: 1429275273
- Sigstore integration time:
- Permalink: alexanderrobertson/emoji-extractor@5faf4f81a3f2f0756f49a55a9ff708e1f4b86540
- Branch / Tag: refs/tags/17.0
- Owner: https://github.com/alexanderrobertson
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5faf4f81a3f2f0756f49a55a9ff708e1f4b86540
- Trigger Event: release