Extract, detect and count emoji

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

a-d-robertson alexanderrobertson

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Topic
- Text Processing :: General

Project description

Emoji Extractor

Extract, detect, and count emoji from text — fast and accurate. Fully supports multi-codepoint sequences (skin tones, ZWJ sequences, flags).

Uses a trie-based greedy longest-match engine (pure Python, zero dependencies) that is 27× faster than regex for single strings and 115× faster when processing large datasets with automatic multiprocessing.

Installation

pip install emoji_extractor

Quick Start

Use the top-level convenience functions for simple tasks:

from emoji_extractor import count_emoji, detect_emoji

# Check if a string contains emoji
detect_emoji("Hello 👋")   # True
detect_emoji("No emoji")   # False

# Count emoji in a single string — returns a Counter
counts = count_emoji("I love 🍎 and 🍌🍌")
print(counts)
# Counter({'🍌': 2, '🍎': 1})

Single Strings vs Bulk Processing

The package provides two tiers of counting methods:

`count_emoji(string)` — Single string

Scans one string and returns a Counter. Fast enough for real-time use (~9µs per line).

from emoji_extractor import Extractor

ext = Extractor()
ext.count_emoji("Great job 🎉🎉🎉")
# Counter({'🎉': 3})

`count_all_emoji(iterable)` — Bulk processing

Processes a list (or any iterable) of strings. For inputs with 1,000+ lines, work is automatically distributed across multiple CPU cores for significantly faster throughput.

# Reuses the `ext` instance created above
tweets = ["Love this 🍎", "So funny 😂😂", "Hello world", ...]

# Automatically parallelised for large inputs
totals = ext.count_all_emoji(tweets)
print(totals.most_common(5))
# Example output for a large corpus:
# [('😂', 2813), ('❤', 1150), ('😍', 974), ...]
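
The threshold-and-merge behaviour described above can be sketched roughly as follows. This is not the package's implementation: `count_one`, `count_chunk`, `chunked`, `count_all`, `PARALLEL_THRESHOLD`, and `TOY_EMOJI` are hypothetical names, and the stand-in single-string counter only recognises a tiny hard-coded set of single-codepoint emoji so that the example stays self-contained.

```python
from collections import Counter
from itertools import islice

PARALLEL_THRESHOLD = 1000  # assumed cut-off, mirroring the "1,000+ lines" in the text
TOY_EMOJI = {"\U0001F34E", "\U0001F34C", "\U0001F602"}  # apple, banana, joy (stand-in data)


def count_one(text):
    """Stand-in single-string counter (single-codepoint emoji only)."""
    return Counter(ch for ch in text if ch in TOY_EMOJI)


def count_chunk(lines):
    """Count emoji across one chunk of lines."""
    total = Counter()
    for line in lines:
        total.update(count_one(line))
    return total


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def count_all(lines, map_fn=map, chunk_size=250):
    """Sum per-chunk Counters; a pool's `map` can be passed as map_fn."""
    lines = list(lines)
    if len(lines) < PARALLEL_THRESHOLD:
        return count_chunk(lines)  # small input: stay in-process
    total = Counter()
    for partial in map_fn(count_chunk, chunked(lines, chunk_size)):
        total.update(partial)
    return total
```

With the default `map_fn=map` everything runs serially. Passing the `map` method of a `multiprocessing.Pool` would spread the chunks across worker processes, provided the functions are defined at module level so they can be pickled.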

Method	Input	Parallelised?
`count_emoji(string)`	Single string	No (already ~9µs)
`count_all_emoji(iterable)`	List of strings	Yes, for ≥1000 lines
`count_tme(string)`	Single string	No
`count_all_tme(iterable)`	List of strings	Yes, for ≥1000 lines
`count_tones(string)`	Single string	No
`count_all_tones(iterable)`	List of strings	Yes, for ≥1000 lines

Advanced Usage

Version Selection

By default, the package uses the latest Unicode Emoji data (currently 17.0). To extract emoji as defined in a specific historical version:

from emoji_extractor import Extractor

ext_14 = Extractor(version='14.0')
ext_15 = Extractor(version='15.0')

# 🩷 Pink heart was introduced in 15.0
ext_14.detect_emoji("🩷")  # False
ext_15.detect_emoji("🩷")  # True

Available versions: 4.0, 5.0, 11.0, 12.0, 12.1, 13.0, 14.0, 15.0, 15.1, 16.0, 17.0.

Tone-Modifiable Emoji

`count_tme` counts emoji that support skin tone modifiers, including their unmodified base forms; `count_tones` counts the skin tone modifiers themselves:

ext = Extractor()
ext.count_tme("High five ✋🏽")
# Counter({'✋🏽': 1})

ext.count_tones("Waves 👋🏻👋🏿")
# Counter({'🏻': 1, '🏿': 1})
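
For intuition, tone counting can be approximated without the package: the five skin tone modifiers occupy the contiguous range U+1F3FB to U+1F3FF. The sketch below is an independent illustration, not the package's code. `count_tone_modifiers` is a hypothetical name, and unlike a sequence-aware scanner it counts every modifier codepoint it sees, whether or not it follows a tone-modifiable base.

```python
from collections import Counter

# Skin tone modifiers occupy U+1F3FB..U+1F3FF.
TONE_START, TONE_END = "\U0001F3FB", "\U0001F3FF"


def count_tone_modifiers(text):
    """Count skin tone modifier codepoints in text (hypothetical helper)."""
    return Counter(ch for ch in text if TONE_START <= ch <= TONE_END)
```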

Controlling Parallelism

# Use fewer workers (default: min(cpu_count, 8))
ext = Extractor(n_workers=4)

# Disable multiprocessing entirely
ext = Extractor(n_workers=1)

# Clean up worker processes when done
ext.close()

Details & Features

- Accurate Counting: Uses a greedy longest-match trie to correctly handle multi-codepoint emoji, including ZWJ sequences like 👩‍🦰 and flag sequences like 🇬🇧.
- Fast: 27× faster than regex for single strings. 115× faster with parallelism for bulk data.
- Zero Dependencies: Pure Python, no external packages required.
- Historical Accuracy: Supports strict adherence to older Unicode specifications, avoiding false positives on newer emoji.
- Always Up to Date: Automatically checks for new Unicode releases via GitHub Actions and updates itself.

How It Works Under the Hood

The package relies on official Unicode data parsed from emoji-test.txt. For each supported version, the data/ folder contains:

- emoji_sequences.json: All emoji strings, sorted longest-first. Used to build a nested-dict trie for greedy matching.
- tme_sequences.json: Tone-modifiable emoji sequences.
- possible_emoji.json: A set of all characters that could be part of an emoji (used by detect_emoji() for fast presence checking).
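
A presence check against a character set, as described for possible_emoji.json, might look like the sketch below. The set contents and the name `might_contain_emoji` are assumptions for illustration; the real file's format and the package's detection logic may differ.

```python
# Hypothetical stand-in for the character set: three emoji plus the
# zero-width joiner and variation selector that can appear inside sequences.
POSSIBLE = frozenset("\U0001F34E\U0001F34C\U0001F44B\u200d\ufe0f")


def might_contain_emoji(text, possible=POSSIBLE):
    """Return True if any character of text is in the candidate set."""
    return not possible.isdisjoint(text)
```

A check like this is cheap because it stops at the first shared character, but a hit only means a candidate character is present (a lone zero-width joiner would also match), so a full scan is still needed to confirm an actual emoji.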

The trie scanner walks through text character by character, always matching the longest possible emoji sequence at each position. This naturally handles cases where a shorter emoji is a prefix of a longer one (e.g., 👩 vs 👩‍🦰).
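
As a minimal sketch of greedy longest-match scanning over a nested-dict trie, assuming nothing about the package's internals: `build_trie`, `extract`, and the `END` sentinel are hypothetical names chosen for this example.

```python
END = "$end"  # sentinel key marking a complete sequence (assumed convention)


def build_trie(sequences):
    """Build a nested-dict trie with one level per codepoint."""
    root = {}
    for seq in sequences:
        node = root
        for ch in seq:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def extract(text, trie):
    """Return the emoji found in text, taking the longest match at each position."""
    found, i, n = [], 0, len(text)
    while i < n:
        node, j, last = trie, i, None
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                last = j  # remember the longest complete match so far
        if last is None:
            i += 1  # no emoji starts here; move on one character
        else:
            found.append(text[i:last])
            i = last  # resume scanning after the matched sequence
    return found
```

Because the scanner remembers the last position at which a complete sequence ended, a partial match that never completes (for example a base emoji followed by a stray joiner) falls back to the shorter emoji instead of being dropped.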

Note: Some emoji include a variation selector (U+FE0F), but some platforms strip it while still rendering the emoji. The trie captures both forms.
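
One way to capture both forms is to add a stripped copy of every sequence that contains U+FE0F before building the trie. The sketch below illustrates that idea only; how the package actually generates its data is not stated here, and `with_stripped_variants` is a hypothetical name.

```python
VS16 = "\ufe0f"  # variation selector-16 (emoji presentation)


def with_stripped_variants(sequences):
    """Return the sequences plus copies with U+FE0F removed."""
    out = set()
    for seq in sequences:
        out.add(seq)
        stripped = seq.replace(VS16, "")
        if stripped:  # skip the degenerate case of a bare selector
            out.add(stripped)
    return out
```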

Changelog

17.0.2

- Engine: Regex replaced with pure-Python trie (27× faster single, 115× bulk with multiprocessing)
- Data: big_regex.txt / tme_regex.txt → emoji_sequences.json / tme_sequences.json
- check_first parameter is now a no-op (accepted for compatibility)
- count_all_* methods auto-parallelise for large inputs
- Added n_workers parameter and close() method to Extractor
- Removed Extractor.big_regex and Extractor.tme (raise helpful error if accessed)

Other Work

If you want to do more than detecting, extracting, and counting emoji, this Python package may be useful.

Contact

Feel free to email me about any of this stuff.

Release history

This version

17.0.2

May 9, 2026

17.0.1

May 2, 2026

17.0

May 2, 2026

16.0

Jan 29, 2025

2.1.3

Jan 24, 2025

2.1.2

Feb 12, 2024

2.1.1

Feb 12, 2024

2.1.0

Feb 11, 2024

2.0.0

Feb 4, 2023

1.0.20

Mar 25, 2022

1.0.19

Feb 15, 2021

1.0.18

Feb 15, 2021

1.0.17

Nov 5, 2018

1.0.16

Sep 19, 2018

1.0.15

Sep 13, 2018

1.0.14

Mar 7, 2018

1.0.13

Dec 6, 2017

1.0.12

Dec 6, 2017

1.0.11

Dec 6, 2017

1.0.10

Dec 6, 2017

1.0.9

Dec 6, 2017

1.0.8

Nov 20, 2017

1.0.7

Nov 20, 2017

1.0.6

Nov 20, 2017

1.0.5

Nov 20, 2017

1.0.4

Nov 20, 2017

1.0.3

Nov 20, 2017

1.0.2

Nov 20, 2017

1.0.1

Nov 20, 2017

1.0

Nov 20, 2017

Download files

Source Distribution

emoji_extractor-17.0.2.tar.gz (209.4 kB)

Uploaded May 9, 2026 Source

Built Distribution

emoji_extractor-17.0.2-py2.py3-none-any.whl (215.8 kB)

Uploaded May 9, 2026 Python 2, Python 3

File details

Details for the file emoji_extractor-17.0.2.tar.gz.

File metadata

Download URL: emoji_extractor-17.0.2.tar.gz
Upload date: May 9, 2026
Size: 209.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for emoji_extractor-17.0.2.tar.gz
Algorithm	Hash digest
SHA256	`68a4e66bca707a46b6d7c53b62e546d1215c0843b4d4adbc76fe25d621d7d28a`
MD5	`53931671c4bee8d1dd0a7aadc0fa1b30`
BLAKE2b-256	`14427ffbfd0cc1b655af4c3fb9cfde8874fb4bedfc76fa65e6c3d021271cdf63`
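
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a generic sketch rather than anything specific to this package; `sha256_of` is a hypothetical helper name.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For the source distribution, the check would be `sha256_of("emoji_extractor-17.0.2.tar.gz") == "68a4e66bca707a46b6d7c53b62e546d1215c0843b4d4adbc76fe25d621d7d28a"`, assuming the file sits in the current directory.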

Provenance

The following attestation bundles were made for emoji_extractor-17.0.2.tar.gz:

Publisher: publish.yml on alexanderrobertson/emoji-extractor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: emoji_extractor-17.0.2.tar.gz
- Subject digest: 68a4e66bca707a46b6d7c53b62e546d1215c0843b4d4adbc76fe25d621d7d28a
- Sigstore transparency entry: 1485932121
- Sigstore integration time: May 9, 2026
Source repository:
- Permalink: alexanderrobertson/emoji-extractor@587abf285c9bf6651759674d587be749c6c2ed5c
- Branch / Tag: refs/tags/17.0.2
- Owner: https://github.com/alexanderrobertson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@587abf285c9bf6651759674d587be749c6c2ed5c
- Trigger Event: release

File details

Details for the file emoji_extractor-17.0.2-py2.py3-none-any.whl.

File metadata

Download URL: emoji_extractor-17.0.2-py2.py3-none-any.whl
Upload date: May 9, 2026
Size: 215.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for emoji_extractor-17.0.2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`840ba9b0040cc8a6632e630f6ad28aaed1609dc13b2c5c7c25ee5330647834ff`
MD5	`431bede4c0b21e28f060e6758b641013`
BLAKE2b-256	`0cab5b3892b3cf2ba8ceacd34cf70950eb03c2a12e0e93cd694cfed6ae7ccaa3`

Provenance

The following attestation bundles were made for emoji_extractor-17.0.2-py2.py3-none-any.whl:

Publisher: publish.yml on alexanderrobertson/emoji-extractor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: emoji_extractor-17.0.2-py2.py3-none-any.whl
- Subject digest: 840ba9b0040cc8a6632e630f6ad28aaed1609dc13b2c5c7c25ee5330647834ff
- Sigstore transparency entry: 1485932141
- Sigstore integration time: May 9, 2026
Source repository:
- Permalink: alexanderrobertson/emoji-extractor@587abf285c9bf6651759674d587be749c6c2ed5c
- Branch / Tag: refs/tags/17.0.2
- Owner: https://github.com/alexanderrobertson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@587abf285c9bf6651759674d587be749c6c2ed5c
- Trigger Event: release
