Unicode information lookup plugin for Sopel
sopel-unicode
Sopel plugin for information lookup on Unicode codepoints.
This plugin is designed as a drop-in replacement for the built-in unicode_info plugin. It provides:
- The General Category of each codepoint
- Optional support for Unicode Character Database (UCD) versions newer than the Python release used to run Sopel, if unicodedata2 is available
- Optional support for reporting the Unicode version that introduced each codepoint, if unicode_age is available
Installing
Releases are hosted on PyPI, so after installing Sopel, all you need is pip:
$ pip install sopel-unicode
Disable built-in unicode_info plugin
Edit your Sopel core config to add unicode_info to the exclude plugin list; otherwise you will
get duplicate responses from both plugins.
Optional features
sopel-unicode is designed to act as a drop-in replacement for the original built-in plugin on a basic installation.
To enable all the optional features:
$ pip install sopel-unicode[all]
Usage
Note that output given in this section corresponds to sopel-unicode[all] except where noted. Output layout may differ
if some optional dependencies are missing.
Codepoint lookup
The unicode (short-form u) command provides lookup of codepoints in a provided string. Input characters defined by
the configuration option ignore_chars are ignored.
Lookup uses unicodedata2 if it is available, and falls back on stdlib unicodedata otherwise.
<SnoopJ> .unicode 🫩
<terribot> [unicode] (🫩): U+1FAE9 v16.0 (So) FACE WITH BAGS UNDER EYES
<SnoopJ> .u 🏴☠
<terribot> [unicode] (🏴): U+1F3F4 v7.0 (So) WAVING BLACK FLAG
<terribot> [unicode] (): U+200D v1.1 (Cf) ZERO WIDTH JOINER
<terribot> [unicode] (☠): U+2620 v1.1 (So) SKULL AND CROSSBONES
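The fallback described above can be sketched in a few lines. This is only an illustration, not the plugin's actual code: the helper name describe is hypothetical, and the sketch omits the version field (v1.1, v16.0, ...), which depends on the optional unicode_age package.

```python
try:
    # Prefer the newer UCD data when the optional dependency is installed.
    import unicodedata2 as ucd
except ImportError:
    # Otherwise fall back on the UCD version bundled with Python.
    import unicodedata as ucd


def describe(char: str) -> str:
    """Format one codepoint as 'U+XXXX (Category) NAME' (hypothetical helper)."""
    codepoint = ord(char)
    category = ucd.category(char)
    # Control characters and unassigned codepoints have no formal name,
    # so a placeholder is supplied instead of letting name() raise ValueError.
    name = ucd.name(char, "<no name>")
    return f"U+{codepoint:04X} ({category}) {name}"


for char in "\u00e7\u2620":
    print(describe(char))
```

Because unicodedata2 is a backport exposing the same functions as the stdlib module, the rest of the code does not need to know which one was imported.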
It is sometimes convenient to discard all ASCII characters from lookup, which can be done with the
unicode:noascii (u:noascii) command:
<SnoopJ> .u:noascii ça va?
<terribot> [unicode] (ç): U+00E7 v1.1 (Ll) LATIN SMALL LETTER C WITH CEDILLA
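One plausible way to get this behaviour is to drop every codepoint at or below U+007F before lookup. The function name drop_ascii is an assumption made for this sketch; the plugin may filter differently.

```python
def drop_ascii(text: str) -> str:
    """Keep only characters outside the ASCII range (above U+007F)."""
    return "".join(char for char in text if ord(char) > 0x7F)


# Only the c-cedilla survives; the letters, space, and '?' are ASCII.
print(drop_ascii("\u00e7a va?"))
```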
The unicode:raw (u:raw) command is provided to avoid discarding any codepoints when performing lookup.
<SnoopJ> .unicode:raw a b
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<terribot> [unicode] ( ): U+0020 v1.1 (Zs) SPACE
<terribot> [unicode] (b): U+0062 v1.1 (Ll) LATIN SMALL LETTER B
Individual codepoints can also be looked up with hex notation, in either U+NNNN form, 0xNNNN form, or \uNNNN form.
<SnoopJ> .unicode U+037E
<terribot> [unicode] (;): U+037E v1.1 (Po) GREEK QUESTION MARK
<SnoopJ> .u 0xBEEF
<terribot> [unicode] (뻯): U+BEEF v2.0 (Lo) HANGUL SYLLABLE BBEGS
<SnoopJ> .u \u732b
<terribot> [unicode] (猫): U+732B v1.1 (Lo) CJK UNIFIED IDEOGRAPH-732B
Note that the \u notation is less restrictive here than in Python string literals, where \u must be followed by
exactly four hex digits. You may use as many or as few hex digits as you like.
<SnoopJ> .u \u1
<terribot> [unicode] (): U+0001 v1.1 (Cc) START OF HEADING
<SnoopJ> .u \u12345
<terribot> [unicode] (𒍅): U+12345 v5.0 (Lo) CUNEIFORM SIGN URU TIMES KI
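The three notations can be recognised with one regular expression. The following is a minimal sketch under stated assumptions: the pattern, the six-digit limit, and the name parse_codepoint are all hypothetical and are not taken from the plugin's source.

```python
import re
from typing import Optional

# Hypothetical pattern: a U+, 0x, or \u prefix followed by 1 to 6 hex digits.
CODEPOINT_RE = re.compile(r"(?:U\+|0x|\\u)([0-9A-Fa-f]{1,6})")


def parse_codepoint(token: str) -> Optional[str]:
    """Return the character named by a hex token, or None if it is not one."""
    match = CODEPOINT_RE.fullmatch(token)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Unicode ends at U+10FFFF; larger values are not valid codepoints.
    if value > 0x10FFFF:
        return None
    return chr(value)


for token in ["U+037E", "0xBEEF", r"\u732b", r"\u1", r"\u12345"]:
    char = parse_codepoint(token)
    print(token, "->", f"U+{ord(char):04X}")
```

Anchoring the match to the whole token (fullmatch) keeps ordinary words that merely contain "0x" or "U+" from being treated as hex notation.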
Normalization forms
The Unicode normalization forms are available to transform input strings.
Input characters defined by the configuration option ignore_chars are ignored.
<SnoopJ> .unicode:NFKD ça va
<terribot> [unicode] (c): U+0063 v1.1 (Ll) LATIN SMALL LETTER C
<terribot> [unicode] (◌̧): U+0327 v1.1 (Mn) COMBINING CEDILLA
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<terribot> [unicode] (v): U+0076 v1.1 (Ll) LATIN SMALL LETTER V
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<SnoopJ> .u:NFKC ça va
<terribot> [unicode] (ç): U+00E7 v1.1 (Ll) LATIN SMALL LETTER C WITH CEDILLA
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<terribot> [unicode] (v): U+0076 v1.1 (Ll) LATIN SMALL LETTER V
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
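The same transformations can be reproduced outside of IRC with the normalize function from the standard library (unicodedata2 provides the same function). This sketch keeps the space, which the plugin would drop under its default ignore setting.

```python
import unicodedata

text = "\u00e7a va"  # "ça va" written with a precomposed c-cedilla

# NFKD splits the c-cedilla into 'c' plus a combining cedilla.
decomposed = unicodedata.normalize("NFKD", text)
# NFKC recombines the pair into the single precomposed codepoint.
composed = unicodedata.normalize("NFKC", decomposed)

print([f"U+{ord(c):04X}" for c in decomposed])
print([f"U+{ord(c):04X}" for c in composed])
```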
Codepoint search
A rudimentary search of codepoint names is available. Because many queries produce a large number of results, the maximum number of matches reported can be configured.
<SnoopJ> .unicode:search apple
<terribot> [unicode] 3 results:
<terribot> [unicode] 🍍 U+1f34d PINEAPPLE
<terribot> [unicode] 🍎 U+1f34e RED APPLE
<terribot> [unicode] 🍏 U+1f34f GREEN APPLE
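A name search of this kind can be approximated by scanning every codepoint for a substring match. The sketch below is an assumption about how such a search might work, not the plugin's implementation; search_names is a hypothetical name, and its default limit simply borrows the default of search_max_matches.

```python
import sys
import unicodedata
from typing import List, Tuple


def search_names(query: str, max_matches: int = 10) -> List[Tuple[int, str]]:
    """Return up to max_matches (codepoint, name) pairs whose name contains query."""
    needle = query.upper()  # UCD character names are uppercase
    matches = []
    for codepoint in range(sys.maxunicode + 1):
        # Unnamed codepoints yield "", which never contains a non-empty needle.
        name = unicodedata.name(chr(codepoint), "")
        if needle in name:
            matches.append((codepoint, name))
            if len(matches) >= max_matches:
                break
    return matches


for codepoint, name in search_names("pineapple"):
    print(f"{chr(codepoint)} U+{codepoint:04x} {name}")
```

A linear scan over all codepoints is slow but simple; a real implementation might build the name index once and reuse it.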
Configuring
The easiest way to configure sopel-unicode is via Sopel's configuration wizard: run
sopel-plugins configure sopel-unicode and enter the values for which it prompts you.
| Field | Description | Default (if any) |
|---|---|---|
| max_length | Maximum length of Unicode string input | 5 |
| length_override_channels | Channels where max_length does not apply | [] |
| ignore_characters | Characters ignored during lookup | [' '] |
| search_max_matches | Maximum number of matches for a codepoint search | 10 |
| search_num_public_matches | Number of matches publicly reported for a codepoint search | 2 |
Changelog
1.0.0
First release of sopel-unicode.