Unicode information lookup plugin for Sopel

sopel-unicode

Sopel plugin for information lookup on Unicode codepoints.

This plugin is designed as a drop-in replacement for the built-in unicode_info plugin. It provides:

  • The General Category of each codepoint
  • Optional support for Unicode Character Database (UCD) versions newer than the Python release used to run Sopel, if unicodedata2 is available
  • Optional support for reporting the Unicode version that introduced each codepoint, if unicode_age is available

Installing

Releases are hosted on PyPI, so after installing Sopel, all you need is pip:

$ pip install sopel-unicode

Disable built-in unicode_info plugin

Edit your Sopel core config to add unicode_info to the list of excluded plugins. Otherwise both plugins will respond to the same commands and you will get duplicated responses.

Optional features

sopel-unicode is designed to act as a drop-in replacement for the original built-in plugin on a basic installation. To enable all the optional features:

$ pip install "sopel-unicode[all]"

Usage

Note that output given in this section corresponds to sopel-unicode[all] except where noted. Output layout may differ if some optional dependencies are missing.

Codepoint lookup

The unicode command (short form: u) looks up each codepoint in the provided string. Input characters listed in the ignore_characters configuration option are skipped.

Lookup uses unicodedata2 if it is available, and falls back on stdlib unicodedata otherwise.
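
The fallback described above can be sketched with a guarded import. This is a minimal illustration, not the plugin's actual code: the describe helper and its output layout are assumptions made for the example, and it omits the version field that unicode_age would supply.

```python
# Prefer unicodedata2 (which ships a newer UCD) when it is installed;
# otherwise fall back on the stdlib module, which exposes the same API.
try:
    import unicodedata2 as unicodedata
except ImportError:
    import unicodedata


def describe(char: str) -> str:
    """Return 'U+XXXX (Gc) NAME' for one character (hypothetical helper)."""
    # Control characters and unassigned codepoints have no name,
    # so pass a default instead of letting name() raise ValueError.
    name = unicodedata.name(char, "<no name>")
    category = unicodedata.category(char)
    return f"U+{ord(char):04X} ({category}) {name}"


print(describe("\u00e7"))
print("UCD version in use:", unicodedata.unidata_version)
```

The reported UCD version depends on which module was imported, so names for recently added codepoints may be missing under the stdlib fallback.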

<SnoopJ> .unicode 🫩
<terribot> [unicode] (🫩): U+1FAE9 v16.0 (So) FACE WITH BAGS UNDER EYES
<SnoopJ> .u 🏴‍☠ 
<terribot> [unicode] (🏴): U+1F3F4 v7.0 (So) WAVING BLACK FLAG
<terribot> [unicode] (‍): U+200D v1.1 (Cf) ZERO WIDTH JOINER
<terribot> [unicode] (☠): U+2620 v1.1 (So) SKULL AND CROSSBONES

It is sometimes convenient to discard all ASCII characters from lookup, which can be done with the unicode:noascii (u:noascii) command:

<SnoopJ> .u:noascii ça va?
<terribot> [unicode] (ç): U+00E7 v1.1 (Ll) LATIN SMALL LETTER C WITH CEDILLA

The unicode:raw (u:raw) command looks up every codepoint in the input, including characters that would otherwise be ignored.

<SnoopJ> .unicode:raw a b
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<terribot> [unicode] ( ): U+0020 v1.1 (Zs) SPACE
<terribot> [unicode] (b): U+0062 v1.1 (Ll) LATIN SMALL LETTER B

Individual codepoints can also be looked up with hex notation, in either U+NNNN form, 0xNNNN form, or \uNNNN form.

<SnoopJ> .unicode U+037E
<terribot> [unicode] (;): U+037E v1.1 (Po) GREEK QUESTION MARK
<SnoopJ> .u 0xBEEF
<terribot> [unicode] (뻯): U+BEEF v2.0 (Lo) HANGUL SYLLABLE BBEGS
<SnoopJ> .u \u732b
<terribot> [unicode] (猫): U+732B v1.1 (Lo) CJK UNIFIED IDEOGRAPH-732B

Note that the \u notation is less restrictive here than in Python string literals, where \u must be followed by exactly four hex digits. You may use as many or as few hex digits as you like.

<SnoopJ> .u \u1
<terribot> [unicode] (): U+0001 v1.1 (Cc) START OF HEADING
<SnoopJ> .u \u12345
<terribot> [unicode] (𒍅): U+12345 v5.0 (Lo) CUNEIFORM SIGN URU TIMES KI
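
One way to recognize the three hex notations is a single regular expression that accepts any number of hex digits after the prefix. The sketch below is an assumption about how such parsing could work, not the plugin's implementation; parse_codepoint is a hypothetical name, and treating the prefixes as case-sensitive is a choice made for the example.

```python
import re
from typing import Optional

# Accept U+NNNN, 0xNNNN, or \uNNNN with one or more hex digits.
_HEX_NOTATION = re.compile(r"^(?:U\+|0x|\\u)([0-9A-Fa-f]+)$")


def parse_codepoint(token: str) -> Optional[str]:
    """Return the character named by a hex notation, or None if not recognized."""
    match = _HEX_NOTATION.match(token)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Unicode codepoints end at U+10FFFF; anything larger is not a codepoint.
    if value > 0x10FFFF:
        return None
    return chr(value)


for token in ("U+037E", "0xBEEF", r"\u732b", r"\u1", r"\u12345"):
    char = parse_codepoint(token)
    print(token, "->", f"U+{ord(char):04X}")
```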

Normalization forms

The Unicode normalization forms are available to transform input strings.

Input characters listed in the ignore_characters configuration option are skipped.

<SnoopJ> .unicode:NFKD ça va
<terribot> [unicode] (c): U+0063 v1.1 (Ll) LATIN SMALL LETTER C
<terribot> [unicode] (◌̧): U+0327 v1.1 (Mn) COMBINING CEDILLA
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<terribot> [unicode] (v): U+0076 v1.1 (Ll) LATIN SMALL LETTER V
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<SnoopJ> .u:NFKC ça va
<terribot> [unicode] (ç): U+00E7 v1.1 (Ll) LATIN SMALL LETTER C WITH CEDILLA
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
<terribot> [unicode] (v): U+0076 v1.1 (Ll) LATIN SMALL LETTER V
<terribot> [unicode] (a): U+0061 v1.1 (Ll) LATIN SMALL LETTER A
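
The difference between the two forms above can be reproduced with the standard library's unicodedata.normalize. This sketch only illustrates the normalization step; the codepoints helper is hypothetical, and skipping the space stands in for the plugin's ignored characters.

```python
import unicodedata


def codepoints(form: str, text: str) -> list[str]:
    """Normalize text to the given form and list its codepoints, skipping spaces."""
    normalized = unicodedata.normalize(form, text)
    return [f"U+{ord(ch):04X}" for ch in normalized if ch != " "]


text = "\u00e7a va"  # "ça va" with a precomposed c-cedilla

# NFKD decomposes the cedilla into a base letter plus a combining mark.
print(codepoints("NFKD", text))
# NFKC recomposes it into the single precomposed codepoint.
print(codepoints("NFKC", text))
```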

Codepoint search

Rudimentary search by codepoint name is available through the unicode:search command. Because many queries produce a large number of results, the maximum number of matches reported is configurable.

<SnoopJ> .unicode:search apple
<terribot> [unicode] 3 results:
<terribot> [unicode] 🍍 U+1f34d PINEAPPLE
<terribot> [unicode] 🍎 U+1f34e RED APPLE
<terribot> [unicode] 🍏 U+1f34f GREEN APPLE
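
A name search like this can be approximated by scanning every codepoint and checking for a substring match in its name. The sketch below assumes a simple case-insensitive substring match, which may differ from what the plugin does; search_names and its max_matches parameter are hypothetical.

```python
import sys
import unicodedata


def search_names(query: str, max_matches: int = 10) -> list[tuple[int, str]]:
    """Return up to max_matches (codepoint, name) pairs whose name contains query."""
    needle = query.upper()  # Unicode character names are uppercase
    matches = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed codepoints yield the empty default and never match.
        name = unicodedata.name(chr(cp), "")
        if needle in name:
            matches.append((cp, name))
            if len(matches) >= max_matches:
                break
    return matches


for cp, name in search_names("apple"):
    print(f"{chr(cp)} U+{cp:04x} {name}")
```

Scanning the full codepoint range on every query is slow, so a real implementation would likely build the name index once and reuse it.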

Configuring

The easiest way to configure sopel-unicode is via Sopel's configuration wizard: run sopel-plugins configure sopel-unicode and enter the values it prompts you for.

| Field | Description | Default (if any) |
|---|---|---|
| max_length | Maximum length of Unicode string input | 5 |
| length_override_channels | Channels where max_length does not apply | [] |
| ignore_characters | Characters ignored during lookup | [' '] |
| search_max_matches | Maximum number of matches for a codepoint search | 10 |
| search_num_public_matches | Number of matches publicly reported for a codepoint search | 2 |
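
As a rough illustration of how max_length and length_override_channels interact, the sketch below checks an input against the limit unless the channel is exempt. It assumes length is counted in codepoints, which the table does not state, and input_allowed is a hypothetical name rather than part of the plugin.

```python
def input_allowed(text, channel, max_length=5, length_override_channels=()):
    """Return True if text may be looked up in channel (hypothetical helper)."""
    # Channels on the override list are exempt from the length limit.
    if channel in length_override_channels:
        return True
    # Assumption: length is measured in codepoints, i.e. len() of a Python str.
    return len(text) <= max_length


print(input_allowed("abcdef", "#general"))
print(input_allowed("abcdef", "#unicode", length_override_channels=["#unicode"]))
```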

Changelog

1.0.0

First release of sopel-unicode.
