Named (and numeric) HTML entities to/from each other or Unicode

When reading HTML, named entities are often neater and easier to comprehend than numeric entities, Unicode (or other charset) characters, or a mixture of all of the above. The ⊕ character, for example, is easier to recognize and remember as &oplus; than &#8853; or &#x2295; or \u2295.

Because they fall within the ASCII range, entities are also much safer to use in databases, files, emails, and other contexts than Unicode is, given the various encodings (UTF-8 and such) required.

This module helps convert from whatever characters or entities you have into either named or numeric (either decimal or hexadecimal) HTML entities. It can also go the other way, mapping all entities into Unicode.
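To make the different spellings concrete, here is a small Python 3 sketch that uses only the standard library's html.entities table, not namedentities itself. The helper name spellings is hypothetical, and the \u form it builds only applies to characters in the Basic Multilingual Plane.

```python
from html.entities import codepoint2name

def spellings(ch):
    """Return common ASCII spellings of one character (hypothetical helper)."""
    cp = ord(ch)
    name = codepoint2name.get(cp)  # None when HTML 4 defines no name
    return {
        "named": "&{};".format(name) if name else None,
        "numeric": "&#{};".format(cp),      # decimal character reference
        "hex": "&#x{:x};".format(cp),       # hexadecimal character reference
        "escape": "\\u{:04x}".format(cp),   # Python-style escape, BMP only
    }

result = spellings("\u2295")
for kind, spelling in result.items():
    print(kind, spelling)
```

All four values name the same character; only the named form carries a hint of what the character is.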

Usage

Python 2:

from namedentities import *

u = u'both em\u2014and&#x2013;dashes&hellip;'

print "named:  ", repr(named_entities(u))
print "numeric:", repr(numeric_entities(u))
print "hex:    ", repr(hex_entities(u))
print "unicode:", repr(unicode_entities(u))

yields:

named:   'both em&mdash;and&ndash;dashes&hellip;'
numeric: 'both em&#8212;and&#8211;dashes&#8230;'
hex:     'both em&#x2014;and&#x2013;dashes&#x2026;'
unicode: u'both em\u2014and\u2013dashes\u2026'

You can do just about the same thing in Python 3, but you have to use the print function rather than a print statement, and prior to 3.3, you have to skip the u prefix that in Python 2 marks string literals as being Unicode literals. In Python 3.3 and later, you can use the u prefix again, if you like. All Python 3 strings are Unicode, so the prefix is redundant there, but it helps with cross-version code compatibility. (You can use the six cross-version compatibility library, as the tests do.)

One good use for unicode_entities is to create cross-platform, cross-Python-version strings that conceptually contain Unicode characters, but spelled out as named (or numeric) HTML entities. For example:

unicode_entities('This &rsquo;thing&rdquo; is great!')

This has the advantage of using only ASCII characters and common string encoding mechanisms, yet rendering full Unicode strings upon reconstitution. You can use the other functions, say named_entities(), to go from Unicode characters to named entities.
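The same round trip can be sketched in Python 3 with the standard library alone. This is only an analogue: html.unescape is not part of namedentities, and it also converts &lt;, &gt;, and &amp;, so its results may not match this module's in every case.

```python
import html

# Pure-ASCII source text with the Unicode characters spelled as entities.
ascii_source = "This &rsquo;thing&rdquo; is great!"

# Every character is in the ASCII range, so any common encoding can carry it.
is_ascii = all(ord(c) < 128 for c in ascii_source)

# Reconstitute the full Unicode string from the entity spelling.
restored = html.unescape(ascii_source)

print(is_ascii)
print(repr(restored))
```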

Other APIs

entities(text, kind) takes text and the kind of entities you’d like returned. kind can be 'named' (the default), 'numeric', 'hex', 'unicode', or 'none'. It’s an alternative to the more explicit individual functions such as named_entities.

unescape(text) changes all entities into Unicode characters. It has an alias, unicode_entities(text), for parallelism with the other APIs.
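As a rough illustration of a kind-style dispatcher, here is a Python 3 sketch built on the standard library. It is not the module's implementation: the name to_entities is hypothetical, it covers only the 'named', 'numeric', and 'hex' kinds, and it assumes the input holds Unicode characters rather than existing entities.

```python
from html.entities import codepoint2name

def to_entities(text, kind="named"):
    """Hypothetical sketch: spell non-ASCII characters as HTML entities."""
    if kind not in ("named", "numeric", "hex"):
        raise ValueError("unsupported kind: {!r}".format(kind))
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII, including <, >, and &, passes through
        elif kind == "named" and cp in codepoint2name:
            out.append("&{};".format(codepoint2name[cp]))
        elif kind == "hex":
            out.append("&#x{:x};".format(cp))
        else:
            out.append("&#{};".format(cp))  # numeric, or no name available
    return "".join(out)

sample = "both em\u2014and\u2013dashes\u2026 <b>&</b>"
named = to_entities(sample, "named")
numeric = to_entities(sample, "numeric")
hexed = to_entities(sample, "hex")
print(named)
print(numeric)
print(hexed)
```

In this sketch, characters that have no HTML 4 name fall back to the decimal form when 'named' is requested.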

Encodings Akimbo

This module helps map strings between HTML entities (named, numeric, or hex) and Unicode characters. It makes those mappings, previously somewhat obscure and nitsy, easy. Yay us! It will not, however, specifically help you with “encodings” of Unicode characters such as UTF-8; for these, use Python’s built-in features.

Python 3 tends to handle encoding/decoding with a fair degree of transparency. Python 2, however, manifestly does not. Use the decode string method to turn byte strings (including UTF-8) into Unicode; use encode to convert true Unicode strings into UTF-8. Please convert byte strings to Unicode before processing them with namedentities:

s = "String with some UTF-8 characters..."
print named_entities(s.decode("utf-8"))

The best strategy is to convert data to full Unicode as soon as possible after ingesting it. Process in Unicode. Then encode back to UTF-8 (or another encoding) as you write the data out. This strategy is baked into Python 3, but must be handled manually in Python 2.
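The decode, process, encode pattern might look like this in Python 3, using only built-in features. The upper() call is a placeholder for whatever processing you actually do, and the last step shows the built-in xmlcharrefreplace error handler, which emits decimal character references when the output must be ASCII.

```python
# UTF-8 bytes, as they might arrive from a file or socket.
raw = b"both em\xe2\x80\x94and\xe2\x80\x93dashes"

text = raw.decode("utf-8")           # 1. decode to Unicode on the way in
shouted = text.upper()               # 2. process as Unicode (placeholder step)
utf8_out = shouted.encode("utf-8")   # 3. encode back to UTF-8 on the way out

# Alternative output: ASCII bytes with non-ASCII characters written
# as decimal character references.
ascii_out = shouted.encode("ascii", "xmlcharrefreplace")

print(utf8_out)
print(ascii_out)
```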

Notes

  • 1.6.6 improves docs and inaugurates testing under Travis CI.

  • 1.6.5 updates the testing matrix, packaging, and documentation. All vestiges of support for Python 2.5 and PyPy 1.9 and earlier are officially withdrawn; if you’re still back there, upgrade already!

  • See CHANGES.rst for additional changes.

  • Doesn’t attempt to encode <, >, or & (or their numerical equivalents) to avoid interfering with HTML escaping.

  • Automated multi-version testing managed with the wonderful pytest and tox. Successfully packaged for, and tested against, all late-model versions of Python: 2.6, 2.7, 3.2, 3.3, and 3.4, as well as PyPy 2.6.0 (based on 2.7.9) and PyPy3 2.4.0 (based on 3.2.5). Should run fine on Python 3.5, though py.test is broken on its pre-release iterations.

  • This module started as basically a packaging of Ian Beck’s recipe. While it’s moved forward since then, it’s still mostly Ian under the covers. Thank you, Ian!

Installation

To install or upgrade to the latest version:

pip install -U namedentities

To easy_install under a specific Python version (3.3 in this example):

python3.3 -m easy_install --upgrade namedentities

(You may need to prefix these with the sudo command to authorize installation. In environments without super-user privileges, you may want to use pip’s --user option to install for a single user only, rather than system-wide.)
