
Named (and numeric) HTML entities to/from each other or Unicode

Project description


When reading HTML, named entities are neater and often easier to comprehend than numeric entities (whether in decimal or hexadecimal notation), Unicode characters, or a mixture. The ⊕ character, for example, is easier to recognize and remember as &oplus; than as &#8853;, &#x2295;, or \u2295.

Because they use only pure ASCII characters, entities are safer to use in databases, files, emails, and other contexts. This matters especially given the many encodings (UTF-8 and such) required to fit Unicode into byte-oriented storage, and the many platform variations and quirks seen along the way.

This module helps convert from whatever mixture of characters and/or entities you have into named HTML entities. Or, if you prefer, into numeric HTML entities (either decimal or hexadecimal). It will even help you go the other way, mapping entities into Unicode.
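For a single character, the named, decimal, and hexadecimal spellings all denote the same code point, so converting between them is a lossless mapping. The sketch below is not namedentities itself; it assumes Python 3 and uses only the standard library (html.entities.codepoint2name and html.unescape) to build all three spellings of ⊕ and check that each decodes back to the same character.

```python
from html import unescape
from html.entities import codepoint2name

ch = '\u2295'  # CIRCLED PLUS
cp = ord(ch)   # 8853 decimal, 0x2295 hex

# The three entity spellings of the same code point.
spellings = {
    'named': '&%s;' % codepoint2name[cp],
    'numeric': '&#%d;' % cp,
    'hex': '&#x%x;' % cp,
}

for kind, ref in spellings.items():
    assert unescape(ref) == ch  # every spelling maps back to the same character
    print('%-8s %s' % (kind, ref))
```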

Usage

Python 2:

from namedentities import *

u = u'both em\u2014and&#x2013;dashes&hellip;'

print "named:  ", repr(named_entities(u))
print "numeric:", repr(numeric_entities(u))
print "hex:    ", repr(hex_entities(u))
print "unicode:", repr(unicode_entities(u))

yields:

named:   'both em&mdash;and&ndash;dashes&hellip;'
numeric: 'both em&#8212;and&#8211;dashes&#8230;'
hex:     'both em&#x2014;and&#x2013;dashes&#x2026;'
unicode: u'both em\u2014and\u2013dashes\u2026'

You can do just about the same thing in Python 3, with two changes: use the print() function rather than the print statement, and on Python 3.0 through 3.2, drop the u prefix that marks Unicode string literals in Python 2. Python 3.3 and later accept the u prefix again, if you like. All Python 3 strings are already Unicode, so the prefix is redundant there, but it helps with cross-version code compatibility. (You can also use the six cross-version compatibility library, as the tests do.)
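As a quick check of the u-prefix point, the following assumes Python 3.3 or later (PEP 414) and shows that a u-prefixed literal is an ordinary str, equal to its unprefixed twin, while print is called as a function. It deliberately does not import namedentities, so the string is only constructed here, not converted.

```python
import sys

# PEP 414 (Python 3.3+) made the u prefix legal again; it is a no-op on str.
assert sys.version_info >= (3, 3)

prefixed = u'both em\u2014and&#x2013;dashes&hellip;'
plain = 'both em\u2014and&#x2013;dashes&hellip;'

assert type(prefixed) is str  # Python 3 has no separate unicode type
assert prefixed == plain      # the prefix changes nothing

# Python 3 print is a function, so its arguments go in parentheses.
print("unicode:", repr(prefixed))
```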

One good use for unicode_entities is to create cross-platform, cross-Python-version strings that conceptually contain Unicode characters, but spelled out as named (or numeric) HTML entities. For example:

unicode_entities('This &ldquo;thing&rdquo; is great!')

This has the advantage of using only ASCII characters and common string encoding mechanisms, yet rendering full Unicode strings upon reconstitution. You can use the other functions, say named_entities(), to go from Unicode characters to named entities.
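The store-as-ASCII, reconstitute-as-Unicode round trip can be illustrated with Python 3's standard-library html.unescape, used here as a stand-in for unicode_entities rather than as the package's own code. Be aware that html.unescape also decodes &amp;, &lt;, and &gt;, which you may not want inside HTML markup.

```python
from html import unescape

# Stored form: pure ASCII, safe for byte-oriented storage in any encoding.
stored = 'This &ldquo;thing&rdquo; is great!'
assert all(ord(c) < 128 for c in stored)

# Reconstituted form: real Unicode curly quotes (U+201C and U+201D).
text = unescape(stored)
print(text)
```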

Other APIs

entities(text, kind) takes text and the kind of entities you’d like returned. kind can be 'named' (the default), 'numeric', 'hex', 'unicode', or 'none'. It’s an alternative to the more explicit individual functions such as named_entities.
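The kind-based dispatch can be approximated with Python 3's standard library. This is a minimal sketch, not namedentities' implementation: to_entities and _decode_refs are hypothetical names, named output is limited to the HTML 4 names in html.entities.codepoint2name (falling back to decimal references when no name exists), and the 'none' kind is not modeled because its behavior isn't described here. Like the package, the sketch leaves <, >, and & alone; it also leaves any reference that stands for an ASCII character (such as &amp;) undecoded, so existing HTML escaping survives.

```python
import re
from html import unescape
from html.entities import codepoint2name

# Named (&mdash;), decimal (&#8212;), or hex (&#x2014;) character references.
_REF = re.compile(r'&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')


def _decode_refs(text):
    """Turn references into characters, but only when they stand for a single
    non-ASCII character; &amp;, &lt;, &gt; and unknown names are kept as-is."""
    def repl(match):
        ref = match.group(0)
        ch = unescape(ref)
        return ch if len(ch) == 1 and ord(ch) > 127 else ref
    return _REF.sub(repl, text)


def to_entities(text, kind='named'):
    """Hypothetical stand-in for entities(text, kind); not the library's code."""
    if kind not in ('named', 'numeric', 'hex', 'unicode'):
        raise ValueError('unsupported kind: %r' % (kind,))
    text = _decode_refs(text)  # normalize any mixture to Unicode first
    if kind == 'unicode':
        return text
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII, including <, > and &, passes through
        elif kind == 'hex':
            out.append('&#x%x;' % cp)
        elif kind == 'named' and cp in codepoint2name:
            out.append('&%s;' % codepoint2name[cp])
        else:
            out.append('&#%d;' % cp)  # numeric, or no HTML 4 name exists
    return ''.join(out)


mixed = 'both em\u2014and&#x2013;dashes&hellip;'
for kind in ('named', 'numeric', 'hex', 'unicode'):
    print(kind, repr(to_entities(mixed, kind)))
```

Normalizing to Unicode first is what lets a single pass handle "whatever mixture" of characters and entities the input contains.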

unescape(text) changes all entities into Unicode characters. It has an alias, unicode_entities(text), for parallelism with the other APIs.

Encodings Akimbo

This module helps map strings between HTML entities (named, numeric, or hex) and Unicode characters. It makes those mappings, previously somewhat obscure and nitpicky, easy. Yay us! It will not, however, help you with “encodings” of Unicode characters such as UTF-8; for those, use Python’s built-in features.

Python 3 handles encoding and decoding pretty transparently. Python 2, however, does not. In Python 2, use the decode string method to turn byte strings containing UTF-8 into Unicode, and use encode to turn Unicode strings back into UTF-8. Convert strings to Unicode before processing them with namedentities:

from namedentities import named_entities

s = "String with some UTF-8 characters..."
print named_entities(s.decode("utf-8"))

The best strategy is to convert data to full Unicode as soon as possible after ingesting it, process everything uniformly in Unicode, and then encode back to UTF-8 (or another encoding) as you write the data out. This strategy is baked into Python 3, but must be carried out manually in Python 2.
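In Python 3 the decode-early, process-in-Unicode, encode-late strategy needs only built-in str and bytes methods. This is a minimal sketch; the sample text and the dash replacement are placeholder processing, not part of namedentities. The xmlcharrefreplace error handler shown at the end yields numeric character references only; for named entities you would still pass the decoded text through namedentities.

```python
# Simulated input: UTF-8 bytes, as they might arrive from a file or socket.
incoming = 'caf\u00e9 \u2014 na\u00efve'.encode('utf-8')

# 1. Decode to Unicode (str) as soon as the data is ingested.
text = incoming.decode('utf-8')

# 2. Process uniformly as str: lengths and edits count characters, not bytes.
print(len(incoming), len(text))  # byte count vs. character count
processed = text.replace('\u2014', '-')  # placeholder processing step

# 3. Encode only on the way out: back to UTF-8 ...
outgoing_utf8 = processed.encode('utf-8')

# ... or, for ASCII-only storage, let the codec emit decimal
# character references (numeric, not named, entities).
outgoing_ascii = text.encode('ascii', 'xmlcharrefreplace')
```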

Notes

  • Version 1.8 achieves 100% test line coverage.

  • See CHANGES.yml for more historical changes.

  • Doesn’t attempt to encode <, >, or & (or their numerical equivalents) to avoid interfering with HTML escaping.

  • Automated multi-version testing managed with pytest and tox. Continuous integration testing with Travis-CI. Package linting with pyroma.

    Successfully packaged for, and tested against, all late-model versions of Python: 2.6, 2.7, 3.2, 3.3, 3.4, and 3.5 pre-release (3.5.0b3) as well as PyPy 2.6.0 (based on 2.7.9) and PyPy3 2.4.0 (based on 3.2.5).

  • This module started as basically a packaging of Ian Beck’s recipe. While it’s moved forward since then, Ian’s contribution to the core remains key. Thank you, Ian!

  • The author, Jonathan Eunice (@jeunice on Twitter), welcomes your comments and suggestions.

Installation

To install or upgrade to the latest version:

pip install -U namedentities

To easy_install under a specific Python version (3.3 in this example):

python3.3 -m easy_install --upgrade namedentities

(You may need to prefix these with sudo to authorize installation. In environments without superuser privileges, you may want to use pip’s --user option to install for a single user rather than system-wide.)



Download files


Source Distributions

namedentities-1.8.0.zip (17.9 kB)

Uploaded Source

namedentities-1.8.0.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

namedentities-1.8.0-py2.py3-none-any.whl (10.5 kB)

Uploaded Python 2 Python 3

File details

Details for the file namedentities-1.8.0.zip.

File metadata

  • Download URL: namedentities-1.8.0.zip
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for namedentities-1.8.0.zip
Algorithm Hash digest
SHA256 e58f62eb0926b82a553f4c2b10e4505f1b00f5f3b077ec064798060fa47eaf2b
MD5 b747cd48264c556c874eb6b0ccb60647
BLAKE2b-256 3bee42e88e603db6e3a24a6b8ae6a53b0c69e44e90bbc98a034b67908248073a


File details

Details for the file namedentities-1.8.0.tar.gz.


File hashes

Hashes for namedentities-1.8.0.tar.gz
Algorithm Hash digest
SHA256 9451f68f186267862f15c5456ad11fd6bcc6f1462347f384b73bf531a3f848f4
MD5 99efbe98c0e855df2d7b4a605cc623c3
BLAKE2b-256 47cb5c7626a3aa7b3682a7330a6872d644c7d79ab1e808d8db31e8d883f9458d


File details

Details for the file namedentities-1.8.0-py2.py3-none-any.whl.


File hashes

Hashes for namedentities-1.8.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 2e5599b058ed6198cabe2fd233c9cca03f9aa2c2db499d87a3c5417ce15f7f15
MD5 940fa3f678cb8d827f325d1641e119e0
BLAKE2b-256 a762158cf12fdfad28f150657aeaa01940fd1154dc0d8c95e7f09c2447aefea9

