A tool that converts arbitrary text (like user input or file names) into valid Python identifiers while preserving as much of the original meaning as possible.

Installation

pip install unicode-text-to-identifier

Usage

# coding=utf-8
from unicode_text_to_identifier import unicode_text_to_identifier

assert unicode_text_to_identifier(u"") == u"_"
assert unicode_text_to_identifier(u" \r\n\t") == u"_"
assert unicode_text_to_identifier(u"123abc") == u"_123abc"
assert unicode_text_to_identifier(u"&abc 123") == u"_abc_123"
assert unicode_text_to_identifier(u"  hello  world  $") == u"_hello_world__"
assert unicode_text_to_identifier(u"测试@unicode") == u"测试_unicode"

How It Works

First, get_unicode_character_type() categorizes each Unicode character into one of four types:

  • LETTER_OR_UNDERSCORE (any Unicode category starting with 'L', or the underscore character _)
  • DECIMAL_DIGIT (Unicode category 'Nd')
  • SPACE_OR_CONTROL (any Unicode category starting with 'Z', or the category 'Cc')
  • OTHER (all other characters)
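
The classification above can be sketched with the standard library's unicodedata.category(). This is an illustrative approximation, not the package's source: the name classify_character and the string constants are assumptions made for this example, and the real get_unicode_character_type() may differ in its return values and edge cases.

```python
import unicodedata

# Hypothetical constants; the package may represent these types differently.
LETTER_OR_UNDERSCORE = "LETTER_OR_UNDERSCORE"
DECIMAL_DIGIT = "DECIMAL_DIGIT"
SPACE_OR_CONTROL = "SPACE_OR_CONTROL"
OTHER = "OTHER"


def classify_character(ch):
    """Hypothetical stand-in for get_unicode_character_type()."""
    category = unicodedata.category(ch)
    # Underscore is category 'Pc', so it needs an explicit check.
    if ch == u"_" or category.startswith("L"):
        return LETTER_OR_UNDERSCORE
    if category == "Nd":
        return DECIMAL_DIGIT
    # 'Z*' covers separators; 'Cc' covers controls such as \t, \r and \n.
    if category.startswith("Z") or category == "Cc":
        return SPACE_OR_CONTROL
    return OTHER
```

Note that tab, carriage return, and newline fall under 'Cc' rather than 'Z', which is why both groups map to SPACE_OR_CONTROL.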

Then, unicode_text_to_identifier() applies the following conversion rules using a state machine:

  • The first character must be a letter or underscore. If the first character is a digit, _ is prepended to make the identifier valid.
  • Subsequent valid characters (letters, underscores, digits) are kept as-is.
  • Each other character is replaced with its own underscore, whereas a run of consecutive whitespace/control characters collapses into a single underscore.
  • The output is never empty: if no characters were produced, the result is _.
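
The rules above can be sketched as a single pass over the input. This is a minimal re-implementation written to match the documented examples, not the package's actual code: the function name sketch_text_to_identifier and the in_space_run flag are assumptions for illustration, and behavior on inputs not covered by the Usage examples may differ from the real library.

```python
import unicodedata


def sketch_text_to_identifier(text):
    """Illustrative sketch of the conversion rules (hypothetical name)."""
    pieces = []
    in_space_run = False  # True while inside a whitespace/control run
    for ch in text:
        category = unicodedata.category(ch)
        if ch == u"_" or category.startswith("L"):
            pieces.append(ch)
            in_space_run = False
        elif category == "Nd":
            if not pieces:
                # A leading digit is not a valid identifier start.
                pieces.append(u"_")
            pieces.append(ch)
            in_space_run = False
        elif category.startswith("Z") or category == "Cc":
            # Emit one underscore for the whole run.
            if not in_space_run:
                pieces.append(u"_")
            in_space_run = True
        else:
            # Every other character gets its own underscore.
            pieces.append(u"_")
            in_space_run = False
    # Guarantee a non-empty result.
    return u"".join(pieces) or u"_"


assert sketch_text_to_identifier(u"123abc") == u"_123abc"
assert sketch_text_to_identifier(u"  hello  world  $") == u"_hello_world__"
```

In the second assertion, the two trailing spaces collapse into one underscore and the $ contributes another, which explains the double underscore at the end.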

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.

License

This project is licensed under the MIT License.

Download files

Source Distribution

unicode_text_to_identifier-0.1.0a0.tar.gz (3.8 kB)

Uploaded Source

Built Distribution

unicode_text_to_identifier-0.1.0a0-py2.py3-none-any.whl (4.4 kB)

Uploaded Python 2, Python 3

File details

Details for the file unicode_text_to_identifier-0.1.0a0.tar.gz.

File hashes

Hashes for unicode_text_to_identifier-0.1.0a0.tar.gz
Algorithm    Hash digest
SHA256       43d1734d68a8eedabf8e6517c493c1a290fe71616181e03981dab6dabe8bc627
MD5          e49c441825cea2a8d2217fd6d8d87b84
BLAKE2b-256  e4992d53738240bb15098fb34f8dea0986fee0053f9a806f3d1254d19a790ce7

File details

Details for the file unicode_text_to_identifier-0.1.0a0-py2.py3-none-any.whl.

File hashes

Hashes for unicode_text_to_identifier-0.1.0a0-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       08d7cdd6f7d946fe346566084dea6155837697fdb19a068812c404efb02fa1e4
MD5          2e2266b93c5c8ff05a31a4cf18c093a3
BLAKE2b-256  c2793cabfe2bad1f4f991708ca21cd95d68b129ff62c832ed6a29863cafc46e6
