Provides pure ASCII transliterations of Unicode strings.

Project description

Python Fast Unidecode


This repository is a fork of rust-unidecode that exposes the original Rust implementation to Python. It also makes a couple of source-code changes that speed up the transliteration of ASCII characters, bringing this implementation on par with the Python unidecode package for that character set.

As a result, this package should produce the same output as the Python unidecode package. According to the benchmark/speed_benchmark.py benchmark, run on Python 3.13, it is on average much faster at transliterating non-ASCII characters (roughly 3x or more, and sometimes over 10x, depending on caching and other factors) and comparable to, or a few percent slower than, the Python implementation on ASCII characters.
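The figures above come from benchmark/speed_benchmark.py, which is not reproduced here. If you want to check the ASCII versus non-ASCII difference on your own data, a minimal harness might look like the sketch below. The `bench` helper and the sample strings are hypothetical illustrations, not the contents of that script, and the comparison only runs when both fast_unidecode and the GPL-licensed unidecode package are installed.

```python
import timeit
from typing import Callable, Sequence


def bench(fn: Callable[[str], str], texts: Sequence[str], repeat: int = 5, number: int = 100) -> float:
    """Best-of-`repeat` wall time, in seconds, for `number` passes of `fn` over `texts`."""

    def one_pass() -> None:
        for text in texts:
            fn(text)

    # timeit.repeat accepts a zero-argument callable as the statement to time
    return min(timeit.repeat(one_pass, repeat=repeat, number=number))


# Illustrative inputs only; not the strings used by benchmark/speed_benchmark.py
ASCII_TEXTS = ["The quick brown fox jumps over the lazy dog."] * 100
NON_ASCII_TEXTS = ["Æneid, étude, 北亰, げんまい茶"] * 100

if __name__ == "__main__":
    try:
        from fast_unidecode import unidecode as fast  # assumed installed
        from unidecode import unidecode as reference  # assumed installed (GPL)
    except ImportError as exc:
        print(f"skipping comparison: {exc}")
    else:
        for label, texts in (("ASCII", ASCII_TEXTS), ("non-ASCII", NON_ASCII_TEXTS)):
            # ratio > 1 means fast_unidecode took less time than unidecode
            ratio = bench(reference, texts) / bench(fast, texts)
            print(f"{label}: fast_unidecode is {ratio:.2f}x as fast as unidecode")
```

Taking the minimum over several repeats reduces noise from other processes; results still depend on input mix and caching, as noted above.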

License

This project is licensed under the MIT License.

Important Note: Unlike the original Python unidecode package, which is distributed under the restrictive GNU General Public License (GPL), fast-unidecode is released under the permissive MIT license. This makes it suitable for a wider range of projects, including commercial and closed-source applications. Distributing software that incorporates a GPL-licensed library can oblige you to release your own source code under the GPL, a requirement the MIT license does not impose.

Benchmark code is not part of the distributed package.

Installation

pip install fast_unidecode

Installation from source

To install from source, first build the package with maturin, then install the resulting wheel with pip.

maturin build --release
pip install target/wheels/fast_unidecode...

Usage

>>> from fast_unidecode import unidecode

>>> unidecode("Æneid")
'AEneid'

>>> unidecode("北亰")
'Bei Jing '

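A common use of transliteration is building URL slugs. The sketch below uses the `unidecode` function shown above; `slugify` is a hypothetical helper, and the `except ImportError` branch is a crude stdlib stand-in (NFKD decomposition, then dropping non-ASCII bytes) so the example runs even without the package. Unlike unidecode, that stand-in silently drops characters such as Æ or 北 that have no ASCII decomposition.

```python
import re
import unicodedata

try:
    from fast_unidecode import unidecode  # assumed installed via `pip install fast_unidecode`
except ImportError:
    def unidecode(text: str) -> str:
        # Stdlib stand-in: strip combining marks after NFKD decomposition.
        # Characters with no ASCII decomposition (e.g. Æ, 北) are dropped entirely.
        return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def slugify(text: str) -> str:
    """Lowercase ASCII slug; each run of non-alphanumeric characters becomes one '-'."""
    ascii_text = unidecode(text).lower()
    return re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")


print(slugify("Café au lait!"))  # cafe-au-lait
```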
rust-unidecode (Original README.md)

Documentation

The rust-unidecode library is a Rust port of Sean M. Burke's famous Text::Unidecode module for Perl. It transliterates Unicode strings such as "Æneid" into pure ASCII ones such as "AEneid". For a detailed explanation of the rationale behind using such a library, you can refer to both the documentation of the original module and an article Burke wrote in 2001.

The data set used to transliterate Unicode was ported directly from the Text::Unidecode module using a Perl script, so rust-unidecode should produce identical output.
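Text::Unidecode keeps its data in 256-entry tables, one per block of code points, selected by the high byte of the code point. The Python sketch below illustrates that kind of lookup with a hypothetical four-entry table. `TABLES` and `transliterate` are illustrative names, not part of rust-unidecode or fast_unidecode, and whether rust-unidecode keeps this exact layout is not stated here. The `isascii()` short-circuit is one simple way to make pure-ASCII input cheap; it is an assumption about the kind of ASCII speed-up the fork describes, not its actual Rust change.

```python
# Hypothetical miniature table in the spirit of Text::Unidecode's data:
# {high byte of the code point: {low byte: ASCII replacement}}
TABLES: dict[int, dict[int, str]] = {
    0x00: {0xC6: "AE", 0xE9: "e"},  # U+00C6 'Æ', U+00E9 'é'
    0x4E: {0xB0: "Jing "},          # U+4EB0 '亰'
    0x53: {0x17: "Bei "},           # U+5317 '北'
}


def transliterate(text: str) -> str:
    # Fast path: pure-ASCII input is returned unchanged.
    if text.isascii():
        return text
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII maps to itself
        else:
            # Code points missing from the table become "" in this sketch
            out.append(TABLES.get(cp >> 8, {}).get(cp & 0xFF, ""))
    return "".join(out)


print(repr(transliterate("北亰")))  # 'Bei Jing '
```

Note how each CJK entry carries a trailing space, which is why "北亰" yields "Bei Jing " rather than "Bei Jing".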

Examples

use unidecode::unidecode;

fn main() {
    // Each assertion compares the transliterated String with the expected ASCII text.
    assert_eq!(unidecode("Æneid"), "AEneid");
    assert_eq!(unidecode("étude"), "etude");
    assert_eq!(unidecode("北亰"), "Bei Jing ");
    assert_eq!(unidecode("ᔕᓇᓇ"), "shanana");
    assert_eq!(unidecode("げんまい茶"), "genmaiCha ");
}

Guarantees and Warnings

Here are some guarantees you have when calling unidecode():

  • The String returned will be valid ASCII; the decimal representation of every char in the string will be between 0 and 127, inclusive.
  • Every ASCII character (0x0000 - 0x007F) is mapped to itself.
  • All non-ASCII Unicode characters will translate to a string containing only newlines ("\n") or ASCII characters in the range 0x0020 - 0x007E. For example, no Unicode character will translate to \u{01}. The exception is an ASCII character passed in directly, which is mapped to itself (so '\u{01}' will be mapped to "\u{01}").
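These guarantees are easy to spot-check from Python. The sketch below is a hypothetical helper, `check_guarantees`, that tests any `str -> str` transliterator against them; `nfkd_ascii` is only a stdlib stand-in so the example runs without extra packages, and to test this library you would pass `fast_unidecode.unidecode` instead. It checks the characters in the samples you supply, so it is a spot check, not a proof over all of Unicode.

```python
import unicodedata
from typing import Callable, Iterable


def check_guarantees(fn: Callable[[str], str], samples: Iterable[str]) -> list[str]:
    """Return descriptions of guarantee violations (an empty list if none were found)."""
    problems: list[str] = []
    # Every ASCII character (0x00-0x7F) must map to itself.
    for cp in range(0x80):
        if fn(chr(cp)) != chr(cp):
            problems.append(f"U+{cp:04X} is not mapped to itself")
    for sample in samples:
        # The returned string must be valid ASCII.
        if not fn(sample).isascii():
            problems.append(f"{sample!r} produced non-ASCII output")
        # Each non-ASCII character may only yield "\n" or characters in 0x20-0x7E.
        for ch in sample:
            if ch.isascii():
                continue
            bad = [c for c in fn(ch) if c != "\n" and not 0x20 <= ord(c) <= 0x7E]
            if bad:
                problems.append(f"U+{ord(ch):04X} produced {bad!r}")
    return problems


def nfkd_ascii(text: str) -> str:
    # Stand-in transliterator for demonstration only.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


print(check_guarantees(nfkd_ascii, ["étude", "Æneid"]))  # []
```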

There are, however, some things you should keep in mind:

  • As stated, some transliterations do produce \n characters.
  • Some Unicode characters transliterate to an empty string, either on purpose or because rust-unidecode does not know about the character.
  • Some Unicode characters are unknown and transliterate to "[?]".
  • Many Unicode characters transliterate to multi-character strings. For example, 北 is transliterated as "Bei ".

This information was paraphrased from the original Text::Unidecode documentation.
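Because of these caveats, callers often post-process the output. A minimal sketch, using a hypothetical `tidy` helper that removes (or replaces) "[?]" markers, collapses whitespace runs, including "\n", into single spaces, and strips the trailing space that many multi-character transliterations leave behind:

```python
import re


def tidy(transliterated: str, unknown: str = "[?]", replacement: str = "") -> str:
    """Clean up unidecode-style output: replace unknown-character markers,
    collapse whitespace runs (including newlines) to one space, strip the ends."""
    text = transliterated.replace(unknown, replacement)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(repr(tidy("Bei Jing ")))  # 'Bei Jing'
```

It cannot restore word boundaries lost when one transliteration runs into the next, as in "genmaiCha".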


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

fast_unidecode-1.0.2-cp313-cp313-manylinux_2_28_x86_64.whl (546.9 kB)

CPython 3.13, manylinux: glibc 2.28+, x86-64

fast_unidecode-1.0.2-cp313-cp313-manylinux_2_28_aarch64.whl (567.0 kB)

CPython 3.13, manylinux: glibc 2.28+, ARM64

fast_unidecode-1.0.2-cp313-cp313-macosx_11_0_arm64.whl (322.9 kB)

CPython 3.13, macOS 11.0+, ARM64

fast_unidecode-1.0.2-cp313-cp313-macosx_10_12_x86_64.whl (319.8 kB)

CPython 3.13, macOS 10.12+, x86-64

fast_unidecode-1.0.2-cp312-cp312-manylinux_2_28_x86_64.whl (546.6 kB)

CPython 3.12, manylinux: glibc 2.28+, x86-64

fast_unidecode-1.0.2-cp312-cp312-manylinux_2_28_aarch64.whl (566.8 kB)

CPython 3.12, manylinux: glibc 2.28+, ARM64

fast_unidecode-1.0.2-cp312-cp312-macosx_11_0_arm64.whl (322.8 kB)

CPython 3.12, macOS 11.0+, ARM64

fast_unidecode-1.0.2-cp312-cp312-macosx_10_12_x86_64.whl (319.5 kB)

CPython 3.12, macOS 10.12+, x86-64

fast_unidecode-1.0.2-cp311-cp311-manylinux_2_28_x86_64.whl (547.5 kB)

CPython 3.11, manylinux: glibc 2.28+, x86-64

fast_unidecode-1.0.2-cp311-cp311-manylinux_2_28_aarch64.whl (567.3 kB)

CPython 3.11, manylinux: glibc 2.28+, ARM64

fast_unidecode-1.0.2-cp311-cp311-macosx_11_0_arm64.whl (325.5 kB)

CPython 3.11, macOS 11.0+, ARM64

fast_unidecode-1.0.2-cp311-cp311-macosx_10_12_x86_64.whl (322.4 kB)

CPython 3.11, macOS 10.12+, x86-64

fast_unidecode-1.0.2-cp310-cp310-manylinux_2_28_x86_64.whl (547.6 kB)

CPython 3.10, manylinux: glibc 2.28+, x86-64

fast_unidecode-1.0.2-cp310-cp310-manylinux_2_28_aarch64.whl (567.5 kB)

CPython 3.10, manylinux: glibc 2.28+, ARM64

File details

Details for the file fast_unidecode-1.0.2-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f3f98090c3fed8ffd91826a8f9eee8bd14d6449d0eb46f7637e94c31b9621294
MD5 304456a9df1c9ab6e94e0bb8e526df19
BLAKE2b-256 c3c84b0fcbc5e1afebaaa9723421e00dea94bd8a3234dbf13b133203b1ceccc8

File details

Details for the file fast_unidecode-1.0.2-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 867c610a71f1e7c8fae89195f8aef855a8ad699df5788ed0cc5407d24241a586
MD5 41b0abe4f8d67896ed86d75d6a96943f
BLAKE2b-256 4d8688ebae3420264e8bb00d21d2df9d10a52b8112b356097998c801d159388f

File details

Details for the file fast_unidecode-1.0.2-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fc23d809cd3c93c4ebfcef322f3efee52278e2cdfe4301fdab4c8e1774a863e9
MD5 41209b7c14683366387a5d617a0139a6
BLAKE2b-256 a1ec1e04f74a032be57c41b43c660e0c56ae9fdae1a33ce7ee00ffc43447e23b

File details

Details for the file fast_unidecode-1.0.2-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 68b587fb7a392239f493d45f3647c624cbcde01a1a3a7365d15702df13bb287e
MD5 df21df7b838d5bf36a95bb53636cfb9e
BLAKE2b-256 30b992be716245d516ce4e646e60cb15efaabcc9207e38b3c1726545cc5533c3

File details

Details for the file fast_unidecode-1.0.2-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1c84d4bcd708171e5ae906ed2eda6c0e327b0431567a873a5ee49a8c789c385d
MD5 2bbd60be3b785d49ea572da7a5a5f199
BLAKE2b-256 d8ef329fa2adb97d4d27815a04de651d36ea069ce2623577a7e4f131e7802a14

File details

Details for the file fast_unidecode-1.0.2-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 a8e73e6f1f8c6bdeeba098a18f6d8b85aa374d6cbdf5da075f5151bc0a261a14
MD5 48724c75583793254fcfc9c72e8f12cf
BLAKE2b-256 07ccd53a7c2033b3da4b3c9f9be6eab1421aed6e93ed18b0ec5a8d3f76176a92

File details

Details for the file fast_unidecode-1.0.2-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9a57a6d13a9cce17663cb28b92373077d8027839b364325916a818eb8ac30eea
MD5 f52c11e64bb0f46334e2e8cbc12696d3
BLAKE2b-256 0c9ae4bd5c66f13402887107f0e5f6505b30c70063d9952ef02251846f4d5ed9

File details

Details for the file fast_unidecode-1.0.2-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 52d114383e3f2360b2e930d76f4187ff285d86199e3bb7b6cebfdf33c7db8e4d
MD5 bb1e8f8c1bdbc5f2aed14e9622345594
BLAKE2b-256 3b90f9369118d80127189a83ca7b644daf27c9bd4853e87499bdc77c5ba79734

File details

Details for the file fast_unidecode-1.0.2-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 8154e4bbae741cfc428a09bea5316466dab6e9203da1e0d251edc4c37352dbe7
MD5 1060f280bf8016bc47f6329521ad6f76
BLAKE2b-256 26438183be4a0f4cb5f06fa07fb1480a2fc4a0610974d49b97bb6335349ccc69

File details

Details for the file fast_unidecode-1.0.2-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 78c28c168edfb40eabe4546e592518017a553ade4f3dfc78cd350197de239634
MD5 5cb8db81098a34459384a8dd2c5d7f74
BLAKE2b-256 023bc7e28b9a92952c925d63a0628cf28853e0af210ea9a32c6ed85f555646a7

File details

Details for the file fast_unidecode-1.0.2-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 80cbf9438594fdebd7dc2433631fdd7de73c971121560c88d58d8c394b5c03e9
MD5 cd606de61609728030c1738bd006156d
BLAKE2b-256 c4719b06a1ffff666071aa8cfa88468c0894c09d67148ebb13d4617415582ce3

File details

Details for the file fast_unidecode-1.0.2-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 79a79e7ab92c472801ddda7f6e48400ee0ac1e7b965a98d125d10233be576623
MD5 9d301c5af2b08b4c3df870a9fd0709d8
BLAKE2b-256 5583d0bc8508238e022b9aa3d3df67ab99f86da1a6c66d44438a46a95b803f54

File details

Details for the file fast_unidecode-1.0.2-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 6787a88d290e63df4c91ebbe2bdabf9d89c6b472f8652a5252acfbe0865693ec
MD5 6e55011115934b10da1e60f5901b6451
BLAKE2b-256 a26a9a48b873d6bc59bb322f19ecf47272d187f431d2aa88d8f9d60934306faa

File details

Details for the file fast_unidecode-1.0.2-cp310-cp310-manylinux_2_28_aarch64.whl.

File hashes

Hashes for fast_unidecode-1.0.2-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 df24be9e50a0e5b111b0a4684ed9bbe6d70a019ddead894a1775e3f89272702b
MD5 374b911166d2ba08baa430deab4aa395
BLAKE2b-256 afb4bb043c8c58a7e60063e2a0b839518e02978e93247573d99aa5b02d8de05a
