Provides pure ASCII transliterations of Unicode strings.

Python Fast Unidecode

This repo is a fork of the rust-unidecode repository and ports the original Rust implementation for use from Python. It also includes a couple of source code changes that speed up the transliteration of ASCII characters, bringing this implementation on par with the Python unidecode implementation for that character set.

As a result, this package should produce the same output as the aforementioned Python implementation. On average, according to the benchmark/speed_benchmark.py benchmark, it is much faster at transliterating non-ASCII characters (more than ~3x, and sometimes more than 10x, depending on caching and other factors) and comparable to slightly slower on ASCII characters (by a few percent). The benchmarks were run on Python 3.13.
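
The repository's benchmark/speed_benchmark.py is not reproduced here. As a rough illustration of how such a comparison could be timed, the sketch below uses the standard timeit module. The names stdlib_stand_in, load_candidates, and time_candidates are hypothetical; the stand-in is a crude stdlib-only function that is not equivalent to either library and exists only so the sketch runs when neither package is installed.

```python
import timeit
import unicodedata


def stdlib_stand_in(text: str) -> str:
    """Crude stand-in: strips accents and drops all other non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def load_candidates():
    """Collect whichever transliteration functions can be imported."""
    candidates = {"stdlib_stand_in": stdlib_stand_in}
    try:
        from fast_unidecode import unidecode as fast
        candidates["fast_unidecode"] = fast
    except ImportError:
        pass  # package not installed; skip it
    try:
        from unidecode import unidecode as reference
        candidates["unidecode"] = reference
    except ImportError:
        pass  # package not installed; skip it
    return candidates


def time_candidates(candidates, text, number=1000):
    """Return the mean time per call, in seconds, for each candidate."""
    return {
        name: timeit.timeit(lambda fn=fn: fn(text), number=number) / number
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    samples = {"ascii": "plain ASCII text " * 50, "non_ascii": "北亰 étude " * 50}
    for label, text in samples.items():
        for name, seconds in time_candidates(load_candidates(), text).items():
            print(f"{label:10s} {name:16s} {seconds * 1e6:8.2f} us/call")
```

Numbers from a harness like this vary with input size, caching, and machine load, so they are best read as relative rather than absolute.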

License

This project is licensed under the MIT License.

Important Note: Unlike the original Python unidecode package, which is distributed under the copyleft GNU General Public License (GPL), fast-unidecode is released under the permissive MIT license. This makes it suitable for use in a wider range of projects, including commercial and closed-source applications. Distributing software that incorporates a GPL-licensed library can create an obligation to release your own source code under the GPL, a requirement that the MIT license does not have.

Benchmark code is not a part of the distributed package.

Installation

pip install fast_unidecode

Installation from source

First, build the package with maturin, then install the resulting wheel with pip.

maturin build --release
pip install target/wheels/fast_unidecode...
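
After either installation route, one way to confirm that the package is visible to the current interpreter is to query its metadata. This is a minimal sketch using the standard importlib.metadata module; installed_version is a hypothetical helper, and it assumes the distribution name matches the name used with pip above.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("fast_unidecode")
    print(version if version else "fast_unidecode is not installed")
```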

Usage

>>> from fast_unidecode import unidecode

>>> print(unidecode("Æneid"))
AEneid

>>> print(unidecode("北亰"))
Bei Jing
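
Because the package is meant to produce the same output as the Python unidecode package, a quick way to check that on your own data is to run both over a set of samples and list any disagreements. The helper below is a sketch, not part of either package; find_mismatches is a hypothetical name, and the comparison at the bottom only runs when both packages are installed.

```python
def find_mismatches(candidate, reference, samples):
    """Return (sample, candidate_output, reference_output) for each disagreement."""
    mismatches = []
    for sample in samples:
        got, expected = candidate(sample), reference(sample)
        if got != expected:
            mismatches.append((sample, got, expected))
    return mismatches


if __name__ == "__main__":
    try:
        from fast_unidecode import unidecode as fast
        from unidecode import unidecode as reference
    except ImportError:
        print("Install both fast_unidecode and unidecode to run the comparison.")
    else:
        samples = ["Æneid", "étude", "北亰", "plain ASCII"]
        for sample, got, expected in find_mismatches(fast, reference, samples):
            print(f"{sample!r}: fast_unidecode={got!r} unidecode={expected!r}")
```

Using repr in the report makes trailing spaces and empty strings visible, which plain print would hide.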

rust-unidecode (Original README.md)

Documentation

The rust-unidecode library is a Rust port of Sean M. Burke's famous Text::Unidecode module for Perl. It transliterates Unicode strings such as "Æneid" into pure ASCII ones such as "AEneid". For a detailed explanation of the rationale behind using such a library, you can refer to both the documentation of the original module and this article written by Burke in 2001.

The data set used to translate the Unicode was ported directly from the Text::Unidecode module using a Perl script, so rust-unidecode should produce identical output.

Examples

extern crate unidecode;
use unidecode::unidecode;

assert_eq!(unidecode("Æneid"), "AEneid");
assert_eq!(unidecode("étude"), "etude");
assert_eq!(unidecode("北亰"), "Bei Jing");
assert_eq!(unidecode("ᔕᓇᓇ"), "shanana");
assert_eq!(unidecode("げんまい茶"), "genmaiCha ");

Guarantees and Warnings

Here are some guarantees you have when calling unidecode():

  • The String returned will be valid ASCII; the decimal representation of every char in the string will be between 0 and 127, inclusive.
  • Every ASCII character (0x0000 - 0x007F) is mapped to itself.
  • All Unicode characters will translate to a string containing newlines ("\n") or ASCII characters in the range 0x0020 - 0x007E. So for example, no Unicode character will translate to \u{01}. The exception is if the ASCII character itself is passed in, in which case it will be mapped to itself. (So '\u{01}' will be mapped to "\u{01}".)
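
The guarantees above are stated for the Rust unidecode() function. As a sketch of how they could be spot-checked against any transliteration callable (for example the Python unidecode function from this package, assuming it behaves the same way), the hypothetical helper below returns a list of violations rather than raising.

```python
def check_guarantees(transliterate, samples):
    """Return human-readable violations of the three guarantees (empty if none)."""
    violations = []
    for sample in samples:
        result = transliterate(sample)
        # Guarantee 1: the output is valid ASCII (code points 0..127).
        if not result.isascii():
            violations.append(f"non-ASCII output for {sample!r}: {result!r}")
        # Guarantee 2: ASCII input is mapped to itself.
        if sample.isascii() and result != sample:
            violations.append(f"ASCII input changed: {sample!r} -> {result!r}")
        # Guarantee 3: each non-ASCII character yields only "\n" or 0x20..0x7E.
        for ch in sample:
            if ord(ch) > 0x7F:
                piece = transliterate(ch)
                if any(c != "\n" and not 0x20 <= ord(c) <= 0x7E for c in piece):
                    violations.append(f"unexpected output for {ch!r}: {piece!r}")
    return violations


if __name__ == "__main__":
    # Stand-in that simply drops non-ASCII characters; swap in the real function.
    def drop_non_ascii(text):
        return text.encode("ascii", "ignore").decode("ascii")

    print(check_guarantees(drop_non_ascii, ["abc", "\x01", "étude"]))
```

A check like this only covers the samples it is given; it does not prove the guarantees for all of Unicode.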

There are, however, some things you should keep in mind:

  • As stated, some transliterations do produce \n characters.
  • Some Unicode characters transliterate to an empty string, either on purpose or because rust-unidecode does not know about the character.
  • Some Unicode characters are unknown and transliterate to "[?]".
  • Many Unicode characters transliterate to multi-character strings. For example, 北 is transliterated as "Bei ".

This information was paraphrased from the original Text::Unidecode documentation.
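
The warnings above mean that raw transliterations can carry trailing spaces, newlines, and "[?]" placeholders. If that matters for display or for building identifiers, a small post-processing step can tidy the result. The helper below is a hypothetical sketch, not part of either library, and works on any already-transliterated string.

```python
import re

UNKNOWN_MARKER = "[?]"  # placeholder emitted for unknown characters


def tidy(raw: str, unknown_replacement: str = "") -> str:
    """Replace unknown-character markers and normalise whitespace."""
    # Caveat: a literal "[?]" that was present in the input is replaced too.
    text = raw.replace(UNKNOWN_MARKER, unknown_replacement)
    # Fold newlines and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Drop the leading or trailing space left by multi-character transliterations.
    return text.strip()


if __name__ == "__main__":
    print(repr(tidy("Bei Jing ")))
    print(repr(tidy("a[?]b\nc")))
```

Whether to drop unknown markers or replace them with something visible depends on the application, which is why the replacement is a parameter.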
