Unicode to ASCII transliteration

Any-Ascii

Description
Examples
Implementations
- CLI
- Go
- Java
- Node.js
- Python
- Ruby
- Rust
See Also

Description

Converts Unicode text to a reasonable representation using only ASCII.

Unicode is the universal character set, a global standard to support all the world's languages. It consists of 130,000+ characters used by 150 writing systems. Along with characters used in language, it also contains various technical symbols, emojis, and other symbolic characters. The String type in programming languages usually corresponds to Unicode text. Whenever text is used digitally on computers or the internet it is almost always represented using Unicode characters. Unicode characters are not stored directly but instead encoded into bytes using an encoding, typically UTF-8.
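The difference between characters and their encoded bytes can be seen in a short sketch. It uses only Python built-ins; the sample string is the first name from the French row of the Examples table, written with an escape so that the accented letter is the single precomposed code point U+00E9.

```python
# "René" with the accented letter as one precomposed code point (U+00E9).
text = "Ren\u00e9"

# Each character is one Unicode code point.
code_points = [ord(ch) for ch in text]

# UTF-8 stores ASCII characters in one byte each and U+00E9 in two bytes.
encoded = text.encode("utf-8")

print(code_points)   # [82, 101, 110, 233]
print(len(text))     # 4 characters
print(len(encoded))  # 5 bytes
```

Decoding the bytes with `encoded.decode("utf-8")` gives back the original string, so the encoding step loses nothing; only transliteration to ASCII does.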

ASCII is the most compatible character set, established in 1967. It is a subset of Unicode and UTF-8 consisting of 128 characters using 7 bits in the range 0x00 - 0x7F. The printable characters are English letters, digits, and punctuation in the range 0x20 - 0x7E, with the remaining characters being control characters. All of the characters found on a standard US keyboard correspond to the printable ASCII characters.
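The ranges above translate directly into simple checks. The following is a minimal sketch; the helper names `is_ascii` and `is_printable_ascii` are illustrative and not part of Any-Ascii.

```python
def is_ascii(text):
    """True if every character is in the 7-bit range 0x00 - 0x7F."""
    return all(ord(ch) <= 0x7F for ch in text)


def is_printable_ascii(text):
    """True if every character is in the printable range 0x20 - 0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


# Number of printable characters among the 128 ASCII code points.
printable_count = sum(1 for cp in range(0x80) if 0x20 <= cp <= 0x7E)

print(is_ascii("Rene"))            # True
print(is_ascii("Ren\u00e9"))       # False: U+00E9 is outside ASCII
print(is_printable_ascii("a\nb"))  # False: newline is a control character
print(printable_count)             # 95
```

Because ASCII is a subset of UTF-8, `"Rene".encode("ascii")` and `"Rene".encode("utf-8")` produce identical bytes.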

Conversion into the Latin script used by English and ASCII is called romanization.

When converting between writing systems there are multiple properties that can be preserved:

- Meaning: Translation replaces text with an equivalent in the target language with the same meaning. This relies heavily on context, and automatic translation is extremely complicated.
- Appearance: Preserving the visual appearance of a character when converting between languages is rarely possible and requires readers to have knowledge of the source language.
- Sound: Orthographic transcription uses the spelling and pronunciation rules of the target language to produce text that a speaker of the target language will pronounce as closely as possible to the original.
- Spelling: Transliteration converts each letter individually using predictable rules. An unambiguous transliteration allows for reconstruction of the original text by using unique mappings for each letter. A phonetic transliteration instead uses the most phonetically accurate mappings, which may result in duplicates or ambiguity.
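The difference between an unambiguous and a phonetic transliteration can be illustrated with two toy letter mappings. This is a sketch only: the `phonetic` mapping mirrors the Greek rows of the Examples table, where both η and ι come out as "i", while the `unambiguous` mapping (η to "h") is a hypothetical choice made up for this illustration. It only handles single-character outputs; multi-character outputs would need additional care to stay reversible.

```python
def is_reversible(mapping):
    """True if no two source letters share the same output."""
    outputs = list(mapping.values())
    return len(outputs) == len(set(outputs))


def invert(mapping):
    """Build the reverse mapping, refusing ambiguous input."""
    if not is_reversible(mapping):
        raise ValueError("mapping is ambiguous and cannot be inverted")
    return {out: src for src, out in mapping.items()}


# Phonetic: both letters sound like "i" in Modern Greek, so they collide.
phonetic = {"η": "i", "ι": "i"}

# Hypothetical unambiguous alternative: each letter gets a unique output.
unambiguous = {"η": "h", "ι": "i"}

print(is_reversible(phonetic))     # False
print(is_reversible(unambiguous))  # True
print(invert(unambiguous))         # {'h': 'η', 'i': 'ι'}
```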

Any-Ascii performs transliteration: it converts text character by character without considering the context. Characters used in language are converted using the most popular existing transliteration scheme for each language, with small modifications. Symbolic characters are instead converted based on their meaning or appearance.
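A context-free, character-by-character conversion amounts to a table lookup per character. The sketch below is not the library's implementation: `TOY_TABLE` and `toy_transliterate` are hypothetical names, and the four entries are inferred from the Han rows of the Examples table on the assumption that each character maps independently.

```python
# Tiny illustrative table; the real library covers far more characters.
TOY_TABLE = {
    "深": "Shen",
    "圳": "Zhen",
    "水": "Shui",
    "埗": "Bu",
}


def toy_transliterate(text):
    """Replace each character on its own; unknown characters pass through."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in text)


print(toy_transliterate("深圳"))    # ShenZhen
print(toy_transliterate("深水埗"))  # ShenShuiBu
```

Since 深 always yields the same output regardless of its neighbors or the language of the text, this also shows why the Cantonese place name in the table receives a Mandarin-based reading.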

Examples

Language	Script	Input	Output	Actual
French	Latin	René François Lacôte	Rene Francois Lacote	Rene Francois Lacote
German	Latin	Großer Hörselberg	Grosser Horselberg	Grosser Hoerselberg
Vietnamese	Latin	Trần Hưng Đạo	Tran Hung Dao	Tran Hung Dao
Norwegian	Latin	Nærøy	Naeroy	Naroy
Ancient Greek	Greek	Φειδιππίδης	Feidippidis	Pheidippides
Modern Greek	Greek	Δημήτρης Φωτόπουλος	Dimitris Fotopoylos	Dimitris Fotopoulos
Russian	Cyrillic	Борис Николаевич Ельцин	Boris Nikolaevich El'tsin	Boris Nikolayevich Yeltsin
Hebrew	Hebrew	אברהם הלוי פרנקל	'vrhm hlvy frnkl	Abraham Halevi Fraenkel
Mandarin Chinese	Han	深圳	ShenZhen	Shenzhen
Cantonese Chinese	Han	深水埗	ShenShuiBu	Sham Shui Po
Korean	Hangul	화성시	hwaseongsi	Hwaseong-si
Korean	Han	華城市	HuaChengShi	Hwaseong-si
Japanese	Hiragana	さいたま	saitama	Saitama
Japanese	Han	埼玉県	QiYuXian	Saitama-ken
Japanese	Katakana	トヨタ	toyota	Toyota
Unified English Braille	Braille	⠠⠎⠁⠽⠀⠭⠀⠁⠛	^say x ag	Say it again

Implementations

CLI

$ anyascii άνθρωποι
anthropoi

To build a native executable at rust/target/release/anyascii, run: cd rust && cargo build --release

Go

package main

import (
    "fmt"

    "github.com/hunterwb/any-ascii"
)

func main() {
    s := anyascii.Transliterate("άνθρωποι")
    // Print the result so that s is used; Go rejects unused variables.
    fmt.Println(s) // anthropoi
}

Java

String s = AnyAscii.transliterate("άνθρωποι");
// anthropoi

Java 6+ compatible

Available through JitPack

Maven

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependency>
    <groupId>com.hunterwb</groupId>
    <artifactId>any-ascii</artifactId>
    <version>0.1.1</version>
</dependency>

Gradle

repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.hunterwb:any-ascii:0.1.1'
}

Node.js

const anyAscii = require('any-ascii');

const s = anyAscii('άνθρωποι');
// anthropoi

Node.js 4+ compatible

Install latest release: npm install any-ascii

Install pre-release: npm install hunterwb/any-ascii

Python

from anyascii import anyascii

s = anyascii('άνθρωποι')
# anthropoi

Python 3.3+ compatible

Install latest release: pip install anyascii

Install pre-release: pip install https://github.com/hunterwb/any-ascii/archive/master.zip#subdirectory=python
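One common use of the Python function is building ASCII-only identifiers such as URL slugs. The following is a sketch under stated assumptions: `make_slug` is a hypothetical helper, not part of the package, and it accepts the transliteration function as a parameter so it can be tried with or without anyascii installed.

```python
import re


def make_slug(text, transliterate):
    """Transliterate text, then join runs of letters and digits with hyphens."""
    ascii_text = transliterate(text)
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")
    return slug.lower()


# Works with any callable that maps str -> str; identity is used here.
print(make_slug("Hello, World!", lambda s: s))  # hello-world
```

With the package installed, passing the library function as in `make_slug('René François Lacôte', anyascii)` should give `rene-francois-lacote`, based on the output listed for that name in the Examples table.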

Ruby

require 'any_ascii'

s = AnyAscii.transliterate('άνθρωποι')
# anthropoi

Use pre-release:

# Gemfile
gem 'any_ascii', git: 'https://github.com/hunterwb/any-ascii', glob: 'ruby/any_ascii.gemspec'

Rust

use any_ascii::any_ascii;

let s = any_ascii("άνθρωποι");
// anthropoi

Use pre-release:

[dependencies]
any_ascii = { git = "https://github.com/hunterwb/any-ascii" }



