Unicode to ASCII transliteration
Any Ascii
Glossary
- Unicode: The universal character set, a global standard to support all the world's languages. Consists of 100,000+ characters used by 150 writing systems. Typically encoded into bytes using UTF-8.
- ASCII: The most compatible character set. A subset of Unicode/UTF-8 consisting of 128 characters using 7 bits in the range 0x00-0x7F. The visible characters are English letters, digits, and punctuation in the range 0x20-0x7E, with the rest being control characters.
- Transliteration: A mapping from one writing system into another, typically done one character at a time using predictable rules. Transliteration into the Latin script used by English is known as romanization.
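The glossary terms can be illustrated with a short Python sketch. The `TABLE` mapping and the `naive_transliterate` helper are hypothetical and cover only the eight Greek letters needed for the demo; they are not Any Ascii's actual data or algorithm. Code point escapes are used so the example does not depend on how the source file is normalized.

```python
def is_ascii(text):
    """Return True if every character lies in the ASCII range 0x00-0x7F."""
    return all(ord(ch) <= 0x7F for ch in text)


def is_visible_ascii(ch):
    """Return True for the visible ASCII characters in 0x20-0x7E."""
    return 0x20 <= ord(ch) <= 0x7E


# Hypothetical character-by-character table (illustration only).
TABLE = {
    "\u03ac": "a",   # alpha with tonos
    "\u03bd": "n",   # nu
    "\u03b8": "th",  # theta
    "\u03c1": "r",   # rho
    "\u03c9": "o",   # omega
    "\u03c0": "p",   # pi
    "\u03bf": "o",   # omicron
    "\u03b9": "i",   # iota
}


def naive_transliterate(text):
    """Pass ASCII through unchanged; map other characters via TABLE.

    Characters missing from TABLE are dropped in this sketch.
    """
    out = []
    for ch in text:
        if ord(ch) <= 0x7F:
            out.append(ch)
        else:
            out.append(TABLE.get(ch, ""))
    return "".join(out)


# The Greek word used in the usage examples, spelled with escapes.
word = "\u03ac\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03b9"

print(len(word), len(word.encode("utf-8")))  # 8 characters, 16 UTF-8 bytes
print(is_ascii(word))                        # False
print(naive_transliterate(word))             # anthropoi
```

Each of these Greek letters takes two bytes in UTF-8, which is why the byte count is double the character count, while the transliterated result fits entirely in the 7-bit ASCII range.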
Java
String s = AnyAscii.transliterate("άνθρωποι");
// anthropoi
Java 6+ compatible
Available through JitPack
Maven
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.hunterwb</groupId>
    <artifactId>any-ascii</artifactId>
    <version>${version}</version>
</dependency>
Gradle
repositories {
    maven { url 'https://jitpack.io' }
}
dependencies {
    implementation "com.hunterwb:any-ascii:${version}"
}
Python
from anyascii import anyascii
s = anyascii('άνθρωποι')
# anthropoi
Python 3.3+ compatible
Install from GitHub
pip install https://github.com/hunterwb/any-ascii/archive/master.zip#subdirectory=python
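As a sketch of how the transliterated output might be used downstream, the hypothetical `make_slug` helper below builds a URL-friendly slug from any transliteration function. It takes the function as a parameter so the snippet runs without the package installed; the identity lambda in the demo is only a stand-in.

```python
import re


def make_slug(text, transliterate):
    """Build a lowercase, hyphen-separated ASCII slug (hypothetical helper)."""
    ascii_text = transliterate(text)
    # Guard: encode() raises UnicodeEncodeError if non-ASCII slipped through.
    ascii_text.encode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then trim hyphens left at either end.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


if __name__ == "__main__":
    # Stand-in transliterator (an assumption for this demo): identity on ASCII.
    print(make_slug("Hello, World!", lambda s: s))  # hello-world
```

With the package installed, passing `anyascii` as the second argument, as in `make_slug('άνθρωποι', anyascii)`, should produce `anthropoi`, given the output shown in the Python example above.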
Node.js
const anyAscii = require('any-ascii');
const s = anyAscii('άνθρωποι');
// anthropoi
Node.js 4+ compatible
See Also
ALA-LC Romanization
BGN/PCGN Romanization
Compart: Unicode Charts
ICAO 9303: Machine Readable Passports
ISO 9: Cyrillic Romanization
Sean M. Burke: Unidecode
Sean M. Burke: Unidecode, Perl Journal
UNGEGN Romanization
Unicode CLDR: Transliteration Guidelines
Unicode Unihan Database