Skip to main content

No project description provided

Project description

WorldAlphabets

A tool to access alphabets of the world with Python and Node interfaces.

Usage

Python

To load the data in Python:

from worldalphabets import get_available_codes, load_alphabet

codes = get_available_codes()
print("Loaded", len(codes), "alphabets")

alphabet = load_alphabet("en")
print(alphabet.uppercase[:5])  # ['A', 'B', 'C', 'D', 'E']
print(alphabet.frequency['e'])

Node.js

From npm

Install the package from npm:

npm install worldalphabets

Then, you can use the functions in your project:

const {
  getUppercase,
  getLowercase,
  getFrequency,
  getAvailableCodes,
} = require('worldalphabets');

async function main() {
  const codes = await getAvailableCodes();
  console.log('Available codes (first 5):', codes.slice(0, 5));

  const uppercaseEn = await getUppercase('en');
  console.log('English uppercase:', uppercaseEn);

  const lowercaseFr = await getLowercase('fr');
  console.log('French lowercase:', lowercaseFr);

  const frequencyDe = await getFrequency('de');
  console.log('German frequency for "a":', frequencyDe['a']);
}

main();

TypeScript projects receive typings automatically via index.d.ts.

Local Usage

If you have cloned the repository, you can use the module directly:

const { getUppercase } = require('./index');

async function main() {
    const uppercaseEn = await getUppercase('en');
    console.log('English uppercase:', uppercaseEn);
}

main();

Examples

The examples/ directory contains small scripts demonstrating the library:

  • examples/python/ holds Python snippets for printing alphabets, collecting stats, and more.
  • examples/node/ includes similar examples for Node.js.

Alphabet Index

This library also provides an index of all available alphabets with additional metadata.

Python

from worldalphabets import get_index_data, get_language

# Get the entire index
index = get_index_data()
print(f"Index contains {len(index)} languages.")

# Get information for a specific language
lang_info = get_language("he")
print(f"Language: {lang_info['language-name']}")
print(f"Script Type: {lang_info['script-type']}")
print(f"Direction: {lang_info['direction']}")

Node.js

const { getIndexData, getLanguage } = require('worldalphabets');

async function main() {
  // Get the entire index
  const index = await getIndexData();
  console.log(`Index contains ${index.length} languages.`);

  // Get information for a specific language
  const langInfo = await getLanguage('he');
  console.log(`Language: ${langInfo['language-name']}`);
  console.log(`Script Type: ${langInfo['script-type']}`);
  console.log(`Direction: ${langInfo['direction']}`);
}

main();

Keyboard Layouts

Key entries expose pos (a KeyboardEvent.code when available) along with row, col, and size information.

Python

A helper in examples/keyboard_md_table.py renders a layout as a Markdown table:

from examples.keyboard_md_table import layout_to_markdown

print(layout_to_markdown("en-united-kingdom"))

Output:

` 1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ]
a s d f g h j k l ; ' #
z x c v b n m , . /

Node.js

const {
  getAvailableLayouts,
  loadKeyboard,
  getUnicode,
} = require('worldalphabets');

async function main() {
  const layouts = await getAvailableLayouts();
  console.log('Available layouts (first 5):', layouts.slice(0, 5));

  const kb = await loadKeyboard('en-us');
  console.log('First key Unicode:', getUnicode(kb.keys[1], 'base'));
  console.log('First key position:', kb.keys[1].pos, kb.keys[1].row, kb.keys[1].col);
}

main();

Exanmple Usages

Examples include printing alphabets, collecting stats, and more.

Supported Languages

For a detailed list of supported languages and their metadata, including available keyboard layouts, see the Alphabet Table.

Developer Guide

This project uses the kalenchukov/Alphabet Java repository as the source for alphabet data. A helper script clones the repository, scans all *Alphabet.java files, downloads a sample Wikipedia article for supported languages, and writes JSON files containing the alphabet and estimated letter frequencies. A second utility can replace those estimates with corpus frequencies from the Simia unigrams dataset.

Each JSON file includes:

  • alphabetical – letters of the alphabet (uppercase when the script has case)
  • uppercase – uppercase letters
  • lowercase – lowercase letters
  • frequency – relative frequency of each lowercase letter (zero when no sample text is available)

Example JSON snippet:

{
  "alphabetical": ["A", "B", ],
  "uppercase": ["A", "B", ],
  "lowercase": ["a", "b",],
  "frequency": {"a": 0.084, "b": 0.0208, ]
}

Setup

This project uses uv for dependency management. To set up the development environment:

# Install uv
pipx install uv

# Create and activate a virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -e '.[dev]'

Data Generation

Extract alphabets

uv run scripts/extract_alphabets.py

The script clones the Java project and stores JSON files for every available alphabet under data/alphabets/, named by ISO language code. If no sample text is available, frequency values default to zero and the language is recorded in data/todo_languages.csv for follow-up.

Update letter frequencies

uv run scripts/update_frequencies.py

This script downloads the unigrams.zip archive and rewrites each alphabet's frequency mapping using the published counts.

Generate alphabets from locale data

Derive an alphabet from an ICU locale's exemplar character set:

uv run scripts/generate_alphabet_from_locale.py <code> --locale <locale>

The script writes data/alphabets/<code>.json, using the locale's standard exemplar set for the base letters and populating frequency values from the Simia unigrams dataset when available. Locales without exemplar data are skipped.

Generate alphabets from unigrams

For languages present in the Simia dataset but missing here:

uv run scripts/generate_alphabet_from_unigrams.py <code> --locale <locale> \
  --block <Unicode block>

The script writes data/alphabets/<code>.json. To list missing codes:

uv run scripts/missing_unigram_languages.py

Generate missing alphabets

Create alphabet files for every language in the Simia unigrams dataset that does not yet have one:

uv run scripts/generate_missing_alphabets.py --limit 10

Omit --limit to process all missing languages. Each file is written under data/alphabets/ and combines ICU exemplar characters with Simia frequencies.

Linting and type checking

ruff check .
mypy .

Sources

Future work

  • Add sample text or unigram support for more languages.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

worldalphabets-0.0.11.tar.gz (679.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

worldalphabets-0.0.11-py3-none-any.whl (18.1 kB view details)

Uploaded Python 3

File details

Details for the file worldalphabets-0.0.11.tar.gz.

File metadata

  • Download URL: worldalphabets-0.0.11.tar.gz
  • Upload date:
  • Size: 679.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for worldalphabets-0.0.11.tar.gz
Algorithm Hash digest
SHA256 760ca33a525e130b1f7d4b674e4058890e33df2e3c3fc50c11bfaf19eaae3397
MD5 cdbc718ff2de97ec0f056611c2070a73
BLAKE2b-256 90b5d69a5a118a654fcbaa6f517e1c2cc95b5616dd6dcf852c95353c571722b4

See more details on using hashes here.

Provenance

The following attestation bundles were made for worldalphabets-0.0.11.tar.gz:

Publisher: publish.yml on willwade/WorldAlphabets

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file worldalphabets-0.0.11-py3-none-any.whl.

File metadata

File hashes

Hashes for worldalphabets-0.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 7c12e4c6bce41dd0bf4120d684858684ee9ac5303a74d9a88c7ca5a079447835
MD5 d105389360c5e3d2521a734071cc69d0
BLAKE2b-256 38e34eaaacefb4559334460a2fdfba1c167395304efa0454fd6de211e660bc06

See more details on using hashes here.

Provenance

The following attestation bundles were made for worldalphabets-0.0.11-py3-none-any.whl:

Publisher: publish.yml on willwade/WorldAlphabets

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page