Detect language support for font binaries

Hyperglot – a database and tools for detecting language support in fonts

Hyperglot is an open research project dedicated to documenting how the world’s languages are written. By mapping orthographies and their requirements, it supports inclusive, multilingual type design and equitable access to high-quality typography for underserved communities. Hyperglot currently covers 783 languages, representing approximately 7.3 billion speakers. It is developed as open source by Rosetta Type/Research in collaboration with a global community of contributors and is licensed under the Apache 2.0 license.

Hyperglot is available as:

the Hyperglot web apps,
the command-line tool: hyperglot,
the Python package: import hyperglot (see examples for basic usage).

📖 Learn more about Hyperglot
🙋 Read the FAQ

💰 Sponsor via GitHub or directly via Hyperglot sponsorship. Any and all contributions are much appreciated! 🙏

Data validity & contributing

Hyperglot is a work in progress and provided AS IS. The validity of language data varies and continues to improve. Each language includes a validity label (todo, draft, preliminary, verified) to help you assess the data.

Mapping all the world’s languages is a huge task—we need help from native speakers and language users! If you notice an error or see that a language is missing, please get in touch (via email or Issues). We welcome contributions and will credit your input.

The data structure is documented in a separate README file along with guidelines for contributing.

Core concepts

The following concepts are essential to understanding how Hyperglot works.

A language can be written in one or more scripts. Each such writing system is represented in Hyperglot as an orthography. Most languages have a single primary orthography; however, some use multiple orthographies either independently (for example, in different regions) or concurrently (such as Serbian or Japanese).

In the database, an orthography contains the following character sets:

base – the required, essential characters,
aux – non-essential, recommended characters,
marks – combining marks,
punctuation,
numerals, and
currency.
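
As a rough illustration of the character sets listed above, an orthography could be modelled in memory as follows. This is a minimal sketch: the class name, the field layout, the `required` helper, and the sample data are all assumptions made for illustration and do not reproduce Hyperglot's actual database schema or Python API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Orthography:
    """Hypothetical model of one orthography (not Hyperglot's real schema)."""

    script: str
    status: str = "primary"
    base: set[str] = field(default_factory=set)         # required characters
    aux: set[str] = field(default_factory=set)          # recommended characters
    marks: set[str] = field(default_factory=set)        # combining marks
    punctuation: set[str] = field(default_factory=set)
    numerals: set[str] = field(default_factory=set)
    currency: set[str] = field(default_factory=set)

    def required(self, check=("base",)):
        """Return the union of the character sets selected for checking."""
        chars = set()
        for name in check:
            chars |= getattr(self, name)
        return chars


# Illustrative data only, not taken from the Hyperglot database.
example = Orthography(
    script="Latin",
    base=set("abc"),
    aux={"\u00e9"},  # e with acute
    numerals=set("0123456789"),
)
```

Calling `example.required(("base", "aux"))` returns the combined set of characters a font would need to cover when both sets are selected.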

A script, however, is more than a collection of characters. It also defines how characters interact when combined. This behavior is known as shaping and, in digital fonts, is implemented using OpenType features.

Read the detailed description of the database structure

Language support detection process

To detect language support in a font, Hyperglot performs the following checks:

Required characters are present. Which characters are considered required is specified by filtering based on language/orthography status, data validity, and by selecting which character sets to check against.
Precomposed character combinations are handled by the font. For character combinations that have a unique code point in Unicode, one of the following is required (depending on the setting):
1. The encoded, precomposed character combinations are present.
2. Base characters and mark characters from these combinations are present independently.
3. Both of the above.
Shaping behavior is correctly handled by the font, where applicable:
1. Required mark-positioning instructions are present.
2. Required alternates for joining behavior (for example, in Arabic) are present.
3. Conjunct syllable construction in Brahmi-derived scripts is supported. (Currently supported only for Hindi/Devanagari.)
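
The first two checks above (required characters and precomposed combinations) can be sketched with plain set operations and Unicode canonical decomposition. This is a simplified illustration, not Hyperglot's implementation: the function names and the three `mode` values are hypothetical, the font is reduced to a set of encoded characters, and the shaping checks are left out entirely.

```python
import unicodedata


def decompose(char):
    """Canonical decomposition (NFD) of a character into its code points."""
    return list(unicodedata.normalize("NFD", char))


def missing_characters(required, font_chars, mode="both"):
    """Return the required characters that the font cannot provide.

    mode uses hypothetical names for the three settings described above:
      "precomposed" - the encoded character itself must be present
      "decomposed"  - its base and mark components must be present
      "both"        - both conditions must hold
    """
    if mode not in ("precomposed", "decomposed", "both"):
        raise ValueError(f"unknown mode: {mode}")
    missing = set()
    for char in required:
        parts = decompose(char)
        has_precomposed = char in font_chars
        has_parts = all(part in font_chars for part in parts)
        if len(parts) == 1:
            ok = has_precomposed  # nothing to decompose
        elif mode == "precomposed":
            ok = has_precomposed
        elif mode == "decomposed":
            ok = has_parts
        else:
            ok = has_precomposed and has_parts
        if not ok:
            missing.add(char)
    return missing
```

For a font that encodes `e` and the combining acute accent (U+0301) but not the precomposed `é` (U+00E9), the sketch reports `é` as missing in the "precomposed" and "both" modes and as covered in the "decomposed" mode.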

Additional design-related notes are provided for the user’s discretion when assessing design quality. Hyperglot does not assess the font design in any way.

Command-line tools

Installation

You will need to have Python 3 installed. Install via pip:

pip install hyperglot

Besides the main hyperglot command used for font inspection, the package also includes:

hyperglot-report – explore missing language support (see below).
hyperglot-data – review language data stored in the database.
hyperglot-validate, hyperglot-save, and hyperglot-export – manage and process data when contributing.

Basic usage

Use:

hyperglot path/to/font.otf

to output a list of supported languages (and other data) for a font. Use:

hyperglot path/to/font.otf path/to/anotherfont.otf …

to check several fonts at once, or their combined coverage (with -m union).
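
When the command is driven from a script, the invocation can be wrapped in a small helper. The sketch below assumes the `hyperglot` executable is on the PATH after installation; the helper names `hyperglot_command` and `run_hyperglot` are hypothetical, and the only flags used are `--check` and `--validity` as documented under Advanced options.

```python
import subprocess


def hyperglot_command(fonts, check=None, validity=None):
    """Build the argument list for a `hyperglot` run over one or more fonts."""
    if not fonts:
        raise ValueError("at least one font path is required")
    cmd = ["hyperglot"]
    if check:
        # Comma-separated combination of character sets, e.g. "base,auxiliary".
        cmd += ["--check", ",".join(check)]
    if validity:
        cmd += ["--validity", validity]
    cmd += [str(font) for font in fonts]
    return cmd


def run_hyperglot(fonts, **options):
    """Run the CLI and return its standard output as text."""
    result = subprocess.run(
        hyperglot_command(fonts, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Keeping the argument construction separate from the subprocess call makes the command easy to inspect or log before any font is actually processed.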

Advanced options

-c, --check: Specify which character sets to check against. Options are 'base, auxiliary, punctuation, numerals, currency, all', or a comma-separated combination of these. (Default: 'base')
--validity: Filter languages by data validity level. Options are 'todo, draft, preliminary, verified'. (Default: 'preliminary')
-s, --status: Specify which languages to consider when checking support. Options are 'living, historical, constructed, all', or a comma-separated combination of these. (Default: 'living,constructed')
-o, --orthography: Which orthographies to consider when checking support for a language. Options are 'primary, secondary, historical, transliteration, all', or a comma-separated combination of these. (Default: 'primary')
-d, --decomposed: For precomposed character combinations, require only the individual component characters. By default, precomposed character combinations are also required when they have a unique code point in Unicode. (Default: False)
-m, --marks: Require that a font include all combining marks used by a language’s orthography. By default, only marks that are not part of precomposed character combinations are required. (Default: False)
--sort: Specify the sort order. Use "speakers" to sort by number of speakers. (Default: "alphabetic")
--sort-dir: Specify the sort direction. Use "desc" for descending order. (Default: "asc" for ascending order)
-y, --output: Specify a file path to write the output to, in YAML format. For a single input font, the output is a subset of the Hyperglot database containing the languages and orthographies supported by the font. When multiple fonts are provided, the YAML file contains a top-level key for each font. If the -m option is provided, the output includes the specific intersection or union result.
-t, --shaping-threshold: Set the frequency threshold for complex-script shaping checks. A font passes when it renders correctly for combinations at or above this threshold. Frequencies range from 1.0 (most frequent combinations) to 0.0 (rarest combinations). (Default: 0.01)
--no-shaping: Disable shaping checks (mark attachment, joining behavior, and conjunct shaping). (Default: shaping checks enabled)
-v, --verbose: Enable verbose logging.
-V, --version: Print the Hyperglot version number.
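
The effect of the shaping threshold can be pictured with a short sketch. It follows only the description of --shaping-threshold above: combinations whose frequency falls below the threshold are ignored, and every remaining combination must render correctly. The function name, the frequency mapping, and the `renders_correctly` callback are hypothetical stand-ins, not part of Hyperglot.

```python
def passes_shaping(combinations, renders_correctly, threshold=0.01):
    """Decide whether a font passes the shaping check at a given threshold.

    combinations: mapping of character combination -> frequency in [0.0, 1.0]
    renders_correctly: callable returning True if the font shapes a combination
    """
    return all(
        renders_correctly(combo)
        for combo, frequency in combinations.items()
        if frequency >= threshold  # rarer combinations are not checked
    )
```

Lowering the threshold towards 0.0 makes the check stricter, because rarer combinations are then included.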

Explore missing language support

The hyperglot-report command reports missing characters and shaping support. A common use case is identifying languages that could be supported with minimal additional work in a given font. The command accepts the same options as hyperglot, plus the following:

--report-missing: Report languages missing n or fewer characters. If n is 0, all languages with any number of missing characters are reported. (Default: 0)
--report-marks: Report languages missing n or fewer mark-attachment sequences. If n is 0, all languages with any number of missing mark-attachment sequences are reported. (Default: 0)
--report-joining: Report languages missing n or fewer joining sequences. If n is 0, all languages with any number of missing joining sequences are reported. (Default: 0)
--report-all: Set or override all other --report-* options.
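
The meaning of n in the --report-* options can be sketched as a simple filter. This is an interpretation of the descriptions above, not code from Hyperglot: the function name and the input mapping are hypothetical, and the sketch assumes that fully supported languages (nothing missing) are never listed.

```python
def languages_to_report(missing_by_language, n=0):
    """Select the languages to list in a report.

    missing_by_language: mapping of language -> number of missing items
    n: report languages missing n or fewer items; 0 means no upper limit
    """
    return sorted(
        language
        for language, count in missing_by_language.items()
        # Assumption: languages with nothing missing are never reported.
        if count > 0 and (n == 0 or count <= n)
    )
```

With n set to a small number, the result narrows to the languages a font is closest to supporting.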

Roadmap

🪶 Change licence to Apache 2
💰 Invite sponsorship and funding #174
🤖 Basic analysis of shaping support provided by the font (GPOS and GSUB): check whether character combinations are affected by font OpenType features, enabling scalable support for complex combinations (e.g., Arabic, Hindi/Devanagari). #176
➡️ Export in a format suitable for submission to Unicode CLDR
🌍 Database web app: add links to other resources per language
📚 Improve language data, sources, and validity for languages with fewer authoritative references #157
🌍 Add data for more African languages and scripts, e.g., N'Ko #195
🇮🇳 Add more shaping checks for Brahmi-derived scripts #176
🇧🇷 Add data for indigenous Brazilian languages (Rafael Dietzch and students)
🇺🇳 Secure funding to expand language coverage

Other

The comparison of Hyperglot and the Unicode CLDR (this might be outdated at the moment).

Notes

Fonts included in the repository for testing purposes are licensed under their respective licenses.
Data included in the other directory is replicated from various public domain and open source origins for comparison and aggregation (mostly present in historic commits of this repository).


