ISO 639 language codes, names, and other associated information

python-iso639

python-iso639 is a Python package for ISO 639 language codes, names, and other associated information.

Current features:

  • 🌐 A representation of languages mapped across ISO 639-1, 639-2, and 639-3.
  • 🔎 Functionality to "guess" what a language is for a given unknown language code or name.
  • 🚀 Optimized for speed in retrieving language information.

Installation

Using pip:

pip install python-iso639

Using uv:

uv add python-iso639

Using conda:

conda install -c conda-forge python-iso639

Usage

python-iso639 revolves around a Language class. Instances of Language expose a language's codes, names, and retirement details as attributes, and the class provides methods for looking languages up.

Note that while the package name registered on PyPI is python-iso639, the actual import name during runtime is iso639 (which means you should do import iso639 in your Python code).
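The two names can be checked independently. Below is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of python-iso639.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The distribution is looked up by its PyPI name, "python-iso639",
# whereas Python code imports the module as "iso639".
print(installed_version("python-iso639"))
```

If this prints None, the distribution is not installed in the current environment, and import iso639 would be expected to fail as well.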

Creating Language Instances

Create a Language instance using one of the class methods.

from_part3, with an ISO 639-3 code

>>> import iso639
>>> lang1 = iso639.Language.from_part3('fra')
>>> type(lang1)
<class 'iso639.language.Language'>
>>> lang1
Language(part3='fra', part2b='fre', part2t='fra', part1='fr', scope='I', type='L', status='A', name='French', comment=None, other_names=None, macrolanguage=None, retire_reason=None, retire_change_to=None, retire_remedy=None, retire_date=None)

Fast object instantiation for retrieving language information (run on Python 3.13, macOS 15.3.1, Apple M1 Pro)

In [1]: import iso639

In [2]: %timeit iso639.Language.from_part3("fra")
217 ns ± 0.139 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

From Another ISO 639 Code Set or a Reference Name

>>> lang2 = iso639.Language.from_part2b('fre')  # ISO 639-2 (bibliographic)
>>> lang3 = iso639.Language.from_part2t('fra')  # ISO 639-2 (terminological)
>>> lang4 = iso639.Language.from_part1('fr')  # ISO 639-1
>>> lang5 = iso639.Language.from_name('French')  # ISO 639-3 reference language name

A LanguageNotFoundError is Raised for Invalid Inputs

>>> iso639.Language.from_part3('Fra')  # The user input is case-sensitive!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LanguageNotFoundError: 'Fra' isn't an ISO language code or name
>>>
>>> iso639.Language.from_name("unknown language")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LanguageNotFoundError: 'unknown language' isn't an ISO language code or name
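Because the from_* class methods are case-sensitive and ISO 639-3 codes are lowercase, it can help to normalize user input before the lookup. The sketch below is one way to do that. The helper normalized_part3 and its lookup parameter are hypothetical; the parameter exists only so the helper can be tried with a stand-in, and it defaults to iso639.Language.from_part3.

```python
from typing import Any, Callable, Optional


def normalized_part3(code: str, lookup: Optional[Callable[[str], Any]] = None) -> Any:
    """Strip whitespace and lowercase a code before an ISO 639-3 lookup."""
    if lookup is None:
        # Imported lazily so the helper can also be exercised with a stand-in lookup.
        import iso639

        lookup = iso639.Language.from_part3
    return lookup(code.strip().lower())
```

With the package installed, normalized_part3(' Fra ') would look up 'fra' instead of raising a LanguageNotFoundError for 'Fra'.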

Accessing Attributes

>>> lang1
Language(part3='fra', part2b='fre', part2t='fra', part1='fr', scope='I', type='L', status='A', name='French', comment=None, other_names=None, macrolanguage=None, retire_reason=None, retire_change_to=None, retire_remedy=None, retire_date=None)
>>> lang1.part3
'fra'
>>> lang1.name
'French'

Comparison

>>> lang1 == lang2 == lang3 == lang4 == lang5  # All are French
True
>>> lang6 = iso639.Language.from_part3('spa')  # Spanish
>>> lang1 == lang6  # French vs. Spanish
False
>>> 'French' == lang1.name == lang2.name == lang3.name == lang4.name == lang5.name
True
>>> lang6.name
'Spanish'

Guess a Language: Classmethod match

You don't know which code set or name your input is from? Use the match classmethod:

>>> lang1 = iso639.Language.match('fra')
>>> lang2 = iso639.Language.match('fre')
>>> lang3 = iso639.Language.match('fr')
>>> lang4 = iso639.Language.match('French')
>>> lang1 == lang2 == lang3 == lang4
True

By default, the classmethod match is case-sensitive. To ignore case instead, pass in strict_case=False:

>>> lang5 = iso639.Language.match('FRA', strict_case=False)
>>> lang6 = iso639.Language.match('french', strict_case=False)
>>> lang4 == lang5 == lang6
True
>>> iso639.Language.match("french")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LanguageNotFoundError: 'french' isn't an ISO language code or name

Note: Depending on your use case, ignoring case can lead to unintended matches, where a language code matches an unrelated language name (or vice versa). For example, "igo" and "Igo" would be conflated, even though igo is the ISO 639-3 code for Isebe, whereas the language named Igo has the ISO 639-3 code ahl.
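One way to limit such conflation is to try the case-sensitive match first and fall back to strict_case=False only when that fails. This is a sketch, not a feature of the package. The helper match_prefer_exact and its match and not_found parameters are hypothetical; they default to iso639.Language.match and iso639.LanguageNotFoundError.

```python
from typing import Any, Callable, Optional, Type


def match_prefer_exact(
    text: str,
    match: Optional[Callable[..., Any]] = None,
    not_found: Optional[Type[Exception]] = None,
) -> Any:
    """Try a case-sensitive match first, then retry while ignoring case."""
    if match is None or not_found is None:
        # Lazy import: the defaults come from python-iso639 itself.
        import iso639

        match = match or iso639.Language.match
        not_found = not_found or iso639.LanguageNotFoundError
    try:
        return match(text)  # case-sensitive by default
    except not_found:
        # May still raise not_found if nothing matches even when ignoring case.
        return match(text, strict_case=False)
```

An input that matches exactly, such as 'Igo' as a language name, is then never reinterpreted as the code igo; only inputs with no exact match take the case-insensitive path.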

The classmethod match is particularly useful for consistently accessing a specific attribute from unknown inputs, e.g., the ISO 639-3 code.

>>> 'fra' == lang1.part3 == lang2.part3 == lang3.part3 == lang4.part3 == lang5.part3 == lang6.part3
True

If there's no match, a LanguageNotFoundError is raised, which you may want to catch:

>>> try:
...     lang = iso639.Language.match('not gonna find a match')
... except iso639.LanguageNotFoundError:
...     print("no match found!")
... 
no match found!
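The same pattern extends to a batch of mixed inputs, for example a column of user-supplied codes and names. The following is a minimal sketch; the helper to_part3_codes and its match and not_found parameters are hypothetical and default to iso639.Language.match and iso639.LanguageNotFoundError.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Type


def to_part3_codes(
    values: Iterable[str],
    match: Optional[Callable[[str], Any]] = None,
    not_found: Optional[Type[Exception]] = None,
) -> Dict[str, Optional[str]]:
    """Map each raw input to its ISO 639-3 code, or to None when nothing matches."""
    if match is None or not_found is None:
        # Lazy import: the defaults come from python-iso639 itself.
        import iso639

        match = match or iso639.Language.match
        not_found = not_found or iso639.LanguageNotFoundError
    codes: Dict[str, Optional[str]] = {}
    for value in values:
        try:
            codes[value] = match(value).part3
        except not_found:
            codes[value] = None  # keep the input, flag it as unmatched
    return codes
```

Unmatched inputs are kept with a None value, so they can be reviewed afterwards rather than silently dropped.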

Macrolanguages and Alternative Names

>>> language = iso639.Language.match('yue')
>>> language.name
'Yue Chinese'  # also commonly known as Cantonese
>>> language.macrolanguage
'zho'  # Chinese
>>> language.other_names
[Name(print='Yue Chinese', inverted='Chinese, Yue')]
>>> for name in language.other_names:
...     print(f'{name.print} | {name.inverted}')
...
Yue Chinese | Chinese, Yue
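The macrolanguage attribute can also be read in the other direction, to list the individual languages that belong to a macrolanguage. A small sketch follows; the helper members_of is hypothetical and works on any iterable of objects that have part3 and macrolanguage attributes.

```python
from typing import Any, Iterable, List


def members_of(macro_code: str, languages: Iterable[Any]) -> List[str]:
    """Return the sorted ISO 639-3 codes whose macrolanguage is macro_code."""
    return sorted(
        lang.part3 for lang in languages if lang.macrolanguage == macro_code
    )
```

With the package installed, members_of('zho', iso639.ALL_LANGUAGES) would be expected to include 'yue', given the example above.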

Retired Language Codes

>>> language = iso639.Language.match('bvs')
>>> language.part3
'bvs'
>>> language.name
'Belgian Sign Language'
>>> language.status
'R'  # (R)etired
>>> language.retire_reason
'S'  # (S)plit
>>> language.retire_change_to is None
True
>>> language.retire_remedy
'Split into Langue des signes de Belgique Francophone [sfb], and Vlaamse Gebarentaal [vgt]'
>>> language.retire_date
datetime.date(2007, 7, 18)
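When cleaning data, a retired code often needs to be swapped for its successor where one exists. The sketch below relies only on the status, part3, and retire_change_to attributes shown above; the helper replacement_code is hypothetical.

```python
from typing import Any, Optional


def replacement_code(language: Any) -> Optional[str]:
    """Return the code to use for a language, following a retirement if possible.

    Active codes are returned unchanged. For retired codes, the single
    replacement code is returned when there is one, and None otherwise
    (e.g., for a split, where retire_remedy has to be read by a person).
    """
    if language.status != "R":
        return language.part3
    return language.retire_change_to
```

For 'bvs' above this returns None, because the code was split into two successors and retire_change_to is not set.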

Into the Weeds

Attributes of a Language Instance

A Language instance has the following attributes:

Attribute | Data type | Can it be None? | Description
part3 | str | No | ISO 639-3 code
part2b | str | Yes | ISO 639-2 code (bibliographic)
part2t | str | Yes | ISO 639-2 code (terminological)
part1 | str | Yes | ISO 639-1 code
scope | str | No | One of {(I)ndividual, (M)acrolanguage, (S)pecial}
type | str | Yes | One of {(A)ncient, (C)onstructed, (E)xtinct, (H)istorical, (L)iving, (S)pecial} [1]
status | str | No | One of {(A)ctive, (R)etired}, describing the ISO 639-3 code
name | str | No | Reference language name in ISO 639-3
comment | str | Yes | Comment from ISO 639-3
other_names | list[Name] | Yes | Other print and inverted names [2]
macrolanguage | str | Yes | Macrolanguage
retire_reason | str | Yes | Retirement reason, one of {(C)hange, (D)uplicate, (N)on-existent, (S)plit, (M)erge}
retire_change_to | str | Yes | ISO 639-3 code to which this language can be changed, if retirement reason is one of {(C)hange, (D)uplicate, (M)erge}
retire_remedy | str | Yes | Instructions for updating this retired language code
retire_date | datetime.date | Yes | The date the retirement became effective

[1] If the ISO 639-3 code is retired, then the type attribute is None, because its value is not clearly discernible from the SIL data source.

[2] A Name instance has the attributes print and inverted, for the print name and inverted name, respectively. If reference name, print name, and inverted name are all the same, then that particular (print name, inverted name) pair is excluded from the other_names attribute. For example, for Spanish (ISO 639-3: spa), one (print name, inverted name) pair is (Spanish, Spanish) from the SIL data source, but this pair is excluded from its list of other_names.
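Since other_names can be None and excludes pairs identical to the reference name, collecting every known name of a language takes a little care. This is a minimal sketch; the helper all_names is hypothetical and relies only on the name, other_names, print, and inverted attributes described above.

```python
from typing import Any, List


def all_names(language: Any) -> List[str]:
    """Return the reference name followed by any distinct print/inverted names."""
    names = [language.name]
    # other_names is None when a language has no additional names.
    for other in language.other_names or []:
        for candidate in (other.print, other.inverted):
            if candidate not in names:
                names.append(candidate)
    return names
```

For the Yue Chinese example shown earlier, this would yield the reference name plus the inverted form 'Chinese, Yue'.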

How Language.match Matches the Language

At a high level, Language.match assumes the input is more likely to be a language code than a language name. Beyond that, the precise matching order is as follows:

  1. ISO 639-3 codes (among the active codes)
  2. ISO 639-2 (bibliographic) codes
  3. ISO 639-2 (terminological) codes
  4. ISO 639-1 codes
  5. ISO 639-3 codes (among the retired codes)
  6. ISO 639-3 reference language names
  7. ISO 639-3 alternative language names (the "print" ones)
  8. ISO 639-3 alternative language names (the "inverted" ones)

As soon as a match is found, Language.match returns a Language instance. If there isn't a match, a LanguageNotFoundError is raised.
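The effect of this ordering can be illustrated with a toy cascade over a few hand-written tables. This is only a sketch of the first-match-wins idea, not the package's actual implementation; the TABLES data and the first_match function are made up for illustration, and the function returns None where Language.match would raise.

```python
from typing import Dict, List, Optional

# Toy tables in the documented priority order (illustrative data only).
TABLES: List[Dict[str, str]] = [
    {"fra": "French", "igo": "Isebe", "ahl": "Igo"},       # ISO 639-3, active
    {"fre": "French"},                                      # ISO 639-2 (bibliographic)
    {"fra": "French"},                                      # ISO 639-2 (terminological)
    {"fr": "French"},                                       # ISO 639-1
    {"bvs": "Belgian Sign Language"},                       # ISO 639-3, retired
    {"French": "French", "Igo": "Igo", "Isebe": "Isebe"},  # reference names
    {"Yue Chinese": "Yue Chinese"},                         # alternative print names
    {"Chinese, Yue": "Yue Chinese"},                        # alternative inverted names
]


def first_match(text: str) -> Optional[str]:
    """Return the language name from the first table containing text, else None."""
    for table in TABLES:
        if text in table:
            return table[text]
    return None


print(first_match("igo"))  # found among the codes first: Isebe
print(first_match("Igo"))  # not a code, so it falls through to the names: Igo
```

Because codes are consulted before names, an input that is both a valid code and a valid name resolves to the code's language.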

Language is a dataclass

The Language class is a dataclass. All functionality of dataclasses applies to Language and its instances, e.g., dataclasses.asdict:

>>> import dataclasses, iso639
>>> language = iso639.Language.match('fra')
>>> dataclasses.asdict(language)
{'part3': 'fra', 'part2b': 'fre', 'part2t': 'fra', 'part1': 'fr', 'scope': 'I', 'type': 'L', 'status': 'A', 'name': 'French', 'comment': None, 'other_names': None, 'macrolanguage': None, 'retire_reason': None, 'retire_change_to': None, 'retire_remedy': None, 'retire_date': None}
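One practical use of dataclasses.asdict is JSON serialization. Since retire_date is a datetime.date, json.dumps needs a default handler for it. The sketch below uses a stand-in dataclass so that it runs without the package; Record and to_json are hypothetical names, and values of types other than those handled here may need their own treatment.

```python
import dataclasses
import datetime
import json
from typing import Any, Optional


def to_json(instance: Any) -> str:
    """Serialize a dataclass instance to JSON, writing dates in ISO 8601 form."""

    def encode(value: Any) -> str:
        if isinstance(value, datetime.date):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(dataclasses.asdict(instance), default=encode)


@dataclasses.dataclass
class Record:
    """Stand-in with two of the Language fields, for demonstration only."""

    part3: str
    retire_date: Optional[datetime.date]


print(to_json(Record("bvs", datetime.date(2007, 7, 18))))
```

With the package installed, the same to_json call could be tried on a Language instance such as iso639.Language.match('bvs').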

Constants

  • DATA_LAST_UPDATED: The release date of the included language code data from SIL

    >>> import iso639
    >>> iso639.DATA_LAST_UPDATED
    datetime.date(2026, 4, 15)
    
  • ALL_LANGUAGES: The collection of all Language objects based on the included language code data

    >>> import iso639
    >>> type(iso639.ALL_LANGUAGES)
    <class 'set'>
    >>> len(iso639.ALL_LANGUAGES)
    8315
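A collection like ALL_LANGUAGES lends itself to quick summaries, such as counting languages by status, scope, or type. The following is a minimal sketch; the helper count_by is hypothetical and works on any iterable of objects that have the named attribute.

```python
from collections import Counter
from typing import Any, Iterable


def count_by(languages: Iterable[Any], attribute: str) -> Counter:
    """Count languages by the value of one attribute, e.g. 'status' or 'scope'."""
    return Counter(getattr(lang, attribute) for lang in languages)
```

With the package installed, count_by(iso639.ALL_LANGUAGES, 'status') would give the number of active and retired codes in the bundled data.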
    

License and Data Source

The python-iso639 code is released under an Apache 2.0 license. Please see LICENSE.txt for details.

The data source that backs this package is the language code tables published by SIL. The tables are included in this package under src/iso639/_data/. They are the UTF8-encoded *.tab tab-separated files bundled as a ZIP archive file, typically found at a URL that looks like https://iso639-3.sil.org/sites/iso639-3/files/downloads/iso-639-3_Code_Tables_YYYYMMDD.zip (replace YYYYMMDD with the data release date). Note that SIL resources have their terms of use.

Why Another ISO 639 Package?

The packages iso639 and iso-639 both exist on PyPI. However, as of this writing (May 2022), they were last updated in 2016 and no longer appear to be maintained with current language codes. pycountry is a great package, but what if you want a more lightweight package with only the language codes and nothing else? :-)

If you ever notice that the upstream ISO 639-3 tables from SIL have been updated and yet this package isn't using the latest data, please ping me by opening a GitHub issue.
