Packaged language data from Wiktionary
/ipaˈnẽmɐ/
ipanema provides APIs in several programming languages for accessing the Wiktionary language database and other language-related data.
Python
$ python
>>> from ipanema import query_language
>>> query_language('ca')
{'code': 'ca', 'canonical_name': 'Catalan', 'family': {'code': 'roa-ocr', 'canonical_name': 'Occitano-Romance', 'wikidata_item': 'Q599958', 'parent_family': 'Gallo-Romance', 'proto_language_code': None}, 'ancestor': 'Old Catalan', 'parent': 'None', 'wikidata_item': 'Q7026'}
>>> query_language('Deutsch')
{'code': 'de', 'canonical_name': 'German', 'family': {'code': 'gmw-hgm', 'canonical_name': 'High German', 'wikidata_item': 'Q52040', 'parent_family': 'West Germanic', 'proto_language_code': 'goh'}, 'ancestor': 'Early New High German', 'parent': 'None', 'wikidata_item': 'Q188'}
>>> from ipanema import query_family
>>> query_family('Indo-European')
{'code': 'ine', 'canonical_name': 'Indo-European', 'wikidata_item': 'Q19860', 'parent_family': 'None', 'proto_language_code': 'ine-pro'}
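In the sample output above, missing values appear in two forms: the Python value None (for proto_language_code) and the string 'None' (for parent and parent_family). The following is a minimal sketch of how a caller might normalize them, assuming the returned records are plain dictionaries shaped like the ones shown. The helper normalize_missing is hypothetical and not part of ipanema.

```python
def normalize_missing(record):
    """Return a copy of record with the string 'None' replaced by None.

    Nested dictionaries (such as the 'family' entry) are handled recursively.
    This is an illustrative helper, not an ipanema function.
    """
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, dict):
            cleaned[key] = normalize_missing(value)
        elif value == "None":
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


# Record copied from the query_language('ca') output shown above.
catalan = {
    'code': 'ca',
    'canonical_name': 'Catalan',
    'family': {
        'code': 'roa-ocr',
        'canonical_name': 'Occitano-Romance',
        'wikidata_item': 'Q599958',
        'parent_family': 'Gallo-Romance',
        'proto_language_code': None,
    },
    'ancestor': 'Old Catalan',
    'parent': 'None',
    'wikidata_item': 'Q7026',
}

clean = normalize_missing(catalan)
print(clean['parent'] is None)
```

After normalization, a single "is None" check covers both forms of a missing value.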
Java
import ipanema.language.model.Language;
import ipanema.language.model.LanguageData;
Optional<Language> ca = LanguageData.load().getLanguage("ca");
Swift
import Ipanema
let ca = try! Polyglot.sharedInstance.languageData("ca")
JSON
$ jq '.ca' data/lang_data.json
{
"ancestors": "roa-oca",
"canonicalName": "Catalan",
"family": "roa-ocr",
"scripts": "Latn",
"sort_key": {
"remove_diacritics": "̧̀́̈·"
},
"standard_chars": "AaÀàBbCcÇçDdEeÉéÈèFfGgHhIiÍíÏïJjLlMmNnOoÓóÒòPpQqRrSsTtUuÚúÜüVvXxYyZz· ',-‐‑‒–—…∅",
"type": "regular",
"wikidata_item": "Q7026"
}
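The sort_key.remove_diacritics field in the record above appears to list the characters that are stripped when building a sort key. The sketch below assumes the usual approach of decomposing the text (NFD), dropping the listed characters, and recomposing (NFC). Both this reading of the field and the helper name make_sort_key are assumptions, not something the ipanema documentation confirms. The code points are written as escapes and reflect one reading of the Catalan value: combining grave, acute, diaeresis, cedilla, and the middle dot.

```python
import unicodedata

# Assumed content of "remove_diacritics" for Catalan, written as escapes:
# grave, acute, diaeresis, cedilla (all combining) and the middle dot.
CATALAN_REMOVE = "\u0300\u0301\u0308\u0327\u00b7"


def make_sort_key(text, remove_diacritics):
    """Strip the listed characters from text after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in remove_diacritics)
    # Recompose so that any diacritics not listed stay attached to their base.
    return unicodedata.normalize("NFC", stripped)


words = ["col\u00b7legi", "Val\u00e8ncia", "a\u00e7\u00f2"]
for word in words:
    print(word, "->", make_sort_key(word, CATALAN_REMOVE))
```

Under these assumptions, words that differ only in the listed diacritics map to the same key.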
SQLite
$ sqlite data/languages.sqlite
sqlite> select * from languages where code = 'ca';
ca|Catalan||roa-oca|roa-ocr|regular|Q7026
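The same lookup can be done from Python with the standard sqlite3 module. This is a sketch only: the table name languages and the column code come from the query above, while the helper find_language and the column name canonical_name in the demo table are assumptions made for illustration. The demo runs against an in-memory database, so it does not depend on the real file.

```python
import sqlite3


def find_language(conn, code):
    """Return the languages row for code as a dict, or None if absent."""
    conn.row_factory = sqlite3.Row  # rows expose their column names
    row = conn.execute(
        "SELECT * FROM languages WHERE code = ?", (code,)
    ).fetchone()
    return dict(row) if row is not None else None


# Demo table with a guessed, reduced schema; the real database has more columns.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE languages (code TEXT PRIMARY KEY, canonical_name TEXT)")
demo.execute("INSERT INTO languages VALUES ('ca', 'Catalan')")

print(find_language(demo, "ca"))
```

To query the packaged data instead, pass a connection opened with sqlite3.connect("data/languages.sqlite"). Using a parameter placeholder (?) keeps the language code out of the SQL string itself.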
Language data
Data sources:
Extraction
The actual language data is stored in a submodule (ipanema-data). To update/regenerate the data manually:
$ apt-get install jq lua5.1 liblua5.1-dev luarocks # Debian/Ubuntu
$ brew install jq lua@5.1 luarocks # macOS
$ make clean # delete stored data
$ make
Language codes
The Wiktionary language code is defined as follows:
- If the language has a two-letter code in the ISO 639-1 standard, then that code is used.
- If the language has a three-letter code in the ISO 639-3 standard, then that code is used.
- If the language has a three-letter code in the ISO 639-2 standard, then that code is used. (rare)
- Any language which does not have an ISO code, but which is to be included in Wiktionary, has a new Wiktionary-specific "exceptional" code devised for it.
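The rules above can be read as an order of precedence, tried from top to bottom. The sketch below encodes that reading. Both the interpretation and the function wiktionary_code are assumptions for illustration, not part of ipanema or of Wiktionary's own tooling.

```python
def wiktionary_code(iso639_1=None, iso639_3=None, iso639_2=None, exceptional=None):
    """Pick a language code by trying each rule in the order listed above."""
    for candidate in (iso639_1, iso639_3, iso639_2, exceptional):
        if candidate:
            return candidate
    raise ValueError("language has neither an ISO code nor an exceptional code")


# Catalan has a two-letter ISO 639-1 code, so that code wins.
print(wiktionary_code(iso639_1="ca", iso639_3="cat"))

# Old High German has no two-letter code; the three-letter code is used.
print(wiktionary_code(iso639_3="goh"))

# Old Catalan is treated here as having only a Wiktionary-specific code.
print(wiktionary_code(exceptional="roa-oca"))
```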
License
The language data extracted from Wiktionary is licensed under the Creative Commons license CC BY-SA 4.0. The data has been transformed into a machine-readable format but not otherwise modified. The project itself is licensed under the MIT License; see LICENSE.
File details
Details for the file ipanema-202605.11-py3-none-any.whl.
File metadata
- Download URL: ipanema-202605.11-py3-none-any.whl
- Upload date:
- Size: 1.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 013f083c96c4e54da90b91e485c2bf54da8e6d93cc561248c9dce39163879f9c |
| MD5 | 1aa1349b7681134cae43433186c620c8 |
| BLAKE2b-256 | 2ef046a5ac792619f79eff15a7f6883ed3d23b2773c8addd6732556ce163a466 |