Packaged language data from Wiktionary

Project description

/ipaˈnẽmɐ/

ipanema provides access to the Wiktionary language database and other language-related data through APIs in several programming languages, as well as plain JSON and SQLite files.

Python

$ python
>>> from ipanema import query_language
>>> query_language('ca')
{'code': 'ca', 'canonical_name': 'Catalan', 'family': {'code': 'roa-ocr', 'canonical_name': 'Occitano-Romance', 'wikidata_item': 'Q599958', 'parent_family': 'Gallo-Romance', 'proto_language_code': None}, 'ancestor': 'Old Catalan', 'parent': 'None', 'wikidata_item': 'Q7026'}
>>> query_language('Deutsch')
{'code': 'de', 'canonical_name': 'German', 'family': {'code': 'gmw-hgm', 'canonical_name': 'High German', 'wikidata_item': 'Q52040', 'parent_family': 'West Germanic', 'proto_language_code': 'goh'}, 'ancestor': 'Early New High German', 'parent': 'None', 'wikidata_item': 'Q188'}
>>> from ipanema import query_family
>>> query_family('Indo-European')
{'code': 'ine', 'canonical_name': 'Indo-European', 'wikidata_item': 'Q19860', 'parent_family': 'None', 'proto_language_code': 'ine-pro'}
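The results above are plain dictionaries, and a family's `parent_family` is a name, which `query_family` appears to accept (as with `'Indo-European'`). A minimal sketch that walks from a language's family up to the topmost family on that assumption; `family_chain` is a hypothetical helper, not part of ipanema, and the lookup function is passed in so the walk can be tried without the package. Because the sample output shows a missing parent as the string `'None'`, both `None` and `'None'` end the walk.

```python
from __future__ import annotations

from typing import Callable, Optional

# Any function mapping a family name to a family dict (or None), e.g. ipanema.query_family.
FamilyLookup = Callable[[str], Optional[dict]]


def family_chain(language: dict, lookup: FamilyLookup) -> list[str]:
    """Return family names from the language's own family up to the topmost one."""
    chain: list[str] = []
    family = language.get("family")
    while family:
        name = family["canonical_name"]
        if name in chain:  # defensive: stop rather than loop forever on a cycle
            break
        chain.append(name)
        parent = family.get("parent_family")
        # The sample output renders "no parent" as the string 'None'.
        if parent in (None, "None"):
            break
        family = lookup(parent)  # a falsy result (unknown name) ends the walk
    return chain
```

With ipanema installed, `family_chain(query_language('ca'), query_family)` would start at `'Occitano-Romance'` and follow each `parent_family` (first `'Gallo-Romance'`) upward.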

Java

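import java.util.Optional;

import ipanema.language.model.Language;
import ipanema.language.model.LanguageData;

// Look up Catalan by its language code; the Optional is empty if no language matches.
Optional<Language> ca = LanguageData.load().getLanguage("ca");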

Swift

import Ipanema

// try! traps at runtime if the lookup throws; use do/catch to handle errors instead.
let ca = try! Polyglot.sharedInstance.languageData("ca")

JSON

$ jq '.ca' data/lang_data.json
{
  "ancestors": "roa-oca",
  "canonicalName": "Catalan",
  "family": "roa-ocr",
  "scripts": "Latn",
  "sort_key": {
    "remove_diacritics": "̧̀́̈·"
  },
  "standard_chars": "AaÀàBbCcÇçDdEeÉéÈèFfGgHhIiÍíÏïJjLlMmNnOoÓóÒòPpQqRrSsTtUuÚúÜüVvXxYyZz· ',-‐‑‒–—…∅",
  "type": "regular",
  "wikidata_item": "Q7026"
}
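Outside the shell, the file can be read with Python's standard `json` module. A minimal sketch, assuming (as the jq filter `.ca` implies) that the top level of `data/lang_data.json` is an object keyed by language code; the function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_language_data(path: str | Path = "data/lang_data.json") -> dict:
    """Parse the whole file; the jq filter '.ca' implies top-level keys are language codes."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def get_language(data: dict, code: str) -> dict | None:
    """Python counterpart of `jq '.ca'`: the entry for `code`, or None if absent."""
    return data.get(code)
```

For the entry shown above, `get_language(load_language_data(), "ca")["canonicalName"]` would be `"Catalan"`. Like jq, which prints `null` for a missing key, `get_language` returns `None` for an unknown code.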

SQLite

$ sqlite3 data/languages.sqlite
sqlite> select * from languages where code = 'ca';
ca|Catalan||roa-oca|roa-ocr|regular|Q7026
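The same lookup can be made from Python with the standard-library `sqlite3` module. A minimal sketch; the helper names are hypothetical, and only the table name `languages` and its `code` column are taken from the query above. Binding the code as a parameter (`?`) rather than pasting it into the SQL string avoids quoting problems.

```python
from __future__ import annotations

import sqlite3
from typing import Optional


def fetch_language(conn: sqlite3.Connection, code: str) -> Optional[tuple]:
    """Run the query shown above with a bound parameter; None if no row matches."""
    cur = conn.execute("SELECT * FROM languages WHERE code = ?", (code,))
    return cur.fetchone()


def as_cli_row(row: tuple) -> str:
    """Format a row like the sqlite3 shell's default list mode:
    '|' between columns and NULL shown as an empty string."""
    return "|".join("" if value is None else str(value) for value in row)
```

For example, open the database with `conn = sqlite3.connect("data/languages.sqlite")`, call `fetch_language(conn, "ca")`, and `conn.close()` when done; `as_cli_row` on that row should reproduce the line printed by the shell. Note that `sqlite3.connect` creates a new, empty database if the path does not exist, so a wrong path shows up as a "no such table" error rather than a missing-file error.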

Language data

Data sources:

Extraction

The language data itself is stored in a Git submodule (ipanema-data). To update or regenerate the data manually:

$ apt-get install jq lua5.1 liblua5.1-dev luarocks  # Linux (Debian/Ubuntu)
$ brew install jq lua@5.1 luarocks                  # macOS (Homebrew)
$ make clean                                        # delete the stored data
$ make                                              # regenerate it
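Before running `make`, it can help to confirm that the tools installed above are on `PATH`. A minimal Python sketch using `shutil.which`; the Lua executable names (`lua5.1`, `lua-5.1`) are assumptions, since the binary name depends on the package, and nothing here is taken from the project's Makefile. It only checks executables, not the Lua development headers (`liblua5.1-dev`).

```python
from __future__ import annotations

import shutil
from typing import Callable, Mapping, Optional, Sequence

# Tool -> executable names that satisfy it. The Lua names are assumptions;
# `make` is included because the build is driven by make.
REQUIRED: Mapping[str, Sequence[str]] = {
    "jq": ("jq",),
    "lua 5.1": ("lua5.1", "lua-5.1"),
    "luarocks": ("luarocks",),
    "make": ("make",),
}


def missing_tools(
    required: Mapping[str, Sequence[str]] = REQUIRED,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Names of tools for which none of the candidate executables is found."""
    return [
        tool
        for tool, candidates in required.items()
        if not any(which(name) for name in candidates)
    ]
```

An empty list from `missing_tools()` means every tool was found; otherwise the returned names say what to install before running `make`.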

Language codes

The Wiktionary language code is defined as follows:

  1. If the language has a two-letter code in the ISO 639-1 standard, then that code is used.
  2. If the language has a three-letter code in the ISO 639-3 standard, then that code is used.
  3. If the language has a three-letter code in the ISO 639-2 standard, then that code is used. (rare)
  4. Any language that has no ISO code but is to be included in Wiktionary is given a new, Wiktionary-specific "exceptional" code.
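The precedence above is a first-match rule. A minimal sketch of just that ordering; `wiktionary_code` and its parameters are hypothetical (not part of ipanema), and deciding whether a language qualifies for inclusion, or what its exceptional code should be, is left to the caller.

```python
from __future__ import annotations

from typing import Optional


def wiktionary_code(
    iso639_1: Optional[str] = None,
    iso639_3: Optional[str] = None,
    iso639_2: Optional[str] = None,
    exceptional: Optional[str] = None,
) -> str:
    """Pick a code by the precedence listed above:
    ISO 639-1, then ISO 639-3, then ISO 639-2, then an exceptional code."""
    for code, length in ((iso639_1, 2), (iso639_3, 3), (iso639_2, 3)):
        if code:
            if len(code) != length:
                raise ValueError(f"expected a {length}-letter ISO 639 code, got {code!r}")
            return code
    if exceptional:
        return exceptional
    raise ValueError("language has no ISO 639 code and no exceptional code")
```

For Catalan, which has both the ISO 639-1 code `ca` and the ISO 639-3 code `cat`, rule 1 applies and gives `ca`, the code used in the examples above.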

License

The language data extracted from Wiktionary is licensed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0). The data has been transformed into a machine-readable format but not otherwise modified. The project itself is licensed under the MIT License; see LICENSE.


Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

ipanema-202602.14.2-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file ipanema-202602.14.2-py3-none-any.whl.

File metadata

  • Download URL: ipanema-202602.14.2-py3-none-any.whl
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for ipanema-202602.14.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        dfeedd31db375b2d4c95d2f69274ca4a804c80c53d9a28ab064d6bc38b12fa3e
MD5           514b5dd5da433681596377469b739134
BLAKE2b-256   4ea9ee53406d7f9b1ae154927787b3da28f004caa1637aec59f9015ad6e38685
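The digests above can be checked locally before installing. A minimal sketch using Python's `hashlib`; the helper names are hypothetical, and the wheel is assumed to have been downloaded under its published file name.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published above for ipanema-202602.14.2-py3-none-any.whl.
EXPECTED_SHA256 = "dfeedd31db375b2d4c95d2f69274ca4a804c80c53d9a28ab064d6bc38b12fa3e"


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str | Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 equals `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.strip().lower()
```

`verify_wheel("ipanema-202602.14.2-py3-none-any.whl")` returns `True` only for a byte-identical copy of this release. pip can also enforce this itself: a requirements line such as `ipanema==202602.14.2 --hash=sha256:<digest above>` puts pip into hash-checking mode, and `pip install --require-hashes -r requirements.txt` rejects any requirement that lacks a matching hash.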
