CL ToolKit

A Python Library for the Processing of Cross-Linguistic Data.

By Johann-Mattis List and Robert Forkel.

Overview

While pycldf provides a basic Python API to access cross-linguistic data encoded in CLDF datasets, cltoolkit goes one step further, turning the data into full-fledged Python objects rather than shallow proxies for rows in a CSV file. As with pycldf's ORM package, there is a trade-off involved: convenient access and a more pythonic API come at the expense of performance (in particular memory footprint, but also data load time) and write-access. Still, most of today's CLDF datasets (or aggregations of these) will be processable with cltoolkit on reasonable hardware in minutes rather than hours.

The main idea behind cltoolkit is to make (aggregated) CLDF data easily amenable to the computation of linguistic features in a general sense (e.g. typological features). This is done by

  • providing the data for processing code as Python objects,
  • providing a framework that makes feature computation as simple as writing a Python function acting on a cltoolkit.models.Language object.
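
As a minimal sketch of the second point, a feature can be an ordinary function that takes a language object and returns a value. The `Language` class below is a simplified stand-in defined only for this illustration; it is not the actual cltoolkit.models.Language class, and the attribute name `forms` and the function `mean_form_length` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Language:
    """Hypothetical stand-in for a language object (not the cltoolkit class)."""
    name: str
    forms: List[str] = field(default_factory=list)


def mean_form_length(language: Language) -> float:
    """Toy feature: average number of characters per form (0.0 if no forms)."""
    if not language.forms:
        return 0.0
    return sum(len(form) for form in language.forms) / len(language.forms)


# Invented example data; real data would come from CLDF datasets.
languages = [
    Language("A", ["ma", "pata", "ki"]),
    Language("B", ["tuluk"]),
]

# Computing a feature for every language is a plain mapping over objects.
features = {lang.name: mean_form_length(lang) for lang in languages}
```

Because the feature is just a function of one object, it can be developed and tested on small hand-made inputs like these before being applied to an aggregation.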

In general, aggregated CLDF Wordlists provide limited (automated) comparability across datasets (e.g. one could compare the number of words per language in each dataset). A lot more can be done when datasets use CLDF reference properties to link to reference catalogs.

cltoolkit objects exploit this extended comparability by distinguishing "senses" from "concepts" and "graphemes" from "sounds", and by providing convenient access to comparable subsets of objects in an aggregation (see models.py).
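
The idea of a comparable subset can be sketched as follows, assuming that a "sense" is a gloss as it appears in one dataset and that a "concept" is what results when senses are linked to the same reference catalog entry. The `Sense` class, its fields, and the IDs `C1` and `C2` are hypothetical placeholders for this illustration, not cltoolkit's actual models or real catalog identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Sense:
    """Hypothetical stand-in: a gloss as it appears in one dataset."""
    dataset: str
    gloss: str
    catalog_id: Optional[str] = None  # assumed link into a reference catalog


senses = [
    Sense("ds1", "hand", "C1"),
    Sense("ds1", "hand/arm"),       # unlinked, so only meaningful within ds1
    Sense("ds2", "HAND", "C1"),
    Sense("ds2", "water", "C2"),
]

# Senses that share a catalog ID are grouped into one comparable "concept".
concepts: Dict[str, List[Sense]] = {}
for sense in senses:
    if sense.catalog_id is not None:
        concepts.setdefault(sense.catalog_id, []).append(sense)

# Concepts attested in more than one dataset can be compared across datasets.
shared = sorted(
    cid for cid, linked in concepts.items()
    if len({s.dataset for s in linked}) > 1
)
```

In this toy data only `C1` is shared by both datasets, while the unlinked sense "hand/arm" stays available as a sense but does not enter the comparable subset.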

See example.md for a walk-through of the typical workflow with cltoolkit.
