python package for glottolog data curation

pyglottolog

Programmatic access to Glottolog data.

Install

To install pyglottolog, you need a Python installation on your system, running Python 2.7 or >3.4. Then run

pip install pyglottolog

This will also install the command line interface glottolog.

Note: To make use of pyglottolog you also need a local copy of the Glottolog data. This can be

a clone of the glottolog/glottolog repository or your fork of it,
an unzipped released version of Glottolog from GitHub,
or an unzipped download of a released version of Glottolog from ZENODO.

Make sure you remember where this local copy of the data is located: you always have to pass this location as an argument when using pyglottolog.
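Since the data location has to be supplied every time, it can help to resolve and validate it in one place. The following is a minimal sketch and not part of pyglottolog: the helper name `resolve_repos` and the environment variable `GLOTTOLOG_REPOS` are hypothetical, and the check only relies on the languoids/tree subdirectory that a copy of the data is expected to contain.

```python
import os
from pathlib import Path


def resolve_repos(path=None, env_var="GLOTTOLOG_REPOS"):
    """Return the local Glottolog data directory as an absolute Path.

    `env_var` is a hypothetical convention of this sketch, not a setting
    that pyglottolog itself is known to read.
    """
    candidate = path or os.environ.get(env_var)
    if not candidate:
        raise ValueError("no Glottolog data location given")
    repos = Path(candidate).expanduser().resolve()
    # A usable copy of the data should contain the languoid tree.
    if not (repos / "languoids" / "tree").is_dir():
        raise ValueError("{0} does not look like Glottolog data".format(repos))
    return repos
```

The resulting path could then be passed on as a string, e.g. `Glottolog(str(resolve_repos()))`.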

Python API

Using pyglottolog, Glottolog data can be accessed programmatically from within Python programs. All functionality is mediated through an instance of pyglottolog.Glottolog, e.g.

>>> from pyglottolog import Glottolog
>>> glottolog = Glottolog('.')
>>> print(glottolog)
<Glottolog repos v0.2-259-g27ac0ef at /.../glottolog>

Accessing languoid data

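The data in the languoid info files in the languoids/tree subdirectory is mainly accessed through two methods: Glottolog.languoid, which looks up a single languoid by its identifier, and Glottolog.languoids, which iterates over all languoids: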

>>> glottolog.languoid('stan1295')
<Language stan1295>
>>> print(glottolog.languoid('stan1295'))
German [stan1295]
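The layout of languoids/tree can also be inspected without pyglottolog. The following is a minimal sketch, assuming (as the md.ini paths printed by the search commands in this document suggest) that each languoid has its own directory named after its Glottocode and that the nesting of directories mirrors the classification. The function name `lineage_from_path` is made up for this illustration.

```python
from pathlib import PurePosixPath


def lineage_from_path(md_ini_path):
    """Return the directory names between 'tree' and 'md.ini', outermost first."""
    parts = PurePosixPath(md_ini_path).parts
    start = parts.index("tree") + 1
    return list(parts[start:-1])


path = "languoids/tree/guai1249/guai1250/abip1241/md.ini"
lineage = lineage_from_path(path)
print(lineage[-1])   # directory of the languoid itself
print(lineage[:-1])  # enclosing directories, outermost first
```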

Accessing reference data

>>> print(glottolog.bibfiles['hh.bib']['s:Karang:Tati-Harzani'])
@book{s:Karang:Tati-Harzani,
    author = {'Abd-al-'Ali Kārang},
    title = {Tāti va Harzani},
    publisher = {Tabriz: Tabriz University Press},
    address = {Tabriz},
    pages = {6+160},
    year = {1334 [1953]},
    glottolog_ref_id = {41999},
    hhtype = {grammar_sketch},
    inlg = {Farsi [pes]},
    lgcode = {Harzani [hrz]},
    macro_area = {Eurasia}
}
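Fields such as lgcode and inlg in the record above combine a language name with a code in square brackets. A minimal sketch for splitting such values with a regular expression, independent of pyglottolog; the helper name `split_name_and_code` is made up, and values listing several languages or using another layout are not handled:

```python
import re

# A name, followed by a code in square brackets, e.g. "Harzani [hrz]".
NAME_CODE = re.compile(r"^\s*(?P<name>.+?)\s*\[(?P<code>[^\]]+)\]\s*$")


def split_name_and_code(value):
    """Return (name, code) for values like 'Harzani [hrz]', else None."""
    match = NAME_CODE.match(value)
    if match is None:
        return None
    return match.group("name"), match.group("code")


print(split_name_and_code("Harzani [hrz]"))  # ('Harzani', 'hrz')
print(split_name_and_code("Farsi [pes]"))    # ('Farsi', 'pes')
```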

Performance considerations

Reading the data for Glottolog's almost 25,000 languoids from the same number of files in individual directories isn't particularly quick. So on average computers running

>>> list(Glottolog().languoids())

would take around 15 seconds.

Due to this, care should be taken not to read languoid data from disk repeatedly. In particular, "N+1"-type problems should be avoided, where one would read all languoids into memory and then look up attributes on each languoid, thereby triggering new reads from disk. This may easily happen, since attributes such as Languoid.family are implemented as properties, which traverse the directory tree and read information from disk at access time.

To make it possible to avoid such problems, many of these properties can be substituted with a call to a similar method of Languoid, which accepts a "node map" (i.e. a dict mapping Languoid.id to Languoid objects) as parameter, e.g. Languoid.ancestors_from_nodemap or Languoid.descendants_from_nodemap. Typical usage would look as follows:

>>> languoids = {l.id: l for l in Glottolog().languoids()}
>>> for l in languoids.values():
...    if not l.ancestors_from_nodemap(languoids):
...        print('top-level {0}: {1}'.format(l.level, l.name))
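To illustrate why a node map helps, here is a toy model of the idea. It is not pyglottolog's implementation: the `Node` class, its `parent_id` attribute and the sample ids are invented for this sketch; only the method name ancestors_from_nodemap mirrors the one mentioned above. Once the dict has been built, every ancestor lookup is an in-memory dict access rather than a read from disk.

```python
class Node:
    """Toy stand-in for a languoid that only knows its parent's id."""

    def __init__(self, id, name, parent_id=None):
        self.id = id
        self.name = name
        self.parent_id = parent_id

    def ancestors_from_nodemap(self, nodemap):
        """Collect ancestors via dict lookups only, outermost first."""
        ancestors = []
        parent_id = self.parent_id
        while parent_id is not None:
            parent = nodemap[parent_id]
            ancestors.append(parent)
            parent_id = parent.parent_id
        return ancestors[::-1]


nodes = [
    Node("fam", "Family"),
    Node("sub", "Subgroup", parent_id="fam"),
    Node("lang", "Language", parent_id="sub"),
]
nodemap = {n.id: n for n in nodes}  # built once, reused for every lookup
for n in nodes:
    if not n.ancestors_from_nodemap(nodemap):
        print("top-level: {0}".format(n.name))
```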

Accessing configuration data

The config subdirectory of the Glottolog data contains machine-readable metadata, such as the list of macroareas. This information can also be accessed via an instance of Glottolog, using the stem of the filename as the attribute name:

>>> for ma in glottolog.macroareas.values():
...     print(ma.name)
...     
South America
Eurasia
Africa
Papunesia
North America
Australia

Note that the data read from the INI files is stored as a dict, with section names (or explicit id options) as keys and instances of the corresponding class in pyglottolog.config as values.
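The keying rule can be sketched with the standard library's configparser. This is a simplified illustration, not the code in pyglottolog.config: values are plain dicts rather than class instances, and the INI content below (section names, the `name` and `id` options) is invented sample data.

```python
import configparser

# Invented sample data: one section keyed by its name, one with an explicit id.
SAMPLE = """
[Eurasia]
name = Eurasia

[South America]
id = southamerica
name = South America
"""


def load_config(text):
    """Map each section to its options, keyed by 'id' if given, else by section name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        options = dict(parser[section])
        key = options.get("id", section)
        result[key] = options
    return result


config = load_config(SAMPLE)
for key, options in config.items():
    print(key, options["name"])
```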

Command line interface

Command line functionality is implemented via sub-commands of glottolog. The list of available sub-commands can be inspected by running

$ glottolog --help
usage: glottolog [-h] [--verbosity VERBOSITY] [--log-level LOG_LEVEL]
                 [--repos REPOS]
                 command ...

Main command line interface of the pyglottolog package.

positional arguments:
  command               isobib | show | edit | create | bib | tree | newick |
                        index | check | metadata | refsearch | refindex |
                        langsearch | langindex | tree2lff | lff2tree
  args

optional arguments:
  -h, --help            show this help message and exit
  --verbosity VERBOSITY
                        increase output verbosity
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG]
  --repos REPOS         path to glottolog data repository

Use 'glottolog help <cmd>' to get help about individual commands.

Note: The location of your local clone or export of the Glottolog data should be passed as --repos=PATH/TO/glottolog.
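When the command line interface is driven from a script, assembling the argument list in one place keeps the --repos option from being forgotten. A minimal sketch: the helper name `glottolog_argv` is made up, and it only builds the list (which could be handed to subprocess.run); it does not invoke glottolog itself.

```python
def glottolog_argv(repos, command, *args):
    """Build an argument list of the form: glottolog --repos=REPOS command args..."""
    if not repos:
        raise ValueError("the location of the Glottolog data is required")
    return ["glottolog", "--repos={0}".format(repos), command] + list(args)


print(glottolog_argv(".", "langsearch", "Abipónok"))
```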

Extracting languoid data

Glottolog data is often integrated with other data or incorporated as reference data in tools, e.g. as LanguageTable in a CLDF dataset.

To make this easier, pyglottolog provides the languoids subcommand, which dumps basic languoid data into a CSVW file with accompanying metadata:

glottolog --repos=PATH/TO/glottolog languoids [--output=OUTDIR] [--version=VERSION]

This will create a CSVW package, i.e.

a CSV table glottolog-languoids-VERSION.csv
and a JSON description glottolog-languoids-VERSION.csv-metadata.json

where VERSION is the result of running git describe on the data repository, or the version string passed as --version=VERSION in case you are running the command on an export of the repository or a download from ZENODO.
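Both files of such a package can be read with the standard library. The sketch below assumes only the file naming described above; the helper name `read_languoid_package` is made up, and since the column names are not listed here, rows are returned as generic dicts keyed by whatever header the CSV file has.

```python
import csv
import json
from pathlib import Path


def read_languoid_package(outdir, version):
    """Return (rows, metadata) for a glottolog-languoids-VERSION package."""
    csv_path = Path(outdir) / "glottolog-languoids-{0}.csv".format(version)
    # The JSON description sits next to the table, with "-metadata.json" appended.
    meta_path = csv_path.with_name(csv_path.name + "-metadata.json")
    with csv_path.open(encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))
    with meta_path.open(encoding="utf-8") as f:
        metadata = json.load(f)
    return rows, metadata
```

A call like `read_languoid_package(OUTDIR, VERSION)` would use the same OUTDIR and VERSION values as the command above.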

Languoid search

To allow convenient search across all languoid info files, pyglottolog comes with functionality to create and search a Whoosh index. To do so, run

glottolog --repos=PATH/TO/glottolog langindex

This will take a minute or two and build an index of about 90 MB at build/whoosh_langs.

Now you can search the index, e.g. using alternative names as query:

$ glottolog --repos=. langsearch "Abipónok"
1 matches
Abipon [abip1241] language
languoids/tree/guai1249/guai1250/abip1241/md.ini
Abipónok [hu]

1 matches

But you can also exploit the schema defined in pyglottolog.fts.get_langs_index:

$ glottolog --repos=. langsearch "country:Papua New Guinea"
...

Alamblak [alam1246] language
languoids/tree/sepi1257/sepi1258/east2496/alam1246/md.ini
Papua New Guinea (PG)

900 matches

Reference search

The same can be done for reference data. To create a Whoosh index of all reference data, run

glottolog --repos=PATH/TO/glottolog refindex

This will take about 15 minutes and build an index of about 700 MB at build/whoosh.

Now you can query the index:

$ glottolog --repos=. refsearch "author:Haspelmath AND title:Atlas"
...
(13 matches)




pyglottolog 2.2.1
