A Python library for handling interlinear glossed text.

pyigt: Handling interlinear glossed text with Python

This library provides easy access to Interlinear Glossed Text (IGT) according to the Leipzig Glossing Rules, stored as CLDF examples.

Installation

Installing pyigt via pip

pip install pyigt

will install the Python package along with a command line interface igt.

Usage

CLI

$ igt -h
usage: igt [-h] [--log-level LOG_LEVEL] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    ls                  List IGTs in a CLDF dataset
    stats               Describe the IGTs in a CLDF dataset

The igt ls command lets you inspect IGTs from the command line. Examples are formatted using the four standard lines described in the Leipzig Glossing Rules, with analyzed text and glosses aligned, e.g.

$ igt ls tests/fixtures/examples.csv 
Example 1:
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le:       ȵi-ke:       pe-ji       qeʴlotʂu-ʁɑ,
earth-DEF:CL  WH-INDEF:CL  become-CSM  in.the.past-LOC

...

Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu      o-ʐgu-tɑ    i-pi-χuɑ-ȵi,
cypress-tree  one-CL-LOC  DIR-hide-because-ADV

IGT corpus at tests/fixtures/examples.csv
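The column alignment shown in this output can be approximated in a few lines of plain Python. This is a minimal sketch and not pyigt's own code: the helper name `align_lines` and its `gap` parameter are hypothetical, and padding is based on `len()`, which counts code points rather than display width.

```python
def align_lines(analyzed: str, gloss: str, gap: int = 2) -> str:
    """Return the analyzed line and the gloss line, padded into shared columns.

    Hypothetical helper for illustration only; it is not part of pyigt.
    """
    words = analyzed.split()
    glosses = gloss.split()
    if len(words) != len(glosses):
        raise ValueError("analyzed text and gloss must have the same number of words")
    # Each column is as wide as the longer of its word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]

    def pad(items):
        cells = (item.ljust(width) for item, width in zip(items, widths))
        # Drop trailing padding after the last column.
        return (" " * gap).join(cells).rstrip()

    return pad(words) + "\n" + pad(glosses)


print(align_lines("zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,",
                  "cypress-tree one-CL-LOC DIR-hide-because-ADV"))
```

Sizing every column to the longer of its two cells is what keeps each gloss directly under the word it annotates.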

igt ls can be chained with other command-line tools, such as commands from the csvkit package, for filtering:

$ csvgrep -c Primary_Text -m"ȵi"  tests/fixtures/examples.csv | csvgrep -c Gloss -m"ADV" |  igt ls -
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu      o-ʐgu-tɑ    i-pi-χuɑ-ȵi,
cypress-tree  one-CL-LOC  DIR-hide-because-ADV
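The same two-step filter can be sketched with Python's standard csv module. The column names Primary_Text and Gloss come from the pipeline above; the in-memory sample, its ID column, and the helper `filter_examples` are assumptions made for illustration, and the real examples.csv may contain more columns and a different gloss separator.

```python
import csv
import io

# Illustrative two-row sample; not the actual contents of examples.csv.
SAMPLE = '''ID,Primary_Text,Gloss
1,"zəple: ȵike: peji qeʴlotʂuʁɑ,",earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
5,"zuɑməɸu oʐgutɑ ipiχuɑȵi,",cypress-tree one-CL-LOC DIR-hide-because-ADV
'''


def filter_examples(rows, text_match, gloss_match):
    """Keep rows whose text and gloss both contain the given substrings."""
    return [
        row for row in rows
        if text_match in row["Primary_Text"] and gloss_match in row["Gloss"]
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matches = filter_examples(rows, "ȵi", "ADV")
print([row["ID"] for row in matches])
```

Both rows contain "ȵi" in the primary text, so it is the second condition on the gloss that narrows the result to a single example.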

Python API

You can read all IGT examples provided with a CLDF dataset:

>>> from pyigt import Corpus
>>> corpus = Corpus.from_path('tests/fixtures/cldf-metadata.json')
>>> len(corpus)
5
>>> for igt in corpus:
...     print(igt)
...     break
... 
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le:       ȵi-ke:       pe-ji       qeʴlotʂu-ʁɑ,
earth-DEF:CL  WH-INDEF:CL  become-CSM  in.the.past-LOC

or instantiate individual IGT examples, e.g. to check for validity:

>>> from pyigt import IGT
>>> ex = IGT(phrase="palasi=lu", gloss="priest-and")
>>> ex.check(strict=True, verbose=True)
palasi=lu
priest-and
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/robert_forkel/projects/cldf/pyigt/src/pyigt/igt.py", line 287, in check
    raise ValueError(
ValueError: Rule 2 or 10 violated: Mismatch of element separators in word and gloss! 
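To see why this example fails, compare the separators on both lines: the phrase marks a clitic boundary with "=", while the gloss uses a hyphen. The following is a simplified illustration of that idea, assuming only the "-" and "=" separators of Rule 2; it is not how pyigt implements `check`, and the helper names are hypothetical.

```python
import re


def separators(form: str) -> list:
    """Return the sequence of morpheme separators ('-' or '=') in a form."""
    return re.findall(r"[-=]", form)


def separators_match(word: str, gloss: str) -> bool:
    """True if word and gloss use the same separators in the same order."""
    return separators(word) == separators(gloss)


print(separators_match("palasi=lu", "priest-and"))   # '=' vs '-'
print(separators_match("abur-u-n", "they-OBL-GEN"))  # '-', '-' on both lines
```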

or to expand known gloss abbreviations:

>>> ex = IGT(phrase="Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.",
...          gloss="now they-OBL-GEN farm forever behind stay-FUT-NEG", 
...          translation="Now their farm will not stay behind forever.")
>>> ex.pprint()
Gila aburun ferma hamišaluǧ güǧüna amuqdač.
Gila    abur-u-n      ferma    hamišaluǧ    güǧüna    amuq-da-č.
now     they-OBL-GEN  farm     forever      behind    stay-FUT-NEG
Now their farm will not stay behind forever.
  OBL = oblique
  GEN = genitive
  FUT = future
  NEG = negation, negative
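The abbreviation lookup can be pictured as a dictionary lookup over the elements of the gloss line. This sketch assumes a tiny hand-written table containing only the four abbreviations from the output above; the names `ABBREVIATIONS` and `expand_abbreviations` are hypothetical, and pyigt's own abbreviation list and matching logic may differ.

```python
import re

# Assumed mini-table, limited to the abbreviations shown in the example output.
ABBREVIATIONS = {
    "OBL": "oblique",
    "GEN": "genitive",
    "FUT": "future",
    "NEG": "negation, negative",
}


def expand_abbreviations(gloss: str) -> dict:
    """Map each known abbreviation in the gloss to its meaning, in order of appearance."""
    found = {}
    # Split on whitespace and on common gloss separators.
    for element in re.split(r"[\s\-=.:]+", gloss):
        if element in ABBREVIATIONS:
            found.setdefault(element, ABBREVIATIONS[element])
    return found


gloss = "now they-OBL-GEN farm forever behind stay-FUT-NEG"
for abbr, meaning in expand_abbreviations(gloss).items():
    print(f"  {abbr} = {meaning}")
```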

Morpheme parsing

And you can go deeper, parsing morphemes and glosses according to the LGR (see module pyigt.lgrmorphemes):

>>> igt = IGT(phrase="zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,", gloss="earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC")
>>> igt.glossed_words[1].gloss_morphemes
[<Morpheme "WH">, <Morpheme "INDEF:CL">]
>>> igt.glossed_words[1].gloss_morphemes[1].elements
[<GlossElement "INDEF">, <GlossElementAfterColon "CL">]
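The two levels of this parse (morphemes within a word, elements within a morpheme) can be illustrated with plain string splitting. This is a deliberately simplified sketch that handles only the "-" and ":" separators seen in the example; the helper `split_gloss_word` is hypothetical, and pyigt.lgrmorphemes returns typed objects rather than plain strings.

```python
from typing import List


def split_gloss_word(gloss_word: str) -> List[List[str]]:
    """Split a gloss word into morphemes on '-', then each morpheme into elements on ':'."""
    return [morpheme.split(":") for morpheme in gloss_word.split("-")]


gloss = "earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC"
second_word = gloss.split()[1]
print(split_gloss_word(second_word))
```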

See also

  • interlineaR - an R package with similar functionality, but with support for more input formats.
