
Qumin: Quantitative Modelling of Inflection

Project description


Qumin (QUantitative Modelling of INflection) is a package for the computational modelling of the inflectional morphology of languages. It was initially developed for Sacha Beniamine’s PhD dissertation.

Contributors: Sacha Beniamine, Jules Bouton.

Documentation: https://qumin.readthedocs.io/

Github: https://github.com/XachaB/Qumin

This is version 2, which has been significantly updated since the publications cited below. These updates do not affect results; they focused on bugfixes, the command line interface, Paralex compatibility, workflow improvements and overall tidiness.

For more detail, you can refer to Sacha’s dissertation (in French, Beniamine 2018).

Citing

If you use Qumin in your research, please cite Sacha's dissertation (Beniamine 2018), as well as the relevant paper for the specific actions used (see below). To appear in the publications list, send Sacha an email with the reference of your publication at s.<last name>@surrey.ac.uk.

Quick Start

Install

Install the Qumin package using pip:

pip install qumin

Data

Qumin works from full paradigm data in phonemic transcription.

The package expects Paralex datasets containing at least a forms table and a sounds table. Note that the sounds file may sometimes require editing, as Qumin imposes stricter constraints on sound definitions than Paralex does.
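As an illustration, here is a minimal Paralex-style forms table loaded with pandas. The column names follow the Paralex standard; real datasets ship tables as CSV files referenced from the dataset's .package.json manifest, and the toy forms below are made up.

```python
import io
import pandas as pd

# A toy Paralex-style forms table (normally a CSV file on disk,
# referenced from the dataset's .package.json manifest).
forms_csv = io.StringIO("""form_id,lexeme,cell,phon_form
f1,cat,nom.sg,kat
f2,cat,nom.pl,kats
f3,dog,nom.sg,dɒg
f4,dog,nom.pl,dɒgz
""")

forms = pd.read_csv(forms_csv)
print(forms[["lexeme", "cell", "phon_form"]])
```

Each row pairs a lexeme and a paradigm cell with a phonemic form, which is the shape of data Qumin reads.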

Scripts

More details on configuration:

$ qumin --help

Patterns

Alternation patterns serve as a basis for all the other scripts. An early version of the patterns algorithm is described in Beniamine (2017). An updated description appears in Beniamine, Bonami and Luís (2021).

The default action for Qumin is to compute patterns only, so these two commands are identical:

$ qumin data=<dataset.package.json>
$ qumin action=patterns data=<dataset.package.json>

By default, Qumin will ignore defective lexemes and overabundant forms.
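As a rough illustration of what this filtering means (a toy sketch with made-up data, not Qumin's actual implementation): a defective lexeme lacks forms for some cells, and an overabundant lexeme has several competing forms for the same cell.

```python
import pandas as pd

# Toy forms table: "go" is defective (missing the pst cell), and "dream"
# is overabundant (two competing past forms). Hypothetical data.
forms = pd.DataFrame({
    "lexeme": ["walk", "walk", "go", "dream", "dream", "dream"],
    "cell": ["prs", "pst", "prs", "prs", "pst", "pst"],
    "phon_form": ["wɔːk", "wɔːkt", "ɡəʊ", "driːm", "driːmd", "drɛmt"],
})

n_cells = forms["cell"].nunique()

# Drop defective lexemes: those not attested in every cell.
complete = forms.groupby("lexeme").filter(
    lambda g: g["cell"].nunique() == n_cells
)

# Drop overabundant forms: keep a single form per (lexeme, cell) pair.
unique = complete.drop_duplicates(subset=["lexeme", "cell"])

print(sorted(unique["lexeme"].unique()))  # "go" was dropped as defective
```

The pats.defective and pats.overabundant flags below switch off one or both of these filtering steps.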

For paradigm entropy, it is possible to explicitly keep defective lexemes:

$ qumin pats.defective=True data=<dataset.package.json>

For inflection class lattices, both can be kept:

$ qumin pats.defective=True pats.overabundant=True data=<dataset.package.json>

Microclasses

To visualize the microclasses and their similarities, one can compute a microclass heatmap:

$ qumin action=heatmap data=<dataset.package.json>

This will compute patterns, then the heatmap. To use pre-computed patterns, pass the file path:

$ qumin action=heatmap patterns=<path/to/patterns.csv> data=<dataset.package.json>

It is also possible to pass class labels to facilitate comparisons with another classification:

$ qumin action=heatmap label=inflection_class patterns=<path/to/patterns.csv> data=<dataset.package.json>

The label key is the name of the column in the Paralex lexemes table to use as labels.

A few more parameters can be changed:

heatmap:
    cmap: null               # colormap name
    exhaustive_labels: False # by default, seaborn shows only some labels on
                             # the heatmap for readability.
                             # This forces seaborn to print all labels.

Paradigm entropy

An early version of this software was used in Bonami and Beniamine (2016), and a more recent one in Beniamine, Bonami and Luís (2021).

By default, this will start by computing patterns. To work with pre-computed patterns, pass their path with patterns=<path/to/patterns.csv>.

Computing entropies from one cell

$ qumin action=H data=<dataset.package.json>
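Conceptually, the measure reports how uncertain the outcome is when predicting one cell from another. The toy computation below is a simplification for illustration only (not Qumin's implementation, which conditions on the shape of the predictor form): prediction is treated as choosing among alternation patterns, with made-up lexeme counts.

```python
from collections import Counter
from math import log2

# Hypothetical counts of lexemes per alternation pattern for one
# (predictor, predicted) cell pair. The pattern names are made up.
pattern_counts = Counter({"X -> Xs": 80, "X -> Xes": 15, "X -> Xen": 5})

total = sum(pattern_counts.values())

# Shannon entropy over the pattern distribution, in bits.
h = -sum((n / total) * log2(n / total) for n in pattern_counts.values())
print(f"entropy ≈ {h:.2f} bits")
```

A value of 0 would mean the outcome is fully predictable (a single pattern); higher values mean more uncertainty among competing patterns.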

Computing entropies for other numbers of predictors:

$ qumin action=H n=2 data=<dataset.package.json>
$ qumin action=H n="[2,3]" data=<dataset.package.json>

Predicting with known lexeme-wise features (such as gender or inflection class) is also possible. This feature was used in Pellegrini (2023). To use features, pass the name of any column(s) from the lexemes table:

$ qumin action=H feature=inflection_class patterns=<patterns.csv> data=<dataset.package.json>
$ qumin action=H feature="[inflection_class,gender]" patterns=<patterns.csv> data=<dataset.package.json>

The config file contains the following keys, which can be set through the command line:

patterns: null        # pre-computed patterns
entropy:
  n:                  # Compute entropy for predictions with n predictors.
    - 1
  features: null      # Feature column(s) in the lexemes table.
                      # Features are considered known in conditional probabilities: P(X~Y|X,f1,f2...)
  importFile: null    # Import an entropy file with n-1 predictors (speeds up computation with n predictors).
  merged: False       # Whether identical columns are merged in the input.
  stacked: False      # Whether to stack results in long form.

For bipartite systems, it is possible to pass two values to both patterns and data, e.g.:

$ qumin action=H patterns="[<patterns1.csv>,<patterns2.csv>]" data="[<dataset1.package.json>,<dataset2.package.json>]"

Visualizing results

Since Qumin 2.0, results are shipped as long tables. This allows several metrics to be stored in the same file, along with results from several runs. Results files now look like this:

predictor,predicted,measure,value,n_pairs,n_preds,dataset
<cell1>,<cell2>,cond_entropy,0.39,500,1,<dataset_name>
<cell1>,<cell2>,cond_entropy,0.35,500,1,<dataset_name>
<cell1>,<cell2>,cond_entropy,0.2,500,1,<dataset_name>
<cell1>,<cell2>,cond_entropy,0.43,500,1,<dataset_name>
<cell1>,<cell2>,cond_entropy,0.6,500,1,<dataset_name>
<cell1>,<cell2>,cond_entropy,0.1,500,1,<dataset_name>

All results are in the same file, including different numbers of predictors (indicated in the n_preds column) and different measures (indicated in the measure column).
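For instance, such a long table can be reshaped back into a wide predictor-by-predicted matrix with pandas. The cell and dataset names below are placeholders, and the values are made up.

```python
import io
import pandas as pd

# A toy long-format results table, matching the columns shown above.
results_csv = io.StringIO("""predictor,predicted,measure,value,n_pairs,n_preds,dataset
prs.sg,prs.pl,cond_entropy,0.39,500,1,toy
prs.pl,prs.sg,cond_entropy,0.35,500,1,toy
prs.sg,pst.sg,cond_entropy,0.20,500,1,toy
pst.sg,prs.sg,cond_entropy,0.43,500,1,toy
""")

results = pd.read_csv(results_csv)

# Keep a single measure and number of predictors before pivoting.
sub = results[(results["measure"] == "cond_entropy")
              & (results["n_preds"] == 1)]
wide = sub.pivot(index="predictor", columns="predicted", values="value")
print(wide)
```

Filtering on measure and n_preds first is what keeps the pivot well-defined, since the long file can mix several runs.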

To facilitate a quick general glance at the results, we output an entropy heatmap in the wide matrix format. This behaviour can be disabled by passing entropy.heatmap=False. It takes advantage of the Paralex features-values table to sort the cells in a canonical order on the heatmap. The heatmap.order setting is used to specify which feature should have higher priority in the sorting:

$ qumin action=H data=<dataset.package.json> heatmap.order="[number, case]"

It is also possible to draw an entropy heatmap without running entropy computations:

$ qumin action=ent_heatmap entropy.importFile=<entropies.csv>

The config file contains the following keys, which can be set through the command line:

heatmap:
  cmap: null               # colormap name
  exhaustive_labels: False # by default, seaborn shows only some labels on
                           # the heatmap for readability.
                           # This forces seaborn to print all labels.
  dense: False             # Use initials instead of full labels (only for entropy heatmap)
  annotate: False          # Display values on the heatmap. (only for entropy heatmap)
  order: False             # Priority list for sorting features (for entropy heatmap,
                           # e.g. [number, case]). If no features-values file is available,
                           # it should contain an ordered list of the cells to display.
entropy:
  heatmap: True        # Whether to draw a heatmap.

Macroclass inference

Our work on automatic inference of macroclasses was published in Beniamine, Bonami and Sagot (2018).

By default, this will start by computing patterns. To work with pre-computed patterns, pass their path with patterns=<path/to/patterns.csv>.

Inferring macroclasses

$ qumin action=macroclasses data=<dataset.package.json>

Lattices

By default, this will start by computing patterns. To work with pre-computed patterns, pass their path with patterns=<path/to/patterns.csv>.

This software was used in Beniamine (2021).

Inferring a lattice of inflection classes, with the (default) HTML output:

$ qumin action=lattice pats.defective=True pats.overabundant=True data=<dataset.package.json>

Further config options:

lattice:
  shorten: False      # Drop redundant columns altogether.
                      #  Useful for big contexts, but loses information.
                      # The lattice shape and stats will be the same.
                      # Avoid using with --html
  aoc: False          # Only attribute and object concepts
  stat: False         # Output stats about the lattice
  html: False         # Export to html
  ctxt: False         # Export as a context
  pdf: True           # Export as pdf
  png: False          # Export as png
