MoDeST: a Morphological Decomposition & Segmentation Trove.

The point of MoDeST is two-fold:

  1. Provide a general object-oriented Python interface to access morphological decompositions and segmentations;
  2. Host morphological datasets generated by smaller research groups that would otherwise be hard to find.

Morphological decomposition is the task of recognising which building blocks a word was originally constructed from. These building blocks are its morphemes. As an example, the Dutch derivation isometrisch ("isometric") can be decomposed into the morphemes iso, meter and isch.

Morphological segmentation is the task of isolating the substrings of a word that correspond to its morphemes. These substrings are called morphs. In the above example, the segmentation would be iso/metr/isch.
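
The difference between the two tasks can be made concrete with a small sketch. The WordAnalysis class below is hypothetical and only serves as an illustration; it is not MoDeST's actual interface. It shows that the morphs of a segmentation concatenate back to the word, whereas the morphemes of a decomposition need not.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WordAnalysis:
    """Hypothetical container for illustration; not part of MoDeST."""
    word: str
    morphemes: Tuple[str, ...]  # building blocks in their canonical form
    morphs: Tuple[str, ...]     # substrings of the word as it is written

    def is_valid_segmentation(self) -> bool:
        # Morphs must concatenate back to the original word.
        return "".join(self.morphs) == self.word


example = WordAnalysis(
    word="isometrisch",
    morphemes=("iso", "meter", "isch"),
    morphs=("iso", "metr", "isch"),
)

print(example.is_valid_segmentation())            # morphs rebuild the word
print("".join(example.morphemes) == example.word)  # morphemes do not
```

Here the morpheme meter surfaces as the morph metr, which is why joining the morphemes does not reproduce the word.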

Languages and Datasets

The supported languages can be found under modest.languages, so they are not listed here. The available datasets roughly coincide with the downloaders under modest.datasets. Currently, the package supports:

  • CELEX
  • MorphyNet
  • MorphoChallenge2010
  • CompoundPiece
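
Since the languages are not listed here, one way to discover them in an installed copy is to enumerate the submodules of a package. The following is a minimal sketch using only the standard library. The helper name list_submodules is hypothetical, and it is an assumption that modest.languages is a regular package with one module per language.

```python
import importlib
import pkgutil
from typing import List


def list_submodules(package_name: str) -> List[str]:
    """Return the sorted names of the direct submodules of an importable package."""
    package = importlib.import_module(package_name)
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


# Demonstrated on a standard-library package so the snippet runs anywhere.
# With MoDeST installed, "modest.languages" could be passed instead (assumption).
print(list_submodules("json"))
```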

Installation

Run

pip install "modest[github] @ git+https://github.com/bauwenst/MoDeST.git"

Repo layout

Currently, the repo looks as follows:

data/              ---> Datasets hosted specifically by MoDeST on GitHub. Will NOT be downloaded when you install the package.
src/modest/        ---> All source code for the Python package that will be installed in your interpreter.
    languages/     ---> Per-language definitions of the classes users will interact with.
    datasets/      ---> Support code for pulling in and reading remote data.
    formats/       ---> Support code for turning tag formats into objects. (Tag formats are independent of how the tags are stored.)
    interfaces/    ---> Declarations of the interfaces users will interact with.

Currently, every language has its own file under languages/. The assumption is that the datasets pertaining to one language are sufficiently encapsulated that importing from such a file stays uncluttered. There are two arguments in favour of moving from languages/{language}.py to languages/{language}/{dataset}.py instead:

  1. Autocompletion for the last . of the import suggests exactly the list of available datasets for that language;
  2. If you only need one dataset, you do not need to install all the packages required to download or build every dataset for that language. (Realistically, however, since MoDeST is meant for finished datasets rather than for creating them, the code for pulling them should not be that complicated.)
