
MoDeST: a Morphological Decomposition & Segmentation Trove.


The point of MoDeST is two-fold:

  1. Provide a general object-oriented Python interface to access morphological decompositions and segmentations;
  2. Host morphological datasets generated by smaller research groups that would otherwise have a hard time being found.

Morphological decomposition is the task of recognising which building blocks a word was originally constructed from. These building blocks are its morphemes. As an example, the Dutch derivation isometrisch ("isometric") can be decomposed into the morphemes iso, meter and isch.

Morphological segmentation is the task of isolating the substrings of a word that correspond to its morphemes. These substrings are called morphs. In the above example, the segmentation would be iso/metr/isch; note that the morph metr is a shortened form of the morpheme meter.
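To make the distinction concrete, here is a minimal sketch that stores both views of the isometrisch example side by side. The `MorphAnalysis` class and its fields are hypothetical illustrations, not MoDeST's own classes. The property it enforces is that morphs are substrings that concatenate back to the word, whereas morphemes need not be (meter is not a substring of isometrisch).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MorphAnalysis:
    """Hypothetical container (not MoDeST's API) for one word's analyses."""
    word: str
    morphemes: Tuple[str, ...]  # decomposition: abstract building blocks
    morphs: Tuple[str, ...]     # segmentation: substrings of the word

    def __post_init__(self) -> None:
        # A segmentation must cover the word exactly, in order.
        if "".join(self.morphs) != self.word:
            raise ValueError(f"morphs {self.morphs} do not spell {self.word!r}")

    def segmentation(self, sep: str = "/") -> str:
        return sep.join(self.morphs)


analysis = MorphAnalysis(
    word="isometrisch",
    morphemes=("iso", "meter", "isch"),
    morphs=("iso", "metr", "isch"),
)
print(analysis.segmentation())  # iso/metr/isch
# The morpheme "meter" is not a substring of the word; the morph "metr" is.
print("meter" in analysis.word, "metr" in analysis.word)  # False True
```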

Installation

Run

pip install "modest[github] @ git+https://github.com/bauwenst/MoDeST.git"
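After installing, you can check from your interpreter that the package can be found. This sketch assumes the import name is `modest`, matching the src/modest/ directory in the repo layout; it only locates the package and does not load any datasets.

```python
import importlib.util


def modest_is_installed() -> bool:
    """Return True if a top-level `modest` package can be found (assumed import name)."""
    return importlib.util.find_spec("modest") is not None


if __name__ == "__main__":
    if modest_is_installed():
        print("modest is importable")
    else:
        print("modest not found; re-run the pip install command")
```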

Repo layout

Currently, the repo looks as follows:

data/              ---> Datasets hosted specifically by MoDeST on GitHub. Will NOT be downloaded when you install the package.
src/modest/        ---> All source code for the Python package that will be installed in your interpreter.
    datasets/      ---> Per-language definitions of the classes users will interact with.
    downloaders/   ---> Support code for pulling in remote data.
    formats/       ---> Support code for interpreting file formats by specific authors/organisations.
    interfaces/    ---> Declarations of the interfaces users will interact with.

Every language has its own subpackage under datasets/. The assumption is that a language may have multiple datasets and that a dataset may need more than one class definition to work (even though most of the support code should live under formats/). If this assumption turns out not to hold, we might move from datasets/{language}/{dataset}.py to datasets/{language}.py to simplify imports.
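A rough sketch of how the subpackages might divide the work: an abstract interface (the role of interfaces/), a parser for one file format (the role of formats/), and a thin per-language dataset class (the role of datasets/{language}/{dataset}.py). All class, function, and module names here are hypothetical and do not reflect MoDeST's actual code, and the data is an inline string rather than a download.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple


# interfaces/: what users program against (hypothetical name)
class SegmentationDataset(ABC):
    @abstractmethod
    def generate(self) -> Iterator[Tuple[str, List[str]]]:
        """Yield (word, morphs) pairs."""


# formats/: reusable parsing for one (hypothetical) file format
def parse_slash_format(text: str) -> Iterator[Tuple[str, List[str]]]:
    # Each non-empty line holds one segmentation written as morph/morph/...
    for line in text.splitlines():
        line = line.strip()
        if line:
            morphs = line.split("/")
            yield "".join(morphs), morphs


# datasets/dutch/example.py: thin, language-specific glue.
# Under the current layout this would be imported from a path like
# modest.datasets.dutch.example; under the flatter one, modest.datasets.dutch.
class ExampleDutchDataset(SegmentationDataset):
    def __init__(self, raw: str) -> None:
        self._raw = raw  # in practice, this text would come from a downloader

    def generate(self) -> Iterator[Tuple[str, List[str]]]:
        return parse_slash_format(self._raw)


for word, morphs in ExampleDutchDataset("iso/metr/isch\n").generate():
    print(word, morphs)  # isometrisch ['iso', 'metr', 'isch']
```

Keeping parsing in a format module means several datasets by the same author can share it, while each dataset class stays small.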



Download files

Download the file for your platform.

Source Distribution

modest_bauwenst-2024.7.1.tar.gz (26.3 kB)

Built Distribution


modest_bauwenst-2024.7.1-py3-none-any.whl (27.5 kB)

File details

Details for the file modest_bauwenst-2024.7.1.tar.gz.

File metadata

  • Download URL: modest_bauwenst-2024.7.1.tar.gz
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.5 cpython/3.13.12 HTTPX/0.28.1

File hashes

Hashes for modest_bauwenst-2024.7.1.tar.gz
Algorithm Hash digest
SHA256 454748faca567e71efaaa9a998af110c2453916c368534e783999122490d261f
MD5 4b5153708f9bd5b821ef895f48ce8a48
BLAKE2b-256 66083c742c05a8214e755d1bf37ed68d82fccef3acb7cfa57e783be21644d48c


File details

Details for the file modest_bauwenst-2024.7.1-py3-none-any.whl.

File metadata

  • Download URL: modest_bauwenst-2024.7.1-py3-none-any.whl
  • Size: 27.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.5 cpython/3.13.12 HTTPX/0.28.1

File hashes

Hashes for modest_bauwenst-2024.7.1-py3-none-any.whl
Algorithm Hash digest
SHA256 75909f5bd99fd2b617731023cc934c4568ff14ccfd35cab4d084b6f87f47459f
MD5 76a2c5b81827c46ca8fb29b86a5993af
BLAKE2b-256 21a4e19d9546a5adbecaa0ce592a9bd3bb920793a1784bfc9854c242b674c7ea
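To check a downloaded file against the SHA256 digests listed above, you can hash it locally. This is a minimal sketch using only the standard library; it assumes the file sits in the current directory under its published name. pip can also enforce digests itself through its hash-checking mode.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalise the expected value first.
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("modest_bauwenst-2024.7.1-py3-none-any.whl")
    expected = "75909f5bd99fd2b617731023cc934c4568ff14ccfd35cab4d084b6f87f47459f"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```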

