MoDeST: a Morphological Decomposition & Segmentation Trove.
The point of MoDeST is two-fold:
- Provide a general object-oriented Python interface to access morphological decompositions and segmentations;
- Host morphological datasets generated by smaller research groups that would otherwise have a hard time being found.
Morphological decomposition is the task of recognising which building blocks a word was originally constructed from. These building blocks are its morphemes.
As an example, the Dutch derivation isometrisch ("isometric") can be decomposed into the morphemes iso, meter and isch.
Morphological segmentation is the task of isolating the substrings of a word that correspond to its morphemes. These substrings are called morphs.
In the above example, the segmentation would be iso/metr/isch: the morpheme meter surfaces as the morph metr.
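To make the distinction concrete, here is a minimal Python sketch of the isometrisch example. The class `AnalysedWord` and its method are hypothetical names chosen for illustration; they are not MoDeST's interface. The sketch only shows the one property that separates the two tasks: morphs must concatenate back into the word, whereas morphemes need not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysedWord:
    """Hypothetical container for illustration only; NOT part of MoDeST's API."""
    word: str
    morphemes: tuple[str, ...]  # decomposition: the underlying building blocks
    morphs: tuple[str, ...]     # segmentation: substrings of the surface form

    def is_valid_segmentation(self) -> bool:
        # Morphs are substrings in order, so joining them must give back the word.
        return "".join(self.morphs) == self.word


example = AnalysedWord(
    word="isometrisch",
    morphemes=("iso", "meter", "isch"),
    morphs=("iso", "metr", "isch"),
)

print("/".join(example.morphs))
print(example.is_valid_segmentation())
```

Joining the morphemes instead would give "isometerisch", which is not the original word. This is why decomposition and segmentation are treated as separate tasks.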
Installation
Run
pip install "modest[github] @ git+https://github.com/bauwenst/MoDeST.git"
Repo layout
Currently, the repo looks as follows:
data/              ---> Datasets hosted specifically by MoDeST on GitHub. Will NOT be downloaded when you install the package.
src/modest/        ---> All source code for the Python package that will be installed in your interpreter.
    datasets/      ---> Per-language definitions of the classes users will interact with.
    downloaders/   ---> Support code for pulling in remote data.
    formats/       ---> Support code for interpreting file formats by specific authors/organisations.
    interfaces/    ---> Declarations of the interfaces users will interact with.
Currently, every language has its own subpackage under datasets/. The assumption is that every language may have multiple
datasets and that every dataset may need more than a single class definition to work (even if most of the support code should
be under formats/). If this turns out not to be the case in the future, we might switch from datasets/{language}/{dataset}.py
to datasets/{language}.py to simplify imports.