Python module to decompose nouns based on the SECOS algorithm
SECOS
This repo is a modular Python implementation of the SECOS algorithm for decomposing compound nouns.
It is based on the SECOS algorithm and its original Python implementation.
However, the training data for the models has been distilled slightly to reduce the size of the models. Typically this involves trimming out words with low frequency counts, words with non-unicode characters, and similar entries. More information can be found in the pretrained-models directory in the GitHub repo.
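The kind of trimming described above can be illustrated with a minimal sketch. It is not the procedure used to build the pretrained models: the function name `distill_vocabulary`, the `min_count` threshold, and the use of `str.isalpha()` as a stand-in for the character filter are all assumptions made for illustration.

```python
from collections import Counter


def distill_vocabulary(counts, min_count=2):
    """Keep words that meet a frequency threshold and consist only of letters.

    Hypothetical helper: the threshold and the isalpha() check are
    illustrative assumptions, not the rules used for the pretrained models.
    """
    kept = {}
    for word, count in counts.items():
        if count < min_count:
            continue  # drop low-frequency words
        if not word.isalpha():
            continue  # drop tokens containing digits, punctuation, or debris
        kept[word] = count
    return kept


# Toy frequency table built from a tiny word list.
counts = Counter(["haus", "haus", "tür", "tür", "tür", "x9z", "x9z", "selten"])
print(distill_vocabulary(counts, min_count=2))
# prints {'haus': 2, 'tür': 3}
```

A smaller vocabulary shrinks the model files, at the cost of possibly missing rare compound parts.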
Installation
From GitHub
pip install git+https://github.com/mhaugestad/noun-splitter.git -U
From Source
git clone
cd noun-splitter
pip install -e . -U
From Pip
pip install noun-splitter
Installing models:
The module relies on pretrained models being passed in. These can be downloaded from the command line as follows:
python -m secos download --model de
The command line tool also takes an optional --overwrite argument. Use it to redownload a model that is already present, as follows:
python -m secos download --model no --overwrite
Alternatively, you can download models directly from a Python script or notebook like this:
from secos import Decomposition
Decomposition.download_model('de')
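For setup scripts that need several models, or that need the --overwrite behaviour, one option is to wrap the command line tool. The sketch below only uses the `download`, `--model` and `--overwrite` arguments shown above. The helper names `build_download_command` and `download_models` are hypothetical and not part of the package.

```python
import subprocess
import sys


def build_download_command(model, overwrite=False):
    """Return the argument list for `python -m secos download --model <model>`."""
    # sys.executable keeps the download in the same interpreter/environment.
    command = [sys.executable, "-m", "secos", "download", "--model", model]
    if overwrite:
        command.append("--overwrite")
    return command


def download_models(models, overwrite=False):
    """Run the download command once per model code; raises if a command fails."""
    for model in models:
        subprocess.run(build_download_command(model, overwrite), check=True)


# Show the command that would be run, without executing it.
print(build_download_command("no", overwrite=True)[1:])
# prints ['-m', 'secos', 'download', '--model', 'no', '--overwrite']

# To actually download (requires the package to be installed):
# download_models(["de", "no"])
```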
Available models and their names are:
Language | Model |
---|---|
Danish | da |
German | de |
English | en |
Spanish | es |
Estonian | et |
Finnish | fi |
Hungarian | hu |
Latin | la |
Latvian | lv |
Dutch | nl |
Norwegian | no |
Swedish | sv |
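When the language comes from user input or configuration, it can help to check it against the table before requesting a model. The following is a minimal sketch: the `AVAILABLE_MODELS` dictionary and the `model_code` function are hypothetical names that simply mirror the table above and are not provided by the package.

```python
# Language name -> model code, copied from the table above.
AVAILABLE_MODELS = {
    "danish": "da",
    "german": "de",
    "english": "en",
    "spanish": "es",
    "estonian": "et",
    "finnish": "fi",
    "hungarian": "hu",
    "latin": "la",
    "latvian": "lv",
    "dutch": "nl",
    "norwegian": "no",
    "swedish": "sv",
}


def model_code(language):
    """Return the model code for a language name, ignoring case and outer spaces."""
    key = language.strip().lower()
    try:
        return AVAILABLE_MODELS[key]
    except KeyError:
        raise ValueError(f"No pretrained model listed for {language!r}") from None


print(model_code("German"))
# prints de
```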
Basic Usage
from secos import Decomposition
model = Decomposition.load_model('de')
secos = Decomposition(model)
secos.decompose("Bundesfinanzministerium")
['bundes', 'finanz', 'ministerium']
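To split many words at once, the `decompose` call shown above can be applied in a loop. The sketch below assumes only that the object has a `decompose(word)` method returning a list of parts, as in the example above. `decompose_all` and the `FixedSplitter` stand-in are hypothetical and exist so the snippet runs without a downloaded model. How the real model treats words it cannot split is not stated here, so the stand-in's fallback is an assumption.

```python
def decompose_all(splitter, words):
    """Map each word to the parts returned by splitter.decompose(word)."""
    return {word: splitter.decompose(word) for word in words}


class FixedSplitter:
    """Hypothetical stand-in with the same decompose() method as the real object."""

    def __init__(self, table):
        self.table = table

    def decompose(self, word):
        # Assumption for this demo only: unknown words come back unsplit, lowercased.
        return self.table.get(word, [word.lower()])


demo = FixedSplitter({"Bundesfinanzministerium": ["bundes", "finanz", "ministerium"]})
print(decompose_all(demo, ["Bundesfinanzministerium", "Haus"]))
# prints {'Bundesfinanzministerium': ['bundes', 'finanz', 'ministerium'], 'Haus': ['haus']}
```

With the package installed and a model downloaded, the `secos` object from the example above could be passed in place of `demo`.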
Evaluation
The evaluation folder in the GitHub repo includes code for evaluating the pretrained models.
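As a rough illustration of what such an evaluation can look like, the sketch below computes exact-match accuracy against a small gold standard. This is an assumption about one reasonable metric, not a description of the code in the evaluation folder. `split_accuracy`, the toy gold data and the `toy_predict` function are all hypothetical.

```python
def split_accuracy(predict, gold):
    """Fraction of words whose predicted parts equal the gold parts exactly.

    predict: callable taking a word and returning a list of parts.
    gold: dict mapping each word to its expected list of parts.
    """
    if not gold:
        raise ValueError("gold must contain at least one word")
    correct = sum(1 for word, parts in gold.items() if predict(word) == parts)
    return correct / len(gold)


# Toy gold standard and a toy predictor that gets one of two words right.
gold = {
    "Bundesfinanzministerium": ["bundes", "finanz", "ministerium"],
    "Haustür": ["haus", "tür"],
}


def toy_predict(word):
    if word == "Bundesfinanzministerium":
        return ["bundes", "finanz", "ministerium"]
    return [word.lower()]  # leaves every other word unsplit


print(split_accuracy(toy_predict, gold))
# prints 0.5
```

With a loaded model, `secos.decompose` could be passed as `predict`.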