Python module to decompose nouns based on the SECOS algorithm

SECOS

This repo is a modular Python implementation of the SECOS algorithm for decomposing compound nouns.

Based on the SECOS algorithm:

original Python implementation

original paper

However, the training data for the pretrained models has been trimmed slightly to reduce model size. Typically this means removing words with low frequency counts, words containing non-Unicode characters, and similar. More information can be found in the pretrained-models directory in the GitHub repo.
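The kind of trimming described above can be illustrated with a short sketch. This is not the procedure used for the published models: the `trim_vocabulary` helper, the `min_count` threshold, and the use of `str.isalpha()` as the character check are all assumptions made for illustration.

```python
def trim_vocabulary(counts, min_count=5):
    """Keep words that are frequent enough and purely alphabetic.

    Both criteria are illustrative assumptions, not the settings
    used to build the pretrained models.
    """
    return {
        word: count
        for word, count in counts.items()
        if count >= min_count and word.isalpha()
    }


# Toy frequency table (made-up numbers, for illustration only).
counts = {"ministerium": 120, "finanz": 80, "xq#7": 40, "bundesz": 1}
trimmed = trim_vocabulary(counts)
```

Here the entry containing symbols and the entry seen only once are dropped, and the two frequent alphabetic words are kept.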

Installation

From GitHub

pip install git+https://github.com/mhaugestad/noun-splitter.git -U

From Source

git clone https://github.com/mhaugestad/noun-splitter.git
cd noun-splitter
pip install -e . -U

From Pip

pip install noun-splitter

Installing models:

The module relies on pretrained models being passed in. These can be downloaded from the command line as follows:

python -m secos download --model de

The command line tool also takes an optional --overwrite argument. Use it to download a model again when a copy already exists locally:

python -m secos download --model no --overwrite

Alternatively, you can download models directly from a Python script or notebook like this:

from secos import Decomposition

Decomposition.download_model('de')

Available models and their names are:

Language Model
Danish da
German de
English en
Spanish es
Estonian et
Finnish fi
Hungarian hu
Latin la
Latvian lv
Dutch nl
Norwegian no
Swedish sv
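
Because model names are short codes, a typo is easy to make. The following is a minimal sketch of checking codes before downloading, assuming the table above is complete. The `AVAILABLE_MODELS` mapping and the `download_models` helper are hypothetical names, not part of the package, and the `download` argument stands in for `Decomposition.download_model`.

```python
# Codes copied from the table above; this mapping is not exported by secos.
AVAILABLE_MODELS = {
    "da": "Danish", "de": "German", "en": "English", "es": "Spanish",
    "et": "Estonian", "fi": "Finnish", "hu": "Hungarian", "la": "Latin",
    "lv": "Latvian", "nl": "Dutch", "no": "Norwegian", "sv": "Swedish",
}


def download_models(codes, download):
    """Validate every code first, then call `download` once per code.

    `download` is any callable that takes a model code, for example
    Decomposition.download_model.
    """
    unknown = sorted(c for c in set(codes) if c not in AVAILABLE_MODELS)
    if unknown:
        # Fail before any download starts so a typo does not leave a partial set.
        raise ValueError(f"Unknown model code(s): {', '.join(unknown)}")
    for code in codes:
        download(code)
```

With the package installed, a call such as `download_models(["de", "no"], Decomposition.download_model)` would fetch both models, while an unrecognised code raises a `ValueError` before anything is downloaded.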

Basic Usage

from secos import Decomposition

model = Decomposition.load_model('de')

secos = Decomposition(model)

secos.decompose("Bundesfinanzministerium")

['bundes', 'finanz', 'ministerium']
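
To decompose many tokens, the call above can be wrapped in a small helper. This is a sketch under stated assumptions: `decompose_tokens` is a hypothetical helper, and `fake_decompose` is a stand-in with a hard-coded split so the example runs without a downloaded model. In practice you would pass `secos.decompose` instead.

```python
def decompose_tokens(tokens, decompose):
    """Map each distinct token to its parts, calling `decompose` once per token."""
    parts_by_token = {}
    for token in tokens:
        if token not in parts_by_token:
            parts_by_token[token] = decompose(token)
    return parts_by_token


def fake_decompose(word):
    """Stand-in for secos.decompose with a hard-coded split (illustration only)."""
    known = {"Bundesfinanzministerium": ["bundes", "finanz", "ministerium"]}
    # Unknown words come back lowercased and unsplit; this is an assumption of
    # the stand-in, not documented behaviour of the real model.
    return known.get(word, [word.lower()])


tokens = ["Bundesfinanzministerium", "Haus", "Bundesfinanzministerium"]
result = decompose_tokens(tokens, fake_decompose)
```

Repeated tokens are decomposed only once, which can help when the same compounds recur across a corpus.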

Evaluation

The evaluation folder in the GitHub repo includes code for evaluating the pretrained models.
