Extension for nlp-pie package
Pie Extended
Warning: This software is only compatible with up to Python 3.7 for the moment.
Extension for pie to include taggers with their models and pre/postprocessors.
Pie is a wonderful tool to train models, and most of the time it will be enough. What pie_extended proposes is to give you the tools you need to share your models together with customized pre- and post-processing.
The current system provides easier ways to add customized:
- normalization of your text,
- sentence tokenization,
- word tokenization,
- disambiguation,
- output formatting
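To make the list above concrete, here is a minimal sketch of how such customizable steps could chain together. It does not use pie_extended at all: every function name below (`normalize`, `split_sentences`, `split_words`, `format_output`, `run_pipeline`) is hypothetical, and the tagging and disambiguation steps are left out because they depend on a trained model.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Normalization: collapse runs of whitespace and strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Sentence tokenization: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def split_words(sentence: str) -> List[str]:
    """Word tokenization: separate runs of word characters from punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def format_output(tokens: List[str]) -> List[Dict[str, str]]:
    """Output formatting: one dictionary per token (no model is involved here)."""
    return [{"form": tok, "treated": tok.lower()} for tok in tokens]


def run_pipeline(text: str) -> List[List[Dict[str, str]]]:
    """Chain the steps; a real pipeline would tag tokens before formatting."""
    return [
        format_output(split_words(sentence))
        for sentence in split_sentences(normalize(text))
    ]


print(run_pipeline("Lorem  ipsum dolor.\nSit amet!"))
```

Each step is a plain function, so any one of them can be swapped for a language-specific version without touching the others.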
Cite as
@software{thibault_clerice_2020_3883590,
author = {Clérice, Thibault},
title = {Pie Extended, an extension for Pie with pre-processing and post-processing},
month = jun,
year = 2020,
publisher = {Zenodo},
doi = {10.5281/zenodo.3883589},
url = {https://doi.org/10.5281/zenodo.3883589}
}
Current supported languages
- Classical Latin (Model: lasla)
- Ancient Greek (Model: grc)
- Old French (Model: fro)
- Early Modern French (Model: freem)
- Classical French (Model: fr)
- Old Dutch (Model: dum)
If you trained models and want some help sharing them with Pie Extended, open an issue :)
Install
To install, simply run pip install pie-extended. Then, look at all available models.
WARNING: if you don't have a GPU or CUDA, or in case of doubt, run pip install pie-extended --extra-index-url https://download.pytorch.org/whl/cpu instead.
Run on terminal
On top of that, pie_extended provides a quick and easy way to use other people's models. For example, in a shell:
pie-extended download lasla
pie-extended install-addons lasla
pie-extended tag lasla your_file.txt
will give you access to all you need!
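If you would rather drive these three commands from a script, the following is one possible sketch. It only assembles the command lines shown above; the helper names `build_commands` and `run_commands` are hypothetical, and the dry-run default prints the commands instead of executing them.

```python
import subprocess
from typing import List


def build_commands(model: str, input_file: str) -> List[List[str]]:
    """Return the download, install-addons and tag commands for one model."""
    return [
        ["pie-extended", "download", model],
        ["pie-extended", "install-addons", model],
        ["pie-extended", "tag", model, input_file],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it and stop at the first failure."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


run_commands(build_commands("lasla", "your_file.txt"))
```

Pass `dry_run=False` once pie-extended is installed and available on your PATH.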
Python API
You can run the lemmatizer in your own scripts and retrieve token annotations as dictionaries:
from typing import List
from pie_extended.cli.utils import get_tagger, get_model, download

# In case you need to download
do_download = False
if do_download:
    # Consume the iterator returned by download()
    for dl in download("lasla"):
        pass

# model_path allows you to override the model loaded by another .tar
model_name = "lasla"
tagger = get_tagger(model_name, batch_size=256, device="cpu", model_path=None)

sentences: List[str] = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit. "]

# Get the main objects from the model: data iterator + postprocessor
from pie_extended.models.lasla.imports import get_iterator_and_processor

for sentence_group in sentences:
    iterator, processor = get_iterator_and_processor()
    print(tagger.tag_str(sentence_group, iterator=iterator, processor=processor))
will result in
[{'form': 'lorem', 'lemma': 'lor', 'POS': 'NOMcom', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'lorem'},
{'form': 'ipsum', 'lemma': 'ipse', 'POS': 'PROdem', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'ipsum'},
{'form': 'dolor', 'lemma': 'dolor', 'POS': 'NOMcom', 'morph': 'Case=Nom|Numb=Sing', 'treated': 'dolor'},
{'form': 'sit', 'lemma': 'sum1', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3',
'treated': 'sit'},
{'form': 'amet', 'lemma': 'amo', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3',
'treated': 'amet'}, {'form': ',', 'lemma': ',', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': ','},
{'form': 'consectetur', 'lemma': 'consector2', 'POS': 'VER',
'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Dep|Person=3', 'treated': 'consectetur'},
{'form': 'adipiscing', 'lemma': 'adipiscor', 'POS': 'VER', 'morph': 'Tense=Pres|Voice=Dep', 'treated': 'adipiscing'},
{'form': 'elit', 'lemma': 'elio', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Ind|Tense=Pres|Voice=Act|Person=3',
'treated': 'elit'}, {'form': '.', 'lemma': '.', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': '.'}]
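Since the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below turns such a list into tab-separated text; `get_pos` and `to_tsv` are hypothetical helpers, not part of pie_extended. It assumes the keys shown in the output above, including the fact that punctuation tokens there carry a lowercase 'pos' key instead of 'POS'.

```python
from typing import Dict, List, Sequence


def get_pos(token: Dict[str, str]) -> str:
    """Read the part of speech from either the 'POS' or the 'pos' key."""
    return token.get("POS", token.get("pos", ""))


def to_tsv(
    tokens: List[Dict[str, str]],
    columns: Sequence[str] = ("form", "lemma", "POS", "morph"),
) -> str:
    """Render tokens as tab-separated lines, preceded by a header line."""
    lines = ["\t".join(columns)]
    for token in tokens:
        row = [
            get_pos(token) if column == "POS" else token.get(column, "")
            for column in columns
        ]
        lines.append("\t".join(row))
    return "\n".join(lines)


# Two tokens copied from the output above
sample = [
    {"form": "lorem", "lemma": "lor", "POS": "NOMcom",
     "morph": "Case=Acc|Numb=Sing", "treated": "lorem"},
    {"form": ",", "lemma": ",", "pos": "PUNC",
     "morph": "MORPH=empty", "treated": ","},
]
print(to_tsv(sample))
```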
Add a model
- Create a package in ./pie_extended/models/. Example: foo.
- Add the name of the package in ./pie_extended/models/__init__.py in the variable modules.
- In the module pie_extended.models.foo, we should find the following variables:
  - Models: a string with filenames and tasks for Pie.
  - DESC: a Metadata object that bears information about the model.
  - DOWNLOADS: a list of files to download.
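from pie_extended.utils import Metadata, File, get_path

DESC = Metadata(
    "Foo",
    "language",
    ["Author 1", "Author 2"],
    "A readable description",
    "A link to more information"
)

DOWNLOADS = [
    # One file per <...> group in Models (links and names are placeholders)
    File("/a/link/to/a/file", "local_name_of_the_file.tar"),
    File("/a/link/to/another/file", "local_name_of_another_file.tar")
]

# format() needs one path per {} placeholder
Models = "<{},task1,task2><{},lemma,pos>".format(
    get_path("foo", "local_name_of_the_file.tar"),
    get_path("foo", "local_name_of_another_file.tar")
)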
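To see what the Models string encodes, the sketch below splits one back into (file, tasks) pairs. It is not how Pie itself reads the string: `parse_model_spec` is a hypothetical helper, the paths are made up, and it assumes that each group is wrapped in angle brackets with the file path first and that paths contain no commas.

```python
import re
from typing import List, Tuple


def parse_model_spec(spec: str) -> List[Tuple[str, List[str]]]:
    """Split '<path,task,...><path,task,...>' into (path, tasks) pairs."""
    parsed = []
    for group in re.findall(r"<([^<>]+)>", spec):
        # The first comma-separated field is the file, the rest are tasks
        path, *tasks = group.split(",")
        parsed.append((path, tasks))
    return parsed


# Made-up paths, standing in for what get_path() would return
spec = "<{},task1,task2><{},lemma,pos>".format(
    "/models/foo/a.tar", "/models/foo/b.tar"
)
for path, tasks in parse_model_spec(spec):
    print(path, tasks)
```

Read this way, each group names one model file together with the tasks that file is responsible for.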
- In the module pie_extended.models.foo.imports, we should find the following content:
  - get_iterator_and_processor: a function that returns a DataIterator and a Processor
  - (optionally) addons: a function that installs add-ons
  - (optionally) Disambiguator: a disambiguator instance (or an object creator that returns one)
Check for a simple example in pie_extended.models.fro.imports and a more complex one in pie_extended.models.lasla.imports.
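Before opening a pull request, it can help to check that a new model package exposes the names listed in this section. The sketch below is a hypothetical helper, not something shipped with pie_extended; it only tests that the attributes exist, and it uses a stand-in object so that it runs without the package installed.

```python
from types import SimpleNamespace
from typing import Iterable, List

# Names taken from the checklist in this section
REQUIRED_MODEL_ATTRS = ("Models", "DESC", "DOWNLOADS")
REQUIRED_IMPORTS_ATTRS = ("get_iterator_and_processor",)
OPTIONAL_IMPORTS_ATTRS = ("addons", "Disambiguator")


def missing_attributes(module: object, required: Iterable[str]) -> List[str]:
    """Return the required names that the module does not define."""
    return [name for name in required if not hasattr(module, name)]


# Stand-in for an incomplete model module: DOWNLOADS was forgotten
fake_model = SimpleNamespace(Models="<foo.tar,lemma,pos>", DESC=object())
print(missing_attributes(fake_model, REQUIRED_MODEL_ATTRS))  # ['DOWNLOADS']
```

With the package installed, the same function could be applied to the result of `importlib.import_module("pie_extended.models.foo")` and of the matching `.imports` module.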
Install development version (⚠ for development only)
Clone the repository, create an environment, and then
python setup.py develop
Warning
This is an extremely early build, subject to change here and there. But it is functional!