Extension for nlp-pie package
Pie Extended
Extension for pie to include taggers with their models and pre/postprocessors.
Pie is a wonderful tool to train models, and most of the time it will be enough. What pie_extended proposes is to provide you with the necessary tools to share your models together with customized pre- and post-processing.
The current system provides easier access to adding customized:
- normalization of your text,
- sentence tokenization,
- word tokenization,
- disambiguation,
- output formatting
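To make the role of these stages concrete, here is a minimal sketch of such a pipeline in plain Python. It does not use the pie_extended API: every function name below (normalize, split_sentences, split_words, fake_tagger, format_tsv, run_pipeline) is hypothetical, the tagger is a placeholder that copies the lowercased form as lemma, and the disambiguation stage is left out.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace (stand-in for project-specific normalization)."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def split_words(sentence: str) -> List[str]:
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def fake_tagger(tokens: List[str]) -> List[Dict[str, str]]:
    """Placeholder for a real model: the lemma is just the lowercased form."""
    return [{"form": t.lower(), "lemma": t.lower()} for t in tokens]


def format_tsv(rows: List[Dict[str, str]]) -> str:
    """Output formatting: one token per line, tab-separated."""
    lines = ["form\tlemma"]
    lines += ["{}\t{}".format(r["form"], r["lemma"]) for r in rows]
    return "\n".join(lines)


def run_pipeline(text: str) -> str:
    """Chain the stages: normalize, split sentences, split words, tag, format."""
    rows: List[Dict[str, str]] = []
    for sentence in split_sentences(normalize(text)):
        rows.extend(fake_tagger(split_words(sentence)))
    return format_tsv(rows)
```

Calling run_pipeline("Lorem  ipsum. Dolor sit.") returns a small TSV table with one row per token. In pie_extended each of these stages is supplied by the model package rather than hard-coded like this.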
Install
To install, simply run pip install pie-extended. Then, look at all available models.
Run on terminal
On top of that, it provides a quick and easy way to use other people's models! For example, in a shell:
pie-extended download lasla
pie-extended install-addons lasla
pie-extended tag lasla your_file.txt
will give you access to all you need!
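If you prefer to drive the same three steps from a script, a minimal sketch using the standard subprocess module could look like the following. The helper names pie_extended_commands and run_all are hypothetical, and the sketch assumes the pie-extended executable is on your PATH; it only reuses the subcommands shown above.

```python
import subprocess
from typing import List


def pie_extended_commands(model: str, text_file: str) -> List[List[str]]:
    """Build the argument lists for download, add-on installation and tagging."""
    return [
        ["pie-extended", "download", model],
        ["pie-extended", "install-addons", model],
        ["pie-extended", "tag", model, text_file],
    ]


def run_all(model: str, text_file: str) -> None:
    """Run each command in order and stop at the first failure."""
    for argv in pie_extended_commands(model, text_file):
        subprocess.run(argv, check=True)
```

For example, run_all("lasla", "your_file.txt") would execute the three commands one after the other.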
Python API
You can run the lemmatizer in your own scripts and retrieve token annotations as dictionaries:
from typing import List
from pie_extended.cli.sub import get_tagger, get_model, download
# In case you need to download the model first
do_download = False
if do_download:
    # Looping over download() is what drives the download; the yielded values are ignored
    for dl in download("lasla"):
        pass
# model_path allows you to override the loaded model with another .tar
model_name = "lasla"
tagger = get_tagger(model_name, batch_size=256, device="cpu", model_path=None)
sentences: List[str] = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit. "]
# Get the main objects from the model: data iterator + postprocessor
from pie_extended.models.lasla.imports import get_iterator_and_processor
for sentence_group in sentences:
    iterator, processor = get_iterator_and_processor()
    print(tagger.tag_str(sentence_group, iterator=iterator, processor=processor))
will result in
[{'form': 'lorem', 'lemma': 'lor', 'POS': 'NOMcom', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'lorem'},
{'form': 'ipsum', 'lemma': 'ipse', 'POS': 'PROdem', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'ipsum'},
{'form': 'dolor', 'lemma': 'dolor', 'POS': 'NOMcom', 'morph': 'Case=Nom|Numb=Sing', 'treated': 'dolor'},
{'form': 'sit', 'lemma': 'sum1', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3',
'treated': 'sit'},
{'form': 'amet', 'lemma': 'amo', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3',
'treated': 'amet'}, {'form': ',', 'lemma': ',', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': ','},
{'form': 'consectetur', 'lemma': 'consector2', 'POS': 'VER',
'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Dep|Person=3', 'treated': 'consectetur'},
{'form': 'adipiscing', 'lemma': 'adipiscor', 'POS': 'VER', 'morph': 'Tense=Pres|Voice=Dep', 'treated': 'adipiscing'},
{'form': 'elit', 'lemma': 'elio', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Ind|Tense=Pres|Voice=Act|Person=3',
'treated': 'elit'}, {'form': '.', 'lemma': '.', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': '.'}]
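Since the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below works on three tokens copied from the output above rather than on a live tagger; parse_morph and get_pos are hypothetical helper names, not part of pie_extended. Note that the sample output uses 'POS' for words and 'pos' for punctuation, so the helper checks both keys.

```python
from typing import Dict, List


def parse_morph(morph: str) -> Dict[str, str]:
    """Turn 'Case=Acc|Numb=Sing' into {'Case': 'Acc', 'Numb': 'Sing'}."""
    features: Dict[str, str] = {}
    for pair in morph.split("|"):
        key, sep, value = pair.partition("=")
        if sep:  # skip fragments that have no '='
            features[key] = value
    return features


def get_pos(token: Dict[str, str]) -> str:
    """Return the part of speech whether it is stored under 'POS' or 'pos'."""
    return token.get("POS", token.get("pos", ""))


# Three tokens copied from the sample output
tokens: List[Dict[str, str]] = [
    {"form": "lorem", "lemma": "lor", "POS": "NOMcom",
     "morph": "Case=Acc|Numb=Sing", "treated": "lorem"},
    {"form": "sit", "lemma": "sum1", "POS": "VER",
     "morph": "Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3", "treated": "sit"},
    {"form": ",", "lemma": ",", "pos": "PUNC",
     "morph": "MORPH=empty", "treated": ","},
]

# Keep the lemmas of everything that is not punctuation
lemmas = [t["lemma"] for t in tokens if get_pos(t) != "PUNC"]
# Expand the morphology of the first token into a dictionary
first_morph = parse_morph(tokens[0]["morph"])
```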
Add a model
- Create a package in ./pie_extended/models/. Example: foo.
- Add the name of the package in ./pie_extended/models/__init__.py in the variable modules.
- In the module pie_extended.models.foo, we should find the following variables:
  - Models: a string with filenames and tasks for Pie.
  - DESC: a Metadata object that bears information about the model.
  - DOWNLOADS: a list of files to download.
from pie_extended.utils import Metadata, File, get_path
DESC = Metadata(
    "Foo",
    "language",
    ["Author 1", "Author 2"],
    "A readable description",
    "A link to more information"
)
DOWNLOADS = [
    File("/a/link/to/a/file", "local_name_of_the_file.tar")
]
# {0} reuses the same local file for both groups of tasks
Models = "<{0},task1,task2><{0},lemma,pos>".format(
    get_path("foo", "local_name_of_the_file.tar")
)
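The Models string packs one <file,task,...> group per model file. As an illustration of that layout only, here is a small sketch that builds and parses such a string; build_models_string and parse_models_string are hypothetical helpers, not functions provided by pie_extended or Pie, and the sketch assumes file paths contain no commas or angle brackets.

```python
import re
from typing import List, Sequence, Tuple


def build_models_string(entries: Sequence[Tuple[str, Sequence[str]]]) -> str:
    """Join (path, tasks) pairs into '<path,task1,task2><path,...>'."""
    return "".join(
        "<{}>".format(",".join([path, *tasks])) for path, tasks in entries
    )


def parse_models_string(models: str) -> List[Tuple[str, List[str]]]:
    """Inverse of build_models_string: recover (path, tasks) pairs."""
    parsed = []
    for group in re.findall(r"<([^<>]+)>", models):
        path, *tasks = group.split(",")
        parsed.append((path, tasks))
    return parsed


example = build_models_string([
    ("model_a.tar", ["task1", "task2"]),
    ("model_b.tar", ["lemma", "pos"]),
])
```

Here example is "<model_a.tar,task1,task2><model_b.tar,lemma,pos>", and parse_models_string(example) gives back the original pairs.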
- In the module pie_extended.models.foo.imports, we should find the following content:
  - get_iterator_and_processor: a function that returns a DataIterator and a Processor
  - (optionally) addons: a function that installs add-ons
  - (optionally) Disambiguator: a disambiguator instance (or an object creator that returns one)
Check for a simple example in pie_extended.models.fro.imports and a more complex one in pie_extended.models.lasla.imports.
Warning
This is an extremely early build, subject to change here and there, but it is functional!