
Extension for nlp-pie package

Project description

Pie Extended


Warning: This software is currently compatible only with Python 3.7 and earlier.
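If a script should fail early on an interpreter newer than the 3.7 ceiling stated in the warning above, a minimal check might look like this. The function and constant names are hypothetical and are not part of pie_extended.

```python
import sys
from typing import Tuple

# Highest (major, minor) version that the warning above says is supported
MAX_SUPPORTED: Tuple[int, int] = (3, 7)


def is_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) part of `version` is at most MAX_SUPPORTED."""
    return tuple(version[:2]) <= MAX_SUPPORTED


if not is_supported():
    # Only warn: newer interpreters may work, but are not covered by the statement above
    print(
        "Warning: pie-extended is only known to work up to Python 3.7; found %d.%d"
        % sys.version_info[:2],
        file=sys.stderr,
    )
```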

An extension for Pie that bundles taggers together with their models and their pre- and post-processors.

Pie is a wonderful tool for training models, and most of the time it is all you need. pie_extended adds the tools needed to share your models together with customized pre- and post-processing.

The current system makes it easier to add customized:

  • normalization of your text,
  • sentence tokenization,
  • word tokenization,
  • disambiguation,
  • output formatting
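Conceptually, the customizations listed above wrap the Pie tagger in a chain of steps. The sketch below illustrates that order with plain functions and a stand-in tagger; every name in it (normalize, split_sentences, split_words, format_output, pipeline) is hypothetical and stands in for pie_extended's own DataIterator and Processor objects, whose real interfaces differ. Disambiguation is left out for brevity.

```python
import re
import unicodedata
from typing import Callable, Dict, List

Token = Dict[str, str]


def normalize(text: str) -> str:
    """Normalization: apply Unicode NFC and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFC", text)).strip()


def split_sentences(text: str) -> List[str]:
    """Sentence tokenization: naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def split_words(sentence: str) -> List[str]:
    """Word tokenization: words and punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def format_output(forms: List[str], annotations: List[Token]) -> List[Token]:
    """Output formatting: keep each original form next to its annotations."""
    return [{"form": form, **ann} for form, ann in zip(forms, annotations)]


def pipeline(text: str, tag: Callable[[str], Token]) -> List[Token]:
    tokens: List[Token] = []
    for sentence in split_sentences(normalize(text)):
        words = split_words(sentence)
        # In pie_extended, the Pie model would produce these annotations
        annotations = [tag(word) for word in words]
        tokens.extend(format_output(words, annotations))
    return tokens
```

For example, `pipeline("Lorem  ipsum dolor. Sit amet!", lambda w: {"lemma": w.lower()})` yields one dictionary per token, from `{"form": "Lorem", "lemma": "lorem"}` to `{"form": "!", "lemma": "!"}`.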

Cite as

@software{thibault_clerice_2020_3883590,
  author       = {Clérice, Thibault},
  title        = {Pie Extended, an extension for Pie with pre-processing and post-processing},
  month        = jun,
  year         = 2020,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.3883589},
  url          = {https://doi.org/10.5281/zenodo.3883589}
}

Currently supported languages

  • Classical Latin (Model: lasla)
  • Ancient Greek (Model: grc)
  • Old French (Model: fro)
  • Early Modern French (Model: freem)
  • Classical French (Model: fr)

If you have trained models and want help sharing them with Pie Extended, open an issue :)

Install

To install, run pip install pie-extended. Then, look at all the available models.

Run in a terminal

On top of that, Pie Extended provides a quick and easy way to use other people's models. For example, in a shell:

pie-extended download lasla
pie-extended install-addons lasla
pie-extended tag lasla your_file.txt

These commands download the lasla model, install its add-ons, and tag your_file.txt.
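If you would rather drive the three commands above from Python, for instance to tag several files in a row, a minimal sketch could build each command line and hand it to subprocess. The helper names are hypothetical; only the pie-extended subcommands and arguments come from this page, and commands are actually executed only when run=True.

```python
import subprocess
from typing import List


def pie_extended_commands(model: str, files: List[str]) -> List[List[str]]:
    """Build argv lists: download the model, install its add-ons, then tag each file."""
    commands = [
        ["pie-extended", "download", model],
        ["pie-extended", "install-addons", model],
    ]
    # One tag invocation per input file
    commands.extend(["pie-extended", "tag", model, path] for path in files)
    return commands


def run_all(commands: List[List[str]], run: bool = False) -> None:
    """Print each command; execute it only if run is True."""
    for argv in commands:
        print(" ".join(argv))
        if run:
            # check=True raises CalledProcessError on the first failing command
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(pie_extended_commands("lasla", ["your_file.txt"]))
```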

Python API

You can run the lemmatizer in your own scripts and retrieve token annotations as dictionaries:

from typing import List
from pie_extended.cli.utils import get_tagger, download
from pie_extended.models.lasla.imports import get_iterator_and_processor

# Set to True if the model has not been downloaded yet
do_download = False
if do_download:
    # download() yields as it goes: exhaust it to fetch every file
    for _ in download("lasla"):
        pass

# model_path allows you to load the weights from another .tar instead of the default one
model_name = "lasla"
tagger = get_tagger(model_name, batch_size=256, device="cpu", model_path=None)

sentences: List[str] = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit. "]
for sentence_group in sentences:
    # Get the model's data iterator and post-processor for this text
    iterator, processor = get_iterator_and_processor()
    print(tagger.tag_str(sentence_group, iterator=iterator, processor=processor))

will result in

[{'form': 'lorem', 'lemma': 'lor', 'POS': 'NOMcom', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'lorem'},
 {'form': 'ipsum', 'lemma': 'ipse', 'POS': 'PROdem', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'ipsum'},
 {'form': 'dolor', 'lemma': 'dolor', 'POS': 'NOMcom', 'morph': 'Case=Nom|Numb=Sing', 'treated': 'dolor'},
 {'form': 'sit', 'lemma': 'sum1', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3',
  'treated': 'sit'},
 {'form': 'amet', 'lemma': 'amo', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3',
  'treated': 'amet'},
 {'form': ',', 'lemma': ',', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': ','},
 {'form': 'consectetur', 'lemma': 'consector2', 'POS': 'VER',
  'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Dep|Person=3', 'treated': 'consectetur'},
 {'form': 'adipiscing', 'lemma': 'adipiscor', 'POS': 'VER', 'morph': 'Tense=Pres|Voice=Dep', 'treated': 'adipiscing'},
 {'form': 'elit', 'lemma': 'elio', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Ind|Tense=Pres|Voice=Act|Person=3',
  'treated': 'elit'},
 {'form': '.', 'lemma': '.', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': '.'}]
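Because the tagger returns plain dictionaries, the result can be post-processed with the standard library alone. The sketch below, built around a hypothetical to_tsv helper, writes tokens as tab-separated values. In the output above, punctuation tokens carry a lowercase 'pos' key while the other tokens use 'POS', so the helper accepts either.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["form", "lemma", "POS", "morph"]


def to_tsv(tokens: List[Dict[str, str]]) -> str:
    """Serialize tagger output as TSV: a header row, then one row per token."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for token in tokens:
        # Punctuation tokens in the example output use "pos" instead of "POS"
        pos = token.get("POS", token.get("pos", ""))
        writer.writerow([token.get("form", ""), token.get("lemma", ""), pos, token.get("morph", "")])
    return buffer.getvalue()


tokens = [
    {'form': 'dolor', 'lemma': 'dolor', 'POS': 'NOMcom', 'morph': 'Case=Nom|Numb=Sing', 'treated': 'dolor'},
    {'form': '.', 'lemma': '.', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': '.'},
]
print(to_tsv(tokens), end="")
```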

Add a model

  • Create a package in ./pie_extended/models/, for example foo.
  • Add the name of the package to the modules variable in ./pie_extended/models/__init__.py.
  • The module pie_extended.models.foo should define the following variables:
    • Models: a string listing the model files and the tasks each one predicts, in Pie's format.
    • DESC: a Metadata object describing the model.
    • DOWNLOADS: a list of File objects to download.
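from pie_extended.utils import Metadata, File, get_path

DESC = Metadata(
    "Foo",  # title
    "language",
    ["Author 1", "Author 2"],
    "A readable description",
    "A link to more information"
)

# Files fetched when the model is downloaded: (URL, local file name)
DOWNLOADS = [
    File("/a/link/to/a/file", "local_name_of_the_file.tar")
]


# Pie model specification: <path,task1,task2,...>
# Several models can be chained, e.g. "<path1,lemma><path2,pos>",
# with one path argument per {} placeholder.
Models = "<{},lemma,pos>".format(
    get_path("foo", "local_name_of_the_file.tar")
)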
  • The module pie_extended.models.foo.imports should provide:
    1. get_iterator_and_processor: a function that returns a DataIterator and a Processor
    2. (optionally) addons: a function that installs add-ons
    3. (optionally) Disambiguator: a disambiguator instance (or an object creator that returns one)
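The Models string in the foo example follows Pie's model specification, where each <...> group names a .tar file followed by the tasks it should predict. A small helper such as the hypothetical build_model_spec below keeps paths and placeholders from getting out of sync. In a real module you would pass paths obtained with get_path; plain strings and placeholder task names are used here so the sketch runs on its own.

```python
from typing import List, Sequence, Tuple


def build_model_spec(models: Sequence[Tuple[str, List[str]]]) -> str:
    """Return one "<path,task1,task2,...>" group per (path, tasks) pair, concatenated."""
    parts = []
    for path, tasks in models:
        if not tasks:
            raise ValueError("each model needs at least one task: %s" % path)
        parts.append("<{}>".format(",".join([path, *tasks])))
    return "".join(parts)


Models = build_model_spec([
    ("/path/to/first_model.tar", ["task1", "task2"]),
    ("/path/to/second_model.tar", ["lemma", "pos"]),
])
print(Models)
```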

See pie_extended.models.fro.imports for a simple example and pie_extended.models.lasla.imports for a more complex one.

Warning

This is a very early release and may change in places, but it is functional!



Download files


Source Distribution

pie_extended-0.0.23.tar.gz (35.5 kB)

Uploaded Source

Built Distribution

pie_extended-0.0.23-py2.py3-none-any.whl (65.4 kB)

Uploaded Python 2 Python 3

File details

Details for the file pie_extended-0.0.23.tar.gz.

File metadata

  • Download URL: pie_extended-0.0.23.tar.gz
  • Upload date:
  • Size: 35.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.7.5

File hashes

Hashes for pie_extended-0.0.23.tar.gz
Algorithm Hash digest
SHA256 46071f6c8306e538adc658bc3001a5e6c63f286f8197b1ed05a85ea0d53b6910
MD5 8411581bc7922fa58e6ce89e0988c80e
BLAKE2b-256 a7360194bd5c91f1814aae3360492441f00a3ab287e397dd5c17240b01ad5b93


File details

Details for the file pie_extended-0.0.23-py2.py3-none-any.whl.

File metadata

  • Download URL: pie_extended-0.0.23-py2.py3-none-any.whl
  • Upload date:
  • Size: 65.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.7.5

File hashes

Hashes for pie_extended-0.0.23-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 c28e36d08f83e8681b2e58de2613d7f00f50e70b999e41b316a7d2dcaddb6c4d
MD5 03b2afd0756ef44c069cec58afa9a7d7
BLAKE2b-256 7b4577e4c5b591265e1a2518f453a14da7ed26990163c8004d051314bd1fa688

