Custom French POS and lemmatizer based on Lefff for spaCy

spacy-lefff : Custom French POS and lemmatizer based on Lefff for spacy

A spaCy v2.0 extension and pipeline component that adds a French POS tagger and lemmatizer based on Lefff.

Description

This package brings Lefff lemmatization and part-of-speech tagging to a custom spaCy pipeline. Combining the POS tagger and the lemmatizer in one pipeline improves French text preprocessing compared with spaCy's built-in French processing.

Installation

spacy-lefff requires spacy <= v2.0.12.

pip install spacy-lefff

Usage

Import and initialize your spaCy nlp object, then add the custom component after the parser so that it can use the POS tags. Make sure your text is UTF-8 encoded.

If the POS tagger and the lemmatizer are used together, tell the lemmatizer to use the MElt tag mapping by setting after_melt=True; otherwise it uses the spaCy part-of-speech mapping. The current spaCy-to-Lefff mapping is:

{
    'ADJ': 'adj',
    'ADP': 'det',
    'ADV': 'adv',
    'DET': 'det',
    'PRON': 'cln',
    'PROPN': 'np',
    'NOUN': 'nc',
    'VERB': 'v',
    'PUNCT': 'poncts'
}
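As a rough illustration of how a mapping like this might be applied, the sketch below looks up the Lefff category for a spaCy coarse POS tag. The names SPACY_TO_LEFFF and to_lefff_tag are hypothetical and not part of spacy-lefff; the library's own lookup may differ.

```python
# Hypothetical helper, not part of the spacy-lefff API.
# The dictionary copies the spaCy-to-Lefff mapping listed above.
SPACY_TO_LEFFF = {
    'ADJ': 'adj',
    'ADP': 'det',
    'ADV': 'adv',
    'DET': 'det',
    'PRON': 'cln',
    'PROPN': 'np',
    'NOUN': 'nc',
    'VERB': 'v',
    'PUNCT': 'poncts',
}


def to_lefff_tag(spacy_pos, default=None):
    """Return the Lefff category for a spaCy POS tag, or `default` if unmapped."""
    return SPACY_TO_LEFFF.get(spacy_pos, default)


print(to_lefff_tag('NOUN'))  # nc
print(to_lefff_tag('INTJ'))  # None (tag is not in the mapping)
```

Tags missing from the mapping fall back to the default, so a caller can decide how to handle tokens that have no Lefff category.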

MElt Tagset

MElt Tag table:

ADJ 	   adjective
ADJWH	   interrogative adjective
ADV	   adverb
ADVWH	   interrogative adverb
CC	   coordination conjunction
CLO	   object clitic pronoun
CLR	   reflexive clitic pronoun
CLS	   subject clitic pronoun
CS	   subordination conjunction
DET	   determiner
DETWH	   interrogative determiner
ET	   foreign word
I	   interjection
NC	   common noun
NPP	   proper noun
P	   preposition
P+D	   preposition+determiner amalgam
P+PRO	   preposition+pronoun amalgam
PONCT	   punctuation mark
PREF	   prefix
PRO	   full pronoun
PROREL	   relative pronoun
PROWH	   interrogative pronoun
V	   indicative or conditional verb form
VIMP	   imperative verb form
VINF	   infinitive verb form
VPP	   past participle
VPR	   present participle
VS	   subjunctive verb form
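
When filtering tokens by MElt tag, it can help to group related tags. The sketch below collects the six verb tags from the table above into a set; MELT_VERB_TAGS and is_verb_tag are hypothetical names used for illustration only and are not provided by spacy-lefff.

```python
# Hypothetical helper, not part of the spacy-lefff API.
# Verb tags as listed in the MElt tag table: V, VIMP, VINF, VPP, VPR, VS.
MELT_VERB_TAGS = frozenset({'V', 'VIMP', 'VINF', 'VPP', 'VPR', 'VS'})


def is_verb_tag(melt_tag):
    """Return True if `melt_tag` is one of the MElt verb tags."""
    return melt_tag in MELT_VERB_TAGS


print(is_verb_tag('VINF'))  # True
print(is_verb_tag('NC'))    # False
```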

Code snippet

Install the French spaCy model first: python -m spacy download fr.

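import spacy
from spacy_lefff import LefffLemmatizer, POSTagger

nlp = spacy.load('fr')
# MElt-based POS tagger; its tag is read below from token._.melt_tagger
pos = POSTagger()
# after_melt=True tells the lemmatizer to use the MElt tags, not spaCy's
french_lemmatizer = LefffLemmatizer(after_melt=True)
# Run the tagger after the parser, then the lemmatizer after the tagger
nlp.add_pipe(pos, name='pos', after='parser')
nlp.add_pipe(french_lemmatizer, name='lefff', after='pos')
doc = nlp(u"Qu'est ce qu'il se passe")
for d in doc:
    # Print the custom attributes next to spaCy's built-in tag and lemma
    print(d.text, d.pos_, d._.melt_tagger, d._.lefff_lemma, d.tag_, d.lemma_)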
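
If the annotations are needed as data rather than printed, a small helper along the following lines could gather them. This is a minimal sketch: collect_annotations is a hypothetical name, and it only assumes that each token exposes text, _.melt_tagger and _.lefff_lemma as in the snippet above.

```python
# Hypothetical helper, not part of the spacy-lefff API.
def collect_annotations(doc):
    """Return one (text, MElt tag, Lefff lemma) tuple per token in `doc`."""
    return [
        (token.text, token._.melt_tagger, token._.lefff_lemma)
        for token in doc
    ]
```

With the pipeline configured as above, calling collect_annotations(nlp(u"Qu'est ce qu'il se passe")) would return one tuple per token.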

Credits

Sagot, B. (2010). The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In 7th international conference on Language Resources and Evaluation (LREC 2010).

Benoît Sagot Webpage about LEFFF
http://alpage.inria.fr/~sagot/lefff-en.html

First work of Claude Coulombe to support Lefff with Python : https://github.com/ClaudeCoulombe
