Pylexique is a Python wrapper around Lexique383
pylexique
Pylexique is a Python wrapper around Lexique383.
It lets you extract lexical information for more than 140,000 French words in an object-oriented way.
Each lexical item is represented by a LexItem whose attributes have the types declared in LexEntryTypes:
class LexEntryTypes:
    """
    Type information about all the lexical attributes in a LexItem object.
    """
    ortho = str
    phon = str
    lemme = str
    cgram = str
    genre = str
    nombre = str
    freqlemfilms2 = float
    freqlemlivres = float
    freqfilms2 = float
    freqlivres = float
    infover = str
    nbhomogr = int
    nbhomoph = int
    islem = bool
    nblettres = int
    nbphons = int
    cvcv = str
    p_cvcv = str
    voisorth = int
    voisphon = int
    puorth = int
    puphon = int
    syll = str
    nbsyll = int
    cv_cv = str
    orthrenv = str
    phonrenv = str
    orthosyll = str
    cgramortho = str
    deflem = float
    defobs = int
    old20 = float
    pld20 = float
    morphoder = str
    nbmorph = int
    id = int
The meanings of the attributes of this object are as follows:
ortho: the word
phon: the phonological forms of the word
lemme: the lemmas of this word
cgram: the grammatical categories of this word
genre: the gender
nombre: the number
freqlemfilms2: the frequency of the lemma in the corpus of film subtitles (per million occurrences)
freqlemlivres: the frequency of the lemma in the corpus of books (per million occurrences)
freqfilms2: the frequency of the word in the corpus of film subtitles (per million occurrences)
freqlivres: the frequency of the word in the corpus of books (per million occurrences)
infover: possible moods, tenses, and persons for verbs
nbhomogr: number of homographs
nbhomoph: number of homophones
islem: indicates if it is a lemma or not
nblettres: number of letters
nbphons: number of phonemes
cvcv: the orthographic structure
p_cvcv: the phonological structure
voisorth: number of orthographic neighbors
voisphon: number of phonological neighbors
puorth: orthographic uniqueness point
puphon: phonological uniqueness point
syll: syllabified phonological form
nbsyll: number of syllables
cv_cv: syllabified phonological structure
orthrenv: reversed orthographic form
phonrenv: reversed phonological form
orthosyll: syllabified orthographic form
You can find all the relevant information in the official documentation of Lexique383.
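The type table above lends itself to converting raw text fields into typed values. The following is only a minimal sketch under stated assumptions: `FIELD_TYPES` and `coerce_row` are hypothetical names that are not part of pylexique, the row values are illustrative rather than taken from the database, and booleans are assumed to be stored as "1"/"0".

```python
from typing import Any, Dict

# A small subset of the attribute/type pairs listed in LexEntryTypes.
FIELD_TYPES: Dict[str, type] = {
    "ortho": str,
    "cgram": str,
    "freqfilms2": float,
    "nblettres": int,
    "islem": bool,
}


def coerce_row(row: Dict[str, str]) -> Dict[str, Any]:
    """Convert raw string fields to the types declared in FIELD_TYPES."""
    typed: Dict[str, Any] = {}
    for name, raw in row.items():
        target = FIELD_TYPES.get(name, str)
        if target is bool:
            # Assumption: booleans are encoded as "1" (true) or "0" (false).
            typed[name] = raw.strip() == "1"
        else:
            typed[name] = target(raw)
    return typed


# Illustrative row; the frequency value is made up for the example.
raw = {
    "ortho": "abaissait",
    "cgram": "VER",
    "freqfilms2": "0.01",
    "nblettres": "9",
    "islem": "0",
}
item = coerce_row(raw)
print(item)
```

Keeping the type table in one place means the same mapping can drive both parsing and later validation.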
Free software: MIT license
Documentation: https://pylexique.readthedocs.io.
Features
Extract all lexical information from a French word.
Easy-to-use API.
Easily integrate pylexique in your own projects as an imported library.
Can be used as a command line tool.
Credits
Main developer: SekouDiaoNlp.
Lexical corpus: Lexique383
About Lexique383:
Lexique3
Lexique 3.83 is a French lexical database that provides, for about 140,000 French words: orthographic and phonemic representations, associated lemmas, syllabification, grammatical category, gender and number, frequencies in a corpus of books and in a corpus of film subtitles, etc.
Table: Lexique383.zip
Web site: http://www.lexique.org
Publications
New, Boris, Christophe Pallier, Marc Brysbaert, and Ludovic Ferrand. 2004. “Lexique 2: A New French Lexical Database.” Behavior Research Methods, Instruments, & Computers 36 (3): 516–524.
New, Boris, Christophe Pallier, Ludovic Ferrand, and Rafael Matos. 2001. “Une Base de Données Lexicales Du Français Contemporain Sur Internet: LEXIQUE.” L’Année Psychologique 101 (3): 447–462.
New, Boris, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. “The Use of Film Subtitles to Estimate Word Frequencies.” Applied Psycholinguistics 28 (4): 661–677. https://doi.org/10.1017/S014271640707035X.
Contributors
Boris New & Christophe Pallier
Ronald Peereman
Sophie Dufour
Christian Lachaud
and many others… (contact us to be listed)
License
BibTeX entries to cite publications about Lexique383:
@article{npbf04,
author = {New, B. and Pallier, C. and Brysbaert, M. and Ferrand, L.},
journal = {Behavior Research Methods, Instruments, & Computers},
number = {3},
pages = {516-524},
title = {Lexique 2 : A New French Lexical Database},
volume = {36},
year = {2004},
eprint = {http://www.lexique.org/?page_id=294},
}
@article{npfm01,
author = {New, B. and Pallier, C. and Ferrand, L. and Matos, R.},
journal = {L'Ann{\'e}e Psychologique},
number = {3},
pages = {447-462},
title = {Une base de donn{\'e}es lexicales du fran\c{c}ais contemporain sur internet: LEXIQUE},
volume = {101},
year = {2001},
}
History
1.2.1 (2021-04-30)
Implemented Type Hinting for main module.
Added a new method to the class Lexique383. The method is Lexique383._save_errors().
This new method checks that the value of each field in a LexItem is of the right type. If it finds errors, it records each mismatched value/type pair and saves them in ./erros/errors.json
Expanded sample usage of the software in the docs.
Much better documentation including links to Lexique383 pages and manuals.
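The type check described in this release can be pictured roughly as follows. This is a minimal sketch rather than the package's implementation: `EXPECTED_TYPES` and `find_type_errors` are hypothetical names, the item is a plain dict with illustrative values, and the report is serialized to a JSON string instead of being written to disk.

```python
import json
from typing import Any, Dict, List

# Hypothetical subset of the expected field types.
EXPECTED_TYPES: Dict[str, type] = {
    "ortho": str,
    "nblettres": int,
    "freqfilms2": float,
}


def find_type_errors(item: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return one record per field whose value has an unexpected type."""
    errors: List[Dict[str, str]] = []
    for field, expected in EXPECTED_TYPES.items():
        value = item.get(field)
        if not isinstance(value, expected):
            errors.append({
                "field": field,
                "value": repr(value),
                "expected": expected.__name__,
                "actual": type(value).__name__,
            })
    return errors


# Illustrative item: nblettres was left as a string by mistake.
item = {"ortho": "abaissait", "nblettres": "9", "freqfilms2": 0.01}
errors = find_type_errors(item)

# Serialize the mismatches; a real tool might write this to a file instead.
report = json.dumps(errors, indent=2, ensure_ascii=False)
print(report)
```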
1.2.0 (2021-04-30)
Added a new method to the class Lexique383. The method is Lexique383.get_lex().
This new method accepts either a single word as a string or an iterable of strings and returns the requested lexical information.
Expanded sample usage of the software in the docs.
Substantial update to the code and docs.
Removed unneeded dependencies, as I reimplemented some functionality myself.
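Accepting either one word or several words needs a little care, because a bare string is itself an iterable of characters. The sketch below shows one common way to handle that. `lookup_words` and `TOY_LEXICON` are hypothetical stand-ins with toy data; the actual return type of Lexique383.get_lex() may differ.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical stand-in for the lexicon: orthography -> list of entries.
TOY_LEXICON: Dict[str, List[dict]] = {
    "est": [{"ortho": "est", "cgram": "VER"}, {"ortho": "est", "cgram": "NOM"}],
    "chat": [{"ortho": "chat", "cgram": "NOM"}],
}


def lookup_words(words: Union[str, Iterable[str]]) -> Dict[str, List[dict]]:
    """Look up one word or several words; unknown words map to an empty list."""
    # Check for str first: iterating over a bare string would yield characters.
    if isinstance(words, str):
        words = [words]
    return {word: TOY_LEXICON.get(word, []) for word in words}


single = lookup_words("chat")
several = lookup_words(("chat", "est", "zzz"))
print(single)
print(list(several))
```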
1.1.1 (2021-04-28)
Added a new method to the class LexItem. The method is LexItem.to_dict().
This new method converts a LexItem object into a dict whose key/value pairs correspond to the attributes of the LexItem.
This method allows easy display or serialization of the LexItem objects.
Lexical items having the same orthography are stored in a list under the word’s orthography key in the LEXIQUE dict.
Expanded sample usage of the software in the docs.
Substantial update to the code and docs.
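The two ideas in this release, converting an item to a dict and grouping homographs in a list under a shared key, can be illustrated together. This is a sketch under assumptions: `ToyLexItem` is a hypothetical three-field stand-in for LexItem built on a dataclass, and the entries are toy data.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Any, DefaultDict, Dict, List


@dataclass
class ToyLexItem:
    """Hypothetical, much smaller stand-in for a LexItem."""
    ortho: str
    cgram: str
    nblettres: int

    def to_dict(self) -> Dict[str, Any]:
        # asdict builds a plain dict from the dataclass fields.
        return asdict(self)


# Entries sharing an orthography end up in the same list.
lexicon: DefaultDict[str, List[ToyLexItem]] = defaultdict(list)
for entry in (
    ToyLexItem("est", "VER", 3),
    ToyLexItem("est", "NOM", 3),
    ToyLexItem("chat", "NOM", 4),
):
    lexicon[entry.ortho].append(entry)

# Plain dicts serialize directly to JSON.
serialized = json.dumps([e.to_dict() for e in lexicon["est"]], ensure_ascii=False)
print(serialized)
```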
1.1.0 (2021-04-28)
Drastically reduced dependencies by ditching HDF5 and blosc; the package is now smaller, faster, and easier to build.
Lexical items having the same orthography are stored in a list under the word’s orthography key in the LEXIQUE dict.
Implemented the “Flyweight” pattern for light lexical entries residing entirely in memory at run time.
Added sample usage of the software in the docs.
General update to the code and docs.
1.0.7 (2021-04-27)
First release on PyPI.
Hashes for pylexique-1.2.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a6f0b075c70ccf122bdaed21f9fb29a1db0ffa6fe3245d8fe43f0a616a08e59e
MD5 | db1e1e4373a8df73db93705d69bb0149
BLAKE2b-256 | c4becfd284b7c332028e8d1015e3b230ceee47688d5bd7a5f0ea6c134e9ad537