Pylexique is a Python wrapper around Lexique383
Free software: MIT license
PyLexique Documentation: https://pylexique.readthedocs.io (en) – https://sekoudiaonlp.github.io/pylexique/fr_FR/ (fr)
class LexEntryType:
    """
    Type information about all the lexical attributes in a LexItem object.
    """
    ortho: str
    phon: str
    lemme: str
    cgram: str
    genre: str
    nombre: str
    freqlemfilms2: float
    freqlemlivres: float
    freqfilms2: float
    freqlivres: float
    infover: str
    nbhomogr: int
    nbhomoph: int
    islem: bool
    nblettres: int
    nbphons: int
    cvcv: str
    p_cvcv: str
    voisorth: int
    voisphon: int
    puorth: int
    puphon: int
    syll: str
    nbsyll: int
    cv_cv: str
    orthrenv: str
    phonrenv: str
    orthosyll: str
    cgramortho: str
    deflem: float
    defobs: int
    old20: float
    pld20: float
    morphoder: str
    nbmorph: int
The meanings of the attributes of this object are as follows:
ortho: the word
phon: the phonological forms of the word
lemme: the lemmas of this word
cgram: the grammatical categories of this word
genre: the gender
nombre: the number
freqlemfilms2: the frequency of the lemma according to the corpus of subtitles (per million occurrences)
freqlemlivres: the frequency of the lemma according to the body of books (per million occurrences)
freqfilms2: the frequency of the word according to the corpus of subtitles (per million occurrences)
freqlivres: the frequency of the word according to the body of books (per million occurrences)
infover: moods, tenses, and possible persons for verbs
nbhomogr: number of homographs
nbhomoph: number of homophones
islem: indicates if it is a lemma or not
nblettres: the number of letters
nbphons: number of phonemes
cvcv: the orthographic structure
p_cvcv: the phonological structure
voisorth: number of orthographic neighbors
voisphon: number of phonological neighbors
puorth: point of spelling uniqueness
puphon: point of phonological uniqueness
syll: syllabified phonological form
nbsyll: number of syllables
cv_cv: syllabified phonological structure
orthrenv: reversed orthographic form
phonrenv: reversed phonological form
orthosyll: syllabified orthographic form
cgramortho: all the grammatical categories possible for a given orthographic form
deflem: the percentage of people who said they knew the lemma of the word
defobs: the size of the sample from which ‘deflem’ is derived
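old20: orthographic Levenshtein distance (OLD20), the mean Levenshtein distance to the 20 closest orthographic neighbors
pld20: phonological Levenshtein distance (PLD20), the mean Levenshtein distance to the 20 closest phonological neighbors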
morphoder: derivational morphology
nbmorph: the number of morphemes directly computed from ‘morphoder’
You can find all the relevant information in the official documentation of Lexique383 (French).
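As an illustration of what the `old20` attribute measures, the sketch below computes a Levenshtein distance and averages the distances to the closest words of a word list. It is only a minimal sketch, assuming the usual OLD20 definition (mean distance to the 20 nearest orthographic neighbors). The names `levenshtein` and `old_n` and the toy vocabulary are hypothetical and are not part of pylexique, which exposes the precomputed value from Lexique383.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    # previous[j] is the distance between the already processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def old_n(word: str, vocabulary: Iterable[str], n: int = 20) -> float:
    """Mean Levenshtein distance from word to its n closest neighbors in vocabulary."""
    distances = sorted(levenshtein(word, other) for other in vocabulary if other != word)
    closest = distances[:n]
    if not closest:
        raise ValueError("vocabulary contains no other word")
    return sum(closest) / len(closest)


# Hypothetical toy vocabulary; the real measure is computed over the whole lexicon.
toy_vocabulary = ["chat", "chats", "char", "chant", "rat", "plat"]
print(levenshtein("chat", "chant"))
print(old_n("chat", toy_vocabulary, n=3))
```

With a real lexicon, `n` would stay at its default of 20; the small value used here only keeps the toy example meaningful.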
Features
- Extract all lexical information from a French word, such as:
    - orthographic and phonemic representations
    - associated lemmas
    - syllabification
    - grammatical category
    - gender and number
    - frequencies in a corpus of books and in a corpus of film subtitles, etc.
- Extract all the lexical forms of a French word.
- Easy-to-use API.
- Easily integrate pylexique into your own projects as an imported library.
- Can be used as a command-line tool.
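The "all lexical forms" feature can be pictured as an index from each lemma to the word forms that share it. The following is a minimal sketch of that idea on a few hand-written (ortho, lemme) pairs. `build_lemma_index` and `all_forms` are hypothetical helpers written for illustration, not pylexique's own API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_lemma_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each lemma to the sorted list of word forms attached to it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for ortho, lemme in pairs:
        index[lemme].add(ortho)
    return {lemme: sorted(forms) for lemme, forms in index.items()}


def all_forms(word: str, pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return every form that shares at least one lemma with word."""
    pairs = list(pairs)
    index = build_lemma_index(pairs)
    # A word form can belong to several lemmas, so collect all of them.
    lemmas = {lemme for ortho, lemme in pairs if ortho == word}
    forms: Set[str] = set()
    for lemme in lemmas:
        forms.update(index[lemme])
    return sorted(forms)


# Hand-written sample pairs (word form, lemma), for illustration only.
sample_pairs = [
    ("mange", "manger"),
    ("manges", "manger"),
    ("mangeons", "manger"),
    ("chats", "chat"),
    ("chat", "chat"),
]
print(all_forms("mange", sample_pairs))
```

A word that does not appear in the pairs simply yields an empty list.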
Credits
Main developer: SekouDiaoNlp.
Lexical corpus: Lexique383
About Lexique383
Lexique3
Lexique 3.83 is a French lexical database that provides, for about 140,000 French words: orthographic and phonemic representations, associated lemmas, syllabification, grammatical category, gender and number, frequencies in a corpus of books and in a corpus of film subtitles, etc.
Table: Lexique383.zip
Web site: http://www.lexique.org
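Since the database is distributed as a table, each row can be converted into typed values along the lines of the attribute types listed earlier. This is a minimal sketch, assuming a tab-separated file with a header row and covering only four columns. The sample rows and their numeric values are made up for illustration, and `MiniEntry` and `read_entries` are hypothetical names rather than pylexique classes.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class MiniEntry:
    """A reduced entry holding four of the attributes described above."""
    ortho: str
    lemme: str
    freqfilms2: float
    nblettres: int


def read_entries(handle: TextIO) -> List[MiniEntry]:
    """Read tab-separated rows and convert each column to its declared type."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        MiniEntry(
            ortho=row["ortho"],
            lemme=row["lemme"],
            freqfilms2=float(row["freqfilms2"]),
            nblettres=int(row["nblettres"]),
        )
        for row in reader
    ]


# Made-up rows standing in for the real table; the column names are assumed.
sample_table = io.StringIO(
    "ortho\tlemme\tfreqfilms2\tnblettres\n"
    "chats\tchat\t12.5\t5\n"
    "mange\tmanger\t40.0\t5\n"
)
entries = read_entries(sample_table)
print(entries[0])
```

Reading the real file would only require replacing the `io.StringIO` object with an opened file handle, provided the header names match.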
Publications
New, Boris, Christophe Pallier, Marc Brysbaert, and Ludovic Ferrand. 2004. “Lexique 2: A New French Lexical Database.” Behavior Research Methods, Instruments, & Computers 36 (3): 516–524.
New, Boris, Christophe Pallier, Ludovic Ferrand, and Rafael Matos. 2001. “Une Base de Données Lexicales Du Français Contemporain Sur Internet: LEXIQUE.” L’Année Psychologique 101 (3): 447–462.
New, Boris, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. “The Use of Film Subtitles to Estimate Word Frequencies.” Applied Psycholinguistics 28 (4): 661–677.
Contributors
Boris New & Christophe Pallier
Ronald Peereman
Sophie Dufour
Christian Lachaud
and many others… (contact us to be listed)
License
BibTeX entries to cite publications about Lexique383:
@article{npbf04,
author = {New, B. and Pallier, C. and Brysbaert, M. and Ferrand, L.},
journal = {Behavior Research Methods, Instruments, \& Computers},
number = {3},
pages = {516-524},
title = {Lexique 2 : A New French Lexical Database},
volume = {36},
year = {2004},
eprint = {http://www.lexique.org/?page_id=294},
}
@article{npfm01,
author = {New, B. and Pallier, C. and Ferrand, L. and Matos, R.},
journal = {L'Ann{\'e}e Psychologique},
number = {3},
pages = {447-462},
title = {Une base de donn{\'e}es lexicales du fran\c{c}ais contemporain sur internet: LEXIQUE},
volume = {101},
year = {2001},
}
@article{new_brysbaert_veronis_pallier_2007,
author={NEW, BORIS and BRYSBAERT, MARC and VERONIS, JEAN and PALLIER, CHRISTOPHE},
title={The use of film subtitles to estimate word frequencies},
volume={28}, DOI={10.1017/S014271640707035X},
number={4}, journal={Applied Psycholinguistics},
publisher={Cambridge University Press},
year={2007},
pages={661–677}}
BibTeX
If you want to cite pylexique in an academic publication, use this citation format:
@article{pylexique,
title={pylexique},
author={Sekou Diao},
journal={GitHub},
note={https://github.com/SekouDiaoNlp/pylexique},
year={2021}
}