
Pylexique is a Python wrapper around Lexique383


Pylexique is a Python wrapper around Lexique383.
It lets you extract lexical information for more than 140,000 French words in an object-oriented way.


Each lexical item is represented as a LexItem having the following LexEntryType:

class LexEntryType:
    """
    Type information about all the lexical attributes in a LexItem object.

    """
    ortho: str
    phon: str
    lemme: str
    cgram: str
    genre: str
    nombre: str
    freqlemfilms2: float
    freqlemlivres: float
    freqfilms2: float
    freqlivres: float
    infover: str
    nbhomogr: int
    nbhomoph: int
    islem: bool
    nblettres: int
    nbphons: int
    cvcv: str
    p_cvcv: str
    voisorth: int
    voisphon: int
    puorth: int
    puphon: int
    syll: str
    nbsyll: int
    cv_cv: str
    orthrenv: str
    phonrenv: str
    orthosyll: str
    cgramortho: str
    deflem: float
    defobs: int
    old20: float
    pld20: float
    morphoder: str
    nbmorph: int
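As an illustration of how such typed attributes can be filled from raw text values, here is a minimal sketch. `MiniLexEntry` and `coerce_row` are hypothetical names introduced for this example; they are not part of pylexique, and the sample values are made up. The sketch only assumes that each attribute arrives as a string and must be converted to the type declared on the class.

```python
from dataclasses import dataclass, fields


@dataclass
class MiniLexEntry:
    """Hypothetical, reduced stand-in for LexEntryType (not pylexique's own class)."""
    ortho: str
    lemme: str
    cgram: str
    freqfilms2: float
    islem: bool
    nblettres: int


def _to_bool(raw: str) -> bool:
    # Assumption: truthy values are encoded as "1" or "true".
    return raw.strip().lower() in {"1", "true"}


# One converter per declared attribute type.
_CONVERTERS = {str: str, int: int, float: float, bool: _to_bool}


def coerce_row(row: dict) -> MiniLexEntry:
    """Convert a mapping of raw strings into a typed MiniLexEntry."""
    return MiniLexEntry(
        **{f.name: _CONVERTERS[f.type](row[f.name]) for f in fields(MiniLexEntry)}
    )


# Made-up sample row, for illustration only.
raw = {
    "ortho": "chats",
    "lemme": "chat",
    "cgram": "NOM",
    "freqfilms2": "12.5",
    "islem": "0",
    "nblettres": "5",
}
entry = coerce_row(raw)
print(entry.ortho, entry.freqfilms2, entry.islem, entry.nblettres)
```

Keeping the converters in a table keyed by type means a new attribute only needs a type annotation, not new parsing code.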

The meanings of the attributes of this object are as follows:

  • ortho: the word

  • phon: the phonological forms of the word

  • lemme: the lemmas of this word

  • cgram: the grammatical categories of this word

  • genre: the gender

  • nombre: the number

  • freqlemfilms2: the frequency of the lemma according to the corpus of subtitles (per million occurrences)

  • freqlemlivres: the frequency of the lemma according to the corpus of books (per million occurrences)

  • freqfilms2: the frequency of the word according to the corpus of subtitles (per million occurrences)

  • freqlivres: the frequency of the word according to the corpus of books (per million occurrences)

  • infover: modes, tenses, and possible persons for verbs

  • nbhomogr: number of homographs

  • nbhomoph: number of homophones

  • islem: indicates if it is a lemma or not

  • nblettres: the number of letters

  • nbphons: number of phonemes

  • cvcv: the orthographic structure

  • p_cvcv: the phonological structure

  • voisorth: number of orthographic neighbors

  • voisphon: number of phonological neighbors

  • puorth: orthographic uniqueness point

  • puphon: phonological uniqueness point

  • syll: syllable phonological form

  • nbsyll: number of syllables

  • cv_cv: syllable phonological structure

  • orthrenv: reverse orthographic form

  • phonrenv: reversed phonological form

  • orthosyll: syllable orthographic form

  • cgramortho: the different grammatical categories possible for a given orthographic form

  • deflem: the percentage of people who said they knew the lemma of the word

  • defobs: the size of the sample from which ‘deflem’ is derived

  • old20: orthographic Levenshtein distance (OLD20), averaged over the 20 closest words

  • pld20: phonological Levenshtein distance (PLD20), averaged over the 20 closest words

  • morphoder: inflectional morphology

  • nbmorph: the number of morphemes directly computed from ‘morphoder’

You can find all the relevant information in the official documentation of Lexique383 (French).
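To make the old20 attribute more concrete: OLD20 is conventionally defined as the mean Levenshtein (edit) distance between a word and its 20 closest neighbors in the lexicon. The sketch below computes that measure on a tiny made-up word list. The function names `levenshtein` and `old_n` are illustrative, not pylexique functions, and whether Lexique383 derives its values in exactly this way is an assumption; the official documentation is the reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list, n: int = 20) -> float:
    """Mean edit distance from word to its n closest neighbors in lexicon."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    nearest = distances[:n]
    if not nearest:
        raise ValueError("lexicon contains no word other than the target")
    return sum(nearest) / len(nearest)


# Toy lexicon for illustration; a real computation would use the full word list.
toy_lexicon = ["chat", "chats", "chant", "rat"]
print(old_n("chat", toy_lexicon, n=2))
```

The same function applied to phonological forms instead of spellings would give the analogous phonological measure.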

Features

  • Extract all lexical information from a French word such as:
    • orthographic and phonemic representations

    • associated lemmas

    • syllabification

    • grammatical category

    • gender and number

    • frequencies in a corpus of books and in a corpus of film subtitles, among others

  • Extract all the lexical forms of a French word.

  • Easy-to-use API.

  • Easily integrate pylexique in your own projects as an imported library.

  • Can be used as a command line tool.
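The "all lexical forms" feature listed above can be pictured as a lookup through the lemma: find the lemma of the given word, then collect every entry that shares it. The following is a minimal sketch on made-up toy data, not pylexique's implementation. The names `build_indexes` and `all_forms` are hypothetical, and the sketch assumes each spelling maps to a single lemma, which is a simplification (real entries can be homographs with several lemmas).

```python
from collections import defaultdict

# Toy (ortho, lemme) pairs for illustration only.
TOY_ROWS = [
    ("suis", "être"),
    ("est", "être"),
    ("être", "être"),
    ("avez", "avoir"),
    ("ai", "avoir"),
    ("avoir", "avoir"),
]


def build_indexes(rows):
    """Return (word -> lemma, lemma -> list of forms) built from the rows."""
    lemma_of = {}
    forms_of = defaultdict(list)
    for ortho, lemme in rows:
        lemma_of[ortho] = lemme
        forms_of[lemme].append(ortho)
    return lemma_of, forms_of


def all_forms(word, lemma_of, forms_of):
    """All forms sharing the lemma of word; raises KeyError for unknown words."""
    return list(forms_of[lemma_of[word]])


lemma_of, forms_of = build_indexes(TOY_ROWS)
print(all_forms("avez", lemma_of, forms_of))
```

Building both indexes once makes each later lookup two dictionary accesses rather than a scan of the whole table.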

Credits

Main developer SekouDiaoNlp.

Lexical corpus: Lexique383

About Lexique383

Lexique3

Lexique 3.83 is a French lexical database that provides, for about 140,000 French words, orthographic and phonemic representations, associated lemmas, syllabification, grammatical category, gender and number, and frequencies in a corpus of books and in a corpus of film subtitles, among others.


Table: Lexique383.zip

Web site: http://www.lexique.org

Online: http://www.lexique.org/shiny/lexique

Publications

  • New, Boris, Christophe Pallier, Marc Brysbaert, and Ludovic Ferrand. 2004. “Lexique 2: A New French Lexical Database.” Behavior Research Methods, Instruments, & Computers 36 (3): 516–524.

  • New, Boris, Christophe Pallier, Ludovic Ferrand, and Rafael Matos. 2001. “Une Base de Données Lexicales Du Français Contemporain Sur Internet: LEXIQUE.” L’Année Psychologique 101 (3): 447–462.

  • New, Boris, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. “The Use of Film Subtitles to Estimate Word Frequencies.” Applied Psycholinguistics 28 (4): 661–677.

Contributors

  • Boris New & Christophe Pallier

  • Ronald Peereman

  • Sophie Dufour

  • Christian Lachaud

  • and many others… (contact us to be listed)

License

CC BY-SA 4.0

BibTeX entries to cite publications about Lexique383:

@article{npbf04,
author = {New, B. and Pallier, C. and Brysbaert, M. and Ferrand, L.},
journal = {Behavior Research Methods, Instruments, & Computers},
number = {3},
pages = {516-524},
title = {Lexique 2 : A New French Lexical Database},
volume = {36},
year = {2004},
eprint = {http://www.lexique.org/?page_id=294},
}
@article{npfm01,
author = {New, B. and Pallier, C. and Ferrand, L. and Matos, R.},
journal = {L'Ann{\'e}e Psychologique},
number = {3},
pages = {447-462},
title = {Une base de donn{\'e}es lexicales du fran\c{c}ais contemporain sur internet: LEXIQUE},
volume = {101},
year = {2001},
}
@article{new_brysbaert_veronis_pallier_2007,
author={NEW, BORIS and BRYSBAERT, MARC and VERONIS, JEAN and PALLIER, CHRISTOPHE},
title={The use of film subtitles to estimate word frequencies},
volume={28}, DOI={10.1017/S014271640707035X},
number={4}, journal={Applied Psycholinguistics},
publisher={Cambridge University Press},
year={2007},
pages={661–677}}

BibTeX

If you want to cite pylexique in an academic publication use this citation format:

@article{pylexique,
  title={pylexique},
  author={Sekou Diao},
  journal={GitHub. Note: https://github.com/SekouDiaoNlp/pylexique Cited by},
  year={2021}
}
