
Pylexique is a Python wrapper around Lexique383


pylexique


Pylexique is a Python wrapper around Lexique383.
It lets you extract lexical information for more than 140,000 French words in an object-oriented way.
Each lexical item is represented as a LexItem whose attributes have the types listed in LexEntryTypes:

class LexEntryTypes:
    """
    Type information about all the lexical attributes in a LexItem object.

    """
    ortho: str
    phon: str
    lemme: str
    cgram: str
    genre: str
    nombre: str
    freqlemfilms2: float
    freqlemlivres: float
    freqfilms2: float
    freqlivres: float
    infover: str
    nbhomogr: int
    nbhomoph: int
    islem: bool
    nblettres: int
    nbphons: int
    cvcv: str
    p_cvcv: str
    voisorth: int
    voisphon: int
    puorth: int
    puphon: int
    syll: str
    nbsyll: int
    cv_cv: str
    orthrenv: str
    phonrenv: str
    orthosyll: str
    cgramortho: str
    deflem: float
    defobs: int
    old20: float
    pld20: float
    morphoder: str
    nbmorph: int

The attributes of this object have the following meanings:

  • ortho: the word

  • phon: the phonological forms of the word

  • lemme: the lemmas of this word

  • cgram: the grammatical categories of this word

  • genre: the gender

  • nombre: the number

  • freqlemfilms2: the frequency of the lemma in the corpus of film subtitles (per million occurrences)

  • freqlemlivres: the frequency of the lemma in the corpus of books (per million occurrences)

  • freqfilms2: the frequency of the word in the corpus of film subtitles (per million occurrences)

  • freqlivres: the frequency of the word in the corpus of books (per million occurrences)

  • infover: possible moods, tenses, and persons for verbs

  • nbhomogr: number of homographs

  • nbhomoph: number of homophones

  • islem: indicates if it is a lemma or not

  • nblettres: the number of letters

  • nbphons: number of phonemes

  • cvcv: the orthographic structure

  • p_cvcv: the phonological structure

  • voisorth: number of orthographic neighbors

  • voisphon: number of phonological neighbors

  • puorth: point of orthographic uniqueness

  • puphon: point of phonological uniqueness

  • syll: syllabified phonological form

  • nbsyll: number of syllables

  • cv_cv: syllabified phonological structure

  • orthrenv: reverse orthographic form

  • phonrenv: reversed phonological form

  • orthosyll: syllabified orthographic form

You can find full details on each attribute in the official documentation of Lexique383.
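To make a few of these attributes concrete, here is a minimal sketch using a hypothetical stand-in class. MiniLexItem and make_item are illustrative names, not part of pylexique, and the rule used for islem (a form is a lemma when it equals its own lemma) is an assumption rather than the documented Lexique383 definition. The class is frozen to mirror the immutable LexItem attributes mentioned in the history notes.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a lexical entry; NOT pylexique's LexItem class.
@dataclass(frozen=True)
class MiniLexItem:
    ortho: str      # the word
    lemme: str      # its lemma
    cgram: str      # grammatical category
    islem: bool     # whether the form is a lemma
    nblettres: int  # number of letters
    orthrenv: str   # reversed orthographic form


def make_item(ortho: str, lemme: str, cgram: str) -> MiniLexItem:
    """Build an entry, deriving the fields that follow from the spelling."""
    return MiniLexItem(
        ortho=ortho,
        lemme=lemme,
        cgram=cgram,
        islem=(ortho == lemme),  # assumption made for this sketch only
        nblettres=len(ortho),
        orthrenv=ortho[::-1],
    )


item = make_item("chats", "chat", "NOM")
print(item.nblettres, item.orthrenv, item.islem)
```

In the real database these values are read from the Lexique383 table rather than computed, so treat the derivations above as a reading aid only.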

Features

  • Extract all lexical information from a French word.

  • Extract all the lexical forms of a French word.

  • Easy-to-use API.

  • Easily integrate pylexique in your own projects as an imported library.

  • Can be used as a command line tool.

Credits

Main developer: SekouDiaoNlp.

Lexical corpus: Lexique383

About Lexique383

Lexique3

Lexique 3.83 is a French lexical database that provides, for about 140,000 French words: orthographic and phonemic representations, associated lemmas, syllabification, grammatical category, gender and number, frequencies in a corpus of books and in a corpus of film subtitles, etc.



Table: Lexique383.zip

Web site: http://www.lexique.org

Online: http://www.lexique.org/shiny/lexique

Publications

  • New, Boris, Christophe Pallier, Marc Brysbaert, and Ludovic Ferrand. 2004. “Lexique 2: A New French Lexical Database.” Behavior Research Methods, Instruments, & Computers 36 (3): 516–524.

  • New, Boris, Christophe Pallier, Ludovic Ferrand, and Rafael Matos. 2001. “Une Base de Données Lexicales Du Français Contemporain Sur Internet: LEXIQUE.” L’Année Psychologique 101 (3): 447–462.

  • New, Boris, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. “The Use of Film Subtitles to Estimate Word Frequencies.” Applied Psycholinguistics 28 (4): 661–677. DOI: 10.1017/S014271640707035X.

Contributors

  • Boris New & Christophe Pallier

  • Ronald Peereman

  • Sophie Dufour

  • Christian Lachaud

  • and many others… (contact us to be listed)

License

CC BY-SA 4.0

BibTeX entries to cite publications about Lexique383:

@article{npbf04,
  author = {New, B. and Pallier, C. and Brysbaert, M. and Ferrand, L.},
  journal = {Behavior Research Methods, Instruments, \& Computers},
  number = {3},
  pages = {516--524},
  title = {Lexique 2: A New French Lexical Database},
  volume = {36},
  year = {2004},
  eprint = {http://www.lexique.org/?page_id=294},
}
@article{npfm01,
  author = {New, B. and Pallier, C. and Ferrand, L. and Matos, R.},
  journal = {L'Ann{\'e}e Psychologique},
  number = {3},
  pages = {447--462},
  title = {Une base de donn{\'e}es lexicales du fran\c{c}ais contemporain sur internet: LEXIQUE},
  volume = {101},
  year = {2001},
}
@article{new_brysbaert_veronis_pallier_2007,
  author = {New, Boris and Brysbaert, Marc and Veronis, Jean and Pallier, Christophe},
  title = {The use of film subtitles to estimate word frequencies},
  journal = {Applied Psycholinguistics},
  publisher = {Cambridge University Press},
  volume = {28},
  number = {4},
  pages = {661--677},
  year = {2007},
  doi = {10.1017/S014271640707035X},
}

History

1.3.2 (2021-05-14)

  • Can now use both ‘csv’ and ‘xlsb’ files.

  • Uses ‘csv’ file for storage and faster load times.

  • Updated dependencies.

1.3.1 (2021-05-12)

  • Uses pandas for now for faster resource loading.

  • Uses ‘xlsb’ file for storage and faster load times.

  • Updated dependencies.

1.3.0 (2021-05-11)

  • Uses pandas for now for faster resource loading.

  • Integration of faster-than-csv is in progress, pending resolution of macOS issues.

  • Refactored and expanded the test suite.

  • Updated dependencies.

1.2.7 (2021-05-07)

  • The new method Lexique383.get_all_forms(word) is now accessible through the CLI with the option ‘-a’ or ‘--all_forms’.

  • This new method returns a list of LexItems having the same root lemma.

  • Added sample commands using the new option in the docs.

  • Refactored and expanded the test suite.

  • Updated dependencies.
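The idea of returning every entry that shares a root lemma can be sketched as follows. This is not pylexique's implementation: Entry, build_indexes and all_forms are hypothetical names, and the toy entries are invented for illustration rather than read from Lexique383.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a LexItem, reduced to two fields."""
    ortho: str
    lemme: str


# Toy data for illustration only.
ENTRIES = [
    Entry("mange", "manger"),
    Entry("manges", "manger"),
    Entry("manger", "manger"),
    Entry("chat", "chat"),
    Entry("chats", "chat"),
]

Index = Dict[str, List[Entry]]


def build_indexes(entries: List[Entry]) -> Tuple[Index, Index]:
    """Index the entries once by spelling and once by lemma."""
    by_ortho: Index = defaultdict(list)
    by_lemma: Index = defaultdict(list)
    for entry in entries:
        by_ortho[entry.ortho].append(entry)
        by_lemma[entry.lemme].append(entry)
    return dict(by_ortho), dict(by_lemma)


def all_forms(word: str, by_ortho: Index, by_lemma: Index) -> List[Entry]:
    """Return every entry whose lemma matches a lemma of `word`."""
    forms: List[Entry] = []
    seen_lemmas: List[str] = []
    for entry in by_ortho.get(word, []):
        if entry.lemme not in seen_lemmas:
            seen_lemmas.append(entry.lemme)
            forms.extend(by_lemma[entry.lemme])
    return forms


BY_ORTHO, BY_LEMMA = build_indexes(ENTRIES)
print([e.ortho for e in all_forms("manges", BY_ORTHO, BY_LEMMA)])
```

In this sketch an unknown word yields an empty list; how the library itself treats unknown words is not stated here.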

1.2.6 (2021-05-06)

  • Allows the new style of relative imports.

  • Now all the attributes of the LexItem objects are immutable for consistency.

  • Added new method Lexique383.get_all_forms(word) to get all the lexical variations of a word.

  • This new method returns a list of LexItems having the same root lemma.

  • Expanded sample usage of the software in the docs.

  • Updated dependencies.

1.2.3 (2021-05-04)

  • Enhanced behaviour of output to stdout to not conflict with the logging strategy of users importing the library in their own projects.

  • Expanded sample usage of the software in the docs.

  • Updated dependencies.

1.2.2 (2021-05-04)

  • Enhanced Type Hinting for main module.

  • Changed the property LexItem.islem to boolean instead of a binary choice 0/1.

  • Expanded sample usage of the software in the docs.

  • Updated dependencies.

1.2.1 (2021-04-30)

  • Implemented Type Hinting for main module.

  • Added a new method to the class Lexique383: Lexique383._save_errors().

  • This new method checks that the value of each field in a LexItem is of the right type. If it finds errors it will record the mismatched value/type and save it in ./erros/errors.json

  • Expanded sample usage of the software in the docs.

  • Much better documentation including links to Lexique383 pages and manuals.
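The type check described in this release can be illustrated with a small sketch that compares each value against a class's annotations. FieldTypes and find_type_errors are hypothetical names, the field subset is chosen for brevity, and the shape of the error records is an assumption; the actual format written by Lexique383._save_errors() is not documented here.

```python
import json
from typing import Any, Dict, List, get_type_hints


class FieldTypes:
    """Hypothetical subset of the typed attributes, for illustration."""
    ortho: str
    nblettres: int
    freqlivres: float
    islem: bool


def find_type_errors(record: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return one error record per field whose value has the wrong type."""
    errors = []
    for name, expected in get_type_hints(FieldTypes).items():
        value = record.get(name)
        # Strict comparison: an int where a float is expected is reported,
        # and a missing field shows up as NoneType.
        if type(value) is not expected:
            errors.append({
                "field": name,
                "expected": expected.__name__,
                "got": type(value).__name__,
                "value": repr(value),
            })
    return errors


bad_record = {"ortho": "chat", "nblettres": "4", "freqlivres": 12.5, "islem": True}
errors = find_type_errors(bad_record)
report = json.dumps(errors, ensure_ascii=False, indent=2)
print(report)
```

The report string could then be written to a JSON file; the sketch stops short of touching the file system.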

1.2.0 (2021-04-30)

  • Added a new method to the class Lexique383: Lexique383.get_lex().

  • This new method accepts either a single word as a string or an iterable of strings and will return the asked lexical information.

  • Expanded sample usage of the software in the docs.

  • Substantial update to the code and docs.

  • Removed unneeded dependencies by reimplementing some of their functionality in the package.
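Accepting either a single word or an iterable of words usually comes down to normalizing the argument first. The sketch below shows that pattern with a toy table; lookup and TOY_LEXIQUE are hypothetical names, the values are plain strings standing in for LexItem objects, and the return shape (a dict keyed by word, with unknown words skipped) is an assumption, not the documented behaviour of Lexique383.get_lex().

```python
from typing import Dict, Iterable, List, Union

# Toy lookup table for illustration only.
TOY_LEXIQUE: Dict[str, List[str]] = {
    "chat": ["chat/NOM"],
    "bien": ["bien/ADV", "bien/NOM"],
}


def lookup(words: Union[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Look up one word or several, always returning a dict keyed by word."""
    if isinstance(words, str):
        # A bare string is itself iterable, so wrap it to avoid
        # looking up its characters one by one.
        words = [words]
    return {word: TOY_LEXIQUE[word] for word in words if word in TOY_LEXIQUE}


print(lookup("chat"))
print(lookup(("bien", "chat", "inconnu")))
```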

1.1.1 (2021-04-28)

  • Added a new method to the class LexItem: LexItem.to_dict().

  • This new method allows the LexItem objects to be converted into dicts with key/value pairs corresponding to the LexItem.

  • This method allows easy display or serialization of the LexItem objects.

  • Lexical items sharing the same orthography are stored in a list under that orthography's key in the LEXIQUE dict.

  • Expanded sample usage of the software in the docs.

  • Substantial update to the code and docs.
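The two ideas in this release, storing homographs in a list under one key and converting entries to dicts for serialization, can be sketched with the standard library. ToyItem is a hypothetical stand-in for LexItem, and dataclasses.asdict plays the role that LexItem.to_dict() has in the package; the three toy entries carry only a few fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyItem:
    """Hypothetical stand-in for a LexItem with three fields."""
    ortho: str
    cgram: str
    nbsyll: int


# Entries with the same spelling end up in the same list.
lexique: Dict[str, List[ToyItem]] = {}
for toy in (ToyItem("bien", "ADV", 1), ToyItem("bien", "NOM", 1), ToyItem("chat", "NOM", 1)):
    lexique.setdefault(toy.ortho, []).append(toy)

# Convert each entry to a plain dict, then serialize the list as JSON.
serialized = json.dumps([asdict(entry) for entry in lexique["bien"]], ensure_ascii=False)
print(serialized)
```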

1.1.0 (2021-04-28)

  • Drastically reduced dependencies by dropping HDF5 and blosc; the package is now smaller, faster and easier to build.

  • Lexical items sharing the same orthography are stored in a list under that orthography's key in the LEXIQUE dict.

  • Implemented the “Flyweight” pattern for lightweight lexical entries residing entirely in memory at run time.

  • Added sample usage of the software in the docs.

  • General update to the code and docs.

1.0.7 (2021-04-27)

  • First release on PyPI.
