Generates lemma files for vocabsieve

Overview

Downloads a Wikipedia dump for a language and builds a lemma table from it for use in vocabsieve. spaCy is used as the lemmatiser, and results from simplemma are also provided for comparison.
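
As a rough illustration of what a lemma table is, the sketch below maps each word form seen in some text to the lemma most often assigned to it. This is not the project's code: build_lemma_table, toy_lemmatise and TOY_LEMMAS are hypothetical names, and the toy lookup stands in for a real lemmatiser such as spaCy or simplemma.

```python
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for a real lemmatiser (spaCy, simplemma, ...).
TOY_LEMMAS = {"cats": "cat", "ran": "run", "running": "run"}


def toy_lemmatise(token: str) -> str:
    """Look the token up in a tiny hand-written table; fall back to the token."""
    return TOY_LEMMAS.get(token, token)


def build_lemma_table(
    texts: Iterable[str], lemmatise: Callable[[str], str]
) -> Dict[str, str]:
    """Map each lower-cased word form to the lemma most often assigned to it."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            votes[token][lemmatise(token)] += 1
    return {form: counts.most_common(1)[0][0] for form, counts in votes.items()}


articles = ["Cats ran home.", "The cats are running."]
table = build_lemma_table(articles, toy_lemmatise)
print(table["cats"], table["ran"], table["running"])
```

Passing the lemmatiser in as a function makes it easy to build one table per lemmatiser and compare them afterwards.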

The project is licensed under AGPL3+ because it re-uses code from gogadget.

Requires an NVIDIA GPU and an installed CUDA toolkit: https://developer.nvidia.com/cuda-toolkit-archive

On Windows, you will need to install Visual Studio first. I also needed to manually add the following directory to my PATH (the version numbers in the path depend on which MSVC build you have installed): C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
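
To check whether the prerequisites above are visible on your PATH, something like the following may help. It is only a sketch: find_tools is a hypothetical helper, and it assumes the relevant executables are nvcc (shipped with the CUDA toolkit) and cl (the MSVC compiler found in the directory above).

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the resolved path of each executable, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumption: nvcc indicates the CUDA toolkit, cl indicates MSVC (Windows only).
for tool, path in find_tools(["nvcc", "cl"]).items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If cl is reported as missing on Windows, adding the MSVC bin directory to PATH as described above is the likely fix.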

Running

The installation instructions assume uv, which handles package isolation automatically, but a pip venv works equally well if you prefer.

There is no need to download any extra files. The script automatically fetches the Wikipedia articles for your chosen language.

Install from PyPI

uv tool install lemma-from-wiki

Standard analysis

lemmafromwiki -l "language code" -n "number of articles to process"

Return only differences from simplemma

lemmafromwiki -l "language code" -n "number of articles to process" --diff
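
Conceptually, returning only the differences means keeping the word forms for which the two lemmatisers disagree. The sketch below shows that idea on made-up data. It is an assumption about the behaviour, not the tool's actual implementation, and lemma_differences is a hypothetical name; the sample entries are illustrative and are not real output from spaCy or simplemma.

```python
from typing import Dict, Tuple


def lemma_differences(
    primary: Dict[str, str], reference: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {form: (primary_lemma, reference_lemma)} where the tables disagree."""
    return {
        form: (lemma, reference[form])
        for form, lemma in primary.items()
        if form in reference and reference[form] != lemma
    }


# Illustrative data only.
spacy_like = {"cats": "cat", "better": "good", "ran": "run"}
simplemma_like = {"cats": "cat", "better": "well", "ran": "run"}
print(lemma_differences(spacy_like, simplemma_like))
```

Forms that appear in only one table are skipped here; whether the real tool reports them is not something the text above specifies.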

Getting help

lemmafromwiki --help

Getting help (short version)

lemmafromwiki
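
If you want to script the commands above, a small wrapper along these lines could assemble the arguments. It uses only the flags shown in this section (-l, -n and --diff); build_command is a hypothetical helper, and the language code "it" is an assumed example value, so check lemmafromwiki --help for the codes the tool accepts.

```python
import shlex
from typing import List


def build_command(language: str, articles: int, diff: bool = False) -> List[str]:
    """Assemble a lemmafromwiki argument list from the flags shown above."""
    if articles <= 0:
        raise ValueError("articles must be a positive integer")
    command = ["lemmafromwiki", "-l", language, "-n", str(articles)]
    if diff:
        command.append("--diff")
    return command


cmd = build_command("it", 1000, diff=True)  # "it" is an assumed example code
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when it is passed to subprocess.run.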

Download files

Source Distribution

lemma_from_wiki-0.2.0.tar.gz (62.6 kB)

Uploaded Source

Built Distribution

lemma_from_wiki-0.2.0-py3-none-any.whl (26.7 kB)

Uploaded Python 3

File details

Details for the file lemma_from_wiki-0.2.0.tar.gz.

File metadata

  • Download URL: lemma_from_wiki-0.2.0.tar.gz
  • Size: 62.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for lemma_from_wiki-0.2.0.tar.gz
Algorithm Hash digest
SHA256 4d7a3fca451fc207103d4deca9d56c58dec9fe1decbfb0eb37f13b696501470e
MD5 ffbb4ff071bbb9f8a305bb45b4cd4c49
BLAKE2b-256 5181badd17016193b8b0dd4f5f1d95be24882f6570987e22e977642471bf4a46
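
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: sha256_of is a hypothetical helper, and it assumes the source distribution has been saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest copied from the table above.
EXPECTED = "4d7a3fca451fc207103d4deca9d56c58dec9fe1decbfb0eb37f13b696501470e"
archive = Path("lemma_from_wiki-0.2.0.tar.gz")  # assumed download location

if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```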

Provenance

The following attestation bundles were made for lemma_from_wiki-0.2.0.tar.gz:

Publisher: publish-to-pypi.yml on jonathanfox5/lemma_from_wiki

File details

Details for the file lemma_from_wiki-0.2.0-py3-none-any.whl.

File hashes

Hashes for lemma_from_wiki-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 540288b077e7bd65f5644f860387297d12249742a04f608c1a252d5bd6cd68b2
MD5 12c5987aaafed49df4a98e9f6b7141ed
BLAKE2b-256 3a2a1f1125559282045cd55cdbcf7b160996e0220a490d93285c0127d8b3c51c

Provenance

The following attestation bundles were made for lemma_from_wiki-0.2.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on jonathanfox5/lemma_from_wiki
