Generates lemma files for vocabsieve

Overview

Downloads a Wikipedia dump for a language and creates a lemma table from it for use in vocabsieve.

The project is AGPL3+ licensed because it reuses code from gogadget.

Requires an NVIDIA GPU and an installed CUDA toolkit: https://developer.nvidia.com/cuda-toolkit-archive

On Windows, you will need to install Visual Studio first. I also needed to manually add the following to my PATH:

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
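
Before running anything, it can help to confirm that the CUDA tools are actually visible on your PATH. The following is a minimal sketch, not part of this project: `check_tool` is a hypothetical helper name, and it assumes a POSIX shell (on Windows, something like Git Bash). `nvcc` ships with the CUDA toolkit and `nvidia-smi` ships with the NVIDIA driver.

```shell
# Hypothetical helper (not part of lemma_from_wiki):
# report whether a command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found on PATH"
    fi
}

# nvcc comes with the CUDA toolkit, nvidia-smi with the NVIDIA driver.
check_tool nvcc
check_tool nvidia-smi
```

If either tool is reported as missing, the toolkit or driver is probably not installed or not on PATH. This only checks that the commands can be found; it does not prove that the GPU is usable.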

Running

These instructions assume uv, but a pip venv will work equally well.

There is no need to download anything apart from this repository. The script will automatically fetch the Wikipedia articles for your chosen language.

Running:

git clone https://github.com/jonathanfox5/lemma_from_wiki
cd lemma_from_wiki
uv sync
uv run lemma_from_wiki -l "language code" -n "number of articles to process"
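
For the pip venv route, a rough equivalent of the uv commands might look like the sketch below. This is an assumption rather than a tested recipe: `run_with_pip_venv` is a hypothetical helper name, it assumes `python3` is on PATH, and it assumes the package installs a `lemma_from_wiki` command (which the `uv run` invocation suggests). Run it from inside the cloned repository.

```shell
# Hypothetical helper (not part of lemma_from_wiki): create a virtual
# environment in the current directory, install the project into it with
# pip, then run the tool with whatever arguments were passed in.
# Assumes python3 is on PATH and that the install provides a
# "lemma_from_wiki" command.
run_with_pip_venv() {
    python3 -m venv .venv &&
    . .venv/bin/activate &&
    pip install . &&
    lemma_from_wiki "$@"
}

# Usage, from inside the cloned repository:
# run_with_pip_venv -l "language code" -n "number of articles to process"
```

On Windows the activation script is at .venv\Scripts\activate instead of .venv/bin/activate, and the interpreter is usually called python rather than python3.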

Getting help:

uv run lemma_from_wiki --help

Or just run it with no arguments:

uv run lemma_from_wiki

Download files

Source Distribution

lemma_from_wiki-0.1.0.tar.gz (61.3 kB)

Built Distribution

lemma_from_wiki-0.1.0-py3-none-any.whl (25.5 kB)

File details

Details for the file lemma_from_wiki-0.1.0.tar.gz.

File metadata

  • Download URL: lemma_from_wiki-0.1.0.tar.gz
  • Upload date:
  • Size: 61.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for lemma_from_wiki-0.1.0.tar.gz

Algorithm    Hash digest
SHA256       b990d643de9c4e3c10514e508d8ee4a21bfe46c6bd3460f314e46e197f2df5c4
MD5          d756afebe9467c08d7b5beee1a2b655c
BLAKE2b-256  10bf762ae0858bb2782f6e9e9bd1aafde5f10a37035fc4e97ba95dbe301c24c8

Provenance

The following attestation bundles were made for lemma_from_wiki-0.1.0.tar.gz:

Publisher: publish-to-pypi.yml on jonathanfox5/lemma_from_wiki

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lemma_from_wiki-0.1.0-py3-none-any.whl.

File hashes

Hashes for lemma_from_wiki-0.1.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       b5e977f5203bc714de20b38f5d4a70f31b6704e160ed91e358806113521bbc33
MD5          7fe871d2be7872ccfe870ac35c617be8
BLAKE2b-256  2f5e25cb8dca38fb7142adf0b81733a72f03ffe5beb6bad46ac6782333875fce

Provenance

The following attestation bundles were made for lemma_from_wiki-0.1.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on jonathanfox5/lemma_from_wiki
