Sparv plugin for using wsd-rs with Sparv.

Project description

sparv-sbx-wsd-rs

This Sparv plugin is a rewrite of Sparv's internal module wsd that uses saldowsd-rs instead of saldowsd.jar.

The improvements to the internal module wsd are:

  • Easier setup: the saldowsd binary is installed as a Python package from PyPI, instead of requiring a manual install of saldowsd.jar.
  • Faster analysis: saldowsd-rs is about 13% faster than saldowsd.jar.
  • Lower memory usage: saldowsd-rs uses about 35% less memory than saldowsd.jar.

Faster than using saldowsd.jar

Running sbx_wsd_rs, which uses saldowsd-rs, takes 12.8% less time than running the Java-based wsd module. The results below come from running both annotations on the vivill corpus on our server wombat.

[username@wombat vivill]$ sparv run --stats

  Task                                                           Time taken   Percentage
  wsd:annotate                                                      0:11:25         1.3%
  sbx_wsd_rs:annotate                                               0:09:57         1.1%

Memory usage

Memory was measured with heaptrack while loading the models and running a simple example (without Sparv). The Rust version uses about 35% less memory.

  Tool                     Peak RSS
  saldowsd (Rust)            914 MB
  saldowsd.jar (Java)        1.4 GB
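The "about 35%" claim can be checked from the peak RSS values. This sketch assumes decimal units (1 GB = 1000 MB); because the Java value is rounded to 1.4 GB, the result is only approximate (with 1 GB = 1024 MB it comes out at roughly 36%).

```python
# Assumption: decimal units, 1 GB = 1000 MB.
UNITS_IN_MB = {"MB": 1.0, "GB": 1000.0}


def to_mb(value: str) -> float:
    """Convert a value such as '914 MB' or '1.4 GB' to megabytes."""
    number, unit = value.split()
    return float(number) * UNITS_IN_MB[unit]


def percent_less(baseline: str, candidate: str) -> float:
    """How much less memory `candidate` uses than `baseline`, in percent."""
    old, new = to_mb(baseline), to_mb(candidate)
    return 100 * (old - new) / old


# saldowsd.jar (Java) vs saldowsd (Rust)
print(f"{percent_less('1.4 GB', '914 MB'):.1f}%")  # prints 34.7%
```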

An example of the output from Sparv can be seen here.

The annotations are probabilistic, so the outputs of the two modules always differ slightly (even wsd.sense differs from itself between runs).

Example of differences:

  • anslag: |anslag..1:0.993|anslag..2:0.004|anslag..3:0.004| != |anslag..1:0.992|anslag..3:0.004|anslag..2:0.004|
  • avvikelse: |avvikelse..1:0.978|avvikelse..2:0.022| != |avvikelse..1:0.977|avvikelse..2:0.023|
  • utskottets: |utskott..2:0.835|utskott..3:0.109|utskott..1:0.056| != |utskott..2:0.843|utskott..3:0.101|utskott..1:0.056|
  • särskilda: |särskilja..1:0.587|särskild..1:0.413| != |särskilja..1:0.589|särskild..1:0.411|
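Each value above is a set of `sense:probability` pairs separated by `|`. To check that such differences are only small probability shifts, a sketch along these lines (hypothetical helpers, not part of the plugin) parses two values and reports whether the most likely sense agrees and how large the largest probability difference is:

```python
from __future__ import annotations


def parse_senses(value: str) -> dict[str, float]:
    """Parse a value like '|anslag..1:0.993|anslag..2:0.004|' into {sense: probability}."""
    senses = {}
    for item in value.strip("|").split("|"):
        if not item:
            continue
        # Split on the last ':' so the sense id (e.g. 'anslag..1') stays intact.
        sense, _, prob = item.rpartition(":")
        senses[sense] = float(prob)
    return senses


def compare(a: str, b: str) -> tuple[bool, float]:
    """Return (same most likely sense?, largest absolute probability difference)."""
    pa, pb = parse_senses(a), parse_senses(b)
    same_top = max(pa, key=pa.get) == max(pb, key=pb.get)
    max_diff = max(abs(pa.get(s, 0.0) - pb.get(s, 0.0)) for s in pa.keys() | pb.keys())
    return same_top, max_diff


same, diff = compare(
    "|utskott..2:0.835|utskott..3:0.109|utskott..1:0.056|",
    "|utskott..2:0.843|utskott..3:0.101|utskott..1:0.056|",
)
print(same, round(diff, 3))
```

For the examples listed, the most likely sense is the same in both outputs; only the probabilities shift by a few thousandths.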

Changelog

This project keeps a changelog.

Minimum supported Python version

This library tries to support as many Python versions as possible. When a Python version is added or dropped, this library's minor version is bumped.

  • v0.1.0: Python 3.11

License

This repository is licensed under the MIT license.

Development

Development prerequisites

To start developing on this repository:

  • Clone the repo: git clone https://github.com/spraakbanken/sparv-sbx-wsd-rs.git
  • Setup environment: make dev
  • Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

  • Test the code with make test or make test-w-coverage.
  • Test the examples with make test-example-small.
  • Lint the code with make lint.
  • Check formatting with make check-fmt.
  • Format the code with make fmt.
  • Type-check the code with make type-check.

This repo uses conventional commits.
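A conventional commit header has the shape `type(scope)!: description`, where the scope and the `!` breaking-change marker are optional. The sketch below is a simplified check of that header shape only, not a full implementation of the Conventional Commits specification; tools such as cog perform the real validation.

```python
from __future__ import annotations

import re

# Simplified Conventional Commits header: type(scope)!: description
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"             # e.g. feat, fix, chore
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, space, non-empty description
)


def parse_header(message: str) -> dict[str, str | None] | None:
    """Return the header parts of a commit message, or None if it does not match."""
    first_line = (message.splitlines() or [""])[0]
    match = HEADER.match(first_line)
    return match.groupdict() if match else None


print(parse_header("chore(release): prepare release"))
```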

Release a new version

  • Prepare the CHANGELOG: make prepare-release and then edit CHANGELOG.md.
  • Add to git: git add CHANGELOG.md
  • Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
  • Bump version (depends on bump-my-version):
    • Install with uv tool install bump-my-version
    • Major: make bumpversion part=major
    • Minor: make bumpversion part=minor
    • Patch: make bumpversion part=patch or make bumpversion
  • Push main and tags to GitHub: git push origin main --tags or make publish
    • GitHub Actions will build, test and publish the package to PyPI.
  • Add metadata for Språkbanken's resource
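The three bump parts follow semantic versioning. As an illustration only (this is not how bump-my-version is implemented or configured in this repo), assuming a plain `MAJOR.MINOR.PATCH` version string, each part changes the version like this:

```python
def bump(version: str, part: str = "patch") -> str:
    """Bump a plain MAJOR.MINOR.PATCH version; lower parts are reset to 0."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("major", "minor", "patch"):
    print(part, bump("0.1.1", part))
```

Following the policy for supported Python versions, adding or dropping a Python version corresponds to a `minor` bump.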


Download files

Source Distribution

sparv_sbx_wsd_rs-0.1.1.tar.gz (7.2 kB)

Built Distribution

sparv_sbx_wsd_rs-0.1.1-py3-none-any.whl (9.7 kB, Python 3)

File details

Details for the file sparv_sbx_wsd_rs-0.1.1.tar.gz.

File metadata

  • Download URL: sparv_sbx_wsd_rs-0.1.1.tar.gz
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.11 (Arch Linux)

File hashes

Hashes for sparv_sbx_wsd_rs-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f23487d25a6bac3242910c8e4443ea957216b5e30a7f947b5a40fe3ff2fec4c5
MD5 8c15f723fbdf90c19415d7164e115935
BLAKE2b-256 b2ca203bf316f928658b31f9f71dc416621ba096c2ac2a26d7fa84f2057008ec
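A downloaded file can be checked against the SHA256 digest above with Python's standard `hashlib`. This is a minimal sketch that assumes the source distribution has been downloaded to the current directory under its original file name; the same check works for the wheel with its own digest.

```python
import hashlib
from pathlib import Path

# SHA256 digest of sparv_sbx_wsd_rs-0.1.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "f23487d25a6bac3242910c8e4443ea957216b5e30a7f947b5a40fe3ff2fec4c5"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Assumption: the archive was downloaded to the current working directory.
archive = Path("sparv_sbx_wsd_rs-0.1.1.tar.gz")
if archive.exists():
    print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```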

File details

Details for the file sparv_sbx_wsd_rs-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: sparv_sbx_wsd_rs-0.1.1-py3-none-any.whl
  • Size: 9.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.11 (Arch Linux)

File hashes

Hashes for sparv_sbx_wsd_rs-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6276762392f559478a0ae186d98037de98cad141d8a3028e997440457c22bacc
MD5 b0fa0fc0fdccddb17a18119193573a83
BLAKE2b-256 afe61998738c76c7b45c8abe2864f1b20a9e3d0441eeb2acfe1162906804b7c0
