A free tool for sentence similarity evaluation

semsim

Compare texts easily with the semsim Python package!

Features

  • Dozens of tunable parameters for better performance
  • Default values for all parameters, validated on paraphrase detection datasets
  • 6 different algorithms for efficient syntax tree comparison
  • A small set of standard "built-in" models that can be downloaded through the semsim package itself
  • A flexible class taxonomy that you can extend by inheriting from one of the model base classes
  • A Python library, semsim, with a command-line interface (powered by click)

Dependencies

  • attrs
  • click
  • networkx
  • numpy
  • pymorphy2
  • scipy
  • simple_elmo
  • tensorflow
  • tensorrt
  • textract
  • torch
  • torch-geometric
  • torch-scatter
  • torch-sparse
  • torchwordemb
  • tqdm
  • ufal.udpipe

Quick start

To install semsim, simply run:

$ pip install semsim


NOTE: If you encounter problems installing the semsim package, consider installing some prerequisites first:

$ pip install torch tensorflow tensorrt

Then proceed to install semsim.
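
Before retrying the installation, it can help to see which of the prerequisites named in the note are already present. The following is a minimal sketch, not part of semsim: the helper name missing_prerequisites is hypothetical, and it assumes each package's import name matches its pip name.

```python
from importlib.util import find_spec

# Prerequisites named in the note above. Assumption: the import name of each
# package is the same as the name passed to pip.
PREREQUISITES = ("torch", "tensorflow", "tensorrt")


def missing_prerequisites(names=PREREQUISITES):
    """Return the names that cannot be located as top-level modules.

    find_spec only locates a module; it does not import it, so this check
    stays cheap even for heavy packages.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All prerequisites are importable.")
```

Running the script prints either a ready-to-paste pip command for the missing packages or a confirmation that nothing is missing.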


Now you can use the semsim CLI tool as follows:

$ semsim first_src.txt second_src.txt -o output.txt

You might want to download the standard "built-in" (or, more precisely, "add-on") models for better performance. To do so, run:

$ semsim download cbow

to fetch pretrained CBOW embeddings, or

$ semsim download -a

to download all the add-ons at once, in parallel.
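
If the download needs to happen from a setup script rather than an interactive shell, the two commands above can be wrapped with the standard library. This is only a sketch: the helper names download_command and download are hypothetical, and it calls the CLI exactly as shown above rather than any Python API of semsim.

```python
import shutil
import subprocess


def download_command(addon=None):
    """Build the argument list for `semsim download`.

    With no add-on name, fall back to the `-a` flag shown above,
    which fetches all add-ons.
    """
    target = addon if addon is not None else "-a"
    return ["semsim", "download", target]


def download(addon=None):
    """Run the download, failing early if the CLI is not on PATH."""
    if shutil.which("semsim") is None:
        raise RuntimeError("semsim CLI not found; run `pip install semsim` first")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(download_command(addon), check=True)
```

Calling download("cbow") corresponds to the first command above, and download() with no argument corresponds to the -a variant.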

More info can be found on the documentation page.

Codestyle linters and test frameworks

This library has been fully checked and tested with the following tools:

  • flake8
  • mypy
  • pydocstyle
  • pytest

Interface

The CLI is described in the examples section of the documentation. For example:

$ semsim compare first_src.txt second_src.txt -e cbow -k neural -o output.txt --max-out-pairs 200 -v
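
To run the same comparison from a Python script, the command line above can be assembled as an argument list. This is a minimal sketch under stated assumptions: build_compare_command and its parameter names are hypothetical, only the flags that appear in the example above are used, and the parameters e_value and k_value are named after their flags because their meaning is not described here.

```python
import shlex
import subprocess


def build_compare_command(first, second, output,
                          e_value="cbow", k_value="neural",
                          max_out_pairs=200, verbose=True):
    """Assemble the `semsim compare` invocation shown above as a list."""
    cmd = [
        "semsim", "compare", first, second,
        "-e", e_value,
        "-k", k_value,
        "-o", output,
        "--max-out-pairs", str(max_out_pairs),
    ]
    if verbose:
        cmd.append("-v")
    return cmd


def run_compare(*args, **kwargs):
    """Run the comparison; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_compare_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where semsim is not installed.
    print(shlex.join(build_compare_command(
        "first_src.txt", "second_src.txt", "output.txt")))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.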

Authors
