A free tool for sentence similarity evaluation
semsim
Compare texts easily with the semsim Python package!
Features
- Dozens of parameters that you can tune for better performance
- Default values for all parameters, validated on paraphrase detection datasets
- 6 different algorithms for efficient syntax tree comparison
- A small pack of standard "built-in" models that can be easily downloaded via the semsim package itself
- Flexible class taxonomy that you can extend by simply inheriting from one of the model base classes
- Python library semsim with a command line interface (powered by click)
Dependencies
- attrs
- click
- networkx
- numpy
- pymorphy2
- scipy
- simple_elmo
- tensorflow
- tensorrt
- textract
- torch
- torch-geometric
- torch-scatter
- torch-sparse
- torchwordemb
- tqdm
- ufal.udpipe
Quick start
To install semsim, simply run:
$ pip install semsim
NOTE: If you encounter problems when installing the semsim package, consider installing some prerequisites first:
$ pip install torch tensorflow tensorrt
Then proceed to install semsim.
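Before retrying a failed install, it can help to see which of the prerequisites named in the note are already importable. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and not part of semsim, and it assumes each package's import name matches its pip name (true for torch, tensorflow and tensorrt).

```python
from importlib.util import find_spec

# Prerequisites named in the installation note.
PREREQUISITES = ("torch", "tensorflow", "tensorrt")


def missing_packages(names=PREREQUISITES):
    """Return the names from `names` that cannot be found in this environment.

    Hypothetical helper, not part of semsim. It only looks the module up
    and does not import it, so heavy packages are not loaded.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All prerequisites are importable.")
```

If the script reports missing packages, install them first and then run the semsim installation again.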
Now you can use the semsim CLI tool as follows:
$ semsim first_src.txt second_src.txt -o output.txt
You might want to download standard "built-in" (or we should say "add-on") models for better performance. This can be done by executing the following line:
$ semsim download cbow
for fetching pretrained CBOW embeddings or
$ semsim download -a
for downloading all the add-ons at once in parallel.
More info can be found on the documentation page.
Code style linters and test frameworks
This library has been fully checked and tested with the following tools:
- flake8
- mypy
- pydocstyle
- pytest
Interface
The CLI is described in the examples section of the documentation.
This is how you can use the semsim CLI tool:
$ semsim compare first_src.txt second_src.txt -e cbow -k neural -o output.txt --max-out-pairs 200 -v
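If you would rather launch the same comparison from a Python script, one option is to call the CLI through the standard subprocess module. This is only a sketch: the function names build_compare_command and run_compare are hypothetical, and the option values are copied verbatim from the command above because this page does not describe what each option does.

```python
import shutil
import subprocess


def build_compare_command(first, second, output):
    """Build the argument list for the `semsim compare` call shown above.

    Hypothetical helper; the option values are taken as-is from the README.
    """
    return [
        "semsim", "compare", first, second,
        "-e", "cbow",
        "-k", "neural",
        "-o", output,
        "--max-out-pairs", "200",
        "-v",
    ]


def run_compare(first, second, output):
    """Run the comparison, failing early if the semsim executable is not on PATH."""
    if shutil.which("semsim") is None:
        raise RuntimeError("semsim CLI not found; install it with `pip install semsim`")
    # check=True raises CalledProcessError if semsim exits with a non-zero status.
    return subprocess.run(build_compare_command(first, second, output), check=True)
```

Calling run_compare("first_src.txt", "second_src.txt", "output.txt") then runs the same command as the shell line above.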
Authors