Measure similarity in a many-to-many fashion
Mesi
Measure Similarity
Mesi is a tool for measuring the similarity of long-form documents, such as Python source code or technical writing, in a many-to-many fashion. The output helps determine which files in a collection are the most similar to each other.
Installation
Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.
pipx install mesi
If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.
pipx run mesi --help
Usage
For a directory structure that looks like:
lab-one
├── StudentOne
│ ├── pyproject.toml
│ ├── deliverables
│ │ └── python_program.py
│ └── README.md
├── StudentTwo
│ ├── pyproject.toml
│ ├── deliverables
│ │ └── python_program.py
│ └── README.md
│
where similarity should be measured between each student's deliverables/python_program.py file, run the command:
mesi lab-one/*/deliverables/python_program.py
A lower distance in the produced table equates to a higher degree of similarity.
See the help menu (mesi --help) for additional options and configuration.
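To make the many-to-many comparison concrete, here is a minimal sketch of the idea in plain Python. It is not Mesi's implementation: the `levenshtein` and `pairwise_distances` helpers and the sample file names are hypothetical, written only to illustrate how every file can be compared against every other file and ranked by distance.

```python
from itertools import combinations
from pathlib import Path
import tempfile


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, kept in pure Python for portability."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character from b
                previous[j - 1] + cost,  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def pairwise_distances(paths):
    """Return (path_a, path_b, distance) for every unordered pair, most similar first."""
    texts = {path: Path(path).read_text(encoding="utf-8") for path in paths}
    rows = [
        (a, b, levenshtein(texts[a], texts[b]))
        for a, b in combinations(sorted(texts), 2)
    ]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical stand-ins for lab-one/*/deliverables/python_program.py
    samples = {
        "StudentOne.py": "print('hello world')\n",
        "StudentTwo.py": "print('hello, world')\n",
        "StudentThree.py": "import sys\nsys.stdout.write('hi')\n",
    }
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for name, source in samples.items():
            path = Path(root) / name
            path.write_text(source, encoding="utf-8")
            paths.append(str(path))
        for a, b, distance in pairwise_distances(paths):
            print(f"{Path(a).name:<16} {Path(b).name:<16} {distance}")
```

The pair with the lowest distance is listed first, matching the rule that a lower distance means a higher degree of similarity.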
Algorithms
There are many algorithms to choose from when comparing string similarity! Mesi implements all the algorithms provided by TextDistance. In general, levenshtein is never a bad choice, which is why it is the default.
Bugs/Requests
Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.
Dependencies
Mesi uses two primary dependencies for text similarity calculation: polyleven and TextDistance. Polyleven is the default, as its dedicated implementation of Levenshtein distance is faster in most situations. However, if a different edit distance algorithm is requested, TextDistance's implementation is used instead.
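The selection rule described above can be sketched roughly as follows. This is an assumption about the shape of the logic, not Mesi's actual code: `choose_backend` and `load_distance` are hypothetical names. The only third-party calls used are `polyleven.levenshtein` and the `.distance` method on TextDistance's algorithm objects, and both imports are deferred so the sketch runs even when neither library is installed.

```python
from typing import Callable


def choose_backend(algorithm: str = "levenshtein") -> str:
    """Name the library that would serve `algorithm` under the rule above."""
    return "polyleven" if algorithm.lower() == "levenshtein" else "textdistance"


def load_distance(algorithm: str = "levenshtein") -> Callable[[str, str], float]:
    """Return a distance callable, importing only the library that is needed."""
    if choose_backend(algorithm) == "polyleven":
        from polyleven import levenshtein  # third-party: pip install polyleven
        return levenshtein
    import textdistance  # third-party: pip install textdistance
    # Raises AttributeError if TextDistance has no algorithm with this name.
    return getattr(textdistance, algorithm.lower()).distance


if __name__ == "__main__":
    # Only the routing decision is printed, so no third-party import is triggered.
    for name in ("levenshtein", "hamming", "jaccard"):
        print(f"{name:<12} -> {choose_backend(name)}")
```

With both libraries installed, `load_distance("hamming")("test", "text")` would return TextDistance's Hamming distance for the two strings, while `load_distance()` would return polyleven's function directly.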
License
Distributed under the terms of the GPL v3 license, Mesi is free and open source software.