Measure similarity in a many-to-many fashion

Mesi

Mesi is a tool for measuring many-to-many similarity between long-form documents, such as Python source code or technical writing. Its output helps determine which files in a collection are the most similar to each other.

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

projects
├── project-one
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── project-two
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each project's deliverables/python_program.py file, run the command:

mesi projects/*/deliverables/python_program.py

A lower distance in the produced table equates to a higher degree of similarity.
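To make the many-to-many idea concrete, here is a minimal sketch of a pairwise comparison: every document is compared with every other one, and the pairs are listed with the smallest distance (the most similar pair) first. This is not Mesi's actual implementation. The `levenshtein` and `pairwise_distances` helpers and the sample file contents are hypothetical, written in pure Python only for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def pairwise_distances(documents: dict[str, str]) -> list[tuple[str, str, int]]:
    """Compare every document with every other one, most similar pair first."""
    rows = [
        (name_a, name_b, levenshtein(documents[name_a], documents[name_b]))
        for name_a, name_b in combinations(sorted(documents), 2)
    ]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical contents standing in for each project's python_program.py.
documents = {
    "project-one": "print('hello world')",
    "project-two": "print('hello, world')",
    "project-three": "import sys\nsys.exit(0)",
}

for name_a, name_b, distance in pairwise_distances(documents):
    print(f"{name_a:<15}{name_b:<15}{distance}")
```

The first row printed is the pair with the lowest distance, which is the pair most worth inspecting for overlap.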

See the help menu (mesi --help) for additional options and configuration.

Algorithms

There are many algorithms to choose from when comparing string similarity. Mesi supports all the algorithms provided by TextDistance. Levenshtein distance is a reasonable general-purpose choice, which is why it is the default.

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

Dependencies

Mesi uses two primary dependencies for text similarity calculation: polyleven and TextDistance. Polyleven is the default because its dedicated implementation of Levenshtein distance is faster in most situations. If a different edit distance algorithm is requested, TextDistance's implementation is used instead.
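The split between the two libraries can be sketched roughly as follows. This is an illustration under stated assumptions, not Mesi's source code: `pick_backend` is a hypothetical helper, and it relies only on `polyleven.levenshtein` and on the `.distance` method that TextDistance's module-level algorithm objects (for example `textdistance.hamming`) expose.

```python
from typing import Callable


def pick_backend(algorithm: str = "levenshtein") -> Callable[[str, str], float]:
    """Return a distance function for the requested algorithm.

    Hypothetical helper: prefers polyleven for Levenshtein and falls back
    to TextDistance for everything else (or if polyleven is missing).
    """
    if algorithm == "levenshtein":
        try:
            from polyleven import levenshtein
            return levenshtein
        except ImportError:
            pass  # polyleven not installed; fall through to TextDistance

    import textdistance

    # TextDistance exposes each algorithm as a module-level object
    # whose .distance(a, b) method returns the distance between inputs.
    return getattr(textdistance, algorithm).distance
```

Calling `pick_backend()("kitten", "sitting")` returns the Levenshtein distance between the two strings, while `pick_backend("hamming")` returns TextDistance's Hamming distance function.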

License

Distributed under the terms of the GPL v3 license, mesi is free and open source software.
