
Measure similarity in a many-to-many fashion

Project description

Mesi

Measure Similarity



Mesi is a tool for measuring the similarity of long-form documents, such as Python source code or technical writing, in a many-to-many fashion: every document is compared against every other document. The output can help determine which files in a collection are most similar to each other.
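To make "many-to-many" concrete, here is a minimal Python sketch (not Mesi's actual code) that compares every unordered pair of documents and sorts the pairs so the most similar come first. With n documents this means n(n-1)/2 comparisons. The names many_to_many and stand_in_distance are hypothetical, and the stand-in metric (1 minus difflib's SequenceMatcher ratio) is used only because it ships with the standard library; Mesi's default metric is Levenshtein distance.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# A distance function takes two strings and returns a number; lower means more similar.
Distance = Callable[[str, str], float]


def stand_in_distance(a: str, b: str) -> float:
    """Hypothetical stand-in metric for this sketch: 1 minus difflib's similarity ratio.

    This is NOT Mesi's default (Levenshtein); it is used here only because it
    is available in the standard library.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def many_to_many(docs: Dict[str, str], distance: Distance) -> List[Tuple[str, str, float]]:
    """Compare every unordered pair of documents and sort by ascending distance."""
    rows = [
        (name_a, name_b, distance(text_a, text_b))
        for (name_a, text_a), (name_b, text_b) in combinations(docs.items(), 2)
    ]
    # Lowest distance first, so the most similar pair is at the top.
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical in-memory "files"; Mesi itself reads the paths you pass it.
    docs = {
        "StudentOne": "def add(a, b):\n    return a + b\n",
        "StudentTwo": "def add(x, y):\n    return x + y\n",
        "StudentThree": "print('hello world')\n",
    }
    for name_a, name_b, d in many_to_many(docs, stand_in_distance):
        print(f"{name_a:<12} {name_b:<12} {d:.3f}")
```

Any function that takes two strings and returns a number where lower means more similar can be passed as the distance argument.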

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

lab-one
├── StudentOne
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── StudentTwo
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each student's deliverables/python_program.py file, run the command:

mesi lab-one/*/deliverables/python_program.py

A lower distance in the produced table equates to a higher degree of similarity.

See the help menu (mesi --help) for additional options and configuration.

Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi supports all the algorithms provided by TextDistance. In general, Levenshtein is never a bad choice, which is why it is the default.
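For intuition about what the default metric counts, the following is a textbook dynamic-programming sketch of Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. It is illustrative only; Mesi delegates the actual computation to polyleven or TextDistance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (classic dynamic programming)."""
    # The distance is symmetric, so keep the shorter string in `b` to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    # At the start of row i, previous[j] is the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

Identical documents have a distance of 0, and every extra edit adds 1, which is why a lower distance in Mesi's table means the files are more alike.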

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

Dependencies

Mesi uses two primary dependencies for calculating text similarity: polyleven and TextDistance. Polyleven is the default, because its dedicated Levenshtein distance implementation is typically faster. If a different edit distance algorithm is requested, TextDistance's implementation of that algorithm is used instead.
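The dispatch rule described above can be sketched roughly as follows. This is an assumption about the shape of the logic, not Mesi's source: select_backend and load_distance are hypothetical names, and the algorithm strings are TextDistance's attribute names (such as hamming or damerau_levenshtein), which may not match Mesi's command-line option values. polyleven.levenshtein and TextDistance's algorithm objects with a .distance method are real APIs; the imports are deferred so only the chosen backend is loaded.

```python
from typing import Callable


def select_backend(algorithm: str) -> str:
    """Hypothetical helper: name the library that would handle `algorithm`."""
    # Levenshtein goes to polyleven; every other algorithm goes to TextDistance.
    return "polyleven" if algorithm == "levenshtein" else "textdistance"


def load_distance(algorithm: str) -> Callable[[str, str], float]:
    """Hypothetical helper: return a two-argument distance function for `algorithm`.

    Imports happen inside the function, so only the selected backend is imported.
    """
    if select_backend(algorithm) == "polyleven":
        from polyleven import levenshtein
        return levenshtein
    import textdistance
    # TextDistance exposes ready-made algorithm objects (e.g. textdistance.hamming)
    # that provide a .distance(a, b) method; unknown names raise AttributeError.
    return getattr(textdistance, algorithm).distance
```

For example, load_distance("levenshtein") returns polyleven's function, while load_distance("hamming") returns textdistance.hamming.distance.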

License

Distributed under the terms of the GPL v3 license, Mesi is free and open source software.



Download files


Source Distribution

mesi-1.0.0.tar.gz (19.1 kB)

Built Distribution

mesi-1.0.0-py3-none-any.whl (20.2 kB)

File details

Details for the file mesi-1.0.0.tar.gz.

File metadata

  • Download URL: mesi-1.0.0.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.0a2 CPython/3.9.7 Linux/5.8.0-1042-azure

File hashes

Hashes for mesi-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        47ecd00aa0f350dde4b0b8e488d5e9d994c854d0dfd7531c1f5371a0a7ae98ac
MD5           d93eb03efe1c1edf06b893d12200c0b2
BLAKE2b-256   247e58499c416fc2a5c15cd77ec5de3c47e5d4bc387d659d018d0dec600f16a5


File details

Details for the file mesi-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: mesi-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 20.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.0a2 CPython/3.9.7 Linux/5.8.0-1042-azure

File hashes

Hashes for mesi-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        2a6a380d590a97d59c749eb783deaf593036c4b2edfb7560eb8b9ad0624e712a
MD5           8079d2b70d2b5ea2f34c65e8e236a6fb
BLAKE2b-256   f97508bc9751367740f6f1ebc8bb11b8fc12673aed5a9ac348ad2a25eb5afc8f

