Draw dendrogram of similarity between text files

dendro_text

Draw dendrogram of similarity between text files.

Similarity is measured in terms of Damerau-Levenshtein edit distance. The distance between two texts is the number of inserted, deleted, and moved characters needed to turn one text into the other (a smaller distance means the texts are more similar).
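As an illustration of the kind of distance described above, here is a minimal sketch of the optimal string alignment variant of Damerau-Levenshtein distance. It is a textbook formulation, not necessarily what dendro_text does internally: the function name `osa_distance` is hypothetical, and the unit cost for substitutions and adjacent transpositions is an assumption.

```python
def osa_distance(a, b):
    """Edit distance between two sequences (strings or token lists).

    Textbook optimal-string-alignment variant: insertions, deletions,
    substitutions, and swaps of two adjacent items each cost 1.
    This is an illustrative sketch, not dendro_text's own code.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i items of a and the first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items
    for j in range(cols):
        d[0][j] = j  # insert all j items
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # adjacent transposition ("moved" item)
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Because the function only indexes and compares items, it works equally on strings (character by character) and on lists of words or lines.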

Features:

  • Parallel execution: Supports execution on multiple CPU cores.

  • Options in tokenization: By default, texts are compared as sequences of words, obtained by splitting the input text at boundaries between character types. Optionally, you can compare texts line by line, character by character, or token by token, with tokens extracted by lexical analyzers for programming languages.

  • File-centric search: A function to list files in order of similarity to a given file.

  • Diff (Experimental): Shows textual differences between files. (This function is provided to check how the choice of tokenization affects the similarity calculation.)
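The tokenization options and the file-centric search listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the three character classes in the regular expression, and the names `split_by_char_type`, `split_by_line`, `split_by_char`, and `rank_by_similarity`, are hypothetical and are not taken from dendro_text. The distance function is passed in as a parameter so that any edit distance over sequences can be used.

```python
import re

# Assumed character classes (hypothetical): runs of word characters,
# runs of whitespace, and runs of any other characters.
_TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]+")


def split_by_char_type(text):
    """Split text into runs of same-type characters."""
    return _TOKEN_RE.findall(text)


def split_by_line(text):
    """Treat each line as one token."""
    return text.splitlines()


def split_by_char(text):
    """Treat each character as one token."""
    return list(text)


def rank_by_similarity(query_tokens, candidates, distance):
    """List candidate names in order of increasing distance to the query.

    candidates: dict mapping a name (e.g. a file name) to its token list.
    distance:   callable taking two token sequences and returning a number.
    Returns a list of (distance, name) pairs, most similar first.
    """
    scored = [(distance(query_tokens, tokens), name)
              for name, tokens in candidates.items()]
    return sorted(scored)
```

A coarser tokenization (lines) makes any change within a line count as a whole differing token, while a finer one (characters) counts only the characters that changed, so the same pair of files can rank differently depending on the option chosen.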

Installation

pip install dendro-text

If you run the command dendro_text and get the following error message, please install dendro-text with docopt-ng.

$ dendro_text
Error: the Docopt module has not installed. Install it with `pip install docopt-ng`.

pip install dendro-text[docopt-ng]

(To keep dendro-text compatible with both docopt and docopt-ng, both are now declared as optional extra dependencies rather than required ones.)

To uninstall:

pip uninstall dendro-text

Source Distribution

dendro_text-1.5.1.tar.gz (20.3 kB)

Uploaded Source

File details

Details for the file dendro_text-1.5.1.tar.gz.

File metadata

  • Download URL: dendro_text-1.5.1.tar.gz
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for dendro_text-1.5.1.tar.gz
Algorithm Hash digest
SHA256 9368bf5824bfe209f9813a30c6a6ea275f1464a7ee28a91dec4b81df58a04189
MD5 bc5b40a51db60666aee430e5d6f5f816
BLAKE2b-256 013634a823d4812720e16a5d8f97da0335d9a08f478a7315b3ce37f3ab9af38f
