Draw dendrogram of similarity between text files

dendro_text

Draw dendrogram of similarity between text files.

Similarity is measured in terms of Damerau-Levenshtein edit distance. The distance between two texts is the number of inserted, deleted, and moved characters needed to turn one text into the other (a smaller distance means the texts are more similar).
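To make the distance concrete, here is a minimal sketch of one textbook formulation, the "optimal string alignment" variant of Damerau-Levenshtein, which counts insertions, deletions, substitutions, and swaps of adjacent items. The function name `osa_distance` is made up for this illustration, and the edit operations dendro_text actually counts may differ.

```python
from typing import Sequence


def osa_distance(a: Sequence, b: Sequence) -> int:
    """Optimal-string-alignment distance between two sequences.

    Illustrative only: not taken from dendro_text's source code.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i items of a and the first j items of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items
    for j in range(cols):
        d[0][j] = j  # insert all j items
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # swap of two adjacent items
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("kitten", "sitting"))  # 3
print(osa_distance("ab", "ba"))           # 1
```

Because the function accepts any sequence, the same code works on lists of words or lines as well as on plain strings.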

Features:

  • Parallel execution: Supports execution on multiple CPU cores.

  • Options in tokenization: By default, texts are compared as sequences of words, obtained by splitting the input text wherever the character type changes. Optionally, you can compare texts line by line, character by character, or token by token, using the lexical analyzers of programming languages.

  • File-centric search: A function to list files in order of similarity to a given file.
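The tokenization and file-centric search features can be sketched roughly as follows. This is an illustration under stated assumptions, not dendro_text's implementation: the regular expression is one plausible way to split text by character type, the distance is a plain Levenshtein distance over tokens, and the names `split_by_char_type`, `edit_distance`, and `rank_by_similarity` are hypothetical.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Assumed tokenization rule: runs of word characters, runs of whitespace,
# or any other single character.
_TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def split_by_char_type(text: str) -> List[str]:
    """Split text into tokens at changes of character type."""
    return _TOKEN_RE.findall(text)


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Plain Levenshtein distance, keeping only one previous row in memory."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def rank_by_similarity(query: str, others: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (distance, name) pairs, most similar text first."""
    q = split_by_char_type(query)
    scored = [(edit_distance(q, split_by_char_type(t)), name) for name, t in others.items()]
    return sorted(scored)


# Hypothetical file names and contents, used only for this example.
texts = {
    "a.txt": "x = y + 1",
    "b.txt": "x = y + 2",
    "c.txt": "print('hello')",
}
for distance, name in rank_by_similarity("x = y + 1", texts):
    print(distance, name)
```

Comparing token sequences rather than raw characters means that renaming one long identifier counts as a single edit, which is usually closer to how a reader perceives the change.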

Please refer to the project's home page on GitHub for usage.

Installation

To keep dendro_text compatible with both docopt and docopt-ng, they are now declared as optional (extra) dependencies rather than required ones.

If you know either docopt or docopt-ng is already installed on your system, just try the following:

pip install dendro_text

If you are unsure whether docopt or docopt-ng is installed on your system, try the following:

pip install dendro_text[docopt-ng]
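If you would rather check than guess, a short script can tell you whether a `docopt` module is importable in the current environment. This is a sketch that assumes both docopt and docopt-ng expose a module named `docopt`; the helper name `is_module_available` is made up for this example.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


if is_module_available("docopt"):
    print("A docopt module was found; plain 'pip install dendro_text' should be enough.")
else:
    print("No docopt module was found; consider 'pip install dendro_text[docopt-ng]'.")
```

Run it with the same Python interpreter that you use for pip, since each environment has its own set of installed packages.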

To uninstall:

pip uninstall dendro_text

Download files

Source Distribution

dendro_text-1.4.2.tar.gz (18.6 kB)

Uploaded Source

File details

Details for the file dendro_text-1.4.2.tar.gz.

File metadata

  • Download URL: dendro_text-1.4.2.tar.gz
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for dendro_text-1.4.2.tar.gz

  • SHA256: 489ca5daf46cc8179a393121c1ed0eea512b4883ae48f64f27ce77029868d41c
  • MD5: 9c0682fef6f261e46d243faa072bbb27
  • BLAKE2b-256: 6b7df1c3fd4f1141de79723737187ca1668365ee019435c35c7782ea30470e76
