Draw dendrogram of similarity between text files
dendro_text
Similarity is measured by Damerau-Levenshtein edit distance. The distance between two texts is the number of inserted, deleted, and moved characters required to transform one text into the other (a smaller distance means the texts are more similar).
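To make the distance concrete, here is a minimal sketch of the optimal string alignment distance, a common restricted variant of Damerau-Levenshtein that counts insertions, deletions, substitutions, and swaps of adjacent items. The function name `osa_distance` is hypothetical, and whether dendro_text uses exactly this variant internally is an assumption.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two sequences.

    Illustrative only: a restricted Damerau-Levenshtein variant,
    not necessarily the implementation used by dendro_text.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items of a
    for j in range(cols):
        d[0][j] = j  # insert all j items of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Swap of two adjacent items counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("ab", "ba"))
print(osa_distance("kitten", "sitting"))
```

Because the function only indexes and compares items, it also accepts lists of words or lines, not just strings of characters.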
Features:
- Parallel execution: Supports execution on multiple CPU cores.
- Options in tokenization: By default, texts are compared as sequences of words, extracted by splitting the input text at boundaries between different character types. Optionally, you can compare texts line by line, character by character, or token by token, with tokens extracted by lexical analyzers for programming languages.
- File-centric search: A function to list files in order of similarity to a given file.
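The tokenization and file-centric search features can be sketched as follows. This is an illustration under stated assumptions, not dendro_text's code: the helper names `tokenize` and `rank_by_similarity` are hypothetical, the character-type rules (digits, letters, single punctuation marks, whitespace dropped) are a guess at what "different character types" means, and the similarity score uses `difflib.SequenceMatcher` from the standard library as a stand-in for the edit distance.

```python
import re
from difflib import SequenceMatcher

# Assumed character classes: runs of digits, runs of letters/underscores,
# and single punctuation characters. Whitespace is dropped.
_TOKEN_RE = re.compile(r"\d+|[^\W\d]+|[^\w\s]")


def tokenize(text):
    """Split text into tokens at boundaries between character types."""
    return _TOKEN_RE.findall(text)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to query.

    candidates maps a name (e.g. a file name) to that file's text.
    """
    query_tokens = tokenize(query)

    def score(name):
        # ratio() is 1.0 for identical token sequences, 0.0 for disjoint ones.
        return SequenceMatcher(None, query_tokens, tokenize(candidates[name])).ratio()

    return sorted(candidates, key=score, reverse=True)


print(tokenize("foo42 = bar(1)"))
print(rank_by_similarity(
    "a b c d",
    {"far": "x y z w", "near": "a b c x", "same": "a b c d"},
))
```

Comparing token sequences rather than raw characters keeps a renamed identifier or changed number to a single edit, which is the motivation for word-level comparison as the default.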
Please refer to the project's home page on GitHub for usage.
Installation
To make dendro_text compatible with both docopt and docopt-ng, the dependencies on them are now declared explicitly as extra dependencies.
If you know that either docopt or docopt-ng is already installed on your system, just try the following:
pip install dendro_text
If you are unsure whether docopt or docopt-ng is installed on your system, try the following:
pip install dendro_text[docopt-ng]
To uninstall:
pip uninstall dendro_text
Source Distribution
File details
Details for the file dendro_text-1.4.2.tar.gz.
File metadata
- Download URL: dendro_text-1.4.2.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 489ca5daf46cc8179a393121c1ed0eea512b4883ae48f64f27ce77029868d41c
MD5 | 9c0682fef6f261e46d243faa072bbb27
BLAKE2b-256 | 6b7df1c3fd4f1141de79723737187ca1668365ee019435c35c7782ea30470e76