Compares text in a file to reference/glossary/key-items/dictionary file.




┬┌─┌─┐┬ ┬┌─┐┌─┐┬    ┌─┐┌┬┐┌─┐┬─┐
├┴┐├┤ └┬┘│   ││    ├─┤   │├┬┘
┴ ┴└─┘  └─┘└─┘┴─┘┴─┘┴   └─┘┴└─

Compares text in a file to reference/glossary/key-items/dictionary.

🧱 Built by David Rush fueled by ☕️ ℹ️ info

https://pypi.org/project/keycollator/0.0.3/


🗂️ Structure

.
│
├── assets
│   └── images
│       └── coverage.svg
│
├── docs
│   ├── cli.md
│   └── index.md
│
├── src
│   ├── __init__.py
│   ├── cli.py
│   ├── keycollator.py
│   ├── test_keycollator.py
│   ├── extractonator.py
│   ├── requirements.txt
│   └── data
│       ├── (placeholder)
│       └── (placeholder)
│
├── tests
│   └── test_keycollator
│       ├── __init__.py
│       └── test_keycollator.py
│
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── make-venv.sh
├── Makefile
├── pyproject.toml
├── README.md
├── README.rst
├── setup.cfg
└── setup.py

🚀 Features

  • Extract text from file to dictionary
  • Extract keys from file to dictionary
  • Find matches of keys in text file
  • Apply fuzzy matching
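The first three features can be illustrated with a minimal sketch. This is not keycollator's own code: the helper names `load_items` and `count_key_matches` are hypothetical, and case-insensitive substring matching is an assumption about what counts as a match.

```python
from collections import Counter


def load_items(lines):
    """Return stripped, lower-cased, non-empty lines (hypothetical helper)."""
    return [line.strip().lower() for line in lines if line.strip()]


def count_key_matches(keys, text_items):
    """Count how many text items contain each key as a substring.

    Substring matching is an assumption made for this sketch only.
    """
    counts = Counter()
    for key in keys:
        for item in text_items:
            if key in item:
                counts[key] += 1
    return counts


# In practice the lines would come from open(key_file) and open(text_file).
keys = load_items(["manage\n", "report\n", "\n"])
text_items = load_items([
    "Manage the weekly report\n",
    "Report on sales\n",
    "Develop dashboards\n",
])
counts = count_key_matches(keys, text_items)
print(counts.most_common())
```

Every key is compared with every text item, so the number of comparisons grows as keys × text items.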

🧰 Installation

🖥️ Install from PyPI using pip3

📦 https://pypi.org/project/keycollator/

pip3 install keycollator

📄 Documentation

Official documentation can be found here:

https://github.com/davidprush/keycollator/tree/main/docs

💪 Supported File Formats

  • TXT/CSV files (Mac/Linux/Win)
  • Plans to add PDF and JSON

📐 Usage

🖥️ Import keycollator into Python projects

from keycollator import ZTimer, KeyKrawler

🖥️ CLI

Use the CLI options to override keycollator's default parameters and behavior.

python3 src/keycollator.py --help
Usage: keycollator.py [OPTIONS] COMMAND [ARGS]...

  keycollator is an app that finds occurances of keys in a text file

Options:
  -t, --text-file PATH            Path/file name of the text to be searched
                                  for against items in the key file
  -k, --key-file PATH             Path/file name of the key file containing a
                                  dictionary, key items, glossary, or
                                  reference list used to search the text file
  -O, --output-file PATH          Path/file name of the output file that
                                  will contain the results (CSV or TXT)
  -R, --limit-results INTEGER     Limit the number of results
  -f, --fuzzy-matching INTEGER RANGE
                                  Set the level of fuzzy matching (default=99)
                                  to validate matches using
                                  approximations/edit distances, uses
                                  acceptance ratios with integer values from 0
                                  to 99, where 99 is nearly identical and 0 is
                                  not similar  [0<=x<=99]
  -U, --ubound-limit INTEGER RANGE
                                  Ignores items from the results with matches
                                  greater than the upper boundary (upper-
                                  limit); reduce eroneous matches
                                  [1<=x<=99999]
  -L, --lbound-limit INTEGER RANGE
                                  Ignores items from the results with matches
                                  less than the lower boundary (lower-limit);
                                  reduce eroneous matches  [0<=x<=99999]
  -v, --set-verbose               Turn on verbose
  -l, --set-logging               Turn on logging
  -Z, --log-file PATH             Path/file name to be used for the log file
  --help                          Show this message and exit.

🖥️ Turn on verbose output

Currently provides only one verbosity level; future versions will implement multiple levels (DEBUG, INFO, WARN, etc.).

keycollator --set-verbose

🖥️ Apply fuzzy matching

Fuzzy matching accepts approximate matches based on edit distance. The level is an integer from 0 to 99: 0 is the least strict and accepts nearly anything as a match, while 99 accepts only nearly identical matches. By default the app applies level 99, and only when regular matching finds no matches.

keycollator --fuzzy-matching=[0-99]
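The threshold idea can be sketched with the standard library. This is an illustration only: it uses `difflib.SequenceMatcher` as a stand-in similarity measure, which is not necessarily what keycollator uses internally, and the function names `similarity` and `is_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return an integer similarity score from 0 (unrelated) to 100 (identical)."""
    return int(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def is_match(key, item, threshold=99):
    """Try a regular substring match first, then fall back to the fuzzy score."""
    if key.lower() in item.lower():
        return True
    return similarity(key, item) >= threshold


print(is_match("report", "weekly report"))            # regular match
print(is_match("analysis", "analysys", threshold=80))  # fuzzy match accepted
print(is_match("analysis", "analysys"))                # rejected at the default 99
```

Lowering the threshold accepts more misspellings and variants, but also more false positives, which is where the upper and lower bound options become useful.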

🖥️ Set the key file

Each line of the key file represents a key that will be matched against the items in the text file.

keycollator --key-file="/path/to/key/file/keys.txt"

🖥️ Set the text file

Each line of the text file represents an item that will be compared with the items in the key file.

keycollator --text-file="/path/to/key/file/text.txt"

🖥️ Specify the output file

Currently uses CSV; additional file formats (PDF/JSON/DOCX) will be added in future releases.

keycollator --output-file="/path/to/results/result.csv"
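As a rough sketch of what a CSV results file could contain, the snippet below writes ranked key counts with Python's `csv` module. The column names are borrowed from the console table in the example output; whether keycollator's actual CSV uses the same layout is an assumption, and `write_results` is a hypothetical helper.

```python
import csv
import io


def write_results(rows, handle):
    """Write (key, count) pairs as numbered CSV rows (assumed layout)."""
    writer = csv.writer(handle)
    writer.writerow(["No.", "Key", "Count"])
    for number, (key, count) in enumerate(rows, start=1):
        writer.writerow([number, key, count])


# An in-memory buffer stands in for open("result.csv", "w", newline="").
buffer = io.StringIO()
write_results([("manage", 73), ("develop", 62)], buffer)
print(buffer.getvalue())
```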

🖥️ Limit results for console and output file

Limits the number of results shown in the console and written to the output file.

keycollator --limit-results=30

🖥️ Set upper bound limit

Rejects items with more matches than the given integer value, which helps reduce erroneous matches when using fuzzy matching. Per the --help output, the value must be between 1 and 99999.

keycollator --ubound-limit=[1-99999]
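The effect of the bound and limit options can be sketched as a simple filter over match counts. This is an assumption-laden illustration, not keycollator's implementation: `filter_results` is a hypothetical name, and the bounds are treated as inclusive because the help text says only counts greater than the upper bound or less than the lower bound are ignored.

```python
def filter_results(counts, lbound=0, ubound=99999, limit=None):
    """Keep keys whose count lies within [lbound, ubound], highest count first."""
    kept = [(key, n) for key, n in counts.items() if lbound <= n <= ubound]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if limit is None else kept[:limit]


counts = {"manage": 73, "develop": 62, "the": 5000, "sales": 10}
filtered = filter_results(counts, lbound=11, ubound=100, limit=2)
print(filtered)
```

Here the very common key `the` is dropped by the upper bound and `sales` by the lower bound, leaving the two strongest remaining keys.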

🖥️ Turn on logging:

Turns on logging; if the user supplies no log file, one is created using the default name log.log.

keycollator --set-logging

🖥️ Create a log file

Sets the name of the log file used by logging.

keycollator --log-file="/path/to/log/file/log.log"
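The fallback to a default log file could look roughly like the following sketch, built on the standard `logging` module. The function `make_logger` and the logger name are hypothetical; only the default file name `log.log` comes from the description above.

```python
import logging


def make_logger(log_file=None, default_name="log.log"):
    """Return a logger that writes to log_file, or to default_name if none is given."""
    logger = logging.getLogger("keycollator-sketch")
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    # delay=True postpones creating the file until the first message is logged.
    handler = logging.FileHandler(log_file or default_name, delay=True)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `make_logger()` with no argument would log to `log.log` in the current directory, while `make_logger("/path/to/log/file/log.log")` mirrors the `--log-file` option.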

Example Output

python3 src/keycollator.py --set-logging --limit-results=30
Extracted text.txt items.[[0.16]seconds]
Extracted keys.txt items.[[0.25]seconds]
Matched keys.txt items to text.txt items.[[76.45]seconds]
results.csv Complete.[[76.52]seconds]
╭─────┬───────────────┬───────╮
│ No.  Key            Count │
├─────┼───────────────┼───────┤
│  1   manage          73   │
├─────┼───────────────┼───────┤
│  2   develop         62   │
├─────┼───────────────┼───────┤
│  3   report          58   │
├─────┼───────────────┼───────┤
│  4   support         46   │
├─────┼───────────────┼───────┤
│  5   process         43   │
├─────┼───────────────┼───────┤
│  6   analysis        36   │
├─────┼───────────────┼───────┤
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
├─────┼───────────────┼───────┤
│ 28   dashboards      11   │
├─────┼───────────────┼───────┤
│ 29   sales           10   │
├─────┼───────────────┼───────┤
│ 30   create          10   │
╰─────┴───────────────┴───────╯
╭─────────────┬────────╮
│ Statistic    Total  │
├─────────────┼────────┤
│ Keys          701   │
├─────────────┼────────┤
│ Text          695   │
├─────────────┼────────┤
│ Matches       1207  │
├─────────────┼────────┤
│ Comparisons  376855 │
├─────────────┼────────┤
│ Logs           0    │
├─────────────┼────────┤
│ Runtime      76.60  │
╰─────────────┴────────╯

🎯 Todo 📌

     Update requirements.txt
     Add proper error handling
     Add CHANGELOG.md
     Add functions/methods to handle STOP_WORDS
     Verify python3 -m nltk.downloader punkt is properly imported
     Separate project into multiple files
     Add progress indicator using halo when extracting and comparing
     Create a logger class (for some reason logging is broken)
     KeyKrawler matching is broken
     Update README.md(.rst) with correct CLI
     Create method to KeyKrawler to select and _create missing files_
     Update CODE_OF_CONDUCT.md
     Update CONTRIBUTING.md
     Format KeyKrawler console results as a table
     Create ZLog class in extractonator.py (parse out __logit method)
     Cleanup verbose output (conflicts with halo)
     Update all comments
     Migrate click functionality to cli.py
     Refactor all methods and functions
     Test ALL CLI options

👔 Project Resource Acknowledgements

  1. Creating a Python Package
  2. javiertejero

💼 Deployment Features

📈 Releases

Current stage: testing

🛡 License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

@misc{keycollator,
  author = {David Rush},
  title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
  year = {2022},
  publisher = {Rush Solutions, LLC},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/davidprush/keycollator}}
}

Additional Information

  1. The latest version of this document can be found here; if you are viewing it there (via HTTPS), you can download the Markdown/reStructuredText source here.
  2. You can contact the author via e-mail.
