Compares text in a file to reference/glossary/key-items/dictionary file.
┬┌─┌─┐┬ ┬┌─┐┌─┐┬ ┬ ┌─┐┌┬┐┌─┐┬─┐
├┴┐├┤ └┬┘│ │ ││ │ ├─┤ │ │ │├┬┘
┴ ┴└─┘ ┴ └─┘└─┘┴─┘┴─┘┴ ┴ ┴ └─┘┴└─
Compares text in a file to reference/glossary/key-items/dictionary.[1][2]
🧱 Built by David Rush fueled by ☕️ ℹ️ info
👇 Table of Contents
- Structure
- Features
- Installation
- Documentation
- Supported File Formats
- Usage
- Example Output
- Todo
- Project Resource Acknowledgements
- Deployment Features
- Releases
- License
- Citation
- Additional Information
🗂️ Structure
.
│
├── assets
│ └── images
│ └── coverage.svg
│
├── docs
│ ├── cli.md
│ └── index.md
│
├── src
│ ├── __init__.py
│ ├── cli.py
│ ├── keycollator.py
│ ├── test_keycollator.py
│ ├── extractonator.py
│ ├── requirements.txt
│ └──data
│ ├── (placeholder)
│ └── (placeholder)
│
├── tests
│ └── test_keycollator
│ ├── __init__.py
│ └── test_keycollator.py
│
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── make-venv.sh
├── Makefile
├── pyproject.toml
├── README.README
├── README.rst
├── setup.cfg
└── setup.py
🚀 Features
- Extract text from file to dictionary
- Extract keys from file to dictionary
- Find matches of keys in text file
- Apply fuzzy matching
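The features above can be pictured as: load the keys, load the text items, then count how often each key occurs. The following is a minimal sketch using only the standard library, not keycollator's own code. `load_items` and `count_key_matches` are hypothetical names, and treating a case-insensitive substring hit as a match is an assumption.

```python
from collections import Counter
from pathlib import Path


def load_items(path):
    """Return the non-empty lines of a file, stripped and lower-cased."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip().lower() for line in lines if line.strip()]


def count_key_matches(keys, text_items):
    """Count, for each key, how many text items contain it as a substring."""
    counts = Counter()
    for key in keys:
        hits = sum(1 for item in text_items if key in item)
        if hits:
            counts[key] = hits
    return counts


# In-memory stand-ins for the key file and the text file (illustrative data).
keys = ["manage", "report", "budget"]
text_items = ["manage vendor reports", "report weekly status", "manage staff"]

counts = count_key_matches(keys, text_items)
for key, count in counts.most_common():
    print(key, count)
```

Keys with no hits are left out of the result, which mirrors a results table that lists only matched keys.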
🧰 Installation
🖥️ Install from PyPI using pip3
📦 https://pypi.org/project/keycollator/
pip3 install keycollator
📄 Documentation
Official documentation can be found here:
https://github.com/davidprush/keycollator/tree/main/docs
💪 Supported File Formats
- TXT/CSV files (Mac/Linux/Win)
- Plans to add PDF and JSON
📐 Usage
🖥️ Import keycollator into Python Projects
from keycollator.customlogger import CustomLogger as cl
from keycollator.proceduretimer import ProcedureTimer as pt
clobj = cl([message=str], [filemode='a'|'w'|'r'], [level='info'|'success'|'warning'|'error'],
[filename=str], [dtformat='locale'|'standar'|'timeonly'|'compressed'|'long'|'micro'])
** dtformat patterns: locale='%c', default='%d/%m/%Y %H:%M:%S',
timeonly='%H:%M:%S', compressed='%d%m%Y%H%M%S',
long='%A %B %d, %Y, [%I:%M:%S %p]', micro='%H:%M:%S:%f'
ptobj = pt([str])
* where str is the message you want stored with the timer
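To see what the listed date-time patterns produce, the sketch below feeds them to the standard library's `datetime.strftime`. It does not call `CustomLogger`; the `DT_FORMATS` dictionary name and the sample timestamp are assumptions made for illustration, while the pattern strings are copied from the list above.

```python
from datetime import datetime

# Pattern strings copied from the README; the dict itself is hypothetical.
DT_FORMATS = {
    "locale": "%c",
    "default": "%d/%m/%Y %H:%M:%S",
    "timeonly": "%H:%M:%S",
    "compressed": "%d%m%Y%H%M%S",
    "long": "%A %B %d, %Y, [%I:%M:%S %p]",
    "micro": "%H:%M:%S:%f",
}

# A fixed sample timestamp so the rendering is repeatable.
stamp = datetime(2022, 10, 5, 14, 3, 9, 120000)

rendered = {name: stamp.strftime(fmt) for name, fmt in DT_FORMATS.items()}
for name, text in rendered.items():
    print(f"{name:>10}: {text}")
```

The `locale` and `long` patterns depend on the active locale, so their output varies between systems.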
🖥️ Requirements
click >= 8.0.2
datetime >= 4.7
fuzzywuzzy >= 0.18.0
halo >= 0.0.31
nltk >= 3.7
pytest >= 7.1.3
python-Levenshtein >= 0.12.2
termtables >= 0.2.4
joblib >= 1.2.0
🖥️ CLI
Use the command-line options below to override keycollator's default parameters and behavior.
Usage: keycollator.py [OPTIONS] COMMAND [ARGS]...
==================================================================
keycollator is an app that finds occurances of keys in a text file
==================================================================
Options:
-t, --text-file PATH Path/file name of the text to be searched
for against items in the key file
-k, --key-file PATH Path/file name of the key file containing a
dictionary, key items, glossary, or
reference list used to search the text file
-r, --result-file PATH Path/file name of the output file that
will contain the results (CSV or TXT)
--limit-result TEXT Limit the number of results
--abreviate-result-items INTEGER
Limit the text length of the results
(default=32)
--fuzzy-match-ratio INTEGER RANGE
Set the level of fuzzy matching (default=99)
to validate matches using
approximations/edit distances, uses
acceptance ratios with integer values from 0
to 99, where 99 is nearly identical and 0 is
not similar [0<=x<=99]
--ubound-limit INTEGER RANGE Ignores items from the results with matches
greater than the upper boundary (upper-
limit); reduce eroneous matches
[1<=x<=99999]
--lbound-limit INTEGER RANGE Ignores items from the results with matches
less than the lower boundary (lower-limit);
reduce eroneous matches [0<=x<=99999]
-v, --verbose Turn on verbose
-l, --logging Turn on logging
-L, --log-file PATH Path/file name to be used for the log file
--help Show this message and exit.
🖥️ Turn on verbose output
Only one verbosity level is currently available; future versions will implement multiple levels (DEBUG, INFO, WARN, etc.).
keycollator --verbose
🖥️ Apply fuzzy matching
Fuzzy matching accepts approximate matches based on edit distance. A ratio of 0 is the least strict and accepts nearly anything as a match; 99 is the strictest and accepts only nearly identical matches. By default, the app applies fuzzy matching at ratio 99 only when regular matching finds no matches.
keycollator --fuzzy-match-ratio=[0-99]
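The effect of the acceptance ratio can be sketched as follows. keycollator lists fuzzywuzzy in its requirements; this example swaps in the standard library's `difflib.SequenceMatcher` as a stand-in so it runs without extra packages, and its scores will not necessarily equal fuzzywuzzy's. `similarity` and `is_match` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def is_match(key, item, min_ratio=99):
    """Accept exact matches outright; otherwise require score >= min_ratio."""
    return key.lower() == item.lower() or similarity(key, item) >= min_ratio


score = similarity("analysis", "analyses")
print(score)
print(is_match("analysis", "analyses", min_ratio=99))  # strict threshold
print(is_match("analysis", "analyses", min_ratio=80))  # looser threshold
```

Lowering the threshold admits more near-misses, which is why the upper and lower bound options exist to trim erroneous matches afterwards.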
🖥️ Set the key file
Each line of the key file is a key that is matched against the items in the text file.
keycollator --key-file="/path/to/key/file/keys.txt"
🖥️ Set the text file
Each line of the text file is an item that is compared with the keys in the key file.
keycollator --text-file="/path/to/key/file/text.txt"
🖥️ Specify the output file
Results are currently written as CSV; additional file formats (PDF/JSON/DOCX) are planned for future releases.
keycollator --result-file="/path/to/results/result.csv"
🖥️ Limit results for console and output file
Limit the number of results
keycollator --limit-result=30
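Limiting results amounts to ranking keys by match count and keeping the top entries. The sketch below is an illustration under assumptions, not keycollator's implementation: `top_results` is a hypothetical name, the tie-break by key name is an assumption, and `max_len=32` mirrors the default shown for `--abreviate-result-items` in the help text.

```python
def top_results(counts, limit=30, max_len=32):
    """Return the `limit` highest-count (key, count) pairs, keys cut to max_len."""
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(key[:max_len], count) for key, count in ranked[:limit]]


# Illustrative counts; the first three come from the example output table.
sample_counts = {
    "manage": 73,
    "develop": 62,
    "report": 58,
    "a very long key phrase that will be shortened": 5,
}

for key, count in top_results(sample_counts, limit=4, max_len=16):
    print(f"{key:<16} {count}")
```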
🖥️ Set upper bound limit
Rejects items whose match count exceeds the given integer value; this helps reduce erroneous matches when using fuzzy matching.
keycollator --ubound-limit=[1-99999]
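The upper and lower bounds described in the help text for `--ubound-limit` and `--lbound-limit` act as a filter on match counts. A minimal sketch, assuming the counts are held in a dictionary and that both bounds are inclusive; `within_bounds` is a hypothetical helper, not part of keycollator.

```python
def within_bounds(counts, lbound=0, ubound=99999):
    """Keep only keys whose match count lies in [lbound, ubound]."""
    return {key: n for key, n in counts.items() if lbound <= n <= ubound}


# Illustrative counts: one suspiciously frequent key and one rare key.
sample = {"the": 5000, "manage": 73, "sales": 10, "typo": 1}

filtered = within_bounds(sample, lbound=2, ubound=100)
print(filtered)
```

A very high count often signals a key that is too generic, while a count of one or two under loose fuzzy matching is often a false positive.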
🖥️ Turn on logging
Turns on logging; if no log file is supplied, one is created with the default name log.log.
keycollator --logging
🖥️ Create a log file
Sets the name of the log file used for logging.
keycollator --log-file="/path/to/log/file/log.log"
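The fallback behavior described for logging (use the supplied log file, otherwise default to log.log) can be sketched with the standard library's `logging` module. This is an assumption-laden illustration and does not use keycollator's `CustomLogger`; `make_logger`, `DEFAULT_LOG_FILE`, and the logger name are hypothetical.

```python
import logging

DEFAULT_LOG_FILE = "log.log"  # default file name taken from the README text


def make_logger(log_file=None, name="keycollator-sketch"):
    """Return a logger that appends to log_file, or to log.log if none is given."""
    path = log_file or DEFAULT_LOG_FILE
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger()  # no path supplied, so log.log is used
    log.info("Extracted keys.txt items.")
```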
Example Output
python3 src/keycollator.py --set-logging --limit-results=30
✔ Extracted text.txt items.[[0.16]seconds]
✔ Extracted keys.txt items.[[0.25]seconds]
✔ Matched keys.txt items to text.txt items.[[76.45]seconds]
✔ results.csv Complete.[[76.52]seconds]
╭─────┬───────────────┬───────╮
│ No. │ Key │ Count │
├─────┼───────────────┼───────┤
│ 1 │ manage │ 73 │
├─────┼───────────────┼───────┤
│ 2 │ develop │ 62 │
├─────┼───────────────┼───────┤
│ 3 │ report │ 58 │
├─────┼───────────────┼───────┤
│ 4 │ support │ 46 │
├─────┼───────────────┼───────┤
│ 5 │ process │ 43 │
├─────┼───────────────┼───────┤
│ 6 │ analysis │ 36 │
├─────┼───────────────┼───────┤
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
├─────┼───────────────┼───────┤
│ 28 │ dashboards │ 11 │
├─────┼───────────────┼───────┤
│ 29 │ sales │ 10 │
├─────┼───────────────┼───────┤
│ 30 │ create │ 10 │
╰─────┴───────────────┴───────╯
╭─────────────┬────────╮
│ Statistic │ Total │
├─────────────┼────────┤
│ Keys │ 701 │
├─────────────┼────────┤
│ Text │ 695 │
├─────────────┼────────┤
│ Matches │ 1207 │
├─────────────┼────────┤
│ Comparisons │ 376855 │
├─────────────┼────────┤
│ Logs │ 0 │
├─────────────┼────────┤
│ Runtime │ 76.60 │
╰─────────────┴────────╯
🎯 Todo 📌
❌ Fix pylint errors
❌ Refactor code and remove redundancies
❌ Add proper error handling
❌ Add CHANGELOG.md
❌ Add a method to KeyKrawler to select and _create missing files_
❌ Update CODE_OF_CONDUCT.md
❌ Update CONTRIBUTING.md
❌ GitHub: issue and PR templates
❌ Workflow Automation
❌ Makefile Usage
❌ Dockerfile
❌ @dependabot configuration
❌ Release Drafter (release-drafter.yml)
👔 Project Resource Acknowledgements
💼 Deployment Features
Feature | Notes |
---|---|
GitHub | issue and PR templates |
Workflows | Automate your workflow from idea to production |
Makefile-usage | Makefile Usage |
Dockerfile | Docker Library: Python |
@dependabot | Configuring Dependabot version updates |
Release Drafter | release-drafter.yml |
📈 Releases
Release | Version | Status |
---|---|---|
Current: | 0.0.5 | Working |
📦 PyPI Versions
Version | Notes |
---|---|
0.0.1 | Initial prototype |
0.0.2 | Bug fixes |
0.0.4 | Fixed functions/methods |
0.0.5 | Fixed functions/methods |
🛡 License
This project is licensed under the terms of the MIT license. See LICENSE for more details.
📄 Citation
@misc{keycollator,
author = {David Rush},
title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
year = {2022},
publisher = {Rush Solutions, LLC},
journal = {GitHub repository},
howpublished = {\url{https://github.com/davidprush/keycollator}}
}
Additional Information
- The latest version of this document can be found here; if you are viewing it there (via HTTPS), you can download the Markdown/reStructuredText source here.
- You can contact the author via e-mail.