Compares text in a file to reference/glossary/key-items/dictionary file.
┬┌─┌─┐┬ ┬┌─┐┌─┐┬ ┬ ┌─┐┌┬┐┌─┐┬─┐
├┴┐├┤ └┬┘│ │ ││ │ ├─┤ │ │ │├┬┘
┴ ┴└─┘ ┴ └─┘└─┘┴─┘┴─┘┴ ┴ ┴ └─┘┴└─
Compares text in a file to reference/glossary/key-items/dictionary.[1][2]
🧱 Built by David Rush fueled by ☕️ ℹ️ info
👇 Table of Contents
- Structure
- Features
- Installation
- Documentation
- Supported File Formats
- Usage
- Example Output
- Todo
- Project Resource Acknowledgements
- Deployment Features
- Releases
- License
- Citation
- Additional Information
🗂️ Structure
.
│
├── assets
│   └── images
│       └── coverage.svg
│
├── docs
│   ├── cli.md
│   └── index.md
│
├── src
│   ├── __init__.py
│   ├── cli.py
│   ├── keycollator.py
│   ├── test_keycollator.py
│   ├── extractonator.py
│   ├── requirements.txt
│   └── data
│       ├── (placeholder)
│       └── (placeholder)
│
├── tests
│   └── test_keycollator
│       ├── __init__.py
│       └── test_keycollator.py
│
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── make-venv.sh
├── Makefile
├── pyproject.toml
├── README.README
├── README.rst
├── setup.cfg
└── setup.py
🚀 Features
- Extract text from file to dictionary
- Extract keys from file to dictionary
- Find matches of keys in text file
- Apply fuzzy matching
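The features above can be illustrated with a minimal sketch. It is not keycollator's actual implementation: the function name `count_key_matches` and the case-insensitive substring rule are assumptions made for illustration only.

```python
from collections import Counter


def count_key_matches(keys, text_lines):
    """Count how many text lines contain each key.

    Hypothetical helper: uses a case-insensitive substring test,
    which may differ from how keycollator matches internally.
    """
    counts = Counter()
    for key in keys:
        needle = key.lower()
        for line in text_lines:
            if needle in line.lower():
                counts[key] += 1
    return counts


keys = ["manage", "report"]
text_lines = ["Manage the weekly report", "Report on sales", "Develop dashboards"]
matches = count_key_matches(keys, text_lines)

# Print keys from most to least frequent.
for key, count in matches.most_common():
    print(key, count)
```

Every key is compared with every text line, so the work grows with the product of the two list sizes.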
🧰 Installation
🖥️ Install from PyPI using pip3
📦 https://pypi.org/project/keycollator/
pip3 install keycollator
📄 Documentation
Official documentation can be found here:
https://github.com/davidprush/keycollator/tree/main/docs
💪 Supported File Formats
- TXT/CSV files (Mac/Linux/Win)
- Plans to add PDF and JSON
📐 Usage
🖥️ Import keycollator into Python Projects
from keycollator.customlogger import CustomLogger as cl
from keycollator.proceduretimer import ProcedureTimer as pt
clobj = cl([message=str], [filemode='a'|'w'|'r'], [level='info'|'success'|'warning'|'error'],
[filename=str], [dtformat='locale'|'standar'|'timeonly'|'compressed'|'long'|'micro'])
** dtformat values correspond to these strftime formats:
locale='%c', default='%d/%m/%Y %H:%M:%S',
timeonly='%H:%M:%S', compressed='%d%m%Y%H%M%S',
long='%A %B %d, %Y, [%I:%M:%S %p]', micro='%H:%M:%S:%f'
ptobj = pt([str])
* where str is the message you want saved with the timer
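To see what the listed format strings produce, the sketch below applies them with the standard library's `datetime.strftime`. It does not use `CustomLogger`; the names `DT_FORMATS` and `format_timestamp` are hypothetical, and only the format strings themselves come from the list above.

```python
from datetime import datetime

# Format strings copied from the list above; the dict itself is illustrative.
DT_FORMATS = {
    "locale": "%c",
    "default": "%d/%m/%Y %H:%M:%S",
    "timeonly": "%H:%M:%S",
    "compressed": "%d%m%Y%H%M%S",
    "long": "%A %B %d, %Y, [%I:%M:%S %p]",
    "micro": "%H:%M:%S:%f",
}


def format_timestamp(moment, dtformat="default"):
    """Render a datetime with one of the named formats (hypothetical helper)."""
    return moment.strftime(DT_FORMATS[dtformat])


moment = datetime(2022, 10, 5, 14, 30, 15, 123456)
for name in DT_FORMATS:
    print(f"{name:<10} {format_timestamp(moment, name)}")
```

The `locale` and `long` results depend on the active locale, so their exact text can vary between systems.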
🖥️ Requirements
click >= 8.0.2
datetime >= 4.7
fuzzywuzzy >= 0.18.0
halo >= 0.0.31
nltk >= 3.7
pytest >= 7.1.3
python-Levenshtein >= 0.12.2
termtables >= 0.2.4
joblib >= 1.2.0
🖥️ CLI
keycollator uses command-line options to change its default parameters and behavior.
Usage: keycollator.py [OPTIONS] COMMAND [ARGS]...
==================================================================
keycollator is an app that finds occurances of keys in a text file
==================================================================
Options:
  -t, --text-file PATH            Path/file name of the text to be searched
                                  for against items in the key file
  -k, --key-file PATH             Path/file name of the key file containing a
                                  dictionary, key items, glossary, or
                                  reference list used to search the text file
  -r, --result-file PATH          Path/file name of the output file that
                                  will contain the results (CSV or TXT)
  --limit-result TEXT             Limit the number of results
  --abreviate-result-items INTEGER
                                  Limit the text length of the results
                                  (default=32)
  --fuzzy-match-ratio INTEGER RANGE
                                  Set the level of fuzzy matching (default=99)
                                  to validate matches using
                                  approximations/edit distances, uses
                                  acceptance ratios with integer values from 0
                                  to 99, where 99 is nearly identical and 0 is
                                  not similar  [0<=x<=99]
  --ubound-limit INTEGER RANGE    Ignores items from the results with matches
                                  greater than the upper boundary (upper-
                                  limit); reduce eroneous matches
                                  [1<=x<=99999]
  --lbound-limit INTEGER RANGE    Ignores items from the results with matches
                                  less than the lower boundary (lower-limit);
                                  reduce eroneous matches  [0<=x<=99999]
  -v, --verbose                   Turn on verbose
  -l, --logging                   Turn on logging
  -L, --log-file PATH             Path/file name to be used for the log file
  --help                          Show this message and exit.
🖥️ Turn on verbose output
Currently provides only one verbosity level; future versions will implement multiple levels (DEBUG, INFO, WARN, etc.).
keycollator --verbose
🖥️ Apply fuzzy matching
Fuzzy matching accepts approximate matches based on edit distance. A ratio of 0 is the least strict and accepts nearly anything as a match; 99 is the strictest and accepts only nearly identical matches. By default, the app applies fuzzy matching at level 99 only when regular matching finds no matches.
keycollator --fuzzy-match-ratio=[0-99]
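The fallback behaviour described above can be sketched as follows. This is an illustration, not keycollator's code: it swaps in the standard library's `difflib.SequenceMatcher` for the fuzzywuzzy scorer so that it runs without extra dependencies, and the names `similarity` and `is_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (difflib stand-in for a fuzzy ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def is_match(key, item, fuzzy_ratio=99):
    """Try a regular substring match first, then fall back to the fuzzy score."""
    if key.lower() in item.lower():
        return True
    return similarity(key, item) >= fuzzy_ratio


# A regular match succeeds regardless of the ratio.
print(is_match("manage", "Managed accounts"))

# A near miss is accepted only when the ratio is relaxed.
print(is_match("analysis", "analyses", fuzzy_ratio=99))
print(is_match("analysis", "analyses", fuzzy_ratio=80))
```

Lowering the ratio admits more approximate matches, which is why the upper and lower bound options exist to trim erroneous results.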
🖥️ Set the key file
Each line of the key file is a key that is matched against the items in the text file.
keycollator --key-file="/path/to/key/file/keys.txt"
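A one-key-per-line file can be read as in the sketch below. The helper name `load_keys` is hypothetical, and skipping blank lines and trimming whitespace are assumptions rather than documented keycollator behaviour.

```python
import tempfile
from pathlib import Path


def load_keys(path):
    """Read one key per line, trimming whitespace and skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demonstrate with a temporary key file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "keys.txt"
    key_file.write_text("manage\n\n  develop  \nreport\n", encoding="utf-8")
    keys = load_keys(key_file)

print(keys)
```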
🖥️ Set the text file
Each line of the text file is an item that is compared with the keys in the key file.
keycollator --text-file="/path/to/key/file/text.txt"
🖥️ Specify the output file
Results are currently written as CSV; additional file formats (PDF/JSON/DOCX) are planned for future releases.
keycollator --result-file="/path/to/results/result.csv"
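As a rough illustration of CSV output, the sketch below writes key counts with the standard `csv` module. The column names mirror the console table in the example output, but the actual layout of keycollator's result file is an assumption, and `write_results` is a hypothetical helper.

```python
import csv
import io


def write_results(rows, stream):
    """Write (key, count) pairs as numbered CSV rows (assumed column layout)."""
    writer = csv.writer(stream)
    writer.writerow(["No.", "Key", "Count"])
    for number, (key, count) in enumerate(rows, start=1):
        writer.writerow([number, key, count])


# An in-memory buffer stands in for the result file.
buffer = io.StringIO()
write_results([("manage", 73), ("develop", 62)], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.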
🖥️ Limit results for console and output file
Limits the number of results shown in the console and written to the output file.
keycollator --limit-result=30
🖥️ Set upper bound limit
Rejects items whose match count exceeds the given integer; this helps reduce erroneous matches when using fuzzy matching.
keycollator --ubound-limit=[1-99999]
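The upper bound, lower bound, and result limit can be pictured as a filter over the match counts. The sketch below is an assumption about how the options combine (bounds first, then sort, then truncate); `filter_results` is a hypothetical name.

```python
def filter_results(counts, lbound=0, ubound=99999, limit=None):
    """Drop counts outside [lbound, ubound], sort descending, then truncate."""
    kept = [(key, n) for key, n in counts.items() if lbound <= n <= ubound]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if limit is None else kept[:limit]


counts = {"manage": 73, "the": 5000, "sales": 10, "rare": 1}

# "the" exceeds the upper bound and "rare" falls below the lower bound.
print(filter_results(counts, lbound=2, ubound=100))
print(filter_results(counts, lbound=2, ubound=100, limit=1))
```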
🖥️ Turn on logging
Turns on logging; if no log file is supplied, one is created with the default name log.log.
keycollator --logging
🖥️ Create a log file
Sets the name of the log file used by logging.
keycollator --log-file="/path/to/log/file/log.log"
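The default-file behaviour can be approximated with the standard `logging` module. This sketch does not use keycollator's `CustomLogger`; the function `make_logger`, the logger name, and the message format are assumptions, and only the default file name log.log comes from the text above.

```python
import logging
import tempfile
from pathlib import Path


def make_logger(log_file="log.log", name="keycollator-demo"):
    """Return a logger that appends to log_file (log.log when none is given)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(log_file, mode="a", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s",
                              datefmt="%d/%m/%Y %H:%M:%S")
        )
        logger.addHandler(handler)
    return logger


# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.log"
    logger = make_logger(log_path)
    logger.info("Extracted keys.txt items.")
    for handler in list(logger.handlers):
        handler.close()  # flush and release the file before cleanup
        logger.removeHandler(handler)
    logged = log_path.read_text(encoding="utf-8")

print(logged)
```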
Example Output
python3 src/keycollator.py --set-logging --limit-results=30
✔ Extracted text.txt items.[[0.16]seconds]
✔ Extracted keys.txt items.[[0.25]seconds]
✔ Matched keys.txt items to text.txt items.[[76.45]seconds]
✔ results.csv Complete.[[76.52]seconds]
╭─────┬───────────────┬───────╮
│ No. │ Key │ Count │
├─────┼───────────────┼───────┤
│ 1 │ manage │ 73 │
├─────┼───────────────┼───────┤
│ 2 │ develop │ 62 │
├─────┼───────────────┼───────┤
│ 3 │ report │ 58 │
├─────┼───────────────┼───────┤
│ 4 │ support │ 46 │
├─────┼───────────────┼───────┤
│ 5 │ process │ 43 │
├─────┼───────────────┼───────┤
│ 6 │ analysis │ 36 │
├─────┼───────────────┼───────┤
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
├─────┼───────────────┼───────┤
│ 28 │ dashboards │ 11 │
├─────┼───────────────┼───────┤
│ 29 │ sales │ 10 │
├─────┼───────────────┼───────┤
│ 30 │ create │ 10 │
╰─────┴───────────────┴───────╯
╭─────────────┬────────╮
│ Statistic │ Total │
├─────────────┼────────┤
│ Keys │ 701 │
├─────────────┼────────┤
│ Text │ 695 │
├─────────────┼────────┤
│ Matches │ 1207 │
├─────────────┼────────┤
│ Comparisons │ 376855 │
├─────────────┼────────┤
│ Logs │ 0 │
├─────────────┼────────┤
│ Runtime │ 76.60 │
╰─────────────┴────────╯
🎯 Todo 📌
❌ Fix pylint errors
❌ Refactor code and remove redundancies
❌ Add proper error handling
❌ Add CHANGELOG.md
❌ Add a method to KeyKrawler to select and _create missing files_
❌ Update CODE_OF_CONDUCT.md
❌ Update CONTRIBUTING.md
❌ Github: issue and pr templates
❌ Workflow Automation
❌ Makefile Usage
❌ Dockerfile
❌ @dependabot configuration
❌ Release Drafter (release-drafter.yml)
👔 Project Resource Acknowledgements
💼 Deployment Features
Feature | Notes |
---|---|
Github | issue and pr templates |
Workflows | Automate your workflow from idea to production |
Makefile-usage | Makefile Usage |
Dockerfile | Docker Library: Python |
@dependabot | Configuring Dependabot version updates |
Release Drafter | release-drafter.yml |
📈 Releases
Release | Version | Status |
---|---|---|
Current: | 0.0.5 | Working |
📦 PyPI Versions
Version | Notes |
---|---|
0.0.1 | Initial prototype |
0.0.2 | Bug fixes |
0.0.4 | Fixed functions/methods |
0.0.5 | Fixed functions/methods |
🛡 License
This project is licensed under the terms of the MIT license. See LICENSE for more details.
📄 Citation
@misc{keycollator,
author = {David Rush},
title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
year = {2022},
publisher = {Rush Solutions, LLC},
journal = {GitHub repository},
howpublished = {\url{https://github.com/davidprush/keycollator}}
}
Additional Information
- The latest version of this document can be found here; if you are viewing it there (via HTTPS), you can download the Markdown/reStructuredText source here.
- You can contact the author via e-mail.