Compares text in a file to reference/glossary/key-items/dictionary file.

┬┌─┌─┐┬ ┬┌─┐┌─┐┬    ┌─┐┌┬┐┌─┐┬─┐
├┴┐├┤ └┬┘│   ││    ├─┤   │├┬┘
┴ ┴└─┘  └─┘└─┘┴─┘┴─┘┴   └─┘┴└─

Compares text in a file to reference/glossary/key-items/dictionary.[1][2]

🧱 Built by David Rush fueled by ☕️ ℹ️ info

keycollator #.#.# Pypi Project Description


👇 Table of Contents

  1. Structure
  2. Features
  3. Installation
    1. Install from Pypi using pip3
  4. Documentation
  5. Supported File Formats
  6. Usage
    1. Import keycollator into Python Projects
    2. Requirements
    3. CLI
    4. Turn on verbose output
    5. Apply fuzzy matching
    6. Set the key file
    7. Set the text file
    8. Specify the output file
    9. Set limit results for console and output file
    10. Set upper bound limit
    11. Turn on logging:
    12. Create a log file
  7. Example Output
  8. Todo
  9. Project Resource Acknowledgements
  10. Deployment Features
  11. Releases
    1. Pypi Versions
  12. License
  13. Citation
  14. Additional Information

🗂️ Structure

.
│
├── assets
│   └── images
│       └── coverage.svg
│
├── docs
│   ├── cli.md
│   └── index.md
│
├── src
│   ├── __init__.py
│   ├── cli.py
│   ├── keycollator.py
│   ├── test_keycollator.py
│   ├── extractonator.py
│   ├── requirements.txt
│   └── data
│       ├── (placeholder)
│       └── (placeholder)
│
├── tests
│   └── test_keycollator
│       ├── __init__.py
│       └── test_keycollator.py
│
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── make-venv.sh
├── Makefile
├── pyproject.toml
├── README.README
├── README.rst
├── setup.cfg
└── setup.py

🚀 Features

  • Extract text from file to dictionary
  • Extract keys from file to dictionary
  • Find matches of keys in text file
  • Apply fuzzy matching
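
The features above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not keycollator's own implementation: the helper names `load_lines` and `count_key_matches` are hypothetical, and matching here is a simple case-insensitive substring test.

```python
from collections import Counter
from pathlib import Path


def load_lines(path):
    """Read a file and return its non-empty lines, stripped (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_key_matches(keys, text_items):
    """Count how many text items contain each key (case-insensitive substring)."""
    counts = Counter()
    for key in keys:
        needle = key.lower()
        hits = sum(1 for item in text_items if needle in item.lower())
        if hits:
            counts[key] = hits
    return counts


# In-memory demo; in practice both lists would come from load_lines(...).
demo_keys = ["manage", "report", "audit"]
demo_text = ["Manage the team", "Weekly report", "manage budgets"]
for key, count in count_key_matches(demo_keys, demo_text).most_common():
    print(key, count)
```

Keys that never match are left out of the result, which keeps the output table short.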

🧰 Installation

🖥️ Install from Pypi using pip3

📦 https://pypi.org/project/keycollator/

pip3 install keycollator

📄 Documentation

Official documentation can be found here:

https://github.com/davidprush/keycollator/tree/main/docs

💪 Supported File Formats

  • TXT/CSV files (Mac/Linux/Win)
  • Plans to add PDF and JSON

📐 Usage

🖥️ Import keycollator into Python Projects

from keycollator.customlogger import CustomLogger as cl
from keycollator.proceduretimer import ProcedureTimer as pt

# Signature notation: [brackets] mark optional arguments, '|' separates accepted values.
clobj = cl([message=str], [filemode='a'|'w'|'r'], [level='info'|'success'|'warning'|'error'],
        [filename=str], [dtformat='locale'|'standar'|'timeonly'|'compressed'|'long'|'micro'])
# dtformat patterns:
#   locale='%c', default='%d/%m/%Y %H:%M:%S',
#   timeonly='%H:%M:%S', compressed='%d%m%Y%H%M%S',
#   long='%A %B %d, %Y, [%I:%M:%S %p]', micro='%H:%M:%S:%f'

# str is whatever message you want saved for the timer.
ptobj = pt([str])
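
To see what the dtformat patterns listed above produce, they can be fed to the standard library's `strftime`. This sketch does not call keycollator at all; the `DT_FORMATS` dictionary and the `render_timestamp` helper are hypothetical names that simply reuse the pattern strings from the list.

```python
from datetime import datetime

# Pattern strings copied from the dtformat list; the dict itself is illustrative.
DT_FORMATS = {
    "locale": "%c",
    "default": "%d/%m/%Y %H:%M:%S",
    "timeonly": "%H:%M:%S",
    "compressed": "%d%m%Y%H%M%S",
    "long": "%A %B %d, %Y, [%I:%M:%S %p]",
    "micro": "%H:%M:%S:%f",
}


def render_timestamp(moment, name="default"):
    """Format a datetime with one of the named patterns."""
    return moment.strftime(DT_FORMATS[name])


sample = datetime(2022, 10, 5, 14, 30, 15)
for name in DT_FORMATS:
    print(f"{name:>10}: {render_timestamp(sample, name)}")
```

The `locale` and `long` patterns depend on the system locale, so their output varies between machines.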

🖥️ Requirements

click >= 8.0.2
datetime >= 4.7
fuzzywuzzy >= 0.18.0
halo >= 0.0.31
nltk >= 3.7
pytest >= 7.1.3
python-Levenshtein >= 0.12.2
termtables >= 0.2.4
joblib >= 1.2.0

🖥️ CLI

keycollator is configured from the command line; the options below override its default parameters and behavior.

Usage: keycollator.py [OPTIONS] COMMAND [ARGS]...

  ==================================================================

  keycollator is an app that finds occurances of keys in a text file

  ==================================================================



Options:
  -t, --text-file PATH            Path/file name of the text to be searched
                                  for against items in the key file
  -k, --key-file PATH             Path/file name of the key file containing a
                                  dictionary, key items, glossary, or
                                  reference list used to search the text file
  -r, --result-file PATH          Path/file name of the output file that
                                  will contain the results (CSV or TXT)
  --limit-result TEXT             Limit the number of results
  --abreviate-result-items INTEGER
                                  Limit the text length of the results
                                  (default=32)
  --fuzzy-match-ratio INTEGER RANGE
                                  Set the level of fuzzy matching (default=99)
                                  to validate matches using
                                  approximations/edit distances, uses
                                  acceptance ratios with integer values from 0
                                  to 99, where 99 is nearly identical and 0 is
                                  not similar  [0<=x<=99]
  --ubound-limit INTEGER RANGE    Ignores items from the results with matches
                                  greater than the upper boundary (upper-
                                  limit); reduce eroneous matches
                                  [1<=x<=99999]
  --lbound-limit INTEGER RANGE    Ignores items from the results with matches
                                  less than the lower boundary (lower-limit);
                                  reduce eroneous matches  [0<=x<=99999]
  -v, --verbose                   Turn on verbose
  -l, --logging                   Turn on logging
  -L, --log-file PATH             Path/file name to be used for the log file
  --help                          Show this message and exit.

🖥️ Turn on verbose output

Currently provides only one verbosity level; future versions will implement multiple levels (DEBUG, INFO, WARN, etc.).

keycollator --verbose

🖥️ Apply fuzzy matching

Fuzzy matching accepts approximate matches based on edit distance. The ratio ranges from 0 to 99: 0 is the least strict and accepts nearly anything as a match, while 99 accepts only nearly identical matches. By default the app applies a ratio of 99, and only when regular matching finds no matches.

keycollator --fuzzy-match-ratio=[0-99]
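
The fallback behavior described above can be sketched as follows. To stay dependency-free this example swaps in the standard library's `difflib.SequenceMatcher` rather than the fuzzywuzzy package listed in the requirements, so the scores will not be identical to keycollator's; `similarity_ratio` and `is_match` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity_ratio(a, b):
    """Return an integer similarity score from 0 to 100 for two strings."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def is_match(key, item, fuzzy_ratio=99):
    """Try a regular substring match first; fall back to the fuzzy ratio."""
    if key.lower() in item.lower():
        return True
    return similarity_ratio(key, item) >= fuzzy_ratio


print(is_match("manage", "Manage the team"))          # regular match
print(is_match("develop", "develpo"))                 # strict default ratio
print(is_match("develop", "develpo", fuzzy_ratio=80))  # looser ratio
```

Lowering the ratio lets transposed or misspelled items through, at the cost of more false positives.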

🖥️ Set the key file

Each line of the key file is a key that will be matched against the items in the text file.

keycollator --key-file="/path/to/key/file/keys.txt"

🖥️ Set the text file

Each line of the text file is an item that will be compared with the keys in the key file.

keycollator --text-file="/path/to/key/file/text.txt"

🖥️ Specify the output file

Results are currently written as CSV; additional file formats (PDF/JSON/DOCX) are planned for future releases.

keycollator --result-file="/path/to/results/result.csv"

🖥️ Set limit results for console and output file

Limit the number of results shown in the console and written to the output file.

keycollator --limit-result=30
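
One way to picture the limit is as a "top N by count" cut. The sketch below assumes results are ranked by descending match count, as in the example output; `top_results` is a hypothetical helper, not part of keycollator.

```python
from collections import Counter


def top_results(counts, limit=None):
    """Return (key, count) pairs sorted by descending count, cut to `limit`."""
    return Counter(counts).most_common(limit)


counts = {"report": 58, "manage": 73, "develop": 62, "support": 46}
for rank, (key, count) in enumerate(top_results(counts, limit=3), start=1):
    print(rank, key, count)
```

Passing `limit=None` returns every result, still sorted by count.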

🖥️ Set upper bound limit

Rejects items whose match count exceeds the given integer; this helps filter out erroneous matches when using fuzzy matching.

keycollator --ubound-limit=[1-99999]
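
The upper bound, together with the `--lbound-limit` option from the help text, amounts to keeping only counts inside a range. A minimal sketch, assuming both bounds are inclusive as the help text suggests; `apply_bounds` is a hypothetical name.

```python
def apply_bounds(counts, lbound=0, ubound=99999):
    """Keep only keys whose match count lies within [lbound, ubound]."""
    return {key: n for key, n in counts.items() if lbound <= n <= ubound}


counts = {"the": 4210, "manage": 73, "develop": 62, "xylophone": 1}
# Drop very common and very rare keys in one pass.
print(apply_bounds(counts, lbound=2, ubound=1000))
```

An overly common key such as "the" usually signals a loose fuzzy ratio rather than a meaningful match, which is why an upper bound is useful.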

🖥️ Turn on logging:

Turns on logging. If no log file is supplied, one is created with the default name log.log.

keycollator --logging

🖥️ Create a log file

Set the name of the log file used by logging.

keycollator --log-file="/path/to/log/file/log.log"

Example Output

python3 src/keycollator.py --logging --limit-result=30
Extracted text.txt items.[[0.16]seconds]
Extracted keys.txt items.[[0.25]seconds]
Matched keys.txt items to text.txt items.[[76.45]seconds]
results.csv Complete.[[76.52]seconds]
╭─────┬───────────────┬───────╮
│ No.  Key            Count │
├─────┼───────────────┼───────┤
│  1   manage          73   │
├─────┼───────────────┼───────┤
│  2   develop         62   │
├─────┼───────────────┼───────┤
│  3   report          58   │
├─────┼───────────────┼───────┤
│  4   support         46   │
├─────┼───────────────┼───────┤
│  5   process         43   │
├─────┼───────────────┼───────┤
│  6   analysis        36   │
├─────┼───────────────┼───────┤
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
├─────┼───────────────┼───────┤
│ 28   dashboards      11   │
├─────┼───────────────┼───────┤
│ 29   sales           10   │
├─────┼───────────────┼───────┤
│ 30   create          10   │
╰─────┴───────────────┴───────╯
╭─────────────┬────────╮
│ Statistic    Total  │
├─────────────┼────────┤
│ Keys          701   │
├─────────────┼────────┤
│ Text          695   │
├─────────────┼────────┤
│ Matches       1207  │
├─────────────┼────────┤
│ Comparisons  376855 │
├─────────────┼────────┤
│ Logs           0    │
├─────────────┼────────┤
│ Runtime      76.60  │
╰─────────────┴────────╯

🎯 Todo 📌

     Fix pylint errors
     Refactor code and remove redundancies
     Add proper error handling
     Add CHANGELOG.md
     Add a method to KeyKrawler to select and create missing files
     Update CODE_OF_CONDUCT.md
     Update CONTRIBUTING.md
     Github: issue and pr templates
     Workflow Automation
     Makefile Usage
     Dockerfile
     @dependabot configuration
     Release Drafter (release-drafter.yml)

👔 Project Resource Acknowledgements

  1. Creating a Python Package
  2. javiertejero

💼 Deployment Features

Feature          Notes
Github           issue and pr templates
Workflows        Automate your workflow from idea to production
Makefile-usage   Makefile Usage
Dockerfile       Docker Library: Python
@dependabot      Configuring Dependabot version updates
Release Drafter  release-drafter.yml

📈 Releases

Release   Version  Status
Current:  0.0.5    Working

📦 Pypi Versions

Version Notes
0.0.1 Initial prototype
0.0.2 Bug fixes
0.0.4 Fixed functions/methods
0.0.5 Fixed functions/methods

🛡 License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

📄 Citation

@misc{keycollator,
  author = {David Rush},
  title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
  year = {2022},
  publisher = {Rush Solutions, LLC},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/davidprush/keycollator}}
}

Additional Information

  1. The latest version of this document can be found here; if you are viewing it there (via HTTPS), you can download the Markdown/reStructuredText source here.
  2. You can contact the author via e-mail.
