Skip to main content

Compares text in a file to reference/glossary/key-items/dictionary file.

Project description

Build status Python Version License

┬┌─┌─┐┬ ┬┌─┐┌─┐┬    ┌─┐┌┬┐┌─┐┬─┐
├┴┐├┤ └┬┘│   ││    ├─┤   │├┬┘
┴ ┴└─┘  └─┘└─┘┴─┘┴─┘┴   └─┘┴└─

Compares text in a file to reference/glossary/key-items/dictionary file.

🧱 Built by 'David Rush https://github.com/davidprush'

The latest version of this document can be found at https://github.com/davidprush/keycollator/blob/main/README.rst; if you are viewing it there (via HTTPS), you can download the Markdown/reStructuredText source at https://github.com/davidprush/keycollator. You can contact the author via e-mail at davidprush@gmail.com.

🗂️ Structure

.
│
├── assets
│   └── images
│       └── coverage.svg
│
├── docs
│   ├── cli.md
│   └── index.md
│
├── src
│   ├── __init__.py
│   ├── cli.py
│   ├── keycollator.py
│   ├── test_keycollator.py
│   ├── extractonator.py
│   ├── requirements.txt
│   └──data
│       ├── (placeholder)       └── (placeholder)
│
├── tests
│   └── test_keycollator
│       ├── __init__.py
│       └── test_keycollator.py
│
├── COD_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── make-venv.sh
├── Makefile
├── pyproject.toml
├── README.README
├── README.rst
├── setup.cfg
└── setup.py

🚀 Features

  • Extract text from file to dictionary
  • Extract keys from file to dictionary
  • Find matches of keys in text file
  • Apply fuzzy matching

🧰 Installation

  • Future plans to install as python package
pip install keycollator

📄 Documentation

Official documentation can be found here: https://github.com/davidprush

💪 Supported File Formats

  • TXT/CSV files (Mac/Linux/Win)
  • Plans to add PDF and JSON

📐 Usage

  • Import it into Python Projects

    from keycollator import extractionator

🖥️ CLI

keycollator uses the CLI to change default parameters and functions

usage: keycollator.py [-h] [-c CSV_FILE] [-d DICTIONARY_FILE] [-f COUNT]
                  [-i IN_FILE] [-l] [-o OUT_FILE] [-v] [--version]

App takes two files, a dictionary file and a and a text file, and counts
how many times each line in the dictionary file appears in the text file.
The app can output the results to the console and/or csv/text file.
Matching will use fuzzy matching to get desired results.

optional arguments:
-h, --help            show this help message and exit
-c CSV_FILE, --csv-file CSV_FILE
                    Change the csv output file name (defautl is results.csv)
-d DICTIONARY_FILE, --dictionary-file DICTIONARY_FILE
                    Change the dictionary file name (default is dictionary.txt)
-f COUNT, --fuzzy COUNT
                    Select a value for fuzzyness 1-99, 1 for decresed accuracy, 99 for increased
                    accuracy, default is set to 95
-i IN_FILE, --in-file IN_FILE
                    Change the input file name (default is text.txt)
-l, --logging         Set flag to True for logging.
-o OUT_FILE, --out-file OUT_FILE
                    Change the output file name (default is results.txt)
-v, --verbose         Verbosity (-v, -vv, etc)
--version             show version number and exit

🖥️ Applying fuzzy matching

keycollator -f=[1-99]

🖥️ Setting the dictionary file (simple text file with items separated by line)

keycollator -d=/path/to/dictionary/directory/

🖥️ Create a log file

keycollator -l=/path/to/log_file/directory/

🖥️ Specify the CSV results file

keycollator -c=/path/to/results/file.csv

🖥️ Turn on verbose output

keycollator -v

🖥️ Turn on logging:

keycollator -l

🎯 Todo:

❌ Currently refactoring all code
✅ Separating project into multiple files
✅ Add progress bars when extracting and comparing
📌Create a logger class (for some reason logging is broken)
❌ Fix matching method in KeyKrawler
❌ Update `README.md(.rst)` with correct CLI
❌ Add method to KeyKrawler to select and create missing files
❌ Update `CODE_OF_CONDUCT.md`
❌ Update `CONTRIBUTING.md`
❌ Format KeyCrawler results as a table

👔 Project Resource Acknowledgements

- https://betterscientificsoftware.github.io/python-for-hpc/tutorials/python-pypi-packaging/#creating-a-python-package

- https://gist.github.com/javiertejero/4585196

💼 Deployment Features

📈 Releases

🛡 License

License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

@misc{keycollator,
  author = {keycollator},
  title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/davidprush/keycollator}}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

keycollator-0.0.1.tar.gz (10.2 kB view details)

Uploaded Source

File details

Details for the file keycollator-0.0.1.tar.gz.

File metadata

  • Download URL: keycollator-0.0.1.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for keycollator-0.0.1.tar.gz
Algorithm Hash digest
SHA256 f12035fb425cf95f8bb818950b19ec91a2e7cfeccf96ee0fb42c5ed216ff7f25
MD5 7f2d9b9a509953a09199e24ccbbe7f72
BLAKE2b-256 62bf6029a33b5c681a8f24371a17c20db3bdf449c3e722921f1ef2d0c6ba9b6d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page