A Python tool that counts selected words across a directory of files in different formats and writes the results to an Excel file.
Project description
Selected Words Counter
Selected Words Counter is a Python tool that scans various file formats within a specified directory for occurrences of specific words from a predefined list, outputting the results in an organized Excel file.
How It Works
- File Conversion: The tool automatically converts supported file formats to .txt for streamlined text searching.
- Word Count Analysis: It then searches each converted file for the words in your word list, creating an Excel report where each column represents a word and each row represents a file, displaying the count of occurrences.
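The counting step can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual code: the function names `count_selected_words` and `build_report` are hypothetical, and whole-word, case-insensitive matching is an assumption about how occurrences are counted.

```python
import re
from collections import Counter
from pathlib import Path


def count_selected_words(text, words):
    """Count whole-word, case-insensitive occurrences of each selected word.

    Assumption: each entry in `words` is a single word, not a phrase.
    """
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return {word: tokens[word.lower()] for word in words}


def build_report(txt_dir, words):
    """Return one row per .txt file: the file name plus one count per word."""
    rows = []
    for path in sorted(Path(txt_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        rows.append({"file": path.name, **count_selected_words(text, words)})
    return rows
```

Each returned row corresponds to one row of the report and each key to one column. If pandas and openpyxl are installed, `pandas.DataFrame(rows).to_excel("report.xlsx", index=False)` would write such rows to an Excel file; whether the tool itself uses pandas is not stated here.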
Getting Started
Prerequisites
- Python 3.x
- Necessary packages (see requirements.txt)
Installation
Clone this repository and install the dependencies:
git clone https://github.com/Provincie-Zuid-Holland/selected_words_counter.git
cd selected_words_counter
pip install -r requirements.txt
pip install -e .
On Windows, either Microsoft Office or LibreOffice must be installed in its default location, which is currently the only location supported: "C:\Program Files\Microsoft Office" or "C:\Program Files\LibreOffice".
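A quick way to check this requirement before running the tool is sketched below. The helper `find_office_install` is hypothetical and is not part of the package; it only tests whether one of the two default directories named above exists.

```python
from pathlib import Path

# Default install locations named in the note above.
DEFAULT_OFFICE_DIRS = [
    Path(r"C:\Program Files\Microsoft Office"),
    Path(r"C:\Program Files\LibreOffice"),
]


def find_office_install(candidates=DEFAULT_OFFICE_DIRS):
    """Return the first candidate directory that exists, or None."""
    for directory in candidates:
        if Path(directory).is_dir():
            return Path(directory)
    return None


if __name__ == "__main__":
    found = find_office_install()
    print(found if found else "No office suite found in the default locations")
```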
Configuration
Customize the config.py file to specify:
- Your target directory
- Supported file formats
- List of words to search for
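As an illustration only, a configuration covering these three settings might look like the sketch below. All variable names, paths, formats, and words are assumptions made for this example; check the config.py shipped with the repository for the real names.

```python
# config.py (hypothetical layout; the real variable names may differ)
from pathlib import Path

# Directory that is scanned for files (example path)
TARGET_DIR = Path("data/input")

# File extensions to convert and search (example values)
SUPPORTED_FORMATS = [".txt", ".pdf", ".docx"]

# Words to count in every file (example values)
WORD_LIST = ["water", "dike", "permit"]

# Where the Excel report is written (example path)
OUTPUT_FILE = Path("output/word_counts.xlsx")
```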
Usage
Run the tool using:
python main.py
The output Excel file will be saved to the specified location, providing a summary of word counts per file.
Unit Tests
To run unit tests on synthetic data, navigate to the ./tests folder and run pytest:
cd tests
pytest
Author
Michael de Winter
Download files
File details
Details for the file selected_words_counter-0.7.tar.gz.
File metadata
- Download URL: selected_words_counter-0.7.tar.gz
- Upload date:
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 936bb0b1c8e232c63878ec3b265a58d63c9423e1210f14b85d8abbba978aa15d |
| MD5 | 374771610b277fe8aeebf12c4d9e82ff |
| BLAKE2b-256 | df20bb60b00dd8b40aa9ad2110d688a70775812c6daa18f7310fbc690a3d984b |
File details
Details for the file selected_words_counter-0.7-py3-none-any.whl.
File metadata
- Download URL: selected_words_counter-0.7-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8867c67965cd015311bcb640a6de4db0b85d0a15c4e5de7c74541107b7a0a4c |
| MD5 | ad2b284a4da8fcc893bc74664e4a9309 |
| BLAKE2b-256 | 513eb3a43566115639b619a1b66f63af44e47b8516d6339c82db2b15626dea42 |