
A Python tool that counts occurrences of selected words across a directory of files in different formats and writes the results to an Excel file.

Project description

Selected Words Counter

Selected Words Counter is a Python tool that scans files of various formats in a specified directory for occurrences of the words in a predefined list and writes the results to an organized Excel file.

How It Works

  1. File Conversion: The tool automatically converts files in supported formats to .txt so that all of them can be searched as plain text.
  2. Word Count Analysis: It then searches each converted file for the words in your word list and creates an Excel report in which each row represents a file, each column represents a word, and each cell holds the number of occurrences.
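The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's own code: the names `convert_directory`, `count_words`, and `CONVERTERS` are hypothetical, HTML stands in for a convertible format only because the Python standard library can parse it, and words are matched case-insensitively as whole tokens, which may differ from the tool's actual matching rules.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(raw: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical converter table: file extension -> function producing plain text.
# Formats such as .pdf or .docx would need third-party libraries and are omitted.
CONVERTERS = {
    ".txt": lambda raw: raw,
    ".html": html_to_text,
}


def convert_directory(src: Path, dst: Path) -> list[Path]:
    """Step 1 (sketch): write a .txt version of every supported file in src to dst."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        convert = CONVERTERS.get(path.suffix.lower())
        if convert is None or not path.is_file():
            continue  # unsupported format or not a regular file: skip it
        text = convert(path.read_text(encoding="utf-8", errors="ignore"))
        # Keep the original extension in the name so a.txt and a.html do not collide.
        out = dst / (path.name + ".txt")
        out.write_text(text, encoding="utf-8")
        written.append(out)
    return written


def count_words(txt_files: list[Path], words: list[str]) -> list[dict[str, object]]:
    """Step 2 (sketch): one row per file, one column per word, each cell a count."""
    rows = []
    for path in txt_files:
        # Assumption: case-insensitive, whole-token matching of single words.
        tokens = re.findall(r"\w+", path.read_text(encoding="utf-8").lower())
        counts = Counter(tokens)
        row: dict[str, object] = {"file": path.name}
        for word in words:
            row[word] = counts[word.lower()]
        rows.append(row)
    return rows
```

Under these assumptions, the returned rows map directly onto the report layout described above. For example, `pandas.DataFrame(rows).to_excel("word_counts.xlsx", index=False)` would write them to an Excel file, provided an Excel writer engine such as openpyxl is installed.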

Getting Started

Prerequisites

  • Python 3.x
  • Necessary packages (see requirements.txt)

Installation

Clone this repository and install the dependencies:

git clone https://github.com/Provincie-Zuid-Holland/selected_words_counter.git
cd selected_words_counter
pip install -r requirements.txt
pip install -e .

Configuration

Customize the config.py file to specify:

  • Your target directory
  • Supported file formats
  • List of words to search for
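A config.py covering these three settings might look like the sketch below. The variable names and example values are hypothetical; check the repository's own config.py for the names the tool actually reads and the formats it actually supports.

```python
# config.py (sketch): hypothetical names and values, not the tool's actual settings.
from pathlib import Path

# Directory whose files will be scanned.
TARGET_DIRECTORY = Path("data/documents")

# File extensions to convert to .txt and search (example values only).
SUPPORTED_FORMATS = [".txt", ".pdf", ".docx"]

# Words to count; each one becomes a column in the Excel report.
WORD_LIST = ["climate", "water", "nitrogen"]
```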

Usage

Run the tool using:

python main.py

The output Excel file will be saved to the specified location, providing a summary of word counts per file.

Unit Tests

To run unit tests on synthetic data, navigate to the ./tests folder and run pytest:

cd tests
pytest

Author

Michael de Winter

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

selected_words_counter-0.5.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

selected_words_counter-0.5-py3-none-any.whl (8.7 kB)

Uploaded Python 3

File details

Details for the file selected_words_counter-0.5.tar.gz.

File metadata

  • Download URL: selected_words_counter-0.5.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for selected_words_counter-0.5.tar.gz
  Algorithm     Hash digest
  SHA256        fea341d9eab163b43ea101ba93759b435ad7dff6325e89d6c3cf5880b45ef9b6
  MD5           0326e26afd86852c60bdc1474823c21b
  BLAKE2b-256   c7ff9d8456d2523ea3eba03e592e695f21a23a824b497246174f7fe19cb8126f


File details

Details for the file selected_words_counter-0.5-py3-none-any.whl.

File hashes

Hashes for selected_words_counter-0.5-py3-none-any.whl
  Algorithm     Hash digest
  SHA256        4b9e967b5f0befeda8f50e5e7870d7087421460d66345bb242dcb0d856a4dc98
  MD5           9f61e997302290197618908f2c9f62fd
  BLAKE2b-256   200ea99267f1fa12dd70573c55de4db469c05094d75d1ff2bdfb3beca65b482b

