
A Python tool that counts selected words in a directory of files of different formats and outputs the results to an Excel file.

Project description

Selected Words Counter

Selected Words Counter is a Python tool that scans various file formats within a specified directory for occurrences of specific words from a predefined list, outputting the results in an organized Excel file.

How It Works

  1. File Conversion: The tool automatically converts supported file formats to .txt for streamlined text searching.
  2. Word Count Analysis: It then searches each converted file for the words in your word list and creates an Excel report with one row per file and one column per word; each cell holds the number of occurrences.
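The counting step above can be illustrated with a minimal sketch. It assumes the conversion step has already produced UTF-8 .txt files and that matching is case-insensitive and on whole single words; `count_in_text` and `count_selected_words` are hypothetical names, not the package's API.

```python
import re
from collections import Counter
from pathlib import Path


def count_in_text(text, words):
    """Count whole-word, case-insensitive occurrences of each word in `text`."""
    tokens = Counter(re.findall(r"\w+", text.lower()))
    # A Counter returns 0 for missing keys, so absent words count as 0.
    return {word: tokens[word.lower()] for word in words}


def count_selected_words(directory, words):
    """Return {relative file path: {word: count}} for every .txt file.

    One entry per file and one key per word mirrors the report layout:
    a row per file, a column per word.
    """
    root = Path(directory)
    results = {}
    for path in sorted(root.rglob("*.txt")):
        # errors="ignore" skips undecodable bytes left over from conversion.
        text = path.read_text(encoding="utf-8", errors="ignore")
        results[path.relative_to(root).as_posix()] = count_in_text(text, words)
    return results
```

Because the text is split into word tokens, this sketch does not match multi-word phrases; those would need a regular expression per phrase instead.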

Getting Started

Prerequisites

  • Python 3.x
  • Necessary packages (see requirements.txt)

Installation

Clone this repository and install the dependencies:

git clone https://github.com/Provincie-Zuid-Holland/selected_words_counter.git
cd selected_words_counter
pip install -r requirements.txt
pip install -e .

On Windows, make sure that either Microsoft Office or LibreOffice is installed in its default location, since other install locations are not supported yet: "C:\Program Files\Microsoft Office" or "C:\Program Files\LibreOffice".
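A minimal sketch of how the Windows requirement could be checked, and how one document could be converted to text with LibreOffice in headless mode. The README does not say how the tool itself performs the conversion, so `find_office_install` and `libreoffice_convert_command` are hypothetical; `--headless`, `--convert-to txt:Text` and `--outdir` are standard `soffice` command-line options, and `program\soffice.exe` is assumed to be the executable inside LibreOffice's default install folder.

```python
from pathlib import Path

# Default install locations named in this README (checked in this order).
DEFAULT_OFFICE_DIRS = {
    "msoffice": Path("C:/Program Files/Microsoft Office"),
    "libreoffice": Path("C:/Program Files/LibreOffice"),
}


def find_office_install(candidates=DEFAULT_OFFICE_DIRS):
    """Return (name, path) of the first office suite found, or None."""
    for name, path in candidates.items():
        if path.is_dir():
            return name, path
    return None


def libreoffice_convert_command(libreoffice_dir, source, outdir):
    """Build a headless LibreOffice command that converts `source` to .txt."""
    soffice = Path(libreoffice_dir) / "program" / "soffice.exe"  # assumed location
    return [
        str(soffice), "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(outdir),
        str(source),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; LibreOffice then writes a .txt file with the source's base name into `outdir`.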

Configuration

Customize the config.py file to specify:

  • Your target directory
  • Supported file formats
  • List of words to search for
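An illustrative config.py covering the three settings listed above. The variable names, the example extensions and words, and the `OUTPUT_FILE` setting are all assumptions; the repository's own config.py defines the real names.

```python
"""Hypothetical config.py; the real variable names may differ."""
from pathlib import Path

# Directory whose files are scanned (assumed name and example value).
TARGET_DIRECTORY = Path("data/documents")

# File extensions to convert and search (example values only).
SUPPORTED_FORMATS = [".pdf", ".docx", ".doc", ".txt"]

# Words to count; each becomes one column in the Excel report.
WORDS = ["water", "dike", "subsidy"]

# Where the Excel report is written (assumed setting).
OUTPUT_FILE = Path("output/word_counts.xlsx")


def validate_config():
    """Catch common configuration mistakes before a long run."""
    if not WORDS:
        raise ValueError("WORDS must contain at least one word")
    bad = [ext for ext in SUPPORTED_FORMATS if not ext.startswith(".")]
    if bad:
        raise ValueError(f"extensions must start with '.': {bad}")
    return True
```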

Usage

Run the tool using:

python main.py

The output Excel file will be saved to the specified location, providing a summary of word counts per file.
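The report layout (one row per file, one column per word) can be sketched as follows. The per-file counts are assumed to be a `{file: {word: count}}` dictionary; `to_report_rows` and `write_report` are hypothetical helpers, and using openpyxl to write the .xlsx is an assumption, not necessarily what the tool does.

```python
def to_report_rows(counts, words):
    """Build a header row plus one row per file, one column per word."""
    header = ["file", *words]
    rows = [
        [name, *(per_word.get(word, 0) for word in words)]
        for name, per_word in sorted(counts.items())
    ]
    return [header, *rows]


def write_report(rows, path):
    """Write the rows to an .xlsx file (requires the openpyxl package)."""
    # Imported here so to_report_rows works even without openpyxl installed.
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    for row in rows:
        sheet.append(row)
    workbook.save(path)
```

For example, `write_report(to_report_rows(counts, words), "word_counts.xlsx")` produces a sheet whose first row holds the words and whose first column holds the file names.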

Unit Tests

To run unit tests on synthetic data, navigate to the ./tests folder and run pytest:

cd tests
pytest
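A sketch of what a unit test on synthetic data can look like: the test writes a small text file with a known number of matches and checks the counts. `count_words_in_file` is a hypothetical stand-in for the tool's counting function; `tmp_path` is pytest's built-in temporary-directory fixture.

```python
import re
from collections import Counter
from pathlib import Path


def count_words_in_file(path, words):
    """Hypothetical stand-in: whole-word, case-insensitive counts per word."""
    text = Path(path).read_text(encoding="utf-8").lower()
    tokens = Counter(re.findall(r"\w+", text))
    return {word: tokens[word.lower()] for word in words}


def test_counts_on_synthetic_file(tmp_path):
    # pytest injects tmp_path, a fresh temporary directory for each test.
    sample = tmp_path / "synthetic.txt"
    sample.write_text("Dike, dike and more DIKE. Dikes are plural. No water.",
                      encoding="utf-8")
    # Whole-word matching is assumed: "Dikes" does not count as "dike".
    assert count_words_in_file(sample, ["dike", "water", "road"]) == {
        "dike": 3, "water": 1, "road": 0,
    }
```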

Author

Michael de Winter


Download files

Source Distribution

selected_words_counter-0.7.tar.gz (11.9 kB)

Built Distribution

selected_words_counter-0.7-py3-none-any.whl (10.4 kB)

File details

Details for the file selected_words_counter-0.7.tar.gz.

File metadata

  • Download URL: selected_words_counter-0.7.tar.gz
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for selected_words_counter-0.7.tar.gz
Algorithm     Hash digest
SHA256        936bb0b1c8e232c63878ec3b265a58d63c9423e1210f14b85d8abbba978aa15d
MD5           374771610b277fe8aeebf12c4d9e82ff
BLAKE2b-256   df20bb60b00dd8b40aa9ad2110d688a70775812c6daa18f7310fbc690a3d984b
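To check that a downloaded archive matches the published digest, the following sketch uses Python's standard `hashlib` module; the expected value is the SHA256 digest listed for selected_words_counter-0.7.tar.gz, and `sha256_of` and `verify_download` are illustrative helper names.

```python
import hashlib

# SHA256 digest published for selected_words_counter-0.7.tar.gz.
EXPECTED_SHA256 = "936bb0b1c8e232c63878ec3b265a58d63c9423e1210f14b85d8abbba978aa15d"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify_download("selected_words_counter-0.7.tar.gz")`. pip can enforce the same check at install time with a requirements line such as `selected_words_counter==0.7 --hash=sha256:<digest>`, listing the digest of each distribution file pip may select.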

File details

Details for the file selected_words_counter-0.7-py3-none-any.whl.

File hashes

Hashes for selected_words_counter-0.7-py3-none-any.whl
Algorithm     Hash digest
SHA256        c8867c67965cd015311bcb640a6de4db0b85d0a15c4e5de7c74541107b7a0a4c
MD5           ad2b284a4da8fcc893bc74664e4a9309
BLAKE2b-256   513eb3a43566115639b619a1b66f63af44e47b8516d6339c82db2b15626dea42