A script for counting word families in a text file.
word_family_counter
A Python script for counting word families in a text file using advanced morphological analysis with spaCy.
Features
- Processes text files to count word families
- Uses spaCy for advanced linguistic analysis and lemmatization
- Handles contractions, compound words, and various text preprocessing tasks
- Supports multiple languages (depending on available spaCy models)
- Provides detailed output with word family frequencies
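As a rough illustration of the lemmatization-based counting described above, here is a minimal sketch. It is not the project's actual code. It assumes a "word family" can be approximated by a spaCy lemma, and the function names `lemmatize` and `count_families` are hypothetical. The spaCy import is deferred so the counting part runs without spaCy installed.

```python
from collections import Counter
from typing import Iterable


def lemmatize(text: str, model: str = "en_core_web_sm") -> list[str]:
    """Return lowercase lemmas for the alphabetic tokens in `text`.

    Needs spaCy and the named model; the import is deferred so the rest
    of this sketch can run without them.
    """
    import spacy  # third-party dependency, only needed for this function

    nlp = spacy.load(model)
    return [tok.lemma_.lower() for tok in nlp(text) if tok.is_alpha]


def count_families(lemmas: Iterable[str]) -> Counter:
    """Assumption: each distinct lemma stands for one word family."""
    return Counter(lemmas)


# Hand-written lemmas stand in for spaCy output so this runs anywhere.
sample_lemmas = ["run", "run", "run", "cat", "cat", "be"]
families = count_families(sample_lemmas)
print(families["run"], len(families))  # 3 3
```

With spaCy and a model installed, `count_families(lemmatize(text))` would chain the two steps.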
Installation
1. Clone the repository:
    git clone https://github.com/BlueBirdBack/word_family_counter.git
    cd word_family_counter
2. Create a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
3. Install the required dependencies:
    pip install -r requirements.txt
4. Download the spaCy language model:
    python -m spacy download en_core_web_sm
Usage
Run the command with a text file as an argument:
word_family_counter path/to/your/text_file.txt
Optional arguments:
- --verbose: Increase output verbosity for debugging purposes
- --language: Specify the spaCy model to use (default: en_core_web_sm)
Example:
word_family_counter sample.txt --verbose --language en_core_web_md
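For readers curious how a command line like this could be parsed, here is a minimal `argparse` sketch. The option names and the default model come from this README. The function name `build_parser`, the positional name `text_file`, and the help strings are assumptions, and the real script may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(
        prog="word_family_counter",
        description="Count word families in a text file.",
    )
    # Positional argument name is an assumption made for this sketch.
    parser.add_argument("text_file", help="path to the text file to analyze")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="increase output verbosity for debugging",
    )
    parser.add_argument(
        "--language",
        default="en_core_web_sm",
        help="spaCy model to use",
    )
    return parser


# Parse the example command shown above from an explicit argument list.
args = build_parser().parse_args(
    ["sample.txt", "--verbose", "--language", "en_core_web_md"]
)
print(args.text_file, args.verbose, args.language)
```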
Note: Install the spaCy model you pass to --language before running the command. If you see an error about a missing model, rerun the download command from Installation step 4 with that model's name (for example, python -m spacy download en_core_web_md).
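One way to check for a model before running the command is sketched below. It relies on the fact that models fetched with `python -m spacy download` are installed as importable Python packages named after the model. The helper names `model_is_installed` and `download_command` are hypothetical and not part of this project or of spaCy.

```python
import importlib.util


def model_is_installed(model_name: str) -> bool:
    """Return True if a package named `model_name` can be found for import."""
    return importlib.util.find_spec(model_name) is not None


def download_command(model_name: str) -> str:
    """Return the spaCy download command for `model_name`."""
    return f"python -m spacy download {model_name}"


model = "en_core_web_md"
if not model_is_installed(model):
    print(f"Model {model!r} not found. Install it with: {download_command(model)}")
```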
Output
The script will display:
- Total number of words in the text
- Total number of unique word families
- A list of word families sorted by frequency (descending) and then alphabetically
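The ordering described above (frequency descending, then alphabetical) can be expressed with a single sort key. This is a minimal sketch using made-up counts, not the script's actual output code.

```python
from collections import Counter

# Made-up counts for illustration only.
counts = Counter({"run": 3, "cat": 2, "be": 2, "apple": 1})

total_words = sum(counts.values())   # total number of counted words
unique_families = len(counts)        # number of distinct word families

# Negating the frequency sorts it descending; the name breaks ties A to Z.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(f"Total words: {total_words}")
print(f"Unique word families: {unique_families}")
for family, freq in ranked:
    print(f"{family}: {freq}")
```

Here "be" is listed before "cat" because both occur twice and ties fall back to alphabetical order.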
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Contact
BlueBirdBack - avery@bluebirdback.com
Project Link: https://github.com/BlueBirdBack/word_family_counter
Download files
File details
Details for the file word_family_counter-0.1.1.tar.gz.
File metadata
- Download URL: word_family_counter-0.1.1.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8a27260c9a9c79fdbdcae72211c9e95642236e7950cf75d1310a7a291e36e8c8
MD5 | 81ff9720daa1160c591b82e469ad130d
BLAKE2b-256 | bffc47e67159c9801a568931f9444cf81b6830ef98b5b8b034ed90aea7fcdffd
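To confirm that a downloaded file matches the digest listed above, one option is to hash it locally with Python's `hashlib`. This is a minimal sketch. The helper names `sha256_of` and `matches_expected` are hypothetical, and the expected value is the SHA256 digest from the table for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for word_family_counter-0.1.1.tar.gz.
EXPECTED_SHA256 = "8a27260c9a9c79fdbdcae72211c9e95642236e7950cf75d1310a7a291e36e8c8"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with `expected`, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_expected(Path("word_family_counter-0.1.1.tar.gz"))` on the downloaded archive should return True if the file is intact.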
File details
Details for the file word_family_counter-0.1.1-py3-none-any.whl.
File metadata
- Download URL: word_family_counter-0.1.1-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8a66bdd584a57a99ddc2283bd08212f7a50ade2181e8258a803a22bb9479d444
MD5 | 0962e8e8fc4a6ab1ba13941b430b1696
BLAKE2b-256 | a1950f1e0c6c6d5c96a9fa169ba7eaf1f188a8be2773a0c5f994c26ed319bb41