
word_family_counter

A Python script that counts word families in a text file, using spaCy for lemmatization and morphological analysis.

Features

  • Processes text files to count word families
  • Uses spaCy for linguistic analysis and lemmatization
  • Handles contractions, compound words, and various text preprocessing tasks
  • Supports multiple languages (depending on available spaCy models)
  • Provides detailed output with word family frequencies
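
To illustrate what counting word families involves, here is a minimal sketch, not the project's actual code. It assumes that a family is the set of words sharing one base form, as the lemmatization feature suggests. The names `count_word_families` and `TOY_LEMMAS` are hypothetical, and the small lookup table stands in for the base forms that spaCy would supply (for example through each token's `lemma_` attribute).

```python
import re
from collections import Counter

# Hypothetical stand-in for a real lemmatizer; spaCy would supply these
# base forms through token.lemma_.
TOY_LEMMAS = {
    "runs": "run",
    "running": "run",
    "ran": "run",
    "cats": "cat",
}


def count_word_families(text, lemmas=TOY_LEMMAS):
    """Count tokens grouped by base form (one base form per word family)."""
    # Lowercase the text and keep alphabetic runs, allowing apostrophes
    # inside a word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Words missing from the table count as their own family.
    return Counter(lemmas.get(token, token) for token in tokens)


if __name__ == "__main__":
    counts = count_word_families("The cat runs. Cats ran; one cat is running.")
    print(counts["run"], counts["cat"])  # prints "3 3"
```

In this sketch "runs", "ran", and "running" all land in the "run" family, which is the grouping a lemmatizer makes possible.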

Installation

  1. Clone the repository:

    git clone https://github.com/BlueBirdBack/word_family_counter.git
    cd word_family_counter
    
  2. Create and activate a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Download the spaCy language model:

    python -m spacy download en_core_web_sm
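
spaCy installs its language models as regular Python packages, so you can check whether the download succeeded without loading the model. The following is a small sketch using only the standard library; the helper name `model_installed` is hypothetical and not part of this project.

```python
import importlib.util


def model_installed(model_name):
    """Return True if a package with this name can be imported."""
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    name = "en_core_web_sm"
    if model_installed(name):
        print(f"{name} is installed")
    else:
        print(f"{name} is missing; run: python -m spacy download {name}")
```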
    

Usage

Run the script with a text file as an argument:

    python src/word_family_counter/main.py path/to/your/text_file.txt

Optional arguments:

  • --verbose: Increase output verbosity for debugging purposes
  • --language: Specify the spaCy model to use (default: en_core_web_sm)

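Example (requires the en_core_web_md model, which you can install with `python -m spacy download en_core_web_md`):

    python src/word_family_counter/main.py sample.txt --verbose --language en_core_web_md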
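
For readers curious how a command line like this can be parsed, here is a sketch using `argparse`. It is an illustration under assumptions, not the contents of `main.py`: the positional name `text_file` is hypothetical, and `--verbose` is assumed to be a simple on/off flag.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments described in this README."""
    parser = argparse.ArgumentParser(
        description="Count word families in a text file."
    )
    # Hypothetical positional name; the README only says "a text file".
    parser.add_argument("text_file", help="path to the input text file")
    # Assumed to be a boolean switch.
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="increase output verbosity for debugging",
    )
    parser.add_argument(
        "--language",
        default="en_core_web_sm",
        help="spaCy model to use",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the demo does not depend on sys.argv.
    args = build_parser().parse_args(
        ["sample.txt", "--verbose", "--language", "en_core_web_md"]
    )
    print(args.text_file, args.verbose, args.language)
```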

Output

The script will display:

  1. Total number of words in the text
  2. Total number of unique word families
  3. A list of word families sorted by frequency (descending) and then alphabetically
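
The ordering in item 3 (frequency descending, ties broken alphabetically) can be expressed with a single sort key. The sketch below shows one way to do it; the function name `sort_families`, the sample counts, and the printed layout are all hypothetical and may differ from the script's real output.

```python
from collections import Counter


def sort_families(counts):
    """Sort (family, count) pairs by count descending, then by name."""
    # Negating the count sorts high counts first; the name breaks ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    counts = Counter({"run": 3, "cat": 3, "the": 1, "be": 2})
    print(f"Total words: {sum(counts.values())}")
    print(f"Unique word families: {len(counts)}")
    for family, count in sort_families(counts):
        print(f"{family}: {count}")
```

With these sample counts, "cat" is listed before "run" because both occur three times and "cat" comes first alphabetically.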

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contact

BlueBirdBack - avery@bluebirdback.com

Project Link: https://github.com/BlueBirdBack/word_family_counter


