A script for counting word families in a text file.

word_family_counter

A Python script that counts word families in a text file, using spaCy for lemmatization and morphological analysis.

Features

  • Processes text files to count word families
  • Uses spaCy for advanced linguistic analysis and lemmatization
  • Handles contractions, compound words, and various text preprocessing tasks
  • Supports multiple languages (depending on available spaCy models)
  • Provides detailed output with word family frequencies
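
To make the idea of a word family concrete, here is a minimal sketch. It assumes a family is the set of surface forms that share a lemma. The `TOY_LEMMAS` table, `toy_lemma`, and `count_families` are hypothetical names for illustration only; the script itself relies on spaCy for lemmatization, and its actual grouping logic is not shown in this README.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical stand-in for a lemmatizer; the real tool uses spaCy instead.
TOY_LEMMAS = {"running": "run", "ran": "run", "runs": "run", "cats": "cat"}


def toy_lemma(word: str) -> str:
    """Return the lemma from the toy table, or the lowercased word itself."""
    lowered = word.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def count_families(
    words: Iterable[str],
    lemma_of: Callable[[str], str] = toy_lemma,
) -> Counter:
    """Count how many tokens fall into each lemma-based family."""
    return Counter(lemma_of(word) for word in words)


families = count_families(["Running", "ran", "runs", "cat", "cats", "dog"])
print(families)
```

Passing the lemmatizer in as a function keeps the counting step independent of any particular NLP library.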

Installation

  1. Clone the repository:

    git clone https://github.com/BlueBirdBack/word_family_counter.git
    cd word_family_counter
    
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Download the spaCy language model:

    python -m spacy download en_core_web_sm
    

Usage

Run the command with a text file as an argument:

    word_family_counter path/to/your/text_file.txt

Optional arguments:

  • --verbose: Increase output verbosity for debugging purposes
  • --language: Specify the spaCy model to use (default: en_core_web_sm)

Example:

    word_family_counter sample.txt --verbose --language en_core_web_md
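
For readers curious how such a command line could be parsed, the following is a sketch using the standard `argparse` module. It is not the package's actual parser: the positional name `path` and the help strings are assumptions, and `--verbose` is treated as a simple on/off flag because the example above passes it without a value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="word_family_counter",
        description="Count word families in a text file.",
    )
    # Positional argument: the text file to analyze (name `path` is assumed).
    parser.add_argument("path", help="text file to analyze")
    # Documented optional arguments.
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="increase output verbosity for debugging",
    )
    parser.add_argument(
        "--language",
        default="en_core_web_sm",
        help="spaCy model to use (default: en_core_web_sm)",
    )
    return parser


args = build_parser().parse_args(
    ["sample.txt", "--verbose", "--language", "en_core_web_md"]
)
defaults = build_parser().parse_args(["sample.txt"])
print(args.path, args.verbose, args.language)
```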

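Note: Make sure the spaCy model you name with --language is installed before running the command. If you get an error about a missing model, run the download command from step 4 of Installation again, substituting the model you want (for example, python -m spacy download en_core_web_md).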
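
Because spaCy models are installed as ordinary Python packages, one way to check for a model before running is to look the package up with the standard library. This is a sketch only; `model_installed` and `download_hint` are hypothetical helpers and are not part of word_family_counter.

```python
import importlib.util


def model_installed(model_name: str) -> bool:
    """Return True if a top-level Python package with this name is importable."""
    return importlib.util.find_spec(model_name) is not None


def download_hint(model_name: str) -> str:
    """Return the spaCy download command for the given model name."""
    return f"python -m spacy download {model_name}"


model = "en_core_web_md"
if not model_installed(model):
    print(f"Model {model!r} not found. Try: {download_hint(model)}")
```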

Output

The script will display:

  1. Total number of words in the text
  2. Total number of unique word families
  3. A list of word families sorted by frequency (descending) and then alphabetically
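
The ordering in item 3 (frequency descending, then alphabetical) can be expressed as a single sort key. The sketch below uses made-up counts and hypothetical function names, and it assumes the total word count equals the sum of all family counts, which the README does not state explicitly.

```python
from collections import Counter


def sorted_families(counts: Counter) -> list[tuple[str, int]]:
    """Sort by count (highest first), breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def summarize(counts: Counter) -> tuple[int, int]:
    """Return (total words, unique families), assuming every word is counted once."""
    return sum(counts.values()), len(counts)


# Illustrative data only, not output from the real script.
counts = Counter({"run": 3, "cat": 2, "be": 3, "dog": 1})
ranking = sorted_families(counts)
total_words, unique_families = summarize(counts)

print(f"Total words: {total_words}")
print(f"Unique word families: {unique_families}")
for family, count in ranking:
    print(f"{family}: {count}")
```

Negating the count in the key gives descending frequency while keeping the alphabetical tie-break ascending, so no second sorting pass is needed.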

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contact

BlueBirdBack - avery@bluebirdback.com

Project Link: https://github.com/BlueBirdBack/word_family_counter
