CLI tool that outputs the N most common n-word sequences in a text (N defaults to 100, n defaults to 3).

Environment
- Console
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- MacOS
- Unix
Programming Language
- Python
- Python :: 3

Project description

Word Count

CLI tool that outputs the N most common n-word sequences in a text (N defaults to 100, n defaults to 3), along with a count of how many times each sequence occurred.

The CLI can read the text from stdin, using the default parameters:

>$ cat text_file.txt | words-count
...

Or it can read from files, with options passed as command-line arguments:

>$ words-count --files text_file.txt
...
>$ words-count --files text_file.txt --number-of-words 4 --top 5
...

Important

Matching is not case sensitive, and punctuation and line breaks are ignored (e.g. "I love\nsandwiches." is treated the same as "(I LOVE SANDWICHES!!)").
When more than one file is passed as an argument, each file is processed independently, but the counts for each word sequence are added together across files.
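The normalization described above can be sketched in Python. This is only an illustration, not the tool's actual source: the function names normalize and count_sequences are hypothetical, and the rule that a word is a run of letters, digits, or apostrophes is an assumption.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def count_sequences(text, n=3):
    """Count every run of n consecutive words in the text."""
    words = normalize(text)
    # Slide a window of n words across the word list.
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


a = count_sequences("I love\nsandwiches.")
b = count_sequences("(I LOVE SANDWICHES!!)")
print(a == b)  # True
print(a)       # Counter({'i love sandwiches': 1})
```

Because the line break and the punctuation are discarded before the words are grouped, both inputs produce the same single three-word sequence.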

How to install

Install the CLI using pip:

>$ pip install words-count
...

Then, it will be available to use:

>$ words-count --help
usage: words-count [-h] [-f [FILES ...]] [-n NUMBER_OF_WORDS] [-t TOP]

CLI tool for that outputs the N (N by default 100) most common n-word (n by default is 3) sequence in text, along with a count of how many times each
occurred in the text.

optional arguments:
  -h, --help            show this help message and exit
  -f [FILES ...], --files [FILES ...]
                        Files path to read
  -n NUMBER_OF_WORDS, --number-of-words NUMBER_OF_WORDS
                        Number of words to group
  -t TOP, --top TOP     Max number of groups of words to output
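
For readers who want to see how an interface like this could be declared, here is a minimal argparse sketch. It is an assumption about how such a parser might look, not the project's code: build_parser is a hypothetical name, the description string is shortened, and the defaults (3 and 100) are taken from the description above.

```python
import argparse


def build_parser():
    """Build a parser with the same three options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="words-count",
        description="Outputs the most common n-word sequences in text.",
    )
    # nargs="*" accepts zero or more paths, so stdin can be used when none are given.
    parser.add_argument("-f", "--files", nargs="*", default=[],
                        help="Files path to read")
    parser.add_argument("-n", "--number-of-words", type=int, default=3,
                        help="Number of words to group")
    parser.add_argument("-t", "--top", type=int, default=100,
                        help="Max number of groups of words to output")
    return parser


args = build_parser().parse_args(
    ["--files", "pg2009.txt", "--number-of-words", "6", "--top", "5"]
)
print(args.files, args.number_of_words, args.top)  # ['pg2009.txt'] 6 5
```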

Examples of use

Process text from stdin:

>$ cat pg2009.txt | words-count
{
    "of the same": 320,
    "the same species": 126,
    "conditions of life": 125,
    "in the same": 116,
    "of natural selection": 107,
    "from each other": 103,
    "species of the": 98,
    "on the other": 89,
    "the other hand": 81,
    "the case of": 78,
    "the theory of": 75,
...

Use arguments for adjusting the options:

>$ words-count --files pg2009.txt --number-of-words 6 --top 5
{
    "the individuals of the same species": 31,
    "the species of the same genus": 19,
    "we can understand how it is": 13,
    "can understand how it is that": 13,
    "the project gutenberg literary archive foundation": 13
}

Process multiple files:

>$ words-count --files pg2009.txt pg2009.txt --number-of-words 6 --top 5
{
    "the individuals of the same species": 62,
    "the species of the same genus": 38,
    "we can understand how it is": 26,
    "can understand how it is that": 26,
    "the project gutenberg literary archive foundation": 26
}
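
The doubled counts above follow from adding the per-file counts together. A rough sketch of that step, assuming per-file counts are held in collections.Counter objects (the combine function is hypothetical, and the two sample counts are copied from the single-file output earlier):

```python
import json
from collections import Counter


def combine(per_file_counts, top=100):
    """Add per-file counts together and keep the `top` most common sequences."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return dict(total.most_common(top))


one_file = Counter({
    "the individuals of the same species": 31,
    "the species of the same genus": 19,
})

# Passing the same file twice doubles every count.
print(json.dumps(combine([one_file, one_file], top=5), indent=4))
# {
#     "the individuals of the same species": 62,
#     "the species of the same genus": 38
# }
```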

Release history

1.0.0 (Feb 28, 2021)

Download files

Source Distribution

words-count-1.0.0.tar.gz (3.7 kB)

Uploaded Feb 28, 2021 Source

Built Distribution

words_count-1.0.0-py3-none-any.whl (4.1 kB)

Uploaded Feb 28, 2021 Python 3

Hashes for words-count-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`3e51aa158aa9d5d988b8ab144bf3e60b4099880aeecdf32b0d500344153b159f`
MD5	`789ee5636eb14577216d4389fa834859`
BLAKE2b-256	`655e593d0c6b792ed28565bb37aecd70606ac98f1a33e8e0511e40a0d44c0a1f`

Hashes for words_count-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7199c3224a8a737b8c23cef2d046afce401602a47457d30caf95ca0a345fbc12`
MD5	`d9ce03c54b2ebf336b2f3ffbdd5023a1`
BLAKE2b-256	`0d9ba0197beea844fbc192b75f74567b5989a4de1bd96e4a31f9e31be7750b52`