Skip to main content

CLI tool for that outputs the N (N by default 100) most common n-word (n by default is 3) sequence in text.

Project description

Word Count

CLI tool for that outputs the N (N by default 100) most common n-word (n by default is 3) sequence in text, along with a count of how many times each occurred in the text.

The CLI can get the text on 'stdin' with default params:

>$ cat text_file.txt | word-count
...

Or using positional arguments:

>$ word-count --files text_file.txt
...
>$ word-count --files text_file.txt --number-of-words 4 --top 5
...

Important

  • It is not case sensitive (e.g. “I love\nsandwiches.” is treated the same as "(I LOVE SANDWICHES!!)")
  • When more than 1 file are passed as argument, each file is processed independently but the series of words are counted together.

How to install

Install the cli using pip

>$ pip intall words-count
...

Then, it will be available to use:

>$ words-count --help
usage: words-count [-h] [-f [FILES ...]] [-n NUMBER_OF_WORDS] [-t TOP]

CLI tool for that outputs the N (N by default 100) most common n-word (n by default is 3) sequence in text, along with a count of how many times each
occurred in the text.

optional arguments:
  -h, --help            show this help message and exit
  -f [FILES ...], --files [FILES ...]
                        Files path to read
  -n NUMBER_OF_WORDS, --number-of-words NUMBER_OF_WORDS
                        Number of words to group
  -t TOP, --top TOP     Max number of groups of words to output

Examples of use

Process to stdin:

>$ cat pg2009.txt | words-count
{
    "of the same": 320,
    "the same species": 126,
    "conditions of life": 125,
    "in the same": 116,
    "of natural selection": 107,
    "from each other": 103,
    "species of the": 98,
    "on the other": 89,
    "the other hand": 81,
    "the case of": 78,
    "the theory of": 75,
...

Use arguments for adjusting the options:

words-count --files pg2009.txt --number-of-words 6 --top 5
{
    "the individuals of the same species": 31,
    "the species of the same genus": 19,
    "we can understand how it is": 13,
    "can understand how it is that": 13,
    "the project gutenberg literary archive foundation": 13
}

Process multiple files:

words-count --files pg2009.txt pg2009.txt --number-of-words 6 --top 5
{
    "the individuals of the same species": 62,
    "the species of the same genus": 38,
    "we can understand how it is": 26,
    "can understand how it is that": 26,
    "the project gutenberg literary archive foundation": 26
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

words-count-1.0.0.tar.gz (3.7 kB view hashes)

Uploaded Source

Built Distribution

words_count-1.0.0-py3-none-any.whl (4.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page