Word Count
CLI tool that outputs the N most common n-word sequences in a text (by default N is 100 and n is 3), along with a count of how many times each sequence occurs.
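The core idea can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual source: the function name `most_common_sequences` is hypothetical, and the text is only lowercased and split on whitespace (the real tool also appears to ignore punctuation).

```python
from collections import Counter


def most_common_sequences(text: str, n: int = 3, top: int = 100) -> list[tuple[str, int]]:
    """Hypothetical helper: return the `top` most frequent n-word sequences."""
    # Simplified normalization (assumption): lowercase and split on whitespace only.
    words = text.lower().split()
    # Slide a window of n words across the text; each window is one sequence.
    sequences = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    # Counter.most_common orders the (sequence, count) pairs by count, highest first.
    return Counter(sequences).most_common(top)


print(most_common_sequences("the cat sat on the cat sat", n=3, top=2))
```

A text shorter than n words simply yields no sequences, because the window range is empty.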
The CLI can read the text from stdin, using the default parameters:
>$ cat text_file.txt | words-count
...
Or read it from files with the --files option:
>$ words-count --files text_file.txt
...
>$ words-count --files text_file.txt --number-of-words 4 --top 5
...
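How the input source is chosen is not documented beyond these examples. One plausible sketch, assuming the tool reads every file given to --files and otherwise falls back to stdin (the function name `read_texts` is hypothetical):

```python
import sys
from typing import Optional, Sequence, TextIO


def read_texts(paths: Optional[Sequence[str]], stdin: TextIO = sys.stdin) -> list[str]:
    """Hypothetical helper: one text per input file, or a single text from stdin."""
    if paths:
        texts = []
        for path in paths:
            # Each file is read separately so it can be processed on its own.
            with open(path, encoding="utf-8") as handle:
                texts.append(handle.read())
        return texts
    # No files given (assumed behavior): read everything piped in,
    # e.g. `cat text_file.txt | words-count`.
    return [stdin.read()]
```

Keeping one string per file, rather than concatenating them, leaves room to count each file independently.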
Important
- Matching is case-insensitive and ignores punctuation and line breaks (e.g. "I love\nsandwiches." is treated the same as "(I LOVE SANDWICHES!!)").
- When more than one file is passed, each file is processed independently, but the word sequences from all files are counted together.
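The two rules above can be sketched together. This is an illustration under assumptions, not the package's code: the tokenizer regex and the names `tokenize` and `count_sequences` are hypothetical, and "processed independently" is read as "a sequence never spans two files".

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical tokenizer: runs of word characters, keeping inner apostrophes ("don't").
WORD = re.compile(r"\w+(?:'\w+)*")


def tokenize(text: str) -> list[str]:
    # Lowercasing first makes matching case-insensitive; punctuation and
    # line breaks never match WORD, so they are dropped.
    return WORD.findall(text.lower())


def count_sequences(texts: Iterable[str], n: int = 3) -> Counter:
    """Count n-word sequences in each text separately, then sum the counts."""
    total = Counter()
    for text in texts:
        words = tokenize(text)
        # Windows are built per text, so no sequence crosses a file boundary.
        total.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return total
```

Under this reading, passing the same file twice simply doubles every count, which is consistent with the multiple-files example further down.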
How to install
Install the CLI using pip:
>$ pip install words-count
...
The words-count command is then available:
>$ words-count --help
usage: words-count [-h] [-f [FILES ...]] [-n NUMBER_OF_WORDS] [-t TOP]

CLI tool for that outputs the N (N by default 100) most common n-word (n by default is 3) sequence in text, along with a count of how many times each
occurred in the text.

optional arguments:
  -h, --help            show this help message and exit
  -f [FILES ...], --files [FILES ...]
                        Files path to read
  -n NUMBER_OF_WORDS, --number-of-words NUMBER_OF_WORDS
                        Number of words to group
  -t TOP, --top TOP     Max number of groups of words to output
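To show how these options might map to code, here is a hypothetical argparse parser that produces a help message of the same shape. The defaults (3 words, top 100) come from the description; everything else, including the name `build_parser` and the stdin fallback, is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options in the help output."""
    parser = argparse.ArgumentParser(
        prog="words-count",
        description="Output the N most common n-word sequences in a text, "
        "with how many times each occurs.",
    )
    # nargs="*" accepts zero or more paths; with none, the tool presumably reads stdin.
    parser.add_argument("-f", "--files", nargs="*", help="Files path to read")
    parser.add_argument("-n", "--number-of-words", type=int, default=3,
                        help="Number of words to group")
    parser.add_argument("-t", "--top", type=int, default=100,
                        help="Max number of groups of words to output")
    return parser


args = build_parser().parse_args(["--files", "pg2009.txt", "-n", "6", "-t", "5"])
print(args.files, args.number_of_words, args.top)
```

argparse turns --number-of-words into the attribute `number_of_words`, which is why the help shows NUMBER_OF_WORDS as the metavar.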
Examples of use
Process text from stdin:
>$ cat pg2009.txt | words-count
{
"of the same": 320,
"the same species": 126,
"conditions of life": 125,
"in the same": 116,
"of natural selection": 107,
"from each other": 103,
"species of the": 98,
"on the other": 89,
"the other hand": 81,
"the case of": 78,
"the theory of": 75,
...
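The output looks like a JSON object whose keys are the sequences, ordered from most to least frequent. Assuming that format, a sketch of producing it from a collections.Counter (the helper name `format_top` and the indentation width are guesses):

```python
import json
from collections import Counter


def format_top(counts: Counter, top: int = 100) -> str:
    """Hypothetical helper: render the `top` sequences as a JSON object."""
    # most_common returns (sequence, count) pairs sorted by count, highest first;
    # dict() keeps that insertion order and json.dumps preserves it.
    return json.dumps(dict(counts.most_common(top)), indent=4)


counts = Counter({"of the same": 320, "the same species": 126, "in the same": 116})
print(format_top(counts, top=2))
```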
Use arguments to adjust the options:
>$ words-count --files pg2009.txt --number-of-words 6 --top 5
{
"the individuals of the same species": 31,
"the species of the same genus": 19,
"we can understand how it is": 13,
"can understand how it is that": 13,
"the project gutenberg literary archive foundation": 13
}
Process multiple files (here the same file is passed twice, so every count doubles):
>$ words-count --files pg2009.txt pg2009.txt --number-of-words 6 --top 5
{
"the individuals of the same species": 62,
"the species of the same genus": 38,
"we can understand how it is": 26,
"can understand how it is that": 26,
"the project gutenberg literary archive foundation": 26
}
Download files
Source Distribution
words-count-1.0.0.tar.gz (3.7 kB)
Built Distribution
words_count-1.0.0-py3-none-any.whl (4.1 kB)
File details
Details for the file words-count-1.0.0.tar.gz.
File metadata
- Download URL: words-count-1.0.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e51aa158aa9d5d988b8ab144bf3e60b4099880aeecdf32b0d500344153b159f
MD5 | 789ee5636eb14577216d4389fa834859
BLAKE2b-256 | 655e593d0c6b792ed28565bb37aecd70606ac98f1a33e8e0511e40a0d44c0a1f
File details
Details for the file words_count-1.0.0-py3-none-any.whl.
File metadata
- Download URL: words_count-1.0.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7199c3224a8a737b8c23cef2d046afce401602a47457d30caf95ca0a345fbc12
MD5 | d9ce03c54b2ebf336b2f3ffbdd5023a1
BLAKE2b-256 | 0d9ba0197beea844fbc192b75f74567b5989a4de1bd96e4a31f9e31be7750b52