CLI tool that outputs the N (100 by default) most common n-word (n is 3 by default) sequences in a text.
Project description
Word Count
CLI tool that outputs the N (100 by default) most common n-word (n is 3 by default) sequences in a text, along with a count of how many times each occurred in the text.
The CLI can read the text from stdin, using the default parameters:
>$ cat text_file.txt | words-count
...
Or pass the files and options as command-line arguments:
>$ words-count --files text_file.txt
...
>$ words-count --files text_file.txt --number-of-words 4 --top 5
...
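The two input modes above (text piped on stdin, or paths given with `--files`) can be sketched as follows. This is only an illustration, not the package's actual code: the function name `read_texts` and the UTF-8 encoding are assumptions.

```python
import sys


def read_texts(paths=None, stdin=sys.stdin):
    """Return a list of texts: one per file, or a single text read from stdin.

    Hypothetical helper; the real tool may read its input differently.
    """
    if paths:
        texts = []
        for path in paths:
            # Assumption: input files are UTF-8 encoded.
            with open(path, encoding="utf-8") as handle:
                texts.append(handle.read())
        return texts
    # No files given: fall back to whatever was piped in.
    return [stdin.read()]
```

Returning a list in both cases lets the counting step treat piped text and multiple files the same way.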
Important
- It is not case sensitive, and punctuation and line breaks are ignored (e.g. "I love\nsandwiches." is treated the same as "(I LOVE SANDWICHES!!)").
- When more than one file is passed as an argument, each file is processed independently, but the word sequences are counted together.
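The two points above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's implementation: the tokenization rule (`WORD_RE`), the function names, and the reading of "processed independently" as "no sequence spans two files" are all assumptions.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def ngrams(text, n=3):
    """Yield each run of n consecutive words, lowercased, punctuation dropped."""
    words = WORD_RE.findall(text.lower())
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def count_ngrams(texts, n=3):
    """Count n-word sequences per text, accumulating into one shared Counter."""
    counts = Counter()
    for text in texts:
        # Each text is tokenized on its own, so no sequence spans two texts.
        counts.update(ngrams(text, n))
    return counts
```

With this sketch, `ngrams("I love\nsandwiches.")` and `ngrams("(I LOVE SANDWICHES!!)")` both yield the single sequence `i love sandwiches`, and passing the same text twice to `count_ngrams` doubles every count.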
How to install
Install the CLI using pip:
>$ pip install words-count
...
Then, it will be available to use:
>$ words-count --help
usage: words-count [-h] [-f [FILES ...]] [-n NUMBER_OF_WORDS] [-t TOP]
CLI tool for that outputs the N (N by default 100) most common n-word (n by default is 3) sequence in text, along with a count of how many times each
occurred in the text.
optional arguments:
  -h, --help            show this help message and exit
  -f [FILES ...], --files [FILES ...]
                        Files path to read
  -n NUMBER_OF_WORDS, --number-of-words NUMBER_OF_WORDS
                        Number of words to group
  -t TOP, --top TOP     Max number of groups of words to output
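An interface like the one in the help text above could be declared with `argparse` roughly as follows. This is a sketch that reproduces the documented flags and defaults (3 words, top 100); the function name `build_parser` is hypothetical and the package's own parser may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the help output (a sketch)."""
    parser = argparse.ArgumentParser(prog="words-count")
    # nargs="*" accepts zero or more paths; the value is None when -f is omitted.
    parser.add_argument("-f", "--files", nargs="*", help="Files path to read")
    parser.add_argument("-n", "--number-of-words", type=int, default=3,
                        help="Number of words to group")
    parser.add_argument("-t", "--top", type=int, default=100,
                        help="Max number of groups of words to output")
    return parser
```

For example, `build_parser().parse_args(["--files", "a.txt", "--top", "5"])` gives `files == ["a.txt"]`, `number_of_words == 3`, and `top == 5`.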
Examples of use
Process text from stdin:
>$ cat pg2009.txt | words-count
{
"of the same": 320,
"the same species": 126,
"conditions of life": 125,
"in the same": 116,
"of natural selection": 107,
"from each other": 103,
"species of the": 98,
"on the other": 89,
"the other hand": 81,
"the case of": 78,
"the theory of": 75,
...
Use arguments to adjust the options:
>$ words-count --files pg2009.txt --number-of-words 6 --top 5
{
"the individuals of the same species": 31,
"the species of the same genus": 19,
"we can understand how it is": 13,
"can understand how it is that": 13,
"the project gutenberg literary archive foundation": 13
}
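Output shaped like the JSON object above can be produced from a `collections.Counter` with `most_common` and `json.dumps`. The sketch below assumes that approach; the function name `format_top` and the indentation width are illustrative choices, not confirmed details of the tool.

```python
import json
from collections import Counter


def format_top(counts, top=100):
    """Serialize the `top` most frequent sequences as a JSON object."""
    # most_common() sorts by descending count; dict keeps that order.
    ranked = dict(counts.most_common(top))
    return json.dumps(ranked, indent=4)
```

Because `most_common` keeps tied entries in the order they were first counted, sequences with equal counts would appear in order of first occurrence under this sketch.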
Process multiple files:
>$ words-count --files pg2009.txt pg2009.txt --number-of-words 6 --top 5
{
"the individuals of the same species": 62,
"the species of the same genus": 38,
"we can understand how it is": 26,
"can understand how it is that": 26,
"the project gutenberg literary archive foundation": 26
}
Download files
File details
Details for the file words-count-1.0.0.tar.gz.
File metadata
- Download URL: words-count-1.0.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e51aa158aa9d5d988b8ab144bf3e60b4099880aeecdf32b0d500344153b159f |
| MD5 | 789ee5636eb14577216d4389fa834859 |
| BLAKE2b-256 | 655e593d0c6b792ed28565bb37aecd70606ac98f1a33e8e0511e40a0d44c0a1f |
File details
Details for the file words_count-1.0.0-py3-none-any.whl.
File metadata
- Download URL: words_count-1.0.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7199c3224a8a737b8c23cef2d046afce401602a47457d30caf95ca0a345fbc12 |
| MD5 | d9ce03c54b2ebf336b2f3ffbdd5023a1 |
| BLAKE2b-256 | 0d9ba0197beea844fbc192b75f74567b5989a4de1bd96e4a31f9e31be7750b52 |