A simple Python (2 or 3) script to generate a PNG word-cloud image from a bunch of text files. Based on word_cloud.
Project description
A simple Python script to generate a square wordcloud from one (or more) text file(s). It supports both Python 2 and 3 (2.7+ and 3.4+).
Based on the great word_cloud module by @amueller.
How to use it?
1. Requirements
The usual module matplotlib is needed for the plotting, docopt is needed for the command line interface, and word_cloud is needed for the actual work (generating the cloud of words after reading the files).
The required Python (2 or 3) modules can be installed with pip, either directly:
# Directly:
sudo pip install matplotlib docopt wordcloud
Or with the requirements.txt file:
sudo pip install -r requirements.txt
Note: if ansicolortags is available, it will be used to print nice colors in the help and during the generation of word clouds.
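To see which of these modules are already importable before installing anything, a small check like the one below can help. It is only a sketch: it assumes Python 3.4+ and that word_cloud is imported under the name `wordcloud`, and the helper name `missing_modules` is made up for this example.

```python
import importlib.util

# Import names of the modules listed above; ansicolortags is optional.
REQUIRED = ["matplotlib", "docopt", "wordcloud"]
OPTIONAL = ["ansicolortags"]


def missing_modules(names):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required modules:", missing_modules(REQUIRED) or "none")
    print("Missing optional modules:", missing_modules(OPTIONAL) or "none")
```

Any name reported as missing can then be installed with pip as shown above.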
2. Installation
Clone the repository, copy the script (generate-word-cloud.py) somewhere in your PATH (e.g., ~/.local/bin/).
You can also just download the script itself:
$ wget https://raw.githubusercontent.com/Naereen/generate-word-cloud.py/master/generate-word-cloud.py
$ cp generate-word-cloud.py /path/to/a/directory/in/your/PATH/
Note: The script is also available from PyPI: pypi.python.org/pypi/generatewordcloud. You can install it using pip:
$ sudo pip install generatewordcloud
3. Usage
Help:
$ generate-word-cloud.py --help
From one or two files
Generate a wordcloud from two txt files in the current directory, save it to wordcloud_txt.png.
$ generate-word-cloud.py -o ./wordcloud_txt.png ./file1.txt ./file2.txt
Generate a wordcloud from the text file hamlet.txt (~ 8000 lines), saving it to hamlet.png:
$ generate-word-cloud.py -o ./hamlet.png ./hamlet.txt
(It should work on pretty big text files without any issue.)
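For readers curious about what such a command boils down to, here is a minimal sketch built on the `WordCloud` class of the word_cloud module. It is not the script's actual code: the function names `read_texts` and `save_cloud` are hypothetical, and reading files as UTF-8 with undecodable bytes replaced is an assumption made for this example. The default values mirror the ones listed in the help message.

```python
import io


def read_texts(paths):
    """Concatenate the content of every readable file, skipping the others."""
    chunks = []
    for path in paths:
        try:
            # Assumption: files are read as UTF-8, bad bytes are replaced.
            with io.open(path, encoding="utf-8", errors="replace") as handle:
                chunks.append(handle.read())
        except (IOError, OSError) as error:
            print("Skipping {}: {}".format(path, error))
    return "\n".join(chunks)


def save_cloud(text, outfile="wordcloud.png", max_words=150, width=400, height=300):
    """Render `text` as a word cloud and write it to `outfile` as an image."""
    # Imported here so that read_texts() works without word_cloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(max_words=max_words, width=width, height=height)
    cloud.generate(text)    # count the words and lay them out
    cloud.to_file(outfile)  # write the image to disk
    return outfile
```

With these two helpers, `save_cloud(read_texts(["hamlet.txt"]), outfile="hamlet.png")` would roughly correspond to the second command above. Note that `generate` needs at least one word, so an empty text should be checked for before calling it.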
Other examples
From a lot of Python scripts (~ 200)
From a lot of Bash scripts (~ 150)
From a lot of LaTeX files (~ 180)
Meta example
Generate a wordcloud from the README.md and generate-word-cloud.py files of this very project, save it to wordcloud_meta.png!
$ generate-word-cloud.py -o ./wordcloud_meta.png ./*.md ./*.py
Features
[x] Supports one or more input file(s), and cleanly skips any file it fails to find or fails to read,
[x] Custom output file: an existing file is only overwritten after confirmation (or directly with the -f flag),
[x] Nice command line interface, powered by docopt (it was first written with argparse, but I switched to docopt after realizing how awesome it is!),
[x] Has a command line option for every important parameter (max number of words, width, height, etc.),
[x] Input filenames containing spaces (e.g., this file.txt) used to be seen as several files: fixed by the switch to docopt.
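The overwrite protection mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not the script's own code: the function name `may_write`, the wording of the question and the `ask` parameter (which defaults to Python 3's `input`) are all hypothetical.

```python
import os


def may_write(outfile, force=False, ask=input):
    """Tell whether `outfile` may be written.

    Writing is always allowed if the file does not exist yet or if
    `force` is True. Otherwise the user is asked through `ask`, and
    only an answer starting with "y" allows the overwrite.
    """
    if force or not os.path.exists(outfile):
        return True
    answer = ask("{} already exists, overwrite it? [y/N] ".format(outfile))
    return answer.strip().lower().startswith("y")
```

Passing the question function as a parameter keeps the helper easy to test, since a test can supply a fake answer instead of waiting for keyboard input.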
Complete documentation (--help)
$ generate-word-cloud.py -h | --help
Usage:
    generate-word-cloud.py [-s | --show] [-f | --force] [-o OUTFILE | --outfile=OUTFILE] [-t TITLE | --title=TITLE] [-m MAX | --max=MAX] [-w WIDTH | --width=WIDTH] [-H HEIGHT | --height=HEIGHT] INFILE...
    generate-word-cloud.py (-h | --help)
    generate-word-cloud.py (-v | --version)

Options:
    -h --help                       Show this help message and exit.
    -v --version                    Show program's version number and exit.
    -s --show                       Show the image but do not save it [default False].
    -f --force                      Force to write the image, even if present (default is to ask before overwriting an existing file) [default False].
    -o OUTFILE --outfile=OUTFILE    Filename for the generated image [default 'wordcloud.png'].
    -t TITLE --title=TITLE          Title for the image [default None].
    -m MAX --max MAX                Max number of words to display on the cloud word [default 150].
    -w WIDTH --width WIDTH          Width of the generate image [default 400].
    -H HEIGHT --height HEIGHT       Height of the generate image [default 300].
    INFILE                          A text file to read.
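To give an idea of how docopt turns a help message like the one above into parsed arguments, here is a reduced sketch. It is not the script's real usage string: the program name `cloud-sketch.py`, the shortened option list and the `parse_args` helper are made up for this example, and the defaults are written with docopt's `[default: ...]` syntax.

```python
USAGE = """Usage:
    cloud-sketch.py [options] INFILE...

Options:
    -f --force                     Overwrite the output file without asking.
    -o OUTFILE --outfile=OUTFILE   Filename for the image [default: wordcloud.png].
    -m MAX --max=MAX               Max number of words to display [default: 150].
"""


def parse_args(argv):
    """Parse `argv` (a list of strings) against USAGE and return a plain dict."""
    from docopt import docopt  # third-party module, imported when needed

    args = docopt(USAGE, argv=argv)
    return {
        "force": args["--force"],
        "outfile": args["--outfile"],
        "max_words": int(args["--max"]),  # docopt returns option values as strings
        "infiles": args["INFILE"],        # a list, because of the trailing "..."
    }
```

Since each element of `argv` is treated as one argument, a filename such as `this file.txt` stays a single entry of `infiles`, provided the shell passed it as one argument (i.e., it was quoted on the command line).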
TODO
[x] Start it, from this example,
[x] Run it on some interesting examples, embed them here (as images),
[x] Check on weird encodings (i.e., not UTF-8)? It works fine!
[x] Test it against VERY large files (millions of lines)? It works fine, slowly but fine.
[x] Test it against LOTS of files (several thousands)? It works fine, slowly but fine.
[x] Publish it on PyPI: it is available at pypi.python.org/pypi/generatewordcloud/.
[ ] Write a small article about it for my blog.
Known issues
[ ] Only tested on (X)Ubuntu (15.10), but it should work on other GNU/Linux distributions and Mac OS X (and probably Windows), provided that both docopt and word_cloud are installed.
Unknown issues?
Use the issue tracker to notify me of a bug!
About
Why write this script?
There are already a lot of good word cloud generators online, e.g. wordle.net.
I wanted a way to visualize the major keywords of Bash and Python (my two favorite programming languages) and of Markdown/Strapdown, reStructuredText and LaTeX (my favorite document typesetting systems).
The original project word_cloud seemed cool. And it is. Great job @amueller !
Clouds of words are interesting! And Python is awesome!
License
This script is published under the terms of the GPLv3 License (file LICENSE.txt), © Lilian Besson, 2016.