A simple Python (2 or 3) script to generate a PNG word-cloud image from a bunch of text files. Based on word_cloud.
How to use it?
The usual module matplotlib is needed for the plotting, docopt is needed for the command line interface, and word_cloud is needed for the actual work (generating the cloud of words after reading the files).
The required Python (2 or 3) modules can be installed with pip (word_cloud is published on PyPI as wordcloud), either directly:
$ sudo pip install matplotlib docopt wordcloud
Or with the requirements.txt file:
sudo pip install -r requirements.txt
Note: if ansicolortags is available, it will be used to print nice colors in the help and during the generation of word clouds.
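An optional dependency like this is usually handled with a guarded import. The snippet below is only a sketch of that pattern, not the script's actual code: `strip_tags` and `TAG_PATTERN` are hypothetical names, and the tag style (`<blue>`, `<white>`) is assumed from the ansicolortags documentation.

```python
import re

# Matches simple color tags such as <blue> or <white> (assumed tag style).
TAG_PATTERN = re.compile(r"</?[a-zA-Z]+>")


def strip_tags(text):
    """Remove color tags so the text stays readable without ansicolortags."""
    return TAG_PATTERN.sub("", text)


try:
    # Optional dependency: use it when it is installed.
    from ansicolortags import printc
except ImportError:
    def printc(text):
        """Fallback: print the text with the color tags removed."""
        print(strip_tags(text))


if __name__ == "__main__":
    printc("<blue>Generating the word cloud...<white>")
```

With this pattern the rest of the program can call `printc` everywhere without checking whether the module is installed.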
Clone the repository, then copy the script (generate-word-cloud.py) to a directory in your PATH (e.g., ~/.local/bin/).
You can also just download the script itself:
$ wget https://raw.githubusercontent.com/Naereen/generate-word-cloud.py/master/generate-word-cloud.py
$ cp generate-word-cloud.py /path/to/a/directory/in/your/PATH/
Or install it from PyPI:
$ sudo pip install generatewordcloud
Then check that it runs:
$ generate-word-cloud.py --help
From one or two files
Generate a word cloud from two text files in the current directory, and save it to wordcloud_txt.png:
$ generate-word-cloud.py -o ./wordcloud_txt.png ./file1.txt ./file2.txt
Generate a word cloud from the text file hamlet.txt (~ 8000 lines), saving it to hamlet.png:
$ generate-word-cloud.py -o ./hamlet.png ./hamlet.txt
(It should work on pretty big text files without any issue.)
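For readers curious about what such a command does, here is a minimal Python 3 sketch of the pipeline, assuming the wordcloud package is installed. It is an illustration and not the script's actual code: `read_texts` and `make_cloud` are hypothetical names, and the default values are taken from the `--help` text. `WordCloud(...)`, `generate` and `to_file` are the wordcloud calls used.

```python
import io


def read_texts(paths):
    """Return the concatenated content of every readable file in `paths`."""
    chunks = []
    for path in paths:
        try:
            # errors="replace" keeps going on bytes that are not valid UTF-8.
            with io.open(path, encoding="utf-8", errors="replace") as handle:
                chunks.append(handle.read())
        except (IOError, OSError) as error:
            # Missing or unreadable files are reported and skipped.
            print("Skipping {}: {}".format(path, error))
    return "\n".join(chunks)


def make_cloud(text, outfile, max_words=150, width=400, height=300):
    """Render `text` as a word cloud and save it as an image file."""
    # Imported here so that read_texts stays usable without wordcloud.
    from wordcloud import WordCloud
    cloud = WordCloud(max_words=max_words, width=width, height=height)
    cloud.generate(text)
    cloud.to_file(outfile)


if __name__ == "__main__":
    # Needs at least one readable, non-empty input file.
    text = read_texts(["./file1.txt", "./file2.txt"])
    make_cloud(text, "./wordcloud_txt.png")
```

Reading everything into one string first keeps the sketch short; it also means memory use grows with the total size of the input files.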
From a lot of Python scripts (~ 200)
From a lot of Bash scripts (~ 150)
From a lot of LaTeX files (~ 180)
- [x] Supports one or more input files, and cleanly skips any file it cannot find or read,
- [x] Custom output file, which won’t be overwritten (except with the -f flag),
- [x] Nice command line interface, first written with argparse, then switched to docopt after realizing how awesome it is!
- [x] Has a command line option for every important parameter (maximum number of words, width, height, etc.).
- [x] Input filenames containing spaces (e.g., this file.txt) used to be seen as several files; fixed by the switch to docopt.
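The overwrite protection listed above can be sketched in a few lines. This is only an assumption about how such a check might look, following the `--force` description in the help text (ask before overwriting an existing file); `may_write` and its `ask` parameter are hypothetical names, and the snippet targets Python 3.

```python
import os


def may_write(outfile, force=False, ask=input):
    """Return True if it is fine to write the image to `outfile`."""
    # A new file, or an explicit --force, never needs a confirmation.
    if force or not os.path.exists(outfile):
        return True
    answer = ask("{} already exists, overwrite it? [y/N] ".format(outfile))
    return answer.strip().lower() in ("y", "yes")
```

Passing the prompt function as a parameter keeps the check easy to test without a terminal.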
Complete documentation (--help)
$ generate-word-cloud.py -h | --help
Usage:
    generate-word-cloud.py [-s | --show] [-f | --force] [-o OUTFILE | --outfile=OUTFILE] [-t TITLE | --title=TITLE] [-m MAX | --max=MAX] [-w WIDTH | --width=WIDTH] [-H HEIGHT | --height=HEIGHT] INFILE...
    generate-word-cloud.py (-h | --help)
    generate-word-cloud.py (-v | --version)

Options:
    -h --help                     Show this help message and exit.
    -v --version                  Show program's version number and exit.
    -s --show                     Show the image but do not save it [default False].
    -f --force                    Force to write the image, even if present (default is to ask before overwriting an existing file) [default False].
    -o OUTFILE --outfile=OUTFILE  Filename for the generated image [default 'wordcloud.png'].
    -t TITLE --title=TITLE        Title for the image [default None].
    -m MAX --max MAX              Max number of words to display on the cloud word [default 150].
    -w WIDTH --width WIDTH        Width of the generate image [default 400].
    -H HEIGHT --height HEIGHT     Height of the generate image [default 300].
    INFILE                        A text file to read.
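As an illustration of how docopt turns such a help text into parsed arguments, here is a reduced sketch with only three of the options. It is not the script's actual code: `USAGE`, `parse_args` and the program name `sketch.py` are hypothetical. Note that docopt only reads a default value when it is written as `[default: value]`, with a colon, so the sketch uses that form.

```python
USAGE = """
Usage:
    sketch.py [options] INFILE...

Options:
    -f --force                    Overwrite the output file without asking.
    -o OUTFILE --outfile=OUTFILE  Filename for the image [default: wordcloud.png].
    -m MAX --max=MAX              Max number of words [default: 150].
"""


def parse_args(argv):
    """Parse `argv` (a list of strings) against USAGE and return a dict."""
    # Imported here so the module loads even where docopt is not installed.
    from docopt import docopt
    args = docopt(USAGE, argv=argv)
    # docopt returns option values as strings, so numbers need a conversion.
    args["--max"] = int(args["--max"])
    return args


if __name__ == "__main__":
    import sys
    print(parse_args(sys.argv[1:]))
```

Called as `parse_args(["-m", "50", "this file.txt"])`, it returns a dictionary with the keys `--force`, `--outfile`, `--max` and `INFILE`, where `INFILE` is a list holding the single name `this file.txt`.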
- [x] Start it, from this example,
- [x] Run it on some interesting examples, embed them here (as images),
- [x] Check on weird encodings (i.e., not UTF-8)? It works fine!
- [x] Test it against VERY large files (millions of lines)? It works fine, slowly but fine.
- [x] Test it against LOTS of files (several thousand)? It works fine, slowly but fine.
- [x] Publish it on PyPI: it is available at pypi.python.org/pypi/generatewordcloud/.
- [ ] Write a small article about it for my blog.
Use the issue tracker to notify me of a bug!
Why write this script?
- I wanted a way to visualize the major keywords of Bash and Python (my two favorite programming languages) and of Markdown/Strapdown, reStructuredText and LaTeX (my favorite document typesetting systems),
- The original project word_cloud seemed cool. And it is. Great job @amueller!
- Clouds of words are interesting! And Python is awesome!