A simple Python script to generate a square wordcloud from one or more text files. It supports both Python 2 (2.7+) and Python 3 (3.4+).

Based on the great word_cloud module by @amueller.


How to use it?

1. Requirements

The usual module matplotlib is needed for the plotting, docopt is needed for the command line interface, and word_cloud is needed for the actual work (generating the cloud of words after reading the files).

The required Python (2 or 3) modules can be installed with pip, either directly:

# The word_cloud module is published on PyPI under the name "wordcloud":
sudo pip install matplotlib docopt wordcloud

Or with the requirements.txt file:

sudo pip install -r requirements.txt

Note: if ansicolortags is available, it will be used to print nice colors in the help and during the generation of word clouds.
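An optional dependency like this is usually handled with a guarded import. The following is a minimal sketch, not the script's actual code; it assumes that ansicolortags exposes a print-like function named printc, and falls back to plain output when the import fails.

```python
import sys

try:
    # Assumption: ansicolortags provides a print-like function called printc.
    from ansicolortags import printc
except ImportError:
    def printc(*args):
        """Fallback used when ansicolortags is missing: plain, uncolored output."""
        sys.stdout.write(" ".join(str(a) for a in args) + "\n")

printc("Generating the word cloud...")
```

Either way, the rest of the code can call printc() without caring whether colors are available.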

2. Installation

Clone the repository, then copy the script (generate-word-cloud.py) to a directory in your PATH (e.g., ~/.local/bin/).

You can also just download the script itself:

$ wget https://raw.githubusercontent.com/Naereen/generate-word-cloud.py/master/generate-word-cloud.py
$ cp generate-word-cloud.py /path/to/a/directory/in/your/PATH/

Note: The script is also available from PyPI (pypi.python.org/pypi/generatewordcloud), so you can install it using pip:

$ sudo pip install generatewordcloud

3. Usage

Help:

$ generate-word-cloud.py --help

From one or two files

Generate a wordcloud from two .txt files in the current directory, and save it to wordcloud_txt.png:

$ generate-word-cloud.py -o ./wordcloud_txt.png ./file1.txt ./file2.txt

Generate a wordcloud from the text file hamlet.txt (~ 8000 lines), and save it to hamlet.png:

$ generate-word-cloud.py -o ./hamlet.png ./hamlet.txt

(It should work on pretty big text files without any issue.)
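For readers curious about what happens behind these commands, here is a minimal sketch of the core step using the WordCloud class from the word_cloud module. It is an illustration under assumptions, not the script's actual code: the function name make_cloud and its parameter names are hypothetical, and the default values simply mirror the ones listed in the --help output.

```python
def make_cloud(text, outfile="wordcloud.png", max_words=150, width=400, height=300):
    """Render `text` as a word cloud image and save it to `outfile`."""
    # Imported inside the function so that merely loading this sketch
    # does not require the wordcloud package to be installed.
    from wordcloud import WordCloud

    cloud = WordCloud(max_words=max_words, width=width, height=height)
    cloud.generate(text)    # count the words and lay them out
    cloud.to_file(outfile)  # write the image to disk
    return outfile
```

With the package installed, calling make_cloud(text, outfile="hamlet.png") on the content of hamlet.txt would be roughly equivalent to the second command above.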


Other examples

From a lot of Python scripts (~ 200)

From a lot of Bash scripts (~ 150)

From a lot of LaTeX files (~ 180)

Meta example

Generate a wordcloud from the README.md and generate-word-cloud.py files of this very project, and save it to wordcloud_meta.png!

$ generate-word-cloud.py -o ./wordcloud_meta.png ./*.md ./*.py


Features

  • [x] Supports one or more input files, and cleanly skips any file it fails to find or fails to read,
  • [x] Custom output file, which won’t be overwritten (except with the -f flag),
  • [x] Nice command line interface, originally argparse powered. I switched to docopt after realizing how awesome it is!
  • [x] Has a command line option for every important parameter (maximum number of words, width, height, etc.),
  • [x] Input filenames containing spaces (e.g. this file.txt) were seen as several files; FIXED with the switch to docopt.
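The first feature above (skipping files that cannot be found or read) can be sketched with the standard library alone. This is a hedged illustration, not the script's actual code: the function name read_files, the UTF-8 default and the returned list of skipped paths are all assumptions made for the example.

```python
import io
import sys


def read_files(paths, encoding="utf-8"):
    """Return (text, skipped): the joined content of the readable files,
    and the list of paths that had to be skipped."""
    texts, skipped = [], []
    for path in paths:
        try:
            with io.open(path, encoding=encoding) as f:
                texts.append(f.read())
        except (IOError, OSError, UnicodeDecodeError) as e:
            # Missing, unreadable or undecodable file: warn and carry on.
            sys.stderr.write("Skipping {}: {}\n".format(path, e))
            skipped.append(path)
    return "\n".join(texts), skipped
```

Reporting the skipped paths on stderr keeps the run going while still telling the user which inputs did not contribute to the cloud.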

Complete documentation (--help)

$ generate-word-cloud.py --help
Usage:
  generate-word-cloud.py [-s | --show] [-f | --force] [-o OUTFILE | --outfile=OUTFILE]
                         [-t TITLE | --title=TITLE] [-m MAX | --max=MAX]
                         [-w WIDTH | --width=WIDTH] [-H HEIGHT | --height=HEIGHT]
                         INFILE...
  generate-word-cloud.py (-h | --help)
  generate-word-cloud.py (-v | --version)

Options:
  -h --help            Show this help message and exit.
  -v --version         Show program's version number and exit.
  -s --show            Show the image but do not save it [default False].
  -f --force           Force to write the image, even if present (default is to ask before overwriting an existing file) [default False].
  -o OUTFILE --outfile=OUTFILE
                       Filename for the generated image [default 'wordcloud.png'].
  -t TITLE --title=TITLE
                       Title for the image [default None].
  -m MAX --max MAX
                       Max number of words to display on the word cloud [default 150].
  -w WIDTH --width WIDTH
                       Width of the generated image [default 400].
  -H HEIGHT --height HEIGHT
                       Height of the generated image [default 300].
  INFILE               A text file to read.
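The behaviour described for -f/--force (ask before overwriting an existing file, unless forced) could look like the sketch below. It is only an illustration: the function name may_write, the prompt text and the injectable ask parameter are assumptions, not taken from the script.

```python
import os


def may_write(outfile, force=False, ask=input):
    """Return True if it is fine to write to `outfile`.

    `ask` is any function that takes a prompt and returns the answer
    (the built-in input on Python 3; use raw_input on Python 2).
    """
    if force or not os.path.exists(outfile):
        return True
    answer = ask("'{}' already exists, overwrite it? [y/N] ".format(outfile))
    return answer.strip().lower() in ("y", "yes")
```

Passing the prompt function as a parameter keeps the check easy to test without a terminal, e.g. may_write("wordcloud.png", ask=lambda prompt: "n").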

TODO

  • [x] Start it, from this example,
  • [x] Run it on some interesting examples, embed them here (as images),
  • [X] Check on weird encodings (i.e., not UTF-8)? It works fine!
  • [X] Test it against VERY large files (millions of lines)? It works fine, slowly but fine.
  • [X] Test it against LOTS of files (several thousand)? It works fine, slowly but fine.
  • [X] Publish it on PyPI: it is available at pypi.python.org/pypi/generatewordcloud/.
  • [ ] Write a small article about it for my blog.

Known issues

  • [ ] Only tested on (X)Ubuntu (15.10), but it should work on other GNU/Linux distributions and Mac OS X (and probably Windows), provided that both docopt and word_cloud can be installed there.

Unknown issues?

Use the issue tracker to notify me of a bug!


About

Why write this script?

There are already a lot of good word cloud generators online, e.g. wordle.net.

  1. I wanted a way to visualize the major keywords of Bash and Python (my two favorite programming languages) and of Markdown/Strapdown, reStructuredText and LaTeX (my favorite document typesetting systems),
  2. The original project word_cloud seemed cool. And it is. Great job @amueller!
  3. Clouds of words are interesting! And Python is awesome!

License

This script is published under the terms of the GPLv3 License (see the file LICENSE.txt), © Lilian Besson, 2016.

Release History

0.3

This version

0.2
