Value-proportional word cloud generator with true size relationships

TrueWordCloud

Value-Proportional Word Cloud Generator

A word cloud generator that maintains TRUE proportional relationships between values. Unlike traditional word clouds that arbitrarily resize words to fit a canvas, TrueWordCloud ensures font sizes are ALWAYS proportional to the input values.


v1.2.0 Update:

  • Refactored for clarity: all parameters are set in the constructor (__init__), with a unified naming scheme.
  • Redundant parameters removed, API simplified.
  • Documentation and examples updated for consistency.

Key Features

  • True Proportionality - Font sizes strictly proportional to input values (no squeezing/normalization)
  • 🎨 Three Layout Algorithms - Choose between 'greedy' (fast, deterministic), 'square' (compact, randomized), and 'distance_transform' (compact packing using distance transform)
  • 🖼️ Mask Support - Use custom mask images to constrain word placement (black=allowed, white=forbidden)
  • 🌈 Color Masks - Use colored masks to assign word colors from an image
  • 🖋️ Mask Outline - Optionally overlay the mask outline on the generated word cloud
  • 📐 Dynamic Canvas - Canvas size determined by content, not pre-fixed dimensions
  • 🔢 Any Numeric Values - Works with frequencies, keyness scores, TF-IDF, probabilities, etc.
  • 🎯 No Overlaps - Guaranteed non-overlapping word placement
  • 🌈 Custom Colors - Flexible color function support
  • 📊 Detailed Statistics - Use generate_with_stats() to get placement and layout stats

Installation

pip install truewordcloud

Or install from source:

git clone https://github.com/laurenceanthony/truewordcloud.git
cd truewordcloud
pip install -e .

Quick Start

from truewordcloud import TrueWordCloud

# Simple usage
values = {'python': 100, 'data': 80, 'science': 75, 'visualization': 60}
twc = TrueWordCloud(values=values)
image = twc.generate()
image.save('wordcloud.png')

Visual Overview

The examples below are generated by examples.py and saved in examples/. Because TrueWordCloud preserves TRUE proportional font sizes, the image canvas expands as needed to fit every word without rescaling. As a result, distance_transform outputs can sometimes appear slightly larger than greedy and square, since the latter two methods can often pack words more densely.

[Images, side by side: Greedy basic layout, Square basic layout, Distance Transform basic layout]

Layout Algorithms

Greedy Spiral (method='greedy')

Best for: Speed, reproducibility, circular aesthetics

  • ⚡ Fast spiral placement from center outward
  • 🔒 Deterministic (same input → same output)
  • 🎯 Creates radial/circular patterns
  • ✅ Ideal for scientific papers, reports, consistent branding

twc = TrueWordCloud(values=values, method='greedy')
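
The center-outward spiral idea described above can be sketched without the library. This is an illustration only and not TrueWordCloud's actual implementation: it assumes words are reduced to (width, height) boxes, tried largest-first along an Archimedean spiral, and the helper names `overlaps` and `spiral_place` are hypothetical.

```python
import math


def overlaps(a, b, margin=2):
    """True if boxes a and b (x0, y0, x1, y1) come within `margin` pixels."""
    return not (a[2] + margin <= b[0] or b[2] + margin <= a[0] or
                a[3] + margin <= b[1] or b[3] + margin <= a[1])


def spiral_place(sizes, margin=2, step=0.1, growth=1.0):
    """Place (width, height) boxes along an Archimedean spiral from the origin.

    Boxes are tried largest-first; each takes the first collision-free spot.
    No randomness is involved, so the same input gives the same layout.
    """
    placed = []
    for w, h in sorted(sizes, key=lambda s: s[0] * s[1], reverse=True):
        theta = 0.0
        while True:
            r = growth * theta  # radius grows with the angle
            cx, cy = r * math.cos(theta), r * math.sin(theta)
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            if not any(overlaps(box, other, margin) for other in placed):
                placed.append(box)
                break
            theta += step
    return placed


boxes = spiral_place([(200, 60), (120, 40), (90, 30), (60, 20)])
for box in boxes:
    print(tuple(round(v, 1) for v in box))
```

Because the box sizes are never changed during placement, only the overall extent of the layout grows, which is consistent with the dynamic canvas described earlier.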

Greedy layout example

Square Packing (method='square')

Best for: Compact layouts, gap filling, visual variety

  • 📦 Center-outward square packing with intelligent gap filling
  • 🎲 Randomized (varied layouts each run)
  • 📐 Maintains roughly square aspect ratio (width ≈ height)
  • ✅ Ideal for presentations, posters, artistic displays

twc = TrueWordCloud(values=values, method='square')

Square layout example

Distance Transform Packing (method='distance_transform')

Best for: Most compact, mask-constrained layouts

  • 🧲 Uses distance transform to pack words tightly
  • 🖼️ Works especially well with masks
  • 🧩 Fills gaps more efficiently than other methods
  • ✅ Ideal for artistic, shape-constrained, or dense word clouds

twc = TrueWordCloud(values=values, method='distance_transform')
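
To give a feel for what a distance transform contributes to packing, here is a dependency-free sketch. It is not the library's code: it assumes an occupancy grid, uses a breadth-first Manhattan distance as a simplified stand-in for a Euclidean distance transform, and the names `distance_to_occupied` and `roomiest_cell` are hypothetical.

```python
from collections import deque


def distance_to_occupied(grid):
    """Return the Manhattan distance from every cell to the nearest occupied cell.

    grid is a list of rows; truthy cells are occupied.
    Assumes at least one cell is occupied.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search outward from all occupied cells.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def roomiest_cell(grid):
    """Pick the (row, col) of the cell farthest from anything already placed."""
    dist = distance_to_occupied(grid)
    cells = [(dist[r][c], r, c)
             for r in range(len(grid)) for c in range(len(grid[0]))]
    return max(cells)[1:]


grid = [[0] * 5 for _ in range(5)]
grid[0][0] = 1  # one occupied cell in the top-left corner
print(roomiest_cell(grid))
```

The distance value at a cell bounds how large a box can be centered there without touching existing content, which is why this kind of map is useful for filling gaps.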

Distance transform layout example

Mask Support

You can constrain word placement to a custom shape using a mask image (black=allowed, white=forbidden):

[Image: heart mask asset]

from PIL import Image
mask_img = Image.open('mask.png').convert('L')
twc = TrueWordCloud(values=values, method='greedy', mask=mask_img)
image = twc.generate()
image.save('masked_wordcloud.png')
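
The black=allowed, white=forbidden convention can be illustrated on a toy grid. This sketch is not taken from the library: the threshold of 128 and the helper names `allowed_grid` and `box_fits` are assumptions made for illustration.

```python
def allowed_grid(pixels, threshold=128):
    """Convert grayscale rows (0-255) to booleans: dark pixels are allowed."""
    return [[value < threshold for value in row] for row in pixels]


def box_fits(allowed, left, top, width, height):
    """True if every pixel of the box lies inside the allowed region."""
    if (left < 0 or top < 0
            or top + height > len(allowed)
            or left + width > len(allowed[0])):
        return False  # box sticks out of the mask entirely
    return all(allowed[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))


# 4x6 toy mask: a black 2x4 rectangle (allowed) on a white background.
pixels = [
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0,   0,   0, 255],
    [255,   0,   0,   0,   0, 255],
    [255, 255, 255, 255, 255, 255],
]
allowed = allowed_grid(pixels)
print(box_fits(allowed, 1, 1, 4, 2))  # box covering exactly the black area
print(box_fits(allowed, 0, 0, 2, 2))  # box touching the white border
```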

Layout comparison with the same heart mask:

[Images, side by side: Greedy heart mask layout, Square heart mask layout, Distance Transform heart mask layout]

Mask Outline

To overlay the mask outline on the word cloud:

twc = TrueWordCloud(values=values, method='greedy', mask=mask_img, show_mask_outline=True, mask_outline_color='#00AAFF', mask_outline_width=2)
image = twc.generate()
image.save('masked_wordcloud_with_outline.png')

Color Masks

You can use a colored mask to assign word colors from an image:

[Image: colored heart mask asset]

color_mask_img = Image.open('color_mask.png')
twc = TrueWordCloud(values=values, method='greedy', mask=color_mask_img, use_mask_colors=True, mask_shape_transparency=True)
image = twc.generate()
image.save('color_masked_wordcloud.png')

Color-mask layout comparison:

[Images, side by side: Greedy colored heart mask layout, Square colored heart mask layout, Distance Transform colored heart mask layout]

Advanced Usage

Custom Colors

def color_func(word, freq, norm_freq):
    # norm_freq is between 0 and 1
    if norm_freq > 0.7:
        return (255, 0, 0)  # Red for high frequency
    elif norm_freq > 0.4:
        return (0, 0, 255)  # Blue for medium
    else:
        return (128, 128, 128)  # Gray for low

twc = TrueWordCloud(values=values, color_func=color_func)
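
As an alternative to fixed thresholds, a color function can blend smoothly between two colors. The sketch below relies only on the (word, freq, norm_freq) signature shown above; the factory name `make_gradient` is hypothetical, and the clamp is a defensive assumption in case norm_freq ever falls outside 0 to 1.

```python
def make_gradient(low_rgb, high_rgb):
    """Build a color_func that blends from low_rgb to high_rgb by norm_freq."""
    def color_func(word, freq, norm_freq):
        t = min(max(norm_freq, 0.0), 1.0)  # clamp defensively to [0, 1]
        return tuple(round(lo + (hi - lo) * t)
                     for lo, hi in zip(low_rgb, high_rgb))
    return color_func


# Gray for the lowest values, red for the highest.
gray_to_red = make_gradient((128, 128, 128), (255, 0, 0))
print(gray_to_red("example", 10, 0.0))
print(gray_to_red("example", 100, 1.0))
```

The resulting function can be passed as color_func=gray_to_red in the constructor, in the same way as the threshold-based example.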

All Parameters

twc = TrueWordCloud(
    values={'word': 100, 'cloud': 50},  # Required: word -> value mapping
    method='greedy',                    # 'greedy', 'square', or 'distance_transform'
    margin=2,                           # Pixels between words
    angle_divisor=3.0,                  # Angle divisor for spiral layout
    max_attempts=20,                    # Max mask scaling attempts
    scale_factor=1.2,                   # Mask scaling factor
    seed=None,                          # Random seed
    base_font_size=100,                 # Font size for max value word
    font_path='/path/to/font.ttf',      # Custom font (auto-detected if None)
    min_font_size=10,                   # Minimum font size
    background_color=(255, 255, 255),   # RGB tuple
    color_func=None,                    # Custom color function
    mask=None,                          # Mask image (PIL Image)
    use_mask_colors=False,              # Use colors from mask image
    mask_shape_transparency=False,      # True for transparent mask, False for white background
    show_mask_outline=False,            # Overlay mask outline
    mask_outline_color=(0, 0, 0),       # Outline color
    mask_outline_width=1,               # Outline width
)

# Generate with statistics
image, stats = twc.generate_with_stats()
print(stats)  # {'num_words': 2, 'size_range': (50, 100), 'canvas_size': (800, 600), 'method': 'greedy', ...}

Comparison with Traditional Word Clouds

Feature         | TrueWordCloud                | Traditional Word Clouds
Proportionality | ✅ Strict (font_size ∝ value) | ❌ Arbitrary resizing to fit
Canvas Size     | Dynamic (fits content)       | Fixed (pre-defined)
Reproducibility | ✅ Greedy method              | Sometimes
Layout Options  | 3 algorithms + mask          | Usually 1
Value Types     | Any numeric                  | Usually just frequencies
Mask Support    | ✅ Yes                        | Sometimes
Color Masks     | ✅ Yes                        | Rare

Why True Proportionality Matters

Traditional word clouds often lie about the data:

  • A word with value 100 might be rendered at 80pt
  • A word with value 50 might be rendered at 75pt
  • Ratios like 2:1 become 1.07:1

TrueWordCloud guarantees:

  • With base_font_size=100 and a maximum value of 100: Value 100 → 100pt, Value 50 → 50pt
  • Ratios are preserved: 2:1 stays 2:1
  • Visual size accurately represents data magnitude
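
The arithmetic behind this guarantee can be sketched in a few lines. This is an illustration rather than the library's code: it assumes the largest value is rendered at base_font_size and every other word scales linearly, as the parameter comment "Font size for max value word" suggests. The helper name `proportional_sizes` is hypothetical.

```python
def proportional_sizes(values, base_font_size=100):
    """Map each word to a font size that is linear in its value.

    Assumption: the word with the largest value gets base_font_size.
    """
    max_value = max(values.values())
    return {word: base_font_size * v / max_value for word, v in values.items()}


sizes = proportional_sizes({"python": 100, "data": 50})
ratio = sizes["python"] / sizes["data"]
print(sizes, ratio)

# The "traditional" example above: 80pt versus 75pt for the same two values.
squeezed_ratio = 80 / 75
print(round(squeezed_ratio, 2))
```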

Use Cases

  • Linguistic Analysis - Word frequencies, keyness scores, TF-IDF
  • Survey Results - Response counts, satisfaction scores
  • Scientific Papers - Maintaining accurate proportional relationships
  • Marketing - Brand mentions, sentiment scores
  • Education - Concept importance, study time allocation
  • Artistic/Shape Clouds - Custom shapes, logos, or images as masks

Requirements

  • Python 3.7+
  • Pillow (PIL)
  • numpy
  • scipy

License

MIT License - see LICENSE file for details

Contributing

Contributions welcome! Please open an issue or submit a pull request.

Citation

If you use TrueWordCloud in academic work, please cite:

@software{truewordcloud2026,
  title={TrueWordCloud: Value-Proportional Word Cloud Generator},
  author={Laurence Anthony},
  year={2026},
  url={https://github.com/laurenceanthony/truewordcloud}
}

Examples

Frequency Data

word_frequencies = {
    'the': 1000, 'Python': 500, 'data': 400, 'analysis': 300,
    'machine': 250, 'learning': 250, 'algorithm': 200
}
twc = TrueWordCloud(values=word_frequencies, method='greedy')
twc.generate().save('frequencies.png')

Frequency data example

Keyness Scores

keyness_scores = {
    'significant': 12.5, 'analysis': 8.3, 'corpus': 6.7,
    'frequency': 5.2, 'text': 4.1
}
twc = TrueWordCloud(values=keyness_scores, method='square', base_font_size=50)
twc.generate().save('keyness.png')

Keyness example

With Custom Styling

from PIL import ImageColor

def rainbow_color(word, freq, norm_freq):
    # Rainbow gradient based on frequency
    hue = int(norm_freq * 270)  # 0 (red) to 270 (violet); pure blue is 240
    return ImageColor.getrgb(f'hsl({hue}, 100%, 50%)')

twc = TrueWordCloud(
    values=word_frequencies,
    method='square',
    color_func=rainbow_color,
    background_color=(0, 0, 0),  # Black background
    margin=5
)
twc.generate().save('rainbow.png')

Custom color example

With Mask and Mask Outline

from PIL import Image
mask_img = Image.open('mask_heart.png').convert('L')
twc = TrueWordCloud(values=word_frequencies, method='distance_transform', mask=mask_img, show_mask_outline=True, mask_outline_color='#00AAFF', mask_outline_width=2)
image = twc.generate()
image.save('heart_mask_wordcloud.png')

Mask outline example

With Color Mask

color_mask_img = Image.open('mask_heart_color.png')
twc = TrueWordCloud(values=word_frequencies, method='greedy', mask=color_mask_img, mask_shape_transparency=True, use_mask_colors=True)
image = twc.generate()
image.save('color_mask_wordcloud.png')

Color mask example

FAQ

Q: Why are the layouts different sizes?
A: Canvas size is determined by content. More words or larger rendered font sizes produce a larger canvas, because words are never shrunk to fit. This maintains true proportions.

Q: Can I fix the canvas size?
A: Not directly, as that would require resizing words (breaking true proportionality). Instead, adjust base_font_size to control overall scale.

Q: Which method should I use?
A: Use greedy for speed and reproducibility. Use square for compact layouts and visual variety. Use distance_transform for the most compact, mask-constrained layouts.

Q: How do I make words fit in a specific area?
A: Reduce base_font_size until the generated canvas is the desired size.
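
The "reduce until it fits" advice can be automated with a small search loop. The sketch below is not part of the library: `fit_base_font_size` is a hypothetical helper, and it takes any callable that returns a (width, height) pair so the search logic can be shown without rendering anything.

```python
def fit_base_font_size(measure, max_width, max_height,
                       start=100, minimum=10, step=5):
    """Lower the base font size until measure(size) fits the target box.

    measure is any callable returning (width, height) for a base font size.
    Returns the first size that fits, or None if even `minimum` is too large.
    """
    size = start
    while size >= minimum:
        width, height = measure(size)
        if width <= max_width and height <= max_height:
            return size
        size -= step
    return None


# Stand-in for a real render: pretend the canvas grows linearly with font size.
def fake_measure(size):
    return (size * 8, size * 6)


best = fit_base_font_size(fake_measure, max_width=640, max_height=480)
print(best)
```

With the library, measure could be a function that builds TrueWordCloud(values=values, base_font_size=size), calls generate(), and returns the image's size attribute. That assumes generate() returns a Pillow image, which the save() calls in this README suggest but do not state outright.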

Q: How do I use a mask or color mask?
A: See the Mask Support and Color Masks sections above for examples.


Made with ❤️ for accurate data visualization
