Value-proportional word cloud generator with true size relationships

TrueWordCloud

Value-Proportional Word Cloud Generator

A word cloud generator that maintains TRUE proportional relationships between values. Unlike traditional word clouds that arbitrarily resize words to fit a canvas, TrueWordCloud ensures font sizes are ALWAYS proportional to the input values.

Key Features

  • True Proportionality - Font sizes strictly proportional to input values (no squeezing/normalization)
  • 🎨 Three Layout Algorithms - Choose between 'greedy' (fast, deterministic), 'square' (compact, randomized), and 'distance_transform' (compact packing using distance transform)
  • 🖼️ Mask Support - Use custom mask images to constrain word placement (black=allowed, white=forbidden)
  • 🌈 Color Masks - Use colored masks to assign word colors from an image
  • 🖋️ Mask Outline - Optionally overlay the mask outline on the generated word cloud
  • 📐 Dynamic Canvas - Canvas size determined by content, not pre-fixed dimensions
  • 🔢 Any Numeric Values - Works with frequencies, keyness scores, TF-IDF, probabilities, etc.
  • 🎯 No Overlaps - Guaranteed non-overlapping word placement
  • 🌈 Custom Colors - Flexible color function support
  • 📊 Detailed Statistics - Use generate_with_stats() to get placement and layout stats

Installation

pip install truewordcloud

Or install from source:

git clone https://github.com/laurenceanthony/truewordcloud.git
cd truewordcloud
pip install -e .

Quick Start

from truewordcloud import TrueWordCloud

# Simple usage
values = {'python': 100, 'data': 80, 'science': 75, 'visualization': 60}
twc = TrueWordCloud(values=values)
image = twc.generate()
image.save('wordcloud.png')

Layout Algorithms

Greedy Spiral (method='greedy')

Best for: Speed, reproducibility, circular aesthetics

  • ⚡ Fast spiral placement from center outward
  • 🔒 Deterministic (same input → same output)
  • 🎯 Creates radial/circular patterns
  • ✅ Ideal for scientific papers, reports, consistent branding

twc = TrueWordCloud(values=values, method='greedy')
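
To illustrate the general idea of deterministic center-outward spiral placement, here is a minimal sketch. It is not TrueWordCloud's actual implementation: the helper names (`overlaps`, `spiral_place`), the Archimedean spiral, and the `step` and `growth` parameters are all assumptions made for illustration. Each box is walked outward along the spiral until it no longer collides with any box placed earlier.

```python
import math


def overlaps(a, b, margin=0):
    """Return True if boxes a and b (x0, y0, x1, y1) are closer than margin."""
    return not (
        a[2] + margin <= b[0] or b[2] + margin <= a[0]
        or a[3] + margin <= b[1] or b[3] + margin <= a[1]
    )


def spiral_place(sizes, margin=2, step=0.1, growth=1.0):
    """Place (width, height) boxes along an Archimedean spiral from the origin.

    Boxes are tried in the order given; each takes the first spiral position
    where it does not collide with any previously placed box.
    """
    placed = []
    for width, height in sizes:
        theta = 0.0
        while True:
            radius = growth * theta
            cx, cy = radius * math.cos(theta), radius * math.sin(theta)
            box = (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
            if not any(overlaps(box, other, margin) for other in placed):
                placed.append(box)
                break
            theta += step  # move a little further out along the spiral
    return placed


# Hypothetical word bounding boxes, largest first.
boxes = spiral_place([(120, 40), (80, 30), (80, 30), (50, 20)])
print(boxes[0])  # (-60.0, -20.0, 60.0, 20.0): the first box sits at the center
```

Because nothing in the loop is random, the same list of sizes always yields the same positions, which is the property the greedy method advertises.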

Square Packing (method='square')

Best for: Compact layouts, gap filling, visual variety

  • 📦 Center-outward square packing with intelligent gap filling
  • 🎲 Randomized (varied layouts each run)
  • 📐 Maintains roughly square aspect ratio (width ≈ height)
  • ✅ Ideal for presentations, posters, artistic displays

twc = TrueWordCloud(values=values, method='square')

Distance Transform Packing (method='distance_transform')

Best for: Most compact, mask-constrained layouts

  • 🧲 Uses distance transform to pack words tightly
  • 🖼️ Works especially well with masks
  • 🧩 Fills gaps more efficiently than other methods
  • ✅ Ideal for artistic, shape-constrained, or dense word clouds

twc = TrueWordCloud(values=values, method='distance_transform')
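
For readers unfamiliar with distance transforms, the sketch below shows the underlying idea on a small grid: compute, for every cell, the distance to the nearest occupied cell, then pick the cell with the most room around it. This is a pure-Python, city-block (Manhattan) variant written for illustration only. How TrueWordCloud actually uses the transform is not documented here, and the function names are hypothetical. In practice, SciPy's `scipy.ndimage.distance_transform_edt` computes a Euclidean version of the same quantity.

```python
def manhattan_distance_transform(occupied):
    """Two-pass city-block distance from each cell to the nearest occupied cell."""
    rows, cols = len(occupied), len(occupied[0])
    inf = rows + cols  # larger than any possible in-grid distance
    dist = [[0 if occupied[r][c] else inf for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top-left corner.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom-right corner.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def roomiest_cell(occupied):
    """Return the (row, col) that is farthest from every occupied cell."""
    dist = manhattan_distance_transform(occupied)
    cells = ((r, c) for r in range(len(dist)) for c in range(len(dist[0])))
    return max(cells, key=lambda rc: dist[rc[0]][rc[1]])


# A 5x5 grid whose border is occupied (for example, by a mask edge).
grid = [[r in (0, 4) or c in (0, 4) for c in range(5)] for r in range(5)]
dist = manhattan_distance_transform(grid)
print(dist[2][2])           # 2
print(roomiest_cell(grid))  # (2, 2)
```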

Mask Support

You can constrain word placement to a custom shape using a mask image (black=allowed, white=forbidden):

from PIL import Image
mask_img = Image.open('mask.png').convert('L')
twc = TrueWordCloud(values=values, method='greedy')
image = twc.generate(mask=mask_img)
image.save('masked_wordcloud.png')
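
As a rough illustration of the black=allowed, white=forbidden convention, the following sketch turns a grayscale grid into a grid of allowed cells. It uses plain Python lists instead of a Pillow image so it runs on its own. The helper names and the threshold of 128 are assumptions, not values taken from the library.

```python
def circle_mask(size):
    """Grayscale grid: 0 (black) inside a centered circle, 255 (white) outside."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


def allowed_cells(mask, threshold=128):
    """True where the mask is dark enough to allow a word (threshold is assumed)."""
    return [[pixel < threshold for pixel in row] for row in mask]


mask = circle_mask(9)
allowed = allowed_cells(mask)
print(allowed[4][4])  # True: the center is black, so words may go there
print(allowed[0][0])  # False: the corner is white, so it is forbidden
```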

Mask Outline

To overlay the mask outline on the word cloud:

image = twc.generate(mask=mask_img, mask_outline=True, mask_outline_color='#00AAFF', mask_outline_width=2)
image.save('masked_wordcloud_with_outline.png')

Color Masks

You can use a colored mask to assign word colors from an image:

color_mask_img = Image.open('color_mask.png')
twc = TrueWordCloud(values=values, method='greedy', use_mask_colors=True, mask_shape_mode='colors')
image = twc.generate(mask=color_mask_img)
image.save('color_masked_wordcloud.png')
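
One plausible way to derive a word's color from a color mask is to average the mask pixels under the word's bounding box. The sketch below shows that idea on a tiny hand-built pixel grid. Whether TrueWordCloud averages, samples the center pixel, or does something else is not stated here, so treat `average_color` as a hypothetical helper.

```python
def average_color(pixels, box):
    """Average the RGB pixels inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    region = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    count = len(region)
    return tuple(sum(p[i] for p in region) // count for i in range(3))


# A 4x2 "image": left half red, right half blue.
red, blue = (255, 0, 0), (0, 0, 255)
pixels = [[red, red, blue, blue], [red, red, blue, blue]]

print(average_color(pixels, (0, 0, 2, 2)))  # (255, 0, 0)
print(average_color(pixels, (1, 0, 3, 2)))  # (127, 0, 127)
```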

Advanced Usage

Custom Colors

def color_func(word, freq, norm_freq):
    # norm_freq is between 0 and 1
    if norm_freq > 0.7:
        return (255, 0, 0)  # Red for high frequency
    elif norm_freq > 0.4:
        return (0, 0, 255)  # Blue for medium
    else:
        return (128, 128, 128)  # Gray for low

twc = TrueWordCloud(values=values, color_func=color_func)
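
A color function does not have to use fixed thresholds. As a sketch, the variant below keeps the same (word, freq, norm_freq) signature but blends continuously from blue to red using the standard-library colorsys module. The name `gradient_color` and the choice of hue range are illustrative.

```python
import colorsys


def gradient_color(word, freq, norm_freq):
    """Blend from blue (low) to red (high) as norm_freq goes from 0 to 1."""
    hue = (1.0 - norm_freq) * (240 / 360)  # 240 degrees is blue, 0 degrees is red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


print(gradient_color("python", 100, 1.0))  # (255, 0, 0)
print(gradient_color("data", 0, 0.0))      # (0, 0, 255)
```

It can be passed in the same way as the threshold version, for example color_func=gradient_color.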

All Parameters

twc = TrueWordCloud(
    values={'word': 100, 'cloud': 50},  # Required: word -> value mapping
    method='greedy',                     # 'greedy', 'square', or 'distance_transform'
    base_font_size=100,                  # Font size for max value word
    font_path='/path/to/font.ttf',       # Custom font (auto-detected if None)
    min_font_size=10,                    # Minimum font size
    background_color=(255, 255, 255),    # RGB tuple
    margin=2,                            # Pixels between words
    color_func=None,                     # Custom color function
    use_mask_colors=False,               # Use colors from mask image
    mask_shape_mode='no-colors'          # 'no-colors' or 'colors'
)

# Generate with statistics
image, stats = twc.generate_with_stats(mask=mask_img)
print(stats)  # {'num_words': 2, 'size_range': (50, 100), 'canvas_size': (800, 600), 'method': 'greedy', ...}

Comparison with Traditional Word Clouds

| Feature | TrueWordCloud | Traditional Word Clouds |
|---|---|---|
| Proportionality | ✅ Strict (font_size ∝ value) | ❌ Arbitrary resizing to fit |
| Canvas Size | Dynamic (fits content) | Fixed (pre-defined) |
| Reproducibility | ✅ Greedy method | Sometimes |
| Layout Options | 3 algorithms + mask | Usually 1 |
| Value Types | Any numeric | Usually just frequencies |
| Mask Support | ✅ Yes | Sometimes |
| Color Masks | ✅ Yes | Rare |

Why True Proportionality Matters

Traditional word clouds often lie about the data:

  • A word with value 100 might be rendered at 80pt
  • A word with value 50 might be rendered at 75pt
  • Ratios like 2:1 become 1.07:1

TrueWordCloud guarantees:

  • Value 100 → 100pt, Value 50 → 50pt (with base_font_size=100 and a maximum value of 100)
  • Ratios are preserved: 2:1 stays 2:1
  • Visual size accurately represents data magnitude
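
The arithmetic behind these claims can be sketched in a few lines. The scaling rule below (the largest value gets base_font_size, everything else is scaled by value / max_value) is inferred from the parameter description above rather than copied from the library, and it ignores min_font_size. `proportional_font_sizes` is a hypothetical helper.

```python
def proportional_font_sizes(values, base_font_size=100):
    """Map each word to a font size proportional to its value.

    The largest value gets base_font_size; every other word is scaled by
    value / max_value, so ratios between values are preserved.
    """
    max_value = max(values.values())
    return {word: base_font_size * value / max_value for word, value in values.items()}


values = {"python": 100, "data": 50}
sizes = proportional_font_sizes(values)
print(sizes)                            # {'python': 100.0, 'data': 50.0}
print(sizes["python"] / sizes["data"])  # 2.0

# The distorted rendering described above: 80pt versus 75pt.
print(round(80 / 75, 2))                # 1.07
```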

Use Cases

  • Linguistic Analysis - Word frequencies, keyness scores, TF-IDF
  • Survey Results - Response counts, satisfaction scores
  • Scientific Papers - Maintaining accurate proportional relationships
  • Marketing - Brand mentions, sentiment scores
  • Education - Concept importance, study time allocation
  • Artistic/Shape Clouds - Custom shapes, logos, or images as masks

Requirements

  • Python 3.7+
  • Pillow (PIL)
  • numpy
  • scipy

License

MIT License - see LICENSE file for details

Contributing

Contributions welcome! Please open an issue or submit a pull request.

Citation

If you use TrueWordCloud in academic work, please cite:

@software{truewordcloud2026,
  title={TrueWordCloud: Value-Proportional Word Cloud Generator},
  author={Laurence Anthony},
  year={2026},
  url={https://github.com/laurenceanthony/truewordcloud}
}

Examples

Frequency Data

word_frequencies = {
    'the': 1000, 'Python': 500, 'data': 400, 'analysis': 300,
    'machine': 250, 'learning': 250, 'algorithm': 200
}
twc = TrueWordCloud(values=word_frequencies, method='greedy')
twc.generate().save('frequencies.png')
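
If you are starting from raw text rather than a ready-made dictionary, a frequency mapping like the one above can be built with the standard library. This is a minimal sketch: the tokenizer is a simple regular expression, and `word_counts` is a hypothetical helper, not part of TrueWordCloud.

```python
import re
from collections import Counter


def word_counts(text, top_n=None):
    """Count lowercase word tokens; optionally keep only the top_n most common."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(tokens).most_common(top_n))


text = "Data beats opinion. Data, data, and more data: clean data wins."
values = word_counts(text, top_n=3)
print(values["data"])  # 5
```

The resulting dictionary can then be passed as values= in place of word_frequencies.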

Keyness Scores

keyness_scores = {
    'significant': 12.5, 'analysis': 8.3, 'corpus': 6.7,
    'frequency': 5.2, 'text': 4.1
}
twc = TrueWordCloud(values=keyness_scores, method='square', base_font_size=50)
twc.generate().save('keyness.png')

With Custom Styling

from PIL import ImageColor

def rainbow_color(word, freq, norm_freq):
    # Rainbow gradient based on frequency
    hue = int(norm_freq * 270)  # 0 (red) to 270 (violet)
    return ImageColor.getrgb(f'hsl({hue}, 100%, 50%)')

twc = TrueWordCloud(
    values=word_frequencies,
    method='square',
    color_func=rainbow_color,
    background_color=(0, 0, 0),  # Black background
    margin=5
)
twc.generate().save('rainbow.png')

With Mask and Mask Outline

from PIL import Image
mask_img = Image.open('mask_heart.png').convert('L')
twc = TrueWordCloud(values=word_frequencies, method='distance_transform')
image = twc.generate(mask=mask_img, mask_outline=True, mask_outline_color='#00AAFF', mask_outline_width=2)
image.save('heart_mask_wordcloud.png')

With Color Mask

color_mask_img = Image.open('mask_heart_color.png')
twc = TrueWordCloud(values=word_frequencies, method='greedy', use_mask_colors=True, mask_shape_mode='colors')
image = twc.generate(mask=color_mask_img)
image.save('color_mask_wordcloud.png')

FAQ

Q: Why are the layouts different sizes?
A: Canvas size is determined by content. More words or higher values = larger canvas. This maintains true proportions.

Q: Can I fix the canvas size?
A: Not directly, as that would require resizing words (breaking true proportionality). Instead, adjust base_font_size to control overall scale.

Q: Which method should I use?
A: Use greedy for speed and reproducibility. Use square for compact layouts and visual variety. Use distance_transform for the most compact, mask-constrained layouts.

Q: How do I make words fit in a specific area?
A: Reduce base_font_size until the generated canvas is the desired size.
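
The trial-and-error described in this answer can be automated with a binary search. The sketch below is generic: `measure` is a hypothetical callable that returns the canvas (width, height) for a given base font size, and the search assumes the canvas grows as the font size grows. A stand-in `fake_measure` is used here so the example runs without rendering anything.

```python
def largest_fitting_font_size(measure, max_width, max_height, low=10, high=400):
    """Binary-search the largest integer base font size whose canvas fits.

    measure(size) must return (width, height) for that base font size and is
    assumed to grow as size grows. Returns None if even `low` is too big.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        width, height = measure(mid)
        if width <= max_width and height <= max_height:
            best = mid      # fits: try a larger size
            low = mid + 1
        else:
            high = mid - 1  # too big: try a smaller size
    return best


# Stand-in for a real render: canvas is 6x wide and 4x tall per font point.
def fake_measure(size):
    return (6 * size, 4 * size)


print(largest_fitting_font_size(fake_measure, 800, 600))  # 133
```

With the library, `measure` might wrap something like TrueWordCloud(values=values, method='greedy', base_font_size=size).generate().size, assuming generate() returns a Pillow image (the examples call .save() on it). The deterministic greedy method is the safer choice here, since randomized layouts may not grow monotonically.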

Q: How do I use a mask or color mask?
A: See the Mask Support and Color Masks sections above for examples.


Made with ❤️ for accurate data visualization
