Hyper fast hyphenation for Python
TextShape
A high-performance Python library for shaping text into columns, combining advanced line-breaking algorithms with precise character positioning via HarfBuzz. Most operations are vectorized with NumPy to keep text layout fast.
Features
- Advanced Text Wrapping: Intelligent line breaking with support for hyphenation
- Multi-Column Page Layout: Multiple columns per page with configurable margins and column spacing
- Font-Aware Rendering: Precise character positioning using HarfBuzz
- Justification: Full text justification support
- Performance Optimized: Vectorized operations with NumPy for speed
- SVG Output: Generate SVG visualizations of text
Installation
Install TextShape using pip:
pip install textshape
Requirements
- Python 3.9+
- NumPy >= 2.0.0
- vharfbuzz
Optional dependency for hyphenation:
- hyperhyphen
Quick Start
from textshape import FontMeasure, TextFragmenter, TextColumn
# Load a font
font_path = "path/to/your/font.ttf"
font_measure = FontMeasure(font_path)
# Create text fragments
text = "Your text here..."
fragmenter = TextFragmenter(measure=font_measure)
fragments = fragmenter(text)
# Create a text column
column = TextColumn(
    fragments=fragments,
    column_width=300,  # Width in points
    fontsize=12,
    justify=True
)
# Get bounding boxes for rendering
text, x, dx, x_origin, y, dy, y_origin = column.to_bounding_boxes()
Usage Examples
Basic Text Wrapping
from textshape import FontMeasure, TextFragmenter, TextColumn
# Setup
font_measure = FontMeasure("fonts/NotoSans-Regular.ttf")
fragmenter = TextFragmenter(measure=font_measure)
text = """Whether I shall turn out to be the hero of my own life, or whether that
station will be held by anybody else, these pages must show."""
# Fragment and wrap text
fragments = fragmenter(text)
column = TextColumn(
    fragments=fragments,
    column_width=31 * 12,  # 31 em widths at 12 pt (372 pt)
    fontsize=12,
    justify=False
)
# Get wrapped lines as strings
lines = column.to_list()
for i, line in enumerate(lines):
    print(f"{i+1:02d}: {line}")
Multi-Column Layout
from textshape import FontMeasure, TextFragmenter, MultiColumn
# Setup for multi-column layout
font_measure = FontMeasure("fonts/NotoSans-Regular.ttf")
fragmenter = TextFragmenter(measure=font_measure)
# Long text content
text = "Your long text content here..."
fragments = fragmenter(text)
# Create multi-column layout
multi_column = MultiColumn(
    fragments=fragments,
    column_width=250,
    fontsize=12,
    justify=True
)
# Get bounding boxes with column information
text, x, dx, x_origin, y, dy, y_origin, column_id = multi_column.to_bounding_boxes(
    max_lines_per_column=20,
    line_spacing=1.2
)
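As a rough illustration of what `max_lines_per_column` implies, the sketch below maps a running line index to a column index and a row within that column. The helper name `assign_columns` is hypothetical and not part of TextShape; how the library computes `column_id` internally is not documented here, so treat this as an assumption about the general idea only.

```python
def assign_columns(n_lines, max_lines_per_column):
    """Map each line index to (column index, row within that column).

    Hypothetical helper for illustration; not part of TextShape.
    """
    return [divmod(i, max_lines_per_column) for i in range(n_lines)]


# Five lines with at most two lines per column spill into three columns.
placement = assign_columns(5, max_lines_per_column=2)
print(placement)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
```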
Page Layout with Margins
from textshape import FontMeasure, TextFragmenter, MultiColumn, Layout
# Setup
font_measure = FontMeasure("fonts/NotoSans-Regular.ttf")
fragmenter = TextFragmenter(measure=font_measure)
text = "Your document text..."
fragments = fragmenter(text)
# Create page layout
layout = Layout(
    columns=2,
    column_spacing=15,
    page_size=(600, 800),  # width, height
    margins=50  # uniform margins
)
# Create multi-column text
multi_column = MultiColumn(
    fragments=fragments,
    column_width=layout.column_widths,
    fontsize=12,
    justify=True
)
# Get positioned text for the layout
text, x, dx, x_origin, y, dy, y_origin, page = layout.to_bounding_boxes(multi_column)
SVG Rendering
# Generate SVG output
svg_content = font_measure.render_svg(
    text=text,
    x_origin=x_origin,
    y_origin=y_origin,
    fontsize=12,
    canvas_width=600,
    canvas_height=800
)
# Save to file
with open("output.svg", "w") as f:
    f.write(svg_content)
Note on Coordinate System: TextShape uses the SVG/HTML coordinate system where the origin (0, 0) is at the top-left corner, and the y-axis increases downward. All coordinates returned by to_bounding_boxes() follow this convention.
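If the boxes need to go to a backend whose origin is at the bottom-left with the y-axis pointing up (PDF-style), the coordinates have to be flipped. The sketch below shows one way to do that, assuming `y` holds the top edge of each box and `dy` its height; that interpretation is an assumption and should be checked against the actual output. The helper `flip_y` is hypothetical and not part of TextShape.

```python
def flip_y(y, dy, page_height):
    """Convert top-left / y-down boxes to bottom-left / y-up boxes.

    Assumes y is the top edge of each box (measured downward from the
    page top) and dy is the box height. Returns the bottom edge of each
    box measured upward from the page bottom.
    """
    return [page_height - (top + height) for top, height in zip(y, dy)]


# Two 12 pt tall boxes on an 800 pt tall page.
y_up = flip_y(y=[0.0, 15.0], dy=[12.0, 12.0], page_height=800.0)
print(y_up)  # [788.0, 773.0]
```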
Line breaking with hyphenation
from hyperhyphen import Hyphenator
hyph = Hyphenator(mode="spans", language="en_US")
# Pass the hyphenator as a custom splitter (font_measure as created in the examples above)
fragmenter = TextFragmenter(
    measure=font_measure,
    splitter=hyph,
)
API Reference
Core Classes
FontMeasure
Handles font loading and character measurement using HarfBuzz.
FontMeasure(fontpath: str, features: Optional[dict] = None)
TextFragmenter
Breaks a string of text into atomic fragments. A fragment cannot be split further, and lines can only be broken at fragment boundaries. Fragments can be full words, or parts of words if allowing for hyphenation.
TextFragmenter(
    measure: Optional[Callable] = None,
    splitter: Optional[Callable] = None,
    tab_width: float | int = 4
)
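To make the role of fragments concrete, here is a minimal sketch of a line breaker that only breaks at fragment boundaries. It uses plain greedy first-fit wrapping, which is a deliberately simple stand-in and not the line-breaking algorithm TextShape uses. The function `greedy_wrap`, the fragment list, and the widths are all made up for illustration, and hyphen insertion at word-internal breaks is omitted.

```python
def greedy_wrap(fragments, widths, max_width):
    """Greedy first-fit wrapping that breaks only between fragments.

    Illustrative stand-in; not TextShape's algorithm.
    """
    lines, current, used = [], [], 0.0
    for frag, w in zip(fragments, widths):
        # Start a new line when the next fragment would overflow.
        if current and used + w > max_width:
            lines.append("".join(current))
            current, used = [], 0.0
        current.append(frag)
        used += w
    if current:
        lines.append("".join(current))
    return lines


# "pages" is split into two fragments, so a break may fall inside the word.
fragments = ["these ", "pa", "ges ", "must ", "show."]
widths = [6, 2, 4, 5, 5]  # made-up widths, one per fragment
print(greedy_wrap(fragments, widths, max_width=9))
# ['these pa', 'ges must ', 'show.']
```

Because a fragment is never split, a fragment wider than `max_width` simply ends up alone on an overfull line in this sketch.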
TextColumn
Wraps text fragments into a single column.
TextColumn(
    fragments: TextFragments,
    column_width: int | float | list[float],
    fontsize: int | float,
    justify: bool = False
)
MultiColumn
Extends TextColumn to support multiple columns.
MultiColumn(
    fragments: TextFragments,
    column_width: int | float | list[float],
    fontsize: int | float,
    justify: bool = False
)
Layout
Manages page layout with multiple columns and margins.
Layout(
    columns: int,
    column_spacing: float,
    page_size: tuple[float, float],
    margins: float | tuple[float, ...]
)
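The arithmetic behind such a layout can be sketched as follows, assuming equal-width columns and a uniform margin. This is an assumption about the geometry, not a description of how `Layout` computes `column_widths`; the helper `column_geometry` is hypothetical.

```python
def column_geometry(page_width, margin, columns, column_spacing):
    """Width and left edge of each column for equal-width columns.

    Assumes a uniform left/right margin; hypothetical helper, not part
    of TextShape.
    """
    usable = page_width - 2 * margin - (columns - 1) * column_spacing
    width = usable / columns
    lefts = [margin + i * (width + column_spacing) for i in range(columns)]
    return width, lefts


# Same numbers as the page layout example: 600 pt wide page,
# 50 pt margins, two columns separated by 15 pt.
width, lefts = column_geometry(600, margin=50, columns=2, column_spacing=15)
print(width, lefts)  # 242.5 [50.0, 307.5]
```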
Advanced Features
Text Justification
Enable full justification to align text to both left and right margins:
column = TextColumn(fragments, column_width=300, fontsize=12, justify=True)
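Conceptually, full justification distributes the leftover width of a line over the gaps between words. The sketch below shows that idea with equal stretching of every gap; it is an illustration under that assumption, not TextShape's implementation, and `justify_offsets` is a hypothetical helper.

```python
def justify_offsets(word_widths, space_width, column_width):
    """Left x-offset of each word when the line is fully justified.

    Leftover width is split equally over the inter-word gaps.
    Hypothetical helper for illustration only.
    """
    gaps = len(word_widths) - 1
    natural = sum(word_widths) + gaps * space_width
    extra = (column_width - natural) / gaps if gaps > 0 else 0.0
    offsets, x = [], 0.0
    for w in word_widths:
        offsets.append(x)
        x += w + space_width + extra
    return offsets


# Three words (30, 50 and 20 pt wide) in a 130 pt column:
# the natural width is 110 pt, so each of the two gaps grows by 10 pt.
print(justify_offsets([30.0, 50.0, 20.0], space_width=5.0, column_width=130.0))
# [0.0, 45.0, 110.0]
```

The last word starts at 110 pt and is 20 pt wide, so it ends exactly at the 130 pt right edge.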
Variable Column Widths
Pass a sequence of widths to give each line its own width:
import numpy as np
# Varying column widths; num_lines is a placeholder for the number of lines to lay out
widths = np.linspace(200, 400, num_lines)
column = TextColumn(fragments, column_width=widths, fontsize=12)
Custom Line Spacing
Control spacing between lines:
text, x, dx, x_origin, y, dy, y_origin = column.to_bounding_boxes(line_spacing=1.5)
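A plausible reading of `line_spacing` is a multiplier on the font size that sets the distance between consecutive lines. The sketch below works through that arithmetic; the interpretation is an assumption rather than documented behavior, and `line_tops` is a hypothetical helper.

```python
def line_tops(n_lines, fontsize, line_spacing):
    """Top edge of each line, measured downward from the top of the column.

    Assumes line height = fontsize * line_spacing; hypothetical helper.
    """
    line_height = fontsize * line_spacing
    return [i * line_height for i in range(n_lines)]


# 12 pt text with 1.5 line spacing advances 18 pt per line.
print(line_tops(3, fontsize=12, line_spacing=1.5))  # [0.0, 18.0, 36.0]
```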
Performance Tips
- Reuse FontMeasure objects - Font loading is expensive
- Use vectorized operations - The library is optimized for batch processing
- Cache fragments - TextFragmenter results can be reused for different layouts
- Choose appropriate column widths - Very narrow columns increase computation time
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Acknowledgments