Utilities for building ML applications from the Google Fonts dataset
blys
blys is a utility library for building ML models that learn from font data.
It provides:
- structured access to Google Fonts metadata and font files
- robust glyph rasterization by GID/codepoint
- reusable PyTorch dataset and dataloader building blocks
- a compact, extensible training loop with checkpointing and TensorBoard logging
The package is designed to be imported by task-specific projects rather than prescribing a single model architecture.
Installation
From PyPI:
pip install blys
Data Source Expectations
Most dataset utilities expect a local clone of the Google Fonts repository with at least:
- ofl/*/*.ttf
- tags/all/families.csv
Example:
git clone https://github.com/google/fonts.git /data/google-fonts
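Before pointing the library at a checkout, it can help to confirm that the two expected paths are present. The following is a minimal sketch using only the standard library; `check_google_fonts_checkout` is a hypothetical helper written for this example and is not part of blys.

```python
from pathlib import Path


def check_google_fonts_checkout(root):
    """Return a list of problems; an empty list means the layout looks usable."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # At least one TTF file under ofl/<family>/ (the ofl/*/*.ttf pattern).
    if next(root.glob("ofl/*/*.ttf"), None) is None:
        problems.append("no font files match ofl/*/*.ttf")
    # The tag spreadsheet listed in the expectations above.
    if not (root / "tags" / "all" / "families.csv").is_file():
        problems.append("tags/all/families.csv is missing")
    return problems
```

Calling `check_google_fonts_checkout("/data/google-fonts")` on a complete clone should return an empty list.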
Quick Start
1) Load Google Fonts and inspect metadata
from blys.googlefonts import GoogleFonts
gf = GoogleFonts("/data/google-fonts")
print(f"Loaded {len(gf.fonts)} fonts")
font = gf.fonts[0]
print(font.family)
print(font.classification())
print(font.description_with_tags_and_display())
2) Render glyphs
from blys.render import render_gid
from blys.googlefonts import GoogleFonts
import uharfbuzz as hb
gf = GoogleFonts("/data/google-fonts")
font = gf.fonts[0]
# Find GID for 'A'
gid = hb.Font(font.hb_face).get_nominal_glyph(ord("A"))
# CHW float image in [0, 1], shape (3, 128, 128)
image = render_gid(font.path, gid=gid, size=128)
print(image.shape, image.min(), image.max())
3) Build a task-specific DatasetMaker
DatasetMaker handles train/test font splits and DataLoader construction. You provide collate_fn.
from blys.dataset import DatasetMaker
import torch
class GlyphDatasetMaker(DatasetMaker):
    def collate_fn(self, batch):
        chars = torch.tensor([item["char"] for item in batch], dtype=torch.long)
        images = torch.stack(
            [
                torch.tensor(item["font"].render_char(item["char"], size=128), dtype=torch.float32)
                for item in batch
            ]
        )
        return {
            "char": chars,
            "image": images,
            "description": [item["font"].description_with_tags_and_display() for item in batch],
        }

maker = GlyphDatasetMaker(
    repo_url="/data/google-fonts",
    batch_size=16,
)
train_loader = maker.train_loader()
batch = next(iter(train_loader))
print(batch["image"].shape)
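Splitting by font rather than by sample keeps every glyph of a font on one side of the train/test boundary, so the test set measures generalization to unseen fonts. How DatasetMaker chooses its split is not described here. The sketch below shows one common approach, a deterministic hash of the family name; `assign_split` and `test_fraction` are hypothetical names for this example and are not part of blys.

```python
import hashlib


def assign_split(family, test_fraction=0.1):
    """Map a family name to "train" or "test", stably across runs and machines."""
    digest = hashlib.sha256(family.encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled into [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "test" if bucket < test_fraction else "train"


# Illustrative family names; any string works.
families = ["Roboto", "Lora", "Inter", "Oswald"]
splits = {name: assign_split(name) for name in families}
```

Because the assignment depends only on the family name, adding or removing other fonts never moves an existing font between splits.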
Core Modules
blys.googlefonts
- GoogleFonts: loads and filters fonts from a Google Fonts checkout
- GoogleFont: one font with metadata/tag/description helpers
- StandaloneFont: local-font fallback implementing the same interface
- find_google_font_by_basename: match a font by filename
- compute_display_score: derive display/text style centile from tags
blys.font
- Font: abstract base with shared operations:
  - rendering by codepoint (render_char) and GID (render_gid)
  - codepoint queries
  - variable-axis sampling (sample_axis_positions)
  - empty-glyph checks (has_non_empty_codepoint, has_non_empty_gid)
blys.render
- render_gid: deterministic glyph rasterization by GID with optional variable-axis coordinates
- is_blank_rendering: utility to detect all-white/all-black outputs
- a small CLI entry point for local rendering/debugging
blys.dataset
- constants for commonly used character sets:
  - LATIN_CORE
  - LATIN_KERNEL
- DatasetMaker: split and loader orchestration
- Dataset: (font, char) samples filtered by available codepoints
- AllGidsDataset: (font, gid) samples over non-empty glyphs
- ClassBalancedBatchSampler: class-balanced index batching by font classification
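To illustrate what class-balanced index batching means, here is a minimal standard-library sketch. It is not the ClassBalancedBatchSampler implementation; `class_balanced_batches` and its parameters are hypothetical, and the label strings are placeholders standing in for font classifications.

```python
import random
from collections import defaultdict


def class_balanced_batches(labels, classes_per_batch, samples_per_class, num_batches, seed=0):
    """Yield lists of dataset indices with an equal number of samples per chosen class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    class_names = sorted(by_class)
    for _ in range(num_batches):
        batch = []
        # Pick distinct classes, then sample with replacement inside each one
        # so that rare classes can still fill their share of the batch.
        for name in rng.sample(class_names, classes_per_batch):
            batch.extend(rng.choices(by_class[name], k=samples_per_class))
        yield batch


labels = ["SERIF", "SERIF", "SANS_SERIF", "DISPLAY", "SANS_SERIF", "SERIF"]
batches = list(
    class_balanced_batches(labels, classes_per_batch=2, samples_per_class=2, num_batches=3)
)
```

An iterable of index lists like this is the shape that the `batch_sampler` argument of `torch.utils.data.DataLoader` accepts. Sampling with replacement trades some repetition for guaranteed balance when a class has few members.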
blys.utils
- TrainingLoop: minimal training harness with:
  - reproducibility setup
  - git cleanliness/commit tracking
  - TensorBoard logging
  - best-checkpoint saving
- helpers for device selection and CLI codepoint parsing
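The input syntax accepted by the CLI codepoint parsing helper is not documented here. As a rough illustration of the idea, the sketch below parses a comma-separated spec of literal characters, `U+XXXX` or `0xXX` hex values, and inclusive ranges. `parse_codepoints` and the syntax it accepts are assumptions made for this example, not the blys API.

```python
def _parse_one(text):
    """Parse a single character or a U+XXXX / 0xXX hex value into an int."""
    text = text.strip()
    if len(text) == 1:
        return ord(text)
    if text.upper().startswith(("U+", "0X")):
        return int(text[2:], 16)
    raise ValueError(f"unrecognized codepoint: {text!r}")


def parse_codepoints(spec):
    """Parse a spec such as "A,U+0042,0x43-0x45" into a set of ints."""
    codepoints = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if len(token) == 1:
            # A single literal character, including "-" itself.
            codepoints.add(ord(token))
            continue
        start, sep, end = token.partition("-")
        first = _parse_one(start)
        last = _parse_one(end) if sep else first
        if last < first:
            raise ValueError(f"descending range: {token!r}")
        codepoints.update(range(first, last + 1))
    return codepoints
```

The resulting set has the same shape as the `target_codepoints` argument used in the training loop example, for instance `parse_codepoints("A-C")` for the letters A to C.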
Example Training Loop Usage
import torch
from blys.utils import TrainingLoop, SaveLoadModel
from blys.dataset import DatasetMaker
class MyModel(SaveLoadModel):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.net(x)

class MyLoop(TrainingLoop):
    def post_init(self, train_args):
        self.model = MyModel().to(self.device)
        maker = DatasetMaker(
            repo_url=train_args.dataset_path,
            batch_size=train_args.batch_size,
            target_codepoints={ord("A"), ord("B"), ord("C")},
        )
        self.train_loader = maker.train_loader()
        self.test_loader = maker.test_loader()
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-3)
        self.num_epochs = 1
        self.target_steps = train_args.target_steps
        self.validation_direction = "higher"

    def train_step(self, batch):
        # Replace with your real tensorization/model logic
        dummy_x = torch.randn(4, 10, device=self.device)
        dummy_y = torch.randn(4, 1, device=self.device)
        pred = self.model(dummy_x)
        loss = torch.nn.functional.mse_loss(pred, dummy_y)
        return loss, {"loss": loss}
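The example sets validation_direction to "higher", which presumably tells the loop that a larger validation metric is better when it decides which checkpoint counts as the best. The logic inside TrainingLoop is not shown here; the sketch below only illustrates the general idea. `BestMetricTracker` is a hypothetical class for this example, and the "lower" option is an assumed counterpart to "higher".

```python
class BestMetricTracker:
    """Remember the best validation metric seen so far."""

    def __init__(self, direction="higher"):
        if direction not in ("higher", "lower"):
            raise ValueError("direction must be 'higher' or 'lower'")
        self.direction = direction
        self.best = None

    def update(self, value):
        """Return True when value improves on the best so far."""
        value = float(value)
        if self.best is None:
            improved = True
        elif self.direction == "higher":
            improved = value > self.best
        else:
            improved = value < self.best
        if improved:
            self.best = value
        return improved


tracker = BestMetricTracker("higher")
# One decision per validation pass: save a checkpoint only on True.
decisions = [tracker.update(v) for v in (0.61, 0.58, 0.70)]  # [True, False, True]
```

For a loss-like metric such as mean squared error, a "lower" direction would be the natural choice under this scheme.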
Testing
Run the test suite:
pip install -e ".[test]"
pytest -q
Some tests require a real Google Fonts checkout path and expect the GOOGLE_FONTS_REPO environment variable to be set; others run against tests/dummy_repo.
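When writing your own tests against a real checkout, a similar gate keeps the suite green on machines without one. This is a minimal sketch using the standard library's unittest, whose test classes pytest also collects; `google_fonts_repo` and `RealCheckoutTest` are hypothetical names and do not describe how the blys test suite itself is organized.

```python
import os
import unittest
from pathlib import Path


def google_fonts_repo():
    """Return the checkout path from GOOGLE_FONTS_REPO, or None if unset or not a directory."""
    value = os.environ.get("GOOGLE_FONTS_REPO")
    if not value:
        return None
    path = Path(value)
    return path if path.is_dir() else None


# The condition is evaluated once, when the class is defined.
@unittest.skipUnless(google_fonts_repo(), "GOOGLE_FONTS_REPO is not set to a directory")
class RealCheckoutTest(unittest.TestCase):
    def test_checkout_has_ofl_directory(self):
        self.assertTrue((google_fonts_repo() / "ofl").is_dir())
```

With this in place, `GOOGLE_FONTS_REPO=/data/google-fonts pytest -q` runs the gated tests, and a plain `pytest -q` reports them as skipped.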
License
This project is available under the Apache 2.0 License. See the LICENSE file for more details.