Utilities for building ML applications from the Google Fonts dataset

blys

blys is a utility library for building ML models that learn from font data.

It provides:

  • structured access to Google Fonts metadata and font files
  • robust glyph rasterization by glyph ID (GID) or codepoint
  • reusable PyTorch dataset and dataloader building blocks
  • a compact, extensible training loop with checkpointing and TensorBoard logging

The package is designed to be imported by task-specific projects rather than prescribing a single model architecture.

Installation

From PyPI:

pip install blys

Data Source Expectations

Most dataset utilities expect a local clone of the Google Fonts repository with at least:

  • ofl/*/*.ttf
  • tags/all/families.csv

Example:

git clone https://github.com/google/fonts.git /data/google-fonts
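
Before pointing the dataset utilities at a checkout, it can help to confirm that the two paths listed above are actually present. The following is a minimal sketch using only the standard library; `check_google_fonts_checkout` is a hypothetical helper written for this illustration and is not part of blys.

```python
from pathlib import Path


def check_google_fonts_checkout(root):
    """Return a list of human-readable problems; an empty list means the layout looks usable."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # At least one font file should match ofl/*/*.ttf
    if next(root.glob("ofl/*/*.ttf"), None) is None:
        problems.append("no font files match ofl/*/*.ttf")
    # The tag data is expected at tags/all/families.csv
    if not (root / "tags" / "all" / "families.csv").is_file():
        problems.append("tags/all/families.csv is missing")
    return problems
```

Calling `check_google_fonts_checkout("/data/google-fonts")` on a complete clone should return an empty list; a sparse or partial checkout would report which of the two expectations is not met.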

Quick Start

1) Load Google Fonts and inspect metadata

from blys.googlefonts import GoogleFonts

gf = GoogleFonts("/data/google-fonts")
print(f"Loaded {len(gf.fonts)} fonts")

font = gf.fonts[0]
print(font.family)
print(font.classification())
print(font.description_with_tags_and_display())

2) Render glyphs

from blys.render import render_gid
from blys.googlefonts import GoogleFonts
import uharfbuzz as hb

gf = GoogleFonts("/data/google-fonts")
font = gf.fonts[0]

# Find GID for 'A'
gid = hb.Font(font.hb_face).get_nominal_glyph(ord("A"))

# CHW float image in [0, 1], shape (3, 128, 128)
image = render_gid(font.path, gid=gid, size=128)
print(image.shape, image.min(), image.max())

3) Build a task-specific DatasetMaker

DatasetMaker handles train/test font splits and DataLoader construction. You provide collate_fn, typically by subclassing, to turn a list of samples into a batch.

from blys.dataset import DatasetMaker
import torch


class GlyphDatasetMaker(DatasetMaker):
    def collate_fn(self, batch):
        chars = torch.tensor([item["char"] for item in batch], dtype=torch.long)
        images = torch.stack(
            [
                torch.tensor(item["font"].render_char(item["char"], size=128), dtype=torch.float32)
                for item in batch
            ]
        )
        return {
            "char": chars,
            "image": images,
            "description": [item["font"].description_with_tags_and_display() for item in batch],
        }


maker = GlyphDatasetMaker(
    repo_url="/data/google-fonts",
    batch_size=16,
)
train_loader = maker.train_loader()
batch = next(iter(train_loader))
print(batch["image"].shape)
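
This README does not describe how DatasetMaker decides which fonts go into the train and test sets. As a general illustration of why splitting by font (rather than by sample) matters, here is a sketch of one common approach: a deterministic hash-based split, so that every glyph of a given font lands on the same side. The functions `assign_split` and `split_fonts` are hypothetical names for this example and are not the blys implementation.

```python
import hashlib


def assign_split(font_key, test_fraction=0.1):
    """Deterministically map a font identifier (e.g. its path) to 'train' or 'test'."""
    digest = hashlib.sha256(font_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_fonts(font_keys, test_fraction=0.1):
    """Group font identifiers into train and test lists, preserving input order."""
    splits = {"train": [], "test": []}
    for key in font_keys:
        splits[assign_split(key, test_fraction)].append(key)
    return splits
```

Because the assignment depends only on the font identifier, the split stays stable across runs and machines, and adding new fonts does not move existing ones between sets.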

Core Modules

blys.googlefonts

  • GoogleFonts: loads and filters fonts from a Google Fonts checkout
  • GoogleFont: one font with metadata/tag/description helpers
  • StandaloneFont: local-font fallback implementing the same interface
  • find_google_font_by_basename: match a font by filename
  • compute_display_score: derive display/text style centile from tags

blys.font

  • Font: abstract base with shared operations:
    • rendering by codepoint (render_char) and gid (render_gid)
    • codepoint queries
    • variable-axis sampling (sample_axis_positions)
    • empty-glyph checks (has_non_empty_codepoint, has_non_empty_gid)
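
To make the idea of variable-axis sampling concrete, the sketch below draws one random location in a font's design space, given each axis's minimum and maximum. It is an illustration only: `sample_axis_location` and its `axes` argument are hypothetical and do not reflect the signature or behavior of `sample_axis_positions`.

```python
import random


def sample_axis_location(axes, rng=None):
    """Pick one value per axis, uniformly between that axis's minimum and maximum.

    `axes` maps an axis tag such as "wght" to a (minimum, maximum) pair.
    """
    rng = rng or random.Random()
    return {tag: rng.uniform(lo, hi) for tag, (lo, hi) in axes.items()}
```

For example, `sample_axis_location({"wght": (100.0, 900.0), "wdth": (75.0, 125.0)})` returns a dictionary with one weight and one width value inside those ranges; passing a seeded `random.Random` makes the draw repeatable.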

blys.render

  • render_gid: deterministic glyph rasterization by GID with optional variable-axis coordinates
  • is_blank_rendering: utility to detect all-white/all-black outputs
  • a small CLI entry point for local rendering/debugging
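
A blank-rendering check can be very small when images are floats in [0, 1], as in the Quick Start. The sketch below is one possible formulation, not the code behind `is_blank_rendering`; the name `looks_blank` and the `tolerance` parameter are assumptions made for this example. It relies only on the image exposing `.min()` and `.max()`, which NumPy arrays and PyTorch tensors both do.

```python
def looks_blank(image, tolerance=1e-3):
    """Return True when every pixel is near 0 (all black) or near 1 (all white).

    Assumes pixel values are floats in [0, 1].
    """
    lo, hi = float(image.min()), float(image.max())
    return hi <= tolerance or lo >= 1.0 - tolerance
```

Note that a uniformly gray image is not reported as blank under this definition; a stricter variant could compare `hi - lo` against the tolerance instead.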

blys.dataset

  • constants for commonly used character sets:
    • LATIN_CORE
    • LATIN_KERNEL
  • DatasetMaker: split and loader orchestration
  • Dataset: (font, char) samples filtered by available codepoints
  • AllGidsDataset: (font, gid) samples over non-empty glyphs
  • ClassBalancedBatchSampler: class-balanced index batching by font classification
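
Class-balanced batching matters here because font classifications are unevenly represented, so uniform sampling would let the largest class dominate each batch. The following standard-library sketch shows the general technique of building index batches with an equal quota per class. It is not the ClassBalancedBatchSampler implementation; the function name and all of its parameters are hypothetical.

```python
import random
from collections import defaultdict


def class_balanced_batches(labels, classes_per_batch, samples_per_class, num_batches, seed=0):
    """Yield lists of dataset indices with the same number of samples per chosen class."""
    rng = random.Random(seed)
    # Group dataset indices by their class label
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    class_names = sorted(by_class)
    if classes_per_batch > len(class_names):
        raise ValueError("classes_per_batch exceeds the number of distinct classes")
    for _ in range(num_batches):
        batch = []
        # Choose distinct classes for this batch, then fill each class's quota
        for name in rng.sample(class_names, classes_per_batch):
            # Sampling with replacement lets rare classes still fill their quota
            batch.extend(rng.choices(by_class[name], k=samples_per_class))
        yield batch
```

A generator like this produces lists of indices, which is the shape PyTorch's `DataLoader` accepts through its `batch_sampler` argument. Sampling with replacement is a design choice: it keeps batch sizes fixed at the cost of repeating samples from small classes.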

blys.utils

  • TrainingLoop: minimal training harness with:
    • reproducibility setup
    • git cleanliness/commit tracking
    • TensorBoard logging
    • best-checkpoint saving
  • helpers for device selection and CLI codepoint parsing
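
As an illustration of what CLI codepoint parsing usually involves, the sketch below accepts the notations people commonly type on a command line. The function `parse_codepoint` and the set of formats it accepts are assumptions for this example; the formats blys itself accepts are not listed in this README.

```python
def parse_codepoint(text):
    """Convert 'U+0041', '0x41', or a single literal character such as 'A' to an int."""
    # A single character (including a space) is taken literally
    if len(text) == 1:
        return ord(text)
    upper = text.strip().upper()
    # Both 'U+' and '0x' prefixes introduce a hexadecimal value
    if upper.startswith("U+") or upper.startswith("0X"):
        return int(upper[2:], 16)
    raise ValueError(f"unrecognised codepoint: {text!r}")
```

Treating a one-character argument as a literal means `parse_codepoint("5")` returns the codepoint of the digit 5 rather than the number 5, which is usually what a glyph-oriented tool wants.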

Example Training Loop Usage

import torch
from blys.utils import TrainingLoop, SaveLoadModel
from blys.dataset import DatasetMaker


class MyModel(SaveLoadModel):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.net(x)


class MyLoop(TrainingLoop):
    def post_init(self, train_args):
        self.model = MyModel().to(self.device)
        maker = DatasetMaker(
            repo_url=train_args.dataset_path,
            batch_size=train_args.batch_size,
            target_codepoints={ord("A"), ord("B"), ord("C")},
        )
        self.train_loader = maker.train_loader()
        self.test_loader = maker.test_loader()
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-3)
        self.num_epochs = 1
        self.target_steps = train_args.target_steps
        self.validation_direction = "higher"

    def train_step(self, batch):
        # Replace with your real tensorization/model logic
        dummy_x = torch.randn(4, 10, device=self.device)
        dummy_y = torch.randn(4, 1, device=self.device)
        pred = self.model(dummy_x)
        loss = torch.nn.functional.mse_loss(pred, dummy_y)
        return loss, {"loss": loss}
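
The example sets `validation_direction = "higher"` without explaining it. A plausible reading is that it tells best-checkpoint saving whether a larger or a smaller validation metric counts as an improvement. The sketch below shows that logic in isolation; `BestMetricTracker` is a hypothetical class written for this illustration, and the `"lower"` value is an assumption, since only `"higher"` appears above.

```python
class BestMetricTracker:
    """Remember the best validation metric seen so far, in a given direction."""

    def __init__(self, direction="higher"):
        if direction not in ("higher", "lower"):
            raise ValueError("direction must be 'higher' or 'lower'")
        self.direction = direction
        self.best = None

    def update(self, value):
        """Record value and return True when it beats the previous best."""
        improved = (
            self.best is None
            or (self.direction == "higher" and value > self.best)
            or (self.direction == "lower" and value < self.best)
        )
        if improved:
            self.best = value
        return improved
```

A training loop would call `update` after each validation pass and write a checkpoint only when it returns True. With a loss-like metric the direction would normally be `"lower"`, and with an accuracy-like metric `"higher"`.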

Testing

Run the test suite:

pip install -e ".[test]"
pytest -q

Some tests require a real Google Fonts checkout path and expect the GOOGLE_FONTS_REPO environment variable to be set; others run against tests/dummy_repo.
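
When writing your own tests against a real checkout, a small helper can decide whether they are able to run. The sketch below reads the GOOGLE_FONTS_REPO variable mentioned above; `real_repo_from_env` is a hypothetical helper and is not taken from the blys test suite.

```python
import os
from pathlib import Path


def real_repo_from_env(environ=None):
    """Return the checkout path from GOOGLE_FONTS_REPO, or None if unset or not a directory."""
    environ = os.environ if environ is None else environ
    value = environ.get("GOOGLE_FONTS_REPO", "").strip()
    if not value:
        return None
    path = Path(value)
    return path if path.is_dir() else None
```

In a pytest module, the result can feed a skip condition such as `pytest.mark.skipif(real_repo_from_env() is None, reason="GOOGLE_FONTS_REPO not set")`, so tests that need the real repository are skipped rather than failing.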

License

This project is available under the Apache 2.0 License. See the LICENSE file for more details.
