Visual font recognition (VFR) library
Project description
fontina
fontina is a PyTorch library that helps with training models for the task of Visual Font Recognition and running inference with them.
Feature highlights
- DeepFont-like network architecture (a minimal sketch follows this list). See Z. Wang, J. Yang, H. Jin, E. Shechtman, A. Agarwala, J. Brandt and T. Huang, "DeepFont: Identify Your Font from An Image", in Proceedings of the ACM International Conference on Multimedia (ACM MM), 2015
- Configuration-based synthetic dataset generation
- Configuration-based model training via PyTorch Lightning
- Supports training and inference on Linux, macOS and Windows.
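For reference, the snippet below is a minimal sketch of what a DeepFont-like network looks like in PyTorch: a lower convolutional block ("Cu") pre-trained as stacked convolutional autoencoders on unlabeled patches, followed by higher layers ("Cs") trained with supervision. The layer sizes and filter counts are assumptions for illustration; they are not necessarily the values fontina uses.

```python
# Illustrative DeepFont-like module. Layer shapes are assumptions, not
# necessarily those used by fontina or by the original paper.
import torch
import torch.nn as nn

class DeepFontLike(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # "Cu": low-level layers, pre-trained as stacked convolutional
        # autoencoders on unlabeled patches and later frozen.
        self.cu = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=12, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # "Cs": higher layers trained with supervision on labeled data.
        self.cs = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.LazyLinear(4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: a batch of grayscale patches, e.g. shape (N, 1, 105, 105).
        return self.cs(self.cu(x))
```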
Using fontina
Installing the dependencies
Starting from a cloned repository directory:
# Create a virtual environment: this uses venv but any system
# would work!
python -m venv .venv
# Activate the virtual environment: this depends on the OS. See
# the two options below.
# .venv/Scripts/activate # Windows
source .venv/bin/activate # Linux
# Install the dependencies needed to use fontina.
pip install .
Note: Windows users must manually install the CUDA-based version of PyTorch, as pip will only install the CPU version on this platform. See PyTorch Get Started for the specific command to run, which should be something along the lines of:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
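To check which build ended up installed, a quick sanity test with plain PyTorch (nothing fontina-specific) is:

```python
# Prints True only when a CUDA-enabled PyTorch build is installed
# and a compatible GPU is visible to it.
import torch
print(torch.cuda.is_available())
```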
(Optional) - Installing development dependencies
The following dependencies are only needed to develop fontina.
# Install the developer dependencies.
pip install .[linting]
# Run linting and tests!
make lint
make test
Generating a synthetic dataset
If needed, the model can be trained on synthetic data. fontina provides a synthetic dataset generator that follows part of the recommendations from the DeepFont paper to make the synthetic data look closer to the real data (see the augmentation sketch after the steps below). To use the generator:
- Make a copy of configs/sample.yaml, e.g. configs/mymodel.yaml
- Open configs/mymodel.yaml and tweak the fonts section:
fonts:
# ...
# Force a uniform white background for the generated images.
# backgrounds_path: "assets/backgrounds"
samples_per_font: 1000
# Fill in the paths of the fonts that need to be used to generate
# the data.
classes:
- name: Test Font
path: "assets/fonts/test/Test.ttf"
- name: Other Test Font
path: "assets/fonts/test2/Test2.ttf"
- Run the generation:
fontina-generate -c configs/mymodel.yaml -o outputs/font-images/mymodel
After this completes, there should be one directory per configured font in outputs/font-images/mymodel.
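For intuition, the augmentations the DeepFont paper recommends for narrowing the synthetic/real gap include additive noise, blur, gradient shading and horizontal squeezing. The snippet below is a hypothetical sketch of such a pipeline using Pillow and NumPy; it is not fontina's actual generator code, and all parameter ranges are made up.

```python
# Hypothetical DeepFont-style augmentations (noise, blur, shading, squeezing).
# Not fontina's generator; values are illustrative only.
import numpy as np
from PIL import Image, ImageFilter

def augment(img: Image.Image) -> Image.Image:
    # Variable aspect ratio: squeeze the image horizontally.
    w, h = img.size
    squeeze = np.random.uniform(0.8, 1.0)
    img = img.resize((max(1, int(w * squeeze)), h))

    # Gaussian blur with a random radius.
    img = img.filter(ImageFilter.GaussianBlur(radius=np.random.uniform(0.5, 2.0)))

    # Additive Gaussian noise on the grayscale pixels.
    arr = np.asarray(img.convert("L"), dtype=np.float32)
    arr += np.random.normal(0.0, 3.0, size=arr.shape)

    # Gradient shading across the width to mimic uneven lighting.
    arr += np.linspace(0.0, np.random.uniform(0.0, 30.0), arr.shape[1])[None, :]

    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```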
Training
fontina currently only supports training a DeepFont-like architecture. The training process has two major steps: unsupervised training of the stacked autoencoders and supervised training of the full network.
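Conceptually, the two steps map to the only_autoencoder and scae_checkpoint_file settings shown in the configs below. The snippet is a rough sketch of the idea in plain PyTorch, not fontina's training code; shapes and hyper-parameters are made up.

```python
# Rough sketch of the two-phase training scheme (illustrative only).
import torch
import torch.nn as nn

NUM_FONTS = 2  # hypothetical number of font classes

# Lower layers, shared by both phases.
encoder = nn.Sequential(nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1), nn.ReLU())
# Decoder, only used in phase 1 to reconstruct the input patch.
decoder = nn.ConvTranspose2d(64, 1, kernel_size=3, stride=2, padding=1)
# Classifier head, only used in phase 2.
head = nn.Sequential(nn.Flatten(), nn.LazyLinear(NUM_FONTS))

patches = torch.rand(8, 1, 105, 105)        # unlabeled patches
labels = torch.randint(0, NUM_FONTS, (8,))  # labels for the supervised phase

# Phase 1 (only_autoencoder: True): minimize the reconstruction error.
loss_unsupervised = nn.functional.mse_loss(decoder(encoder(patches)), patches)

# Phase 2 (only_autoencoder: False): freeze the pre-trained encoder and
# train the classifier head with cross-entropy on labeled patches.
for p in encoder.parameters():
    p.requires_grad = False
loss_supervised = nn.functional.cross_entropy(head(encoder(patches)), labels)
```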
Before starting, make a copy of configs/sample.yaml, e.g. configs/mymodel.yaml (or use the existing one that was created for the dataset generation step).
Part 1 - Unsupervised training
- Open configs/mymodel.yaml and tweak the training section:
training:
# Set this to True to train the stacked autoencoders.
only_autoencoder: True
# Don't use an existing checkpoint for the unsupervised training.
# scae_checkpoint_file: "outputs/models/autoenc/good-checkpoint.ckpt"
data_root: "outputs/font-images/mymodel"
# The directory that will contain the model checkpoints.
output_dir: "outputs/models/mymodel-scae"
# The size of the batch to use for training.
batch_size: 128
# The initial learning rate to use for training.
learning_rate: 0.01
epochs: 20
# Whether or not to use a fraction of the data to run a
# test cycle on the trained model.
run_test_cycle: True
- Then run the training with:
python src/fontina/train.py -c configs/mymodel.yaml
Part 2 - Supervised training
- Open configs/mymodel.yaml (or create a new one!) and tweak the training section:
training:
# This will freeze the stacked autoencoders.
only_autoencoder: False
# Pick the best checkpoint from the unsupervised training.
scae_checkpoint_file: "outputs/models/mymodel-scae/good-checkpoint.ckpt"
data_root: "outputs/font-images/mymodel"
# The directory that will contain the model checkpoints.
output_dir: "outputs/models/mymodel-full"
# The size of the batch to use for training.
batch_size: 128
# The initial learning rate to use for training.
learning_rate: 0.01
epochs: 20
# Whether or not to use a fraction of the data to run a
# test cycle on the trained model.
run_test_cycle: True
- Then run the training with:
fontina-train -c configs/mymodel.yaml
(Optional) - Monitor performance using TensorBoard
fontina automatically captures the performance of training runs in a TensorBoard-compatible way. It should be possible to visualize the recorded data by pointing TensorBoard to the logs directory as follows:
tensorboard --logdir=lightning_logs
Inference
Once training is complete, the resulting model can be used to run inference.
fontina-predict -n 6 -w "outputs/models/mymodel-full/best_checkpoint.ckpt" -i "assets/images/test.png"
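For intuition, DeepFont-style models typically score a text image by resizing it to a fixed height, cutting it into square grayscale patches and averaging the per-patch class probabilities. The helper below (predict_font and its parameters are made up for illustration) sketches that scheme with plain PyTorch and torchvision; it is not the code behind fontina-predict.

```python
# Hypothetical patch-based inference sketch; not fontina-predict's implementation.
import torch
from PIL import Image
from torchvision import transforms

PATCH = 105  # patch size commonly used by DeepFont-style models

def predict_font(model: torch.nn.Module, image_path: str, top_k: int = 6):
    # Load as grayscale and resize to the patch height, preserving aspect ratio.
    img = Image.open(image_path).convert("L")
    w, h = img.size
    img = img.resize((max(PATCH, int(w * PATCH / h)), PATCH))

    full = transforms.ToTensor()(img)  # shape (1, PATCH, width)
    # Slide a square window across the width, with 50% overlap.
    patches = [
        full[:, :, x:x + PATCH]
        for x in range(0, full.shape[-1] - PATCH + 1, PATCH // 2)
    ]

    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(torch.stack(patches)), dim=1).mean(dim=0)
    return torch.topk(probs, k=top_k)
```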
AdobeVFR Pre-trained model
The AdobeVFR dataset is currently available for download from Dropbox. The license for using and distributing the dataset states:
This dataset ('Licensed Material') is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications or personal experimentation.
The model, being trained on that dataset, retains the same spirit and the same license applies: the released model can only be used for non-commercial purposes.
How to train
1. Download the dataset to assets/AdobeVFR
2. Unpack assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new.zip in that directory so that the assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new/ path exists
3. Run fontina-train -c configs/adobe-vfr-autoencoder.yaml. This will take a long while, but progress can be checked with TensorBoard (see the previous sections) during training
4. Change configs/adobe-vfr.yaml so that scae_checkpoint_file points to the best checkpoint from step (3).
5. Run fontina-train -c configs/adobe-vfr.yaml. This will take a long while (but less than the unsupervised training round)
Downloading the models
While only the full model is needed, the stand-alone autoencoder model is being released as well.
- Stand-alone autoencoder model: Google Drive
- Full model: Google Drive
Note: The pre-trained model achieves a validation loss of 0.3523, with an accuracy of 0.8855 after 14 epochs. Unfortunately, the test performance on VFR_real_test is much worse, with a top-1 accuracy of 0.05. I'm releasing the model in the hope that somebody could help me fix this 😊😅
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file fontina-0.0.1rc6.tar.gz.
File metadata
- Download URL: fontina-0.0.1rc6.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4e906615c4cfcb830706943eb149af3fd988bef8257a06800d8d4f9414b4c9bf
MD5 | 7ed41d6424508eafea57ed67d1efb295
BLAKE2b-256 | bda4c3fd16ab8ed4d0fde6ff1dc967f2f22f2ef419b3db7fa0df2ae9e59f4e75
File details
Details for the file fontina-0.0.1rc6-py3-none-any.whl.
File metadata
- Download URL: fontina-0.0.1rc6-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | e9687599f36ae6ba9f9e8b266d5368ae396f9f1b2a78647cfe8b29d4370b8c5e
MD5 | ab01f5cd0732a4d6bb32e301fad06dfa
BLAKE2b-256 | 015a929914d58d16b5aadfd432adf17649f54069c9edc1db057e874bbf2d93bc