A set of tools to compress gensim fasttext models
Compress-fasttext
This Python 3 package compresses fastText models (from the gensim
package) by orders of magnitude
without seriously affecting their quality.
Use it like this:
import gensim
import compress_fasttext
big_model = gensim.models.fasttext.FastTextKeyedVectors.load('path-to-original-model')
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save('path-to-new-model')
Different compression methods include:
- matrix decomposition (svd_ft)
- product quantization (quantize_ft)
- optimization of feature hashing (prune_ft)
- feature selection (prune_ft_freq)
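To give a rough idea of what product quantization does, here is a minimal pure-Python sketch. It is not the implementation used by quantize_ft. The helper names (kmeans, nearest, pq_fit, pq_decode) and the toy sizes are hypothetical, and the sketch assumes the vector dimension is divisible by the number of sub-vectors. Each vector is cut into sub-vectors, each sub-vector is replaced by the index of its nearest centroid, and only the indices plus the small codebooks are stored.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
    )


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def pq_fit(vectors, n_sub, k):
    """Learn one codebook per sub-space and encode every vector as n_sub indices."""
    step = len(vectors[0]) // n_sub  # assumes dim is divisible by n_sub
    codebooks = []
    codes = [[] for _ in vectors]
    for s in range(n_sub):
        chunks = [tuple(v[s * step:(s + 1) * step]) for v in vectors]
        codebook = kmeans(chunks, k)
        codebooks.append(codebook)
        for row, chunk in zip(codes, chunks):
            row.append(nearest(chunk, codebook))
    return codebooks, codes


def pq_decode(codebooks, code):
    """Rebuild an approximate vector by concatenating the selected centroids."""
    out = []
    for codebook, idx in zip(codebooks, code):
        out.extend(codebook[idx])
    return out


# Toy data: 50 random 8-dimensional vectors, 4 sub-vectors, 4 centroids each.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
codebooks, codes = pq_fit(vectors, n_sub=4, k=4)
approx = pq_decode(codebooks, codes[0])
```

In this toy setup each vector is stored as 4 small integers instead of 8 floats; the reconstruction in approx is only an approximation of vectors[0], which is the quality cost paid for the smaller size.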
The recommended approach is a combination of feature selection and quantization
(the function prune_ft_freq with pq=True).
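A back-of-the-envelope calculation suggests why combining the two steps can shrink a model by orders of magnitude. All figures below (row counts, dimension, number of sub-vectors, 256 centroids) are hypothetical illustration values, not the package's defaults or measured results, and the function names are made up for this sketch.

```python
def dense_bytes(n_rows, dim, bytes_per_value=4):
    """Size of a dense float32 matrix with n_rows vectors of length dim."""
    return n_rows * dim * bytes_per_value


def pq_bytes(n_rows, dim, n_sub, k=256, bytes_per_value=4):
    """Size of a product-quantized matrix.

    Codes: one byte per sub-vector (valid while k <= 256).
    Codebooks: n_sub tables of k centroids, each of length dim / n_sub,
    which totals k * dim stored values.
    """
    codes = n_rows * n_sub
    codebooks = k * dim * bytes_per_value
    return codes + codebooks


# Hypothetical original model: 2,000,000 rows of 300 float32 values.
full_bytes = dense_bytes(2_000_000, 300)

# Hypothetical compressed model: keep 100,000 rows (feature selection),
# then store each row as 100 one-byte codes (product quantization).
small_bytes = pq_bytes(100_000, 300, n_sub=100)

ratio = full_bytes / small_bytes
print(f"{full_bytes:,} bytes -> {small_bytes:,} bytes (about {ratio:.0f}x smaller)")
```

Feature selection cuts the number of rows, and quantization cuts the bytes per row, so the two savings multiply.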
This code is heavily based on the navec package by Alexander Kukushkin and the blogpost by Andrey Vasnetsov about shrinking fastText embeddings.