floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
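The toy sketch below shows how these two ideas fit together. It is pure Python and is not floret's actual implementation. The word and each of its character n-grams are hashed to hashCount rows of a small table with bucket rows, and the selected rows are averaged. Several details are illustrative assumptions: the BLAKE2b hash (floret uses its own hash function), the random stand-in table, using the bare word string as its own key, and averaging over all selected rows.

```python
import hashlib
import random


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of '<word>' with lengths minn..maxn, in the fastText style."""
    token = f"<{word}>"
    return [token[i:i + n] for n in range(minn, maxn + 1) for i in range(len(token) - n + 1)]


def bloom_rows(key, hash_count, bucket):
    """Map a key to hash_count rows using 32-bit slices of one 128-bit digest.

    BLAKE2b is only a stand-in here; it is not the hash function floret uses.
    """
    if not 1 <= hash_count <= 4:
        raise ValueError("hash_count must be between 1 and 4")
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()  # 4 x 32 bits
    return [int.from_bytes(digest[4 * k:4 * k + 4], "little") % bucket for k in range(hash_count)]


def make_table(bucket, dim, seed=0):
    """Random stand-in for a trained bucket x dim embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(bucket)]


def word_vector(word, table, hash_count=2, minn=3, maxn=6):
    """Average every table row selected for the word and for each of its n-grams."""
    bucket, dim = len(table), len(table[0])
    keys = [word] + char_ngrams(word, minn, maxn)  # assumption: bare word is its own key
    rows = [r for key in keys for r in bloom_rows(key, hash_count, bucket)]
    vec = [0.0] * dim
    for r in rows:
        for d in range(dim):
            vec[d] += table[r][d]
    return [x / len(rows) for x in vec]


# Usage: any string gets a vector, including words never seen in training.
table = make_table(bucket=1000, dim=8)
vec = word_vector("broccoli", table)
```

A misspelling such as "brocoli" shares most of its n-grams, and therefore most of its rows, with "broccoli". With hashCount > 1, two different keys only collide completely if all of their rows coincide. This is what allows bucket to shrink without every collision merging two keys.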
Installation
pip install floret
Usage
Train floret vectors using the options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
- bucket: in combination with hashCount > 1, the size of the hash table can be greatly reduced (recommended: 25000-100000, reduced from the fastText default of 2000000)
- minn: minimum length of character n-grams (default: 3)
- maxn: maximum length of character n-grams (default: 6)
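To make the bucket savings concrete, here is a back-of-the-envelope calculation. It assumes fastText's default dim of 100 and 4-byte floats, and it ignores the separate per-word rows that original fastText stores in addition to the subword buckets. The helper name is hypothetical.

```python
def table_size_mb(rows, dim=100, bytes_per_float=4):
    """Approximate size in MB (10**6 bytes) of a rows x dim table of 4-byte floats."""
    return rows * dim * bytes_per_float / 10**6


# fastText's default subword table vs. the recommended floret range (dim=100 assumed)
for rows in (2_000_000, 100_000, 50_000, 25_000):
    print(f"bucket={rows:>9,}: {table_size_mb(rows):6.1f} MB")
```

Under these assumptions, the fastText default comes to about 800 MB, while the recommended floret range of 25000-100000 buckets comes to roughly 10-40 MB.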
import floret
# train vectors
model = floret.train_unsupervised(
"data.txt",
model="cbow",
mode="floret",
hashCount=2,
bucket=50000,
minn=3,
maxn=6,
)
# query vector
model.get_word_vector("broccoli")
# save full model
model.save_model("vectors.bin")
# export standard word-only vector table
model.save_vectors("vectors.vec")
# export floret vector table
model.save_floret_vectors("vectors.floret")
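The exported vectors.vec file is a plain-text, word-only table, so it can be inspected without floret. It starts with a header line giving the word count and dimension, followed by one word and its values per line. The minimal sketch below reads such a file and compares two words. parse_vec, load_vec and cosine are hypothetical helpers, and the sketch assumes single-space separators as written by fastText-style save_vectors.

```python
import math


def parse_vec(lines):
    """Parse word2vec-style text: a 'count dim' header, then 'word v1 ... vdim' per line."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_vec(path):
    """Load a .vec file such as the one written by model.save_vectors()."""
    with open(path, encoding="utf-8") as f:
        return parse_vec(f)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Unlike the floret table, this file only covers words seen during training. A lookup for an unseen word therefore needs a fallback such as model.get_word_vector().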
Note: with the default setting mode="fasttext", floret trains original fastText vectors.
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+:
spacy init vectors --mode floret vectors.floret spacy_vectors_model
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.
Download files
Source distribution:
- floret-0.10.1.tar.gz (70.4 kB)
Built distributions:
- floret-0.10.1-cp39-cp39-win_amd64.whl (247.1 kB)
- floret-0.10.1-cp38-cp38-win_amd64.whl (252.3 kB)