floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
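As a rough illustration of how these two ideas fit together, the sketch below splits a word into character n-grams, hashes every entry into a small table several times, and combines the selected rows. It is a toy model, not floret's implementation: the hash function (salted BLAKE2b), the random table contents, the averaging step, and the names `char_ngrams`, `rows_for`, and `embed` are all assumptions made for illustration. Only the `<` and `>` word-boundary markers follow fastText's convention.

```python
import hashlib
import random


def char_ngrams(word, minn=3, maxn=6):
    """Return the boundary-marked word followed by its character n-grams."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(minn, maxn + 1):
        for i in range(len(marked) - n + 1):
            gram = marked[i:i + n]
            if gram != marked:  # the full word is already the first entry
                grams.append(gram)
    return grams


def rows_for(entry, bucket, hash_count):
    """Map one entry to hash_count row indices (toy hash, not floret's)."""
    rows = []
    for seed in range(hash_count):
        digest = hashlib.blake2b(
            entry.encode("utf8"), digest_size=8, salt=bytes([seed])
        ).digest()
        rows.append(int.from_bytes(digest, "little") % bucket)
    return rows


def embed(word, table, hash_count, minn=3, maxn=6):
    """Sum the hashed rows of every entry, then average over the entries."""
    bucket, dim = len(table), len(table[0])
    entries = char_ngrams(word, minn, maxn)
    total = [0.0] * dim
    for entry in entries:
        for row in rows_for(entry, bucket, hash_count):
            for k in range(dim):
                total[k] += table[row][k]
    return [value / len(entries) for value in total]


# A random stand-in for a trained table: 1000 rows of 8 dimensions.
random.seed(0)
table = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]
vec = embed("broccoli", table, hash_count=2)
```

Because every word and n-gram is looked up by hash, any string gets a vector, and the table size is fixed by the number of rows rather than by the vocabulary.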
Installation
pip install floret
Usage
Train floret vectors using the options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
- bucket: in combination with hashCount > 1, the size of the hash table can be greatly reduced (recommended: 25000-100000, reduced from the fastText default of 2000000)
- minn: min length of char ngram (default: 3)
- maxn: max length of char ngram (default: 6)
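To get a feel for what the smaller bucket setting means in practice, the back-of-the-envelope calculation below compares the size of the hashed rows at the fastText default with a floret-sized table. The vector dimension of 300 and 4-byte float storage are assumptions chosen for illustration, and `table_megabytes` is a hypothetical helper. The estimate counts only the hashed rows; a fastText model additionally stores one row per vocabulary word.

```python
def table_megabytes(rows, dim, bytes_per_value=4):
    """Approximate size of a rows x dim table of fixed-width floats, in MB."""
    return rows * dim * bytes_per_value / 1_000_000


dim = 300  # assumed vector dimension, not a floret default

fasttext_default_mb = table_megabytes(2_000_000, dim)  # bucket=2000000
floret_mb = table_megabytes(50_000, dim)               # bucket=50000

print(f"fastText default buckets: {fasttext_default_mb:.0f} MB")
print(f"floret with bucket=50000: {floret_mb:.0f} MB")
```

Under these assumptions the hashed part of the table shrinks by a factor of 40.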
import floret
# train vectors
model = floret.train_unsupervised(
"data.txt",
model="cbow",
mode="floret",
hashCount=2,
bucket=50000,
minn=3,
maxn=6,
)
# query vector
model.get_word_vector("broccoli")
# save full model
model.save_model("vectors.bin")
# export standard word-only vector table
model.save_vectors("vectors.vec")
# export floret vector table
model.save_floret_vectors("vectors.floret")
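Since subwords give every string a vector, a common follow-up is to compare a word with a misspelling or an unseen variant. The helper below is a minimal sketch of that idea: `cosine_similarity` is a hypothetical function that relies only on `get_word_vector`, and `StubModel` with its two hand-written vectors is a stand-in so the snippet runs without a trained model.

```python
import math


def cosine_similarity(model, word_a, word_b):
    """Cosine similarity between two word vectors from model.get_word_vector."""
    a = [float(x) for x in model.get_word_vector(word_a)]
    b = [float(x) for x in model.get_word_vector(word_b)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid dividing by zero for all-zero vectors
    return dot / (norm_a * norm_b)


class StubModel:
    """Stand-in for a trained model; the vectors are made up."""

    vectors = {"broccoli": [1.0, 0.0], "brocoli": [0.8, 0.6]}

    def get_word_vector(self, word):
        return self.vectors[word]


similarity = cosine_similarity(StubModel(), "broccoli", "brocoli")
print(round(similarity, 3))
```

With a real model, pass the object returned by `floret.train_unsupervised` in place of `StubModel()`.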
Note: with the default setting mode="fasttext", floret trains original fastText vectors.
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+ (replace en with the language code of your pipeline):
spacy init vectors en vectors.floret spacy_vectors_model --mode floret
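Once the vectors have been imported, the resulting directory can be loaded like any other spaCy pipeline. The function below is a small sketch of that step, assuming spaCy v3.2+ is installed and that the output directory is named spacy_vectors_model as in the command above; `first_token_vector` is a hypothetical helper name.

```python
def first_token_vector(pipeline_path, text):
    """Load a spaCy pipeline directory and return the first token's vector."""
    import spacy  # imported lazily so the sketch can be read without spaCy installed

    nlp = spacy.load(pipeline_path)
    doc = nlp(text)
    return doc[0].vector


# Example call (requires the pipeline directory to exist):
# vector = first_token_vector("spacy_vectors_model", "broccoli")
```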
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.
Download files
Download the file for your platform.
Source Distribution
- floret-0.10.0.tar.gz (70.4 kB)
Built Distributions
- floret-0.10.0-cp39-cp39-win_amd64.whl (245.1 kB)
- floret-0.10.0-cp38-cp38-win_amd64.whl (245.0 kB)