floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
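To make the "hashing trick" concrete, here is a minimal sketch of a Bloom-style lookup: each string is hashed to a few rows of a small table, and those rows are summed. The names `row_indices`, `make_table` and `embed` are hypothetical. The BLAKE2b hash and the choice to sum the rows are assumptions made for illustration, not a description of floret's internals.

```python
import hashlib
import random


def row_indices(key, bucket, hash_count):
    """Map a string to hash_count row indices in a table with `bucket` rows.

    Assumption: a salted BLAKE2b digest stands in for the real hash function.
    """
    rows = []
    for seed in range(hash_count):
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=bytes([seed])
        ).digest()
        rows.append(int.from_bytes(digest, "little") % bucket)
    return rows


def make_table(bucket, dim, seed=0):
    """Create a toy table of random values (a trained table would go here)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(bucket)]


def embed(key, table, hash_count):
    """Sum the rows selected for `key` into one vector (illustrative choice)."""
    dim = len(table[0])
    vector = [0.0] * dim
    for row in row_indices(key, len(table), hash_count):
        for i, value in enumerate(table[row]):
            vector[i] += value
    return vector


table = make_table(bucket=1000, dim=8)
vector = embed("broccoli", table, hash_count=2)
```

Because any string hashes to some set of rows, every word gets a vector, and two strings that collide in one row will usually differ in another, which is why a table far smaller than the vocabulary can still be useful.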
Installation
pip install floret
Usage
Train floret vectors using the options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
- bucket: in combination with hashCount > 1, the size of the hash table can be greatly reduced (recommended: 25000-100000, reduced from the fastText default of 2000000)
- minn: min length of char ngram (default: 3)
- maxn: max length of char ngram (default: 6)
import floret
# train vectors
model = floret.train_unsupervised(
"data.txt",
model="cbow",
mode="floret",
hashCount=2,
bucket=50000,
minn=3,
maxn=6,
)
# query vector
model.get_word_vector("broccoli")
# save full model
model.save_model("vectors.bin")
# export standard word-only vector table
model.save_vectors("vectors.vec")
# export floret vector table
model.save_hash_only_vectors("vectors.floret")
Note: with the default setting mode="fasttext", floret trains original fastText vectors.
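The minn and maxn options control which character n-grams are taken from each word. The sketch below shows one way to enumerate them, assuming the fastText convention of wrapping the word in "<" and ">" boundary markers. The function name `char_ngrams` is hypothetical and this is not the library's own code.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return all character n-grams of length minn..maxn from "<word>".

    Assumption: "<" and ">" mark the word boundaries, as in fastText.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(wrapped):
                break  # longer n-grams from this start would run past the end
            ngrams.append(wrapped[start:end])
    return ngrams


print(char_ngrams("cat", minn=3, maxn=4))
# → ['<ca', '<cat', 'cat', 'cat>', 'at>']
```

With mode="floret", the word and its n-grams share one hash table, so lowering minn or raising maxn increases the number of entries competing for the same bucket rows.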
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+:
spacy init vectors --mode floret vectors.floret spacy_vectors_model
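Once the command above has written the spacy_vectors_model directory, the vectors can be queried from Python. This is a minimal sketch, assuming spaCy v3.2+ is installed and the directory exists. The helper name `lookup_vector` is hypothetical.

```python
def lookup_vector(model_dir, word):
    """Load a spaCy pipeline directory and return the vector for `word`."""
    import spacy  # imported here so the sketch can be read without spaCy installed

    nlp = spacy.load(model_dir)
    # With floret vectors, words not seen in training should still get a
    # vector, because it is built from hashed subwords.
    return nlp.vocab[word].vector
```

For example, lookup_vector("spacy_vectors_model", "broccoli") should return a NumPy array whose length matches the vector dimension used in training.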
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.