floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
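As a rough illustration of how these two ideas fit together, the sketch below splits a word into character n-grams and maps each entry to a few rows of a small shared table. It is a conceptual sketch only: the helper names (char_ngrams, bloom_rows, rows_for_word) are hypothetical, and the seeded BLAKE2b hash is a stand-in chosen for convenience, not the hash function floret uses internally.

```python
import hashlib


def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of a word wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(token) - n + 1)
    ]


def bloom_rows(entry, bucket=50000, hash_count=2):
    """Map one entry (word or n-gram) to hash_count row indices in a table with bucket rows.

    Assumption: a seeded BLAKE2b digest stands in for the real hash function.
    """
    rows = []
    for seed in range(hash_count):
        digest = hashlib.blake2b(
            entry.encode("utf8"), digest_size=8, salt=bytes([seed])
        ).digest()
        rows.append(int.from_bytes(digest, "little") % bucket)
    return rows


def rows_for_word(word, bucket=50000, hash_count=2, minn=3, maxn=6):
    """Collect the table rows for the whole word and for each of its subwords."""
    entries = [f"<{word}>"] + char_ngrams(word, minn, maxn)
    return {entry: bloom_rows(entry, bucket, hash_count) for entry in entries}


print(char_ngrams("cat"))
print(rows_for_word("cat"))
```

Because every entry, including one never seen in training, hashes to some rows of the same fixed-size table, any word can be given a vector without storing a separate row per word.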
Installation
pip install floret
Usage
Train floret vectors using the options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
- bucket: in combination with hashCount>1, the size of the hash table can be greatly reduced (recommended: 25000-100000, reduced from the fastText default of 2000000)
- minn: min length of char ngram (default: 3)
- maxn: max length of char ngram (default: 6)
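To get a feel for what the smaller bucket values mean in practice, the following back-of-the-envelope sketch estimates the size of a dense vector table with a given number of rows. The helper table_size_mb is hypothetical, and both the vector dimension of 300 and the 4-byte float storage are assumptions made for illustration, not floret defaults.

```python
def table_size_mb(rows, dim, bytes_per_value=4):
    """Approximate size in megabytes of a dense rows x dim table of 4-byte floats."""
    return rows * dim * bytes_per_value / 1e6


dim = 300  # assumed vector dimension, for illustration only

# Compare the recommended bucket range with the fastText default of 2000000.
for bucket in (25000, 50000, 100000, 2000000):
    print(f"bucket={bucket:>7}: about {table_size_mb(bucket, dim):.0f} MB")
```

Under these assumptions, a table with 50000 rows is 40 times smaller than one with 2000000 rows, before counting any separate word rows that a standard fastText table would also store.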
import floret

# train vectors
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=3,
    maxn=6,
)

# query vector
model.get_word_vector("broccoli")

# save full model
model.save_model("vectors.bin")

# export standard word-only vector table
model.save_vectors("vectors.vec")

# export floret vector table
model.save_floret_vectors("vectors.floret")
Note: with the default setting mode="fasttext", floret trains original fastText vectors.
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+ (replace LANG with the language code of the pipeline, for example en):
spacy init vectors LANG vectors.floret spacy_vectors_model --mode floret
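Once the command has written the spacy_vectors_model directory, the vectors can be queried from Python. The snippet below is a minimal sketch, assuming spaCy v3.2+ is installed and the directory exists; load_word_vector is a hypothetical helper name, not part of floret or spaCy.

```python
def load_word_vector(word, model_dir="spacy_vectors_model"):
    """Load the pipeline created by `spacy init vectors` and return the vector for one word."""
    # Imported lazily so that defining this helper does not itself require spaCy.
    import spacy

    nlp = spacy.load(model_dir)
    # With floret vectors, the vector is built from the word's hashed subword rows,
    # so words that never appeared in the training data still get a vector.
    return nlp.vocab[word].vector


# Example usage (requires the spacy_vectors_model directory created above):
# vector = load_word_vector("broccoli")
# print(vector.shape)
```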
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.
Download files
Source Distribution
floret-0.10.0.dev2.tar.gz (70.4 kB)