floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
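As a rough illustration of how the two ideas combine, here is a minimal pure-Python sketch. A word and its character n-grams (wrapped in fastText-style < and > boundary markers) are each hashed into hash_count rows of one shared table with bucket rows, and the looked-up rows are averaged. The helper names (char_ngrams, row_ids, word_vector) are hypothetical. The salted BLAKE2b hash stands in for floret's own hash function, and the exact way floret combines rows may differ from this simple average.

```python
import hashlib
import random


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def row_ids(key, bucket, hash_count):
    """Map one key to `hash_count` row indices, one differently salted hash per row.

    Salted BLAKE2b is an illustrative stand-in, not floret's hash function.
    """
    rows = []
    for seed in range(hash_count):
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
        ).digest()
        rows.append(int.from_bytes(digest, "little") % bucket)
    return rows


def word_vector(word, table, hash_count=2, minn=3, maxn=6):
    """Average the table rows hit by the word itself and by all of its n-grams."""
    bucket, dim = len(table), len(table[0])
    keys = [word] + char_ngrams(word, minn, maxn)
    total = [0.0] * dim
    hits = 0
    for key in keys:
        for row in row_ids(key, bucket, hash_count):
            for j in range(dim):
                total[j] += table[row][j]
            hits += 1
    return [x / hits for x in total]


# Toy table with hypothetical sizes: 1000 rows of 4 dimensions, random values.
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(1000)]
print(word_vector("broccoli", table))
print(word_vector("broccolini", table))  # any string gets a vector
```

Because every key lands on some row, even a word never seen in training gets a vector, and reading several rows per key makes it less likely that two keys collide on all of their rows in a small table.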
Installation
pip install floret
Usage
Train floret vectors using the options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
- bucket: in combination with hashCount > 1, the size of the hash table can be greatly reduced (recommended: 25000-100000, reduced from the fastText default of 2000000)
- minn: minimum length of character n-grams (default: 3)
- maxn: maximum length of character n-grams (default: 6)
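To put "greatly reduced" in numbers, the sketch below estimates the size of the hash table alone. It assumes float32 storage (4 bytes per value) and fastText's default dimension of 100; table_megabytes is a hypothetical helper name. It ignores the extra per-word rows that fastText mode keeps alongside its buckets. Note that hashCount does not change the table size, only how many rows each entry reads.

```python
def table_megabytes(rows, dim=100, bytes_per_value=4):
    """Approximate size in MB (10**6 bytes) of a float32 table with rows x dim values."""
    return rows * dim * bytes_per_value / 10**6


# fastText's default bucket versus the recommended floret range
for bucket in (2_000_000, 100_000, 50_000, 25_000):
    print(f"bucket={bucket:>9,}: {table_megabytes(bucket):6.1f} MB")
```

Under these assumptions the default fastText table comes to 800 MB, while the recommended floret range needs 10 to 40 MB.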
import floret

# train vectors
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=3,
    maxn=6,
)

# query vector
model.get_word_vector("broccoli")

# save full model
model.save_model("vectors.bin")

# export standard word-only vector table
model.save_vectors("vectors.vec")

# export floret vector table
model.save_floret_vectors("vectors.floret")
Note: with the default setting mode="fasttext", floret trains original fastText vectors.
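The word-only table written by save_vectors can be read without floret. The sketch below assumes it uses the usual fastText .vec text layout (a header line with the word count and dimension, then one word followed by its values per line); load_vec and cosine are hypothetical helper names. Only words from the training vocabulary appear in this file, so vectors for other words still need model.get_word_vector or the floret table.

```python
import math


def load_vec(lines):
    """Parse .vec text: a 'count dim' header, then 'word v1 ... v_dim' per line."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.split()  # whitespace split tolerates trailing spaces
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Usage with the exported file:
# with open("vectors.vec", encoding="utf-8") as f:
#     dim, vectors = load_vec(f)
# print(cosine(vectors["broccoli"], vectors["cauliflower"]))
```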
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+, replacing LANG with the pipeline's language code (for example en):
spacy init vectors LANG vectors.floret spacy_vectors_model --mode floret
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.
Download files
Source Distribution
floret-0.10.3.tar.gz (70.9 kB)
Built Distributions
floret-0.10.3-cp39-cp39-win_amd64.whl (234.0 kB)
floret-0.10.3-cp38-cp38-win_amd64.whl (239.3 kB)