floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
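The Bloom-embedding idea can be illustrated with a small sketch. Each key (a word or subword) is hashed hashCount times into a table of bucket rows, and its vector is built from those rows. With two or more rows per key, two keys collide completely only if all of their rows coincide, which is much less likely than a single-row collision. This is what allows a small table. The names bucket_rows, make_table and entry_vector are hypothetical, seeded BLAKE2b stands in for floret's real hash function, and summing the rows is an assumed combination rule. This is not floret's implementation.

```python
import hashlib
import random


def bucket_rows(key, hash_count, bucket):
    """Map one key (a word or subword) to hash_count row indices in [0, bucket).

    Illustration only: seeded BLAKE2b stands in for floret's real hash
    function, so these indices will not match the rows floret uses.
    """
    rows = []
    for seed in range(hash_count):
        data = f"{seed}\x00{key}".encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        rows.append(int.from_bytes(digest, "little") % bucket)
    return rows


def make_table(bucket, dim, seed=0):
    """Hypothetical randomly initialised table: `bucket` rows of `dim` floats."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(bucket)]


def entry_vector(key, table, hash_count):
    """Build a key's vector by summing its rows (an assumed combination rule)."""
    vec = [0.0] * len(table[0])
    for row in bucket_rows(key, hash_count, len(table)):
        for i, value in enumerate(table[row]):
            vec[i] += value
    return vec


table = make_table(bucket=1000, dim=4)
print(bucket_rows("broccoli", hash_count=2, bucket=1000))
print(entry_vector("broccoli", table, hash_count=2))
```

For scale: at fastText's default of 100 dimensions stored as 4-byte floats, 50000 rows take 50000 × 100 × 4 B = 20 MB. The default 2000000 subword buckets alone take 800 MB.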
Installation
```bash
pip install floret
```
Usage
Train floret vectors with the following options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1 to 4 rows of the hash table (recommended: 2)
- bucket: in combination with hashCount > 1, the hash table can be made much smaller (recommended: 25000 to 100000, down from the fastText default of 2000000)
- minn: minimum length of character n-grams (default: 3)
- maxn: maximum length of character n-grams (default: 6)
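minn and maxn bound the character n-grams (subwords) extracted from each word. The sketch below follows fastText's published subword scheme, where the word is wrapped in "<" and ">" boundary markers before slicing. The function name char_ngrams is hypothetical, lengths are counted in Python characters (Unicode code points), and the sketch does not reproduce how floret then hashes each n-gram into the table.

```python
def char_ngrams(word, minn=3, maxn=6):
    """fastText-style character n-grams of `word`, with lengths minn..maxn.

    The word is wrapped in "<" and ">" boundary markers first, so prefixes
    and suffixes get their own n-grams.
    """
    token = "<" + word + ">"
    ngrams = []
    for start in range(len(token)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(token):
                break  # longer n-grams from this start would overrun the token
            # skip a bare boundary marker (only possible when minn == 1)
            if n == 1 and (start == 0 or end == len(token)):
                continue
            ngrams.append(token[start:end])
    return ngrams


print(char_ngrams("cat", minn=3, maxn=4))
# ['<ca', '<cat', 'cat', 'cat>', 'at>']
```

Because every word yields n-grams like these, an unseen word still receives a vector from its subwords.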
```python
import floret

# train vectors
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=3,
    maxn=6,
)

# query vector
model.get_word_vector("broccoli")

# save full model
model.save_model("vectors.bin")

# export standard word-only vector table
model.save_vectors("vectors.vec")

# export floret vector table
model.save_floret_vectors("vectors.floret")
```
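The word-only export can be read back without floret. The sketch below assumes vectors.vec uses the usual fastText/word2vec text layout: a header line with the word count and the dimension, then one line per word holding the word and its values. The function name read_vec is hypothetical. The sketch does not cover the .floret format, whose layout is not described here.

```python
import io


def read_vec(stream):
    """Parse a word-only vector table in the assumed fastText .vec text layout.

    Line 1: "<word count> <dimension>"; every later line: a word followed by
    its <dimension> float values, separated by whitespace.
    """
    count, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for line_no, line in enumerate(stream, start=2):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"line {line_no}: expected {dim} values, got {len(values)}")
        vectors[word] = [float(v) for v in values]
    if len(vectors) != count:
        raise ValueError(f"header says {count} words, found {len(vectors)}")
    return dim, vectors


# With a real export, pass open("vectors.vec", encoding="utf-8") instead.
sample = "2 3\nbroccoli 0.1 0.2 0.3 \ncarrot -0.5 0.0 1.5 \n"
dim, vectors = read_vec(io.StringIO(sample))
print(dim, vectors["carrot"])
```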
Note: with the default setting mode="fasttext", floret trains the original fastText vectors.
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+, replacing LANG with the pipeline's language code (for example, en):
```bash
spacy init vectors --mode floret LANG vectors.floret spacy_vectors_model
```
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.