floret Python bindings
floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
floret is an extended version of fastText that can produce word representations for any word from a compact vector table. It combines:
- fastText's subwords to provide embeddings for any word
- Bloom embeddings ("hashing trick") for a compact vector table
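The combination of the two ideas can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not floret's implementation: the helper names (`char_ngrams`, `rows_for`, `word_vector`), the SHA-256-based hash, the `<`/`>` boundary markers, and the averaging step are all hypothetical choices made for the example.

```python
import hashlib
import random


def char_ngrams(word, minn=3, maxn=6, bow="<", eow=">"):
    """Return the marked word followed by its character n-grams (no duplicates)."""
    marked = f"{bow}{word}{eow}"
    grams = [marked]  # the whole word is stored like any other entry
    for n in range(minn, maxn + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return list(dict.fromkeys(grams))  # drop repeats, keep order


def rows_for(entry, bucket, hash_count=2):
    """Map one entry to hash_count rows (illustrative hash, not floret's)."""
    rows = []
    for seed in range(hash_count):
        digest = hashlib.sha256(f"{seed}:{entry}".encode("utf-8")).digest()
        rows.append(int.from_bytes(digest[:8], "little") % bucket)
    return rows


def word_vector(word, table, hash_count=2, minn=3, maxn=6):
    """Average the table rows selected by the word and its n-grams."""
    bucket, dim = len(table), len(table[0])
    total, looked_up = [0.0] * dim, 0
    for entry in char_ngrams(word, minn, maxn):
        for row in rows_for(entry, bucket, hash_count):
            total = [t + v for t, v in zip(total, table[row])]
            looked_up += 1
    return [t / looked_up for t in total]


# A tiny random table stands in for trained vectors (1000 rows, 4 dimensions).
rng = random.Random(0)
table = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(1000)]

# Any string gets a vector, because every n-gram hashes to some row.
vector = word_vector("broccoli", table)
```

Because every entry hashes into the same fixed-size table, the table never grows with the vocabulary, and a word that was never seen during training still receives a vector from its n-grams.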
Installation
pip install floret
Usage
Train floret vectors using the options:
- mode: "floret", storing both words and subwords in the same compact hash table
- hashCount: store each entry in 1-4 rows in the hash table (recommended: 2)
- bucket: in combination with hashCount>1, the size of the hash table can be greatly reduced (recommended: 25000-100000, reduced from the fastText default of 2000000)
- minn: min length of char ngram (default: 3)
- maxn: max length of char ngram (default: 6)
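To get a feel for what reducing bucket means in practice, the following back-of-the-envelope sketch estimates the size of the hash table alone. It assumes 100-dimensional vectors stored as 4-byte floats; both numbers are assumptions for the example, and `table_megabytes` is a hypothetical helper, not part of floret.

```python
def table_megabytes(rows, dim=100, bytes_per_value=4):
    """Approximate size of a rows x dim table in megabytes (10**6 bytes)."""
    return rows * dim * bytes_per_value / 1_000_000


# Compare the recommended floret range with the fastText default bucket size.
for bucket in (25_000, 50_000, 100_000, 2_000_000):
    print(f"bucket={bucket:>9,}: {table_megabytes(bucket):7.1f} MB")
```

Under these assumptions a 50000-row table takes about 20 MB, compared with about 800 MB for 2000000 rows. The estimate covers only the hash table rows and ignores any other data stored with a model.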
import floret

# train vectors
model = floret.train_unsupervised(
    "data.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=3,
    maxn=6,
)

# query vector
model.get_word_vector("broccoli")

# save full model
model.save_model("vectors.bin")

# export standard word-only vector table
model.save_vectors("vectors.vec")

# export floret vector table
model.save_floret_vectors("vectors.floret")
Note: with the default setting mode="fasttext", floret trains original fastText vectors.
Use floret vectors in spaCy
Import floret vectors into spaCy v3.2+:
spacy init vectors LANG vectors.floret spacy_vectors_model --mode floret
Notes
- floret contains all features of the original fasttext module. See the fasttext docs for more information.
- The fasttext and floret binary formats saved with model.save_model("model.bin") are not compatible.